WO2015071804A1 - Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy - Google Patents

Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy Download PDF

Info

Publication number
WO2015071804A1
Authority
WO
WIPO (PCT)
Prior art keywords
context
hierarchy
textual
contexts
probability
Prior art date
Application number
PCT/IB2014/065838
Other languages
French (fr)
Inventor
Thierry Kormann
Stephane Hillion
Original Assignee
International Business Machines Corporation
Ibm United Kingdom Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation, Ibm United Kingdom Limited filed Critical International Business Machines Corporation
Publication of WO2015071804A1 publication Critical patent/WO2015071804A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/137Hierarchical processing, e.g. outlines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/131Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/274Converting codes to words; Guess-ahead of partial word inputs

Definitions

  • the present invention relates to ranking textual candidates of controlled natural languages (CNLs) and more particularly, to the ranking of textual candidates by assigning a probability to text fragments forming the textual candidates using a hierarchical-driven ranking mechanism for text completions.
  • CNLs are subsets of natural languages, the subsets being capable of being understood by computer systems because both the grammar and the vocabulary are restricted in order to reduce or remove ambiguity and complexity.
  • Autocomplete is a feature that suggests likely completions for the text a user is typing.
  • Another solution consists of ranking textual candidates based on the history of most recently used and/or the most frequently used words or phrases. This method provides interesting results but is mainly effective for repetitive tasks, or tasks that do not involve very frequent context switching.
  • a similar solution uses a word prediction algorithm and can use the semantics and the location of the text being entered to rank textual candidates. For instance, given a common text prefix, the completion menu of a code editor shows variables prior to class names within a method. This technique provides pertinent rankings but requires in-depth knowledge of the semantics of the entire language. Furthermore, the implementation of such algorithms is hard to achieve.
  • a further solution consists of annotating (or categorising) all the phrases of a vocabulary and declaring for each document (or part of it) which category or set of categories is permitted.
  • This method can provide meaningful results but requires a difficult and time-consuming initial step. Furthermore, it may be difficult to anticipate user needs and find relevant categories for each sentence.
  • United States patent application US 2013/0041857 A1 discloses a system and method for the reordering of text predictions.
  • the system and method reorders the text predictions based on modified probability values, wherein the probability values are modified according to the likelihood that a given text prediction will occur in the text inputted by a user. It further discloses that the ordering of predictions is allowed to be influenced by the likelihood that the predicted term or phrase belongs in the current contextual context, that is in the current text sequence entered by a user. 'Nonlocal' context is allowed to be taken into account.
  • United States patent application US 2012/0029910 Al discloses a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection.
  • An analogous method and an interface for use with the system and method are also disclosed.
  • the language model can be further configured to apply a topic filter. N-gram statistics yield estimates of prediction candidate probabilities based on local context, but global context also affects candidate probabilities.
  • a topic filter actively identifies the most likely topic for a given piece of writing and reorders the candidate predictions accordingly.
  • the topic filter takes into account the fact that topical context affects term usage. For instance, given the sequence "was awarded a", the likelihood of the following term being either "penalty" or "grant" is highly dependent on whether the topic of discussion is 'soccer' or 'finance'. Local n-gram context often cannot shed light on this, whilst a topic filter that takes the whole of a segment of text into account might be able to.
  • United States Patent 6,202,058 B1 discloses information presented to a user via an information access system being ranked according to a prediction of the likely degree of relevance to the user's interests.
  • a profile of interests is stored for each user having access to the system. Items of information to be presented to a user are ranked according to their likely degree of relevance to that user and displayed in order of ranking.
  • the prediction of relevance is carried out by combining data pertaining to the content of each item of information with other data regarding correlations of interests between users.
  • a value indicative of the content of a document can be added to another value which defines user correlation, to produce a ranking score for a document.
  • multiple regression analysis or evolutionary programming can be carried out with respect to various factors pertaining to document content and user correlation, to generate a prediction of relevance.
  • the user correlation data is obtained from feedback information provided by users when they retrieve items of information.
  • Embodiments of the invention provide a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of: assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; and, for each of the textual candidates, taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
  • the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
  • the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
  • said contexts are paragraphs within the hierarchy of a document.
  • said contexts are business rule packages within a business rule project.
  • the method further comprises the steps of: receiving textual or non-textual input; and computing a set of textual candidates.
  • a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
  • Embodiments of the invention further provide a system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising: a processing device for receiving text fragments forming the textual candidates; and a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
  • Embodiments of the invention provide the advantage that the ranking is done entirely by the method and system without intervention of an expert. Another advantage is that the ranking dynamically takes into account any modifications made to documents. A further advantage is that hierarchically structured systems storing documents or text fragments tend to be naturally organised by topics; they provide the appropriate information to compute a meaningful ranking. A yet further advantage is that assigning a probability to textual candidates based on where similar phrases have been used does not require in-depth knowledge of the language. Consequently, the approach is both relatively simple to implement and works for any CNL.
  • Figure 1 shows an embodiment of a system for ranking textual candidates;
  • Figure 2 shows an embodiment of a method of ranking textual candidates;
  • Figure 3 shows a first embodiment having a rule project with local rankings and the global ranking of predictions within the "Upgrade" package;
  • Figure 4 shows a table representing local scores of textual candidates on a per-package basis for use in the embodiment of figure 3;
  • Figure 5 shows a table representing weights to apply when propagating phrases across packages for use in the embodiment of figure 3;
  • Figure 6 shows the process of how to update the local ranking of predictions;
  • Figure 7 shows the process of how the system updates weights of entities;
  • Figure 8 shows a second embodiment having a document with local rankings and the global ranking of predictions within chapter 2, paragraph 2 of the document;
  • Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis for use in the embodiment of figure 8;
  • Figure 10 shows a table representing weights to apply when propagating phrases across paragraphs for use in the embodiment of figure 8.
  • a BRMS (Business Rule Management System) is a software system enabling organizational policies, and the repeatable decisions associated with those policies, such as claim approvals, pricing calculations and eligibility determinations, to be defined, deployed, monitored and maintained separately from application code.
  • Business rules include policies, requirements and conditional statements that are used to determine the tactical actions that take place in applications and systems.
  • Embodiments of the present invention can find utility in any structured systems using controlled natural languages.
  • Embodiments of the present invention also relate to a method for suggesting relevant completions for an input text that is compliant to a CNL and typed to a text-oriented application running at a user computer. More specifically, the method and system provide a hierarchical-driven ranking mechanism for text completions.
  • Editing tools provide a way to organise or structure multiple text fragments.
  • for example, text fragments may be part of a single document.
  • each paragraph may represent a separate entity and the document layout defines the structure and the relations between entities.
  • a system may store entities in a file system. How folders and files are organised defines the hierarchy and thus the relations between entities.
  • Another embodiment consists of identifying the relations between entities by leveraging the grammar of the CNL. Each grammar construct helps define the structure and the relations across text fragments.
  • FIG 1 shows an embodiment of a system 150 for ranking textual candidates.
  • Inputs are received by a processing device 152 which computes sets of textual candidates, recognises text fragments, detects that new text fragments have been input and analyses changes to the hierarchy.
  • the results from the processing device 152 are then sent to prediction ranker module 154 which ranks the sets of textual candidates and then makes suggestions to a user as to the most likely candidates, updates local scores (Ls) of phrases and updates weights associated with entities.
  • a text fragment may be a business rule and an entity may be a rule package.
  • a rule package may contain multiple business rules. Rule packages may be nested and thus define a hierarchy.
  • a rule project has a set of top-level rule packages and represents the root of the hierarchy.
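The nested package structure just described can be sketched as a small tree, with the rule project as the root of the hierarchy; the class and field names below are illustrative, not taken from the patent:

```python
# Illustrative sketch of a rule-project hierarchy: rule packages may be
# nested, and the rule project is the root of the hierarchy.
class RulePackage:
    def __init__(self, name, parent=None):
        self.name = name
        self.rules = []        # business rules (text fragments)
        self.children = []     # nested sub-packages
        self.parent = parent
        if parent is not None:
            parent.children.append(self)

class RuleProject(RulePackage):
    """Root of the hierarchy, holding the set of top-level rule packages."""
    def __init__(self):
        super().__init__("<project root>")

project = RuleProject()
refund = RulePackage("Refund", parent=project)
discount = RulePackage("Discount", parent=project)
upgrade = RulePackage("Upgrade", parent=project)
checkout = RulePackage("Checkout", parent=project)
```

Nesting is expressed through the `parent` link, so the hierarchical distance between any two packages can later be derived by walking towards the root.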
  • the textual candidates may be computed and ranked prior to being exposed to the user. For example, when a business expert changes a business rule, the rule editor may choose to parse the text and display all possible phrases that can be inserted at the current location.
  • FIG 2 shows an embodiment of a method of ranking textual candidates.
  • the method starts at step 102.
  • a processing device receives input.
  • the input may be non-textual input, such as, for example, digital ink input, speech input, or other input. With respect to the embodiment described below, the input is assumed to be text input.
  • the input is recognised to compute a predicted set of textual candidates.
  • the predicted set of textual candidates may be based on respective prefixes and one or more data sources such as the vocabulary.
  • a prediction ranker module assigns a probability to each of the identified textual candidates in the predicted set of textual candidates. Step 108 will be described below in more detail with reference to figures 3 to 5.
  • the prediction ranker module ranks textual candidates prior to, at step 110, presenting the resulting sorted list to the user.
  • textual candidates are ranked in order to preselect, within a list sorted alphabetically, the textual candidate that has been identified as the most relevant one in the current authoring context. The method ends at step 112.
  • the prediction ranker module first assigns a score to each textual candidate by only considering the current entity being edited. This score is referred to as a local score.
  • the person skilled in the art may determine how to compute a relevant initial score for a given prediction according to any known method.
  • the local score may be how many times a phrase has been used within an entity.
  • Another embodiment may choose to maintain the most frequently and the most recently used phrases within an entity and combine those values to get an initial score on a per-phrase basis.
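As a hedged sketch of that combination (the text leaves the exact formula open; the one-day half-life decay and the 50/50 blend below are assumptions), a local score mixing frequency and recency might look like:

```python
import time

def local_score(use_count, last_used_ts, now_ts, half_life=86400.0):
    """Blend how often and how recently a phrase was used within an entity.

    The combination below (use count scaled by an exponential recency
    decay) is an assumption for illustration; the patent leaves the exact
    formula to the implementer.
    """
    age = max(0.0, now_ts - last_used_ts)
    recency = 0.5 ** (age / half_life)        # 1.0 if just used, decays with age
    return use_count * (0.5 + 0.5 * recency)  # never below half the raw count

now = time.time()
fresh = local_score(10, now, now)          # 10 uses, just now  -> 10.0
stale = local_score(10, now - 86400, now)  # 10 uses, a day old -> 7.5
```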
  • in figure 3 are shown four rule packages ("Refund" 210, "Discount" 208, "Upgrade" 206 and "Checkout" 204) within a rule project 202, a local score (Ls) for each phrase, and computed scores (Cs) 212 for predicting a textual candidate within the "Upgrade" 206 package.
  • the "Refund" 210 package contains several business rules and describes whether or not a refund should be done for a customer purchase.
  • the business rules involved use three different phrases of the rule project vocabulary, that is, (i) the gross of invoice refund amount; (ii) the service is authorized; and (iii) the gross charge of the service.
  • a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 21, 13 and 4 respectively.
  • the "Discount" 208 package is used for computing a discount for a customer.
  • the phrases used by the business rules in the "Discount" 208 package are different from those in the "Refund" 210 package, which are needed to know if a refund should be done.
  • the business rules involved use three different phrases of the rule project vocabulary, that is, (i) the category of the customer; (ii) the age of the customer; and (iii) the amount of the shopping cart.
  • a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 24, 10 and 20 respectively.
  • the "Upgrade" 206 package represents a package for managing customer categories. A user begins entering a text fragment in an existing business rule. Based on this context, the prediction ranker module generates, for the "Upgrade" 206 package, a computed score (Cs) for each phrase 212 of the vocabulary as described below.
  • embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "the age of the customer" may be high in the "Discount" 208 package, but low in the "Refund" 210 package.
  • For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing each local score (Ls) according to hierarchical characteristics, and taking the higher of the values.
  • the distance between a pair of nodes may be used to weight local scores.
  • the phrase "the amount of the shopping cart" has a local score (Ls) of 20 in the "Discount" 208 package.
  • the computed score (Cs) of this phrase in the "Upgrade" 206 package may be 10 if the weight to go from one node to its next sibling node is 0.5.
  • within the "Upgrade" 206 package, this phrase also has a local score (Ls) of 8.
  • the prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
  • the structure and content of any document may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph, than phrases used in a paragraph a few pages later.
  • the number of different locations in which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate packages may have a higher score than a phrase used six times in one package.
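A minimal sketch of that effect, assuming an illustrative per-location boost factor (the text only states the comparison, not a formula):

```python
def spread_adjusted_score(uses_per_location, boost=2.5):
    """Score a phrase from its per-location use counts.

    Every additional distinct location multiplies the total by an
    illustrative boost factor, so that 3 uses spread over two packages
    can outrank 6 uses concentrated in a single package.
    """
    total = sum(uses_per_location)
    return total * boost ** (len(uses_per_location) - 1)

spread = spread_adjusted_score([2, 1])  # 3 uses in two packages -> 7.5
single = spread_adjusted_score([6])     # 6 uses in one package  -> 6.0
```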
  • the nature of a paragraph can also influence the probability associated with a textual candidate.
  • sentences or terms used in an introduction and in a conclusion of a document may have a higher probability associated with them than sentences and terms found in regular paragraphs of the document.
  • An introduction is often the first section of a document and a conclusion is often the last.
  • the hierarchy of the document is leveraged to finely adjust probabilities.
  • the introduction and conclusion are further apart in distance, but have the hierarchical relationship mentioned earlier in the paragraph.
  • the nature of a paragraph can also influence the system.
  • the prediction ranker module 154 can provide for special treatment for paragraphs at predefined locations within the document.
  • Figure 4 shows a table representing local scores of textual candidates on a per-package basis.
  • the rows in the table represent the phrases used.
  • the columns in the table represent the local scores (Ls) of the phrases in the package identified at the top of each column.
  • the row and the column including "ellipsis" characters indicate that other phrases and other packages have been omitted from the table for brevity and clarity.
  • the phrase "the gross of invoice refund amount" has a local score (Ls) in the "Refund" 210 package of 21 and the phrase "the amount of the shopping cart" has a local score (Ls) in the "Upgrade" 206 package of 8.
  • the other exemplary local scores (Ls) for other phrases and other packages can be seen in the table.
  • Figure 5 represents the weights to use when propagating a phrase from one package to another.
  • the columns in the table represent the source package of the local scores (Ls) of the phrases used. In the table of figure 5 they are shown in chronological order, but any other order may be used.
  • the rows in the table represent the target packages of the local scores (Ls) of the phrases used.
  • the row and the column including "ellipsis" characters indicate that other packages have been omitted from the table for brevity and clarity.
  • the weight to use when propagating a phrase from the source "Discount" 208 package to the target "Upgrade" 206 package can be seen to be equal to 0.5.
  • the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Discount" 208 package is weighted by a factor of 0.5 to produce a computed score (Cs) in the target "Upgrade" 206 package for that phrase from the source package of 10, that is 0.5 times 20.
  • the weight to use when propagating a phrase in the opposite direction, from the source "Upgrade" 206 package to the target "Discount" 208 package, can be seen to be equal to 0.4, which differs from the weighting of 0.5 used when propagating from "Discount" to "Upgrade".
  • the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Upgrade" 206 package is weighted by a factor of 0.4 to produce a computed score (Cs) in the target "Discount" 208 package for that phrase from the source package of 3.2, that is 0.4 times 8.
  • a function returning the computed score of textual candidates may be: Cs(phrase, target) = max over i = 1..n of ( Ls(phrase, i) × weight(i → target) ), with a weight of 1 from an entity to itself, where n is the total number of entities (e.g. rule packages).
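Under the assumption that an entity's own local score carries a weight of 1, such a function can be sketched as follows; the dictionary-based representation and the weights for the packages not stated in the text are illustrative:

```python
def computed_score(phrase, target, entities, local_scores, weights):
    """Cs(phrase, target) = max over entities i of Ls(phrase, i) * w(i -> target).

    local_scores maps (phrase, entity) to Ls; weights maps (source, target)
    to the propagation weight, with weight 1.0 from an entity to itself.
    """
    return max(
        local_scores.get((phrase, entity), 0) * weights.get((entity, target), 0.0)
        for entity in entities
    )

# Values from figures 3-5: Ls 8 in "Upgrade", Ls 20 in "Discount",
# Discount -> Upgrade weight 0.5; the remaining weights are illustrative.
entities = ["Refund", "Discount", "Upgrade", "Checkout"]
ls = {("the amount of the shopping cart", "Upgrade"): 8,
      ("the amount of the shopping cart", "Discount"): 20}
w = {("Upgrade", "Upgrade"): 1.0, ("Discount", "Upgrade"): 0.5,
     ("Refund", "Upgrade"): 0.25, ("Checkout", "Upgrade"): 0.25}
cs = computed_score("the amount of the shopping cart", "Upgrade", entities, ls, w)
# max(8 * 1.0, 20 * 0.5) = 10
```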
  • the ranking of textual candidates is obtained by combining, for each prediction, the local score (Ls) within a package, and the weight associated with propagation from this package to the "Upgrade" 206 package.
  • the phrase "the category of the customer" has a computed score (Cs) of 12 because this phrase has a local score (Ls) of 24 in the "Discount" 208 package and the weight to go from the "Discount" 208 package to the "Upgrade" 206 package is 0.5.
  • within the "Upgrade" 206 package, the phrase "the amount of the shopping cart" has a computed score (Cs) of 10 because this is the maximum of 8 (the local score (Ls) within the "Upgrade" 206 package) and 10 (the local score (Ls) of 20 within the "Discount" 208 package multiplied by 0.5, the weight from figure 5 to go from the source "Discount" 208 package to the target "Upgrade" 206 package).
  • the prediction with the highest computed score (Cs) is "the category of the customer", with a computed score (Cs) of 12.
  • This computed score (Cs) is obtained from the local score (Ls) of 24 in the "Discount" 208 package and the weighting of 0.5 applied, from the table of figure 5, for propagation from the "Discount" 208 package to the "Upgrade" 206 package.
  • the prediction for the phrase "the amount of the shopping cart" has a computed score (Cs) of 10, which is obtained as the higher of (i) the local score (Ls) of 8 in the "Upgrade" 206 package and (ii) the local score (Ls) of 20 in the "Discount" 208 package weighted by 0.5, the weighting applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package.
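The worked example of figures 3 to 5 can be replayed numerically using only the local scores and the Discount-to-Upgrade weight stated in the text:

```python
# Local scores (Ls) stated in the text for the "Discount" and "Upgrade"
# packages, and the Discount -> Upgrade weight of 0.5 from figure 5.
ls_discount = {"the category of the customer": 24,
               "the age of the customer": 10,
               "the amount of the shopping cart": 20}
ls_upgrade = {"the amount of the shopping cart": 8}
W_DISCOUNT_TO_UPGRADE = 0.5

def cs_in_upgrade(phrase):
    """Computed score in "Upgrade": the maximum of the weighted local scores."""
    return max(ls_upgrade.get(phrase, 0),
               ls_discount.get(phrase, 0) * W_DISCOUNT_TO_UPGRADE)

phrases = set(ls_discount) | set(ls_upgrade)
ranking = sorted(phrases, key=cs_in_upgrade, reverse=True)
# "the category of the customer" leads with Cs 12; the shopping-cart phrase gets 10.
```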
  • once the prediction ranker engine has computed the final ranking based on the computed scores (Cs), textual candidates can be displayed to the user. It should, however, be realised that any other appropriate action can be taken.
  • the global ranking of predictions represents the likely degree of relevance to the user's interests of each phrase at a given location in the hierarchy.
  • the system may recognise an operation that changes either the local scores (Ls) or the weights shown in figure 5 associated with entities.
  • Figure 6 shows how the local ranking of textual candidates is updated.
  • the method starts at step 502.
  • the processing device receives input.
  • the input may be textual or non-textual input.
  • a text fragment is recognised and it is detected that a new phrase has been entered.
  • step 506 may involve a parser dedicated to the controlled natural language currently in use.
  • the prediction ranker module may be notified and one or more local scores of each phrase, within one or more packages, may be recalculated. The method ends at step 510.
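The update process of figure 6 can be sketched as an incremental counter; the recogniser below is a hypothetical stand-in for the parser dedicated to the controlled natural language in use:

```python
from collections import defaultdict

# local_scores[package][phrase] -> use count, the simple Ls of the examples.
local_scores = defaultdict(lambda: defaultdict(int))

def on_input(package, text, recognise_phrases):
    """Steps 504 to 508: receive input, recognise phrases, update local scores.

    recognise_phrases stands in for the parser dedicated to the controlled
    natural language currently in use.
    """
    for phrase in recognise_phrases(text):
        local_scores[package][phrase] += 1

# Hypothetical recogniser: match known vocabulary phrases as substrings.
vocabulary = ["the age of the customer", "the category of the customer"]
recognise = lambda text: [p for p in vocabulary if p in text]

on_input("Upgrade", "if the age of the customer is more than 18 then ...", recognise)
```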
  • Figure 7 shows how the weights associated with each entity storing text fragments are updated.
  • an application may receive an event that indicates that the hierarchy has been changed. For example, the application may be notified when a rule package has been added or when a new section has been inserted into a document.
  • the application may be configured to determine what kind of operation has been performed. For example, this step may be particularly useful in identifying what part of the hierarchy needs to be updated.
  • the prediction ranker module may update the weight associated with each entity. For example, when inserting a new rule package, the distance between two entities may get bigger and the respective weights associated with each of the entities may need to be adjusted. The method ends at step 610.
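A hedged sketch of the weight update of figure 7, assuming that weights decay with hierarchical distance and choosing the decay mapping so that adjacent siblings get the 0.5 weight of the figure 3 example (both choices are assumptions, not the patent's formula):

```python
def hierarchical_distance(path_a, path_b):
    """Edges between two entities identified by their paths from the root."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def weight(path_src, path_dst, decay=0.5):
    """Propagation weight, inversely related to hierarchical distance.

    The mapping decay ** max(distance - 1, 0) keeps weight 1.0 from an
    entity to itself and gives adjacent siblings 0.5; it is an
    illustrative choice.
    """
    return decay ** max(hierarchical_distance(path_src, path_dst) - 1, 0)

# Inserting a new intermediate package lengthens one path, which increases
# the distance and therefore shrinks the propagation weight.
w_siblings = weight(("project", "Discount"), ("project", "Upgrade"))       # 0.5
w_nested = weight(("project", "Discount"), ("project", "New", "Upgrade"))  # 0.25
```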
  • Chapter 1 Paragraph 2 810 contains several phrases.
  • the paragraphs involved use three different phrases of the rule project vocabulary, that is, (i) "anger and rage"; (ii) "climate change"; and (iii) "the great recession".
  • a local score (Ls) is computed.
  • the local score (Ls) is how many times each phrase has been used within the paragraph.
  • the local scores (Ls) for phrases (i), (ii) and (iii) are 5, 8 and 4 respectively.
  • Chapter 2 Paragraph 1 808 also contains several phrases.
  • the phrases used in Chapter 2, Paragraph 1 808 are different from those in Chapter 1, Paragraph 2 810.
  • the paragraphs involved use three different phrases of the rule project vocabulary, that is, (i) "linear no threshold"; (ii) "how's that working out for you?"; and (iii) "make no mistake about it".
  • a local score (Ls) is computed.
  • the local score (Ls) is how many times each phrase has been used within the paragraph.
  • the local scores (Ls) for phrases (i), (ii) and (iii) are 2, 6 and 7 respectively.
  • Chapter 2, Paragraph 2 806 also contains several phrases. A user begins entering a text fragment in the paragraph. Based on this context, the prediction ranker module 154 generates, for Chapter 2, Paragraph 2 806, a computed score (Cs) for each phrase 812 of the vocabulary as described below.
  • embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module 154 may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "how's that working out for you?" may be high in Chapter 2, Paragraph 1 808, but low in Chapter 1, Paragraph 2 810.
  • For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing each local score (Ls) according to hierarchical characteristics, and taking the higher of the values.
  • the distance between a pair of nodes may be used to weight local scores.
  • the phrase "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810.
  • the computed score (Cs) of this phrase in Chapter 2, Paragraph 2 806 may be 2 if the weight to go from one node to its next sibling node is 0.5.
  • within Chapter 2, Paragraph 2 806, this phrase also has a local score (Ls) of 3.
  • the prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
  • the structure and content of any document may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph, than phrases used in a paragraph a few pages later.
  • the number of different locations in which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate paragraphs may have a higher score than a phrase used six times in one paragraph.
  • Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis.
  • the phrase "the great recession" has a local score (Ls) in Chapter 1, Paragraph 2 810 of 4 and the phrase "make no mistake about it" has a local score (Ls) in Chapter 2, Paragraph 1 808 of 7.
  • the other exemplary local scores (Ls) for other phrases and other paragraphs can be seen in the table.
  • Figure 10 represents the weights to use when propagating a phrase from one paragraph to another.
  • the weight to use when propagating a phrase from the source Chapter 2, Paragraph 1 808 to the target Chapter 2, Paragraph 2 806 can be seen to be equal to 0.25.
  • the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 1 808 is weighted by a factor of 0.25 to produce a computed score (Cs) in the target Chapter 2, Paragraph 2 806 for that phrase from the source paragraph of 1.75, that is 0.25 times 7.
  • the weight to use when propagating a phrase in the opposite direction, from the source Chapter 2, Paragraph 2 806 to the target Chapter 2, Paragraph 1 808, can be seen to be equal to 0.2, which differs from the weighting of 0.25 used when propagating from Chapter 2, Paragraph 1 808 to Chapter 2, Paragraph 2 806.
  • the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 2 806 is weighted by a factor of 0.2 to produce a computed score (Cs) in the target Chapter 2, Paragraph 1 808 for that phrase from the source paragraph of 0, that is 0.2 times 0.
  • the other exemplary weights for other combinations of source and target paragraphs can be seen in the table.
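The figures of this second worked example can be checked with the same arithmetic; only values stated in the text appear below:

```python
# "the great recession": Ls 4 in Chapter 1, Paragraph 2; weight 0.5 towards
# Chapter 2, Paragraph 2, where the phrase also has a local Ls of 3.
propagated_recession = 4 * 0.5               # 2.0, as stated in the text
cs_recession = max(3, propagated_recession)  # the ranker keeps the maximum: 3

# "make no mistake about it": Ls 7 in Chapter 2, Paragraph 1; the weight
# towards Chapter 2, Paragraph 2 is 0.25 (figure 10).
propagated_mistake = 7 * 0.25                # 1.75, as stated in the text
```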
  • Embodiments of the invention can take the form of a computer program accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-RW), and DVD.

Abstract

Disclosed is a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy. The method comprises the steps of assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; for each of the textual candidates, taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.

Description

RANKING PREDICTION CANDIDATES OF CONTROLLED NATURAL LANGUAGES OR BUSINESS RULES DEPENDING ON DOCUMENT HIERARCHY
FIELD OF THE INVENTION
[0001] The present invention relates to ranking textual candidates of controlled natural languages (CNLs) and more particularly, to the ranking of textual candidates by assigning a probability to text fragments forming the textual candidates using a hierarchical-driven ranking mechanism for text completions.
BACKGROUND
[0002] CNLs are subsets of natural languages, the subsets being capable of being understood by computer systems by restricting both the grammar and the vocabulary in order to reduce or remove ambiguity and complexity. Many computer systems, and more specifically editing tools, exist for CNLs. These often provide advanced features such as validation, syntax highlighting, or autocomplete. Autocomplete is a feature that automatically predicts the remaining words or phrases that the user wants to type, without the user having to type them completely. This feature is particularly effective when editing text written in highly structured, easy-to-predict languages such as CNLs. However, when a language has an extensive vocabulary, providing relevant textual candidates among the valid predictions computed by the system remains a challenge.
[0003] Another solution consists of ranking textual candidates based on the history of the most recently used and/or the most frequently used words or phrases. This method provides interesting results but is mainly effective for repetitive tasks, or tasks that do not involve very frequent context switching.
[0004] A similar solution uses a word prediction algorithm and can use the semantics and the location of the text being entered to rank textual candidates. For instance, given a common text prefix, the completion menu of a code editor shows variables prior to class names within a method. This technique provides pertinent rankings but requires in-depth knowledge of the semantics of the entire language. Furthermore, the implementation of such algorithms is hard to achieve.
[0005] A further solution consists of annotating (or categorising) all the phrases of a vocabulary and declaring for each document (or part of it) which category or set of categories is permitted. This method can provide meaningful results but requires a difficult and time-consuming initial step. Furthermore, it may be difficult to anticipate user needs and find relevant categories for each sentence.
[0006] United States patent application US 2013/0041857 Al discloses a system and method for the reordering of text predictions. The system and method reorders the text predictions based on modified probability values, wherein the probability values are modified according to the likelihood that a given text prediction will occur in the text inputted by a user. It further discloses that the ordering of predictions is allowed to be influenced by the likelihood that the predicted term or phrase belongs in the current contextual context, that is in the current text sequence entered by a user. 'Nonlocal' context is allowed to be taken into account.
[0007] United States patent application US 2012/0029910 Al discloses a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection. An analogous method and an interface for use with the system and method are also disclosed. The language model can be further configured to apply a topic filter. N-gram statistics yield estimates of prediction candidate probabilities based on local context, but global context also affects candidate probabilities. A topic filter actively identifies the most likely topic for a given piece of writing and reorders the candidate predictions accordingly. The topic filter takes into account the fact that topical context affects term usage. For instance, given the sequence "was awarded a", the likelihood of the following term being either "penalty" or "grant" is highly dependent on whether the topic of discussion is 'soccer' or 'finance' . Local n-gram context often cannot shed light on this, whilst a topic filter that takes the whole of a segment of text into account might be able to.
[0008] United States Patent 6,202,058 Bl discloses information presented to a user via an information access system being ranked according to a prediction of the likely degree of relevance to the user's interests. A profile of interests is stored for each user having access to the system. Items of information to be presented to a user are ranked according to their likely degree of relevance to that user and displayed in order of ranking. The prediction of relevance is carried out by combining data pertaining to the content of each item of information with other data regarding correlations of interests between users. A value indicative of the content of a document can be added to another value which defines user correlation, to produce a ranking score for a document. Alternatively, multiple regression analysis or evolutionary programming can be carried out with respect to various factors pertaining to document content and user correlation, to generate a prediction of relevance. The user correlation data is obtained from feedback information provided by users when they retrieve items of information.
BRIEF SUMMARY OF THE INVENTION
[0009] Embodiments of the invention provide a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of: assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context;
calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.

[0010] Preferably, the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
[0011] Preferably, the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
[0012] In an embodiment, said contexts are paragraphs within the hierarchy of a document.
[0013] In another embodiment, said contexts are business rule packages within a business rule project.
[0014] In an embodiment, the method further comprises the steps of: receiving textual or non-textual input; and computing a set of textual candidates.
[0015] In an embodiment, a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
[0016] Embodiments of the invention further provide a system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising: a processing device for receiving text fragments forming the textual candidates; and a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
[0017] Further embodiments of the invention provide a computer program product for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code adapted to perform the method described above when said program is run on a computer.
[0018] Embodiments of the invention provide the advantage that the ranking is done entirely by the method and system without intervention of an expert. Another advantage is that the ranking dynamically takes into account any modifications made to documents. A further advantage is that hierarchically structured systems storing documents or text fragments tend to be naturally organised by topics. They provide the appropriate information to compute a meaningful ranking. A yet further advantage is that assigning a probability to textual candidates based on where similar phrases have been used does not require in-depth knowledge of the language. Consequently, the approach is both relatively simple to implement and works for any CNLs.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] Preferred embodiments of the present invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows an embodiment of a system for ranking textual candidates;
Figure 2 shows an embodiment of a method of ranking textual candidates;
Figure 3 shows a first embodiment having a rule project with local rankings and the global ranking of predictions within the "Upgrade" package;
Figure 4 shows a table representing local scores of textual candidates on a per-package basis for use in the embodiment of figure 3;
Figure 5 shows a table representing weights to apply when propagating phrases across packages for use in the embodiment of figure 3;
Figure 6 shows the process of how to update the local ranking of predictions;
Figure 7 shows the process of how the system updates weights of entities;
Figure 8 shows a second embodiment having a document with local rankings and the global ranking of predictions within chapter 2, paragraph 2 of the document;
Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis for use in the embodiment of figure 8; and
Figure 10 shows a table representing weights to apply when propagating phrases across paragraphs for use in the embodiment of figure 8.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0020] Embodiments of the present invention will be described hereinafter with reference to the implementation of the invention in a Business Rule Management System (BRMS). A BRMS is a software system enabling organizational policies and the repeatable decisions associated with those policies, such as claim approvals, pricing calculations and eligibility determinations to be defined, deployed, monitored and maintained separately from application code. Business rules include policies, requirements and conditional statements that are used to determine the tactical actions that take place in applications and systems. However, the practical applications of embodiments of the present invention are not limited to this particular described environment. Embodiments of the present invention can find utility in any structured systems using controlled natural languages.
[0021] Embodiments of the present invention also relate to a method for suggesting relevant completions for an input text that is compliant to a CNL and typed to a text-oriented application running at a user computer. More specifically, the method and system provide a hierarchical-driven ranking mechanism for text completions.
[0022] Editing tools provide a way to organise or structure multiple text fragments. There are many ways to organise text fragments using a CNL within a data processing system. In one embodiment, text fragments may be part of a single document. As an example, each paragraph may represent a separate entity and the document layout defines the structure and the relations between entities. In another embodiment, a system may store entities in a file system. How folders and files are organised defines the hierarchy and thus the relations between entities. Another embodiment consists of identifying the relations between entities by leveraging the grammar of the CNL. Each grammar construct helps define the structure and the relations across text fragments.
[0023] Referring to figure 1, there is shown an embodiment of a system 150 for ranking textual candidates. Inputs are received by a processing device 152, which computes sets of textual candidates, recognises text fragments, detects that new text fragments have been input and analyses changes to the hierarchy. The results from the processing device 152 are then sent to a prediction ranker module 154, which ranks the sets of textual candidates and then makes suggestions to a user as to the most likely candidates, updates local scores (Ls) of phrases and updates weights associated with entities. The way in which these actions are achieved will be described below with reference to figures 2 to 10.
[0024] Referring to the example of a BRMS, a text fragment may be a business rule and an entity may be a rule package. A rule package may contain multiple business rules. Rule packages may be nested and thus define a hierarchy. A rule project has a set of top-level rule packages and represents the root of the hierarchy.
[0025] When the system determines that a user operation has changed a text fragment, such as a business rule, the textual candidates may be computed and ranked prior to being exposed to the user. For example, when a business expert changes a business rule, the rule editor may choose to parse the text and display all possible phrases that can be inserted at the current location.
[0026] Referring to figure 2, there is shown an embodiment of a method of ranking textual candidates. The method starts at step 102. At step 104, a processing device receives input. The input may be non-textual input, such as, for example, digital ink input, speech input, or other input. With respect to the embodiment described below, the input is assumed to be text input.
[0027] At step 106, the input is recognised to compute a predicted set of textual candidates. The predicted set of textual candidates may be based on respective prefixes and one or more data sources such as the vocabulary.

[0028] At step 108, a prediction ranker module assigns a probability to each of the identified textual candidates in the predicted set of textual candidates. Step 108 will be described below in more detail with reference to figures 3 to 5. In one embodiment, the prediction ranker module ranks textual candidates prior to, at step 110, presenting the resulting sorted list to the user. In another embodiment, textual candidates are ranked in order to preselect, within a list sorted alphabetically, the textual candidate that has been identified as the most relevant one in the current authoring context. The method ends at step 112.
[0029] According to an embodiment of the invention, the prediction ranker module first assigns a score to each textual candidate by only considering the current entity being edited. This score is referred to as a local score. The person skilled in the art may determine how to compute a relevant initial score for a given prediction according to any known method. In one embodiment, the local score may be how many times a phrase has been used within an entity. Another embodiment may choose to maintain the most frequently and the most recently used phrases within an entity and combine those values to get an initial score on a per-phrase basis.
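Paragraph [0029] leaves the exact scoring open; one hedged sketch, combining frequency and recency as suggested, is given below. The exponential decay and the 0.9 per-day factor are illustrative assumptions, not part of the described embodiment:

```python
import time

def local_score(use_timestamps, now=None, decay=0.9):
    """Each past use of a phrase contributes decay ** age_in_days,
    so frequent uses and recent uses both raise the local score."""
    if now is None:
        now = time.time()
    day = 86400.0  # seconds per day
    return sum(decay ** ((now - t) / day) for t in use_timestamps)

now = time.time()
recent = local_score([now, now, now], now=now)         # three uses today -> 3.0
stale = local_score([now - 30 * 86400] * 5, now=now)   # five uses a month ago -> ~0.21
print(recent > stale)  # True
```

With this choice, three uses of a phrase today outscore five uses a month ago, giving the "more recently and more frequently used" behaviour in one number per phrase.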
[0030] Referring to figure 3, shown are four rule packages (Refund 210, Discount 208, Upgrade 206 and Checkout 204) within a rule project 202, a local score (Ls) for each phrase and computed scores (Cs) 212 for predicting a textual candidate within the "Upgrade" 206 package.
[0031] The "Refund" 210 package contains several business rules and describes whether or not a refund should be done for a customer purchase. The business rules involved use three different phrases of the rule project vocabulary, that is, (i) the gross of invoice refund amount; (ii) the service is authorized; and (iii) the gross charge of the service. For each of the different phrases, a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 21, 13 and 4 respectively.

[0032] The "Discount" 208 package is used for computing a discount for a customer. The phrases used by the business rules in the "Discount" 208 package are different from those used in the "Refund" 210 package to determine whether a refund should be done. The business rules involved use three different phrases of the rule project vocabulary, that is, (i) the category of the customer; (ii) the age of the customer; and (iii) the amount of the shopping cart. For each of the different phrases, a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 24, 10 and 20 respectively.
[0033] The "Upgrade" 206 package represents a package for managing customer categories. A user begins entering a text fragment in an existing business rule. Based on this context, the prediction ranker module generates for the "Upgrade" 206 package, a computed score (Cs) for each phrase 212 of the vocabulary as described below.
[0034] As outlined above, embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "the age of the customer" may be high in the
"Discount" 208 package, but low in the "Refund" 210 package.
[0035] As further illustrated, several aspects of the hierarchy may be involved, to varying degrees, in the processing of a computed score (Cs). For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighting that local score (Ls) according to hierarchical characteristics, and taking the highest of the resulting values.
[0036] In one embodiment, the distance between a pair of nodes, such as packages 204, 206, 208, 210, may be used to weight local scores. As an example, "the amount of the shopping cart" has a local score (Ls) of 20 in the "Discount" 208 package. The computed score (Cs) of this phrase in the "Upgrade" 206 package may be 10 if the weight to go from one node to its next sibling node is 0.5. Furthermore, this phrase also has a local score (Ls) of 8 in the "Upgrade" 206 package. The prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
[0037] In another embodiment, the structure and content of any document (sections, paragraphs etc.) may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph than phrases used in a paragraph a few pages later.
[0038] In another embodiment, the number of different locations into which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate packages may have a higher score than a phrase used six times in one package.
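Paragraph [0038] does not fix a formula, so the sketch below shows one hedged way to reward phrases spread across several packages. The function name and the exponent 1.5 are assumptions, chosen purely so that the quoted example (three uses in two packages beating six uses in one package) holds:

```python
def spread_score(uses_per_package, exponent=1.5):
    """Total uses scaled by a bonus for the number of distinct packages
    in which the phrase appears."""
    total = sum(uses_per_package.values())
    distinct = sum(1 for n in uses_per_package.values() if n > 0)
    return total * distinct ** exponent

spread = spread_score({"Refund": 2, "Discount": 1})  # 3 * 2**1.5 ~ 8.49
concentrated = spread_score({"Refund": 6})           # 6 * 1**1.5 = 6.0
print(spread > concentrated)  # True
```

Any exponent greater than 1 makes a phrase spread over two packages outrank the same total concentrated in one; the exact value is a tuning choice.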
[0039] In another embodiment, the nature of a paragraph can also influence the probability associated with a textual candidate. For example, sentences or terms used in an introduction and in a conclusion of a document may have a higher probability associated with them than sentences and terms found in regular paragraphs of the document. An introduction is often the first section of a document and a conclusion is often the last. In this embodiment, the hierarchy of the document is leveraged to finely adjust probabilities. The introduction and conclusion are further apart in distance, but have the hierarchical relationship mentioned earlier in the paragraph. The nature of a paragraph can also influence the system. The prediction ranker module 154 can provide for special treatment for paragraphs at predefined locations within the document.
[0040] Figure 4 shows a table representing local scores of textual candidates on a per-package basis. The rows in the table represent the phrases used. The columns in the table represent the local scores (Ls) of the phrases in the package identified at the top of each column. The row and the column including "ellipsis" characters (...) indicate that other phrases and other packages have been omitted from the table for brevity and clarity. In the table, the phrase "the gross amount of invoice refund amount" has a local score (Ls) in the "Refund" 210 package of 21 and the phrase "amount of the shopping cart" has a local score (Ls) in the "Upgrade" 206 package of 8. The other exemplary local scores (Ls) for other phrases and other packages can be seen in the table.
[0041] Figure 5 represents the weights to use when propagating a phrase from one package to another. The columns in the table represent the source packages of the local scores (Ls) of the phrases used. In the table of figure 5 they are shown in chronological order, but any other order may be used. The rows in the table represent the target packages of the local scores (Ls) of the phrases used. The row and the column including "ellipsis" characters (...) indicate that other packages have been omitted from the table for brevity and clarity. In the table, the weight to use when propagating a phrase from the source "Discount" 208 package to the target "Upgrade" 206 package can be seen to be equal to 0.5. Applying the weight to the example of figure 3, the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Discount" 208 package is weighted by a factor of 0.5 to produce a computed score (Cs) in the target "Upgrade" 206 package for that phrase from the source package of 10, that is 0.5 times 20. Also in the table, the weight to use when propagating a phrase in the opposite direction, from the source "Upgrade" 206 package to the target "Discount" 208 package, can be seen to be equal to 0.4, which differs from the 0.5 weighting used in the other direction. Applying the weight to the example of figure 3, the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Upgrade" 206 package is weighted by a factor of 0.4 to produce a computed score (Cs) in the target "Discount" 208 package for that phrase from the source package of 3.2, that is 0.4 times 8. The other exemplary weights for other combinations of source and target packages can be seen in the table.
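The asymmetric weight lookups just described can be sketched as follows; the dictionary layout is an illustrative assumption, with the 0.5 and 0.4 weights and the local scores taken from figures 3 to 5:

```python
# (source package, target package) -> propagation weight, per figure 5;
# note the asymmetry between the two directions.
weights = {
    ("Discount", "Upgrade"): 0.5,
    ("Upgrade", "Discount"): 0.4,
}
# local scores (Ls) of "the amount of the shopping cart", per figure 4
ls = {"Discount": 20, "Upgrade": 8}

print(weights[("Discount", "Upgrade")] * ls["Discount"])  # 0.5 * 20 = 10.0
print(weights[("Upgrade", "Discount")] * ls["Upgrade"])   # 0.4 * 8  = 3.2
```

Keying the table by (source, target) pairs is what lets the two directions carry different weights.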
[0042] Various implementations can be realised but as an example, a function returning the computed score of textual candidates may be:
Cs(p, e) = max( { w(x, e) * Ls(x, p) : x = 1, ..., n } )
where:
Cs(p, e) : a function giving the computed score of prediction 'p' within entity 'e'
w(x, e) : a function returning the weight to apply for predictions propagating from entity 'x' to target entity 'e' as illustrated in Figure 5
Ls(x, p) : a function returning the local score within entity 'x' of prediction 'p' as illustrated in Figure 4
n : the total number of entities (e.g. rule packages)
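A minimal Python rendering of the Cs(p, e) function above is sketched below. The names and the convention that an entity's weight to itself is 1.0 are assumptions for illustration:

```python
def computed_score(phrase, entity, local_scores, weights):
    """Cs(p, e) = max over all entities x of w(x, e) * Ls(x, p).

    local_scores: dict mapping entity -> {phrase: local score} (figure 4)
    weights: dict mapping (source entity, target entity) -> propagation
             weight (figure 5); the weight of an entity to itself is
             assumed here to be 1.0.
    """
    best = 0.0
    for source, scores in local_scores.items():
        w = 1.0 if source == entity else weights.get((source, entity), 0.0)
        best = max(best, w * scores.get(phrase, 0))
    return best

# Values quoted from figures 3 to 5:
Ls = {
    "Discount": {"the amount of the shopping cart": 20,
                 "the category of the customer": 24},
    "Upgrade":  {"the amount of the shopping cart": 8},
}
W = {("Discount", "Upgrade"): 0.5, ("Upgrade", "Discount"): 0.4}

# max(1.0 * 8, 0.5 * 20) = 10, as in the walkthrough below
print(computed_score("the amount of the shopping cart", "Upgrade", Ls, W))
# max(1.0 * 0, 0.5 * 24) = 12
print(computed_score("the category of the customer", "Upgrade", Ls, W))
```

Taking the maximum rather than a sum means a phrase heavily used in one nearby entity is never diluted by entities where it is absent.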
[0043] In the illustrated example, the ranking of textual candidates is obtained by combining, for each prediction, the local score (Ls) within a package, and the weight associated with propagation from this package to the "Upgrade" 206 package. For example, the phrase "the category of the customer" has a computed score (Cs) of 12 because this phrase has a local score (Ls) of 24 in the "Discount" 208 package and the weight to go from the "Discount" 208 package to the "Upgrade" 206 package is 0.5.
[0044] Another example is the computed score (Cs) within the "Upgrade" 206 package of the phrase "the amount of the shopping cart". The computed score is 10 because this is the maximum of 8 (the local score (Ls) within the "Upgrade" 206 package) and 20 (the local score within the "Discount" 208 package) multiplied by 0.5 (the weight from Figure 5 to go from the source "Discount" 208 package to the target "Upgrade" 206 package).
[0045] In the example of figure 3, six textual candidates are shown. The prediction with the highest computed score (Cs) is "the category of the customer" with a computed score (Cs) of 12. This computed score (Cs) is obtained from the local score (Ls) of 24 in the "Discount" 208 package and the weighting of 0.5 applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package. The prediction for the phrase "the amount of the shopping cart" has a computed score (Cs) of 10, which is obtained from the higher of (i) the local score (Ls) of 8 in the "Upgrade" 206 package and (ii) the local score (Ls) of 20 in the "Discount" 208 package multiplied by the weighting of 0.5 applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package, giving a computed score (Cs) of 10.
[0046] Even though the phrase "the amount of the shopping cart" has a local score (Ls) of 8 and the phrase "the category of the customer" has a local score (Ls) of 0, the phrase "the category of the customer" has a higher computed score (Cs) because it is used more frequently in a package that is closely related in the hierarchy. This is in spite of the fact that the phrase "the category of the customer" has not been previously used in the "Upgrade" 206 package.
[0047] Once the prediction ranker engine has computed the final ranking based on the computed scores (Cs), textual candidates can be displayed to the user. It should, however, be realised that any other appropriate action can be taken. The global ranking of predictions represents the likely degree of relevance to the user's interests of each phrase at a given location in the hierarchy.
[0048] As the user continues to type, the system may recognise an operation that changes either the local scores (Ls) or the weights associated with entities, shown in figure 5. Figure 6 shows how the local ranking of textual candidates is updated.
[0049] Referring to figure 6, the method starts at step 502. At step 504, the processing device receives input. As described above with reference to figure 2, the input may be textual or non-textual input. At step 506, a text fragment is recognised and it is detected that a new phrase has been entered. In a typical embodiment, step 506 may involve a parser dedicated to the controlled natural language currently in use. At step 508, the prediction ranker module may be notified and one or more local scores of each phrase, within one or more packages, may be recalculated. The method ends at step 510.
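The flow of figure 6 might be sketched as follows; the class and method names are invented for illustration and only the step 508 recalculation is shown:

```python
from collections import defaultdict

class PredictionRanker:
    """Illustrative prediction ranker module (154) keeping per-entity scores."""

    def __init__(self):
        # entity -> phrase -> local score (here a simple use count)
        self.local_scores = defaultdict(lambda: defaultdict(int))

    def on_phrase_entered(self, entity, phrase):
        # step 508: recalculate the local score of the recognised phrase
        # within the entity being edited
        self.local_scores[entity][phrase] += 1

ranker = PredictionRanker()
ranker.on_phrase_entered("Upgrade", "the amount of the shopping cart")
ranker.on_phrase_entered("Upgrade", "the amount of the shopping cart")
print(ranker.local_scores["Upgrade"]["the amount of the shopping cart"])  # 2
```

In a real editor the call to on_phrase_entered would come from the CNL parser of step 506, once a complete phrase has been recognised.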
[0050] When the user performs an operation that modifies the hierarchy of how text fragments are organised, the system may need to check whether or not the weights need some adjustments. Figure 7 shows how the weights associated with each entity storing text fragments are updated.
[0051] Referring to figure 7, the method starts at step 602. At step 604, an application may receive an event that indicates that the hierarchy has been changed. For example, the application may be notified when a rule package has been added or when a new section has been inserted into a document. At step 606, the application may be configured to determine what kind of operation has been performed. For example, this step may be particularly useful in identifying what part of the hierarchy needs to be updated. At step 608, the prediction ranker module may update the weight associated with each entity. For example, when inserting a new rule package, the distance between two entities may get bigger and the respective weights associated with each of the entities may need to be adjusted. The method ends at step 610.
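Figure 7's weight adjustment can be sketched as below. The 1/(d+1) decay is an assumption, chosen only because the claimed weighting must be inversely related to hierarchical distance:

```python
def weight_from_distance(distance):
    # one possible weighting that is inversely related to the
    # hierarchical distance between two entities
    return 1.0 / (distance + 1)

before = weight_from_distance(1)  # two sibling packages: 0.5
# inserting a new package between them (the event of step 604) pushes the
# distance to 2, so the propagation weight is adjusted downwards (step 608)
after = weight_from_distance(2)
print(after < before)  # True
```

Any monotonically decreasing function of distance would satisfy the same constraint; figure 5's hand-tuned table (0.5 one way, 0.4 the other) shows the weights need not even be symmetric.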
[0052] Referring to figure 8, shown are three paragraphs (Chapter 1, Paragraph 2 810, Chapter 2, Paragraph 1 808, and Chapter 2, Paragraph 2 806) within a document 802 having Chapters 814, 804, a local score (Ls) for each phrase and computed scores (Cs) 812 for predicting a textual candidate within Chapter 2, Paragraph 2 806.
[0053] Chapter 1, Paragraph 2 810 contains several phrases. The paragraph uses three different phrases of the vocabulary, that is, (i) "anger and rage"; (ii) "climate change"; and (iii) "the great recession". For each of the different phrases, a local score (Ls) is computed. In the example of figure 8, the local score (Ls) is how many times each phrase has been used within the paragraph. As can be seen from figure 8, the local scores (Ls) for phrases (i), (ii) and (iii) are 5, 8 and 4 respectively.
[0054] Chapter 2, Paragraph 1 808 also contains several phrases. The phrases used in Chapter 2, Paragraph 1 808 are different from those in Chapter 1, Paragraph 2 810. The paragraph uses three different phrases of the vocabulary, that is, (i) "linear no threshold"; (ii) "how's that working out for you?"; and (iii) "make no mistake about it". For each of the different phrases, a local score (Ls) is computed. In the example of figure 8, the local score (Ls) is how many times each phrase has been used within the paragraph. As can be seen from figure 8, the local scores (Ls) for phrases (i), (ii) and (iii) are 2, 6 and 7 respectively.
[0055] Chapter 2, Paragraph 2 806 also contains several phrases. A user begins entering a text fragment in the paragraph. Based on this context, the prediction ranker module 154 generates, for Chapter 2, Paragraph 2 806, a computed score (Cs) for each phrase 812 of the vocabulary as described below.
[0056] As outlined above, embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module 154 may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "how's that working out for you?" may be high in Chapter 2, Paragraph 1 808, but low in Chapter 1, Paragraph 2 810.
[0057] As further illustrated, several aspects of the hierarchy may be involved, to varying degrees, in the processing of a computed score (Cs). For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing each local score (Ls) according to hierarchical characteristics, and taking the highest of the resulting values.
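This weighted-maximum computation can be sketched as follows; the names, node labels, and data layout are hypothetical, and the weight values mirror the example of paragraph [0058]:

```python
def computed_score(phrase, target, local, weight):
    """Computed score (Cs) of `phrase` at node `target`.

    `local` maps (node, phrase) -> local score (Ls); `weight` maps
    (source, target) -> hierarchical propagation weight. Every local
    score of the phrase is weighted by the source-to-target weight and
    the highest weighted value is kept.
    """
    return max(weight[(src, target)] * ls
               for (src, ph), ls in local.items() if ph == phrase)

# "the great recession" has Ls 4 in a sibling node (Chapter 1, Paragraph 2)
local = {("Ch1P2", "the great recession"): 4,
         ("Ch2P2", "the great recession"): 0}
weight = {("Ch1P2", "Ch2P2"): 0.5,   # assumed sibling weight, per [0058]
          ("Ch2P2", "Ch2P2"): 1.0}   # a node contributes fully to itself
print(computed_score("the great recession", "Ch2P2", local, weight))  # 2.0
```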
[0058] In one embodiment, the distance between a pair of nodes, such as paragraphs 806, 808, 810, may be used to weight local scores. As an example, "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810. The computed score (Cs) of this phrase in Chapter 2, Paragraph 2 806 may be 2 if the weight to go from one node to its next sibling node is 0.5. However, this phrase may also have a local score (Ls) of 3 in another node of the hierarchy. The prediction ranker module may choose to use the maximum of all weighted local scores (Ls) to get the computed score (Cs) of a textual candidate.
[0059] In another embodiment, the structure and content of any document (sections, paragraphs etc.) may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph than phrases used in a paragraph a few pages later.
[0060] In another embodiment, the number of different locations into which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate paragraphs may have a higher score than a phrase used six times in one paragraph.
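One way such a spread bonus could be expressed is a score in which the number of distinct locations dominates the raw frequency. The formula and the `LOCATION_WEIGHT` constant below are illustrative assumptions, chosen only so that the example of paragraph [0060] holds:

```python
LOCATION_WEIGHT = 10  # assumed: distinct locations outweigh raw counts

def spread_score(counts_per_location):
    """Combine per-location counts of a phrase so that appearing in
    more distinct locations beats heavy use in a single one."""
    used_in = [c for c in counts_per_location if c > 0]
    return LOCATION_WEIGHT * len(used_in) + sum(used_in)

# Three uses spread over two paragraphs beats six uses in one paragraph:
print(spread_score([2, 1]), spread_score([6]))  # 23 16
```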
[0061] Figure 9 shows a table representing the local scores of textual candidates on a per-paragraph basis. In the table, the phrase "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810 and the phrase "make no mistake about it" has a local score (Ls) of 7 in Chapter 2, Paragraph 1 808. The other exemplary local scores (Ls) for other phrases and other paragraphs can be seen in the table.
[0062] Figure 10 represents the weights to use when propagating a phrase from one paragraph to another. In the table, the weight for propagating a phrase from the source Chapter 2, Paragraph 1 808 to the target Chapter 2, Paragraph 2 806 can be seen to be 0.25. Applying this weight to the example of figure 8, the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 1 808 is weighted by a factor of 0.25 to produce a computed score (Cs) of 1.75 in the target Chapter 2, Paragraph 2 806, that is, 0.25 times 7. Also in the table, the weight for propagating a phrase in the opposite direction, from the source Chapter 2, Paragraph 2 806 to the target Chapter 2, Paragraph 1 808, can be seen to be 0.2, which differs from the 0.25 weighting used in the other direction. Applying this weight to the example of figure 8, the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 2 806 is weighted by a factor of 0.2 to produce a computed score (Cs) of 0 in the target Chapter 2, Paragraph 1 808, that is, 0.2 times 0. The other exemplary weights for other combinations of source and target paragraphs can be seen in the table.
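The asymmetric weight table of figure 10 and the two propagations worked through above can be reproduced in a short sketch. The node labels and function name are hypothetical; the weight and score values are those of figures 9 and 10:

```python
# Asymmetric propagation weights of figure 10: (source, target) -> weight
weights = {("Ch2P1", "Ch2P2"): 0.25,
           ("Ch2P2", "Ch2P1"): 0.2}

# Local scores (Ls) of "make no mistake about it" per figure 9
ls = {"Ch2P1": 7, "Ch2P2": 0}

def propagated(source, target):
    """Computed score (Cs) contributed by `source` at `target`:
    the source's local score weighted by the directional weight."""
    return weights[(source, target)] * ls[source]

print(propagated("Ch2P1", "Ch2P2"))  # 0.25 * 7 = 1.75
print(propagated("Ch2P2", "Ch2P1"))  # 0.2 * 0 = 0.0
```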
[0063] Embodiments of the invention can take the form of a computer program accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus or device.
[0064] The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-RW), and DVD.

Claims

1. A method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of:
assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate;
assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context;
calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
2. A method as claimed in claim 1, wherein the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
3. A method as claimed in claim 1, wherein the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
4. A method as claimed in claim 1, wherein said contexts are paragraphs within the hierarchy of a document.
5. A method as claimed in claim 1, wherein contexts are business rule packages within the hierarchy of a business rule project.
6. A method as claimed in claim 1, further comprising the steps of:
receiving textual or non-textual input; and
computing a set of textual candidates.
7. A method as claimed in claim 1, wherein a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
8. A system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising:
a processing device for receiving text fragments forming the textual candidates; and
a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
9. A system as claimed in claim 8, wherein the prediction ranker module calculates the weighted sum by taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
10. A system as claimed in claim 8, wherein the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
11. A system as claimed in claim 8, wherein said contexts are paragraphs within the hierarchy of a document.
12. A system as claimed in claim 8, wherein contexts are business rule packages within the hierarchy of a business rule project.
13. A system as claimed in claim 8, wherein:
the processing device receives textual or non-textual input; and
the prediction ranker module computes a set of textual candidates.
14. A system as claimed in claim 8, wherein a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
15. A computer program product for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the computer program product comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code adapted to perform the method of any one of claim 1 to claim 7 when said program is run on a computer.
16. A method substantially as hereinbefore described, with reference to figures 1 to 10 of the accompanying drawings.
PCT/IB2014/065838 2013-11-13 2014-11-06 Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy WO2015071804A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1319983.1A GB2520265A (en) 2013-11-13 2013-11-13 Ranking Textual Candidates of controlled natural languages
GB1319983.1 2013-11-13

Publications (1)

Publication Number Publication Date
WO2015071804A1 true WO2015071804A1 (en) 2015-05-21

Family

ID=49818526

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2014/065838 WO2015071804A1 (en) 2013-11-13 2014-11-06 Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy

Country Status (2)

Country Link
GB (1) GB2520265A (en)
WO (1) WO2015071804A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018111702A1 (en) * 2016-12-15 2018-06-21 Microsoft Technology Licensing, Llc Word order suggestion taking into account frequency and formatting information

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JPWO2018190128A1 (en) * 2017-04-11 2020-02-27 ソニー株式会社 Information processing apparatus and information processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377965B1 (en) * 1997-11-07 2002-04-23 Microsoft Corporation Automatic word completion system for partially entered data
US20040268236A1 (en) * 2003-06-27 2004-12-30 Xerox Corporation System and method for structured document authoring
US7657423B1 (en) * 2003-10-31 2010-02-02 Google Inc. Automatic completion of fragments of text

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US8117540B2 (en) * 2005-05-18 2012-02-14 Neuer Wall Treuhand Gmbh Method and device incorporating improved text input mechanism
EP2109046A1 (en) * 2008-04-07 2009-10-14 ExB Asset Management GmbH Predictive text input system and method involving two concurrent ranking means
GB0905457D0 (en) * 2009-03-30 2009-05-13 Touchtype Ltd System and method for inputting text into electronic devices
GB201003628D0 (en) * 2010-03-04 2010-04-21 Touchtype Ltd System and method for inputting text into electronic devices
US8738356B2 (en) * 2011-05-18 2014-05-27 Microsoft Corp. Universal text input


Non-Patent Citations (1)

Title
CANDACE L BULLWINKLE: "PICNICS, KITTENS AND WIGS: USING SCENARIOS FOR THE SENTENCE COMPLETION TASK", IJCAI 1975, vol. 386, 1 January 1975 (1975-01-01), pages 383, XP055176509 *


Also Published As

Publication number Publication date
GB201319983D0 (en) 2013-12-25
GB2520265A (en) 2015-05-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14796910

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14796910

Country of ref document: EP

Kind code of ref document: A1