WO2015071804A1 - Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy - Google Patents

Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy Download PDF

Info

Publication number
WO2015071804A1
Authority
WO
WIPO (PCT)
Prior art keywords
context
hierarchy
textual
contexts
probability
Prior art date
Application number
PCT/IB2014/065838
Other languages
French (fr)
Inventor
Thierry Kormann
Stephane Hillion
Original Assignee
International Business Machines Corporation
Ibm United Kingdom Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation, Ibm United Kingdom Limited filed Critical International Business Machines Corporation
Publication of WO2015071804A1 publication Critical patent/WO2015071804A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/137Hierarchical processing, e.g. outlines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/131Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/274Converting codes to words; Guess-ahead of partial word inputs

Definitions

  • the present invention relates to ranking textual candidates of controlled natural languages (CNLs) and more particularly, to the ranking of textual candidates by assigning a probability to text fragments forming the textual candidates using a hierarchical-driven ranking mechanism for text completions.
  • CNLs are subsets of natural languages, the subsets being capable of being understood by computer systems because both the grammar and the vocabulary are restricted in order to reduce or remove ambiguity and complexity.
  • Autocomplete is a feature that suggests likely completions for the text a user is typing.
  • Another solution consists of ranking textual candidates based on the history of most recently used and/or the most frequently used words or phrases. This method provides interesting results but is mainly effective for repetitive tasks, or tasks that do not involve very frequent context switching.
  • a similar solution uses a word prediction algorithm and can use the semantics and the location of the text being entered to rank textual candidates. For instance, given a common text prefix, the completion menu of a code editor shows variables prior to class names within a method. This technique provides pertinent rankings but requires in-depth knowledge of the semantics of the entire language. Furthermore, the implementation of such algorithms is hard to achieve.
  • a further solution consists of annotating (or categorising) all the phrases of a vocabulary and declaring for each document (or part of it) which category or set of categories is permitted.
  • This method can provide meaningful results but requires a difficult and time-consuming initial step. Furthermore, it may be difficult to anticipate user needs and find relevant categories for each sentence.
  • United States patent application US 2013/0041857 A1 discloses a system and method for the reordering of text predictions.
  • the system and method reorders the text predictions based on modified probability values, wherein the probability values are modified according to the likelihood that a given text prediction will occur in the text inputted by a user. It further discloses that the ordering of predictions is allowed to be influenced by the likelihood that the predicted term or phrase belongs in the current contextual context, that is in the current text sequence entered by a user. 'Nonlocal' context is allowed to be taken into account.
  • United States patent application US 2012/0029910 Al discloses a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection.
  • An analogous method and an interface for use with the system and method are also disclosed.
  • the language model can be further configured to apply a topic filter. N-gram statistics yield estimates of prediction candidate probabilities based on local context, but global context also affects candidate probabilities.
  • a topic filter actively identifies the most likely topic for a given piece of writing and reorders the candidate predictions accordingly.
  • the topic filter takes into account the fact that topical context affects term usage. For instance, given the sequence "was awarded a", the likelihood of the following term being either "penalty" or "grant" is highly dependent on whether the topic of discussion is 'soccer' or 'finance'. Local n-gram context often cannot shed light on this, whilst a topic filter that takes the whole of a segment of text into account might be able to.
  • United States Patent 6,202,058 B1 discloses information presented to a user via an information access system being ranked according to a prediction of the likely degree of relevance to the user's interests.
  • a profile of interests is stored for each user having access to the system. Items of information to be presented to a user are ranked according to their likely degree of relevance to that user and displayed in order of ranking.
  • the prediction of relevance is carried out by combining data pertaining to the content of each item of information with other data regarding correlations of interests between users.
  • a value indicative of the content of a document can be added to another value which defines user correlation, to produce a ranking score for a document.
  • multiple regression analysis or evolutionary programming can be carried out with respect to various factors pertaining to document content and user correlation, to generate a prediction of relevance.
  • the user correlation data is obtained from feedback information provided by users when they retrieve items of information.
  • Embodiments of the invention provide a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of: assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; and, for each of the textual candidates, taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
  • the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
  • the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
  • said contexts are paragraphs within the hierarchy of a document.
  • said contexts are business rule packages within a business rule project.
  • the method further comprises the steps of: receiving textual or non-textual input; and computing a set of textual candidates.
  • a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
  • Embodiments of the invention further provide a system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising: a processing device for receiving text fragments forming the textual candidates; and a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
  • Embodiments of the invention provide the advantage that the ranking is done entirely by the method and system without intervention of an expert. Another advantage is that the ranking dynamically takes into account any modifications made to documents. A further advantage is that hierarchically structured systems storing documents or text fragments tend to be naturally organised by topics; they provide the appropriate information to compute a meaningful ranking. A yet further advantage is that assigning a probability to textual candidates based on where similar phrases have been used does not require in-depth knowledge of the language. Consequently, the approach is both relatively simple to implement and works for any CNL.
  • Figure 1 shows an embodiment of a system for ranking textual candidates;
  • Figure 2 shows an embodiment of a method of ranking textual candidates;
  • Figure 3 shows a first embodiment having a rule project with local rankings and the global ranking of predictions within the "Upgrade" package;
  • Figure 4 shows a table representing local scores of textual candidates on a per-package basis for use in the embodiment of figure 3;
  • Figure 5 shows a table representing weights to apply when propagating phrases across packages for use in the embodiment of figure 3;
  • Figure 6 shows the process of how to update the local ranking of predictions;
  • Figure 7 shows the process of how the system updates weights of entities;
  • Figure 8 shows a second embodiment having a document with local rankings and the global ranking of predictions within chapter 2, paragraph 2 of the document;
  • Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis for use in the embodiment of figure 8;
  • Figure 10 shows a table representing weights to apply when propagating phrases across paragraphs for use in the embodiment of figure 8.
  • a BRMS (Business Rule Management System) is a software system enabling organizational policies, and the repeatable decisions associated with those policies, such as claim approvals, pricing calculations and eligibility determinations, to be defined, deployed, monitored and maintained separately from application code.
  • Business rules include policies, requirements and conditional statements that are used to determine the tactical actions that take place in applications and systems.
  • Embodiments of the present invention can find utility in any structured systems using controlled natural languages.
  • Embodiments of the present invention also relate to a method for suggesting relevant completions for an input text that is compliant to a CNL and typed to a text-oriented application running at a user computer. More specifically, the method and system provide a hierarchical-driven ranking mechanism for text completions.
  • Editing tools provide a way to organise or structure multiple text fragments.
  • for example, text fragments may be part of a single document.
  • each paragraph may represent a separate entity and the document layout defines the structure and the relations between entities.
  • a system may store entities in a file system. How folders and files are organised defines the hierarchy and thus the relations between entities.
  • Another embodiment consists of identifying the relations between entities by leveraging the grammar of the CNL. Each grammar construct helps define the structure and the relations across text fragments.
  • FIG 1 shows an embodiment of a system 150 for ranking textual candidates.
  • Inputs are received by a processing device 152 which computes sets of textual candidates, recognises text fragments, detects that new text fragments have been input and analyses changes to the hierarchy.
  • the results from the processing device 152 are then sent to prediction ranker module 154 which ranks the sets of textual candidates and then makes suggestions to a user as to the most likely candidates, updates local scores (Ls) of phrases and updates weights associated with entities.
  • a text fragment may be a business rule and an entity may be a rule package.
  • a rule package may contain multiple business rules. Rule packages may be nested and thus define a hierarchy.
  • a rule project has a set of top-level rule packages and represents the root of the hierarchy.
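The nested package structure just described can be sketched as a small tree, with the rule project as the root of the hierarchy; the class and field names below are illustrative, not taken from the patent:

```python
# Illustrative sketch of a rule-project hierarchy: rule packages may be
# nested, and the rule project is the root of the hierarchy.
class RulePackage:
    def __init__(self, name, parent=None):
        self.name = name
        self.rules = []        # business rules (text fragments)
        self.children = []     # nested sub-packages
        self.parent = parent
        if parent is not None:
            parent.children.append(self)

class RuleProject(RulePackage):
    """Root of the hierarchy, holding the set of top-level rule packages."""
    def __init__(self):
        super().__init__("<project root>")

project = RuleProject()
refund = RulePackage("Refund", parent=project)
discount = RulePackage("Discount", parent=project)
upgrade = RulePackage("Upgrade", parent=project)
checkout = RulePackage("Checkout", parent=project)
```

Nesting is expressed through the `parent` link, so the hierarchical distance between any two packages can later be derived by walking towards the root.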
  • the textual candidates may be computed and ranked prior to being exposed to the user. For example, when a business expert changes a business rule, the rule editor may choose to parse the text and display all possible phrases that can be inserted at the current location.
  • FIG 2 shows an embodiment of a method of ranking textual candidates.
  • the method starts at step 102.
  • a processing device receives input.
  • the input may be non-textual input, such as, for example, digital ink input, speech input, or other input. With respect to the embodiment described below, the input is assumed to be text input.
  • the input is recognised to compute a predicted set of textual candidates.
  • the predicted set of textual candidates may be based on respective prefixes and one or more data sources such as the vocabulary.
  • a prediction ranker module assigns a probability to each of the identified textual candidates in the predicted set of textual candidates. Step 108 will be described below in more detail with reference to figures 3 to 5.
  • the prediction ranker module ranks textual candidates prior to, at step 110, presenting the resulting sorted list to the user.
  • textual candidates are ranked in order to preselect, within a list sorted alphabetically, the textual candidate that has been identified as the most relevant one in the current authoring context. The method ends at step 112.
  • the prediction ranker module first assigns a score to each textual candidate by only considering the current entity being edited. This score is referred to as a local score.
  • the person skilled in the art may determine how to compute a relevant initial score for a given prediction according to any known method.
  • the local score may be how many times a phrase has been used within an entity.
  • Another embodiment may choose to maintain the most frequently and the most recently used phrases within an entity and combine those values to get an initial score on a per-phrase basis.
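As a hedged sketch of that combination (the text leaves the exact formula open; the one-day half-life decay and the 50/50 blend below are assumptions), a local score mixing frequency and recency might look like:

```python
import time

def local_score(use_count, last_used_ts, now_ts, half_life=86400.0):
    """Blend how often and how recently a phrase was used within an entity.

    The combination below (use count scaled by an exponential recency
    decay) is an assumption for illustration; the patent leaves the exact
    formula to the implementer.
    """
    age = max(0.0, now_ts - last_used_ts)
    recency = 0.5 ** (age / half_life)        # 1.0 if just used, decays with age
    return use_count * (0.5 + 0.5 * recency)  # never below half the raw count

now = time.time()
fresh = local_score(10, now, now)          # 10 uses, just now  -> 10.0
stale = local_score(10, now - 86400, now)  # 10 uses, a day old -> 7.5
```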
  • in figure 3 are shown four rule packages ("Refund" 210, "Discount" 208, "Upgrade" 206 and "Checkout" 204) within a rule project 202, a local score (Ls) for each phrase, and computed scores (Cs) 212 for predicting a textual candidate within the "Upgrade" 206 package.
  • the "Refund" 210 package contains several business rules and describes whether or not a refund should be done for a customer purchase.
  • the business rules involved use three different phrases of the rule project vocabulary, that is, (i) the gross of invoice refund amount; (ii) the service is authorized; and (iii) the gross charge of the service.
  • a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 21, 13 and 4 respectively.
  • the "Discount" 208 package is used for computing a discount for a customer.
  • the phrases used by the business rules in the "Discount" 208 package are different from those in the "Refund" 210 package, which are needed to know if a refund should be done.
  • the business rules involved use three different phrases of the rule project vocabulary, that is, (i) the category of the customer; (ii) the age of the customer; and (iii) the amount of the shopping cart.
  • a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 24, 10 and 20 respectively.
  • the "Upgrade" 206 package represents a package for managing customer categories. A user begins entering a text fragment in an existing business rule. Based on this context, the prediction ranker module generates, for the "Upgrade" 206 package, a computed score (Cs) for each phrase 212 of the vocabulary as described below.
  • embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "the age of the customer" may be high in the "Discount" 208 package, but low in the "Refund" 210 package.
  • For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing each local score (Ls) according to hierarchical characteristics, and taking the higher of the values.
  • the distance between a pair of nodes may be used to weight local scores.
  • the phrase "the amount of the shopping cart" has a local score (Ls) of 20 in the "Discount" 208 package.
  • the computed score (Cs) of this phrase in the "Upgrade" 206 package may be 10 if the weight to go from one node to its next sibling node is 0.5.
  • within the "Upgrade" 206 package, this phrase also has a local score (Ls) of 8.
  • the prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
  • the structure and content of any document may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph, than phrases used in a paragraph a few pages later.
  • the number of different locations in which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate packages may have a higher score than a phrase used six times in one package.
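A minimal sketch of that effect, assuming an illustrative per-location boost factor (the text only states the comparison, not a formula):

```python
def spread_adjusted_score(uses_per_location, boost=2.5):
    """Score a phrase from its per-location use counts.

    Every additional distinct location multiplies the total by an
    illustrative boost factor, so that 3 uses spread over two packages
    can outrank 6 uses concentrated in a single package.
    """
    total = sum(uses_per_location)
    return total * boost ** (len(uses_per_location) - 1)

spread = spread_adjusted_score([2, 1])  # 3 uses in two packages -> 7.5
single = spread_adjusted_score([6])     # 6 uses in one package  -> 6.0
```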
  • the nature of a paragraph can also influence the probability associated with a textual candidate.
  • sentences or terms used in an introduction and in a conclusion of a document may have a higher probability associated with them than sentences and terms found in regular paragraphs of the document.
  • An introduction is often the first section of a document and a conclusion is often the last.
  • the hierarchy of the document is leveraged to finely adjust probabilities.
  • the introduction and conclusion are further apart in distance, but have the hierarchical relationship mentioned earlier in the paragraph.
  • the nature of a paragraph can also influence the system.
  • the prediction ranker module 154 can provide for special treatment for paragraphs at predefined locations within the document.
  • Figure 4 shows a table representing local scores of textual candidates on a per-package basis.
  • the rows in the table represent the phrases used.
  • the columns in the table represent the local scores (Ls) of the phrases in the package identified at the top of each column.
  • the row and the column including "ellipsis" characters indicate that other phrases and other packages have been omitted from the table for brevity and clarity.
  • the phrase "the gross of invoice refund amount" has a local score (Ls) in the "Refund" 210 package of 21 and the phrase "the amount of the shopping cart" has a local score (Ls) in the "Upgrade" 206 package of 8.
  • the other exemplary local scores (Ls) for other phrases and other packages can be seen in the table.
  • Figure 5 represents the weights to use when propagating a phrase from one package to another.
  • the columns in the table represent the source package of the local scores (Ls) of the phrases used. In the table of figure 5 they are shown in chronological order, but any other order may be used.
  • the rows in the table represent the target packages of the local scores (Ls) of the phrases used.
  • the row and the column including "ellipsis" characters indicate that other packages have been omitted from the table for brevity and clarity.
  • the weight to use when propagating a phrase from the source "Discount" 208 package to the target "Upgrade" 206 package can be seen to be equal to 0.5.
  • the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Discount" 208 package is weighted by a factor of 0.5 to produce a computed score (Cs) in the target "Upgrade" 206 package for that phrase from the source package of 10, that is 0.5 times 20.
  • the weight to use when propagating a phrase in the opposite direction, from the source "Upgrade" 206 package to the target "Discount" 208 package, can be seen to be equal to 0.4, which differs from the weighting of 0.5 used when propagating from "Discount" to "Upgrade".
  • the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Upgrade" 206 package is weighted by a factor of 0.4 to produce a computed score (Cs) in the target "Discount" 208 package for that phrase from the source package of 3.2, that is 0.4 times 8.
  • a function returning the computed score of textual candidates may be: Cs(phrase, target) = max over i = 1..n of ( Ls(phrase, i) × weight(i → target) ), with a weight of 1 from an entity to itself, where n is the total number of entities (e.g. rule packages).
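Under the assumption that an entity's own local score carries a weight of 1, such a function can be sketched as follows; the dictionary-based representation and the weights for the packages not stated in the text are illustrative:

```python
def computed_score(phrase, target, entities, local_scores, weights):
    """Cs(phrase, target) = max over entities i of Ls(phrase, i) * w(i -> target).

    local_scores maps (phrase, entity) to Ls; weights maps (source, target)
    to the propagation weight, with weight 1.0 from an entity to itself.
    """
    return max(
        local_scores.get((phrase, entity), 0) * weights.get((entity, target), 0.0)
        for entity in entities
    )

# Values from figures 3-5: Ls 8 in "Upgrade", Ls 20 in "Discount",
# Discount -> Upgrade weight 0.5; the remaining weights are illustrative.
entities = ["Refund", "Discount", "Upgrade", "Checkout"]
ls = {("the amount of the shopping cart", "Upgrade"): 8,
      ("the amount of the shopping cart", "Discount"): 20}
w = {("Upgrade", "Upgrade"): 1.0, ("Discount", "Upgrade"): 0.5,
     ("Refund", "Upgrade"): 0.25, ("Checkout", "Upgrade"): 0.25}
cs = computed_score("the amount of the shopping cart", "Upgrade", entities, ls, w)
# max(8 * 1.0, 20 * 0.5) = 10
```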
  • the ranking of textual candidates is obtained by combining, for each prediction, the local score (Ls) within a package, and the weight associated with propagation from this package to the "Upgrade" 206 package.
  • the phrase "the category of the customer" has a computed score (Cs) of 12 because this phrase has a local score (Ls) of 24 in the "Discount" 208 package and the weight to go from the "Discount" 208 package to the "Upgrade" 206 package is 0.5.
  • within the "Upgrade" 206 package, the phrase "the amount of the shopping cart" has a computed score (Cs) of 10 because this is the maximum of 8 (the local score (Ls) within the "Upgrade" 206 package) and 10 (the local score (Ls) of 20 within the "Discount" 208 package multiplied by 0.5, the weight from figure 5 to go from the source "Discount" 208 package to the target "Upgrade" 206 package).
  • the prediction with the highest computed score (Cs) is "the category of the customer", with a computed score (Cs) of 12.
  • This computed score (Cs) is obtained from the local score (Ls) of 24 in the "Discount" 208 package and the weighting of 0.5 applied, from the table of figure 5, for propagation from the "Discount" 208 package to the "Upgrade" 206 package.
  • the prediction for the phrase "the amount of the shopping cart" has a computed score (Cs) of 10, which is obtained as the higher of (i) the local score (Ls) of 8 in the "Upgrade" 206 package and (ii) the local score (Ls) of 20 in the "Discount" 208 package weighted by 0.5, the weighting applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package.
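The worked example of figures 3 to 5 can be replayed numerically using only the local scores and the Discount-to-Upgrade weight stated in the text:

```python
# Local scores (Ls) stated in the text for the "Discount" and "Upgrade"
# packages, and the Discount -> Upgrade weight of 0.5 from figure 5.
ls_discount = {"the category of the customer": 24,
               "the age of the customer": 10,
               "the amount of the shopping cart": 20}
ls_upgrade = {"the amount of the shopping cart": 8}
W_DISCOUNT_TO_UPGRADE = 0.5

def cs_in_upgrade(phrase):
    """Computed score in "Upgrade": the maximum of the weighted local scores."""
    return max(ls_upgrade.get(phrase, 0),
               ls_discount.get(phrase, 0) * W_DISCOUNT_TO_UPGRADE)

phrases = set(ls_discount) | set(ls_upgrade)
ranking = sorted(phrases, key=cs_in_upgrade, reverse=True)
# "the category of the customer" leads with Cs 12; the shopping-cart phrase gets 10.
```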
  • once the prediction ranker engine has computed the final ranking based on the computed scores (Cs), textual candidates can be displayed to the user. It should, however, be realised that any other appropriate action can be taken.
  • the global ranking of predictions represents the likely degree of relevance to the user's interests of each phrase at a given location in the hierarchy.
  • the system may recognise an operation that changes either the local scores (Ls) or the weights shown in figure 5 associated with entities.
  • Figure 6 shows how the local ranking of textual candidates is updated.
  • the method starts at step 502.
  • the processing device receives input.
  • the input may be textual or non-textual input.
  • a text fragment is recognised and it is detected that a new phrase has been entered.
  • step 506 may involve a parser dedicated to the controlled natural language currently in use.
  • the prediction ranker module may be notified and one or more local scores of each phrase, within one or more packages, may be recalculated. The method ends at step 510.
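The update process of figure 6 can be sketched as an incremental counter; the recogniser below is a hypothetical stand-in for the parser dedicated to the controlled natural language in use:

```python
from collections import defaultdict

# local_scores[package][phrase] -> use count, the simple Ls of the examples.
local_scores = defaultdict(lambda: defaultdict(int))

def on_input(package, text, recognise_phrases):
    """Steps 504 to 508: receive input, recognise phrases, update local scores.

    recognise_phrases stands in for the parser dedicated to the controlled
    natural language currently in use.
    """
    for phrase in recognise_phrases(text):
        local_scores[package][phrase] += 1

# Hypothetical recogniser: match known vocabulary phrases as substrings.
vocabulary = ["the age of the customer", "the category of the customer"]
recognise = lambda text: [p for p in vocabulary if p in text]

on_input("Upgrade", "if the age of the customer is more than 18 then ...", recognise)
```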
  • Figure 7 shows how the weights associated with each entity storing text fragments are updated.
  • an application may receive an event that indicates that the hierarchy has been changed. For example, the application may be notified when a rule package has been added or when a new section has been inserted into a document.
  • the application may be configured to determine what kind of operation has been performed. For example, this step may be particularly useful in identifying what part of the hierarchy needs to be updated.
  • the prediction ranker module may update the weight associated with each entity. For example, when inserting a new rule package, the distance between two entities may get bigger and the respective weights associated with each of the entities may need to be adjusted. The method ends at step 610.
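A hedged sketch of the weight update of figure 7, assuming that weights decay with hierarchical distance and choosing the decay mapping so that adjacent siblings get the 0.5 weight of the figure 3 example (both choices are assumptions, not the patent's formula):

```python
def hierarchical_distance(path_a, path_b):
    """Edges between two entities identified by their paths from the root."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def weight(path_src, path_dst, decay=0.5):
    """Propagation weight, inversely related to hierarchical distance.

    The mapping decay ** max(distance - 1, 0) keeps weight 1.0 from an
    entity to itself and gives adjacent siblings 0.5; it is an
    illustrative choice.
    """
    return decay ** max(hierarchical_distance(path_src, path_dst) - 1, 0)

# Inserting a new intermediate package lengthens one path, which increases
# the distance and therefore shrinks the propagation weight.
w_siblings = weight(("project", "Discount"), ("project", "Upgrade"))       # 0.5
w_nested = weight(("project", "Discount"), ("project", "New", "Upgrade"))  # 0.25
```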
  • Chapter 1 Paragraph 2 810 contains several phrases.
  • the paragraphs involved use three different phrases of the rule project vocabulary, that is, (i) "anger and rage"; (ii) "climate change"; and (iii) "the great recession".
  • a local score (Ls) is computed.
  • the local score (Ls) is how many times each phrase has been used within the paragraph.
  • the local scores (Ls) for phrases (i), (ii) and (iii) are 5, 8 and 4 respectively.
  • Chapter 2 Paragraph 1 808 also contains several phrases.
  • the phrases used in Chapter 2, Paragraph 1 808 are different from those in Chapter 1, Paragraph 2 810.
  • the paragraphs involved use three different phrases of the rule project vocabulary, that is, (i) "linear no threshold"; (ii) "how's that working out for you?"; and (iii) "make no mistake about it".
  • a local score (Ls) is computed.
  • the local score (Ls) is how many times each phrase has been used within the paragraph.
  • the local scores (Ls) for phrases (i), (ii) and (iii) are 2, 6 and 7 respectively.
  • Chapter 2, Paragraph 2 806 also contains several phrases. A user begins entering a text fragment in the paragraph. Based on this context, the prediction ranker module 154 generates, for Chapter 2, Paragraph 2 806, a computed score (Cs) for each phrase 812 of the vocabulary as described below.
  • embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module 154 may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "how's that working out for you?" may be high in Chapter 2, Paragraph 1 808, but low in Chapter 1, Paragraph 2 810.
  • For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing each local score (Ls) according to hierarchical characteristics, and taking the higher of the values.
  • the distance between a pair of nodes may be used to weight local scores.
  • the phrase "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810.
  • the computed score (Cs) of this phrase in Chapter 2, Paragraph 2 806 may be 2 if the weight to go from one node to its next sibling node is 0.5.
  • within Chapter 2, Paragraph 2 806, this phrase also has a local score (Ls) of 3.
  • the prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
  • the structure and content of any document may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph, than phrases used in a paragraph a few pages later.
  • the number of different locations in which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate paragraphs may have a higher score than a phrase used six times in one paragraph.
  • Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis.
  • the phrase "the great recession" has a local score (Ls) in Chapter 1, Paragraph 2 810 of 4 and the phrase "make no mistake about it" has a local score (Ls) in Chapter 2, Paragraph 1 808 of 7.
  • the other exemplary local scores (Ls) for other phrases and other paragraphs can be seen in the table.
  • Figure 10 represents the weights to use when propagating a phrase from one paragraph to another.
  • the weight to use when propagating a phrase from the source Chapter 2, Paragraph 1 808 to the target Chapter 2, Paragraph 2 806 can be seen to be equal to 0.25.
  • the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 1 808 is weighted by a factor of 0.25 to produce a computed score (Cs) in the target Chapter 2, Paragraph 2 806 for that phrase from the source paragraph of 1.75, that is 0.25 times 7.
  • the weight to use when propagating a phrase in the opposite direction, from the source Chapter 2, Paragraph 2 806 to the target Chapter 2, Paragraph 1 808, can be seen to be equal to 0.2, which differs from the weighting of 0.25 used when propagating from Chapter 2, Paragraph 1 808 to Chapter 2, Paragraph 2 806.
  • the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 2 806 is weighted by a factor of 0.2 to produce a computed score (Cs) in the target Chapter 2, Paragraph 1 808 for that phrase from the source paragraph of 0, that is 0.2 times 0.
  • the other exemplary weights for other combinations of source and target paragraphs can be seen in the table.
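The figures of this second worked example can be checked with the same arithmetic; only values stated in the text appear below:

```python
# "the great recession": Ls 4 in Chapter 1, Paragraph 2; weight 0.5 towards
# Chapter 2, Paragraph 2, where the phrase also has a local Ls of 3.
propagated_recession = 4 * 0.5               # 2.0, as stated in the text
cs_recession = max(3, propagated_recession)  # the ranker keeps the maximum: 3

# "make no mistake about it": Ls 7 in Chapter 2, Paragraph 1; the weight
# towards Chapter 2, Paragraph 2 is 0.25 (figure 10).
propagated_mistake = 7 * 0.25                # 1.75, as stated in the text
```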
  • Embodiments of the invention can take the form of a computer program accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-RW), and DVD.

Abstract

Disclosed is a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy. The method comprises the steps of assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; for each of the textual candidates, taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.

Description

RANKING PREDICTION CANDIDATES OF CONTROLLED NATURAL LANGUAGES OR BUSINESS RULES DEPENDING ON DOCUMENT HIERARCHY
FIELD OF THE INVENTION
[0001] The present invention relates to ranking textual candidates of controlled natural languages (CNLs) and more particularly, to the ranking of textual candidates by assigning a probability to text fragments forming the textual candidates using a hierarchical-driven ranking mechanism for text completions.
BACKGROUND
[0002] CNLs are subsets of natural languages, the subsets being capable of being understood by computer systems by restricting both the grammar and the vocabulary in order to reduce or remove ambiguity and complexity. Many computer systems, and more specifically editing tools, exist for CNLs. These often provide advanced features such as validation, syntax highlighting, or autocomplete. Autocomplete is a feature that automatically predicts the remaining words or phrases that the user wants to type, without the user having to type them completely. This feature is particularly effective when editing text written in highly structured, easy-to-predict languages such as CNLs. However, when a language has an extensive vocabulary, providing relevant textual candidates among the valid predictions computed by the system remains a challenge.
[0003] Another solution consists of ranking textual candidates based on the history of the most recently used and/or the most frequently used words or phrases. This method provides interesting results but is mainly effective for repetitive tasks, or tasks that do not involve very frequent context switching.
[0004] A similar solution uses a word prediction algorithm and can use the semantics and the location of the text being entered to rank textual candidates. For instance, given a common text prefix, the completion menu of a code editor shows variables prior to class names within a method. This technique provides pertinent rankings but requires in-depth knowledge of the semantics of the entire language. Furthermore, the implementation of such algorithms is hard to achieve.
[0005] A further solution consists of annotating (or categorising) all the phrases of a vocabulary and declaring for each document (or part of it) which category or set of categories is permitted. This method can provide meaningful results but requires a difficult and time-consuming initial step. Furthermore, it may be difficult to anticipate user needs and find relevant categories for each sentence.
[0006] United States patent application US 2013/0041857 Al discloses a system and method for the reordering of text predictions. The system and method reorders the text predictions based on modified probability values, wherein the probability values are modified according to the likelihood that a given text prediction will occur in the text inputted by a user. It further discloses that the ordering of predictions is allowed to be influenced by the likelihood that the predicted term or phrase belongs in the current contextual context, that is in the current text sequence entered by a user. 'Nonlocal' context is allowed to be taken into account.
[0007] United States patent application US 2012/0029910 Al discloses a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection. An analogous method and an interface for use with the system and method are also disclosed. The language model can be further configured to apply a topic filter. N-gram statistics yield estimates of prediction candidate probabilities based on local context, but global context also affects candidate probabilities. A topic filter actively identifies the most likely topic for a given piece of writing and reorders the candidate predictions accordingly. The topic filter takes into account the fact that topical context affects term usage. For instance, given the sequence "was awarded a", the likelihood of the following term being either "penalty" or "grant" is highly dependent on whether the topic of discussion is 'soccer' or 'finance' . Local n-gram context often cannot shed light on this, whilst a topic filter that takes the whole of a segment of text into account might be able to.
[0008] United States Patent 6,202,058 Bl discloses information presented to a user via an information access system being ranked according to a prediction of the likely degree of relevance to the user's interests. A profile of interests is stored for each user having access to the system. Items of information to be presented to a user are ranked according to their likely degree of relevance to that user and displayed in order of ranking. The prediction of relevance is carried out by combining data pertaining to the content of each item of information with other data regarding correlations of interests between users. A value indicative of the content of a document can be added to another value which defines user correlation, to produce a ranking score for a document. Alternatively, multiple regression analysis or evolutionary programming can be carried out with respect to various factors pertaining to document content and user correlation, to generate a prediction of relevance. The user correlation data is obtained from feedback information provided by users when they retrieve items of information.
BRIEF SUMMARY OF THE INVENTION
[0009] Embodiments of the invention provide a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of: assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context;
calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.

[0010] Preferably, the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
[0011] Preferably, the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
[0012] In an embodiment, said contexts are paragraphs within the hierarchy of a document.
[0013] In another embodiment, said contexts are business rule packages within a business rule project.
[0014] In an embodiment, the method further comprises the steps of: receiving textual or non-textual input; and computing a set of textual candidates.
[0015] In an embodiment, a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
[0016] Embodiments of the invention further provide a system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising: a processing device for receiving text fragments forming the textual candidates; and a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
[0017] Further embodiments of the invention provide a computer program product for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code adapted to perform the method described above when said program is run on a computer.
[0018] Embodiments of the invention provide the advantage that the ranking is done entirely by the method and system without intervention of an expert. Another advantage is that the ranking dynamically takes into account any modifications made to documents. A further advantage is that hierarchically structured systems storing documents or text fragments tend to be naturally organised by topics. They provide the appropriate information to compute a meaningful ranking. A yet further advantage is that assigning a probability to textual candidates based on where similar phrases have been used does not require in-depth knowledge of the language. Consequently, the approach is both relatively simple to implement and works for any CNLs.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] Preferred embodiments of the present invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows an embodiment of a system for ranking textual candidates;
Figure 2 shows an embodiment of a method of ranking textual candidates;
Figure 3 shows a first embodiment having a rule project with local rankings and the global ranking of predictions within the "Upgrade" package;
Figure 4 shows a table representing local scores of textual candidates on a per-package basis for use in the embodiment of figure 3;
Figure 5 shows a table representing weights to apply when propagating phrases across packages for use in the embodiment of figure 3;
Figure 6 shows the process of how to update the local ranking of predictions;
Figure 7 shows the process of how the system updates weights of entities;
Figure 8 shows a second embodiment having a document with local rankings and the global ranking of predictions within chapter 2, paragraph 2 of the document;
Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis for use in the embodiment of figure 8; and
Figure 10 shows a table representing weights to apply when propagating phrases across paragraphs for use in the embodiment of figure 8.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0020] Embodiments of the present invention will be described hereinafter with reference to the implementation of the invention in a Business Rule Management System (BRMS). A BRMS is a software system enabling organizational policies and the repeatable decisions associated with those policies, such as claim approvals, pricing calculations and eligibility determinations to be defined, deployed, monitored and maintained separately from application code. Business rules include policies, requirements and conditional statements that are used to determine the tactical actions that take place in applications and systems. However, the practical applications of embodiments of the present invention are not limited to this particular described environment. Embodiments of the present invention can find utility in any structured systems using controlled natural languages.
[0021] Embodiments of the present invention also relate to a method for suggesting relevant completions for an input text that is compliant to a CNL and typed to a text-oriented application running at a user computer. More specifically, the method and system provide a hierarchical-driven ranking mechanism for text completions.
[0022] Editing tools provide a way to organise or structure multiple text fragments. There are many ways to organise text fragments using a CNL within a data processing system. In one embodiment, text fragments may be part of a single document. As an example, each paragraph may represent a separate entity and the document layout defines the structure and the relations between entities. In another embodiment, a system may store entities in a file system. How folders and files are organised defines the hierarchy and thus the relations between entities. Another embodiment consists of identifying the relations between entities by leveraging the grammar of the CNL. Each grammar construct helps define the structure and the relations across text fragments.
[0023] Referring to figure 1, there is shown an embodiment of a system 150 for ranking textual candidates. Inputs are received by a processing device 152, which computes sets of textual candidates, recognises text fragments, detects that new text fragments have been input and analyses changes to the hierarchy. The results from the processing device 152 are then sent to a prediction ranker module 154, which ranks the sets of textual candidates and then makes suggestions to a user as to the most likely candidates, updates local scores (Ls) of phrases and updates weights associated with entities. The way in which these actions are achieved will be described below with reference to figures 2 to 10.
[0024] Referring to the example of a BRMS, a text fragment may be a business rule and an entity may be a rule package. A rule package may contain multiple business rules. Rule packages may be nested and thus define a hierarchy. A rule project has a set of top-level rule packages and represents the root of the hierarchy.
[0025] When the system determines that a user operation has changed a text fragment, such as a business rule, the textual candidates may be computed and ranked prior to being exposed to the user. For example, when a business expert changes a business rule, the rule editor may choose to parse the text and display all possible phrases that can be inserted at the current location.
[0026] Referring to figure 2, there is shown an embodiment of a method of ranking textual candidates. The method starts at step 102. At step 104, a processing device receives input. The input may be non-textual input, such as, for example, digital ink input, speech input, or other input. With respect to the embodiment described below, the input is assumed to be text input.
[0027] At step 106, the input is recognised to compute a predicted set of textual candidates. The predicted set of textual candidates may be based on respective prefixes and one or more data sources such as the vocabulary.

[0028] At step 108, a prediction ranker module assigns a probability to each of the identified textual candidates in the predicted set of textual candidates. Step 108 will be described below in more detail with reference to figures 3 to 5. In one embodiment, the prediction ranker module ranks textual candidates prior to, at step 110, presenting the resulting sorted list to the user. In another embodiment, textual candidates are ranked in order to preselect, within a list sorted alphabetically, the textual candidate that has been identified as the most relevant one in the current authoring context. The method ends at step 112.
[0029] According to an embodiment of the invention, the prediction ranker module first assigns a score to each textual candidate by only considering the current entity being edited. This score is referred to as a local score. The person skilled in the art may determine how to compute a relevant initial score for a given prediction according to any known method. In one embodiment, the local score may be how many times a phrase has been used within an entity. Another embodiment may choose to maintain the most frequently and the most recently used phrases within an entity and combine those values to get an initial score on a per-phrase basis.
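Paragraph [0029] leaves the exact scoring open; one hedged sketch, combining frequency and recency as suggested, is given below. The exponential decay and the 0.9 per-day factor are illustrative assumptions, not part of the described embodiment:

```python
import time

def local_score(use_timestamps, now=None, decay=0.9):
    """Each past use of a phrase contributes decay ** age_in_days,
    so frequent uses and recent uses both raise the local score."""
    if now is None:
        now = time.time()
    day = 86400.0  # seconds per day
    return sum(decay ** ((now - t) / day) for t in use_timestamps)

now = time.time()
recent = local_score([now, now, now], now=now)         # three uses today -> 3.0
stale = local_score([now - 30 * 86400] * 5, now=now)   # five uses a month ago -> ~0.21
print(recent > stale)  # True
```

With this choice, three uses of a phrase today outscore five uses a month ago, giving the "more recently and more frequently used" behaviour in one number per phrase.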
[0030] Referring to figure 3, shown are four rule packages (Refund 210, Discount 208, Upgrade 206 and Checkout 204) within a rule project 202, a local score (Ls) for each phrase and computed scores (Cs) 212 for predicting a textual candidate within the "Upgrade" 206 package.
[0031] The "Refund" 210 package contains several business rules and describes whether or not a refund should be done for a customer purchase. The business rules involved use three different phrases of the rule project vocabulary, that is, (i) the gross of invoice refund amount; (ii) the service is authorized; and (iii) the gross charge of the service. For each of the different phrases, a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 21, 13 and 4 respectively.

[0032] The "Discount" 208 package is used for computing a discount for a customer. The phrases used by the business rules in the "Discount" 208 package are different from those used in the "Refund" 210 package to determine whether a refund should be done. The business rules involved use three different phrases of the rule project vocabulary, that is, (i) the category of the customer; (ii) the age of the customer; and (iii) the amount of the shopping cart. For each of the different phrases, a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 24, 10 and 20 respectively.
[0033] The "Upgrade" 206 package represents a package for managing customer categories. A user begins entering a text fragment in an existing business rule. Based on this context, the prediction ranker module generates for the "Upgrade" 206 package, a computed score (Cs) for each phrase 212 of the vocabulary as described below.
[0034] As outlined above, embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "the age of the customer" may be high in the
"Discount" 208 package, but low in the "Refund" 210 package.
[0035] As further illustrated, several aspects of the hierarchy may be involved, to varying degrees, in the processing of a computed score (Cs). For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighting that local score (Ls) according to hierarchical characteristics, and taking the highest of the resulting values.
[0036] In one embodiment, the distance between a pair of nodes, such as packages 204, 206, 208, 210, may be used to weight local scores. As an example, "the amount of the shopping cart" has a local score (Ls) of 20 in the "Discount" 208 package. The computed score (Cs) of this phrase in the "Upgrade" 206 package may be 10 if the weight to go from one node to its next sibling node is 0.5. Furthermore, this phrase also has a local score (Ls) of 8 in the "Upgrade" 206 package. The prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
[0037] In another embodiment, the structure and content of any document (sections, paragraphs etc.) may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph than phrases used in a paragraph a few pages later.
[0038] In another embodiment, the number of different locations into which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate packages may have a higher score than a phrase used six times in one package.
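Paragraph [0038] does not fix a formula, so the sketch below shows one hedged way to reward phrases spread across several packages. The function name and the exponent 1.5 are assumptions, chosen purely so that the quoted example (three uses in two packages beating six uses in one package) holds:

```python
def spread_score(uses_per_package, exponent=1.5):
    """Total uses scaled by a bonus for the number of distinct packages
    in which the phrase appears."""
    total = sum(uses_per_package.values())
    distinct = sum(1 for n in uses_per_package.values() if n > 0)
    return total * distinct ** exponent

spread = spread_score({"Refund": 2, "Discount": 1})  # 3 * 2**1.5 ~ 8.49
concentrated = spread_score({"Refund": 6})           # 6 * 1**1.5 = 6.0
print(spread > concentrated)  # True
```

Any exponent greater than 1 makes a phrase spread over two packages outrank the same total concentrated in one; the exact value is a tuning choice.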
[0039] In another embodiment, the nature of a paragraph can also influence the probability associated with a textual candidate. For example, sentences or terms used in an introduction and in a conclusion of a document may have a higher probability associated with them than sentences and terms found in regular paragraphs of the document. An introduction is often the first section of a document and a conclusion is often the last. In this embodiment, the hierarchy of the document is leveraged to finely adjust probabilities. The introduction and conclusion are further apart in distance, but have the hierarchical relationship mentioned earlier in the paragraph. The nature of a paragraph can also influence the system. The prediction ranker module 154 can provide for special treatment for paragraphs at predefined locations within the document.
[0040] Figure 4 shows a table representing local scores of textual candidates on a per-package basis. The rows in the table represent the phrases used. The columns in the table represent the local scores (Ls) of the phrases in the package identified at the top of each column. The row and the column including "ellipsis" characters (...) indicate that other phrases and other packages have been omitted from the table for brevity and clarity. In the table, the phrase "the gross amount of invoice refund amount" has a local score (Ls) in the "Refund" 210 package of 21 and the phrase "amount of the shopping cart" has a local score (Ls) in the "Upgrade" 206 package of 8. The other exemplary local scores (Ls) for other phrases and other packages can be seen in the table.
[0041] Figure 5 represents the weights to use when propagating a phrase from one package to another. The columns in the table represent the source packages of the local scores (Ls) of the phrases used. In the table of figure 5 they are shown in chronological order, but any other order may be used. The rows in the table represent the target packages of the local scores (Ls) of the phrases used. The row and the column including "ellipsis" characters (...) indicate that other packages have been omitted from the table for brevity and clarity. In the table, the weight to use when propagating a phrase from the source "Discount" 208 package to the target "Upgrade" 206 package can be seen to be equal to 0.5. Applying the weight to the example of figure 3, the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Discount" 208 package is weighted by a factor of 0.5 to produce a computed score (Cs) in the target "Upgrade" 206 package for that phrase from the source package of 10, that is 0.5 times 20. Also in the table, the weight to use when propagating a phrase in the opposite direction, from the source "Upgrade" 206 package to the target "Discount" 208 package, can be seen to be equal to 0.4, which differs from the 0.5 weighting used in the other direction. Applying the weight to the example of figure 3, the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Upgrade" 206 package is weighted by a factor of 0.4 to produce a computed score (Cs) in the target "Discount" 208 package for that phrase from the source package of 3.2, that is 0.4 times 8. The other exemplary weights for other combinations of source and target packages can be seen in the table.
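The asymmetric weight lookups just described can be sketched as follows; the dictionary layout is an illustrative assumption, with the 0.5 and 0.4 weights and the local scores taken from figures 3 to 5:

```python
# (source package, target package) -> propagation weight, per figure 5;
# note the asymmetry between the two directions.
weights = {
    ("Discount", "Upgrade"): 0.5,
    ("Upgrade", "Discount"): 0.4,
}
# local scores (Ls) of "the amount of the shopping cart", per figure 4
ls = {"Discount": 20, "Upgrade": 8}

print(weights[("Discount", "Upgrade")] * ls["Discount"])  # 0.5 * 20 = 10.0
print(weights[("Upgrade", "Discount")] * ls["Upgrade"])   # 0.4 * 8  = 3.2
```

Keying the table by (source, target) pairs is what lets the two directions carry different weights.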
[0042] Various implementations can be realised but as an example, a function returning the computed score of textual candidates may be:
Cs(p, e) = max( { w(x, e) * Ls(x, p) : x = 1, ..., n } )
where:
Cs(p, e) : a function giving the computed score of prediction 'p' within entity 'e'
w(x, e) : a function returning the weight to apply for predictions propagating from entity 'x' to target entity 'e' as illustrated in Figure 5
Ls(x, p) : a function returning the local score within entity 'x' of prediction 'p' as illustrated in Figure 4
n : the total number of entities (e.g. rule packages)
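A minimal Python rendering of the Cs(p, e) function above is sketched below. The names and the convention that an entity's weight to itself is 1.0 are assumptions for illustration:

```python
def computed_score(phrase, entity, local_scores, weights):
    """Cs(p, e) = max over all entities x of w(x, e) * Ls(x, p).

    local_scores: dict mapping entity -> {phrase: local score} (figure 4)
    weights: dict mapping (source entity, target entity) -> propagation
             weight (figure 5); the weight of an entity to itself is
             assumed here to be 1.0.
    """
    best = 0.0
    for source, scores in local_scores.items():
        w = 1.0 if source == entity else weights.get((source, entity), 0.0)
        best = max(best, w * scores.get(phrase, 0))
    return best

# Values quoted from figures 3 to 5:
Ls = {
    "Discount": {"the amount of the shopping cart": 20,
                 "the category of the customer": 24},
    "Upgrade":  {"the amount of the shopping cart": 8},
}
W = {("Discount", "Upgrade"): 0.5, ("Upgrade", "Discount"): 0.4}

# max(1.0 * 8, 0.5 * 20) = 10, as in the walkthrough below
print(computed_score("the amount of the shopping cart", "Upgrade", Ls, W))
# max(1.0 * 0, 0.5 * 24) = 12
print(computed_score("the category of the customer", "Upgrade", Ls, W))
```

Taking the maximum rather than a sum means a phrase heavily used in one nearby entity is never diluted by entities where it is absent.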
[0043] In the illustrated example, the ranking of textual candidates is obtained by combining, for each prediction, the local score (Ls) within a package, and the weight associated with propagation from this package to the "Upgrade" 206 package. For example, the phrase "the category of the customer" has a computed score (Cs) of 12 because this phrase has a local score (Ls) of 24 in the "Discount" 208 package and the weight to go from the "Discount" 208 package to the "Upgrade" 206 package is 0.5.
[0044] Another example is the computed score (Cs) within the "Upgrade" 206 package of the phrase "the amount of the shopping cart". The computed score is 10 because this is the maximum of 8 (the local score (Ls) within the "Upgrade" 206 package) and 20 (the local score within the "Discount" 208 package) multiplied by 0.5 (the weight from Figure 5 to go from the source "Discount" 208 package to the target "Upgrade" 206 package).
[0045] In the example of figure 3, six textual candidates are shown. The prediction with the highest computed score (Cs) is "the category of the customer" with a computed score (Cs) of 12. This computed score (Cs) is obtained from the local score (Ls) of 24 in the "Discount" 208 package and the weighting of 0.5 applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package. The prediction for the phrase "the amount of the shopping cart" has a computed score (Cs) of 10, which is obtained from the higher of (i) the local score (Ls) of 8 in the "Upgrade" 206 package and (ii) the local score (Ls) of 20 in the "Discount" 208 package multiplied by the weighting of 0.5 applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package, giving a computed score (Cs) of 10.
[0046] Even though the phrase "the amount of the shopping cart" has a local score (Ls) of 8 and the phrase "the category of the customer" has a local score (Ls) of 0, the phrase "the category of the customer" has a higher computed score (Cs) because it is used more frequently in a package that is closely related in the hierarchy. This is in spite of the fact that the phrase "the category of the customer" has not been previously used in the "Upgrade" 206 package.
[0047] Once the prediction ranker engine has computed the final ranking based on the computed scores (Cs), textual candidates can be displayed to the user. It should, however, be realised that any other appropriate action can be taken. The global ranking of predictions represents the likely degree of relevance to the user's interests of each phrase at a given location in the hierarchy.
[0048] As the user continues to type, the system may recognise an operation that changes either the local scores (Ls) or the weights associated with entities, shown in figure 5. Figure 6 shows how the local ranking of textual candidates is updated.
[0049] Referring to figure 6, the method starts at step 502. At step 504, the processing device receives input. As described above with reference to figure 2, the input may be textual or non-textual input. At step 506, a text fragment is recognised and it is detected that a new phrase has been entered. In a typical embodiment, step 506 may involve a parser dedicated to the controlled natural language currently in use. At step 508, the prediction ranker module may be notified and one or more local scores of each phrase, within one or more packages, may be recalculated. The method ends at step 510.
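The flow of figure 6 might be sketched as follows; the class and method names are invented for illustration and only the step 508 recalculation is shown:

```python
from collections import defaultdict

class PredictionRanker:
    """Illustrative prediction ranker module (154) keeping per-entity scores."""

    def __init__(self):
        # entity -> phrase -> local score (here a simple use count)
        self.local_scores = defaultdict(lambda: defaultdict(int))

    def on_phrase_entered(self, entity, phrase):
        # step 508: recalculate the local score of the recognised phrase
        # within the entity being edited
        self.local_scores[entity][phrase] += 1

ranker = PredictionRanker()
ranker.on_phrase_entered("Upgrade", "the amount of the shopping cart")
ranker.on_phrase_entered("Upgrade", "the amount of the shopping cart")
print(ranker.local_scores["Upgrade"]["the amount of the shopping cart"])  # 2
```

In a real editor the call to on_phrase_entered would come from the CNL parser of step 506, once a complete phrase has been recognised.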
[0050] When the user performs an operation that modifies the hierarchy of how text fragments are organised, the system may need to check whether or not the weights need some adjustments. Figure 7 shows how the weights associated with each entity storing text fragments are updated.
[0051] Referring to figure 7, the method starts at step 602. At step 604, an application may receive an event that indicates that the hierarchy has been changed. For example, the application may be notified when a rule package has been added or when a new section has been inserted into a document. At step 606, the application may be configured to determine what kind of operation has been performed. For example, this step may be particularly useful in identifying what part of the hierarchy needs to be updated. At step 608, the prediction ranker module may update the weight associated with each entity. For example, when inserting a new rule package, the distance between two entities may get bigger and the respective weights associated with each of the entities may need to be adjusted. The method ends at step 610.
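Figure 7's weight adjustment can be sketched as below. The 1/(d+1) decay is an assumption, chosen only because the claimed weighting must be inversely related to hierarchical distance:

```python
def weight_from_distance(distance):
    # one possible weighting that is inversely related to the
    # hierarchical distance between two entities
    return 1.0 / (distance + 1)

before = weight_from_distance(1)  # two sibling packages: 0.5
# inserting a new package between them (the event of step 604) pushes the
# distance to 2, so the propagation weight is adjusted downwards (step 608)
after = weight_from_distance(2)
print(after < before)  # True
```

Any monotonically decreasing function of distance would satisfy the same constraint; figure 5's hand-tuned table (0.5 one way, 0.4 the other) shows the weights need not even be symmetric.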
[0052] Referring to figure 8, shown are three paragraphs (Chapter 1, Paragraph 2 810, Chapter 2, Paragraph 1 808, and Chapter 2, Paragraph 2 806) within a document 802 having Chapters 814, 804, a local score (Ls) for each phrase and computed scores (Cs) 812 for predicting a textual candidate within Chapter 2, Paragraph 2 806.
[0053] Chapter 1, Paragraph 2 810 contains several phrases. The paragraph uses three different phrases of the vocabulary, that is, (i) "anger and rage"; (ii) "climate change"; and (iii) "the great recession". For each of the different phrases, a local score (Ls) is computed. In the example of figure 8, the local score (Ls) is how many times each phrase has been used within the paragraph. As can be seen from figure 8, the local scores (Ls) for phrases (i), (ii) and (iii) are 5, 8 and 4 respectively.
[0054] Chapter 2, Paragraph 1 808 also contains several phrases. The phrases used in Chapter 2, Paragraph 1 808 are different from those in Chapter 1, Paragraph 2 810. The paragraph uses three different phrases of the vocabulary, that is, (i) "linear no threshold"; (ii) "how's that working out for you?"; and (iii) "make no mistake about it". For each of the different phrases, a local score (Ls) is computed. In the example of figure 8, the local score (Ls) is how many times each phrase has been used within the paragraph. As can be seen from figure 8, the local scores (Ls) for phrases (i), (ii) and (iii) are 2, 6 and 7 respectively.
[0055] Chapter 2, Paragraph 2 806 also contains several phrases. A user begins entering a text fragment in the paragraph. Based on this context, the prediction ranker module 154 generates, for Chapter 2, Paragraph 2 806, a computed score (Cs) for each phrase 812 of the vocabulary as described below.
[0056] As outlined above, embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module 154 may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "how's that working out for you?" may be high in Chapter 2, Paragraph 1 808, but low in Chapter 1, Paragraph 2 810.
[0057] As further illustrated, several aspects of the hierarchy may be involved, to varying degrees, in the processing of a computed score (Cs). For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing each local score (Ls) according to hierarchical characteristics, and taking the highest of the resulting values.
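This weighted-maximum computation can be sketched as follows; the names, node labels, and data layout are hypothetical, and the weight values mirror the example of paragraph [0058]:

```python
def computed_score(phrase, target, local, weight):
    """Computed score (Cs) of `phrase` at node `target`.

    `local` maps (node, phrase) -> local score (Ls); `weight` maps
    (source, target) -> hierarchical propagation weight. Every local
    score of the phrase is weighted by the source-to-target weight and
    the highest weighted value is kept.
    """
    return max(weight[(src, target)] * ls
               for (src, ph), ls in local.items() if ph == phrase)

# "the great recession" has Ls 4 in a sibling node (Chapter 1, Paragraph 2)
local = {("Ch1P2", "the great recession"): 4,
         ("Ch2P2", "the great recession"): 0}
weight = {("Ch1P2", "Ch2P2"): 0.5,   # assumed sibling weight, per [0058]
          ("Ch2P2", "Ch2P2"): 1.0}   # a node contributes fully to itself
print(computed_score("the great recession", "Ch2P2", local, weight))  # 2.0
```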
[0058] In one embodiment, the distance between a pair of nodes, such as paragraphs 806, 808, 810, may be used to weight local scores. As an example, "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810. The computed score (Cs) of this phrase in Chapter 2, Paragraph 2 806 may be 2 if the weight to go from one node to its next sibling node is 0.5. However, this phrase may also have a local score (Ls) of 3 in another node of the hierarchy. The prediction ranker module may choose to use the maximum of all weighted local scores (Ls) to get the computed score (Cs) of a textual candidate.
[0059] In another embodiment, the structure and content of any document (sections, paragraphs etc.) may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph than phrases used in a paragraph a few pages later.
[0060] In another embodiment, the number of different locations into which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate paragraphs may have a higher score than a phrase used six times in one paragraph.
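One way such a spread bonus could be expressed is a score in which the number of distinct locations dominates the raw frequency. The formula and the `LOCATION_WEIGHT` constant below are illustrative assumptions, chosen only so that the example of paragraph [0060] holds:

```python
LOCATION_WEIGHT = 10  # assumed: distinct locations outweigh raw counts

def spread_score(counts_per_location):
    """Combine per-location counts of a phrase so that appearing in
    more distinct locations beats heavy use in a single one."""
    used_in = [c for c in counts_per_location if c > 0]
    return LOCATION_WEIGHT * len(used_in) + sum(used_in)

# Three uses spread over two paragraphs beats six uses in one paragraph:
print(spread_score([2, 1]), spread_score([6]))  # 23 16
```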
[0061] Figure 9 shows a table representing the local scores of textual candidates on a per-paragraph basis. In the table, the phrase "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810 and the phrase "make no mistake about it" has a local score (Ls) of 7 in Chapter 2, Paragraph 1 808. The other exemplary local scores (Ls) for other phrases and other paragraphs can be seen in the table.
[0062] Figure 10 represents the weights to use when propagating a phrase from one paragraph to another. In the table, the weight for propagating a phrase from the source Chapter 2, Paragraph 1 808 to the target Chapter 2, Paragraph 2 806 can be seen to be 0.25. Applying this weight to the example of figure 8, the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 1 808 is weighted by a factor of 0.25 to produce a computed score (Cs) of 1.75 in the target Chapter 2, Paragraph 2 806, that is, 0.25 times 7. Also in the table, the weight for propagating a phrase in the opposite direction, from the source Chapter 2, Paragraph 2 806 to the target Chapter 2, Paragraph 1 808, can be seen to be 0.2, which differs from the 0.25 weighting used in the other direction. Applying this weight to the example of figure 8, the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 2 806 is weighted by a factor of 0.2 to produce a computed score (Cs) of 0 in the target Chapter 2, Paragraph 1 808, that is, 0.2 times 0. The other exemplary weights for other combinations of source and target paragraphs can be seen in the table.
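The asymmetric weight table of figure 10 and the two propagations worked through above can be reproduced in a short sketch. The node labels and function name are hypothetical; the weight and score values are those of figures 9 and 10:

```python
# Asymmetric propagation weights of figure 10: (source, target) -> weight
weights = {("Ch2P1", "Ch2P2"): 0.25,
           ("Ch2P2", "Ch2P1"): 0.2}

# Local scores (Ls) of "make no mistake about it" per figure 9
ls = {"Ch2P1": 7, "Ch2P2": 0}

def propagated(source, target):
    """Computed score (Cs) contributed by `source` at `target`:
    the source's local score weighted by the directional weight."""
    return weights[(source, target)] * ls[source]

print(propagated("Ch2P1", "Ch2P2"))  # 0.25 * 7 = 1.75
print(propagated("Ch2P2", "Ch2P1"))  # 0.2 * 0 = 0.0
```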
[0063] Embodiments of the invention can take the form of a computer program accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus or device.
[0064] The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-RW), and DVD.

Claims

1. A method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of:
assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate;
assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context;
calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
2. A method as claimed in claim 1, wherein the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
3. A method as claimed in claim 1, wherein the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
4. A method as claimed in claim 1, wherein said contexts are paragraphs within the hierarchy of a document.
5. A method as claimed in claim 1, wherein contexts are business rule packages within the hierarchy of a business rule project.
6. A method as claimed in claim 1, further comprising the steps of:
receiving textual or non-textual input; and
computing a set of textual candidates.
7. A method as claimed in claim 1, wherein a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
8. A system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising:
a processing device for receiving text fragments forming the textual candidates; and
a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
9. A system as claimed in claim 8, wherein the prediction ranker module calculates the weighted sum by taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
10. A system as claimed in claim 8, wherein the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
11. A system as claimed in claim 8, wherein said contexts are paragraphs within the hierarchy of a document.
12. A system as claimed in claim 8, wherein contexts are business rule packages within the hierarchy of a business rule project.
13. A system as claimed in claim 8, wherein:
the processing device receives textual or non-textual input; and
the prediction ranker module computes a set of textual candidates.
14. A system as claimed in claim 8, wherein a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
15. A computer program product for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the computer program product comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code adapted to perform the method of any one of claim 1 to claim 7 when said program is run on a computer.
16. A method substantially as hereinbefore described, with reference to figures 1 to 10 of the accompanying drawings.
PCT/IB2014/065838 2013-11-13 2014-11-06 Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy WO2015071804A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1319983.1A GB2520265A (en) 2013-11-13 2013-11-13 Ranking Textual Candidates of controlled natural languages
GB1319983.1 2013-11-13

Publications (1)

Publication Number Publication Date
WO2015071804A1 true WO2015071804A1 (en) 2015-05-21

Family

ID=49818526

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2014/065838 WO2015071804A1 (en) 2013-11-13 2014-11-06 Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy

Country Status (2)

Country Link
GB (1) GB2520265A (en)
WO (1) WO2015071804A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018111702A1 (en) * 2016-12-15 2018-06-21 Microsoft Technology Licensing, Llc Word order suggestion taking into account frequency and formatting information

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JPWO2018190128A1 (en) * 2017-04-11 2020-02-27 ソニー株式会社 Information processing apparatus and information processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377965B1 (en) * 1997-11-07 2002-04-23 Microsoft Corporation Automatic word completion system for partially entered data
US20040268236A1 (en) * 2003-06-27 2004-12-30 Xerox Corporation System and method for structured document authoring
US7657423B1 (en) * 2003-10-31 2010-02-02 Google Inc. Automatic completion of fragments of text

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US8117540B2 (en) * 2005-05-18 2012-02-14 Neuer Wall Treuhand Gmbh Method and device incorporating improved text input mechanism
EP2109046A1 (en) * 2008-04-07 2009-10-14 ExB Asset Management GmbH Predictive text input system and method involving two concurrent ranking means
GB0905457D0 (en) * 2009-03-30 2009-05-13 Touchtype Ltd System and method for inputting text into electronic devices
GB201003628D0 (en) * 2010-03-04 2010-04-21 Touchtype Ltd System and method for inputting text into electronic devices
US8738356B2 (en) * 2011-05-18 2014-05-27 Microsoft Corp. Universal text input


Non-Patent Citations (1)

Title
CANDACE L BULLWINKLE: "PICNICS, KITTENS AND WIGS: USING SCENARIOS FOR THE SENTENCE COMPLETION TASK", IJCAI 1975, vol. 386, 1 January 1975 (1975-01-01), pages 383, XP055176509 *


Also Published As

Publication number Publication date
GB201319983D0 (en) 2013-12-25
GB2520265A (en) 2015-05-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14796910

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14796910

Country of ref document: EP

Kind code of ref document: A1