Oracle Text Reference Release 9.2 Part Number A96518-01
This chapter describes operator precedence and provides description, syntax, and examples for every CONTAINS operator. The following topics are covered:
Operator precedence determines the order in which the components of a query expression are evaluated. Text query operators can be divided into two sets of operators that have their own order of evaluation. These two groups are described below as Group 1 and Group 2.
In all cases, query expressions are evaluated in order from left to right according to the precedence of their operators. Operators with higher precedence are applied first. Operators of equal precedence are applied in order of their appearance in the expression from left to right.
Within query expressions, the Group 1 operators have the following order of evaluation from highest precedence to lowest:
Within query expressions, the Group 2 operators have the following order of evaluation from highest to lowest:
Other operators not listed under Group 1 or Group 2 are procedural. These operators have no sense of precedence attached to them. They include the SQE and thesaurus operators.
In the first example, because AND has a higher precedence than OR, the query returns all documents that contain w1 and all documents that contain both w2 and w3.
In the second example, the query returns all documents that contain both w1 and w2 and all documents that contain w3.
In the third example, the fuzzy operator is first applied to w1, then the AND operator is applied to arguments w3 and w4, then the OR operator is applied to term w2 and the results of the AND operation, and finally, the score from the fuzzy operation on w1 is added to the score from the OR operation.
The fourth example shows that the equivalence operator has higher precedence than the AND operator.
The fifth example shows that the AND operator has lower precedence than the WITHIN operator.
Precedence is altered by grouping characters as follows:
In all languages, an ABOUT query increases the number of relevant documents returned from the same query without this operator. Oracle scores results for an ABOUT query with the most relevant document receiving the highest score.
In English and French, use the ABOUT operator to query on concepts. The system looks up concept information in the theme component of the index.
Note: You need not have a theme component in the index to issue ABOUT queries in English. However, having a theme component in the index yields the best results for ABOUT queries.
Oracle retrieves documents that contain concepts that are related to your query word or phrase. For example, if you issue an ABOUT query on California, the system might return documents that contain the terms Los Angeles and San Francisco, which are cities in California. The document need not contain the term California to be returned in this ABOUT query.
The word or phrase specified in your ABOUT query need not exactly match the themes stored in the index. Oracle normalizes the word or phrase before performing lookup in the index.
You can use the ABOUT operator with the CONTAINS and CATSEARCH SQL operators.
The ABOUT operator uses the supplied knowledge base in English and French to interpret the phrase you enter. Your ABOUT query therefore is limited to knowing and interpreting the concepts in the knowledge base.
You can improve the results of your ABOUT queries by adding your application-specific terminology to the knowledge base.
ABOUT queries give the best results when your query is formulated with proper case. This is because the normalization of your query is based on the knowledge catalog, which is case-sensitive.
However, you need not type your query in exact case to obtain results from an ABOUT query. The system does its best to interpret your query. For example, if you enter a query of CISCO and the system does not find this in the knowledge catalog, the system might use Cisco as a related concept for look-up.
An ABOUT query cannot be more than 4000 characters.
You cannot use the WITHIN operator with the ABOUT operator, as in 'ABOUT (xyz) WITHIN abc'.
You cannot combine ABOUT with any operator involving offset information, such as NEAR or WITHIN.
To search for documents that are about soccer, use the following syntax:
'about(soccer)'
You can further refine the query to include documents about soccer rules in international competition by entering the phrase as the query term:
'about(soccer rules in international competition)'
In this English example, Oracle returns all documents that have themes of soccer, rules, or international competition.
In terms of scoring, documents which have all three themes will generally score higher than documents that have only one or two of the themes.
You can also query on unstructured phrases, such as the following:
'about(japanese banking investments in indonesia)'
You can use other operators, such as AND or NOT, to combine ABOUT queries with word queries.
For example, you can issue the following combined ABOUT and word query:
'about(dogs) and cat'
You can combine an ABOUT query with another ABOUT query as follows:
'about(dogs) not about(labradors)'
You can issue ABOUT queries with CATSEARCH using the query template method with grammar set to CONTEXT as follows:
select pk||' ==> '||text from test
  where catsearch(text,
    '<query>
       <textquery grammar="context"> about(California) </textquery>
       <score datatype="integer"/>
     </query>','')>0
  order by pk;
Use the ACCUM operator to search for documents that contain at least one occurrence of any of the query terms. The accumulate operator ranks documents according to the total term weight of a document.
The following example returns documents that contain either soccer, Brazil, or cup and assigns the highest scores to the documents that contain all three terms:
'soccer, Brazil, cup'
The following example also returns documents that contain either soccer, Brazil, or cup. However, the weight operator ensures that documents with Brazil score higher than documents that contain only soccer and cup.
'soccer, Brazil*3, cup'
ACCUM scores documents based on two criteria:
Term weight refers to the weight you place on a query term. A query such as x,y,z has term weights of 1 for each term. A query of x, y*3, z has term weights of 1, 3, and 1 for the individual terms.
Accumulate scoring guarantees that if document A matches p terms with a total term weight of m, and document B matches q terms with a total term weight of m+1, document B has a higher relevance score than document A, regardless of the numbers p and q.
If two documents have the same weight M, the higher relevance score goes to the document with the higher weighted average term score.
The following table illustrates accumulate scoring:
Each row in the table shows the score for an accumulate query. The first four rows show the scores for query x,y,z for documents A, B, C, D. The next two rows show the scores for query x, y*3,z for documents E and F. Assume that x, y and z stand for three different words. The query for documents E and F has a weight of 3 on the second query term to arbitrarily make it the most important query term.
The total document term weight is shown for each document. For example, document A has a matching weight of one since only one query term matches the document. Similarly, document C has a weight of 3 since all query terms with weight 1 match the document.
The table shows that documents that have higher query term weights are always scored higher than those that have lower query term weights. For example, document C always scores higher than documents A, B, and D, since document C has the highest query term weight. Similarly, document F scores higher than document E, since F has a higher matching weight.
For documents that have equal term weights, such as documents B and D, the higher score goes to the document with the higher weighted average term score, which is document D.
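The ordering rule described above can be sketched as a two-part sort key: total matched term weight first, weighted average term score as the tie-breaker. This is only an illustrative model, not Oracle's scoring code. The `Match` record, the `rank` helper, and all sample scores are hypothetical, and the sketch reproduces relative ordering only, not actual score values.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical record of how one document matched an ACCUM query."""
    doc: str
    # One (term weight, term score) pair per query term found in the document.
    hits: list


def rank_key(match):
    total_weight = sum(weight for weight, _ in match.hits)
    # The weighted average term score only matters when total weights are equal.
    weighted_avg = sum(weight * score for weight, score in match.hits) / total_weight
    return (total_weight, weighted_avg)


def rank(matches):
    """Order document names from most to least relevant under this model."""
    return [m.doc for m in sorted(matches, key=rank_key, reverse=True)]


# Made-up data for the query x,y,z. B and D both match two terms,
# but D has the higher average term score.
docs = [
    Match("A", [(1, 80)]),
    Match("B", [(1, 10), (1, 20)]),
    Match("C", [(1, 5), (1, 5), (1, 5)]),
    Match("D", [(1, 40), (1, 50)]),
]
ranking = rank(docs)
```

In this made-up data, document C ranks first even though its individual term scores are the lowest, because total term weight always dominates the comparison.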
Use the AND operator to search for documents that contain at least one occurrence of each of the query terms.
Syntax | Description |
---|---|
term1 and term2 | Returns documents that contain term1 and term2. Returns the minimum score of its operands. All query terms must occur; lower score taken. |
To obtain all the documents that contain the terms blue and black and red, issue the following query:
'blue & black & red'
In an AND query, the score returned is the score of the lowest-scoring query term. In this example, if the individual scores for the terms blue, black, and red are 10, 20, and 30 within a document, the document scores 10.
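The minimum rule can be written out in a few lines. This is a sketch of the rule as stated in the text, not of Oracle's implementation. The function name is hypothetical, and treating a score of 0 as "term absent" is an assumption of the sketch.

```python
def and_score(term_scores):
    """Score of an AND expression: the minimum of its operand scores.

    Assumption of this sketch: a score of 0 means the term does not occur,
    so a result of 0 means the document would not be returned.
    """
    return min(term_scores)


# Scores for blue, black, and red from the example in the text.
score = and_score([10, 20, 30])
```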
Use the broader term operators (BT, BTG, BTP, BTI) to expand a query to include the term that has been defined in a thesaurus as the broader or higher level term for a specified term. They can also expand the query to include the broader term for the broader term and the broader term for that broader term, and so on up through the thesaurus hierarchy.
Specify the operand for the broader term operator. Oracle expands term to include the broader term entries defined for the term in the thesaurus specified by thes. For example, if you specify BTG(dog), the expansion includes only those terms that are defined as broader term generic for dog. You cannot specify expansion operators in the term argument.
The number of broader terms included in the expansion is determined by the value for level.
Specify a qualifier for term, if term is a homograph (word or phrase with multiple meanings, but the same spelling) that appears in two or more nodes in the same hierarchy branch of thes.
If a qualifier is not specified for a homograph in a broader term query, the query expands to include the broader terms of all the homographic terms.
Specify the number of levels traversed in the thesaurus hierarchy to return the broader terms for the specified term. For example, a level of 1 in a BT query returns the broader term entry, if one exists, for the specified term. A level of 2 returns the broader term entry for the specified term, as well as the broader term entry, if one exists, for the broader term.
The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
The following query returns all documents that contain the term tutorial or the BT term defined for tutorial in the DEFAULT thesaurus:
'BT(tutorial)'
When you specify a thesaurus name, you must also specify level as in:
'BT(tutorial, 2, mythes)'
If machine is a broader term for crane (building equipment) and bird is a broader term for crane (waterfowl) and no qualifier is specified for a broader term query, the query
BT(crane)
expands to:
'{crane} or {machine} or {bird}'
If waterfowl is specified as a qualifier for crane in a broader term query, the query
BT(crane{(waterfowl)})
expands to the query:
'{crane} or {bird}'
Note: When specifying a qualifier in a broader or narrower term query, the qualifier and its notation (parentheses) must be escaped, as is shown in this example.
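The level-by-level expansion of a broader term query can be sketched with a small in-memory thesaurus. Everything here is hypothetical: the `BROADER` mapping, the function names, and the extra animal entry are made up for illustration, and real thesauri live in the Oracle Text thesaurus tables rather than in a dictionary. Qualifiers are not modeled.

```python
# Hypothetical thesaurus: each term maps to its broader terms.
BROADER = {
    "crane": ["machine", "bird"],  # homograph: both meanings are followed
    "bird": ["animal"],
}


def bt_expand(term, level=1, broader=BROADER):
    """Return term plus its broader terms, walking up `level` levels."""
    expanded = [term]
    frontier = [term]
    # Zero or negative levels skip the loop and return only the original term.
    for _ in range(level):
        next_frontier = []
        for current in frontier:
            for parent in broader.get(current, []):
                if parent not in expanded:
                    expanded.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return expanded


def to_query(terms):
    """Render the expansion in the '{a} or {b}' form used in the text."""
    return " or ".join("{%s}" % t for t in terms)


one_level = to_query(bt_expand("crane"))
two_levels = to_query(bt_expand("crane", 2))
```

With no qualifier, both meanings of crane contribute broader terms, which matches the unqualified expansion shown above.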
You can browse a thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the broader terms in your thesaurus, see CTX_THES.BT in Chapter 12, "CTX_THES Package".
Use the EQUIV operator to specify an acceptable substitution for a word in a query.
Syntax | Description |
---|---|
term1 equiv term2 | Specifies that term2 is an acceptable substitution for term1. Score calculated as the sum of all occurrences of both terms. |
The following example returns all documents that contain either the phrase alsatians are big dogs or labradors are big dogs:
'labradors=alsatians are big dogs'
The EQUIV operator has higher precedence than all other operators except the expansion operators (fuzzy, soundex, stem).
Use the fuzzy operator to expand queries to include words that are spelled similarly to the specified term. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in your document set.
The new fuzzy syntax enables you to rank the result set so that documents that contain words with high similarity to the query word are scored higher than documents with lower similarity. You can also limit the number of expanded terms.
Unlike stem expansion, the number of words generated by a fuzzy expansion depends on what is in the index. Results can vary significantly according to the contents of the index.
Oracle Text supports fuzzy definitions for English, German, Italian, Dutch, Spanish, and OCR.
If the fuzzy expansion returns a stopword, the stopword is not included in the query or highlighted by CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP.
If base-letter conversion is enabled for a text column and the query expression contains a fuzzy operator, Oracle operates on the base-letter form of the query.
fuzzy(term, score, numresults, weight)
Consider the CONTAINS query:
...CONTAINS(TEXT, 'fuzzy(government, 70, 6, weight)', 1) > 0;
This query expands to the first six fuzzy variations of government in the index that have a similarity score over 70.
In addition, documents in the result set are weighted according to their similarity to government. Documents containing words most similar to government receive the highest score.
You can skip unnecessary parameters using the appropriate number of commas. For example:
'fuzzy(government,,,weight)'
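The roles of the score and numresults parameters can be illustrated with a toy expansion. This sketch does not use Oracle's fuzzy matching, which is not described here. It substitutes the similarity ratio from Python's standard `difflib` module, and the list of indexed terms is made up. Only the filtering and truncation logic is the point.

```python
from difflib import SequenceMatcher


def fuzzy_expand(term, indexed_terms, score, numresults):
    """Keep indexed terms whose similarity to `term` is over `score`,
    most similar first, truncated to `numresults` entries.

    Similarity is a stand-in measure: difflib's ratio scaled to 0..100.
    """
    scored = []
    for candidate in indexed_terms:
        similarity = round(100 * SequenceMatcher(None, term, candidate).ratio())
        if similarity > score:
            scored.append((similarity, candidate))
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:numresults]]


# Hypothetical index contents.
index_terms = ["goverment", "governments", "govern", "garden"]
expansion = fuzzy_expand("government", index_terms, score=70, numresults=6)
```

As the text notes, the result depends entirely on what is in the index: a different list of indexed terms gives a different expansion for the same query.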
The old fuzzy syntax from previous releases is still supported. This syntax is as follows:
Parameter | Description |
---|---|
?term | Expands term to include all terms with similar spellings as the specified term. |
Use this operator to find all XML documents that contain a specified section path. You can also use this operator to do section equality testing.
Your index must be created with the PATH_SECTION_GROUP for this operator to work.
The query
HASPATH(A/B/C)
finds and returns a score of 100 for the document
<A><B><C>dog</C></B></A>
without the query having to reference dog at all.
The query
dog INPATH A
finds
<A>dog</A>
but it also finds
<A>dog park</A>
To limit the query to the term dog and nothing else, you can use a section equality test with the HASPATH operator. For example,
HASPATH(A="dog")
finds and returns a score of 100 only for the first document, and not the second.
Because of how XML section data is recorded, false matches might occur with XML sections that are completely empty as follows:
<A><B><C></C></B><D><E></E></D></A>
A query of HASPATH(A/B/E) or HASPATH(A/D/C) falsely matches this document. This type of false matching can be avoided by inserting text between empty tags.
Use this operator to do path searching in XML documents. This operator is like the WITHIN operator except that the right-hand side is a parentheses enclosed path, rather than a single section name.
Your index must be created with the PATH_SECTION_GROUP for the INPATH operator to work.
The INPATH operator has the following syntax:
Syntax | Description |
---|---|
term INPATH (A) | Returns documents that have term within the top-level tags <A> and </A>. The A tag must be a top-level tag, which is the document-type tag. |
term INPATH (//A) | Returns documents that have term in the <A> tag at any level. This query is the same as 'term WITHIN A' |
term INPATH (A/B) | Returns documents where term appears in a B element which is a direct child of a top-level A element. For example, a document containing <A><B>term</B></A> is returned. |
term INPATH(A//B) | Returns documents where term appears in a B element which is some descendant (any level) of a top-level A element. |
term INPATH (//A/@B) | Returns documents where term appears in the B attribute of an A element at any level. Attributes must be bound to a direct parent. |
term INPATH (A[B = "value"]) | Returns documents where term appears in an A tag which has a B tag whose value is value. |
term INPATH (A[NOT(B)]) | Finds documents where term appears in a top-level A element which does not have a B element as an immediate child. |
You can nest the entire INPATH expression in another INPATH expression as follows:
(dog INPATH (//A/B/C)) INPATH (D)
When you do so, the two INPATH paths are completely independent. The outer INPATH path does not change the context node of the inner INPATH path. For example:
(dog INPATH (A)) INPATH (D)
never finds any documents, because the inner INPATH is looking for dog within the top-level tag A, and the outer INPATH constrains that to documents with top-level tag D. A document can have only one top-level tag, so this expression never finds any documents.
Tags and attribute names in path searching are case-sensitive. That is,
dog INPATH (A)
finds <A>dog</A> but does not find <a>dog</a>. Instead use
dog INPATH (a)
To find all documents that contain the term dog in the top-level tag <A>:
dog INPATH (/A)
or
dog INPATH(A)
To find all documents that contain the term dog in the <A> tag at any level:
dog INPATH(//A)
This query finds the following documents:
<A>dog</A>
and
<C><B><A>dog</A></B></C>
To find all documents that contain the term dog in a B element that is a direct child of a top-level A element:
dog INPATH(A/B)
This query finds the following XML document:
<A><B>My dog is friendly.</B></A>
but does not find:
<C><B>My dog is friendly.</B></C>
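The direct-child rule in this example can be mimicked outside the database with Python's standard XML parser. This is a rough analogue for a single document, not how Oracle Text evaluates INPATH against an index. The function name and the simple word tokenization are assumptions of the sketch.

```python
import re
import xml.etree.ElementTree as ET


def in_child_path(xml_text, top_tag, child_tag, term):
    """Rough analogue of 'term INPATH (top_tag/child_tag)' for one document."""
    root = ET.fromstring(xml_text)
    # Tag names are compared case-sensitively, as in path searching.
    if root.tag != top_tag:
        return False
    # findall with a bare tag name inspects direct children only.
    for child in root.findall(child_tag):
        words = re.findall(r"\w+", "".join(child.itertext()).lower())
        if term.lower() in words:
            return True
    return False


found = in_child_path("<A><B>My dog is friendly.</B></A>", "A", "B", "dog")
missed = in_child_path("<C><B>My dog is friendly.</B></C>", "A", "B", "dog")
```

The second document fails the test because its top-level element is C, not A, even though it has a matching B child.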
You can test the value of tags. For example, the query:
dog INPATH(A[B="dog"])
Finds the following document:
<A><B>dog</B></A>
But does not find:
<A><B>My dog is friendly.</B></A>
You can search the content of attributes. For example, the query:
dog INPATH(//A/@B)
Finds the document
<C><A B="snoop dog"> </A> </C>
You can test the value of attributes. For example, the query
California INPATH (//A[@B = "home address"])
Finds the document:
<A B="home address">San Francisco, California, USA</A>
But does not find:
<A B="work address">San Francisco, California, USA</A>
You can test if a path exists with the HASPATH operator. For example, the query:
HASPATH(A/B/C)
finds and returns a score of 100 for the document
<A><B><C>dog</C></B></A>
without the query having to reference dog at all.
The following is an example of an INPATH equality test.
dog INPATH (A[@B = "foo"])
The following limitations apply for these expressions:
dog INPATH (A[@B= "pot of gold"])
matches the following sections:
<A B="POT OF GOLD">dog</A>
and
<A B="pot of gold">dog</A>
because lexer is case-insensitive by default.
<A B="POT BLACK GOLD">dog</A>
because OF is a default stopword in English and the query matches any word in that position.
<A B="POT_OF_GOLD">dog</A>
because the underscore character is not a join character by default.
Use the MINUS operator to search for documents that contain one query term when you want the presence of a second query term to cause the document to be ranked lower. The MINUS operator is useful for lowering the score of documents that contain unwanted noise terms.
Syntax | Description |
---|---|
term1 minus term2 | Returns documents that contain term1. Calculates score by subtracting the score of term2 from the score of term1. Only documents with positive score are returned. |
Suppose a query on the term cars always returned high scoring documents about Ford cars. You can lower the scoring of the Ford documents by using the expression:
'cars - Ford'
In essence, this expression returns documents that contain the term cars and possibly Ford. However, the score for a returned document is the score of cars minus the score of Ford.
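The subtraction rule can be written out directly. This is a sketch of the rule as described in the text, not Oracle's implementation. The function name and the sample scores are hypothetical.

```python
def minus_score(score1, score2):
    """Score of 'term1 - term2': the difference of the two term scores.

    Returns None when the result is not positive, which models the
    document being dropped from the result set.
    """
    result = score1 - score2
    return result if result > 0 else None


# Hypothetical scores: cars scores 50 in a document, Ford scores 20.
lowered = minus_score(50, 20)
# A document where Ford outscores cars is not returned at all.
dropped = minus_score(20, 50)
```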
Use the narrower term operators (NT, NTG, NTP, NTI) to expand a query to include all the terms that have been defined in a thesaurus as the narrower or lower level terms for a specified term. They can also expand the query to include all of the narrower terms for each narrower term, and so on down through the thesaurus hierarchy.
Specify the operand for the narrower term operator. term is expanded to include the narrower term entries defined for the term in the thesaurus specified by thes. The number of narrower terms included in the expansion is determined by the value for level. You cannot specify expansion operators in the term argument.
Specify a qualifier for term, if term is a homograph (word or phrase with multiple meanings, but the same spelling) that appears in two or more nodes in the same hierarchy branch of thes.
If a qualifier is not specified for a homograph in a narrower term query, the query expands to include all of the narrower terms of all homographic terms.
Specify the number of levels traversed in the thesaurus hierarchy to return the narrower terms for the specified term. For example, a level of 1 in an NT query returns all the narrower term entries, if any exist, for the specified term. A level of 2 returns all the narrower term entries for the specified term, as well as all the narrower term entries, if any exist, for each narrower term.
The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
The following query returns all documents that contain either the term cat or any of the NT terms defined for cat in the DEFAULT thesaurus:
'NT(cat)'
If you specify a thesaurus name, you must also specify level as in:
'NT(cat, 2, mythes)'
The following query returns all documents that contain either fairy tale or any of the narrower instance terms for fairy tale as defined in the DEFAULT thesaurus:
'NTI(fairy tale)'
That is, if the terms cinderella and snow white are defined as narrower term instances for fairy tale, Oracle returns documents that contain fairy tale, cinderella, or snow white.
Each hierarchy in a thesaurus represents a distinct, separate branch, corresponding to the four narrower term operators. In a narrower term query, Oracle only expands the query using the branch corresponding to the specified narrower term operator.
You can browse a thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the narrower terms in your thesaurus, see CTX_THES.NT in Chapter 12, "CTX_THES Package".
Use the NEAR operator to return a score based on the proximity of two or more query terms. Oracle returns higher scores for terms closer together and lower scores for terms farther apart in a document.
Syntax |
---|
NEAR((word1, word2,..., wordn) [, max_span [, order]]) |
Specify the terms in the query separated by commas. The query terms can be single words or phrases.
Optionally specify the size of the biggest clump. The default is 100. Oracle returns an error if you specify a number greater than 100.
A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term.
For near queries with two terms, max_span is the maximum distance allowed between the two terms. For example, to query on dog and cat where dog is within 6 words of cat, issue the following query:
'near((dog, cat), 6)'
Specify TRUE for Oracle to search for terms in the order you specify. The default is FALSE.
For example, to search for the words monday, tuesday, and wednesday in that order with a maximum clump size of 20, issue the following query:
'near((monday, tuesday, wednesday), 20, TRUE)'
Oracle might return different scores for the same document when you use identical query expressions that have the order flag set differently. For example, Oracle might return different scores for the same document when you issue the following queries:
'near((dog, cat), 50, FALSE)'
'near((dog, cat), 50, TRUE)'
The scoring for the NEAR operator combines frequency of the terms with proximity of terms. For each document that satisfies the query, Oracle returns a score between 1 and 100 that is proportional to the number of clumps in the document and inversely proportional to the average size of the clumps. This means many small clumps in a document result in higher scores, since small clumps imply closeness of terms.
The number of terms in a query also affects score. Queries with many terms, such as seven, generally need fewer clumps in a document to score 100 than do queries with few terms, such as two.
A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term. You can define clump size with the max_span parameter as described in this section.
You can use the NEAR operator with other operators such as AND and OR. Scores are calculated in the regular way.
For example, to find all documents that contain the terms tiger, lion, and cheetah where the terms lion and tiger are within 10 words of each other, issue the following query:
'near((lion, tiger), 10) AND cheetah'
The score returned for each document is the lower score of the near operator and the term cheetah.
You can also use the equivalence operator to substitute a single term in a near query:
'near((stock crash, Japan=Korea), 20)'
This query asks for all documents that contain the phrase stock crash within twenty words of Japan or Korea.
You can write near queries using the syntax of previous ConText releases. For example, to find all documents where lion occurs near tiger, you can write:
'lion near tiger'
or with the semi-colon as follows:
'lion;tiger'
This query is equivalent to the following query:
'near((lion, tiger), 100, FALSE)'
Note: Only the syntax of the |
When you use highlighting and your query contains the near operator, all occurrences of all terms in the query that satisfy the proximity requirements are highlighted. Highlighted terms can be single words or phrases.
For example, assume a document contains the following text:
Chocolate and vanilla are my favorite ice cream flavors. I like chocolate served in a waffle cone, and vanilla served in a cup with carmel syrup.
If the query is near((chocolate, vanilla), 100, FALSE), the following is highlighted:
<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors. I like <<chocolate>> served in a waffle cone, and <<vanilla>> served in a cup with carmel syrup.
However, if the query is near((chocolate, vanilla), 4, FALSE), only the following is highlighted:
<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors. I like chocolate served in a waffle cone, and vanilla served in a cup with carmel syrup.
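The effect of max_span on this sample text can be checked with a simple word-distance calculation. This sketch is a pairwise distance check for two terms, not Oracle's clump algorithm. Measuring distance as the difference between word positions is an assumption, as is the tokenization.

```python
import re


def near_pairs(text, word1, word2, max_span):
    """Return (position1, position2) pairs where the two words lie
    at most max_span word positions apart, in either order."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    pos1 = [i for i, w in enumerate(words) if w == word1]
    pos2 = [i for i, w in enumerate(words) if w == word2]
    return [(a, b) for a in pos1 for b in pos2 if abs(a - b) <= max_span]


text = (
    "Chocolate and vanilla are my favorite ice cream flavors. "
    "I like chocolate served in a waffle cone, and vanilla served "
    "in a cup with carmel syrup."
)
close = near_pairs(text, "chocolate", "vanilla", 4)
anywhere = near_pairs(text, "chocolate", "vanilla", 100)
```

With a span of 4, only the first chocolate and vanilla qualify, which is consistent with the second highlighting result above. With a span of 100, every occurrence takes part in at least one pair.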
See Also:
For more information about the procedures you can use for highlighting, see Chapter 8, "CTX_DOC Package".
You can use the NEAR operator with the WITHIN operator for section searching as follows:
'near((dog, cat), 10) WITHIN Headings'
When evaluating expressions such as these, Oracle looks for clumps that lie entirely within the given section.
In this example, only those clumps that contain dog and cat that lie entirely within the section Headings are counted. That is, if the term dog lies within Headings and the term cat lies five words from dog, but outside of Headings, this pair of words does not satisfy the expression and is not counted.
Use the NOT operator to search for documents that contain one query term and not another.
Syntax | Description |
---|---|
term1 not term2 | Returns documents that contain term1 and not term2. |
To obtain the documents that contain the term animals but not dogs, use the following expression:
'animals ~ dogs'
Similarly, to obtain the documents that contain the term transportation but not automobiles or trains, use the following expression:
'transportation not (automobiles or trains)'
Use the OR operator to search for documents that contain at least one occurrence of any of the query terms.
Syntax | Description |
---|---|
term1 or term2 | Returns documents that contain term1 or term2. Returns the maximum score of its operands. At least one term must exist; higher score taken. |
For example, to obtain the documents that contain the term cats or the term dogs, use either of the following expressions:
'cats | dogs'
'cats OR dogs'
In an OR query, the score returned is the score of the highest-scoring query term. In the example, if the scores for cats and dogs are 30 and 40 within a document, the document scores 40.
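The maximum rule can be stated in code in the same spirit. This is a sketch of the rule as given in the text, not Oracle's implementation. The function name is hypothetical, and a score of 0 is assumed to mean that a term is absent.

```python
def or_score(term_scores):
    """Score of an OR expression: the maximum of its operand scores.

    Assumption of this sketch: 0 means the term is absent, so a result
    of 0 means no term matched and the document is not returned.
    """
    return max(term_scores)


# Scores for cats and dogs from the example in the text.
score = or_score([30, 40])
# A document containing only dogs still matches.
dogs_only = or_score([0, 40])
```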
Use the preferred term operator (PT) to replace a term in a query with the preferred term that has been defined in a thesaurus for the term.
Syntax | Description |
---|---|
PT(term[,thes]) | Replaces the specified word in a query with the preferred term for term. |
Specify the operand for the preferred term operator. term is replaced by the preferred term defined for the term in the specified thesaurus. However, if no PT entries are defined for the term, term is not replaced in the query expression and term is the result of the expansion.
You cannot specify expansion operators in the term argument.
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.
The term automobile has a preferred term of car in a thesaurus. A PT query for automobile returns all documents that contain the word car. Documents that contain the word automobile are not returned.
You can browse a thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the preferred terms in your thesaurus, see CTX_THES.PT in Chapter 12, "CTX_THES Package".
Use the related term operator (RT) to expand a query to include all related terms that have been defined in a thesaurus for the term.
Syntax | Description |
---|---|
RT(term[,thes]) | Expands a query to include all the terms defined in the thesaurus as a related term for term. |
Specify the operand for the related term operator. term is expanded to include term and all the related entries defined for term in thes.
You cannot specify expansion operators in the term argument.
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.
The term dog has a related term of wolf. An RT query for dog returns all documents that contain the word dog or the word wolf.
You can browse a thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the related terms in your thesaurus, see CTX_THES.RT in Chapter 12, "CTX_THES Package".
Use the soundex (!) operator to expand queries to include words that have similar sounds; that is, words that sound like other words. This function allows comparison of words that are spelled differently, but sound alike in English.
Syntax | Description |
---|---|
!term | Expands a query to include all terms that sound the same as the specified term (English-language text only). |
SELECT ID, COMMENT FROM EMP_RESUME
WHERE CONTAINS (COMMENT, '!SMYTHE') > 0 ;

ID COMMENT
-- ------------
23 Smith is a hard worker who..
Soundex works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages.
If you have base-letter conversion specified for a text column and the query expression contains a soundex operator, Oracle operates on the base-letter form of the query.
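To see why a query on SMYTHE can find Smith, it helps to look at how a sound-alike code is built. The sketch below implements the classic American Soundex rules. Whether the Oracle Text soundex operator follows exactly these rules is an assumption, and the function is illustrative only.

```python
# Letter groups of classic Soundex. Vowels, Y, H, and W carry no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character classic Soundex code for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the run, so repeats are coded again
    return (result + "000")[:4]


query_code = soundex("SMYTHE")
indexed_code = soundex("Smith")
```

Both spellings reduce to the same code, so an expansion based on this code treats them as sounding alike.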
Use the stem ($) operator to search for terms that have the same linguistic root as the query term.
Stemming performance can be improved by using the index_stems attribute of the BASIC_LEXER preference.
The Oracle Text stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.
Syntax | Description |
---|---|
$term |
Expands a query to include all terms having the same stem or root word as the specified term. |
Input | Expands To |
---|---|
$scream |
scream screaming screamed |
$distinguish |
distinguish distinguished distinguishes |
$guitars |
guitars guitar |
$commit |
commit committed |
$cat |
cat cats |
$sing |
sang sung sing |
If stem returns a word designated as a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT or CTX_QUERY.MARKUP.
Use the SQE operator to call a stored query expression created with the CTX_QUERY.STORE_SQE procedure.
Stored query expressions can be used for creating predefined bins for organizing and categorizing documents or to perform iterative queries, in which an initial query is refined using one or more additional queries.
Syntax | Description |
---|---|
SQE(SQE_name) |
Returns the results for the stored query expression SQE_name. |
To create an SQE named disasters, use CTX_QUERY.STORE_SQE as follows:
begin
  ctx_query.store_sqe('disasters', 'hurricane or earthquake or blizzard');
end;
This stored query expression returns all documents that contain either hurricane, earthquake or blizzard.
This SQE can then be called within a query expression as follows:
SELECT SCORE(1), docid FROM news
WHERE CONTAINS(resume, 'sqe(disasters)', 1) > 0
ORDER BY SCORE(1);
Use the synonym operator (SYN) to expand a query to include all the terms that have been defined in a thesaurus as synonyms for the specified term.
Syntax | Description |
---|---|
SYN(term[,thes]) |
Expands a query to include all the terms defined in the thesaurus as synonyms for term. |
Specify the operand for the synonym operator. term is expanded to include term and all the synonyms defined for term in thes.
You cannot specify expansion operators in the term argument.
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
The following query expression returns all documents that contain the term dog or any of the synonyms defined for dog in the DEFAULT thesaurus:
'SYN(dog)'
When a synonym query is expanded, compound phrases defined as synonyms for the term are returned as AND conjunctions.
For example, the compound phrase temperature + measurement + instruments is defined in a thesaurus as a synonym for the term thermometer. In a synonym query for thermometer, the query is expanded to:
{thermometer} OR ({temperature}&{measurement}&{instruments})
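The expansion rule above can be sketched in a few lines. This is an illustrative model only: the dictionary layout and the function name expand_syn are assumptions made for the example, not part of Oracle Text.

```python
def expand_syn(term, synonyms):
    """Build the expanded query string for SYN(term).

    synonyms maps a term to a list of synonym entries; an entry with
    several words stands for a compound phrase (hypothetical layout).
    """
    def braces(word):
        return "{" + word + "}"

    parts = [braces(term)]
    for entry in synonyms.get(term, []):
        words = entry.split()
        if len(words) == 1:
            parts.append(braces(words[0]))
        else:
            # A compound phrase becomes an AND (&) conjunction of its words.
            parts.append("(" + "&".join(braces(w) for w in words) + ")")
    return " OR ".join(parts)


thesaurus = {"thermometer": ["temperature measurement instruments"]}
print(expand_syn("thermometer", thesaurus))
```

A term with no synonym entries expands to itself in braces.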
You can browse your thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the synonym terms in your thesaurus, see CTX_THES.SYN in Chapter 12, "CTX_THES Package". |
Use the threshold operator (>) in two ways:
The threshold operator at the expression level eliminates documents in the result set that score below a threshold number.
The threshold operator at the query term level selects a document based on how a term scores in the document.
At the expression level, to search for documents that contain relational databases and to return only documents that score greater than 75, use the following expression:
'relational databases > 75'
At the query term level, to select documents that have at least a score of 30 for lion and contain tiger, use the following expression:
'(lion > 30) and tiger'
Use the translation term operator (TR) to expand a query to include all defined foreign language equivalent terms.
Syntax | Description |
---|---|
TR(term[, lang, [thes]]) |
Expands term to include all the foreign equivalents that are defined for term. |
Specify the operand for the translation term operator. term is expanded to include all the foreign language entries defined for term in thes. You cannot specify expansion operators in the term argument.
Optionally, specify which foreign language equivalents to return in the expansion. The language you specify must match the language as defined in thes. If you omit this parameter, the system expands to use all defined foreign language terms.
Optionally, specify the name of the thesaurus used to return the expansions for the specified term. The thes argument has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before you can use any of the thesaurus operators.
Consider a thesaurus MY_THES with the following entries for cat:
cat
  SPANISH: gato
  FRENCH: chat
To search for all documents that contain cat or the Spanish translation of cat, issue the following query:
'tr(cat, spanish, my_thes)'
This query expands to:
'{cat}|{gato}'
You can browse a thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the translation terms in your thesaurus, see CTX_THES.TR in Chapter 12, "CTX_THES Package". |
Use the translation term synonym operator (TRSYN) to expand a query to include all the defined foreign equivalents of the query term, the synonyms of the query term, and the foreign equivalents of the synonyms.
Syntax | Description |
---|---|
TRSYN(term[, lang, [thes]]) |
Expands term to include foreign equivalents of term, the synonyms of term, and the foreign equivalents of the synonyms. |
Specify the operand for this operator. term is expanded to include all the foreign language entries and synonyms defined for term in thes. You cannot specify expansion operators in the term argument.
Optionally, specify which foreign language equivalents to return in the expansion. The language you specify must match the language as defined in thes. If you omit this parameter, the system expands to use all defined foreign language terms.
Optionally, specify the name of the thesaurus used to return the expansions for the specified term. The thes argument has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before you can use any of the thesaurus operators.
Consider a thesaurus MY_THES with the following entries for cat:
cat
  SPANISH: gato
  FRENCH: chat
  SYN lion
    SPANISH: leon
To search for all documents that contain cat, the Spanish equivalent of cat, the synonym of cat, or the Spanish equivalent of lion, issue the following query:
'trsyn(cat, spanish, my_thes)'
This query expands to:
'{cat}|{gato}|{lion}|{leon}'
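The TRSYN expansion above can be modeled with a small lookup. This is a sketch under stated assumptions: the nested-dictionary layout of MY_THES and the function expand_trsyn are hypothetical and only mirror the thesaurus entries shown in the example.

```python
# Hypothetical in-memory copy of the MY_THES entries from the example.
MY_THES = {
    "cat": {
        "translations": {"SPANISH": ["gato"], "FRENCH": ["chat"]},
        "synonyms": ["lion"],
    },
    "lion": {
        "translations": {"SPANISH": ["leon"]},
        "synonyms": [],
    },
}


def expand_trsyn(term, lang, thes):
    """Return term, its synonyms, and their translations into lang."""
    lang = lang.upper()
    expansion = []
    # Visit the term first, then each of its synonyms.
    for word in [term] + thes.get(term, {}).get("synonyms", []):
        expansion.append(word)
        translations = thes.get(word, {}).get("translations", {})
        expansion.extend(translations.get(lang, []))
    return "|".join("{" + w + "}" for w in expansion)


print(expand_trsyn("cat", "spanish", MY_THES))
```

The French entry chat is left out because the requested language is Spanish.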
You can browse a thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the translation and synonym terms in your thesaurus, see CTX_THES.TRSYN in Chapter 12, "CTX_THES Package". |
Use the top term operator (TT) to replace a term in a query with the top term that has been defined for the term in the standard hierarchy (BT, NT) in a thesaurus. Top terms in the generic (BTG, NTG), partitive (BTP, NTP), and instance (BTI, NTI) hierarchies are not returned.
Syntax | Description |
---|---|
TT(term[,thes]) |
Replaces the specified word in a query with the top term in the standard hierarchy (BT, NT). |
Specify the operand for the top term operator. term is replaced by the top term defined for the term in the specified thesaurus. However, if no TT entries are defined for term, term is not replaced in the query expression and term is the result of the expansion.
You cannot specify expansion operators in the term argument.
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
The term dog has a top term of animal in the standard hierarchy of a thesaurus. A TT query for dog returns all documents that contain the term animal. Documents that contain only the word dog are not returned.
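The replacement rule can be pictured as a walk up the broader-term links. The sketch below is only a model: it assumes the top term is the root reached by following BT links, and the BROADER table (including the intermediate term mammal) is hypothetical.

```python
# Hypothetical broader-term (BT) links in the standard hierarchy.
BROADER = {"dog": "mammal", "mammal": "animal"}


def top_term(term, broader):
    """Follow BT links upward and return the last term reached.

    A term with no broader term is returned unchanged, matching the
    rule that a term without TT entries is not replaced.
    """
    seen = {term}
    current = term
    while current in broader and broader[current] not in seen:
        current = broader[current]
        seen.add(current)  # guards against accidental cycles
    return current


print(top_term("dog", BROADER))
print(top_term("unicorn", BROADER))
```

In this model dog resolves to animal, while unicorn, which has no entry, is left as it is.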
You can browse your thesaurus using procedures in the CTX_THES package.
See Also:
For more information on browsing the top terms in your thesaurus, see CTX_THES.TT in Chapter 12, "CTX_THES Package". |
The weight operator multiplies the score by the given factor and caps the result at 100. For example, the query cat, dog*2 adds the score of cat to twice the score of dog; if the sum is greater than 100, the score is 100.
In expressions that contain more than one query term, use the weight operator to adjust the relative scoring of the query terms. You can reduce the score of a query term by using the weight operator with a number less than 1; you can increase the score of a query term by using the weight operator with a number greater than 1, up to a maximum of 10.
The weight operator is useful in accumulate, OR, or AND queries when the expression has more than one query term. With no weighting on individual terms, the score cannot tell you which of the query terms occurs the most. With term weighting, you can alter the scores of individual terms and hence make the overall document ranking reflect the terms you are interested in.
Syntax | Description |
---|---|
term*n |
Returns documents that contain term. Calculates score by multiplying the raw score of term by n, where n is a number from 0.1 to 10. |
You have a collection of sports articles. You are interested in the articles about soccer, in particular Brazilian soccer. It turns out that a regular query on soccer or Brazil returns many high ranking articles on US soccer. To raise the ranking of the articles on Brazilian soccer, you can issue the following query:
'soccer or Brazil*3'
Table 3-1 illustrates how the weight operator can change the ranking of three hypothetical documents A, B, and C, which all contain information about soccer. The columns in the table show the total score of four different query expressions on the three documents.
 | soccer | Brazil | soccer or Brazil | soccer or Brazil*3 |
---|---|---|---|---|
A | 20 | 10 | 20 | 30 |
B | 10 | 30 | 30 | 90 |
C | 50 | 20 | 50 | 60 |
The score in the third column containing the query soccer or Brazil is the score of the highest scoring term. The score in the fourth column containing the query soccer or Brazil*3 is the larger of the score of the first column soccer and of the score Brazil multiplied by three, Brazil*3.
With the initial query of soccer or Brazil, the documents are ranked in the order C B A. With the query of soccer or Brazil*3, the documents are ranked B C A, which is the preferred ranking.
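The arithmetic behind Table 3-1 can be checked with a short model. This sketch only encodes the rules stated in this section (OR takes the highest operand score, a weight multiplies a term score, results are capped at 100); the helper names are made up for the example.

```python
def weighted(score, factor):
    """Multiply a term score by a weight factor, capped at 100."""
    return min(100, score * factor)


def or_score(*scores):
    """OR returns the score of the highest scoring operand."""
    return max(scores)


def ranking(scores):
    """Document names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Per-term scores of the three hypothetical documents in Table 3-1.
docs = {
    "A": {"soccer": 20, "Brazil": 10},
    "B": {"soccer": 10, "Brazil": 30},
    "C": {"soccer": 50, "Brazil": 20},
}

plain = {d: or_score(s["soccer"], s["Brazil"]) for d, s in docs.items()}
boosted = {d: or_score(s["soccer"], weighted(s["Brazil"], 3))
           for d, s in docs.items()}

print(plain, ranking(plain))
print(boosted, ranking(boosted))
```

The two dictionaries reproduce the last two columns of the table, and the rankings come out as C B A without the weight and B C A with it.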
Wildcard characters can be used in query expressions to expand word searches into pattern searches. The wildcard characters are:
Note: When a wildcard expression translates to a stopword, the stopword is not included in the query and not highlighted by |
Right truncation involves placing the wildcard on the right-hand-side of the search string.
For example, the following query expression finds all terms beginning with the pattern scal:
'scal%'
Left truncation involves placing the wildcard on the left-hand-side of the search string.
To find words such as king, wing or sing, you can write your query as follows:
'_ing'
You can write this query more generally as:
'%ing'
You can also combine left-truncated and right-truncated searches to create double-truncated searches. The following query finds all documents that contain words that contain the substring benz:
'%benz%'
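The truncation patterns above can be tried against a word list with a small translator. This is a sketch that assumes SQL LIKE-style semantics, where % stands for zero or more characters and _ for exactly one; the word list and function names are invented for the example and say nothing about how Oracle Text evaluates wildcards internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate % and _ wildcards into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # assumed: zero or more characters
        elif ch == "_":
            parts.append(".")    # assumed: exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand(pattern, words):
    """Return the words that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]


words = ["king", "wing", "sing", "string", "scale", "scalar",
         "benzene", "nitrobenzene"]

print(expand("scal%", words))   # right truncation
print(expand("_ing", words))    # left truncation, one character
print(expand("%ing", words))    # left truncation, any length
print(expand("%benz%", words))  # double truncation
```

Note that _ing leaves out string, because the underscore stands for a single character, whereas %ing includes it.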
You can improve wildcard query performance by adding a substring or prefix index.
When your wildcard queries are left- and double-truncated, you can improve query performance by creating a substring index. Substring indexes improve query performance for all types of left-truncated wildcard searches such as %ed, _ing, or %benz%.
When your wildcard queries are right-truncated, you can improve performance by creating a prefix index. A prefix index improves query performance for wildcard searches such as to%.
See Also:
For more information about creating substring and prefix indexes, see "BASIC_WORDLIST" in Chapter 2. |
You can use the WITHIN operator to narrow a query down into document sections. Document sections can be one of the following:
To find all the documents that contain the term San Francisco within the section Headings, write your query as follows:
'San Francisco WITHIN Headings'
To find all the documents that contain the term sailing and contain the term San Francisco within the section Headings, write your query in one of two ways:
'(San Francisco WITHIN Headings) and sailing'
'sailing and San Francisco WITHIN Headings'
To find all documents that contain the terms dog and cat within the same section Headings, write your query as follows:
'(dog and cat) WITHIN Headings'
This query is logically different from:
'dog WITHIN Headings and cat WITHIN Headings'
This query finds all documents that contain dog and cat where the terms dog and cat are in Headings sections, regardless of whether they occur in the same Headings section or different sections.
To find all documents in which dog is near cat within the section Headings, write your query as follows:
'dog near cat WITHIN Headings'
Note: The near operator has higher precedence than the |
You can nest the within operator to search zone sections within zone sections.
For example, assume that a document set has the zone section AUTHOR nested within the zone section BOOK. You write a nested WITHIN query to find all occurrences of scott within the AUTHOR section of the BOOK section as follows:
'(scott WITHIN AUTHOR) WITHIN BOOK'
The syntax for querying within a field section is the same as for querying within a zone section. The syntax for most of the examples given in the previous section, "Querying Within Zone Sections", applies to field sections.
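However, field sections behave differently from zone sections in the following ways:
Text within a field section can be hidden from queries that do not use the WITHIN operator.
WITHIN queries cannot distinguish repeated field sections.
You cannot issue a nested WITHIN query with a field section.
The following sections describe these differences.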
When a field section is created with the visible flag set to FALSE in CTX_DDL.ADD_FIELD_SECTION, the text within a field section can only be queried using the WITHIN operator.
For example, assume that TITLE is a field section defined with visible flag set to FALSE. Then the query dog without the WITHIN operator will not find a document containing:
<TITLE>The dog</TITLE> I like my pet.
To find such a document, you can use the WITHIN operator as follows:
'dog WITHIN TITLE'
Alternatively, you can set the visible flag to TRUE when you define TITLE as a field section with CTX_DDL.ADD_FIELD_SECTION.
See Also:
For more information about creating field sections, see ADD_FIELD_SECTION in Chapter 7, "CTX_DDL Package". |
WITHIN queries cannot distinguish repeated field sections in a document. For example, consider the document with the repeated section <author>:
<author> Charles Dickens </author>
<author> Martin Luther King </author>
Assuming that <author> is defined as a field section, a query such as (charles and martin) within author returns the document, even though these words occur in separate tags.
To have WITHIN queries distinguish repeated sections, define the sections as zone sections.
You cannot issue a nested WITHIN query with field sections. Doing so raises an error.
Querying within sentence or paragraph boundaries is useful to find combinations of words that occur in the same sentence or paragraph. To query within sentences or paragraphs, you must first add the special section to your section group before you index. You do so with CTX_DDL.ADD_SPECIAL_SECTION.
To find documents that contain dog and cat within the same sentence:
'(dog and cat) WITHIN SENTENCE'
To find documents that contain dog and cat within the same paragraph:
'(dog and cat) WITHIN PARAGRAPH'
To find documents that contain sentences with the word dog but not cat:
'(dog not cat) WITHIN SENTENCE'
You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTOMATIC_SECTION_GROUP as your section group type.
Assume you have an XML document as follows:
<book title="Tale of Two Cities">It was the best of times.</book>
You can define the section title@book to be the attribute section title. You can do so with the CTX_DDL.ADD_ATTR_SECTION procedure or dynamically after indexing with ALTER INDEX.
To search on Tale within the attribute section title, you issue the following query:
'Tale WITHIN title'
The following constraints apply to querying within attribute sections:
<book title="Tale of Two Cities">It was the best of times.</book>
A query on Tale by itself does not produce a hit on the document unless qualified with WITHIN title@book. (This behavior is like that of field sections when the visible flag is set to false.)
You cannot use attribute sections in a nested WITHIN query.
Phrases ignore attribute text. For example, consider the following document:
Now is the time for all good <word type="noun"> men </word> to come to the aid.
This document would hit on the regular query good men, ignoring the intervening attribute text.
WITHIN queries can distinguish repeated attribute sections. This behavior is like zone sections but unlike field sections. For example, you have a document as follows:
<book title="Tale of Two Cities">It was the best of times.</book>
<book title="Of Human Bondage">The sky broke dull and gray.</book>
Assume that book is a zone section and book@author is an attribute section. Consider the query:
'(Tale and Bondage) WITHIN book@author'
This query does not hit the document, because tale and bondage are in different occurrences of the attribute section book@author.
The WITHIN operator requires you to know the name of the section you search. A list of defined sections can be obtained using the CTX_SECTIONS or CTX_USER_SECTIONS views.
For special and zone sections, the terms of the query must be fully enclosed in a particular occurrence of the section for the document to satisfy the query. This is not a requirement for field sections.
For example, consider the query where bold is a zone section:
'(dog and cat) WITHIN bold'
This query finds:
<B>dog cat</B>
but it does not find:
<B>dog</B><B>cat</B>
This is because dog and cat must be in the same bold section.
This behavior is especially useful for special sections, where
'(dog and cat) WITHIN sentence'
means find dog and cat within the same sentence.
Field sections on the other hand are meant for non-repeating, embedded meta-data such as a title section. Queries within field sections cannot distinguish between occurrences. All occurrences of a field section are considered to be parts of a single section. For example, the query:
'(dog and cat) WITHIN title'
can find a document like this:
<TITLE>dog</TITLE><TITLE>cat</TITLE>
In return for this field section limitation and for the overlap and nesting limitations, field section queries are generally faster than zone section queries, especially if the section occurs in every document, or if the search term is common.
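The difference between zone and field sections can be modeled in a few lines. This is a simplified sketch, assuming sections are delimited by plain tags without attributes and that terms are whitespace-separated; the helper functions are hypothetical and only mimic the matching rules described above.

```python
import re


def occurrences(doc, tag):
    """Return one set of lowercased words per occurrence of <tag>...</tag>."""
    pattern = re.compile(r"<%s>(.*?)</%s>" % (re.escape(tag), re.escape(tag)),
                         re.IGNORECASE | re.DOTALL)
    return [set(body.lower().split()) for body in pattern.findall(doc)]


def within_zone(doc, tag, terms):
    """Zone rule: all terms must fall inside the same occurrence."""
    return any(all(t in occ for t in terms) for occ in occurrences(doc, tag))


def within_field(doc, tag, terms):
    """Field rule: all occurrences are treated as one merged section."""
    merged = set().union(*occurrences(doc, tag))
    return all(t in merged for t in terms)


terms = ["dog", "cat"]
print(within_zone("<B>dog cat</B>", "B", terms))
print(within_zone("<B>dog</B><B>cat</B>", "B", terms))
print(within_field("<TITLE>dog</TITLE><TITLE>cat</TITLE>", "TITLE", terms))
```

The zone check fails when the two terms sit in separate occurrences, while the field check succeeds because the occurrences are merged before matching.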
The WITHIN operator has the following limitations:
You cannot embed the WITHIN clause in a phrase. For example, you cannot write: term1 WITHIN section term2
You cannot combine WITHIN with expansion operators, such as $ ! and *.
Because WITHIN is a reserved word, you must escape the word with braces to search on it.
Copyright © 1998, 2002 Oracle Corporation. All Rights Reserved.