7
Searching XML Data with Oracle Text

This chapter explains the use of Oracle Text functionality in indexing and querying XML data. It contains the following sections:

Searching XML Data with Oracle Text
Introducing Oracle Text
Assumptions Made in This Chapter's Examples
Oracle Text Users and Roles
Querying with the CONTAINS Operator
Using the WITHIN Operator to Narrow Query Down to Document Sections
Introducing SECTION_GROUPS
INPATH or HASPATH Operators Search Using XPath-Like Expressions
Building a Query Application with Oracle Text
Step 1. Create a Section Group Preference
Step 2. Set the Preference's Attributes
Step 3. Create an Index Using the Section Preference Created in Step 2
Step 4. Create Your Query Syntax
Presenting the Results of Your Query
XMLType Indexing
Using Oracle Text with Oracle XML DB
Full-Text Search Functions in XPath Using ora:contains
Oracle XML DB: Creating a Policy for ora:contains()
Oracle XML DB: Using CTXXPATH Indexes for existsNode()
Using Oracle Text: Advanced Techniques
Case Study: Searching XML-Based Conference Proceedings

Frequently Asked Questions About Oracle Text

Note:

In Oracle9i, you can use the WITHIN or INPATH operators. INPATH was introduced in Oracle9i Release 1 (9.0.1) to handle XPath searching in XML documents. Everything you can do with the WITHIN operator, you can also do using INPATH. INPATH is the recommended syntax in Oracle9i Release 1 (9.0.1) and higher when searching XML data.

Searching XML Data with Oracle Text

This chapter describes the following aspects of Oracle Text:

How to create a section group and index your XML document(s)
How to build an XML query application with Oracle Text, to search and retrieve data from your XML document(s)
Using Oracle Text to search XMLType data

Introducing Oracle Text

Note:

Oracle Text is a strictly server-based implementation.

See Also:

http://otn.oracle.com/products/text

Oracle Text (aka interMedia Text) can be used to search XML documents. It extends Oracle9i by indexing any text or document stored in Oracle. It can also search documents in the file system and URLs.

Oracle Text enables the following:

Content-based queries, such as, finding text and documents which contain particular words, using familiar, standard SQL.
File-based text applications to use Oracle9i to manage text and documents in an integrated fashion with traditional relational information.
Concept searching of English language documents.
Theme analysis of English language documents using the theme/gist package.
Highlighting hit words. With Oracle Text, you can render a document in different ways. For example, you can present documents with query terms highlighted, either the "words" of a word query or the "themes" of an ABOUT query in English. Use the CTX_DOC.MARKUP or HIGHLIGHT procedures for this.
Highlighting hit words. With Oracle Text, you can render a document in different ways. For example, you can present documents with query terms highlighted, either the "words" of a word query or the "themes" of an ABOUT query in English. Also, you can present XML documents with the INPATH/HASPATH query element(s) highlighted. Use CTX_DOC.MARKUP or HIGHLIGHT procedures for this.
With Oracle Text you can use PL/SQL packages for document presentation and thesaurus maintenance.

You can query XML data stored in the database directly, without using Oracle Text. However, Oracle Text is useful for boosting query performance.

See Also :

Accessing Oracle Text

Oracle Text is a standard feature that comes with every Oracle9i Standard, Enterprise, and Personal edition license. It needs to be selected during installation. No special installation instructions are required.

Oracle Text is essentially a set of schema objects owned by CTXSYS. These objects are linked to the Oracle kernel. The schema objects are present when you perform an Oracle9i installation.

Oracle Text Now Supports XMLType

You can now perform Oracle Text searches on tables containing XMLType columns.

Further Oracle Text Examples

You can find more examples for Oracle Text and for creating section group indexes at the following site: http://otn.oracle.com/products/text

Assumptions Made in This Chapter's Examples

XML text is aVARCHAR2 or CLOB type in an Oracle9i database table with character semantics. Oracle Text can also deal with documents in a file system or in URLs, but we are not considering these document types in this chapter.

To simplify the examples included in this chapter they use a subset of the Oracle Text options and make the following assumptions:

All XML data here is represented using US-ASCII, a 7 bit character set.
Issues about whether a character such as "*" is treated as white space or as part of a word are not included.
Storage characteristics of the Oracle schema object that implement the Oracle Text index are not considered.
They focus on the SECTION GROUP parameter in the CREATE INDEX or ALTER INDEX statement. Other parameter types available for CREATE INDEX and ALTER INDEX, are DATASTORE, FILTER, LEXER, STOPLIST, and WORDLIST.

Here is an example of using SECTION GROUP in CREATE INDEX:
```
CREATE INDEX my_index 
  ON my_table ( my_column ) 
  INDEXTYPE IS ctxsys.context 
  PARAMETERS ( 'SECTION GROUP my_section_group' ) ; 
```
Specifically, the examples focus on using AUTO_SECTION_GROUP and XML_SECTION_GROUP, and PATH_SECTION_GROUP.
Tagged or marked up data. In this chapter, the examples focus on how to handle XML data. Oracle Text handles many other kinds of data besides XML data.
See Also:

Oracle Text Application Developer's Guide

Oracle Text Reference, for more information on these parameter types.

Oracle Text Users and Roles

With Oracle Text you can use the following users/roles:

user CTXSYS to administer users
role CTXAPP to create and delete Oracle Text preferences and use Oracle Text PL/SQL packages

User CTXSYS

UserCTXSYS is created at install time. Administer Oracle Text users as this user. User CTXSYS has the following privileges:

Modify system-defined preferences
Drop and modify other user preferences
Call procedures in the CTX_ADM PL/SQL package to start servers and set system-parameters
Start a ctxsrv server
Query all system-defined views
Perform all the tasks of a user with the CTXAPP role

Role CTXAPP

Any user can create an Oracle Text index and issue a Text query. For additional tasks, use the CTXAPP role. This is a system-defined role that enables you to perform the following tasks:

Create and delete Oracle Text preferences
Use Oracle Text PL/SQL packages, such as the CTX_DDL package

Querying with the CONTAINS Operator

Oracle Text's main purpose is to provide an implementation for the CONTAINS operator. The CONTAINS operator can be used in the WHERE clause of a SELECT statement to specify the query expression for a Text query.

CONTAINS Syntax

Here is the CONTAINS syntax:

...WHERE CONTAINS([schema.]column,text_query VARCHAR2,[label NUMBER])

where:

Table 7-1 CONTAINS Operator: Syntax Description

Syntax	Description
[schema.] column	Specifies the text column to be searched on. This column must have a Text index associated with it.
text_query	Specifies the query expression that defines your search in column.
label	Optionally specifies the label that identifies the score generated by the `CONTAINS` operator.

For each row selected, CONTAINS returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row. You can obtain this score with the SCORE operator.

Note:

You must use the SCORE operator with a label to obtain this number.

Example 7-1 Using a Simple SELECT Statement with CONTAINS

The following example illustrates how the CONTAINS operator is used in a SELECT statement:

SELECT id FROM my_table
   WHERE
   CONTAINS (my_column, 'receipts') > 0

The'receipts' parameter of the CONTAINS operator is called the "Text Query Expression".

Note:

The SQL statement with the CONTAINS operator requires an Oracle Text index in order to run.

Example 7-2 Using the Score Operator with a Label to Obtain the Relevance

The following example searches for all documents in the text column that contain the word Oracle. The score for each row is selected with the SCORE operator using a label of 1:

SELECT SCORE(1), title from newsindex 
           WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;

The CONTAINS operator must always be followed by the > 0 syntax. This specifies that the score value calculated by the CONTAINS operator must be greater than zero for the row selected.

When the SCORE operator is called, such as in a SELECT clause, the operator must reference the label value as shown in the example.

Using the WITHIN Operator to Narrow Query Down to Document Sections

When documents have internal structure such as in HTML and XML, you can define document sections using embedded tags before you index. This enables you to query within the sections using the WITHIN operator.

Note:

This is only true for XML_SECTION_GROUP, but not true for AUTO_ or PATH_SECTION_GROUP.

Introducing SECTION_GROUPS

You can query within attribute sections when you index with either XML_SECTION_GROUP, AUTO_SECTION_GROUP, or PATH_SECTION_GROUP your section group type. Consider the following XML document:

<book title="Tale of Two Cities">It was the best of times.</book>

XML_SECTION_GROUP

If you use XML_SECTION_GROUP, you can specify any of the following sections:

Zone sections
Field sections
Attribute section
Special sections

This chapter only focuses on Zone, Field, and Attribute sections. For more information on Special sections see Oracle Text Reference and Oracle Text Application Developer's Guide.

Zone Sections: CTX_DLL.ADD_ZONE_SECTION Procedure

The syntax for this is:

CTX_DDL.ADD_ZONE_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);

To define a chapter as a Zone section, create an XML_SECTION_GROUP and define the Zone section as follows:

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
EXEC ctx_ddl.add_zone_section('myxmlgroup', 'chapter', 'chapter');

When you define Zone section as such and index the document set, you can query the XML chapter Zone section as follows:

'Cities within chapter'

Field Sections: CTX_DLL.ADD_FIELD_SECTION Procedure

The syntax for this is:

CTX_DDL.ADD_FIELD_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);

To define a abstract as a Field section, create an XML_SECTION_GROUP and define the Field section as follows:

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
EXEC ctx_ddl.add_field_section('myxmlgroup', 'abstract', 'abstract');

When you define Field section as such and index the document set, you can query the XML abstract Field section as follows:

'Cities within abstract'

Attribute Section: CTX_DLL.ADD_ATTR_SECTION Procedure

The syntax for this is:

CTX_DDL.ADD_ATTR_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);

To define the booktitle attribute as an Attribute section, create an XML_SECTION_GROUP and define the Attribute section as follows:

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');

When you define the Attribute section as such and index the document set, you can query the XML booktitle attribute text as follows:

'Cities within booktitle'

Constraints for Querying Attribute or Field Sections

The following constraints apply to querying within Attribute or Field sections:

Regular queries on attribute text will not work unless qualified in a WITHIN clause. Using the following XML document:
```
<book title="Tale of Two Cities">It was the best of times.</book>
```
querying on Tale will not work unless qualified with 'WITHIN title@book'.
You cannot use Attribute or Field sections in a nested WITHIN query.
Phrases ignore attribute text. For example, if the original document looked like:
```
....Now is the time for all good <word type="noun"> men </word> to come to 
the aid......
```
The search would result in a regular query's, "good men", and ignore the intervening attribute text.

AUTO_ SECTION_GROUP/ PATH_SECTION_GROUP for INPATH and HASPATH

When you use the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index XML documents, Oracle9i automatically creates sections.

To search on Tale within the Attribute section booktitle, include the following WITHIN clause in your SELECT statement:

If you are using XML_SECTION_GROUP:

... WHERE CONTAINS ('Tale INPATH booktitle')>0;

If you are using PATH_SECTION_GROUP
```
... WHERE CONTAINS ('Tale INPATH title@book')>0;
```
See Also:
"Distinguishing Tags Across DocTypes" .

Dynamically Adding Sections or Stop Section Using ALTER INDEX

The syntax for ALTER INDEX is:

ALTER INDEX [schema.]index REBUILD [ONLINE] [PARAMETERS (paramstring)];

where

paramstring = 'replace [datastore datastore_pref] 
                       [filter filter_pref] 
                       [lexer lexer_pref] 
                       [wordlist wordlist_pref] 
                       [storage storage_pref] 
                       [stoplist stoplist] 
                       [section group section_group]
                          [memory memsize]
|    ... 
|    add zone section section_name tag tag
|    add field section section_name tag tag [(VISIBLE | INVISIBLE)]
|    add attr section section_name tag tag@attr
|    add stop section tag'

The added section applies only to documents indexed after this operation. Thus for the change to take effect, you must manually re-index any existing documents that contain the tag. The index is not rebuilt by this statement.

WITHIN Syntax for Section Querying

Here is the WITHIN syntax for querying sections:

...WHERE CONTAINS(text,'XML WITHIN title') >0;...

This searches for expression text within a section. If you are using XML_SECTION_GROUP the following restrictions apply to the pre-defined zone, field, or attribute section:

If section is a zone, expression can contain one or more WITHIN operators (nested WITHIN) whose section is a zone or special section.
If section is a field or attribute section, expression cannot contain another WITHIN operator.

You can combine and nest WITHIN clauses. For finer grained searches of XML sections, you can use WITHIN clauses inside CONTAINS select statements.

WITHIN Operator Limitations

The WITHIN operator has the following limitations:

You cannot embed the WITHIN clause in a phrase. For example, you cannot write: term1 WITHIN section term2
You cannot combine WITHIN with expansion operators, such as $ ! and *.
Since WITHIN is a reserved word, you must escape the word with braces to search on it.

See Also:
Oracle Text Reference

INPATH or HASPATH Operators Search Using XPath-Like Expressions

Path Indexing and Path Querying with Oracle Text

In Oracle9i Oracle Text introduced a new section type and new query operators which support an XPath-like query language. Indexes of type context with XML path searching are able to perform very complex section searches on XML documents. Here are the basic concepts of path indexing and path querying.

Path Indexing

Section searching is enabled by defining section groups. To use XML path searching, the Oracle Text index must be created with the new section group, PATH_SECTION_GROUP as follows:

begin
   ctx_ddl.create_section_group('mypathgroup','PATH_SECTION_GROUP'); 
end;

To create the Oracle Text index use this command:

create index order_idx on library_catalog(text)
 indextype is ctxsys.context 
 parameters ('SECTION GROUP mypathgroup');

Path Querying

The Oracle Text path query language is based on W3C XPath. For Oracle9i Release 1 (9.0.1) and higher, you can use the INPATH and HASPATH operators to express path queries.

Using INPATH Operator for Path Searching in XML Documents

You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching.

Table 7-2 Path Searching XML Documents Using the INPATH Operator

Path Search Feature	Syntax	Description
Simple Tag Searching	virginia INPATH (//STATE)	Finds all documents where the word "virginia" appears between <STATE> and </STATE>. The STATE element can appear at any level of the document structure.
Case-sensitivity	virginia INPATH (STATE) virginia INPATH (State)	Tags and attribute names in path searching are case-sensitive. virginia `INPATH` STATE -- finds <STATE>virginia</STATE> but NOT <State>virginia</State>. To find the latter you must do virginia `INPATH` State.
Top-Level Tag Searching	virginia INPATH (Legal) virginia INPATH (/Legal) For example, the following query finds Quijote where it occurs between <order> and </order>: select id from library_catalog where contains(text,'Quijote INPATH(order)') > 0; Here <order> must be the top level tag.	Finds all documents where "virginia" appears in a Legal element which is the top-level tag.'Legal' MUST be the top-level tag of the document.'virginia' may appear anywhere in this tag regardless of other intervening tags. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState>VIRGINIA</AddressState> </Address> ... </Legal>
Any Level Tag Searching	virginia INPATH (//Address) For example, a double slash indicates "any number of levels" down. The following query finds Quijote inside a <title> tag that occurs at the top level or any lower level: select id from library_catalog where contains(text,'Quijote INPATH(//title)') > 0;	'Virginia' can appear anywhere within an 'Address' tag, which may appear within any other tags. for example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState> VIRGINIA </AddressState>... </Legal>
Direct Parentage Path Searching	virginia INPATH (//CourtInformation/Location) for example: select id from library_catalog where contains(text,'virginia INPATH(order/item)') > 0;	Finds all documents where "virginia" appears in a `Location` element which is a direct child of a `CourtInformation` element. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState> VIRGINIA </AddressState> </Address>... </CourtInformation>
Single-Level Wildcard Searching	virginia INPATH(A//B) 'virginia INPATH (//CaseCaption//Location)'	Finds all documents where "virginia" appears in a B element which is a grandchild of an A element. For instance, <A><D><B>virginia</B></D></A>. The intermediate element does not need to be an indexed XML tag. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState>VIRGINIA</AddressState>... </Legal>
Multi-level Wildcard Searching	'virginia INPATH (Legal//Filing//*/CourtInformation)'	'Legal' must be a top-level tag, and there must be exactly one tag-level between 'Legal' and 'Filing', and two between 'Filing' and 'CourtInformation'. 'Virginia' may then appear anywhere within 'CourtInformation'. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Legal> <CourtFiling> <Filing ID="f001" FilingType="Civil"> <LeadDocument> <CaseCaption> <CourtInformation> <Location> <Address> <AddressState>VIRGINIA</AddressState> </Address> </Location> <CourtName> IN THE CIRCUIT COURT OF LOUDOUN COUNTY </CourtName> </CourtInformation>....
Descendant Searching	virginia INPATH(A//B)	Finds all documents where "virginia" appears in a B element which is some descendant (any level) of an A element.
Attribute Searching	virginia INPATH(A/@B)	Finds all documents where "virginia" appears in the B attribute of an A element. You can search within an attribute value using the syntax <tag>/@<attribute>: select id from library_catalog where contains(text,'dvd `INPATH`(//item/@type)') > 0; AND and OR You can use boolean AND and OR to combine existence or equality predicates in a test. select id from library_catalog where contains(text,'Levy or Cervantes `INPATH`(//title)') >0;
Descendant/Attribute Existence Testing	virginia INPATH (A[B]) You can search for documents using the any-level tag searching: select id from library_catalog where contains (text,'Quijote INPATH(/order/title)') > 0; You can also use the "" as a single level wildcard. The matches exactly one level.: select id from library_catalog where contains (text,'Cervantes INPATH(/order/*/author)') > 0;	Finds all documents where "virginia" appears in an A element which has a B element as a direct child. virginia INPATH A[.//B] -- Finds all documents where "virginia" appears in an A element which has a B element as a descendant (any level). virginia INPATH A[@B] -- Finds all documents where "virginia" appears in an A element which has a B attribute
Attribute Value Testing Within Equality	virginia INPATH A[@B = "foo"] That means that: virginia INPATH (A[@B = "pot of gold"]), would, with the default lexer and stoplist, match any of the following: <A B="POT OF GOLD">virginia</A> By default, lexing is case-independent, so "pot" matches "POT", <A B="POT BLACK GOLD">virginia</A> By default, "of" is a stopword, and, in a query, would match any word in that position, <A B=" Pot OF Gold ">virginia</A>	Finds all documents where "virginia" appears in an A element which has a B attribute whose value is "foo". Only equality is supported as a test. Range operators and functions are not supported. The left-hand-side of the equality MUST be an attribute or tag. Literals here are not allowed. The right-hand-side must be a literal. Tags and attributes here are not allowed. Within equality (See "Using INPATH Operator for Path Searching in XML Documents" ) is used to evaluate the test. Whitespace is mainly ignored in text indexing. Again, lexing is case-independent: <A B="pot_of_gold">virginia</A> Underscore is a non-alphabetic character, and is not a join character by default. As a result, it is treated more or less as whitespace and breaks up that string into three words. Example: select id from library_catalog where contains(text,'(Bob the Builder) `INPATH`(//item[@type="dvd"])') > 0; The following will not return rows: select id from library_catalog where contains(text,'(Bob the Builder) `INPATH`(//item[@type="book"])') > 0;
Numeric Equality	virginia INPATH (A[@B = 5])	Numeric literals are allowed. But they are treated as text. The within equality is used to evaluate. This means that the query does NOT match. That is, <A B="5.0">virginia</A> does not match A[@B=5] where "5.0", a decimal is not considered the same as 5, an integer.
Conjunctive Testing	virginia INPATH (A[B AND C]) virginia INPATH (A[B AND @C = "foo"])...	Predicates can be conjunctively combined.
Combining Path and Node Tests	virginia INPATH (A[@B = "foo"]/C/D) virginia INPATH(A//B[@C]/D[E])...	Node tests can be applied to any node in the path.

Using HASPATH Operator for Path Searching in XML Documents

Use the HASPATH operator to find all XML documents that contain a specified section path. HASPATH is used when you want to test for path existence. It is also very useful for section equality testing. To find all XML documents where an order has an item within it:

select id from library_catalog 
    where contains(text,'HASPATH(order/item)') > 0;

will return all documents where the top-level tag is a order element which has a item element as a direct child.

In Oracle9i, Oracle Text introduces a new section type and new query operators which support an XPath-like query language. Indexes of type context with XML path searching are able to perform very complex section searches on XML documents. Here are more examples of path querying using INPATH and HASPATH. Assuming the following XML document:

<?xml version="1.0"?> 
  <order> 
    <item type="book">
       <title>Crypto</title>
       <author>Levi</author>
    </item> 
    <item type="dvd">
       <title> Bob the Builder</title>
       <author>Auerbach</author>
    </item> 
    <item type="book">
       <title>Don Quijote</title>
       <author>Cervantes</author>
    </item> 
  </order>

In general, use INPATH and HASPATH operators only when your index has been created with PATH_SECTION_GROUP. Use of PATH_SECTION_GROUP enables path searching. Path searching extends the syntax of the WITHIN operator so that the section name operand (right-hand-side) is a path instead of a section name.

Using HASPATH Operator for Path Existence Searching

Note:

The HASPATH operator functions in a similar fashion to the existsNode() in XMLType.

Only use the HASPATH operator when your index has been created with the PATH_SECTION_GROUP. The syntax for the HASPATH operator is:

WHERE CONTAINS(column, 'HASPATH(path)'...): Here HASPATH searches an XML document set and returns a score of 100 for all documents where path exists. Parent and child paths are separated with the / character, for example, A/B/C. For example, the query:
```
...WHERE CONTAINS (col,'HASPATH(A/B/C)')>0;
```
finds and returns a score of 100 for the document:
```
<A><B><C>Virginia</C></B></A>
```
without having to reference Virginia at all.
WHERE CONTAINS(column, 'HASPATH(A="value")'...): Here the HASPATH clause searches an XML document set and returns a score of 100 for all documents that have element A with content value and only that value. HASPATH is used to test equality. This is the "Section Equality Testing" feature of the HASPATH operator. The query:
```
...WHERE CONTAINS virginia INPATH A
```
finds <A>virginia</A>, but it also finds <A>virginia state</A>. To limit the query to the term virginia and nothing else, you can use a section equality test with the HASPATH operator. For example:
```
... WHERE CONTAINS (col,'HASPATH(A="virginia")'
```
finds and returns a score of 100 only for the first document, and not the second.

Tag Value Equality Testing

You can do tag value equality test with HASPATH:

select id from library_catalog

where CONTAINS(text,'HASPATH (//author="Auerbach")') >0;

Building a Query Application with Oracle Text

To build a Oracle Text query application carry out the following steps:

Create a section preference group. Before you create a section group and Oracle text index you must first determine the role you will need and grant the appropriate privilege. See "Oracle Text Users and Roles" , and grant the appropriate privilege.

After creating and preparing your data, you are ready to perform the next step. See "Step 1. Create a Section Group Preference".
Add sections or stop_sections
Create an Oracle Text index based on the section group you created. Using the section preference created, you then create an Oracle Text index. See Building a Query Application with Oracle Text.
Build your query application using the CONTAINS operator. Now you can finish building your query application. See "Building a Query Application with Oracle Text".

What Role Do You Need?

First determine the role you need. See Oracle Text Reference and "Oracle Text Users and Roles" , and grant the appropriate privilege as follows:

CONNECT system/manager
GRANT ctxapp to scott;
CONNECT scott/tiger

Step 1. Create a Section Group Preference

The first thing you must do is create a preference. This section describes how to create section preferences using PATH_SECTION_GROUP, XML_SECTION_GROUP, and AUTO_SECTION_GROUP. Table 7-3 describes the groups and summarizes their features.

Table 7-3 Comparing Oracle Text Section Groups

Section Group Description

Section Group	Description
XML_SECTION_GROUP	Use this group type for indexing XML documents and for defining sections in XML documents.
AUTO_SECTION_GROUP	Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case-sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups: You cannot add zone, field or special sections to an automatic section group. Automatic sectioning does not index XML document types (root elements.) However, you can define stop-sections with document type. The length of the indexed tags including prefix and namespace cannot exceed 64 characters. Tags longer than this are not indexed.
PATH_SECTION_GROUP	Use this group type to index XML documents. Behaves like the `AUTO_SECTION_GROUP`. With this section group you can do path searching with the `INPATH` and `HASPATH` operators. Queries are case-sensitive for tag and attribute names. How is PATH_SECTION_GROUP Similar to AUTO_SECTION_GROUP? Documents are assumed to be XML, Every tag and every attribute is indexed by default, Stop sections can be added to prevent certain tags from being indexed, Only stop sections can be added `-- ZONE, FIELD`, and `SPECIAL` sections cannot be added, When indexing XML document collections, you do not need to explicitly define sections as Oracle automatically does this for you. How Does PATH_SECTION_GROUP Differ From AUTO_SECTION_GROUP? Path Searching is allowed at query time (see "Case Study: Searching XML-Based Conference Proceedings" and "You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching." ) with the new `INPATH` and `HASPATH` operators, Tag and attribute names are case-sensitive in queries.

XML_SECTION_GROUP

Use this group type for indexing XML documents and for defining sections in XML documents.

AUTO_SECTION_GROUP

Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case-sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups:

You cannot add zone, field or special sections to an automatic section group.
Automatic sectioning does not index XML document types (root elements.) However, you can define stop-sections with document type.
The length of the indexed tags including prefix and namespace cannot exceed 64 characters. Tags longer than this are not indexed.

PATH_SECTION_GROUP

Use this group type to index XML documents. Behaves like the AUTO_SECTION_GROUP. With this section group you can do path searching with the INPATH and HASPATH operators. Queries are case-sensitive for tag and attribute names.

How is PATH_SECTION_GROUP Similar to AUTO_SECTION_GROUP?

Documents are assumed to be XML, Every tag and every attribute is indexed by default, Stop sections can be added to prevent certain tags from being indexed, Only stop sections can be added -- ZONE, FIELD, and SPECIAL sections cannot be added, When indexing XML document collections, you do not need to explicitly define sections as Oracle automatically does this for you.

How Does PATH_SECTION_GROUP Differ From AUTO_SECTION_GROUP?

Path Searching is allowed at query time (see "Case Study: Searching XML-Based Conference Proceedings" and "You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching." ) with the new INPATH and HASPATH operators, Tag and attribute names are case-sensitive in queries.

Note:

If you are using the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index an XML document collection, you need not explicitly define sections since the system does this for you during indexing.

Deciding Which Section Group to Use

How do you determine which section groups is best for your application? This depends on your application. Table 7-4 lists some general guidelines to help you decide which of the XML_, AUTO_, or PATH_ section groups to use when indexing your XML documents, and why.

Table 7-4 Guidelines for Choosing XML_, AUTO_, or PATH_ Section Groups

Application Criteria	XML_section_...	AUTO_section_...	PATH_section_...
You are using XPATH search features	--	--	Yes
You know the layout and structure of your XML documents, and you can predefine the sections on which users are most likely to search.	Yes	--	--
You do not know which tags users are most likely to search.	--	Yes	--
Query performance, in general	Fastest	Little slower than XML_section_...	Little slower than AUTO_section_...
Indexing performance, in general	Fastest	Little slower than XML_section_...	Little slower than AUTO_section_...
Index size	Smallest	Little larger than XML_section_...	Little larger than AUTO_section_...
Other features	Mappings can be defined so that tags in one or different DTDs can be mapped to one section. Good for DTD evolution and data aggregation.	Simplest. No need to define mapping, add_stop_section can be used to ignore some sections.	Designed for more sophisticated XPATH- like queries

Creating a Section Preference with XML_SECTION_GROUP

The following command creates a section group called, xmlgroup, with the XML_SECTION_GROUP group type:

EXEC ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');

Creating a Section Preference with AUTO_SECTION_GROUP

You can set up your indexing operation to automatically create sections from XML documents using the section group AUTO_SECTION_GROUP. Here, Oracle creates zone sections for XML tags. Attribute sections are created for those tags that have attributes, and these attribute sections are named in the form "tag@attribute."

The following command creates a section group called autogroup with the AUTO_SECTION_GROUP group type. This section group automatically creates sections from tags in XML documents.

EXEC ctx_ddl.create_section_group('autogroup', 'AUTO_SECTION_GROUP');

Note:

You can add attribute sections only to XML section groups. When you use AUTO_SECTION_GROUP, attribute sections are created automatically. Attribute sections created automatically are named in the form tag@attribute.

Creating a Section Preference with PATH_SECTION_GROUP

To enable path section searching, index your XML document with PATH_SECTION_GROUP. For example:

EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');

Step 2. Set the Preference's Attributes

To set the preference's attributes for XML_SECTION_GROUP, use the following procedures:

Add_Zone_Section
Add_Attr_Section
Add_Field_Section
Add_Special_Section

To set the preference's attributes for AUTO_SECTION_GROUP and PATH_SECTION_GROUP, use the following procedures:

Add_Stop_Section

There are corresponding CTX_DDL.DROP sections and CTX_DDL.REMOVE section commands.

2.1 XML_SECTION_GROUP: Using CTX_DDL.add_zone_section

The syntax for CTX_DDL.add_zone_section follows:

CTX_DDL.Add_Zone_Section (
 group_name      => 'my_section_group' /* whatever you called it in the 
preceding section */
 section_name    => 'author' /* what you want to call this section */
 tag             => 'my_tag' /* what represents it in XML */ );

where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.

add_zone_section Guidelines

add_zone_section guidelines are listed here:

Call CTX_DDL.Add_Zone_Section for each tag in your XML document that you need to search on.

2.2 XML_SECTION_GROUP: Using CTX_DDL.Add_Attr_Section

The syntax for CTX_DDL.ADD_ATTR_SECTION follows:

CTX_DDL.Add_Attr_Section ( /* call this as many times as you need to describe
                             the attribute sections */
 group_name      => 'my_section_group' /* whatever you called it in the 
preceding section */
 section_name    => 'author' /* what you want to call this section */
 tag             => 'my_tag' /* what represents it in XML */ );

where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.

Add_Attr_Section Guidelines

Add_Attr_Section guidelines are listed here:

Consider meta_data attribute author:

<meta_data author = "John Smith" title="How to get to Mars">

ADD_ATTR_SECTION adds an attribute section to an XML section group. This procedure is useful for defining attributes in XML documents as sections. This enables searching XML attribute text with the WITHIN operator.

The section_name:

Is the name used for WITHIN queries on the attribute text.
Cannot contain the colon (:) or dot (.) characters.
Must be unique within group_name.
Is case-insensitive.
Can be no more than 64 bytes.

The tag specifies the name of the attribute in tag@attr format. This is case-sensitive.

Note:

In the ADD_ATTR_SECTION procedure, you can have many tags all represented by the same section name at query time. Explained in another way, the names used as the arguments of the keyword WITHIN can be different from the actual XML tag names. That is many tags can be mapped to the same name at query time. This feature enhances query usability.

2.3 XML_SECTION_GROUP: Using CTX_DDL.Add_Field_Section

The syntax for CTX_DDL.Add_Field_Section follows:

CTX_DDL.Add_Field_Section (
 group_name     => 'my_section_group' /* whatever you called it in the preceding 
section */
 section_name   => 'qq' /* what you want to call this section */
 tag            => 'my_tag' /* what represents it in XML */ );
 visible        => TRUE or FALSE );

Add_Field_Section Guidelines

Add_Field_Section guidelines are listed here:

Searches using Field_Sections are faster than those using Zone_Section.
Visible attribute: This is available in Add_Field_Section but not available in the Add_Zone_section. If VISIBLE is set to TRUE then the text within the Field section will be indexed as part of the enclosing document. For example:
```
<state> Virginia </state>
CTX_DDL.Add_Field_Section (
```
```
 group_name     => 'my_section_group' 
 section_name   => 'state'
 tag            => 'state'
 visible        => TRUE or FALSE );
```
If visible is set to TRUE, then searching on Virginia without specifying the state Field section produces a hit.

If visible is set to FALSE, then searching on Virginia without specifying the state Field section does not produce a hit.

How Attr_Section Differs from Field_Section

Attribute section differs from Field section in the following ways:

Attribute text is considered invisible, hence the following clause:
```
WHERE CONTAINS (..., '... jeeves',...)...
```
does NOT find the document. This is similar to when Field sections have visible set to FALSE. Unlike Field sections, however, Attribute section within searches can distinguish between occurrences. Consider the document:
```
<comment author="jeeves">
    I really like Oracle Text
</comment>
<comment author="bertram">
    Me too
</comment>
```
the query:
```
WHERE CONTAINS (...,'(cryil and bertram) WITHIN author', ...)...
```
will NOT find the document, because "jeeves" and "bertram" do not occur within the SAME attribute text.

Attribute section names cannot overlap with zone or field section names although you can map more than one tag@attr to a single section name. Attribute sections do not support default values. Given the document:
```
<!DOCTYPE foo [
    <!ELEMENT foo (bar)>
    <!ELEMENT bar (#PCDATA)>
```
```
    <!ATTLIST bar
        rev CDATA "8i">

]>
<foo>
    <bar>whatever</bar>
</foo>
```
and attribute section:
```
ctx_ddl.add_attr_section('mysg','barrev','bar@rev');
```
the query:

8i within barrev does not hit the document, although in XML semantics, the "bar" element has a default value for its "rev" attribute.

2.5 AUTO_SECTION_GROUP: Using CtX_DDL.Add_Stop_Section

CtX_DDL.Add_Stop_Section (
group_name     => 'my_section_group' /* whatever you called it in the preceding 
section */
section_name   => 'qq' /* what you want to call this section */ );

Step 3. Create an Index Using the Section Preference Created in Step 2

Create an index depending on which section group you used to create a preference:

Creating an Index Using XML_SECTION_GROUP

To index your XML document when you have used XML_SECTION_GROUP, you can use the following statement:

CREATE INDEX myindex ON docs(htmlfile) INDEXTYPE IS ctxsys.context 
     parameters('section group xmlgroup');

Creating an Index Using AUTO_SECTION_GROUP

The following statement creates the index, myindex, on a column containing XML files using the AUTO_SECTION_GROUP:

CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS 
('section group autogroup');

Creating an Index Using PATH_SECTION_GROUP

To index your XML document when you have used PATH_SECTION_GROUP, you can use the following statement:

CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS 
('section group xmlpathgroup');

See Also:

Oracle Text Reference for detailed notes on CTX_DDL.

Example 7-3 Creating an Index Using XML_SECTION_GROUP

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');

/* ADDING A FIELD SECTION */
EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */
  (  group_name   =>'my_section_group',
     section_name =>'author',/* do this for EVERY tag used after "WITHIN" */
     tag          =>'author'
  );

EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */
      (  group_name   =>'my_section_group',
         section_name =>'document',/*do this for EVERY tag after "WITHIN" */
         tag          =>'document'
      );

  ...
/
/* ADDING AN ATTRIBUTE SECTION */
EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');

/* The more sections you add to your index, the longer your search will take.*/
/* Useful for defining attributes in XML documents as sections. This allows*/
/* you to search XML attribute text using the WITHIN operator.*/ 
/*  The section name:
/* ** Is used for WITHIN queries on the attribute text. 
   ** Cannot contain the colon (:) or dot (.) characters. 
   ** Must be unique within group_name. 
   ** Is case-insensitive. 
   ** Can be no more than 64 bytes. 
   **  The tag specifies the name of the attribute in tag@attr format. This is
       case-sensitive. */
/* Names used as arguments of the keyword WITHIN can be different from the
   actual XML tag names. Many tags can be mapped to the same name at query
   time.*/ 
/* Call CTX_DDL.Add_Zone_Section for each tag in your XML document that you need 
to search on. */

EXEC ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec', 'mydocname(book)');

CREATE INDEX my_index ON my_table ( my_column )
  INDEXTYPE IS ctxsys.context
  PARAMETERS ( 'SECTION GROUP my_section_group' );

SELECT my_column FROM my_table
  WHERE CONTAINS(my_column, 'smith WITHIN author') > 0;

Step 4. Create Your Query Syntax

See the section, "Querying with the CONTAINS Operator" for information about how to use the CONTAINS operator in query statements.

Querying Within Attribute Sections

You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTO_SECTION_GROUP as your section group type.

Assume you have an XML document as follows:

<book title="Tale of Two Cities">It was the best of times.</book>

You can define the section title@book as the attribute section title. You can do so with the CTX_DLL.Add_Attr_Section procedure or dynamically after indexing with ALTER INDEX.

Note:

When you use the AUTO_SECTION_GROUP to index XML documents, the system automatically creates attribute sections and names them in the form attribute@tag.

If you use the XML_SECTION_GROUP, you can name attribute sections anything with CTX_DDL.ADD_ATTR_SECTION.

To search on Tale within the attribute section title, issue the following query:

WHERE CONTAINS (...,'Tale WITHIN title', ...)

When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:

... WHERE CONTAINS (...,'Cities WITHIN booktitle', ....)...

When you define the AUTHOR attribute section as such and index the document set, you can query the XML attribute text as follows:

... WHERE 'England WITHIN authors'

Example 7-4 Querying an XML Document

This example does the following:

Creates and populates table res_xml
Creates an index, section_group, and preferences
Paramaterizes the preferences

Runs a test query against res_xml

drop table res_xml;

CREATE TABLE res_xml (
  pk            NUMBER PRIMARY KEY ,
  text          CLOB
  ) ;

insert into res_xml values(111,
  'ENTITY chap8 "Chapter 8, <q>Keeping it Tidy: the XML Rule Book </q>"> this is 
the document section');
commit;

---
--- script to create index on res_xml
---

--- cleanup, in case we have run this before
DROP INDEX res_index ;
EXEC CTX_DDL.DROP_SECTION_GROUP ( 'res_sections' ) ;

--- create a section group
BEGIN
  CTX_DDL.CREATE_SECTION_GROUP ( 'res_sections', 'XML_SECTION_GROUP' ) ;
  CTX_DDL.ADD_FIELD_SECTION ( 'res_sections', 'chap8', '<q>') ;
END ;
/

begin
  ctx_ddl.create_preference
    (
      preference_name => 'my_basic_lexer',
      object_name     => 'basic_lexer'
    );
  ctx_ddl.set_attribute
    (
      preference_name => 'my_basic_lexer',
      attribute_name  => 'index_text',
      attribute_value => 'true'
    );
  ctx_ddl.set_attribute
    (
      preference_name => 'my_basic_lexer',
      attribute_name  => 'index_themes',
      attribute_value => 'false');
end;
/

CREATE INDEX res_index
  ON res_xml(text)
  INDEXTYPE IS ctxsys.context 
  PARAMETERS ( 'lexer my_basic_lexer SECTION GROUP res_sections' ) ;

Test the preceding index with a test query, such as:

SELECT pk FROM res_xml WHERE CONTAINS( text, 'keeping WITHIN chap8' )>0 ;

Example 7-5 Creating an Index and Performing a Text Query

drop table explain_ex; 
 
create table explain_ex 
  ( 
    id        number primary key,
    text      varchar(2000)
  ); 
 
insert into explain_ex ( id, text )
  values ( 1, 'thinks thinking thought go going goes gone went' || chr(10) ||
              'oracle orackle oricle dog cat bird'              || chr(10) ||
              'President Clinton' ); 
insert into explain_ex ( id, text )
  values ( 2, 'Last summer I went to New England'               || chr(10) ||
              'I hiked a lot.'                                  || chr(10) ||
              'I camped a bit.' ); 
commit;

Example 7-6 Text Query Using "ABOUT" in the Text Query Expression

Set Define Off
select text 
  from explain_ex
  WHERE CONTAINS ( text,
  '( $( think & go ) , ?oracle ) & ( dog , ( cat & bird ) ) & about(mammal
                                                    during Bill Clinton)' ) > 0;
select text 
  from explain_ex
  WHERE CONTAINS ( text, 'about ( camping and hiking in new england )'  ) > 0;

Example 7-7 Creating an Index Using AUTO_SECTION_GROUP

ctx_ddl_create_section_group('auto', 'AUTO_SECTION_GROUP');

CREATE INDEX myindex ON docs(xmlfile_column) INDEXTYPE IS ctxsys.context PARAMETERS ('filter ctxsys.null_filter SECTION GROUP auto'); SELECT xmlfile_column FROM docs WHERE CONTAINS (xmlfile_column, 'virginia WITHIN title')>0;

Example 7-8 Creating an Index Using PATH_SECTION_GROUP

EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');

CREATE INDEX myindex ON xmldocs(xmlfile_column) INDEXTYPE IS ctxsys.context PARAMETERS ('section group xmlpathgroup');

SELECT xmlfile_column FROM xmldocs ... WHERE CONTAINS (column, 'Tale WITHIN title@book')>0;

Example 7-9 Using XML_SECTION_GROUP and add_attr_section to Aid Querying

Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:

<BOOK TITLE="Tale of Two Cities"> 
It was the best of times. </BOOK> 
<Author="Charles Dickens">
Born in England in the town, Stratford_Upon_Avon </Author>

Recall the CTX_DDL.ADD_ATTR_SECTION syntax is:

CTX_DDL.Add_Attr_Section ( group_name, section_name, tag );

To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:

ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
ctx_ddl.add_attr_section('myxmlgroup', 'authors', 'author');
end;

Note:

Oracle knows what the end tags look like from the group_type parameter you specify when you create the section group. The start tag you specify must be unique within a section group.
Section names need not be unique across tags. You can assign the same section name to more than one tag, making details transparent to searches.

Presenting the Results of Your Query

An Oracle Text query application enables viewing documents returned by a query. You typically select a document from the hit list and then your application presents the document in some form.

With Oracle Text, you can render a document in different ways. For example, with the query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English. This rendering uses the CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP procedures.

You can also obtain theme information from documents with the CTX_DOC.THEMES PL/SQL package. Besides these there are several other CTX_DOC procedures for presenting your query results.

INPATH does not support working with highlighting or themes.

See Also:

Oracle Text Reference for more information on the CTX_DOC package.

XMLType Indexing

The Oracle9i datatype for storing XML, XMLType, is a core database feature.

You Need Query Rewrite Privileges

Note:

These privileges are only required for Oracle9i Release 1 (9.0.1).

You can create an Oracle Text index on this type, but you need a few database privileges first:

The user creating the index must have Query Rewrite privileges:
```
GRANT QUERY REWRITE TO <user>
```
Without this privilege, the create index will fail with:
```
ORA-01031: insufficient privileges
```
<user> should be the user creating the index. The database schema that owns the index, if different, does not need the grant.

query_rewrite_enabled should be true, and query_rewrite_integrity should be trusted. You can add them to the init.ora file:

query_rewrite_enabled=true
query_rewrite_integrity=trusted

or turn it on for the session as follows:

ALTER SESSION SET query_rewrite_enabled=true;
ALTER SESSION SET query_rewrite_integrity=trusted;

Without these, queries will fail with:

DRG-10599: column is not indexed

These privileges are needed because XMLType is really an object, and you access it through a function, hence an Oracle Text index on an XMLType column is actually a function-based index on the getclobval() method of the type. These are the standard grants you need to use function-based indexes, however, unlike function-based B-Tree indexes, you do not need to calculate statistics.

Note:

Oracle9i SQL Reference under CREATE INDEX, states:

To create a function-based index in your own schema on your own table, in addition to the prerequisites for creating a conventional index, you must have the QUERY REWRITE system privilege.

To create the index in another schema or on another schema's table, you must have the GLOBAL QUERY REWRITE privilege. In both cases, the table owner must also have the EXECUTE object privilege on the function(s) used in the function-based index.

In addition, in order for Oracle to use function-based indexes in queries, the QUERY_REWRITE_ENABLED parameter must be set to TRUE, and the QUERY_REWRITE_INTEGRITY parameter must be set to TRUSTED.

System Parameter is Set to the Default, CTXSYS.PATH_SECTION_GROUP

When an XMLType column is detected, and no section group is specified in the parameters string, the default system examines the new system parameter DEFAULT_XML_SECTION, and uses the section group specified there. At install time this system parameter is set to CTXSYS.PATH_SECTION_GROUP, which is the default path sectioner.

The default filter system parameter for XMLType is DEFAULT_FILTER_TEXT, which means that the INSO filter is not engaged by default.

XMLType Indexes Work Like Other Oracle Text Indexes

Other than the database privileges and the special default section group system parameter, indexes on XMLType columns work like any other Oracle Text index.

Example 7-10 Creating a Text Index on XMLType Columns

Here is a simple example:

connect ctxsys/ctxsys
GRANT QUERY REWRITE TO xtest;
connect xtest/xtest

CREATE TABLE xtest(doc sys.xmltype);
INSERT INTO xtest VALUES (sys.xmltype.createxml('<A>simple</A>'));

CREATE INDEX xtestx ON xtest(doc)
   INDEXTYPE IS ctxsys.context;
ALTER SESSION SET query_rewrite_enabled = true;
ALTER SESSION SET query_rewrite_integrity = trusted;

SELECT a.doc.getclobval() FROM xtest a
   WHERE CONTAINS (doc, 'simple INPATH(A)')>0;

Using Oracle Text with Oracle XML DB

Creating an Oracle Text Index on an UriType Column

UriType columns can be indexed natively in Oracle9i database using Oracle Text. No special datastore is needed.

Example 7-11 Creating an Oracle Text Index on a UriType Column

For example:

CREATE TABLE table uri_tab ( url sys.httpuritype);

INSERT INTO uri_tab VALUES
    (sys.httpuritype.createUri('http://www.oracle.com'));

CREATE INDEX urlx ON uri_tab(url) INDEXTYPE IS ctxsys.context;

SELECT url FROM uri_tab WHERE CONTAINS(url, 'Oracle')>0;

Table 7-5 lists system parameters used for default preference names for Oracle Text indexing, when the column type is UriType:

Table 7-5 rUriType Column Default Preference Names for Oracle Text Indexing

URIType Column	Default Preference Names
DATASTORE	DEFAULT_DATASTORE
FILTER	DEFAULT_FILTER_TEXT
SECTION GROUP	DEFAULT_SECTION_HTML
LEXER	DEFAULT_LEXER
STOPLIST	DEFAULT_STOPLIST
WORDLIST	DEFAULT_WORDLIST
STORAGE	DEFAULT_STORAGE

Querying XML Data: Use CONTAINS or existsNode()?

Oracle9i Release 1(9.0.1) introduced the Oracle Text PATH_SECTION_GROUP, INPATH(), and HASPATH() query operators. These allow you to do XPath-like text query searches on XML documents using the CONTAINS operator. CONTAINS, however, supports only a subset of XPath functionality. Also, there are important semantic differences between the CONTAINS operator and the existsNode() function.

The existsNode, extract() and extractValue() SQL functions (and the corresponding member functions of XMLType) provide full XPath support. This release of Oracle9i also introduces new extension functions to XPath to support full text searches.

Note:

This release does not support theme querying for Oracle Text CONTAINS() and existsNode() searching.

Table 7-6 lists and compares CONTAINS() and existsNode() features for searching XMLType data.

Table 7-6 Using CONTAINS() and existsNode() to Search XMLType Data

Feature	CONTAINS()	existsNode()
XPath Conformance	--	--
Predicate Support	--	--
String equality	Y	Y
Numerical equality	N	Y
Range Predicates	N	Y
XPath functions	N	Y
Spaces	N	Y
Namespaces	N	Y
Value case sensitivity	N	Y
Entity handling	N	Y
Parent-ancestor and sibling axes	N	Y
Attribute searching under wild cards. For example, */@A or ../	Y	Y
Uses XML schema or DTD information	N	Y
Empty elements may lead to false matches	Y	N
Synchronous	--	--
DML	N	CTXXPath = N Other indexes = Y
Query	N	Y
Linguistic search capability	In INPATH() -> Y	Using ora:contains() -> Y
Index type	ctxsys.context	ctxsys.ctxxpath
Query rewrites	N	Y, if XML schema-based and stored object-relationally
Functional indexes	N	Y. Can create Functional Index on existsNode() and extractValue() expressions.
Features supported if context index is already built	--	--
About	Y	N
Highlighting	Y	N
Text searching in general	Supports full text searching.	Supports limited text searching with ora:contains.
XPath searching in general	Limited XPath searching. Non-synchronous.	Full XPath searching. Synchronous.

Full-Text Search Functions in XPath Using ora:contains

Though XPath specifies a set of builtin text functions such as substring() and CONTAINS(), these are considerably less powerful than Oracle's full text search capabilities. New XPath extension functions are defined by Oracle to enable a richer set of text search capabilities. These extension functions are defined within the Oracle XML DB namespace : http://xmlns.oracle.com/xdb.

They can be used within XPath queries appearing in the context of existsNode(), extract() and extractValue() functions operating on XMLType instances.

Note:

Like other procedures in CTX_DDL package, you must have CTXAPP privilege in order to execute the CTX_DDL.CREATE_POLICY() procedure.

ora:contains Features

The following lists the ora:contains features:

The text search extension functions support most of text query operators such as stemming, fuzzy matching, and proximity search.
These functions do not require a ConText index for their evaluation.
The score values computed by these functions may differ from the regular index based query processing (through Contains SQL operator). Due to absence of document statistics, the weight for each term is fixed to 10. This means that a score for a word search is the number of occurrence multiplied by 10. If it exceeds 100, it is truncated to 100. This is also true for fuzzy matched terms.

ora:contains Syntax

The following is the syntax for the ora:contains function:

number contains(string, string, string?, string?)

where:

string, the first argument is input text value
string, the second argument is the text query string
string?, the optional third argument is the policy name
string?, the optional fourth argument is the policy owner

The contains extension function in the Oracle XML DB namespace, takes the input text value as the first argument and the text query string as the second argument. It returns the score value - a number between 0 and 100.

The optional third and fourth arguments can be used to specify the name and owner of the CTX policy which is to be used for processing the text query. If the third argument is not specified, it defaults to the CTX policy named DEFAULT_POLICY_ORACONTAINS owned by CTXSYS. If the fourth argument is not specified, the policy owner is assumed to be the current user.

ora:contains Examples

Assume the table xmltab contains XML documents corresponding to books with embedded chapters, each chapter containing a title and a body.

<book>
  <chapter>
    <title>...</title>
    <body>...</body>
  </chapter>
  <chapter>
    <title>...</title>
    <body>...</body>
  </chapter>
  ...
</book>

Example 7-12 Using ora:contains to Find a Text Query String

Find books containing a chapter whose body contains the specified text query string:

select * from xmltab x where 
  existsNode(value(x), '/book/chapter[ora:contains(body,"dog OR cat")>0]', 
  'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1;

Example 7-13 Using ora:contains and extract() to Find a Text Query String

Extract chapters whose body contains the specified text query string.

select extract(value(x), 
          '/book/chapter[ora:contains(body,"dog OR cat")>0]', 
          'xmlns:ora="http://xmlns.oracle.com/xdb"')
from xmltab x;

Oracle XML DB: Creating a Policy for ora:contains()

This section includes syntax and examples for creating, updating, and dropping a policy for ora:contains():

The following CTX_DDL procedures creates/updates/drops a policy used by ora:contains():

A policy is a set of preferences used for processing ora:contains().

See Also:

for a description of the Oracle Text preferences.

Table 7-7 describes the CTX_DDL functions for creating, updating, and dropping policies for use in your XPath searches.

Table 7-7 CTX_DDL Syntax for Creating, Updating, and Dropping Policies

CTX_DDL Function	Description
CREATE_POLICY CTX_DDL.create_policy(policy_name in varchar2, filter in varchar2 default NULL, section_group in varchar2 default NULL, lexer n varchar2 default NULL, stoplist in varchar2 default NULL, wordlist in varchar2 default NULL);	Defines a policy. Arguments: policy_name - the name for the new policy filter - the filter preference to use (reserved for future use) section_group - the section group to use (currently only NULL_SECTION_GROUP is supported) lexer - the lexer preference to use. This should not have theme indexing turned on. stoplist - the stoplist preference to use wordlist - the wordlist preference to use
UPDATE_POLICY CTX_DDL.update_policy( policy_name in varchar2, filter in varchar2 default NULL, section_group in varchar2 default NULL, lexer in varchar2 default NULL, stoplist in varchar2 default NULL, wordlist in varchar2 default NULL);	Updates a policy by replacing specified preferences. Arguments: policy_name - the name for the policy filter - the new filter preference to use (reserved for future use) section_group - the new section group to use (currently only NULL_SECTION_GROUP is supported) lexer - the new lexer preference. This should not have theme indexing turned on. stoplist - the new stoplist preference to use wordlist - the new wordlist preference to use
DROP_POLICY CTX_DDL.drop_policy(policy_name in varchar2);	Deletes a policy. Arguments: policy_name - the name of the policy

CTX_DDL Function

Description

CREATE_POLICY

CTX_DDL.create_policy(policy_name in varchar2, filter in varchar2 default NULL, section_group in varchar2 default NULL, lexer n varchar2 default NULL, stoplist in varchar2 default NULL, wordlist in varchar2 default NULL);

Defines a policy.

Arguments:

policy_name - the name for the new policy

filter - the filter preference to use (reserved for future use)

section_group - the section group to use (currently only NULL_SECTION_GROUP is supported)

lexer - the lexer preference to use. This should not have theme indexing turned on.

stoplist - the stoplist preference to use

wordlist - the wordlist preference to use

UPDATE_POLICY

CTX_DDL.update_policy( policy_name in varchar2, filter in varchar2 default NULL, section_group in varchar2 default NULL, lexer in varchar2 default NULL, stoplist in varchar2 default NULL, wordlist in varchar2 default NULL);

Updates a policy by replacing specified preferences.

Arguments:

policy_name - the name for the policy

filter - the new filter preference to use (reserved for future use)

section_group - the new section group to use (currently only NULL_SECTION_GROUP is supported)

lexer - the new lexer preference. This should not have theme indexing turned on.

stoplist - the new stoplist preference to use

wordlist - the new wordlist preference to use

DROP_POLICY

CTX_DDL.drop_policy(policy_name in varchar2);

Deletes a policy.

Arguments:

policy_name - the name of the policy

Example 7-14 Creating a Policy for ora:contains

Create lexer preference named mylex:

begin 
   ctx_ddl.create_preference('mylex', 'BASIC_LEXER'); 
   ctx_ddl.set_attribute('mylex', 'printjoins', '_-'); 
   ctx_ddl.set_attribute ( 'mylex', 'index_themes', 'NO');  
   ctx_ddl.set_attribute ( 'mylex', 'index_text', 'YES'); 
end;

Create a stoplist preference named mystop.

begin 
  ctx_ddl.create_stoplist('mystop', 'BASIC_STOPLIST'); 
  ctx_ddl.add_stopword('mystop', 'because');
  ctx_ddl.add_stopword('mystop', 'nonetheless');
  ctx_ddl.add_stopword('mystop', 'therefore');
end;

Create a wordlist preference named 'mywordlist'.

begin 
 ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
 ctx_ddl.set_attribute('mywordlist','FUZZY_MATCH','ENGLISH'); 
 ctx_ddl.set_attribute('mywordlist','FUZZY_SCORE','0'); 
 ctx_ddl.set_attribute('mywordlist','FUZZY_NUMRESULTS','5000'); 
 ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX','TRUE'); 
 ctx_ddl.set
_attribute('mywordlist','STEMMER','ENGLISH'); 
end; 
 
exec ctx_ddl.create_policy('my_policy', NULL, NULL, 'mylex', 'mystop', 
'mywordlist');

exec ctx_ddl.create_policy(policy_name => 'my_policy', 
                           lexer => 'mylex',
                           stoplist => 'mystop',
                           wordlist => 'mywordlist');

Then you can issue the following existsNode() query with your own defined policy:

select * from xmltab x where 
   existsNode(value(x), 
     '/book/chapter[ora:contains(body,"dog OR cat", "my_policy")>0]', 
     'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1;

You can also update your policy by using the following:

exec ctx_ddl.update_policy(policy_name => 'my_policy', 
                           lexer => 'my_new_lex');

You can drop your policy by using:

exec ctx_ddl.drop_policy(policy_name => 'my_policy');

Querying Using Other User's Policy

You can also issue the existsNode() query using another user's policy, in this case, using Scott's policy:

Example 7-15 Querying Another User's Policy

select * from xmltab x where 
  existsNode(value(x), 
    '/book/chapter[ora:contains(body,"dog OR cat", "Scotts_policy","Scott")>0]', 
    'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1;

Oracle XML DB: Using CTXXPATH Indexes for existsNode()

The existsNode() SQL function, unlike the CONTAINS operator, cannot use ConText indexes to speed up its evaluation. To improve the performance of XPath searches in existsNode(), this release introduces a new index type, CTXXPATH.

The CTXXPATH index is a new indextype provided by Oracle Text. It is designed to serve as a primary filter for existsNode() processing, that is, it produces a superset of the results that would be produced by the existNode() function.

Why do We Need CTXXPATH When ConText Indexes Can Perform XPath Searches?

The existing ConText index type already has some XPath searching capabilities, but the ConText index type has some limitations:

For the ConText index to be usable as a primary filter for existsNode(),
- You must create the index using PATH_SECTION_GROUP.
- You cannot create the index with USER_LEXER or MULTI_LEXER preference.
- You must create the index with DIRECT DATASTORE.
- You must create the index with NULL FILTER.
This limits the linguistic searching capabilities that ConText index type provides.
The ConText index is asynchronous and does not follow the same transactional semantics as existsNode().
The ConText index does not handle namespaces nor user-defined entities.

With all these limitations in mind, CTXXPATH index type was designed specifically to serve the purpose of existsNode() primary filter processing. You can still create ConText indexes with whichever preferences you need on XMLType columns, and this will be used to speed up CONTAINS operators. At the same time, you can create a CTXXPATH index to speedup the processing of existsNode().

CTXXPATH Index Type

CTXXPATH index type has the following characteristics:

This index can only be used to speed up existsNode() processing. It acts as a primary filter for the existsNode() function. In other words, it provides a superset of the results that existsNode() would provide
The index can only handle a limited set of XPath expressions. See the Section , "Choosing the Right Plan: Using CTXXPATH Index in existsNode() Processing" for the list of XPath expressions not supported by the index.
The only supported parameter is the TABLESPACE parameter. See "Creating CTXXPATH Storage Preferences with CTX_DDL. Statements" .
DMLs are asynchronous. Users are required to issue a special DDL command to synchronize the DMLs, similar to that of ConText index.
Despite the asynchronous nature of DML, it still follows transactional semantics of existsNode() by also returning unindexed rows as part of its result set in order to guarantee its requirement of returning a superset of the valid results.

Creating CTXXPATH Indexes

You create CTXXPATH indexes the same way you create ConText indexes. The syntax is the same as that of ConText index:

CREATE INDEX [schema.]index ON [schema.]table(XMLType column)
   INDEXTYPE IS ctxsys.CTXXPATH [PARAMETERS(paramstring)];

where

paramstring = '[storage storage_pref] [memory memsize] [populate | nopopulate]'

Example 7-16 Creating CTXXPATH Indexes

For example:

CREATE INDEX xml_idx ON xml_tab(col_xml) indextype is ctxsys.CTXXPATH;

CREATE INDEX xml_idx ON xml_tab(col_xml) indextype is ctxsys.CTXXPATH 
          PARAMETERS('storage my_storage memory 40M');

The index can only be used to speed up queries using existsNode():

SELECT xml_id FROM xml_tab x WHERE
     x.col_xml.existsnode('/book/chapter[@title="XML"]')>1;

See Also:

Chapter 4, "Using XMLType" for more information on using existsNode().

Creating CTXXPATH Storage Preferences with CTX_DDL. Statements

The only preference allowed for CTXXPATH index type is the STORAGE preference. You create the STORAGE preference the same way you would for a ConText index type.

Example 7-17 Creating Storage Preferences for CTXXPATH Indexes

For example:

begin 
ctx_ddl.create_preference('mystore', 'BASIC_STORAGE'); 
ctx_ddl.set_attribute('mystore', 'I_TABLE_CLAUSE', 
                        'tablespace foo storage (initial 1K)'); 
ctx_ddl.set_attribute('mystore', 'K_TABLE_CLAUSE', 
                        'tablespace foo storage (initial 1K)'); 
ctx_ddl.set_attribute('mystore', 'R_TABLE_CLAUSE', 
                        'tablespace foo storage (initial 1K)'); 
ctx_ddl.set_attribute('mystore', 'N_TABLE_CLAUSE', 
                        'tablespace foo storage (initial 1K)'); 
ctx_ddl.set_attribute('mystore', 'I_INDEX_CLAUSE', 
                        'tablespace foo storage (initial 1K)'); 
end;

Performance Tuning CTXXPATH Index: Synchronizing and Optimizing the Index

To synchronize DMLs, you can use the SYNC_INDEX procedure provided in the CTX_DDL package.

Example 7-18 Optimizing the CTXXPATH Index

For example:

exec ctx_ddl.sync_index('xml_idx');

To optimize the CTXXPATH index, you can use the OPTIMIZE_INDEX() procedure provided in the CTX_DDL package. For example:

exec ctx_ddl.optimize_index('xml_idx', 'FAST');

exec ctx_ddl.optmize_index('xml_idx', 'FULL');

See Also:

Choosing the Right Plan: Using CTXXPATH Index in existsNode() Processing

It is not guaranteed that a CTXXPATH index will always be used to speed up existsNode() processing. The following is a list of reasons why Oracle Text index may not be used under existsNode():

The Cost Based Optimizer decides it is too expensive to use CTXXPATH index as primary filter.
The XPath expression cannot be handled by CTXXPATH index. Here are a list of XPath constructs CTXXPATH index cannot handle:
- XPATH functions.
- Numerical Range operators.
- Numerical equality.
- Arithmetic operators.
- Union operator "|"
- Existence of attribute
- Positional/Index predicate, that is, /A/B[5].
- Parent and sibling axes
- attribute following a *, //,.., in other words,'/A/*/@attr', '/A//@attr', '/A//../@attr'
- '.' or '*' at the end of the path expression.
- predicate following '.' or '*'.
- String literal equality is supported with the following restrictions:
  - The left hand side must be a path ('.' self by itself is not allowed,.="dog").
  - The right hand side must be a literal.

For the Optimizer to better estimate the costs and selectivities for the existsNode() function, you must gather statistics on CTXXPATH index by using ANALYZE command or DBMS_STATS package. You can analyze the index and compute the statistics using the ANALYZE command as follows:

ANALYZE INDEX myPathIndex COMPUTE STATISTICS;

or you can simply analyze the whole table:

ANALYZE TABLE XMLTAB COMPUTE STATISTICS;

Using Oracle Text: Advanced Techniques

The following sections describe several Oracle Text advanced techniques for fine-tuning your XML data search.

Highlight Support for INPATH/HASPATH Text Operators

Oracle Text provides the CTX_DOC.HIGHLIGHT() procedure to generate highlight offsets and lengths for a Text query on a document. These offsets and lengths are generated for the terms in the document that satisfy a word query, phrase query, or about query. In Oracle9i Release 2 (9.2.0.2), Oracle Text extends highlight support for INPATH and HASPATH operators.

Highlighting XML Documents with INPATH

For INPATH, CTX_DOC.HIGHTLIGHT() calculates the offset and length for the left hand child of the INPATH operator just as with the WITHIN operator. This only applies to cases where the path child points to an element. For example, if the Text query expression is:

'txt INPATH(/A/B)'  or 
'txt INPATH(/A/B[@attr="atxt" and .="Btxt"])'

then CTX_DOC.HIGHTLIGHT() generates offsets and lengths for all occurrences of 'txt' in the document satisfying the INPATH query.

If the path child points to an attribute, then nothing is highlighted. For example, if the Text query expression is:

'atxt INPATH(/A/B/@attr)'

then no highlight information is generated.

Highlighting XML Documents with HASPATH

For HASPATH, if its path operand points to an element, CTX_DOC.HIGHTLIGHT() calculates the offset and length for the bodies of the element pointed to by the XPath expression. For example, if the Text query expression is:

'HASPATH(/A/B)' or 
'HASPATH(/A/B[@att="atxt"])'

then offsets and lengths are calculated for the bodies of elements pointed to by /A/B.

If the path operand points to an attribute, such as, 'HASPATH(/A/B/@Battr)', then no highlight information is generated.

If the operand does WITHIN-EQUAL/SECTION-EQUAL testing, then CTX_DOC.HIGHTLIGHT() outputs offsets and lengths of the elements pointed to by the path child of '='. If the path child of '=' points to an attribute, then no highlight information is generated. For example, if the Text query expression is:

'HASPATH(/A/B = "ABtxt")' or 
'HASPATH(/A/B[@att="atxt"]= "ABtxt")'

then offsets and lengths are generated for the bodies of elements pointed to by /A/B. On the other hand, if the Text query expression is:

'HASPATH(/A/B/@att = "atxt")'

then no highlight information is generated because the path child '/A/B/@att' points to an attribute, not an element.

Note:

While CTX_DOC.HIGHTLIGHT() calculates and outputs the offsets and lengths, CTX_DOC.MARKUP() goes one step further by returning a version of the document with query terms or element bodies marked up using specified tags.

Note that mark-ups can make the target XML document invalid.

Distinguishing Tags Across DocTypes

In previous releases, XML_SECTION_GROUP was unable to distinguish between tags in different DTDs. For example, suppose you use the following DTD for storing contact information:

<!DOCTYPE contact>
<contact>
   <address>506 Blue Pool Road</address>
   <email>dudeman@radical.com</email>
</contact>

Appropriate sections might look like:

ctx_ddl.add_field_section('mysg','email',  'email');
ctx_ddl.add_field_section('mysg','address','address');

This is fine until you have a different kind of document in the same table:

<!DOCTYPE mail>
 <mail>
   <address>dudeman@radical.com</address>
</mail>

Now your address section, originally intended for street addresses, starts picking up email addresses, because of tag collision.

Specifying Doctype Limiters to Distinguish Between Tags

Oracle8i release 8.1.5 and higher allow you to specify doctype limiters to distinguish between these tags across doctypes. Simply specify the doctype in parentheses before the tag as follows:

ctx_ddl.add_field_section('mysg','email','email');
ctx_ddl.add_field_section('mysg','address','(contact)address');
ctx_ddl.add_field_section('mysg','email','(mail)address');

Now when the XML section group sees an address tag, it will index it as the address section when the document type is contact, or as the email section when the document type is mail.

Doctype-Limited and Unlimited Tags in a Section Group

If you have both doctype-limited and unlimited tags in a section group:

ctx_ddl.add_field_section('mysg','sec1','(type1)tag1');
ctx_ddl.add_field_section('mysg','sec2','tag1');

Then the limited tag applies when in the doctype, and the unlimited tag applies in all other doctypes.

Querying is unaffected by this. The query is done on the section name, not the tag, so querying for an email address would be done like:

radical WITHIN email

which, since we have mapped two different kinds of tags to the same section name, finds documents independent of which tags are used to express the email address.

XML_SECTION_GROUP Attribute Sections

In Oracle8i Release 1(8.1.5) and higher, XML_SECTION_GROUP offers the ability to index and search within attribute values. Consider a document with the following lines:

<comment author="jeeves">
    I really like Oracle Text
</comment>

Now XML_SECTION_GROUP offers an attribute section. This allows the inclusion of attribute values to index. For example:

ctx_ddl.add_attr_section('mysg','author','comment@author');

The syntax is similar to other add_section calls. The first argument is the name of the section group, the second is the name of the section, and the third is the tag, in the form <tag_name>@<attribute_name>. This tells Oracle Text to index the contents of the author attribute of the comment tag as the section "author".

Query syntax is just like for any other section:

WHERE CONTAINS ( ... ,'jeeves WITHIN author...',...)...

and finds the document.

Attribute Value Sensitive Section Search

Attribute sections allow you to search the contents of attributes. They do not allow you to use attribute values to specify sections to search. For instance, given the document:

<comment author="jeeves">
    I really like Oracle Text
</comment>

You can find this document by asking:

jeeves within comment@author

which is equivalent to "find me all documents which have a comment element whose author attribute's value includes the word jeeves".

However, there you cannot currently request the following:

interMedia within comment where (@author = "jeeves")

in other words, "find me all documents where interMedia appears in a comment element whose author is jeeves". This feature -- attribute value sensitive section searching -- is planned for future versions of the product.

Dynamic Add Section

Because the section group is defined before creating the index, Oracle8i Release 1 (8.1.5) is limited in its ability to cope with changing structured document sets; if your documents start coming with new tags, or you start getting new doctypes, you have to re-create the index to start making use of those tags.

With Oracle8i Release 2 (8.1.6) and higher you can add new sections to an existing index without rebuilding the index, using alter index and the new add section parameters string syntax:

add zone  section <section_name> tag <tag>
add field section <section_name> tag <tag> [ visible | invisible ]

For instance, to add a new zone section named tsec using the tag title:

alter index <indexname> rebuild
parameters ('add zone section tsec tag title')

To add a new field section named asec using the tag author:

alter index <indexname> rebuild
parameters ('add field section asec tag author')

This field section would be invisible by default, just like when using ADD_FIELD_SECTION. To add it as visible field section:

alter index <indexname> rebuild
parameters ('add field section asec tag author visible')

Dynamic add section only modifies the index metadata, and does not rebuild the index in any way. This means that these sections take effect for any document indexed after the operation, and do not affect any existing documents -- if the index already has documents with these sections, they must be manually marked for re-indexing (usually with an update of the indexed column to itself).

This operation does not support addition of special sections. Those would require all documents to be re-indexed, anyway. This operation cannot be done using rebuild online, but it should be a fairly quick operation.

Constraints for Querying Attribute Sections

The following constraints apply to querying within attribute sections:

Regular queries on attribute text do not hit the document unless qualified in a within clause. Assume you have an XML document as follows:
```
<book title="Tale of Two Cities">It was the best of times.</book>
```
A query on Tale by itself does not produce a hit on the document unless qualified with WITHIN title@book. This behavior is like field sections when you set the visible flag set to false.
You cannot use attribute sections in a nested WITHIN query.
Phrases ignore attribute text. For example, if the original document looked like:
```
Now is the time for all good <word type="noun"> men </word> to come to the 
aid.
```
Then this document would hit on the regular query good men, ignoring the intervening attribute text.

WITHIN queries can distinguish repeated attribute sections. This behavior is like zone sections but unlike field sections. For example, for the following document:

<book title="Tale of Two Cities">It was the best of times.</book>
<book title="Of Human Bondage">The sky broke dull and gray.</book>

Assume the book is a zone section and book@author is an attribute section. Consider the query:

'(Tale and Bondage) WITHIN book@author'

This query does not hit the document, because tale and bondage are in different occurrences of the attribute section book@author.

Repeated Zone Sections

Zone sections can repeat. Each occurrence is treated as a separate section. For example, if <H1> denotes a heading section, they can repeat in the same documents as follows:

<H1> The Brown Fox </H1> 
<H1> The Gray Wolf </H1>

Assuming that these zone sections are named Heading.

The query:

WHERE CONTAINS (..., 'Brown WITHIN Heading', ...)...

returns this document.

But the query:

WHERE CONTAINS (...,' (Brown and Gray) WITHIN Heading',...)...

does not.

Overlapping Zone Sections

Zone sections can overlap each other. For example, if <B> and <I> denote two different zone sections, they can overlap in document as follows:

plain <B> bold <I> bold and italic </B> only italic </I>  plain

Nested Sections

Zone sections can nest, including themselves as follows:

<TD> 
  <TABLE>
  <TD>nested cell</TD>
  </TABLE>
</TD>

Using the WITHIN operator, you can write queries to search for text in sections within sections.

Nested Section Query Example

For example, assume the BOOK1, BOOK2, and AUTHOR zone sections occur as follows in documents doc1 and doc2:

doc1:

<book1><author>Scott Tiger</author> This is a cool book to read.</book1>

doc2:

<book2> <author>Scott Tiger</author> This is a great book to read.</book2>

Consider the nested query:

'Scott WITHIN author WITHIN book1'

This query returns only doc1.

Using Table CTX_OBJECTS and CTX_OBJECT_ATTRIBUTES View

The CTX_OBJECT_ATTRIBUTES view displays attributes that can be assigned to preferences of each object. It can be queried by all users.

Check out the structure of CTX_OBJECTS and CTX_OBJECT_ATTRIBUTE view, with the following DESCRIBE commands. Because we are only interested in querying XML documents in this chapter, we focus on XML_SECTION_GROUP and AUTO_SECTION_GROUP.

Describe ctx_objects
  SELECT obj_class, obj_name FROM ctx_objects
  ORDRR BY obj_class, obj_name;

The result is:

...
SECTION_GROUP                  AUTO_SECTION_GROUP    <<==
SECTION_GROUP                  BASIC_SECTION_GROUP
SECTION_GROUP                  HTML_SECTION_GROUP
SECTION_GROUP                  NEWS_SECTION_GROUP
SECTION_GROUP                  NULL_SECTION_GROUP
SECTION_GROUP                  XML_SECTION_GROUP     <<==

...

Describe ctx_object_attributes
SELECT oat_attribute FROM ctx_object_attributes
  WHERE oat_object = 'XML_SECTION_GROUP';

The result is:

OAT_ATTRIBUTE
-------------
ATTR
FIELD
SPECIAL
ZONE

SELECT oat_attribute FROM ctx_object_attributes
  WHERE oat_object = 'AUTO_SECTION_GROUP';

The result is:

OAT_ATTRIBUTE
-------------
STOP

Example 7-19 Case Study: See the following section.

Case Study: Searching XML-Based Conference Proceedings

This case study uses INPATH, HASPATH, and extract() to search a XML-based conference proceedings.

Note:

You can download this sample application from http://otn.oracle.com/products/text

Searching for Content and Structure in XML Documents

Documents are structured presentations of information. You can define a document as an asset that contains structure, content, and presentation. This case study describes how to search for content and structure at the same time. All features described here are available in Oracle9i Release 1 (9.0.1) and higher.

Consider an online conference proceedings search where the conference attendees can perform full text searches on the structure of the papers, for example search on title, author, abstract, and so on.

Searching XML-Based Conference Proceedings Using Oracle Text

Follow these tasks to build this conference proceedings search case study:

Task 1. Grant System Privileges. Set Initialization Parameters

You must be granted with QUERY REWRITE system privileges to create a Functional Index. You must also have the following initialization parameters defined to create a Functional Index:

QUERY_REWRITE_INTEGRITY set to TRUSTED
QUERY_REWRITE_ENABLED set to TRUE
COMPATIBLE set to 8.1.0.0.0 or a greater value

Task 2. Create Table Proceedings

For example, create a table, Proceedings with two columns: tk, the paper's id, and papers, the content. Store the paper's content as an XMLType.

CREATE TABLE Proceedings (tk number, papers XMLTYPE);

Task 3. Populate Table with Data

Now populate table Proceedings with some conference papers:

INSERT INTO Proceedings(tk,papers) VALUES (1, XMLType.createXML(
'<?xml version="1.0"?> 
  <paper> 
   <title>Accelerating Dynamic Web Sites using Edge Side Includes</title> 
    <authors>Soo Yun and Scott Davies</authors> 
     <company> Oracle Corporation </company> 
       <abstract> The main focus of this presentation is on Edge Side Includes 
(ESI). ESI is a simple markup language which is used to mark cacheable and 
non-cacheable fragments of a web page. An "ESI aware server", such as Oracle Web 
Cache and Akamai EdgeSuite, can take in ESI marked content and cache and 
assemble pages closer to the users, at the edge of the network, rather than at 
the application server level. This session will discuss the challenge many 
dynamic websites face today, discuss what ESI is, explain how ESI can be used to 
alleviate these issues. The session will also describe how to build pages with 
ESI, and detail the ESI and JESI (Edge Side Includes for Java) libraries. 
</abstract> 
     <track> Fast Track to Oracle9i </track> 
</paper>'));

Task 4. Create an Oracle Text Index on the XMLType Column

Create an Oracle Text index on the XMLType column, papers, using the usual CREATE INDEX statement:

CREATE INDEX proc_idx ON Proceedings(papers)
INDEXTYPE IS ctxsys.context
parameters('FILTER ctxsys.null_filter SECTION GROUP ctxsys.path_section_group');

Task 5. Querying the Conference Proceedings with XPath and Contains()

Oracle9i Release 1 (9.0.1) introduced two new SQL functions existsNode() and extract() that operate on XMLType values as follows:

existsNode(): given an XPath expression, checks if the XPath applied over the XML document can return any valid nodes.
extract(): given an XPath expression, applies the XPath to the XML document and returns the fragment as an XMLType.

For example, select the authors only from the XML document:

SELECT p.papers.extract('/paper/authors/text()').getStringVal()
   FROM Proceedings p;

You can use the Oracle Text CONTAINS() operator to search for content in a text or XML document. For example, to search for papers that contain "Dynamic" in the title you can use:

SELECT tk FROM Proceedings
WHERE CONTAINS(papers,'Dynamic INPATH(paper/title)')>0

Using the CONTAINS() operator Oracle9i returns the columns selected. For an XML document it returns the entire document. To extract fragments of XML, you can combine the extract() function to manipulate the XML. For example, select the authors of papers that contain "Dynamic" in the title:

SELECT p.papers.extract('/paper/authors/text()').getStringVal()
FROM Proceedings p
WHERE CONTAINS(papers,'Dynamic INPATH(paper/title)')>0

You can use all the functionality of an Oracle Text query for the content search. The following example selects the authors of papers containing "Dynamic" or "Edge" or "Libraries" in the title:

SELECT p.papers.extract('/paper/authors/text()').getStringVal()
FROM Proceedings p
WHERE CONTAINS(papers,'Dynamic or Edge or Libraries INPATH(paper/title)')>0

Traditional databases allow searching of XML content or structure, but not both at the same time. Oracle provides unique features that enable querying of both XML content and structure at the same time.

Figure 7-1 illustrates entering the search for "Libraries" in the structure of the Conference Proceedings documents. You can search for "Libraries" within Authors, abstract, title, company, or track. In this example, you are searching for the term "Libraries" in the abstracts only. Since it is an XML document your are searching, you can even select which fragment of the XML document you want to display. This example only displays the title of the paper.

Figure 7-2 shows the search results.

For the .jsp code to build this look-up application, see "Searching Conference Proceedings Example: jsp" .

See Also:

Figure 7-1 Using Oracle Text to Search for "Libraries" in the Conference Proceedings Abstracts

Text description of text92_01.jpg follows.

Text description of the illustration text92_01.jpg

Figure 7-2 Oracle Text Search Results

Text description of text92_02.jpg follows.

Text description of the illustration text92_02.jpg

Searching Conference Proceedings Example: jsp

Here is the full jsp example illustrating how you can use Oracle Text to create an online XML-based Conference Proceedings look-up application.

<%@ page import="java.sql.* , oracle.jsp.dbutil.*" %>
<jsp:useBean id="name" class="oracle.jsp.jml.JmlString" scope="request" >
<jsp:setProperty name="name" property="value" param="query" />
</jsp:useBean>

<%
  String connStr="jdbc:oracle:thin:@oalonso-sun:1521:betadev";
java.util.Properties info = new java.util.Properties();
Connection conn = null;
  ResultSet  rset = null;
  Statement  stmt = null;

  if (name.isEmpty()) { %>
<html>
       <title>OOW Paper Search</title>
       <body>
       <center>
         <h3>OOW Proceedings Search </h3>
         <form method=post>
         Search for
         <input type=text size=15 maxlength=25 name=query>
         in     
         <select name="tagvalue">
           <option value="authors">Author(s)
           <option value="abstract">Abstract
           <option value="title">Title
           <option value="company">Company
           <option value="track">Track
         </select>
         <input type=submit value="Search">
         <p><b>Display</b><br>
         <table>
          <tr>
           <td>
            Author(s)<input type="radio" name="section" value="authors">
           </td>
           <td>
            Abstract <input type="radio" name="section" value="abstract">
           </td>
           <td>
            Title    <input type="radio" name="section" value="title" checked>
           </td>
           <td>
            Company  <input type="radio" name="section" value="company">
           </td>
           <td>
            Track    <input type="radio" name="section" value="track">
           </td>
          </tr>
         </table>
         </form>
       </center>
       <hr>
     </body>
     </html>

   <%
   }
   else {
   %>

    <html>
       <title>OOW Paper Search</title>
       <body>
       <center>
       <h3>OOW Proceedings Search </h3>
         <form method=post action="oowpapersearch.jsp">
         Search for
         <input type=text size=15 maxlength=25 name="query" value=<%= 
name.getValue() %>>
         in     
         <select name="tagvalue">
           <option value="authors">Author(s)
           <option value="abstract">Abstract
           <option value="title">Title
           <option value="company">Company
           <option value="track">Track
         </select>
         <input type=submit value="Search">
         <p><b>Display</b><br>
         Author(s)<input type="radio" name="section" value="authors">
         Abstract <input type="radio" name="section" value="abstract">
         Title    <input type="radio" name="section" value="title" checked>
         Company  <input type="radio" name="section" value="company">
         Track    <input type="radio" name="section" value="track">
         </form>
       </center>

   <%
     try {

       DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver() );
       info.put ("user", "ctxdemo");
       info.put ("password","ctxdemo");
       conn = DriverManager.getConnection(connStr,info);

       stmt = conn.createStatement();
       String theQuery =   request.getParameter("query")+" 
INPATH(paper/"+request.getParameter("tagvalue")+")";

       String tagValue = request.getParameter("tagvalue");
       String sectionValue = request.getParameter("section");

      // select p.papers.extract('/paper/authors').getStringVal() 
      // from oowpapers p
      // where contains(papers,'language inpath(paper/abstract)')>0

        String myQuery = "select 
p.papers.extract('/paper/"+request.getParameter("section")+"/text()').getStringV
al() from oowpapers p where contains(papers,'"+theQuery+"')>0";

       rset = stmt.executeQuery(myQuery);
String color = "ffffff";
       String myDesc = null;

       int items = 0;
       while (rset.next()) {
         myDesc = (String)rset.getString(1);
         items++;
       if (items == 1) {
    %>

            <center>
               <table border="0">
                  <tr bgcolor="#6699CC">
                    <th><%= sectionValue %></th>
                  </tr>
    <%   } %> 

         <tr bgcolor="#<%= color %>">
           <td> <%= myDesc %></td>
         </tr>
<%
         if (color.compareTo("ffffff") == 0)
               color = "eeeeee";
             else
               color = "ffffff";

       }
     } catch (SQLException e) {
     %>
        <b>Error: </b> <%= e %><p>
     <%
     } finally {
       if (conn != null) conn.close();
       if (stmt != null) stmt.close();
       if (rset != null) rset.close();
     }
     %>
     </table>
     </center>
     </body></html>
     <%
   }
%>

Frequently Asked Questions About Oracle Text

This Frequently Asked Questions (FAQs) section is divided into the following categories:

FAQs: General Questions About Oracle Text

Can I Use a CONTAINS() Query with an XML Function to Extract an XML Fragment?

Answer: Yes you can. See "Querying XML Data: Use CONTAINS or existsNode()?".

Can XML Documents Be Queried Like Table Data?

I know that an intact XML document can be stored in a CLOB in Oracle's XML solution.

Can XML documents stored in a CLOB or a BLOB be queried like table schema? For example:

[XML document stored in BLOB]...<name id="1111"><first>lee</first>            
<sencond>jumee</second></name>...

Can value (lee, jumee) be queried by elements, attributes, and the structure of XML document?

Answer: Using Oracle Text, you can find this document with a queries such as:

lee within first
jumee within second
1111 within name@id

You can combine these like this:

lee within first and jumee within second, or
(lee within first) within name.

For more information, please read the Oracle Text Technical Overview available on OTN at http://otn.oracle.com/products/text

Can I Edit Individual XML Elements?

If some element or attribute is inserted, updated, or deleted, must the whole document be updated? Or can insert, update, and delete function as in table schema?

Answer: Oracle Text indexes CLOB and BLOB, and this has no knowledge about XML specifically, so you cannot really change individual elements. You have to edit the document as a whole.

How Are XML Files Locked in CLOBs and BLOBs?

About locking, if we manage an XML document stored in a CLOB or a BLOB, can anyone access the same XML document?

Answer: Just like any other CLOB, if someone is writing to the CLOB, they have it locked and nobody else can write to the CLOB. Other users can read it, but not write to it. This is basic LOB behavior.

An alternative is to decompose the XML document and store the information in relational fields. Then you can modify individual elements, have element-level simultaneous access, and so on. In this case, using something called the USER_DATASTORE and PL/SQL, you can reconstitute the document to XML for text indexing. Then, you can search the text as if it were XML, but manage the data as if it were relational data. Again, see the Oracle Text Technical Overview for more information at: http://otn.oracle.com/products/text.

How Can I Search XML Documents and Return a Zone?

I need to store a large XML file, search it, and return a specific tagged area. Using Oracle Text some of this is possible:

I can store an XML file in a CLOB field
I can index it with ctxsys.context
I can create <Zones> and <Fields> to represent the tags in my XML file Ex. ctx_ddl.add_zone_section(xmlgroup,"dublincore", dc);
I can search for text within a zone or fieldEx. Select title from mytable where CONTAINS(textField,"some words WITHIN doubleness")

How do I return a zone or a field based on a text search?

Answer: Oracle Text will only return the "hits". You can use Oracle Text doc service to highlight or mark up the section, or you can store the CLOB in an XMLType column and use the extract() function.

How Do I Load XML Documents into the Database?

How do I insert XML documents into a database? I need to insert the XML document as-is in column of datatype CLOB into a table.

Answer: Oracle's XML SQL Utility for Java offers a command-line utility that can be used for loading XML data. More information can be found on the XML SQL Utility at the following Web site:

http://otn.oracle.com/tech/xml

as well as in Chapter 7, "XML SQL Utility (XSU)".

You can insert the XML documents as you would any text file. There is nothing special about an XML-formatted file from a CLOB perspective.

How Do I Search XML Documents with Oracle Text?

Can Oracle Text be used to index and search XML stored in CLOBs? How can we get started?

Answer: Versions of Oracle Text before Oracle8i Release 2 (8.1.6) only allowed tag-based searching. The current version allows for XML structure and attribute based searching. There is documentation on how to have the index built and the SQL usage in the Oracle Text documentation.

See Also:

Oracle Text Reference.

How Do I Search XML Using the WITHIN Operator?

I have this XML code:

<person>
  <name>efrat</name> 
 <childrens> 
     <child> 
         <id>1</id> 
         <name>keren</name> 
     </child>
  </childrens>
</person>

How do I find the person who has a child name keren but not the person's name keren? This assumes I defined every tag with the add_zone_section that can be nested and can include themselves.

Answer: Use '(keren within name) within child'.

Where Can I Find Examples of Using Oracle Text to Search XML?

Answer: See the following manuals:

Does Oracle Text Automatically Recognize XML Tags?

Can Oracle Text recognize the tags in my XML document or do I have to use the add_field_section command for each tag in the XML document? My XML documents have hundreds of tags. Is there an easy way to do this?

Answer: Which version of the database are you using? I believe you need to use that command for Oracle8 release 8.1.5 but not in Oracle8i release 2 (8.1.6). You can use AUTO_SECTION_GROUP in Oracle8i release 2 (8.1.6).

XSQL Servlet ships with a complete (albeit simple from the Oracle Text standpoint) example of a SQL script that creates a complex XML datagram out of object types, and then creates an Oracle Text index on the XML document fragment stored in the Insurance Claim type.

If you download the XSQL Servlet, and look at the file ./xsql/demo/insclaim.sql you'll be able to see the Oracle Text stuff at the bottom of the file. One of the key new features in Oracle Text in Oracle8i release 2 (8.1.6) was the AUTO Section for XML.

Can I Do Range Searching with Oracle Text?

I have an XML document that I have stored in CLOB. I have also created the indexes on the tags using section_group, and so on. One of the tags is <SALARY> </SALARY>. I want to write a SQL statement to select all the records that have salary of greater than 5000. How do I do this? I cannot use the WITHIN operator. I want to interpret the value present in this tag as a number. This could be a floating point number also since this is salary.

Answer: You cannot do this in Oracle Text. Range search is not really a text operation. The best solution is to use the other Oracle XML parsing utilities to extract the salary into a NUMBER field. Then, you can use Oracle Text for text searching, and normal SQL operators for the more structured fields, and achieve the same results.

Can Oracle Text Do Section Extraction?

We are storing all our documents in XML format in a CLOB. Are there utilities available in Oracle, perhaps Oracle Text, to retrieve the contents a field at a time? That is, given a field name, can I get the text between tags, as opposed to retrieving the whole document and traversing the structure?

Answer: Oracle Text does not do section extraction. See the XML SQL Utility for this.

Can I Create a Text Index on Three Columns?

I have created a view based on seven to eight tables and it has columns like custordnumber, product_dsc, qty, prdid, shipdate, ship_status, and so on. I need to create an Oracle Text index on the three columns:

custordnumber
product_dsc
ship_status

Is there a way to create a Text index on these columns?

Answer: The short answer is yes. You have two options:

Use the USER_DATASTORE object to create a concatenated field on the fly during indexing; or
Concatenate your fields and store them in an extra CLOB field in one of your tables. Then, create the index on the CLOB field. If you're using Oracle8i release 2 (8.1.6) or higher, then you also have the option of placing XML tags around each field prior to concatenation. This gives you the capability of searching within each field.

How Fast Is Oracle9i at Indexing Text? Can I Just Enable Boolean Searches?

We are using mySQL to do partial indexing of 9 million Web pages a day. We are running on a 4-processor Sparc 420 and are unable to do full text indexing. Can Oracle8i or Oracle9i do this?

We are not interested in transactional integrity, applying any special filters to the text pages, or in doing any other searching other than straight boolean word searches (no scoring, no stemming, no fuzzy searches, no proximity searches, and so on).

I have are the following questions:

Will Oracle8i or Oracle9i be any faster at indexing text than mySQL?
If so, is there a way to disable all the features of text indexing except for boolean word searches?

Answer: Yes. Oracle Text can create a full-text index on 9 million Web pages - and pretty quickly. In a benchmark on a large Sun box, we indexed 100 GB of Web pages (about 15 million) in 7 hours. We can also do partial indexing through regular DML or (in Oracle9i) through partitioning.

You can do "indexing lite" to some extent by disabling theme indexing. You do not need to filter documents if they are already in ASCII, HTML, or XML, and most common expansions, like fuzzy, stemming, and proximity, are done at query time.

FAQs: Searching Attribute Values with Oracle Text

Can I Build Text Indexes on Attribute Values?

Currently Oracle Text has the option to create indexes based on the content of a section group. But most XML elements are of the type Element. So, the only option for searching would be attribute values. Can I build indexes on attribute values?

Answer: Oracle8 release 8.1.6 and higher allow attribute indexing. See the following site:

http://otn.oracle.com/products/intermedia/htdocs/text_training_816/Samples/imt_816_techover.html#SCN

FAQs: Searching XML Documents in CLOBs Using Oracle Text

How Can I Search Different XML Documents Stored in CLOBs?

I store XML in CLOBs and use the DOM or SAX parsers to reparse the XML later as needed. How can I search this document repository? Oracle Text seems ideal. Do you have an example of setting this up using interMedia in Oracle8i, demonstrating how to define the XML_SECTION_GROUP and where to use a ZONE as opposed to a FIELD, and so on? For example:

How would I define interMedia parameters so that I would be able to search my CLOB column for records that had the values aorta and damage using the following XML (the DTD of which is implied)

WellKnownFileName.gif <keyword>echo</keyword>
<keyword>cardiogram aorta</keyword>

This is an image of the vessel damage.

Answer: Oracle8i release 2 (8.1.6) and higher allow searching within attribute text. That's something like: state within book@author. Oracle now offers attribute value sensitive search, more like the following:

state within book[@author = "Eric"]: 

begin  ctx_ddl.create_section_group('mygrp','basic_section_group');
  ctx_ddl.add_field_section('mygrp','keyword','keyword');
  ctx_ddl.add_field_section('mygrp','caption','caption');
end;
create index myidx on mytab(mytxtcolumn)indextype is ctxsys.contextparameters   
('section group mygrp');
select * from mytab where contains(mytxtcolumn, 'aorta within keyword')>0;
options:

Use XML section group instead of basic section group if your tags have attributes or you need case-sensitive tag detection.
Use zone sections instead of field sections if your sections overlap, or if you need to distinguish between instances. For instance, if keywords is a field section, then (aorta and echo cardiogram) within keywords finds the document. If it is a zone section, then it does not, because they are not in the SAME instance of keywords.

How Do I Store an XML Document in a CLOB Using Oracle Text?

I need to store XML files, which are currently on the file system, in the database. I want to store them as whole documents; that is, I do not want to break the document down by tags and then store the info in separate tables or fields. Rather, I want to have a universal table, that I can use to store different XML documents. I think internally it will be stored in a CLOB type of field. My XML files will always contain ASCII data.

Can this be done using Oracle Text? Should we be using Oracle Text or Oracle Text Annotator for this? I downloaded Annotator from OTN, but I could not store XML documents in the database.

I am trying to store XML documents in a CLOB column. Basically I have one table with the following definition:

CREATE TABLE xml_store_testing
(
    xml_doc_id  NUMBER,
    xml_doc     CLOB  )

I want to store my XML document in an xml_doc field.

I have written the following PL/SQL procedure, to read the contents of the XML document. The XML document is available on the file system and contains just ASCII data, no binary data.

CREATE OR REPLACE PROCEDURE FileExec
(
  p_Directory      IN VARCHAR2, 
  p_FileName       IN VARCHAR2)  
  AS   v_CLOBLocator  CLOB;
       v_FileLocator       BFILE;
BEGIN
    SELECT  xml_doc
    INTO      v_CLOBLocator
    FROM    xml_store_testing
    WHERE  xml_doc_id = 1
    FOR        UPDATE;
    v_FileLocator := BFILENAME(p_Directory, p_FileName);
    DBMS_LOB.FILEOPEN(v_FileLocator, DBMS_LOB.FILE_READONLY);
    dbms_output.put_line(to_char(DBMS_LOB.GETLENGTH(v_FileLocator)));
      DBMS_LOB.LOADFROMFILE(v_CLOBLocator, v_FileLocator,
      DBMS_LOB.GETLENGTH(v_FileLocator));
      DBMS_LOB.FILECLOSE(v_FileLocator);
END FileExec;

Answer: Put the XML documents into your CLOB column, then add an Oracle Text index on it using the XML_SECTION_GROUP. See the documentation and overview material at this Web site: http://otn.oracle.com/products/intermedia.

Is Storing XML in CLOBs Affected by Character Set?

When I put my XML documents in a CLOB column, then add an Oracle Text index using the XML section-group, it executes successfully. But when I select from the table I see unknown characters in the table in CLOB field. Could this be because of the character set difference between operating system, where XML file resides, and database, where CLOB data resides?

Answer: Yes. If the character sets are different then you probably have to pass the data through UTL_RAW.CONVERT to do a character set conversion before writing to the CLOB.

Can I Only Insert Structured Data When the Table is Created?

I need to insert data in the database from an XML file. Currently I can only insert structured data with the table already created. Is this correct?

I am working in a law project where we need to store laws containing structured data and unstructured data, and then search the data using Oracle Text. Can I insert unstructured data too? Or do I need to develop a custom application to do it? Then, if I have the data stored with some structured parts and some unstructured parts, can I use Oracle Text to search it? If I stored the unstructured part in a CLOB, and the CLOB has tags, how can I search only data in a specific tag?

Answer: Consider usingOracle9iFS, which enables you to break up a document and store it across tables and in a LOB. Oracle Text can perform data searches with tags and is knowledgeable about the hierarchical XML structure. From Oracle8i release 2 (8.1.6), Oracle Text has had this capability, along with name/value pair attribute searches.

Can I Break an XML Document Without Creating a Custom Development?

Is document breaking possible if I don't create a custom development? Although Oracle Text does not understand hierarchical XML structure, can I do something like this?

<report>
   <day>yesterday</day> there was a disaster <cause>hurricane</cause>
</report>

Indexing with Oracle Text, I would like to search LOBs where cause was hurricane. Is this possible?

Answer: You can perform that level of searching with the current release of Oracle Text. Currently, to break a document up you have to use the XML Parser with XSLT to create a style sheet that transforms the XML into DDL. Oracle9iFS gives you a higher level interface.

Another technique is to use a JDBC program to insert the text of the document or document fragment into a CLOB or LONG column, then do the searching using the CONTAINS() operator after setting up the indexes.

What Is the Syntax for Creating a Substring Index with XML_SECTION_GROUP?

I have a CLOB column that has an existing XML_SECTION_GROUP index on certain tags within the XML content of the CLOB, as follows:

begin
 ctx_ddl.create_section_group('XMLDOC','XML_SECTION_GROUP');
 end;
 /
 begin
 ctx_ddl.add_zone_section ('XMLDOC','title','title');
 ctx_ddl.add_zone_section('XMLDOC','keywords','keywords');
 ctx_ddl.add_zone_section('XMLDOC','author','author');
 end;
 /
 create index xmldoc_idx on xml_documents(xmldoc)
 indextype is ctxsys.context
 parameters ('section group xmldoc') ;

I need to search on the 'author' zone section by the first letter only. I believe I should use a substring index but I am unsure of the syntax to create a substring index. Especially when I have already declared a SECTION_GROUP preference on this column and I would also need to create a WORDLIST preference.

Answer. The main problem here is that you cannot apply that fancy substring processing just to the author section. It will apply to everything, which will probably blow up the index size. Anything you do will require reindexing the documents, so you cannot really get around having to rebuild the index entirely. Here are various ways to solve your problem:

Do nothing. Query just like: Z% WITHIN AUTHOR

Pro: You do not have to rebuild the index.

Con: The query is slow. Some queries cannot be executed due to wildcard maxterms limits.
Create a wordlist preference with PREFIX_INDEX set to TRUE, PREFIX_MIN_LENGTH set to 1, and PREFIX_MAX_LENGTH set to 1. The query looks like: Z% WITHIN AUTHOR

Pro: This is a moderately fast query.

Con: You must use Oracle8i Release 3 (8.1.7) or higher or you will get 'junk' from words from other sections.
As in the preceding, plus make AUTHOR, KEYWORDS, TITLE field sections.

Pro: This faster query than 2.

Con: The field sections are less flexible with regards to. nesting and repeating occurrences.
Use a user_datastore or procedure_filter to transform the data so that:
```
<AUTHOR>Steven King</AUTHOR>
```
becomes
```
<AUTHORINIT>AIK</AUTHORINIT><AUTHOR>Steven King<AUTHOR>
```
Use field section for AUTHORINIT and query becomes:
```
AIK within AUTHORINIT
```
I used AIK instead of just K so that you do not have to make I and A non-stopwords.

Pro: This is the fastest query and the smallest index.

Con: It involves the most work as you have to massage the data so it slows down indexing.

Why Does the XML Search for Topic X with Relevance Y Give Wrong Results?

We are using Sun SPARC Solaris 5.8, Oracle8i Enterprise Edition Release 3 (8.1.7.2.0), Oracle Text. We are indexing XML documents that contain attributes within the XML tags. One of the sections in the XML is a list of subjects associated with the document. Each subject has a relevance associated with it. We want to search for topic x with relevance y but we get the wrong results. For example: The data in some of the rows look like this, considering subject PA:

DOC  1 --> Story_seq_num = 561106
<ne-metadata.subjectlist>
  <ne-subject code="PA" source="NEWZ" relevance="50" confidence="100"/>
  <ne-subject code="CONW" source="NEWZ" relevance="100" confidence="100"/>
  <ne-subject code="LENF" source="NEWZ" relevance="100" confidence="100"/>
  <ne-subject code="TRAN" source="NEWZ" relevance="100" confidence="100"/>
</ne-metadata.subjectlist>
DOC  2 --> Story_seq_num =561107
<ne-metadata.subjectlist>
  <ne-subject code="CONW" source="NEWZ" relevance="100" confidence="100"/>
...

If users wants subject PA with relevance = 100, only DOC 2 should be returned. Here is a test case showing the results:

Are these the expected results?

TABLE

drop table t_stories1 ;
create table t_stories1 as select * from t_Stories_bck
where story_Seq_num in (561114,562571,562572,561106,561107);

INDEX SECTIONS

BEGIN
-- Drop the preference if it already exists
CTX_DDL.DROP_SECTION_GROUP('sg_nitf_story_body2');
END;
/
BEGIN
--Define a section group
ctx_ddl.create_section_group ('sg_nitf_story_body2','xml_section_group');
-- Create field sections for headline and body
ctx_ddl.add_field_section('sg_nitf_story_body2','HL','hedline',true);
ctx_ddl.add_field_section('sg_nitf_story_body2','ST','body.content', true);
--Define attribute sections for the source fields
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'P', 'ne-provider@id');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'C', 'ne-publication@id');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'S', 'ne-publication@section');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'D', 'date.issue@norm');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'SJ', 'ne-subject@code');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'SJR', 'ne-subject@relevance');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'CO', 'ne-company@code');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'TO', 'ne-topic@code');
ctx_ddl.add_attr_section( 'sg_nitf_story_body2', 'TK', 'ne-orgid@value');

 ENd;
 /

creating the index

drop index ix_stories ;
CREATE INDEX ix_stories on T_STORIES1(STORY_BODY)
 INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('SECTION GROUP sg_nitf_story_body2 STORAGE ixst_story_body  ');

-- testing the index

--We are looking for the subject PA with relevance = 100

--query that gives us the correct results

SELECT STORY_SEQ_NUM, STORY_BODY FROM T_STORIES1 WHERE CONTAINS(STORY_BODY, 'PA 
WITHIN SJ')>0;

--Query that gives us the wrong results

SELECT STORY_SEQ_NUM, STORY_BODY  FROM T_STORIES1 WHERE CONTAINS(STORY_BODY, 'PA 
WITHIN SJ AND 100 within SJR')>0;

The data in some of the rows look like this:

Story_seq_num = 561106
<ne-metadata.subjectlist>
   <ne-subject code="PA" source="NEWZ" relevance="50" confidence="100"/>
   <ne-subject code="CONW" source="NEWZ" relevance="100" confidence="100"/>
   <ne-subject code="LENF" source="NEWZ" relevance="100" confidence="100"/>

<ne-subject code="TRAN" source="NEWZ" relevance="100" confidence="100"/>

</ne-metadata.subjectlist>

Story_seq_num =561107
<ne-metadata.subjectlist>
   <ne-subject code="CONW" source="NEWZ" relevance="100" confidence="100"/>

...

We are looking for the subject PA with relevance = 100

Only Story_seq_num = 561107 should be returned

The results are wrong because we wanted the subjects PA that have relevance =100. We get back story_seq_num=561106 that has relevance = 50 <ne-subject code="PA" source="NEWZ" relevance="50" confidence="100"/>

SQL> connect sosa/sosa
Connected.
SQL> select object_name, object_type from user_objects;

OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
------------------
IX_STORIES
INDEX
SYS_LOB0000025364C00005$$
LOB
SYS_LOB0000025364C00009$$
LOB

OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
------------------
SYS_LOB0000025364C00014$$
LOB
SYS_LOB0000025364C00016$$
LOB
T_STORIES1
TABLE

6 rows selected.

SQL> drop index ix_stories force;

Index dropped....

Answer. Oracle8i Release 3(8.1.7) is not able to this kind of search. You need the PATH section group in Oracle9i Release 1 (9.0.1), which has a much more sophisticated understanding of such relationships. To do this in 8.1.7 you would have to re-format the documents (possibly through a procedure filter or user datastore), use zone sections, and nested withins, so that:

<A B="C" D="E">...

became

<A><B>C</B><D>E</D>...

and queries are like:

(C within B and E within D) within A in 9.0.1, you should be able to use PATH_SECTION_GROUP on the unmodified data, with a query like:

haspath(//ne-subject[@code = "PA" and @relevance = "100"])

7 Searching XML Data with Oracle Text

Searching XML Data with Oracle Text

Introducing Oracle Text

Accessing Oracle Text

Oracle Text Now Supports XMLType

Further Oracle Text Examples

Assumptions Made in This Chapter's Examples

Oracle Text Users and Roles

User CTXSYS

Role CTXAPP

Querying with the CONTAINS Operator

CONTAINS Syntax

Table 7-1 CONTAINS Operator: Syntax Description

Example 7-1 Using a Simple SELECT Statement with CONTAINS

Example 7-2 Using the Score Operator with a Label to Obtain the Relevance

Using the WITHIN Operator to Narrow Query Down to Document Sections

Introducing SECTION_GROUPS

XML_SECTION_GROUP

Zone Sections: CTX_DLL.ADD_ZONE_SECTION Procedure

Field Sections: CTX_DLL.ADD_FIELD_SECTION Procedure

Attribute Section: CTX_DLL.ADD_ATTR_SECTION Procedure

Constraints for Querying Attribute or Field Sections

AUTO_ SECTION_GROUP/ PATH_SECTION_GROUP for INPATH and HASPATH

Dynamically Adding Sections or Stop Section Using ALTER INDEX

WITHIN Syntax for Section Querying

WITHIN Operator Limitations

INPATH or HASPATH Operators Search Using XPath-Like Expressions

Path Indexing and Path Querying with Oracle Text

Path Indexing

Path Querying

Using INPATH Operator for Path Searching in XML Documents

Table 7-2 Path Searching XML Documents Using the INPATH Operator

Using HASPATH Operator for Path Searching in XML Documents

Using HASPATH Operator for Path Existence Searching

Tag Value Equality Testing

Building a Query Application with Oracle Text

What Role Do You Need?

Step 1. Create a Section Group Preference

Table 7-3 Comparing Oracle Text Section Groups

Deciding Which Section Group to Use

Table 7-4 Guidelines for Choosing XML_, AUTO_, or PATH_ Section Groups

Creating a Section Preference with XML_SECTION_GROUP

Creating a Section Preference with AUTO_SECTION_GROUP

Creating a Section Preference with PATH_SECTION_GROUP

Step 2. Set the Preference's Attributes

2.1 XML_SECTION_GROUP: Using CTX_DDL.add_zone_section

add_zone_section Guidelines

2.2 XML_SECTION_GROUP: Using CTX_DDL.Add_Attr_Section

Add_Attr_Section Guidelines

2.3 XML_SECTION_GROUP: Using CTX_DDL.Add_Field_Section

Add_Field_Section Guidelines

How Attr_Section Differs from Field_Section

2.5 AUTO_SECTION_GROUP: Using CtX_DDL.Add_Stop_Section

Step 3. Create an Index Using the Section Preference Created in Step 2

Creating an Index Using XML_SECTION_GROUP

Creating an Index Using AUTO_SECTION_GROUP

Creating an Index Using PATH_SECTION_GROUP

Example 7-3 Creating an Index Using XML_SECTION_GROUP

Step 4. Create Your Query Syntax

Querying Within Attribute Sections

Example 7-4 Querying an XML Document

Example 7-5 Creating an Index and Performing a Text Query

Example 7-6 Text Query Using "ABOUT" in the Text Query Expression

Example 7-7 Creating an Index Using AUTO_SECTION_GROUP

Example 7-8 Creating an Index Using PATH_SECTION_GROUP

Example 7-9 Using XML_SECTION_GROUP and add_attr_section to Aid Querying

Presenting the Results of Your Query

XMLType Indexing

You Need Query Rewrite Privileges

System Parameter is Set to the Default, CTXSYS.PATH_SECTION_GROUP

XMLType Indexes Work Like Other Oracle Text Indexes

Example 7-10 Creating a Text Index on XMLType Columns

Using Oracle Text with Oracle XML DB

Creating an Oracle Text Index on an UriType Column

Example 7-11 Creating an Oracle Text Index on a UriType Column

Table 7-5 rUriType Column Default Preference Names for Oracle Text Indexing

Querying XML Data: Use CONTAINS or existsNode()?

Table 7-6 Using CONTAINS() and existsNode() to Search XMLType Data

Full-Text Search Functions in XPath Using ora:contains

ora:contains Features

7
Searching XML Data with Oracle Text