Oracle® Database Application Developer's Guide - Fundamentals 10g Release 1 (10.1) Part Number B10795-01
This chapter introduces regular expression support for Oracle Database.
Regular expressions specify patterns to search for in string data using standardized syntax conventions. A regular expression can specify complex patterns of character sequences. For example, the following regular expression:
a(b|c)d
searches for the pattern: 'a', followed by either 'b' or 'c', then followed by 'd'. This regular expression matches both 'abd' and 'acd'.
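The behavior described above can be checked with a minimal sketch. It uses Python's `re` module as a stand-in, which is not Oracle's POSIX ERE implementation but should agree with it on a pattern this simple. The helper name `matches_whole` is hypothetical.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# 'a', then either 'b' or 'c', then 'd'
PATTERN = r"a(b|c)d"
results = {text: matches_whole(PATTERN, text) for text in ("abd", "acd", "aed", "ad")}
print(results)
```

Only 'abd' and 'acd' are reported as matches; 'aed' and 'ad' are rejected.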
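A regular expression is specified using two types of characters: metacharacters, which are operators that control how the search is performed, and literals, which are the actual characters to search for.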
Examples of regular expression syntax are given later in this chapter.
Oracle Database implements regular expression support compliant with the POSIX Extended Regular Expression (ERE) specification.
Regular expression support is implemented with a set of Oracle Database SQL functions that allow you to search and manipulate string data. You can use these functions in any environment where Oracle Database SQL is used. See "Oracle Database SQL Functions for Regular Expressions" later in this chapter for more information.
Oracle Database supports a set of common metacharacters used in regular expressions. The behavior of supported metacharacters and related features is described in "Metacharacters Supported in Regular Expressions".
The database provides a set of SQL functions that allow you to search and manipulate strings using regular expressions. You can use these functions on any datatype that holds character data such as CHAR, NCHAR, CLOB, NCLOB, NVARCHAR2, and VARCHAR2.
A regular expression must be enclosed in single quotes. Doing so ensures that the entire expression is interpreted by the SQL function and can improve the readability of your code.
Table 12-1 gives a brief description of each regular expression function.
SQL Function | Description |
---|---|
REGEXP_LIKE | This function searches a character column for a pattern. Use this function in the WHERE clause of a query to return rows matching the regular expression you specify. See the Oracle Database SQL Reference for syntax details on the REGEXP_LIKE function. |
REGEXP_REPLACE | This function searches for a pattern in a character column and replaces each occurrence of that pattern with the pattern you specify. See the Oracle Database SQL Reference for syntax details on the REGEXP_REPLACE function. |
REGEXP_INSTR | This function searches a string for a given occurrence of a regular expression pattern. You specify which occurrence you want to find and the start position to search from. This function returns an integer indicating the position in the string where the match is found. See the Oracle Database SQL Reference for syntax details on the REGEXP_INSTR function. |
REGEXP_SUBSTR | This function returns the actual substring matching the regular expression pattern you specify. See the Oracle Database SQL Reference for syntax details on the REGEXP_SUBSTR function. |
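To make the roles of the four functions concrete, here is a rough sketch of Python analogues built on the `re` module. These are hypothetical helpers, not Oracle's implementation: they handle only the first occurrence, omit the start position, occurrence, and match parameters, and assume 1-based positions with 0 meaning "no match" for the `regexp_instr` analogue. Python's `re` syntax also differs from POSIX ERE in places.

```python
import re
from typing import Optional

def regexp_like(source: str, pattern: str) -> bool:
    """True if the pattern occurs anywhere in source."""
    return re.search(pattern, source) is not None

def regexp_replace(source: str, pattern: str, replacement: str) -> str:
    """Replace every occurrence of the pattern in source."""
    return re.sub(pattern, replacement, source)

def regexp_instr(source: str, pattern: str) -> int:
    """1-based position of the first match, or 0 if there is none (assumed convention)."""
    match = re.search(pattern, source)
    return match.start() + 1 if match else 0

def regexp_substr(source: str, pattern: str) -> Optional[str]:
    """The first matching substring, or None if there is none."""
    match = re.search(pattern, source)
    return match.group(0) if match else None

print(regexp_like("xxabdyy", r"a(b|c)d"))
print(regexp_replace("abd and acd", r"a(b|c)d", "X"))
print(regexp_instr("xxabd", r"a(b|c)d"))
print(regexp_substr("xxacdyy", r"a(b|c)d"))
```

For exact argument lists and return values, the Oracle Database SQL Reference remains the authority.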
Table 12-2 lists the metacharacters supported for use in regular expressions passed to SQL regular expression functions. Details on the matching behavior of these metacharacters are given in "Constructing Regular Expressions".
This section discusses construction of regular expressions.
The simplest match that you can perform with regular expressions is the basic string match. For this type of match, the regular expression is a string of literals with no metacharacters. For example, to find the sequence 'abc', you specify the regular expression:
abc
As mentioned earlier, regular expressions are constructed using metacharacters and literals. Metacharacters that operate on a single literal, such as '+' and '?', can also operate on a sequence of literals or on a whole expression. To do so, you use the grouping operator to enclose the sequence or subexpression. See "Subexpression" for more information on grouping.
This section gives usage examples for each supported metacharacter or regular expression operator.
The dot operator '.' matches any single character in the current character set. For example, to find the sequence 'a', followed by any character, followed by 'c', use the expression:
a.c
This expression matches all of the following sequences:
abc adc a1c a&c
The expression does not match:
abb
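The dot operator examples can be reproduced with a short sketch, again using Python's `re` module as an assumed stand-in for Oracle's engine. The names `DOT_PATTERN` and `dot_results` are illustrative only.

```python
import re

# 'a', then any single character, then 'c'
DOT_PATTERN = re.compile(r"a.c")

candidates = ["abc", "adc", "a1c", "a&c", "abb"]
dot_results = {text: DOT_PATTERN.fullmatch(text) is not None for text in candidates}
print(dot_results)
```

Every candidate except 'abb' matches, because 'abb' does not end in 'c'.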
The one or more operator '+' matches one or more occurrences of the preceding expression. For example, to find one or more occurrences of the character 'a', use the regular expression:
a+
This expression matches all of the following:
a aa aaa
The expression does not match:
bbb
The zero or one operator '?' matches zero or one occurrence of the preceding character or subexpression. You can think of this operator as specifying an expression that is optional in the source text.
For example, to find 'a', optionally followed by 'b', then followed by 'c', use the following regular expression:
ab?c
This expression matches:
abc ac
The expression does not match:
adc abbc
The zero or more operator '*' matches zero or more occurrences of the preceding character or subexpression. For example, to find 'a', followed by zero or more occurrences of 'b', then followed by 'c', use the regular expression:
ab*c
This expression matches all of the following sequences:
ac abc abbc abbbbc
The expression does not match:
adc
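The three quantifiers '+', '?', and '*' can be compared side by side. The sketch below is an illustration in Python's `re` module, assumed to behave like Oracle's engine for these patterns; `whole_match` and `quantifier_results` are hypothetical names.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each pattern is paired with the sample strings used in the text above.
checks = {
    r"a+": ["a", "aa", "aaa", "bbb"],
    r"ab?c": ["abc", "ac", "adc", "abbc"],
    r"ab*c": ["ac", "abc", "abbc", "abbbbc", "adc"],
}

quantifier_results = {
    pattern: {text: whole_match(pattern, text) for text in texts}
    for pattern, texts in checks.items()
}

for pattern, outcome in quantifier_results.items():
    print(pattern, outcome)
```

Note that 'abbc' fails `ab?c` (at most one 'b' is allowed) but passes `ab*c` (any number of 'b' characters is allowed).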
The exact-count interval operator is specified with a number enclosed in braces. You use this operator to search for an exact number of occurrences of the preceding character or subexpression.
For example, to find where 'a' occurs exactly 5 times, you specify the regular expression:
a{5}
This expression matches:
aaaaa
The expression does not match:
aaaa
You use the at-least-count interval operator to search for a specified number of occurrences, or more, of the preceding character or subexpression. For example, to find where 'a' occurs at least 3 times, you use the regular expression:
a{3,}
This expression matches all of the following:
aaa aaaaa
The expression does not match:
aa
You use the between-count interval operator to search for a number of occurrences within a specified range. For example, to find where 'a' occurs at least 3 times and no more than 5 times, you use the following regular expression:
a{3,5}
This expression matches all of the following sequences:
aaa aaaa aaaaa
The expression does not match:
aa
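The interval operators can be exercised together. This is a small sketch in Python's `re` module, used here as an assumed stand-in for Oracle's engine; the name `interval_results` is illustrative.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["aa", "aaa", "aaaa", "aaaaa", "aaaaaa"]

interval_results = {
    pattern: [text for text in samples if whole_match(pattern, text)]
    for pattern in (r"a{5}", r"a{3,}", r"a{3,5}")
}
print(interval_results)

# A search (rather than a whole-string match) can still find a match
# inside a longer run of 'a' characters.
inner = re.search(r"a{3,5}", "aaaaaa").group(0)
print(inner)
```

The sketch matches whole strings, so six 'a' characters fail `a{3,5}`. A substring search over the same text still succeeds, because a run of five 'a' characters occurs inside it.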
You use the matching character list to search for an occurrence of any character in a list. For example, to find either 'a', 'b', or 'c', use the following regular expression:
[abc]
This expression matches the first character in each of the following strings:
at bet cot
The expression does not match:
def
The following regular expression operators are allowed within the character list. Any other metacharacters included in a character list lose their special meaning and are treated as literals:
Range operator '-'
POSIX character class [: :]
POSIX collating sequence [. .]
POSIX character equivalence class [= =]
Use the non-matching character list to specify characters that you do not want to match. Characters that are not in the non-matching character list are returned as a match. For example, to exclude the characters 'a', 'b', and 'c' from your search results, use the following regular expression:
[^abc]
This expression matches characters 'd' and 'g' in the following strings:
abcdef ghi
The expression does not match:
abc
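As with the matching character list, the following regular expression operators are allowed within the non-matching character list. Any other metacharacters included in the list lose their special meaning and are treated as literals:
Range operator '-'
POSIX character class [: :]
POSIX collating sequence [. .]
POSIX character equivalence class [= =]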
For example, the following regular expression excludes any character between 'a' and 'i' from the search result:
[^a-i]
This expression matches the characters 'j' and 'l' in the following strings:
hijk lmn
The expression does not match the characters:
abcdefghi
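The matching and non-matching character lists can be compared by looking at the first character each one finds. The sketch below uses Python's `re` module as an assumed stand-in for Oracle's engine; `first_match` is a hypothetical helper.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring matching the pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Matching list: the first character that is 'a', 'b', or 'c'
in_list = [first_match(r"[abc]", text) for text in ("at", "bet", "cot", "def")]

# Non-matching list: the first character that is not 'a', 'b', or 'c'
not_in_list = [first_match(r"[^abc]", text) for text in ("abcdef", "ghi", "abc")]

# Non-matching list with a range: the first character outside 'a' through 'i'
not_in_range = [first_match(r"[^a-i]", text) for text in ("hijk", "lmn", "abcdefghi")]

print(in_list, not_in_list, not_in_range)
```

A result of `None` corresponds to the "does not match" cases in the text.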
Use the Or operator '|' to specify an alternate expression. For example, to match 'a' or 'b', use the following regular expression:
a|b
You can use the subexpression operator to group characters that you want to find as a string or to create a complex expression. For example, to find the optional string 'abc', followed by 'def', use the following regular expression:
(abc)?def
This expression matches strings 'abcdef' and 'def' in the following strings:
abcdefghi defghi
The expression does not match the string:
ghi
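The optional subexpression can be illustrated with a brief sketch, written in Python's `re` module as an assumed stand-in for Oracle's engine. The helper `first_match` is hypothetical.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring matching the pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

# The '?' applies to the whole group 'abc', not just to the letter 'c'.
OPTIONAL_GROUP = r"(abc)?def"
group_results = {
    text: first_match(OPTIONAL_GROUP, text)
    for text in ("abcdefghi", "defghi", "ghi")
}
print(group_results)

# Without the parentheses, only the 'c' is optional.
ungrouped = first_match(r"abc?def", "abdef")
print(ungrouped)
```

The second pattern shows why grouping matters: `abc?def` accepts 'abdef', which `(abc)?def` would match only from 'def' onward.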
The backreference lets you search for a repeated expression. You specify a backreference with '\n', where n is an integer from 1 to 9 indicating the nth preceding subexpression in your regular expression.
For example, to find a repeated occurrence of either string 'abc' or 'def', use the following regular expression:
(abc|def)\1
This expression matches the following strings:
abcabc defdef
The expression does not match the following strings:
abcdef abc
The backreference counts subexpressions from left to right starting with the opening parenthesis of each preceding subexpression.
The backreference lets you search for a repeated string without knowing the actual string ahead of time. For example, the regular expression:
^(.*)\1$
matches a line consisting of two adjacent appearances of the same string.
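Both backreference patterns above can be tried in a short sketch. It uses Python's `re` module, which also supports `\1` backreferences, as an assumed stand-in for Oracle's engine; the names `REPEATED` and `DOUBLED_LINE` are illustrative.

```python
import re

# \1 must repeat whatever the first group actually matched.
REPEATED = re.compile(r"(abc|def)\1")
repeat_results = {
    text: REPEATED.fullmatch(text) is not None
    for text in ("abcabc", "defdef", "abcdef", "abc")
}
print(repeat_results)

# A line made of two adjacent copies of the same, unknown string.
DOUBLED_LINE = re.compile(r"^(.*)\1$")
doubled = DOUBLED_LINE.search("hellohello")
repeated_part = doubled.group(1) if doubled else None
not_doubled = DOUBLED_LINE.search("hello")
print(repeated_part, not_doubled)
```

'abcdef' is rejected because `\1` refers to the text the group matched ('abc'), not to the group's pattern.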
Use the escape character '\' to search for a character that is normally treated as a metacharacter. For example, to search for the '+' character, use the following regular expression:
\+
This expression matches the plus character '+' in the following string:
abc+def
The expression does not match any characters in the string:
abcdef
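The effect of escaping can be seen in a minimal sketch, using Python's `re` module as an assumed stand-in for Oracle's engine. The variable names are illustrative.

```python
import re

# The backslash makes '+' a literal character instead of a quantifier.
LITERAL_PLUS = re.compile(r"\+")

found = LITERAL_PLUS.search("abc+def")
plus_index = found.start() if found else None  # 0-based index in Python
missing = LITERAL_PLUS.search("abcdef")
print(plus_index, missing)
```

The pattern finds the '+' in 'abc+def' and finds nothing in 'abcdef'.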
Use the beginning of line anchor '^' to search for an expression that occurs only at the beginning of a line. For example, to find an occurrence of the string 'def' at the beginning of a line, use the expression:
^def
This expression matches 'def' in the string:
defghi
The expression does not match 'def' in the following string:
abcdef
The end of line anchor metacharacter '$' lets you search for an expression that occurs only at the end of a line. For example, to find an occurrence of 'def' that occurs at the end of a line, use the following expression:
def$
This expression matches 'def' in the string:
abcdef
The expression does not match 'def' in the following string:
defghi
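The two anchors can be contrasted on the same pair of strings. This sketch uses Python's `re` module as an assumed stand-in for Oracle's engine; `anchor_results` is an illustrative name.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

anchor_results = {
    pattern: {text: found(pattern, text) for text in ("defghi", "abcdef")}
    for pattern in (r"^def", r"def$")
}
print(anchor_results)
```

Each anchor accepts exactly one of the two strings: `^def` requires 'def' at the start, and `def$` requires it at the end.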
The POSIX character class operator lets you search for an expression within a character list that is a member of a specific POSIX Character Class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported.
To use this operator, specify the expression using the syntax [:class:] where class is the name of the POSIX character class to search for. For example, to search for one or more consecutive uppercase characters, use the following regular expression:
[[:upper:]]+
This expression matches 'DEF' in the string:
abcDEFghi
The expression does not return a match for the following string:
abcdefghi
Note that the character class must occur within a character list, so the character class is always nested within the brackets for the character list in the regular expression.
See Also:
Mastering Regular Expressions, published by O'Reilly & Associates, Inc., for more information on POSIX character classes
The POSIX collating sequence element operator [. .] lets you use a collating sequence in your regular expression. The element you specify must be a defined collating sequence in the current locale.
This operator lets you use a multicharacter collating sequence in your regular expression where only one character would otherwise be allowed. For example, you can use this operator to ensure that the collating sequence 'ch', when defined in a locale such as Spanish, is treated as one character in operations that depend on the ordering of characters.
To use the collating sequence operator, specify [.element.] where element is the collating sequence you want to find. You can use any collating sequence that is defined in the current locale including single-character elements as well as multicharacter elements.
For example, to find the collating sequence 'ch', use the following regular expression:
[[.ch.]]
This expression matches the sequence 'ch' in the following string:
chabc
The expression does not match the following string:
cdefg
You can use the collating sequence operator in any regular expression where collation is needed. For example, to specify the range from 'a' to 'ch', you can use the following expression:
[a-[.ch.]]
Use the POSIX character equivalence class operator to search for characters in the current locale that are equivalent. For example, you can use it to find the Spanish character 'ñ' as well as 'n'.
To use this operator, specify [=character=] to find all characters that are members of the same character equivalence class as the specified character.
For example, the following regular expression could be used to search for characters equivalent to 'n' in a Spanish locale:
[[=n=]]
This expression matches both 'N' and 'ñ' in the following string:
El Niño