12
Customizing Locale Data

This chapter shows how to customize locale data.

Overview of the Oracle Locale Builder Utility

The Oracle Locale Builder offers an easy and efficient way to customize locale data. It provides a graphical user interface through which you can easily view, modify, and define locale-specific data. It extracts data from the text and binary definition files and presents them in a readable format so that you can process the information without worrying about the formats used in these files.

The Oracle Locale Builder handles four types of locale definitions: language, territory, character set, and linguistic sort. It also supports user-defined characters and customized linguistic rules. You can view definitions in existing text and binary definition files and make changes to them or create your own definitions.

Configuring Unicode Fonts for the Oracle Locale Builder

The Oracle Locale Builder uses Unicode characters in many of its functions. For example, it shows the mapping of local character code points to Unicode code points. Therefore, Oracle Corporation recommends that you use a Unicode font to fully support the Oracle Locale Builder. If a character cannot be rendered with your local fonts, then it will probably be displayed as an empty box.

Font Configuration on Windows

There are many Windows TrueType and OpenType fonts that support Unicode. Oracle Corporation recommends using the Arial Unicode MS font from Microsoft, because it includes about 51,000 glyphs and supports most of the characters in Unicode 3.1.

After installing the Unicode font, add the font to the Java Runtime font.properties file so it can be used by the Oracle Locale Builder. The font.properties file is located in the $JAVAHOME/lib directory. For example, for the Arial Unicode MS font, add the following entries to the font.properties file:

dialog.n=Arial Unicode MS, DEFAULT_CHARSET
dialoginput.n=Arial Unicode MS, DEFAULT_CHARSET
serif.n=Arial Unicode MS, DEFAULT_CHARSET
sansserif.n=Arial Unicode MS, DEFAULT_CHARSET

n is the next available sequence number to assign to the Arial Unicode MS font in the font list. Java Runtime searches the font mapping list for each virtual font and uses the first font available on your system.
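
As a rough illustration, the four entries can be generated for a given sequence number. The following Python sketch is not part of the Oracle Locale Builder; the helper name font_entries is hypothetical, and the value of n must still be found by inspecting your own font.properties file.

```python
def font_entries(font_name: str, n: int, charset: str = "DEFAULT_CHARSET") -> list[str]:
    """Build font.properties lines for the four virtual fonts named above."""
    virtual_fonts = ["dialog", "dialoginput", "serif", "sansserif"]
    return [f"{vf}.{n}={font_name}, {charset}" for vf in virtual_fonts]


if __name__ == "__main__":
    # n=3 is only an example; use the next free sequence number in your file.
    for line in font_entries("Arial Unicode MS", 3):
        print(line)
```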

After you edit the font.properties file, restart the Oracle Locale Builder.

See Also:

Sun's internationalization website for more information about the font.properties file

Font Configuration on Other Platforms

There are fewer choices of Unicode fonts for non-Windows platforms than for Windows platforms. If you cannot find a Unicode font with satisfactory character coverage, then use multiple fonts for different languages. Install each font and add the font entries into the font.properties file using the steps described for the Windows platform.

For example, to display Japanese characters on Sun Solaris using the font ricoh-hg mincho, add entries to the existing font.properties file in $JAVAHOME/lib in the dialog, dialoginput, serif, and sansserif sections. The following is a sample entry for the serif section:

serif.plain.3=-ricoh-hg mincho l-medium-r-normal--*-%d-*-*-m-*-jisx0201.1976-0

See Also:

Your platform-specific documentation for more information about available fonts

The Oracle Locale Builder User Interface

Ensure that the ORACLE_HOME environment variable is set before starting the Oracle Locale Builder.

Start the Oracle Locale Builder by changing into the $ORACLE_HOME/ocommon/nls/lbuilder directory and issuing the following command:

% lbuilder

After you start the Oracle Locale Builder, the screen shown in Figure 12-1 appears.

Figure 12-1 Oracle Locale Builder Utility

Oracle Locale Builder Screens and Dialog Boxes

Before using Oracle Locale Builder for a specific task, you should become familiar with screens and dialog boxes that include the following:

Existing Definitions Dialog Box
Session Log Dialog Box
Preview NLT Screen
Open File Dialog Box

Note:
Oracle Locale Builder includes online help.

Existing Definitions Dialog Box

When you choose New Language, New Territory, New Character Set, or New Linguistic Sort, the first screen you see is labelled General. Click Show Existing Definitions to see the Existing Definitions dialog box.

The Existing Definitions dialog box enables you to open locale objects by name. If you know a specific language, territory, linguistic sort (collation), or character set that you want to start with, click its displayed name. For example, you can open the AMERICAN language definition file as shown in Figure 12-2.

Figure 12-2 Existing Definitions Dialog Box

Choosing AMERICAN opens the lx00001.nlb file.

Language and territory abbreviations are for reference only and cannot be opened.

Session Log Dialog Box

In the Tools menu, choose View Log to see the Session Log dialog box. The Session Log dialog box shows what actions have been taken in the current session. The Save Log button enables you to keep a record of all changes. Figure 12-3 shows an example of a session log.

Figure 12-3 Session Log Dialog Box

Preview NLT Screen

The NLT file is a text file with the file extension .nlt that shows the settings for a specific language, territory, character set, or linguistic sort. The Preview NLT screen presents a readable form of the file so that you can see whether the changes you have made look correct. You cannot modify the NLT file from the Preview NLT screen. You must use the specific elements of the Oracle Locale Builder to modify the NLT file.

Figure 12-4 shows an example of the Preview NLT screen for a user-defined language called AMERICAN FRENCH.

Figure 12-4 Previewing the NLT File

Open File Dialog Box

You can see the Open File dialog box by going to the File menu, choosing Open, and choosing By File Name. Then choose the NLB file that you want to modify or use as a template. An NLB file is a binary file with the file extension .nlb that contains the binary equivalent of the information in the NLT file. Figure 12-5 shows the Open File dialog box with the lx00001.nlb file selected. The Preview panel shows that this NLB file is for the AMERICAN language.

Figure 12-5 Open File Dialog Box

Creating a New Language Definition with the Oracle Locale Builder

This section shows how to create a new language based on French. This new language is called AMERICAN FRENCH. First, open FRENCH from the Existing Definitions dialog box. Then change the language name to AMERICAN FRENCH and the Language Abbreviation to AF in the General dialog box. Leave the default values for the other settings. Figure 12-6 shows the resulting General dialog box.

Figure 12-6 Language General Information

The following restrictions apply when choosing names for locale objects such as languages:

Names must contain only ASCII characters
Names must start with a letter
Language, territory, and character set names cannot contain underscores
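
The restrictions above can be checked before a name is entered in the General dialog box. The following Python sketch is only an illustration; the function name check_locale_name and the kind labels are hypothetical and are not part of the Oracle Locale Builder.

```python
def check_locale_name(name: str, kind: str) -> list[str]:
    """Return a list of rule violations for a proposed locale object name."""
    problems = []
    if not name.isascii():
        problems.append("name must contain only ASCII characters")
    if not name[:1].isalpha():
        problems.append("name must start with a letter")
    # The underscore rule is stated for these three kinds only; a linguistic
    # sort name such as MY_GENERIC_M may contain underscores.
    if kind in {"language", "territory", "character set"} and "_" in name:
        problems.append(f"{kind} names cannot contain underscores")
    return problems


if __name__ == "__main__":
    print(check_locale_name("AMERICAN FRENCH", "language"))
    print(check_locale_name("MY_CHARSET", "character set"))
```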

The valid range for the language ID field for a user-defined language is 1,000 to 10,000. You can accept the value provided by Oracle Locale Builder or you can specify a value within the range.

Note:

Only certain ID ranges are valid values for user-defined LANGUAGE, TERRITORY, CHARACTER SET, MONOLINGUAL COLLATION, and MULTILINGUAL COLLATION definitions. The ranges are specified in the sections of this chapter that concern each type of user-defined locale object.
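
The ID ranges stated in this chapter can be collected in one place. The following Python sketch assumes that both end points of each range are valid values, which the text does not state explicitly; the names VALID_ID_RANGES and is_valid_user_defined_id are hypothetical.

```python
# Valid ID ranges for user-defined locale objects, as stated in this chapter.
# Treating both end points as inclusive is an assumption.
VALID_ID_RANGES = {
    "LANGUAGE": (1000, 10000),
    "TERRITORY": (1000, 10000),
    "CHARACTER SET": (10000, 20000),
    "MONOLINGUAL COLLATION": (1000, 2000),
    "MULTILINGUAL COLLATION": (10000, 11000),
}


def is_valid_user_defined_id(object_type: str, object_id: int) -> bool:
    """Check an ID against the range for its locale object type."""
    low, high = VALID_ID_RANGES[object_type]
    return low <= object_id <= high


if __name__ == "__main__":
    print(is_valid_user_defined_id("TERRITORY", 1001))
    print(is_valid_user_defined_id("LANGUAGE", 999))
```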

Figure 12-7 shows how to set month names using the Month Names tab.

Figure 12-7 Language Definition Month Information

All names are shown as they appear in the NLT file. If you choose Yes for capitalization, the month names are capitalized in your application, but they do not appear capitalized in the Month Names screen.

Figure 12-8 shows the Day Names screen.

Figure 12-8 Language Definition Type Information

You can choose day names for your user-defined language. All names are shown as they appear in the NLT file. If you choose Yes for capitalization, the day names are capitalized in your application, but they do not appear capitalized in the Day Names screen.

Creating a New Territory Definition with the Oracle Locale Builder

This section shows how to create a new territory called REDWOOD SHORES and use RS as a territory abbreviation. The new territory is not based on an existing territory definition.

The basic tasks are to assign a territory name and choose formats for the calendar, numbers, date and time, and currency. Figure 12-9 shows the General screen with REDWOOD SHORES set as the Territory Name, 1001 set as the Territory ID, and RS set as the Territory Abbreviation.

Figure 12-9 Defining a New Territory

The valid range for a territory ID for a user-defined territory is 1,000 to 10,000.

Figure 12-10 shows settings for calendar formats.

Figure 12-10 Choosing a Calendar Format

Tuesday is set as the first day of the week, and the first week of the calendar year is set as an ISO week. The screen displays a sample calendar.

See Also:

"Calendar Formats" for more information about choosing the first day of the week and the first week of the calendar year
"Customizing Calendars with the NLS Calendar Utility" for information about customizing calendars themselves

Figure 12-11 shows date and time settings.

Figure 12-11 Choosing Date and Time Formats

Sample formats are displayed when you choose settings from the drop-down menus. In this case, the Short Date Format is set to YY/MM/DD. The Short Time Format is set to HH24:MI:SS. The Long Date Format is set to YYYY/MM/DD DAY. The Long Time Format is set to HH12:MI:SS AM.

You can also enter your own formats instead of using the selection from the drop-down menus.
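
To see roughly what the four format masks in this example produce, the following Python sketch renders them for a fixed timestamp. It handles only the format elements that appear in this example, and the names ELEMENTS and render are hypothetical. It approximates the result and is not Oracle's formatting code; actual output from the database can differ, for example in how day names are padded.

```python
import re
from datetime import datetime

DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY"]

# Only the format elements used in the example above are handled.
ELEMENTS = {
    "YYYY": lambda d: format(d.year, "04d"),
    "YY": lambda d: format(d.year % 100, "02d"),
    "MM": lambda d: format(d.month, "02d"),
    "DD": lambda d: format(d.day, "02d"),
    "DAY": lambda d: DAY_NAMES[d.weekday()],
    "HH24": lambda d: format(d.hour, "02d"),
    "HH12": lambda d: format((d.hour % 12) or 12, "02d"),
    "MI": lambda d: format(d.minute, "02d"),
    "SS": lambda d: format(d.second, "02d"),
    "AM": lambda d: "AM" if d.hour < 12 else "PM",
}
# Longer elements come first so that YYYY is not read as YY followed by YY.
PATTERN = re.compile("|".join(sorted(ELEMENTS, key=len, reverse=True)))


def render(mask: str, when: datetime) -> str:
    """Replace each known format element in the mask with its value."""
    return PATTERN.sub(lambda m: ELEMENTS[m.group(0)](when), mask)


if __name__ == "__main__":
    sample = datetime(2002, 3, 5, 14, 7, 9)
    for mask in ["YY/MM/DD", "HH24:MI:SS", "YYYY/MM/DD DAY", "HH12:MI:SS AM"]:
        print(mask, "->", render(mask, sample))
```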

Figure 12-12 shows settings for number formats.

Figure 12-12 Choosing Number Formats

A period has been chosen for the Decimal Symbol. The Negative Sign Location is set to be on the left of the number. The Numeric Group Separator is a comma. The Number Grouping is set to 4 digits. The List Separator is a comma. The Measurement System is metric. The Rounding Indicator is 4.

You can enter your own values instead of using the drop-down menus.

Sample formats are displayed when you choose settings from the drop-down menus.
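
The effect of the settings in this example (a period as the decimal symbol, a comma as the group separator, groups of 4 digits, and the negative sign on the left) can be sketched in Python as follows. The function name format_number is hypothetical, and the sketch does not attempt to model the Rounding Indicator or the other fields.

```python
def format_number(value: str, decimal_symbol: str = ".",
                  group_separator: str = ",", grouping: int = 4) -> str:
    """Group the integer digits of a plain decimal string such as '-123456789.5'."""
    sign = "-" if value.startswith("-") else ""   # negative sign on the left
    integer, _, fraction = value.lstrip("-").partition(".")
    groups = []
    while integer:
        # Peel off groups of digits from the right-hand side.
        groups.insert(0, integer[-grouping:])
        integer = integer[:-grouping]
    result = sign + group_separator.join(groups)
    return result + (decimal_symbol + fraction if fraction else "")


if __name__ == "__main__":
    print(format_number("-123456789.5"))
    print(format_number("1234567", grouping=3))
```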

See Also:

"Numeric Formats"

Figure 12-13 shows settings for currency formats in the Monetary dialog box.

Figure 12-13 Choosing Currency Formats

The Local Currency Symbol is set to $. The Alternative Currency Symbol is the Euro symbol. The Currency Presentation shows one of several possible sequences of the local currency symbol, the debit symbol, and the number. The Decimal Symbol is the period. The Group Separator is the comma. The Monetary Number Grouping is 3. The Monetary Precision, or number of digits after the decimal symbol, is 3. The Credit Symbol is +. The Debit Symbol is -. The International Currency Separator is a blank space, so it is not visible in the screen. The International Currency Symbol (ISO currency symbol) is USD. Sample currency formats are displayed, based on the values you have selected.

You can enter your own values instead of using the drop-down menus.

See Also:

"Currency Formats"

Customizing Time Zone Data

The time zone files contain the valid time zone names. The following information is included for each time zone:

Offset from Coordinated Universal Time (UTC)
Transition times for daylight saving time
Abbreviations for standard time and daylight saving time. The abbreviations are used with the time zone names.

Two time zone files are included in the Oracle home directory. The default file is oracore/zoneinfo/timezone.dat. It contains the most commonly used time zones. A larger set of time zones is included in oracore/zoneinfo/timezlrg.dat. Unless you need the larger set of time zones, use the default time zone file because database performance is better.

To use the larger time zone file, complete the following tasks:

Shut down the database.
Set the ORA_TZFILE environment variable to the full path name of the timezlrg.dat file.
Restart the database.

After you have used the timezlrg.dat file, you must continue to use it unless you are sure that none of the additional time zones are used for data that is stored in the database. Also, all databases that share information must use the same time zone file.

To view the time zone names, enter the following statement:

SQL> SELECT * FROM V$TIMEZONE_NAMES;

Customizing Calendars with the NLS Calendar Utility

Oracle supports several calendars. All of them are defined with data derived from Oracle's globalization support, but some of them may require the addition of ruler eras or deviation days in the future. To add this information without waiting for a new release of the Oracle database server, you can use an external file that is automatically loaded when the calendar functions are executed.

Calendar data is first defined in a text file. The text definition file must be converted into binary format. You can use the NLS Calendar Utility (lxegen) to convert the text definition file into binary format.

The name of the text definition file and its location are hard-coded and depend on the platform. On UNIX platforms, the file name is lxecal.nlt. It is located in the $ORACLE_HOME/ocommon/nls directory. A sample text definition file is included in the directory.

The lxegen utility produces a binary file from the text definition file. The name of the binary file is also hard-coded and depends on the platform. On UNIX platforms, the name of the binary file is lxecal.nlb. The binary file is generated in the same directory as the text file and overwrites an existing binary file.

After the binary file has been generated, it is automatically loaded during system initialization. Do not move or rename the file.

Invoke the calendar utility from the command line as follows:

% lxegen

See Also:

Platform-specific documentation for the location of the files on your system
"Calendar Systems"

Displaying a Code Chart with the Oracle Locale Builder

You can display and print the code charts of character sets with the Oracle Locale Builder.

Figure 12-14 shows the opening screen for Oracle Locale Builder.

Figure 12-14 Opening Screen for Oracle Locale Builder

In the File menu, choose New. In the New menu, choose Character Set. Figure 12-15 shows the resulting screen.

Figure 12-15 General Character Set Screen

Click Show Existing Definitions. Highlight the character set you wish to display. Figure 12-16 shows the Existing Definitions dialog box with US7ASCII highlighted.

Figure 12-16 Choosing US7ASCII in the Existing Definitions Dialog Box

Click Open to choose the character set. Figure 12-17 shows the General screen when US7ASCII has been chosen.

Figure 12-17 General Screen When US7ASCII Has Been Loaded

Click the Character Data Mapping tab. Figure 12-18 shows the Character Data Mapping screen for US7ASCII.

Figure 12-18 Character Data Mapping for US7ASCII

Click View CodeChart. Figure 12-19 shows the code chart for US7ASCII.

Figure 12-19 US7ASCII Code Chart

It shows the encoded value of each character in the local character set, the glyph associated with each character, and the Unicode value of each character in the local character set.

If you want to print the code chart, then click Print Page.
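
The three columns of a code chart (encoded value, glyph, and Unicode value) can be reproduced for a simple single-byte character set. The following Python sketch uses Python's ascii codec as a stand-in for US7ASCII and lists only the printable range; the function name code_chart is hypothetical.

```python
def code_chart(encoding: str, first: int = 0x20, last: int = 0x7E):
    """Yield (encoded value, glyph, Unicode value) for a single-byte range."""
    for code in range(first, last + 1):
        glyph = bytes([code]).decode(encoding)
        yield f"{code:02X}", glyph, f"{ord(glyph):04X}"


if __name__ == "__main__":
    for encoded, glyph, unicode_value in code_chart("ascii"):
        print(encoded, glyph, unicode_value)
```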

Creating a New Character Set Definition with the Oracle Locale Builder

You can customize a character set to meet specific user needs. In Oracle9i, you can extend an existing encoded character set definition. User-defined characters are often used to encode special characters that represent the following:

Proper names
Historical Han characters that are not defined in an existing character set standard
Vendor-specific characters
New symbols or characters that you define

This section describes how Oracle supports user-defined characters.

Character Sets with User-Defined Characters

User-defined characters are typically supported within East Asian character sets. These East Asian character sets have at least one range of reserved code points for user-defined characters. For example, Japanese Shift-JIS reserves 1880 code points for user-defined characters. The ranges are shown in Table 12-1.

Table 12-1 Shift JIS User-Defined Character Ranges

Japanese Shift JIS User-Defined Character Range	Number of Code Points
F040-F07E, F080-F0FC	188
F140-F17E, F180-F1FC	188
F240-F27E, F280-F2FC	188
F340-F37E, F380-F3FC	188
F440-F47E, F480-F4FC	188
F540-F57E, F580-F5FC	188
F640-F67E, F680-F6FC	188
F740-F77E, F780-F7FC	188
F840-F87E, F880-F8FC	188
F940-F97E, F980-F9FC	188
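
The counts in Table 12-1 follow from the ranges themselves: each lead byte from F0 through F9 contributes 63 code points (trail bytes 40 through 7E) plus 125 code points (trail bytes 80 through FC). The following Python sketch, with hypothetical function names, checks the arithmetic.

```python
def udc_ranges():
    """Return the Shift-JIS user-defined ranges as (start, end) pairs."""
    ranges = []
    for lead in range(0xF0, 0xFA):            # lead bytes F0 through F9
        ranges.append(((lead << 8) | 0x40, (lead << 8) | 0x7E))
        ranges.append(((lead << 8) | 0x80, (lead << 8) | 0xFC))
    return ranges


def count_code_points(ranges):
    """Count the code points covered by a list of inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


if __name__ == "__main__":
    all_ranges = udc_ranges()
    print(count_code_points(all_ranges[:2]))  # one table row
    print(count_code_points(all_ranges))      # the whole table
```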

The Oracle character sets listed in Table 12-2 contain predefined ranges that support user-defined characters.

Table 12-2 Oracle Character Sets with User-Defined Character Ranges

Character Set Name	Number of Code Points Available for User-Defined Characters
JA16DBCS	4370
JA16EBCDIC930	4370
JA16SJIS	1880
JA16SJISYEN	1880
KO16DBCS	1880
KO16MSWIN949	1880
ZHS16DBCS	1880
ZHS16GBK	2149
ZHT16DBCS	6204
ZHT16MSWIN950	6217

Oracle Character Set Conversion Architecture

The code point value that represents a particular character can vary among different character sets. A Japanese kanji character is shown in Figure 12-20.

Figure 12-20 Japanese Kanji Character

The following table shows how the character is encoded in different character sets.

Unicode Encoding	JA16SJIS Encoding	JA16EUC Encoding	JA16DBCS Encoding
4E9C	889F	B0A1	4867

In Oracle, all character sets are defined in terms of Unicode 3.1 code points. That is, each character is defined as a Unicode 3.1 code value. Character conversion takes place transparently to users by using Unicode as the intermediate form. For example, when a JA16SJIS client connects to a JA16EUC database, the character shown in Figure 12-20 has the code point value 889F when it is entered from the JA16SJIS client. It is internally converted to Unicode (with code point value 4E9C) and then converted to JA16EUC (code point value B0A1).
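
The same two-step conversion can be observed outside the database. The following Python sketch uses Python's shift_jis and euc_jp codecs as stand-ins for JA16SJIS and JA16EUC; these are Python codec names, not Oracle character sets, and the function name convert is hypothetical.

```python
def convert(code_hex: str, source: str, target: str):
    """Convert one character between encodings, using Unicode as the pivot."""
    char = bytes.fromhex(code_hex).decode(source)    # source bytes -> Unicode
    target_hex = char.encode(target).hex().upper()   # Unicode -> target bytes
    return f"{ord(char):04X}", target_hex


if __name__ == "__main__":
    unicode_value, euc_value = convert("889F", "shift_jis", "euc_jp")
    print(unicode_value, euc_value)
```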

Unicode 3.1 Private Use Area

Unicode 3.1 reserves the range E000-F8FF for the Private Use Area (PUA). The PUA is intended for private use character definition by end users or vendors.

User-defined characters can be converted between two Oracle character sets by using Unicode 3.1 PUA as the intermediate form, the same as standard characters.

User-Defined Character Cross-References Between Character Sets

User-defined character cross-references between Japanese character sets, Korean character sets, Simplified Chinese character sets and Traditional Chinese character sets are contained in the following distribution sets:

${ORACLE_HOME}/ocommon/nls/demo/udc_ja.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_ko.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zhs.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zht.txt

These cross-references are useful when registering user-defined characters across operating systems. For example, when registering a new user-defined character on both a Japanese Shift-JIS operating system and a Japanese IBM Host operating system, you may want to use F040 on the Shift-JIS operating system and 6941 on the IBM Host operating system for the new user-defined character so that Oracle can convert correctly between JA16SJIS and JA16DBCS. You can find out that both Shift-JIS UDC value F040 and IBM Host UDC value 6941 are mapped to the same Unicode PUA value E000 in the user-defined character cross-reference.
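
The following Python sketch shows how a Shift-JIS UDC value could be turned into a PUA code point if the 1880 user-defined code points are assigned to the PUA in sequential order starting at E000. Only the F040 to E000 pairing is confirmed by the text above; the sequential layout is an assumption, and the function name sjis_udc_to_pua is hypothetical. The cross-reference file udc_ja.txt remains the authoritative source.

```python
def sjis_udc_to_pua(code: int) -> int:
    """Map a Shift-JIS UDC value to a PUA code point, ASSUMING sequential order."""
    lead, trail = code >> 8, code & 0xFF
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError("lead byte outside the user-defined range F0-F9")
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        offset = trail - 0x41          # 7F is skipped, so shift down by one more
    else:
        raise ValueError("trail byte outside 40-7E and 80-FC")
    # Each lead byte covers 188 code points.
    return 0xE000 + 188 * (lead - 0xF0) + offset


if __name__ == "__main__":
    print(f"{sjis_udc_to_pua(0xF040):04X}")
```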

Guidelines for Creating a New Character Set from an Existing Character Set

By default, the Oracle Locale Builder generates the next available character set name for you. You can also generate your own character set name. Use the following format for naming character set definition NLT files:

lx2dddd.nlt

dddd is the 4-digit Character Set ID in hex.
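
For example, a Character Set ID of 10000 is 2710 in hexadecimal, which gives the file name lx22710.nlt. The following Python sketch derives the name from the ID; the function name nlt_file_name is hypothetical, and the use of lowercase hexadecimal digits is an assumption.

```python
def nlt_file_name(character_set_id: int) -> str:
    """Return lx2dddd.nlt, where dddd is the ID as four hexadecimal digits."""
    if not 0 <= character_set_id <= 0xFFFF:
        raise ValueError("ID does not fit in four hexadecimal digits")
    # Lowercase hexadecimal digits are an assumption.
    return f"lx2{character_set_id:04x}.nlt"


if __name__ == "__main__":
    print(nlt_file_name(10000))
    print(nlt_file_name(10001))
```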

When you modify a character set, observe the following guidelines:

Do not remap existing characters.
All character mappings must be unique.
New characters should be mapped into the Unicode private use range e000 to f4ff. (Note that the actual Unicode 3.1 private use range is e000-f8ff. However, Oracle reserves f500-f8ff for its own private use.)
No line in the character set definition file can be longer than 80 characters.
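
The first three guidelines can be checked mechanically before new mappings are added. The following Python sketch works on simple dictionaries of local value to Unicode value; it does not read NLT files and does not cover the 80-character line limit. The names check_new_mappings, PUA_FIRST, and PUA_LAST are hypothetical.

```python
PUA_FIRST, PUA_LAST = 0xE000, 0xF4FF   # f500-f8ff is reserved by Oracle


def check_new_mappings(existing: dict[int, int], new: dict[int, int]) -> list[str]:
    """Check proposed {local value: Unicode value} mappings against the guidelines."""
    problems = []
    used_unicode = set(existing.values())
    for local, uni in new.items():
        if local in existing:
            problems.append(f"{local:x}: remaps an existing character")
        if uni in used_unicode:
            problems.append(f"{local:x}: Unicode value {uni:04x} is already mapped")
        if not PUA_FIRST <= uni <= PUA_LAST:
            problems.append(f"{local:x}: {uni:04x} is outside e000-f4ff")
        used_unicode.add(uni)
    return problems


if __name__ == "__main__":
    existing = {0x41: 0x0041}
    new = {0xA1: 0xE000, 0xA2: 0xE000, 0x41: 0xE001, 0xA3: 0xF500}
    for problem in check_new_mappings(existing, new):
        print(problem)
```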

If a character set is derived from an existing Oracle character set, Oracle Corporation recommends using the following character set naming convention:

<Oracle_character_set_name><organization_name>EXT<version>

For example, if a company such as Sun Microsystems adds user-defined characters to the JA16EUC character set, the following character set name is appropriate:

JA16EUCSUNWEXT1

The character set name contains the following parts:

JA16EUC is the character set name defined by Oracle
SUNW represents the organization name (company stock trading abbreviation for Sun Microsystems)
EXT specifies that this character set is an extension to the JA16EUC character set
1 specifies the version

Example: Creating a New Character Set Definition with the Oracle Locale Builder

This section shows how to create a new character set called MYCHARSET with 10001 for its Character Set ID. The example starts with the US7ASCII character set and adds 10 Chinese characters. Figure 12-21 shows the General screen.

Figure 12-21 Character Set General Information

Click Show Existing Definitions and choose the US7ASCII character set from the Existing Definitions dialog box.

The ISO Character Set ID and Base Character Set ID fields are optional. The Base Character Set ID is used for inheriting values so that the properties of the base character set are used as a template. The Character Set ID is automatically generated, but you can override it. The valid range for a user-defined character set ID is 10,000 to 20,000. The ISO Character Set ID field remains blank for user-defined character sets.

Figure 12-22 shows the Type Specification screen.

Figure 12-22 Character Set Type Specification

The Character Set Category is ASCII_BASED. The BYTE_UNIQUE flag is checked.

When you have chosen an existing character set, the fields for the Type Specification screen should already be set to appropriate values. You should keep these values unless you have a specific reason for changing them. If you need to change the settings, use the following guidelines:

FIXED_WIDTH is to identify character sets whose characters have a uniform length.
BYTE_UNIQUE means the single-byte range of code points is distinct from the multibyte range. The code in the first byte indicates whether the character is single-byte or multibyte. An example is JA16EUC.
DISPLAY identifies character sets that are used only for display on clients and not for storage. Some Arabic, Devanagari, and Hebrew character sets are display character sets.
SHIFT is for character sets that require extra shift characters to distinguish between single-byte characters and multibyte characters.

See Also:
"Variable-width multibyte encoding schemes" for more information about shift-in and shift-out character sets

Figure 12-23 shows how to add user-defined characters.

Figure 12-23 Importing User-Defined Character Data

Open the Character Data Mapping screen. Highlight the character after which you want to add the new characters. In this example, the 0xfe local character value is highlighted.

You can add one character at a time or use a text file to import a large number of characters. In this example, a text file is imported. The first column is the local character value. The second column is the Unicode value. The file contains the following character values:

88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa
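
Before importing a file like this one, it can be useful to confirm that every line has two hexadecimal columns and that no value is repeated. The following Python sketch parses the sample data shown above; the function name parse_mappings is hypothetical, and the sketch assumes that the columns are separated by whitespace as shown.

```python
SAMPLE = """\
88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa
"""


def parse_mappings(text: str) -> list[tuple[int, int]]:
    """Parse lines of 'local_value unicode_value' (both hexadecimal)."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # ignore blank lines
        local, uni = line.split()
        pairs.append((int(local, 16), int(uni, 16)))
    return pairs


if __name__ == "__main__":
    for local, uni in parse_mappings(SAMPLE):
        print(f"{local:04x} -> U+{uni:04X} {chr(uni)}")
```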

In the File menu, choose Import User-Defined Characters Data.

Figure 12-24 shows that the imported characters are added after 0xfe in the character set.

Figure 12-24 New Characters in the Character Set

Supporting User-Defined Characters in Java

If you have Java products such as JDBC or SQLJ in your applications and want them to support user-defined characters, then customize your character set as desired. Then generate and install a special Java zip file (gss_custom.zip) into your Oracle home directory.

On UNIX, enter a command similar to the following:

$ORACLE_HOME/JRE/bin/jre -classpath $ORACLE_HOME/jlib/gss-1_1.zip:$ORACLE_HOME/jlib/gss_charset-1_2.zip Ginstall lx22710.nlt

On Windows, enter a command similar to the following:

%JREHOME%\bin\jre.exe -classpath %ORACLE_HOME%\jlib\gss-1_1.zip;%ORACLE_HOME%\jlib\gss_charset-1_2.zip Ginstall lx22710.nlt

%JREHOME% is the C:\Program Files\Oracle\jre\version_num directory.

lx22710.nlt is an example of an NLT file created by customizing a character set using the Oracle Locale Builder.

These commands generate a gss_custom.zip file in the current directory. If you need to add support for more than one customized character set, you can append their definitions to the same gss_custom.zip file by re-issuing the command for each of the additional customized character sets. For example, enter the following commands on UNIX:

$ORACLE_HOME/JRE/bin/jre -classpath $ORACLE_HOME/jlib/gss-1_1.zip:$ORACLE_HOME/jlib/gss_charset-1_2.zip Ginstall lx22710.nlt

$ORACLE_HOME/JRE/bin/jre -classpath $ORACLE_HOME/jlib/gss-1_1.zip:$ORACLE_HOME/jlib/gss_charset-1_2.zip Ginstall lx22711.nlt

$ORACLE_HOME/JRE/bin/jre -classpath $ORACLE_HOME/jlib/gss-1_1.zip:$ORACLE_HOME/jlib/gss_charset-1_2.zip Ginstall lx22712.nlt

lx22710.nlt, lx22711.nlt and lx22712.nlt are contained in gss_custom.zip.
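
When several customized character sets are involved, the repeated command lines can be generated rather than typed. The following Python sketch only builds the argument lists and does not run anything. The function name ginstall_command and the Oracle home path /u01/app/oracle are hypothetical, and the directory layout is the UNIX one shown above.

```python
def ginstall_command(oracle_home: str, nlt_file: str,
                     path_separator: str = ":") -> list[str]:
    """Build the argument list for one Ginstall run (UNIX layout assumed)."""
    # The class path entries must be joined without any whitespace.
    classpath = path_separator.join([
        f"{oracle_home}/jlib/gss-1_1.zip",
        f"{oracle_home}/jlib/gss_charset-1_2.zip",
    ])
    return [f"{oracle_home}/JRE/bin/jre", "-classpath", classpath,
            "Ginstall", nlt_file]


if __name__ == "__main__":
    # /u01/app/oracle is a placeholder for your Oracle home directory.
    for nlt in ["lx22710.nlt", "lx22711.nlt", "lx22712.nlt"]:
        print(" ".join(ginstall_command("/u01/app/oracle", nlt)))
```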

After gss_custom.zip has been created, store it in the $ORACLE_HOME/ocommon/nls/admin/data directory. Enter the following command:

% cp gss_custom.zip $ORACLE_HOME/ocommon/nls/admin/data

Adding the Custom Zip File to Java Components

You may want to add the gss_custom.zip file to the following Java components:

Java Virtual Machine

Load the zip file into the database.

Enter the following command on UNIX:

% loadjava -u sys/passwd -grant EXECUTE -synonym -r -v gss_custom.zip

Enter the following command on Windows:

loadjava -u sys/passwd -grant EXECUTE -synonym -r -v gss_custom.zip

Replace passwd with the password for SYS.

Oracle HTTP Server

Edit the jserv.properties file.

On UNIX, add the following line:

wrapper.classpath = $ORACLE_HOME/ocommon/nls/admin/data/gss_custom.zip

On Windows, add the following line:

wrapper.classpath = %ORACLE_HOME%\ocommon\nls\admin\data\gss_custom.zip

JDBC on the Client

Modify the CLASSPATH.

Enter the following command on UNIX:

% setenv CLASSPATH $ORACLE_HOME/ocommon/nls/admin/data/gss_custom.zip

On Windows, add %ORACLE_HOME%\ocommon\nls\admin\data\gss_custom.zip to the existing CLASSPATH.

Creating a New Linguistic Sort with the Oracle Locale Builder

This section shows how to create a new multilingual linguistic sort called MY_GENERIC_M with a Collation ID of 10001. The GENERIC_M linguistic sort is used as the basis for the new linguistic sort. Figure 12-25 shows how to begin.

Figure 12-25 Collation General Information

Settings for the flags are automatically derived. SWAP_WITH_NEXT is relevant for Thai and Lao sorts. REVERSE_SECONDARY is for French sorts. CANONICAL_EQUIVALENCE determines whether canonical rules will be used. In this example, CANONICAL_EQUIVALENCE is checked.

The valid range for Collation ID (sort ID) for a user-defined sort is 1,000 to 2,000 for monolingual collation and 10,000 to 11,000 for multilingual collation.

See Also:

Figure 12-29, "Canonical Rules" for more information about canonical rules
Chapter 4, "Linguistic Sorting"

Figure 12-26 shows the Unicode Collation Sequence screen.

Figure 12-26 Unicode Collation Sequence

This example customizes the linguistic sort by moving digits so that they sort after letters. Complete the following steps:

Highlight the Unicode value that you want to move. In Figure 12-26, the x0034 Unicode value is highlighted. Its location in the Unicode Collation Sequence is called a node.
Click Cut. Select the location where you want to move the node.
Click Paste. Clicking Paste opens the Paste Node dialog box, shown in Figure 12-27.

Figure 12-27 Paste Node Dialog Box

The Paste Node dialog box enables you to choose whether to paste the node after or before the location you have selected. It also enables you to choose the level (Primary, Secondary, or Tertiary) of the node in relation to the node that you want to paste it next to.

Select the position and the level at which you want to paste the node.

In Figure 12-27, the After button and the Primary button are selected.
Click OK to paste the node.

Use similar steps to move other digits to a position after the letters a through z.

Figure 12-28 shows the resulting Unicode Collation Sequence after the digits 0 through 4 were moved to a position after the letters a through z.
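
The effect of such a change on query results can be pictured with a simple sort key. The following Python sketch orders every digit after every letter at a single level; it is an illustration of the outcome only and does not reproduce Oracle's multilevel collation. The function name letters_before_digits is hypothetical.

```python
def letters_before_digits(text: str):
    """Sort key that places every digit after every non-digit character."""
    return [(1 if ch.isdigit() else 0, ch) for ch in text]


if __name__ == "__main__":
    values = ["a1", "1a", "b", "2"]
    print(sorted(values))                             # default code point order
    print(sorted(values, key=letters_before_digits))  # digits after letters
```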

Figure 12-28 Unicode Collation Sequence After Modification

The rest of this section contains the following topics:
- Changing the Sort Order for All Characters with the Same Diacritic
- Changing the Sort Order for One Character with a Diacritic

Changing the Sort Order for All Characters with the Same Diacritic

This example shows how to change the sort order for characters with diacritics. You can do this by changing the sort order for all characters containing a particular diacritic or by changing one character at a time. This example changes the sort order so that all characters with a circumflex (for example, û) sort after all characters containing a tilde.

Verify the current sort order by choosing Canonical Rules in the Tools menu. This opens the Canonical Rules dialog box, shown in Figure 12-29.

Figure 12-29 Canonical Rules

Figure 12-29 shows how characters are decomposed into their canonical equivalents and their current sorting orders. For example, û is represented as u plus ^.
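
Canonical decomposition is defined by the Unicode standard, so it can be inspected independently of the Oracle Locale Builder. The following Python sketch uses the standard unicodedata module to list the code points of a decomposed character; the function name canonical_decomposition is hypothetical.

```python
import unicodedata


def canonical_decomposition(char: str) -> list[str]:
    """Return the code points of the canonical (NFD) decomposition."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", char)]


if __name__ == "__main__":
    # U+00FB is u with circumflex: base letter u plus a combining circumflex.
    print(canonical_decomposition("\u00fb"))
```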

See Also:
Chapter 4, "Linguistic Sorting" for more information about canonical rules

In the main Oracle Locale Builder window, click the Non-Spacing Characters tab. If you use the Non-Spacing Characters screen, then changes for diacritics apply to all characters. Figure 12-30 shows the Non-Spacing Characters screen.

Figure 12-30 Changing the Sort Order for All Characters with the Same Diacritic

Select the circumflex and click Cut. Click Yes in the Removal Confirmation dialog box. Select the tilde and click Paste. Choose After and Secondary in the Paste Node dialog box and click OK.

Figure 12-31 illustrates the new sort order.

Figure 12-31 The New Sort Order for Characters with the Same Diacritic

Changing the Sort Order for One Character with a Diacritic

To change the order of a specific character with a diacritic, insert the character directly into the appropriate position. Characters with diacritics do not appear in the Unicode Collation screen, so you cannot cut and paste them into the new location.

This example changes the sort order for ä so that it sorts after Z.

Select the Unicode Collation tab. Highlight the character, Z, that you want to put ä next to. Click Add. The Insert New Node dialog box appears, as shown in Figure 12-32.

Figure 12-32 Changing the Sort Order of One Character with a Diacritic

Choose After and Primary in the Insert New Node dialog box. Enter the Unicode code point value of ä. The code point value is \x00e4. Click OK.
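If you need the code point value of a different character, a small shell helper can compute it from the character's UTF-8 bytes. The following is a minimal sketch, not part of Oracle Locale Builder: the function name `codepoint` is hypothetical, it assumes its argument is one well-formed UTF-8 character, and it prints the value in the same `\x00e4` style used above.

```shell
#!/bin/sh
# codepoint CHAR
# Hypothetical helper: print the Unicode code point of a single,
# well-formed UTF-8 character in \xNNNN form.
codepoint() {
    # od prints the bytes as unsigned decimal numbers; the unquoted
    # expansion splits them into $1, $2, ...
    set -- $(printf '%s' "$1" | od -An -tu1)
    [ $# -ge 1 ] || return 1
    if [ "$1" -lt 128 ]; then        # 1 byte:  0xxxxxxx
        value=$1
    elif [ "$1" -lt 224 ]; then      # 2 bytes: 110xxxxx 10xxxxxx
        value=$(( (($1 & 31) << 6) | ($2 & 63) ))
    elif [ "$1" -lt 240 ]; then      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        value=$(( (($1 & 15) << 12) | (($2 & 63) << 6) | ($3 & 63) ))
    else                             # 4 bytes: 11110xxx followed by three 10xxxxxx
        value=$(( (($1 & 7) << 18) | (($2 & 63) << 12) | (($3 & 63) << 6) | ($4 & 63) ))
    fi
    printf '\\x%04x\n' "$value"
}

# The bytes \303\244 are the UTF-8 encoding of ä.
codepoint "$(printf '\303\244')"    # prints \x00e4
```

The helper works on raw bytes, so its result does not depend on the locale settings of the shell.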

Figure 12-33 shows the resulting sort order.

Figure 12-33 New Sort Order After Changing a Single Character

Generating and Installing NLB Files

After you have defined a new language, territory, character set, or linguistic sort, generate new NLB files from the NLT files:
1. Back up the NLS installation boot file (lx0boot.nlb) and the NLS system boot file (lx1boot.nlb) in the ORA_NLS33 directory. On a UNIX platform, enter commands similar to the following:
```
% cd $ORA_NLS33
% cp lx0boot.nlb lx0boot.nlb.orig
% cp lx1boot.nlb lx1boot.nlb.orig
```
2. In Oracle Locale Builder, choose Tools > Generate NLB or click the Generate NLB icon in the left side bar.
3. Click Browse to find the directory where the NLT file is located. The location dialog box is shown in Figure 12-34.
  
  Figure 12-34 Location Dialog Box
  
  Do not try to specify an NLT file. Oracle Locale Builder generates an NLB file for each NLT file.
4. Click OK to generate the NLB files.
  
  Figure 12-35 illustrates the final notification that you have successfully generated NLB files for all NLT files in the directory.
  
  Figure 12-35 NLB Generation Success Dialog Box
  
5. Copy the lx1boot.nlb file into the directory that is specified by the ORA_NLS33 environment variable, typically $ORACLE_HOME/OCOMMON/nls/admin/data. For example, on a UNIX platform, enter a command similar to the following:
```
% cp /directory_name/lx1boot.nlb $ORA_NLS33/lx1boot.nlb
```
6. Copy the new NLB files into the ORA_NLS33 directory. For example, on a UNIX platform, enter commands similar to the following:
```
% cp /directory_name/lx22710.nlb $ORA_NLS33
% cp /directory_name/lx52710.nlb $ORA_NLS33
```
  Note:
  Oracle Locale Builder generates NLB files in the directory where the NLT files reside.
7. Repeat the preceding steps on each hardware platform. NLB files are platform-specific binary files. You must compile and install the new NLB files on both the server and the client machines.
8. Restart the database to use the newly created locale data.
9. To use the new locale data on the client side, install the NLB files, then exit the client and start it again.
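The backup and copy steps above can be collected into one shell function. This is a minimal sketch under stated assumptions, not an Oracle-supplied script: the name `install_nlb` is hypothetical, the first argument is assumed to be the directory where Oracle Locale Builder wrote the NLB files, and the second is assumed to be the ORA_NLS33 directory. It deliberately leaves lx0boot.nlb untouched, because the steps above copy only lx1boot.nlb and the new NLB files.

```shell
#!/bin/sh
# install_nlb SRC_DIR DEST_DIR
# Hypothetical helper: back up the boot files in DEST_DIR, then copy
# the generated NLB files from SRC_DIR into DEST_DIR.
install_nlb() {
    src=$1
    dest=$2
    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        echo "usage: install_nlb SRC_DIR DEST_DIR" >&2
        return 1
    fi

    # Back up both boot files, but never overwrite an earlier backup.
    for boot in lx0boot.nlb lx1boot.nlb; do
        if [ -f "$dest/$boot" ] && [ ! -f "$dest/$boot.orig" ]; then
            cp "$dest/$boot" "$dest/$boot.orig" || return 1
        fi
    done

    # Copy lx1boot.nlb and the other new NLB files.
    for f in "$src"/*.nlb; do
        [ -f "$f" ] || continue                  # the glob matched nothing
        case ${f##*/} in
            lx0boot.nlb) continue ;;             # assumption: never replace the installation boot file
        esac
        cp "$f" "$dest"/ || return 1
    done
}
```

For example, `install_nlb /directory_name "$ORA_NLS33"` would perform the backup and both copy steps in a single call. Restarting the database and the clients remains a separate, manual step.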
