Oracle® C++ Call Interface Programmer's Guide, 11g Release 1 (11.1) Part Number B28390-01
This chapter describes OCCI support for multibyte and Unicode charactersets.
OCCI now enables application development in all Oracle-supported multibyte and Unicode charactersets. The UTF16 encoding of Unicode is fully supported. Application programs can specify their charactersets when the OCCI Environment is created. OCCI interfaces that take character string arguments (such as SQL statements, usernames, passwords, error messages, object names, and so on) have been extended to handle data in any characterset. Character data from relational tables or objects can be in any characterset. OCCI can be used to develop multilingual, global, and Unicode applications.
OCCI applications need to specify the client characterset and client national characterset when initializing the OCCI Environment. The client characterset specifies the characterset for all SQL statements, object/user names, error messages, and data of all CHAR datatype (CHAR, VARCHAR2, LONG) columns/attributes. The client national characterset specifies the characterset for data of all NCHAR datatype (NCHAR, NVARCHAR2) columns/attributes.
A new createEnvironment() interface that takes the client characterset and client national characterset is now provided. This allows OCCI applications to set characterset information dynamically, independent of the NLS_LANG and NLS_NCHAR environment settings.
Example 9-1 How to Use Globalization and Unicode Support
Environment *env = Environment::createEnvironment("JA16SJIS","UTF8");
This statement creates an OCCI Environment with JA16SJIS as the client characterset and UTF8 as the client national characterset.
Any valid Oracle characterset name (except 'AL16UTF16') can be passed to createEnvironment(). An OCCI-specific string, "OCCIUTF16" (in uppercase), can be passed to specify UTF16 as the characterset.
Environment *env = Environment::createEnvironment("OCCIUTF16","OCCIUTF16");
//or, with a single-byte client characterset:
Environment *env = Environment::createEnvironment("US7ASCII", "OCCIUTF16");
Note: If an application specifies "OCCIUTF16" as the client characterset (first argument), then the application should use only the UTF16 interfaces of OCCI. These interfaces take UString argument types.
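The two naming rules above can be checked before the names reach createEnvironment(). The following is a minimal sketch, not part of OCCI: the helper names checkOcciCharsetName and needsUtf16Interfaces are hypothetical, and the code encodes only the rules stated in this section.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not an OCCI API): rejects the one characterset
// name that this section says createEnvironment() does not accept.
inline void checkOcciCharsetName(const std::string& name) {
    if (name == "AL16UTF16") {
        throw std::invalid_argument(
            "AL16UTF16 is not accepted; pass \"OCCIUTF16\" instead");
    }
}

// Hypothetical helper: true when the client characterset (the first
// argument of createEnvironment) means the application should use only
// the UString-based UTF16 interfaces. The comparison is case-sensitive
// because the OCCI-specific string must be in uppercase.
inline bool needsUtf16Interfaces(const std::string& clientCharset) {
    return clientCharset == "OCCIUTF16";
}
```

An application could call checkOcciCharsetName() on both names at startup and use needsUtf16Interfaces() to decide which family of interfaces to call.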
The charactersets in the OCCI Environment are client-side only. They indicate the charactersets the OCCI application uses to interact with Oracle. The database characterset and database national characterset are specified when the database is created. Oracle converts all data from the client characterset/national characterset to the database characterset/national characterset before the server processes the data.
The datatypes used for supporting globalization and Unicode include the UString datatype, multibyte and UTF16 data, and the CLOB and NCLOB datatypes.
UString is a datatype that enables applications and the OCCI library to pass and receive Unicode data in UTF-16 encoding. UString is templated from the C++ STL basic_string with Oracle's utext datatype.
typedef basic_string<utext> UString;
Oracle's utext datatype is a 2-byte short datatype and represents Unicode characters in the UTF-16 encoding. A Unicode character's codepoint can be represented in 1 or 2 utext elements (2 or 4 bytes). Characters from European and most Asian scripts are represented in a single utext. Supplementary characters defined in the Unicode 3.1 standard are represented with 2 utext elements.
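The one-or-two element rule can be illustrated without OCCI. The sketch below uses the standard char16_t type as an assumed stand-in for utext, and the names Utf16String and appendCodePoint are hypothetical. It assumes the input is a valid Unicode codepoint and does no range checking.

```cpp
#include <cstdint>
#include <string>

// Assumed stand-in for UString (basic_string<utext>): char16_t plays the
// role of the 2-byte utext element.
typedef std::basic_string<char16_t> Utf16String;

// Appends one Unicode codepoint to a UTF-16 string.
// Codepoints below U+10000 occupy a single element; supplementary
// characters (U+10000 and above) occupy a surrogate pair of two elements.
inline void appendCodePoint(Utf16String& out, std::uint32_t cp) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
}
```

For instance, the Euro symbol (0x20AC) yields one element, while the supplementary character U+10400 yields the two elements 0xD801 and 0xDC00.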
On Microsoft Windows platforms, UString is equivalent to the C++ standard wstring datatype. This is because the wchar_t datatype is type-defined as a 2-byte short on these platforms, which is the same as Oracle's utext, allowing applications to use a wstring variable where a UString would normally be required. Consequently, applications can also pass wide-character string literals, created by prefixing the literal with the letter 'L', to OCCI Unicode APIs.
Example 9-2 Using wstring Datatype
//bind Unicode data using wstring datatype
//binding the Euro symbol, UTF16 codepoint 0x20AC
wchar_t eurochars[] = {0x20AC,0x00};
wstring eurostr(eurochars);
stmt->setUString(1,eurostr);
//Call the Unicode version of createConnection by
//passing widechar literals
Connection *conn = env->createConnection(L"HR",L"password",L"");
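The wstring shortcut depends on wchar_t being 2 bytes. On platforms where wchar_t is wider (4 bytes is common on Linux), wide strings would need converting before use with the UTF16 interfaces. The following is a rough sketch of such a conversion, again using char16_t as an assumed stand-in for utext. The helper name toUtf16 is hypothetical and not an OCCI function.

```cpp
#include <string>

// Assumed stand-in for UString (basic_string<utext>).
typedef std::basic_string<char16_t> Utf16String;

// Hypothetical helper: converts a wide string to UTF-16 elements.
// With a 2-byte wchar_t the elements are copied unchanged. With a wider
// wchar_t each element is treated as a codepoint, and supplementary
// characters are split into a surrogate pair.
inline Utf16String toUtf16(const std::wstring& in) {
    Utf16String out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (sizeof(wchar_t) == 2 || cp < 0x10000UL) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000UL;
            out.push_back(static_cast<char16_t>(0xD800UL + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00UL + (cp & 0x3FFUL)));
        }
    }
    return out;
}
```

The sketch assumes the wide string holds valid Unicode values and performs no validation.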
OCCI applications should use the UString datatype for data in the UTF16 characterset.
For data in multibyte charactersets like JA16SJIS and UTF8, applications should use the C++ string type. The existing OCCI APIs that take string arguments can handle data in any multibyte characterset. Due to the use of the string type, OCCI supports only byte length semantics for multibyte characterset strings.
Example 9-3 Binding UTF8 Data Using the string Datatype
//bind UTF8 data
//binding the Euro symbol, UTF8 byte sequence 0xE2 0x82 0xAC
char eurochars[] = {'\xE2','\x82','\xAC','\0'};
string eurostr(eurochars);
stmt->setString(1,eurostr); //use the string interface
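Byte length semantics means that the length of a string holding multibyte data is a count of bytes, not characters. The sketch below shows the difference for UTF8 data. The helper name utf8CharCount is hypothetical, and the code assumes well-formed UTF8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts characters in a UTF8-encoded string by
// skipping continuation bytes (those of the form 10xxxxxx).
// std::string::size() on the same string returns the byte count.
inline std::size_t utf8CharCount(const std::string& s) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) {
            ++count;  // lead byte or single-byte character
        }
    }
    return count;
}
```

For the Euro symbol bound above, size() reports 3 bytes while utf8CharCount() reports 1 character, so any length passed to or returned from the string interfaces should be read as bytes.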
For Unicode data in the UTF16 characterset, use the OCCI-specific UString datatype and the OCCI UTF16 interfaces.
Oracle provides the CLOB and NCLOB datatypes for storing and processing large amounts of character data. CLOBs represent data in the database characterset and NCLOBs represent data in the database national characterset. CLOBs and NCLOBs can be used as column types in relational tables and as attributes in object types.
The OCCI Clob class is used to work with both CLOB and NCLOB datatypes. If the database type is NCLOB, then the Clob setCharSetForm() method should be called with OCCI_SQLCS_NCHAR before reading from or writing to the LOB.
The OCCI Clob class has support for multibyte and UTF16 charactersets. By default, the Clob interfaces assume the data is encoded in the client-side characterset (for both CLOBs and NCLOBs). To specify a different characterset or to specify the client-side national characterset for an NCLOB, call the setCharSetId() or setCharSetIdUString() methods with the appropriate characterset. The OCCI-specific string 'OCCIUTF16' can be passed to indicate UTF16 as the characterset.
Example 9-5 Using CLOB and NCLOB Datatypes
//client characterset - ZHT16BIG5, national characterset - UTF16
Environment *env = Environment::createEnvironment("ZHT16BIG5","OCCIUTF16");
...
Clob nclobvar;
//for NCLOBs, need to call setCharSetForm method.
nclobvar.setCharSetForm(OCCI_SQLCS_NCHAR);
...
//if reading/writing data in UTF16 for this NCLOB, still need to
//explicitly call setCharSetId
nclobvar.setCharSetId("OCCIUTF16");
To read or write data in multibyte charactersets, use the existing read and write interfaces that take a char buffer. New overloaded interfaces that take utext buffers for UTF16 data have been added to the Clob class as read(), write(), and writeChunk() methods. The arguments and return values for these methods are either bytes or characters, depending on the characterset of the LOB.
Multibyte and UTF16 charactersets are supported for handling character data in object attributes. All CHAR datatype (CHAR/VARCHAR2) attributes hold data in the client-side characterset, while all NCHAR datatype (NCHAR/NVARCHAR2) attributes hold data in the client-side national characterset. A member variable of UString datatype represents an attribute in the UTF16 characterset.