Inserting data into Oracle: UTF-8 errors (2024)
Oracle offers several database access products for inserting and retrieving Unicode data, covering commonly used programming environments such as Java and C/C++. Data is transparently converted between the database and client programs, which ensures that client programs are independent of the database character set and national character set. In addition, client programs are sometimes even independent of the character data type, such as NCHAR or CHAR, used in the database.

To avoid overloading the database server with data conversion operations, Oracle always tries to move them to the client-side database access products. In a few cases, data must be converted in the database, which affects performance. This chapter discusses details of the data conversion paths.

7.1.1 Database Access Product Stack and Unicode

Oracle offers a comprehensive set of database access products that enable programs from different development environments to access Unicode data stored in the database. These products are listed in the following table.

Table 7-1 Oracle Database Access Products

Programming Environment | Oracle Database Access Products
C/C++ | Oracle Call Interface (OCI); Oracle Pro*C/C++; Oracle ODBC driver; Oracle Provider for OLE DB; Oracle Data Provider for .NET
Java | Oracle JDBC OCI or thin driver; Oracle server-side thin driver; Oracle server-side internal driver
PL/SQL | Oracle PL/SQL and SQL
Visual Basic/C# | Oracle ODBC driver; Oracle Provider for OLE DB

The following figure shows how the database access products can access the database.

The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use.
It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR data types. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.

Oracle Pro*C/C++ enables you to embed SQL and PL/SQL in your programs. It uses OCI's Unicode capabilities to provide UTF-16 and UTF-8 data access for SQL CHAR and NCHAR data types.

The Oracle ODBC driver enables C/C++, Visual Basic, and VBScript programs running on Windows platforms to access Unicode data stored in SQL CHAR and NCHAR data types of the database. It provides UTF-16 data access by implementing the SQLWCHAR interface specified in the ODBC standard specification.

The Oracle Provider for OLE DB enables C/C++, Visual Basic, and VBScript programs running on Windows platforms to access Unicode data stored in SQL CHAR and NCHAR data types. It provides UTF-16 data access through wide string OLE DB data types.
The Oracle Data Provider for .NET enables programs running in any .NET programming environment on Windows platforms to access Unicode data stored in SQL CHAR and NCHAR data types. It provides UTF-16 data access through Unicode data types.

Oracle JDBC drivers are the primary Java programmatic interface for accessing an Oracle database. Oracle provides the JDBC drivers listed in Table 7-1: the JDBC OCI driver, the JDBC thin driver, the server-side thin driver, and the server-side internal driver.
All drivers support Unicode data access to SQL CHAR and NCHAR data types in the database.

The PL/SQL and SQL engines process PL/SQL programs and SQL statements on behalf of client-side programs such as OCI and server-side PL/SQL stored procedures. They allow PL/SQL programs to declare CHAR, VARCHAR2, NCHAR, and NVARCHAR2 variables and to access SQL CHAR and NCHAR data types in the database.

The following sections describe how each of the database access products supports Unicode data access to an Oracle database and offer examples for using those products.

7.2 SQL and PL/SQL Programming with Unicode

7.2.1 SQL NCHAR Data Types

7.2.1.1 The NCHAR Data Type

When you define a table column or a PL/SQL variable as the NCHAR data type, the length is always specified as the number of characters. For example, the following statement creates a column with a maximum length of 30 characters:

    CREATE TABLE table1 (column1 NCHAR(30));

The maximum number of bytes for the column is determined as follows:

    maximum number of bytes = (maximum number of characters) x (maximum number of bytes for each character)

For example, if the national character set is UTF8, then the maximum byte length is 30 characters times 3 bytes for each character, or 90 bytes.
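The byte-length rule above can be checked with a few lines of Python. This is only an illustrative sketch run outside the database: the helper name `max_bytes` and the sample string are assumptions, and Python's `utf-8` and `utf-16-be` codecs stand in for the UTF8 and AL16UTF16 national character sets. The sample uses only Basic Multilingual Plane characters, for which the stand-ins produce the same byte counts.

```python
def max_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """maximum number of bytes = maximum characters x maximum bytes per character."""
    return max_chars * max_bytes_per_char


# NCHAR(30) with a national character set that needs up to 3 bytes per character.
nchar30_utf8 = max_bytes(30, 3)

# Hypothetical sample value; \u1ec5 is the Vietnamese letter "e with circumflex and tilde".
sample = "Nguy\u1ec5n"
utf8_size = len(sample.encode("utf-8"))       # stand-in for UTF8
utf16_size = len(sample.encode("utf-16-be"))  # stand-in for AL16UTF16

print(nchar30_utf8, len(sample), utf8_size, utf16_size)
```

The same six-character value occupies a different number of bytes in each encoding, which is why a length declared in characters has to be checked against a byte limit as well.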
The national character set, which is used for all NCHAR data types, is defined when the database is created. The national character set can be either UTF8 or AL16UTF16. The default is AL16UTF16.

The maximum column size allowed is 2000 characters when the national character set is UTF8 and 1000 when it is AL16UTF16. The actual data is subject to the maximum byte limit of 2000. The two size constraints must be satisfied at the same time. In PL/SQL, the maximum length of NCHAR data is 32767 bytes. You can define an NCHAR variable of up to 32767 characters, but the actual data cannot exceed 32767 bytes. If you insert a value that is shorter than the column length, then Oracle pads the value with blanks to whichever length is smaller: maximum character length or maximum byte length.

Note: UTF8 may affect performance because it is a variable-width character set. Excessive blank padding of NCHAR fields decreases performance. Consider using the NVARCHAR2 data type or changing to the AL16UTF16 character set for the NCHAR data type.

7.2.1.2 The NVARCHAR2 Data Type

The NVARCHAR2 data type specifies a variable-length character string that uses the national character set. When you create a table with an NVARCHAR2 column, you specify the maximum number of characters for the column. Lengths for NVARCHAR2 are always in units of characters, just as for NCHAR.
Oracle subsequently stores each value in the column exactly as you specify it, if the value does not exceed the column's maximum length. Oracle does not pad the string value to the maximum length.

The maximum length for the NVARCHAR2 type is 4000 characters if MAX_STRING_SIZE = STANDARD or 32767 characters if MAX_STRING_SIZE = EXTENDED. These lengths are based on using UTF8; the values are 2000 and 16383 characters when using AL16UTF16. In PL/SQL, the maximum length for an NVARCHAR2 variable is 32767 bytes. You can define NVARCHAR2 variables up to 32767 characters, but the actual data cannot exceed 32767 bytes.

The following statement creates a table with one NVARCHAR2 column whose maximum length in characters is 2000 and maximum length in bytes is 4000.
    CREATE TABLE table2 (column2 NVARCHAR2(2000));

7.2.1.3 The NCLOB Data Type

NCLOB is a character large object containing Unicode characters, with a maximum size of 4 gigabytes. Unlike the BFILE data type, the NCLOB data type has full transactional support so that changes made through SQL, the DBMS_LOB package, or OCI participate fully in transactions. Manipulations of an NCLOB value can be committed and rolled back. Note, however, that you cannot save an NCLOB locator in a PL/SQL or OCI variable in one transaction and then use it in another transaction or session.

NCLOB values are stored in the database in a format that is compatible with UCS-2, regardless of the national character set. Oracle translates the stored Unicode value to the character set requested on the client or on the server, which can be fixed-width or variable-width. When you insert data into an NCLOB column using a variable-width character set, Oracle converts the data into a format that is compatible with UCS-2 before storing it in the database.

7.2.2 Implicit Data Type Conversion Between NCHAR and Other Data Types

Oracle supports implicit conversions between SQL NCHAR data types and other Oracle data types, such as CHAR, VARCHAR2, NUMBER, DATE, ROWID, and CLOB.
Any implicit conversions for CHAR and VARCHAR2 data types are also supported for SQL NCHAR data types. You can use SQL NCHAR data types the same way as SQL CHAR data types.

Type conversions between SQL CHAR data types and SQL NCHAR data types may involve character set conversion when the database and national character sets are different. Padding with blanks may occur if the target data is either CHAR or NCHAR.

7.2.3 Exception Handling for Data Loss During Data Type Conversion

Data loss can occur during data type conversion when character set conversion is necessary. If a character in the source character set is not defined in the target character set, then a replacement character is used in its place. For example, if you try to insert NCHAR data into a regular CHAR column and the character data in NCHAR (Unicode) form cannot be converted to the database character set, then the character is replaced by a replacement character defined by the database character set.
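The replacement behavior described above can be imitated outside the database. The sketch below is an analogy in Python, not Oracle's conversion code: `latin-1` is an assumed stand-in for a non-Unicode database character set, and Python's `?` substitution plays the role of the replacement character.

```python
# Hypothetical Unicode (NCHAR-style) value containing a character that the
# assumed target character set, latin-1, cannot represent.
value = "Nguy\u1ec5n"

# Lossy conversion: the unconvertible character is silently replaced.
lossy = value.encode("latin-1", errors="replace").decode("latin-1")

# Strict conversion: the same data loss is reported as an error instead.
try:
    value.encode("latin-1", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lossy, strict_failed)
```

The two branches mirror the choice between accepting replacement characters and treating data loss as an error that stops the operation.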
The NLS_NCHAR_CONV_EXCP initialization parameter controls the behavior of data loss during character type conversion. When this parameter is set to TRUE, any SQL statements that result in data loss return an ORA-12713 error and the corresponding operation is stopped. When this parameter is set to FALSE, data loss is not reported and the unconvertible characters are replaced with replacement characters. The default value is FALSE. This parameter works for both implicit and explicit conversion.

In PL/SQL, when data loss occurs during conversion of SQL CHAR and NCHAR data types, the LOSSY_CHARSET_CONVERSION exception is raised for both implicit and explicit conversion.

7.2.4 Rules for Implicit Data Type Conversion

In some cases, conversion between data types is possible in only one direction. In other cases, conversion in both directions is possible. Oracle defines a set of rules for conversion between data types. The following table contains the rules for conversion between data types.
Table 7-2 Rules for Conversion Between Data Types

Statement | Rule
INSERT/UPDATE statement | Values are converted to the data type of the target database column.
SELECT INTO statement | Data from the database is converted to the data type of the target variable.
Variable assignments | Values on the right of the equal sign are converted to the data type of the target variable on the left of the equal sign.
Parameters in SQL and PL/SQL functions | CHAR, VARCHAR2, NCHAR, and NVARCHAR2 are loaded the same way. An argument with a CHAR, VARCHAR2, NCHAR, or NVARCHAR2 data type is compared to a formal parameter of any of the CHAR, VARCHAR2, NCHAR, or NVARCHAR2 data types. If the argument and formal parameter data types do not match exactly, then implicit conversions are introduced when data is copied into the parameter on function entry and copied out to the argument on function exit.
Concatenation || operation or CONCAT function | If one operand is a SQL CHAR or NCHAR data type and the other operand is a NUMBER or other non-character data type, then the other data type is converted to VARCHAR2 or NVARCHAR2. For concatenation between character data types, see the rule for SQL NCHAR data types and SQL CHAR data types.
SQL CHAR or NCHAR data types and NUMBER data type | Character values are converted to the NUMBER data type.
SQL CHAR or NCHAR data types and DATE data type | Character values are converted to the DATE data type.
SQL CHAR or NCHAR data types and ROWID data type | Character values are converted to the ROWID data type.
SQL NCHAR data types and SQL CHAR data types | Comparisons between SQL NCHAR data types and SQL CHAR data types are more complex because they can be encoded in different character sets. When CHAR and VARCHAR2 values are compared, the CHAR values are converted to VARCHAR2 values. When NCHAR and NVARCHAR2 values are compared, the NCHAR values are converted to NVARCHAR2 values. When SQL NCHAR data types are compared with SQL CHAR data types, character set conversion occurs if they are encoded in different character sets. The character set for SQL NCHAR data types is always Unicode and can be either UTF8 or AL16UTF16 encoding, which have the same character repertoires but are different encodings of the Unicode standard. SQL CHAR data types use the database character set, which can be any character set that Oracle supports. Unicode is a superset of any character set supported by Oracle, so SQL CHAR data types can always be converted to SQL NCHAR data types without data loss.

7.2.5 SQL Functions for Unicode Data Types

SQL NCHAR data types can be converted to and from SQL CHAR data types and other data types using explicit conversion functions. The examples in this section use the table created by the following statement:

    CREATE TABLE customers (id NUMBER, name NVARCHAR2(50), address NVARCHAR2(200), birthdate DATE);

Example 7-1 Populating the Customers Table Using the TO_NCHAR Function

The TO_NCHAR function converts the data at run time, while the N function converts the data at compilation time.
    INSERT INTO customers VALUES (1000, TO_NCHAR('John Smith'), N'500 Oracle Parkway', sysdate);

Example 7-2 Selecting from the Customer Table Using the TO_CHAR Function

The following statement converts the values of name from characters in the national character set to characters in the database character set before selecting them according to the LIKE clause:

    SELECT name FROM customers WHERE TO_CHAR(name) LIKE '%Sm%';

You should see the following output:

    NAME
    ----------
    John Smith

Example 7-3 Selecting from the Customer Table Using the TO_DATE Function

Using the N function shows that either NCHAR or CHAR data can be passed as parameters for the TO_DATE function. The data types can be mixed because they are converted at run time.

    DECLARE
      ndatestring NVARCHAR2(20) := N'12-SEP-1975';
      ndstr NVARCHAR2(50);
    BEGIN
      SELECT name INTO ndstr FROM customers
      WHERE (birthdate) > TO_DATE(ndatestring, 'DD-MON-YYYY', 'NLS_DATE_LANGUAGE = AMERICAN');
    END;

As demonstrated in Example 7-3, SQL NCHAR data can be passed to explicit conversion functions. SQL CHAR and NCHAR data can be mixed together when using multiple string parameters.
7.2.6 Other SQL Functions

Most SQL functions can take arguments of SQL NCHAR data types as well as mixed character data types. The return data type is based on the type of the first argument. If a non-string data type like NUMBER or DATE is passed to these functions, then it is converted to VARCHAR2. The following examples use the customers table created in section 7.2.5.

Example 7-4 INSTR Function

In this example, the string literal N'Sm' is converted to NVARCHAR2 and then scanned by INSTR to detect the position of the first occurrence of this string in name.

    SELECT INSTR(name, N'Sm', 1, 1) FROM customers;

Example 7-5 CONCAT Function

    SELECT CONCAT(name,id) FROM customers;

id is converted to NVARCHAR2 and then concatenated with name.
Example 7-6 RPAD Function

    SELECT RPAD(name, 100, ' ') FROM customers;

The following output results:

    RPAD(NAME,100,' ')
    ------------------
    John Smith

The space character ' ' is converted to the corresponding character in the NCHAR character set and then padded to the right of name until the total display length reaches 100.

7.2.7 Unicode String Literals

You can input Unicode string literals in SQL and PL/SQL as follows:
The last two methods can be used to encode any Unicode string literals.

7.2.8 NCHAR String Literal Replacement

This section provides information on how to avoid data loss when performing NCHAR string literal replacement.

Being part of a SQL or PL/SQL statement, the text of any literal, with or without the prefix N, is encoded in the same character set as the rest of the statement. On the client side, the statement is in the client character set, which is determined by the client character set defined in NLS_LANG, or specified in the OCIEnvNlsCreate() call, or predefined as UTF-16 in JDBC. On the server side, the statement is in the database character set.
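One way to keep a literal intact while the statement text passes through the client and database character sets is to write it with ASCII characters only. The sketch below assumes Oracle's UNISTR SQL function, which accepts a backslash followed by four hexadecimal digits for each UTF-16 code unit and a doubled backslash for a literal backslash. The helper name `to_unistr_literal` is hypothetical, and the code only builds the literal text; it does not talk to a database.

```python
def to_unistr_literal(text: str) -> str:
    """Build an ASCII-only UNISTR('...') literal for the given string."""
    parts = []
    data = text.encode("utf-16-be")
    # Walk the string one UTF-16 code unit (two bytes) at a time.
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        ch = chr(unit) if unit < 0x80 else None
        if ch == "'":
            parts.append("''")            # SQL escaping for a single quote
        elif ch == "\\":
            parts.append("\\\\")          # a doubled backslash means a literal backslash
        elif ch is not None and ch.isprintable():
            parts.append(ch)              # plain printable ASCII is kept as is
        else:
            parts.append(f"\\{unit:04X}")  # everything else becomes \XXXX
    return "UNISTR('" + "".join(parts) + "')"


print(to_unistr_literal("Nguy\u1ec5n"))
print(to_unistr_literal("O'Neil"))
```

Because the resulting statement text contains only ASCII, it should survive conversion through character sets that lack the original characters; whether that fits a given application is a design choice, and bind variables are usually preferable to building literals.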
7.2.9 Using the UTL_FILE Package with NCHAR Data

The UTL_FILE package handles Unicode national character set data of the NVARCHAR2 data type. NCHAR and NCLOB are supported through implicit conversion. The functions and procedures include the following:
The above functions and procedures process text files encoded in the UTF8 character set, that is, in the Unicode CESU-8 encoding. The functions and procedures convert between UTF8 and the national character set of the database, which can be UTF8 or AL16UTF16, as needed.

7.3 OCI Programming with Unicode

7.3.1 OCIEnvNlsCreate() Function for Unicode Programming

The OCIEnvNlsCreate() function is used to specify a SQL CHAR character set and a SQL NCHAR character set when the OCI environment is created. It is an enhanced version of the OCIEnvCreate() function and has extended arguments for two character set IDs. The OCI_UTF16ID UTF-16 character set ID replaces the Unicode mode introduced in Oracle9i release 1 (9.0.1).

The Unicode mode, in which the OCI_UTF16 flag is used with the OCIEnvCreate() function, is deprecated.

When OCI_UTF16ID is specified for both SQL CHAR and SQL NCHAR character sets, all metadata and bound and defined data are encoded in UTF-16. Metadata includes SQL statements, user names, error messages, and column names. Thus, all inherited operations are independent of the NLS_LANG setting, and all metatext data parameters (text*) are assumed to be Unicode text data types (utext*) in UTF-16 encoding.
To prepare the SQL statement when the OCIEnv() function is initialized with the OCI_UTF16ID character set ID, call the OCIStmtPrepare() function with a (text*) string. An example that passes a wchar_t string in this way runs on the Windows platform only; you may need to change wchar_t data types for other platforms.

To bind and define data, you do not have to set the OCI_ATTR_CHARSET_ID attribute because the OCIEnv() function has already been initialized with UTF-16 character set IDs. The bind variable names also must be UTF-16 strings. The OCIExecute() function performs the operation.

7.3.2 OCI Unicode Code Conversion

Unicode character set conversions take place between an OCI client and the database server if the client and server character sets are different. The conversion occurs on either the client or the server depending on the circumstances, but usually on the client side.

7.3.2.1 Data Integrity

You can lose data during conversion if you call an OCI API inappropriately. If the server and client character sets are different, then you can lose data when the destination character set is a smaller set than the source character set. You can avoid this potential problem if both character sets are Unicode character sets (for example, UTF8 and AL16UTF16).

When you bind or define SQL NCHAR data types, you should set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR.
Otherwise, you can lose data because the data is converted to the database character set before converting to or from the national character set. This occurs only if the database character set is not Unicode.

7.3.2.2 OCI Performance Implications When Using Unicode

Redundant data conversions can cause performance degradation in your OCI applications. These conversions occur in two cases:
To avoid performance problems, you should always set OCI_ATTR_CHARSET_FORM correctly, based on the data type of the target columns. If you do not know the target data type, then you should set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR when binding and defining. The following table contains information about OCI character set conversions.

Table 7-3 OCI Character Set Conversions

Data Types for OCI Client Buffer | OCI_ATTR_CHARSET_FORM | Data Types of the Target Column in the Database | Conversion Between | Comments
utext | SQLCS_IMPLICIT | CHAR, VARCHAR2, CLOB | UTF-16 and database character set in OCI | No unexpected data loss
utext | SQLCS_NCHAR | NCHAR, NVARCHAR2, NCLOB | UTF-16 and national character set in OCI | No unexpected data loss
utext | SQLCS_NCHAR | CHAR, VARCHAR2, CLOB | UTF-16 and national character set in OCI; national character set and database character set in database server | No unexpected data loss, but may degrade performance because the conversion goes through the national character set
utext | SQLCS_IMPLICIT | NCHAR, NVARCHAR2, NCLOB | UTF-16 and database character set in OCI; database character set and national character set in database server | Data loss may occur if the database character set is not Unicode
text | SQLCS_IMPLICIT | CHAR, VARCHAR2, CLOB | NLS_LANG character set and database character set in OCI | No unexpected data loss
text | SQLCS_NCHAR | NCHAR, NVARCHAR2, NCLOB | NLS_LANG character set and national character set in OCI | No unexpected data loss
text | SQLCS_NCHAR | CHAR, VARCHAR2, CLOB | NLS_LANG character set and national character set in OCI; national character set and database character set in database server | No unexpected data loss, but may degrade performance because the conversion goes through the national character set
text | SQLCS_IMPLICIT | NCHAR, NVARCHAR2, NCLOB | NLS_LANG character set and database character set in OCI; database character set and national character set in database server | Data loss may occur because the conversion goes through the database character set

7.3.2.3 OCI Unicode Data Expansion

Data conversion can result in data expansion, which can cause a buffer to overflow.
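As a rough illustration of that expansion, the following Java sketch encodes one short string in three character sets and reports how many bytes each encoding needs. It uses only the standard java.nio.charset classes, not OCI; the class name ExpansionDemo and the helper encodedLength are hypothetical names chosen for this sketch, and the Java charsets merely stand in for the corresponding Oracle character sets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of OCI or JDBC.
final class ExpansionDemo {
    // Bytes needed to hold the string once it is encoded in the given character set.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // six characters, one of them outside ASCII
        System.out.println("ISO-8859-1: " + encodedLength(name, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + encodedLength(name, StandardCharsets.UTF_8));
        System.out.println("UTF-16:     " + encodedLength(name, StandardCharsets.UTF_16LE));
    }
}
```

The same six characters take 6 bytes in ISO-8859-1, 7 bytes in UTF-8, and 12 bytes in UTF-16, so a buffer sized for the single-byte form is too small once the data has been converted.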
For binding operations, you must set the OCI_ATTR_MAXDATA_SIZE attribute to a large enough size to hold the expanded data on the server. If this is difficult to do, then you must consider changing the table schema. For defining operations, client applications must allocate enough buffer space for the expanded data. The size of the buffer should be the maximum length of the expanded data. You can estimate the maximum buffer length with the following calculation:
This method is the simplest and quickest way, but it may not be accurate and can waste memory. It is applicable to any character set combination. For example, for UTF-16 data binding and defining, the following example calculates the client buffer:

[Example code not present in the source.]

7.3.3 Setting UTF-8 to the NLS_LANG Character Set in OCI

For OCI client applications that support Unicode UTF-8 encoding, use AL32UTF8 to specify the NLS_LANG character set, unless the database character set is UTF8. Use UTF8 if the database character set is UTF8.

Do not set NLS_LANG to AL16UTF16, because AL16UTF16 is the national character set for the server. If you need to use UTF-16, then you should set the client character set to OCI_UTF16ID, using the OCIAttrSet() function when binding or defining data.

7.3.4 Binding and Defining SQL CHAR Data Types in OCI

To specify a Unicode character set for binding and defining data with SQL CHAR data types, you may need to call the OCIAttrSet() function to set the appropriate character set ID after the bind call (OCIBindByPos() or OCIBindByName()) or the define call (OCIDefineByPos()). There are two typical cases:
7.3.5 Binding and Defining SQL NCHAR Data Types in OCI

Oracle recommends that you access SQL NCHAR data types using UTF-16 binding or defining when using OCI. Beginning with Oracle9i, SQL NCHAR data types are Unicode data types with an encoding of either UTF8 or AL16UTF16. To access data in SQL NCHAR data types, set the OCI_ATTR_CHARSET_FORM attribute to SQLCS_NCHAR between binding or defining and execution so that OCI performs an appropriate data conversion without data loss. The length of data in SQL NCHAR data types is always expressed in the number of Unicode code units.

The following program is a typical example of inserting and fetching data against an NCHAR data column:

[Example code not present in the source.]

7.3.6 Handling SQL NCHAR String Literals in OCI

By default, NCHAR literal replacement is not enabled in OCI. You can enable it in OCI by setting the environment variable ORA_NCHAR_LITERAL_REPLACE to TRUE.
You can also enable literal replacement programmatically in OCI by using the OCI_NCHAR_LITERAL_REPLACE_ON and OCI_NCHAR_LITERAL_REPLACE_OFF modes in OCIEnvCreate() and OCIEnvNlsCreate(). For example, OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_ON) enables NCHAR literal replacement and OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_OFF) disables it.

As an example, consider the following statement:

[Example code not present in the source.]

Note: When NCHAR literal replacement is enabled, OCIStmtPrepare() and OCIStmtPrepare2() replace N' literals with U' literals in the SQL text and store the resulting SQL text in the statement handle. Thus, if an application uses OCI_ATTR_STATEMENT to retrieve the SQL text from the OCI statement handle, the SQL text contains U' instead of N' as specified in the original text.
7.3.7 Binding and Defining CLOB and NCLOB Unicode Data in OCI

In order to write (bind) and read (define) UTF-16 data for CLOB or NCLOB columns, the UTF-16 character set ID must be specified in the calls to OCILobWrite() and OCILobRead(). When you write UTF-16 data into a CLOB column, call OCILobWrite() as follows:

[Example code not present in the source.]

The amtp parameter is the data length in number of Unicode code units. The offset parameter indicates the offset of data from the beginning of the data column. The csid parameter must be set for UTF-16 data.

To read UTF-16 data from CLOB columns, call OCILobRead() as follows:

[Example code not present in the source.]

The data length is always represented in the number of Unicode code units. Note that one Unicode supplementary character is counted as two code units, because the encoding is UTF-16.
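The difference between code units and characters can be checked outside OCI. The sketch below is plain Java, whose strings are UTF-16 internally; it counts both for a string that contains one supplementary character. The class CodeUnitDemo and its method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for illustration only; it does not call OCI.
final class CodeUnitDemo {
    // Length in UTF-16 code units, the unit in which the LOB lengths above are expressed.
    static int utf16CodeUnits(String s) {
        return s.length();
    }

    // Length in Unicode code points (what a reader would call characters).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by U+1F600, a supplementary character stored as a surrogate pair.
        String s = "A" + new String(Character.toChars(0x1F600));
        System.out.println("code units:  " + utf16CodeUnits(s));
        System.out.println("code points: " + codePoints(s));
    }
}
```

For this two-character string the code unit count is 3 and the code point count is 2, which is why a length reported in code units can exceed the number of characters a user sees.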
After binding or defining a LOB column, you can measure the data length stored in the LOB column using OCILobGetLength(). The returned value is the data length in the number of code units if you bind or define as UTF-16.

[Example code not present in the source.]

If you are using an NCLOB, then you must set OCI_ATTR_CHARSET_FORM to SQLCS_NCHAR.

7.4 Pro*C/C++ Programming with Unicode

Pro*C/C++ provides the following ways to insert or retrieve Unicode data into or from the database:
Pro*C/C++ does not use the Unicode OCI API for SQL text. As a result, embedded SQL text must be encoded in the character set specified in the NLS_LANG environment variable.

7.4.1 Pro*C/C++ Data Conversion in Unicode

Data conversion occurs in the OCI layer, but it is the Pro*C/C++ preprocessor that instructs OCI which conversion path should be taken based on the data types used in a Pro*C/C++ program. The following table shows the conversion paths.

Table 7-4 Pro*C/C++ Bind and Define Data Conversion

Pro*C/C++ Data Type | SQL Data Type | Conversion Path
VARCHAR or text | CHAR | NLS_LANG character set to and from the database character set happens in OCI
VARCHAR or text | NCHAR | NLS_LANG character set to and from the database character set happens in OCI; database character set to and from the national character set happens in the database server
NVARCHAR | NCHAR | NLS_LANG character set to and from the national character set happens in OCI
NVARCHAR | CHAR | NLS_LANG character set to and from the national character set happens in OCI; national character set to and from the database character set happens in the database server
UVARCHAR or utext | NCHAR | UTF-16 to and from the national character set happens in OCI
UVARCHAR or utext | CHAR | UTF-16 to and from the national character set happens in OCI; national character set to the database character set happens in the database server

7.4.2 Using the VARCHAR Data Type in Pro*C/C++

The Pro*C/C++ VARCHAR data type is preprocessed to a struct with a length field and a text buffer field. The following example uses the C/C++ text native data type and the VARCHAR Pro*C/C++ data type to bind and define table columns.
[Example code not present in the source.]

When you use the VARCHAR data type or the native text data type in a Pro*C/C++ program, the preprocessor assumes that the program intends to access columns of SQL CHAR data types instead of SQL NCHAR data types in the database. The preprocessor generates C/C++ code to reflect this fact by doing a bind or define using the SQLCS_IMPLICIT value for the OCI_ATTR_CHARSET_FORM attribute. As a result, if a bind or define variable is bound to a column of SQL NCHAR data types in the database, then implicit conversion occurs in the database server to convert the data from the database character set to the national character set and vice versa. During the conversion, data loss occurs when the database character set is a smaller set than the national character set.

7.4.3 Using the NVARCHAR Data Type in Pro*C/C++

The Pro*C/C++ NVARCHAR data type is similar to the Pro*C/C++ VARCHAR data type. It should be used to access SQL NCHAR data types in the database. It tells the Pro*C/C++ preprocessor to bind or define a text buffer to a column of SQL NCHAR data types.
The preprocessor specifies the SQLCS_NCHAR value for the OCI_ATTR_CHARSET_FORM attribute of the bind or define variable. As a result, no implicit conversion occurs in the database.

If the NVARCHAR buffer is bound against columns of SQL CHAR data types, then the data in the NVARCHAR buffer (encoded in the NLS_LANG character set) is converted to or from the national character set in OCI, and the data is then converted to the database character set in the database server. Data can be lost when the NLS_LANG character set is a larger set than the database character set.

7.4.4 Using the UVARCHAR Data Type in Pro*C/C++

The UVARCHAR data type is preprocessed to a struct with a length field and a utext buffer field. The following example code contains two host variables, employee and address. The employee host variable is declared as a utext buffer containing 20 Unicode characters.
The address host variable is declared as a uvarchar buffer containing 50 Unicode characters. The len and arr fields are accessible as fields of a struct.

[Example code not present in the source.]

When you use the UVARCHAR data type or the native utext data type in Pro*C/C++ programs, the preprocessor assumes that the program intends to access SQL NCHAR data types. The preprocessor generates C/C++ code by binding or defining using the SQLCS_NCHAR value for the OCI_ATTR_CHARSET_FORM attribute. As a result, if a bind or define variable is bound to a column of a SQL CHAR data type, then an implicit conversion of the data from the national character set occurs in the database server. However, there is no data lost in this scenario because the national character set is always a larger set than the database character set.

7.5 JDBC Programming with Unicode

7.5.1 Binding and Defining Java Strings to SQL CHAR Data Types

Oracle JDBC drivers allow you to access SQL CHAR data types in the database using Java string bind or define variables.
The following code illustrates how to bind a Java string to a CHAR column.

[Example code not present in the source.]

You can define the target SQL columns by specifying their data types and lengths. When you define a SQL CHAR column with the data type and the length, JDBC uses this information to optimize the performance of fetching SQL CHAR data from the column. The following is an example of defining a SQL CHAR column.

[Example code not present in the source.]

You must cast PreparedStatement to OraclePreparedStatement to call defineColumnType(). The second parameter of defineColumnType() is the data type of the target SQL column. The third parameter is the length in number of characters.
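Because a Java string is UTF-16 and a SQL CHAR column holds data in the database character set, a bound string can lose characters when that character set is not Unicode. The following sketch, which uses only standard java.nio.charset classes and does not touch JDBC, shows one way an application might check a string before binding it. The class LossCheck and its methods are hypothetical, and the Java charsets US-ASCII and ISO-8859-1 are assumed stand-ins for database character sets such as US7ASCII and WE8ISO8859P1; the replacement character that Oracle's own conversion substitutes may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it does not use the Oracle JDBC driver.
final class LossCheck {
    // True if every character of the string can be represented in the target character set.
    static boolean survives(String s, Charset target) {
        return target.newEncoder().canEncode(s);
    }

    // Encodes and decodes again, showing what a lossy conversion leaves behind.
    static String roundTrip(String s, Charset target) {
        return new String(s.getBytes(target), target);
    }

    public static void main(String[] args) {
        String name = "Nguy\u1ec5n"; // contains U+1EC5, which is outside ASCII and Latin-1
        System.out.println(survives(name, StandardCharsets.US_ASCII));
        System.out.println(survives(name, StandardCharsets.UTF_8));
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));
    }
}
```

In this sketch the check fails for US-ASCII and succeeds for UTF-8, and the US-ASCII round trip returns the name with a question mark in place of the character that could not be encoded.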
7.5.2 Binding and Defining Java Strings to SQL NCHAR Data Types

For binding or defining Java string variables to SQL NCHAR data types, Oracle provides an extended PreparedStatement which has the setFormOfUse() method through which you can explicitly specify the target column of a bind variable to be a SQL NCHAR data type. The following code illustrates how to bind a Java string to an NCHAR column.

[Example code not present in the source.]

You can define the target SQL NCHAR columns by specifying their data types, forms of use, and lengths. JDBC uses this information to optimize the performance of fetching SQL NCHAR data from these columns. The following is an example of defining a SQL NCHAR column.

[Example code not present in the source.]

To define a SQL NCHAR column, you must specify the data type that is equivalent to a SQL CHAR column in the first argument, the length in number of characters in the second argument, and the form of use in the fourth argument of defineColumnType().
You can bind or define a Java string against an NCHAR column without explicitly specifying the form of use argument. This implies the following:
In addition, if you bind or define a Java string for a column of SQL CHAR data types but specify the form of use argument, then performance of the database is degraded. However, data should not be lost because the national character set is always a larger set than the database character set.

7.5.2.1 New JDBC 4.0 Methods for NCHAR Data Types

JDBC 11.1 adds support for the new JDBC 4.0 (JDK6) SQL data types NCHAR, NVARCHAR, LONGNVARCHAR, and NCLOB. To retrieve a national character value, an application can call one of the following methods:
The getNClob method verifies that the retrieved value is indeed an NCLOB. Otherwise, these methods are equivalent to corresponding methods without the letter N. To specify a value for a parameter marker of national character type, an application can call one of the following methods:
These methods are equivalent to corresponding methods without the letter N preceded by a call to setFormOfUse() that selects the NCHAR form of use.

7.5.3 Using the SQL NCHAR Data Types Without Changing the Code

A Java system property has been introduced in the Oracle JDBC drivers for customers to tell whether the form of use argument should be specified by default in a Java application. This property has the following purposes:
The Java system property is specified in the command line that invokes the Java application. The syntax of specifying this flag is as follows:

[Example code not present in the source.]

With this property specified, the Oracle JDBC drivers assume the presence of the form of use argument for all bind and define operations in the application.

If you have a database schema that consists of both SQL CHAR and SQL NCHAR columns, then using this flag may have some performance impact when accessing the SQL CHAR columns because of implicit conversion done in the database server.

7.5.4 Using SQL NCHAR String Literals in JDBC

When using NCHAR string literals in JDBC, there is a potential for data loss because characters are converted to the database character set before processing.

The desired behavior for preserving the NCHAR string literals can be achieved by enabling the property oracle.jdbc.convertNcharLiterals. If the value is true, then this option is enabled; otherwise, it is disabled. The default setting is false. It can be enabled in two ways: as a Java system property or as a connection property. Once enabled, conversion is performed on all SQL in the VM (system property) or in the connection (connection property).
For example, the property can be set as a Java system property as follows:

[Example code not present in the source.]

Alternatively, you can set this as a connection property as follows:

[Example code not present in the source.]

If you set this as a connection property, it overrides a system property setting.

7.5.5 Data Conversion in JDBC

7.5.5.1 Data Conversion for the OCI Driver

For the OCI driver, SQL statements are always converted to the database character set by the driver before they are sent to the database for processing. When the database character set is neither US7ASCII nor WE8ISO8859P1, the driver converts the SQL statements to UTF-8 first in Java and then to the database character set in C. Otherwise, it converts the SQL statements directly to the database character set. The following table summarizes the conversion paths taken for Java string bind variables in different scenarios. For Java string define variables, the same conversion paths, but in the opposite direction, are taken.

Table 7-5 OCI Driver Conversion Path

Form of Use | SQL Data Type | Conversion Path
OraclePreparedStatement.FORM_CHAR | CHAR | Conversion between the UTF-16 encoding of a Java string and the database character set happens in the JDBC driver.
OraclePreparedStatement.FORM_CHAR | NCHAR | Conversion between the UTF-16 encoding of a Java string and the database character set happens in the JDBC driver. Then, conversion between the database character set and the national character set happens in the database server.
OraclePreparedStatement.FORM_NCHAR | NCHAR | Conversion between the UTF-16 encoding of a Java string and the national character set happens in the JDBC driver.
OraclePreparedStatement.FORM_NCHAR | CHAR | Conversion between the UTF-16 encoding of a Java string and the national character set happens in the JDBC driver. Then, conversion between the national character set and the database character set happens in the database server.

7.5.5.2 Data Conversion for Thin Drivers

SQL statements are always converted to either the database character set or to UTF-8 by the driver before they are sent to the database for processing. The driver converts the SQL statement to the database character set when the database character set is one of the following character sets:
Otherwise, the driver converts the SQL statement to UTF-8 and notifies the database that the statement requires further conversion before being processed. The database, in turn, converts the SQL statement to the database character set.

For Java string bind variables, the conversion paths shown in the following table are taken for the thin driver. For Java string define variables, the same conversion paths are taken in the opposite direction. The four character sets listed earlier are called selected character sets in the table.

Table 7-6 Thin Driver Conversion Path

Form of Use | SQL Data Type | Database Character Set | Conversion Path
CHAR | CHAR | One of the selected character sets | Conversion between the UTF-16 encoding of a Java string and the database character set happens in the thin driver.
CHAR | NCHAR | One of the selected character sets | Conversion between the UTF-16 encoding of a Java string and the database character set happens in the thin driver. Then, conversion between the database character set and the national character set happens in the database server.
CHAR | CHAR | Other than the selected character sets | Conversion between the UTF-16 encoding of a Java string and UTF-8 happens in the thin driver. Then, conversion between UTF-8 and the database character set happens in the database server.
CHAR | NCHAR | Other than the selected character sets | Conversion between the UTF-16 encoding of a Java string and UTF-8 happens in the thin driver. Then, conversion from UTF-8 to the database character set and then to the national character set happens in the database server.
NCHAR | CHAR | Any | Conversion between the UTF-16 encoding of a Java string and the national character set happens in the thin driver. Then, conversion between the national character set and the database character set happens in the database server.
NCHAR | NCHAR | Any | Conversion between the UTF-16 encoding of a Java string and the national character set happens in the thin driver.

7.5.5.3 Data Conversion for the Server-Side Internal Driver

All data conversion occurs in the database server because the server-side internal driver works inside the database.

7.5.6 Using oracle.sql.CHAR in Oracle Object Types

JDBC drivers support Oracle object types. Oracle objects are always sent from the database to the client as an object represented in the database character set or national character set. That means the data conversion paths in "Data Conversion in JDBC" do not apply to Oracle object access. Instead, the oracle.sql.CHAR class is used for passing SQL CHAR and SQL NCHAR data of an object type from the database to the client.
7.5.6.1 oracle.sql.CHAR

The oracle.sql.CHAR class has special functionality for the conversion of character data. The Oracle character set is a key attribute of the oracle.sql.CHAR class. The Oracle character set is always passed in when an oracle.sql.CHAR object is constructed. Without a known character set, the bytes of data in the oracle.sql.CHAR object are meaningless. The oracle.sql.CHAR class provides the following methods for converting character data to strings:
You may want to construct an oracle.sql.CHAR object yourself (to pass into a prepared statement, for example). When you construct an oracle.sql.CHAR object, you must provide character set information to the oracle.sql.CHAR object by using an instance of the oracle.sql.CharacterSet class. Each instance of the oracle.sql.CharacterSet class represents one of the character sets that Oracle supports. Complete the following tasks to construct an oracle.sql.CHAR object:
The server (database) and the client (or application running on the client) can use different character sets. When you use the methods of this class to transfer data between the server and the client, the JDBC drivers must convert the data between the server character set and the client character set.

7.5.6.2 Accessing SQL CHAR and NCHAR Attributes with oracle.sql.CHAR

Consider an object type created using SQL, together with a Java class that corresponds to this object type. The oracle.sql.CHAR class is used to map to the character attributes of the Oracle object type. JDBC populates this class with the byte representation of the character data in the database and the oracle.sql.CharacterSet object corresponding to the database character set. When an object of this Java class is retrieved from the table that stores the object type, the string conversion method of the oracle.sql.CHAR class converts the byte array from the database character set or national character set to UTF-16 by calling Oracle's Java data conversion classes and returning a Java string.
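The point that bytes are meaningless without a known character set can be illustrated with standard Java classes. The following is a minimal sketch, not the oracle.sql.CHAR API: the class name TaggedBytes and its decode method are hypothetical, and java.nio.charset.Charset stands in for an Oracle character set object.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration only; this is NOT oracle.sql.CHAR. */
final class TaggedBytes {
    private final byte[] bytes;     // raw character data as stored
    private final Charset charset;  // the character set that gives the bytes meaning

    public TaggedBytes(byte[] bytes, Charset charset) {
        this.bytes = bytes.clone();
        this.charset = charset;
    }

    /** Decodes the stored bytes into a Java (UTF-16) string. */
    public String decode() {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] raw = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Same bytes, two different character sets, two different strings.
        String asUtf8 = new TaggedBytes(raw, StandardCharsets.UTF_8).decode();
        String asLatin1 = new TaggedBytes(raw, StandardCharsets.ISO_8859_1).decode();

        System.out.println(asUtf8.length());   // one character when decoded correctly
        System.out.println(asLatin1.length()); // two characters when the wrong set is assumed
    }
}
```

Keeping the character set next to the bytes, as this sketch does, is the same design idea the text attributes to oracle.sql.CHAR: the decoding step never has to guess.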
For the getObject() call to work, the SQLData interface has to be implemented in the Java class, and the connection type map has to be set up to indicate the mapping of the object type to the Java class.

7.5.7 Restrictions on Accessing SQL CHAR Data with JDBC

7.5.7.1 Character Integrity Issues in a Multibyte Database Environment

Oracle JDBC drivers perform character set conversions as appropriate when character data is inserted into or retrieved from the database. The drivers convert Unicode characters used by Java clients to Oracle database character set characters, and vice versa. Character data that makes a round trip from the Java Unicode character set to the database character set and back to Java can suffer some loss of information. This happens when multiple Unicode characters are mapped to a single character in the database character set. An example is the Unicode full-width tilde character (0xFF5E) and its mapping to Oracle's JA16SJIS character set. The round-trip conversion for this Unicode character results in the Unicode character 0x301C, which is a wave dash (a character commonly used in Japan to indicate range), not a tilde. The following figure shows the round-trip conversion of the tilde character.

This issue is not a bug in Oracle's JDBC. It is an unfortunate side effect of the ambiguity in character mapping specifications on different operating systems. Fortunately, this problem affects only a small number of characters in a small number of Oracle character sets such as JA16SJIS, JA16EUC, ZHT16BIG5, and KO16KSC5601. The workaround is to avoid making a full round-trip with these characters.
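The same kind of mapping ambiguity can be reproduced with the JDK's own converters. The sketch below is an illustration under assumptions: it uses the Java charsets Shift_JIS and windows-31j as stand-ins for two differing Shift-JIS mapping tables (they are not Oracle's JA16SJIS tables, and both must be available in the JDK in use), and the class and method names are hypothetical. Encoding the full-width tilde with one table and decoding with the other should show a character that does not survive the round trip.

```java
import java.nio.charset.Charset;

/** Hypothetical demo class; uses JDK charsets, not Oracle character set tables. */
final class WaveDashRoundTrip {

    /** Encodes with one mapping table and decodes with another. */
    public static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Both charsets are assumed to be present (they ship in the jdk.charsets module).
        Charset ms932 = Charset.forName("windows-31j");
        Charset sjis = Charset.forName("Shift_JIS");

        String fullWidthTilde = "\uFF5E";
        String back = roundTrip(fullWidthTilde, ms932, sjis);

        // Print the code points before and after the round trip.
        System.out.printf("U+%04X -> U+%04X%n",
                (int) fullWidthTilde.charAt(0), (int) back.charAt(0));
        System.out.println("unchanged: " + fullWidthTilde.equals(back));
    }
}
```

A check like this can be run against the characters an application actually stores to find out whether any of them are affected before data is loaded.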
7.6 ODBC and OLE DB Programming with Unicode

7.6.1 Unicode-Enabled Drivers in ODBC and OLE DB

Oracle's ODBC driver and Oracle Provider for OLE DB can handle Unicode data properly without data loss. For example, you can run a Unicode ODBC application containing Japanese data on English Windows if you install Japanese fonts and an input method editor for entering Japanese characters. Oracle provides ODBC and OLE DB products for Windows platforms only. For UNIX platforms, contact your vendor.

7.6.2 OCI Dependency in Unicode

OCI Unicode binding and defining features are used by the ODBC and OLE DB drivers to handle Unicode data. OCI Unicode data binding and defining features are independent of NLS_LANG. This means Unicode data is handled properly, irrespective of the NLS_LANG setting on the platform.

7.6.3 ODBC and OLE DB Code Conversion in Unicode

In general, no redundant data conversion occurs unless you specify a client data type that differs from that of the server. If you bind a Unicode buffer (SQL_C_WCHAR) to a Unicode data column such as NCHAR, for example, then the ODBC and OLE DB drivers pass the data through unconverted between the application and the OCI layer. If you do not specify data types before fetching, but call SQLGetData with the client data types instead, then the conversions described in the following table occur.
Table 7-7 ODBC Implicit Binding Code Conversions

Data Types of ODBC Client Buffer | Data Types of the Target Column in the Database | Fetch Conversions | Comments
SQL_C_WCHAR | SQL CHAR data types | If the database character set is a subset of the NLS_LANG character set, then the conversions occur in the following order: | No unexpected data loss. May degrade performance if the database character set is a subset of the NLS_LANG character set.
SQL_C_CHAR | SQL CHAR data types | If the database character set is a subset of the NLS_LANG character set: database character set to NLS_LANG in OCI. If the database character set is NOT a subset of the NLS_LANG character set: database character set, UTF-16, to NLS_LANG character set in OCI and ODBC. | No unexpected data loss. May degrade performance if the database character set is not a subset of the NLS_LANG character set.

You must specify the data type for inserting and updating operations. The data type of the ODBC client buffer is given when you call SQLGetData, but not immediately. Hence, SQLFetch does not have the information. Because the ODBC driver guarantees data integrity, if you perform implicit bindings, then redundant conversion may result in performance degradation. Your choice is a trade-off between performance with explicit binding and usability with implicit binding.

7.6.3.1 OLE DB Code Conversions

Unlike ODBC, OLE DB only enables you to perform implicit bindings for inserting, updating, and fetching data. The conversion algorithm for determining the intermediate character set is the same as in the implicit binding cases of ODBC.
Table 7-8 OLE DB Implicit Bindings

Data Types of OLE DB Client Buffer | Data Types of the Target Column in the Database | In-Binding and Out-Binding Conversions | Comments
DBTYPE_WCHAR | SQL CHAR data types | If the database character set is a subset of the NLS_LANG character set: database character set to and from NLS_LANG character set in OCI, then NLS_LANG character set to UTF-16 in OLE DB. If the database character set is NOT a subset of the NLS_LANG character set: database character set to and from UTF-16 in OCI. | No unexpected data loss. May degrade performance if the database character set is a subset of the NLS_LANG character set.
DBTYPE_CHAR | SQL CHAR data types | If the database character set is a subset of the NLS_LANG character set: database character set to and from NLS_LANG in OCI. If the database character set is not a subset of the NLS_LANG character set: database character set to and from UTF-16 in OCI, then UTF-16 to NLS_LANG character set in OLE DB. | No unexpected data loss. May degrade performance if the database character set is not a subset of the NLS_LANG character set.

7.6.4 ODBC Unicode Data Types

In ODBC Unicode applications, use SQLWCHAR to store Unicode data. All standard Windows Unicode functions can be used for SQLWCHAR data manipulations.
For example, wcslen counts the number of characters of SQLWCHAR data.

Microsoft's ODBC 3.5 specification defines three Unicode data type identifiers for clients (SQL_C_WCHAR, SQL_C_WVARCHAR, and SQL_WLONGVARCHAR) and three Unicode data type identifiers for servers (SQL_WCHAR, SQL_WVARCHAR, and SQL_WLONGVARCHAR). For binding operations, specify data types for both client and server using SQLBindParameter. In a Unicode binding, the client buffer type (SQL_C_WCHAR) indicates that Unicode data is bound to a Unicode column (SQL_WCHAR).

The following table represents the data type mappings of the ODBC Unicode data types for the server against SQL NCHAR data types.

Table 7-9 Server ODBC Unicode Data Type Mapping

ODBC Data Type | Oracle Data Type
SQL_WCHAR | NCHAR
SQL_WVARCHAR | NVARCHAR2
SQL_WLONGVARCHAR | NCLOB

According to ODBC specifications, SQL_WCHAR, SQL_WVARCHAR, and SQL_WLONGVARCHAR are treated as Unicode data, and are therefore measured in the number of characters instead of the number of bytes.

7.6.5 OLE DB Unicode Data Types

OLE DB offers the wchar_t, BSTR, and OLESTR data types for a Unicode C client. In practice, wchar_t is the most common data type and the others are for specific purposes.

The OLESTR macro works exactly like an "L" modifier to indicate a Unicode string. If you need to allocate a Unicode data buffer dynamically using OLESTR, then use the IMalloc allocator (for example, CoTaskMemAlloc). However, using OLESTR is not the normal method for variable-length data; use wchar_t* instead for generic string types. BSTR is similar. It is a string with a length prefix in the memory location preceding the string. Some functions and methods can accept only BSTR Unicode data types.
Therefore, a BSTR Unicode string must be manipulated with special functions like SysAllocString for allocation and SysFreeString for freeing memory.

Unlike ODBC, OLE DB does not allow you to specify the server data type explicitly. When you set the client data type, the OLE DB driver automatically performs data conversion if necessary. The following table shows the OLE DB data type mapping.

Table 7-10 OLE DB Data Type Mapping

OLE DB Data Type | Oracle Data Type
DBTYPE_WCHAR | NCHAR or NVARCHAR2

If DBTYPE_BSTR is specified, then it is assumed to be DBTYPE_WCHAR because both are Unicode strings.

7.6.6 ADO Access

ADO is a high-level API for accessing a database with the OLE DB and ODBC drivers. Most database application developers use the ADO interface on Windows because it is easily accessible from Visual Basic, the primary scripting language for Active Server Pages (ASP) for the Internet Information Server (IIS). To OLE DB and ODBC drivers, ADO is simply an OLE DB consumer or ODBC application. ADO assumes that OLE DB and ODBC drivers are Unicode-aware components; hence, it always attempts to manipulate Unicode data.

7.7 XML Programming with Unicode

XML support of Unicode is essential for software development for global markets so that text information can be exchanged in any language. Unicode uniformly supports almost every character and language, which makes it much easier to support multiple languages within XML. To enable Unicode for XML within an Oracle database, the character set of the database must be UTF-8. By enabling Unicode text handling in your application, you acquire a basis for supporting any language. Every XML document is Unicode text and potentially multilingual, unless it is guaranteed that only a known subset of Unicode characters will appear in your documents. Thus Oracle recommends that you enable Unicode for XML. Unicode support comes with Java and many other modern programming environments.
7.7.1 Writing an XML File in Unicode with Java

A common mistake in reading and writing XML files is using the Reader and Writer classes for character input and output. Using Reader and Writer for XML files should be avoided because it requires character set conversion based on the default character encoding of the run-time environment. For example, using the FileWriter class is not safe because it converts the document to the default character encoding. The output file can suffer from a parsing error or data loss if the document contains characters that are not available in the default character encoding. UTF-8 is popular for XML documents, but UTF-8 is not usually the default file encoding for Java. Thus using a Java class that assumes the default file encoding can cause problems. These problems can be avoided by writing the document with an explicitly specified encoding instead of the default one.

7.7.2 Reading an XML File in Unicode with Java

Do not read XML files as text input. When reading an XML document stored in a file system, use the parser to automatically detect the character encoding of the document. Avoid using a Reader class or specifying a character encoding on the input stream. Given a binary input stream with no external encoding information, the parser automatically determines the character encoding based on the byte order mark and encoding declaration of the XML document. Any well-formed document in any supported encoding can be parsed successfully this way.

7.7.3 Parsing an XML Stream in Unicode with Java

When the source of an XML document is not a file system, the encoding information is usually available before reading the document.
For example, if the input document is provided in the form of a Java character stream or Reader, its encoding is evident and no detection should take place. The parser can begin parsing a Reader in Unicode without regard to the character encoding.
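The three XML cases discussed in this section can be sketched with the standard JAXP parser that ships with Java. This is a minimal illustration under assumptions, not Oracle's sample code: it uses javax.xml.parsers rather than the Oracle XML parser, an in-memory byte array stands in for a file, and the class and method names are hypothetical. It writes the document as explicit UTF-8 bytes, parses a binary stream so the parser detects the encoding itself, and parses a character stream where no detection is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch using the JDK's JAXP parser, not the Oracle XML parser. */
final class XmlUnicodeSketch {
    // A small document body containing Japanese text, written with Unicode escapes.
    public static final String XML_BODY = "<greeting>\u3053\u3093\u306b\u3061\u306f</greeting>";

    /** Writing: emit bytes in an explicitly named encoding, never the platform default. */
    public static byte[] writeUtf8() {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for a FileOutputStream
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + XML_BODY;
        byte[] encoded = doc.getBytes(StandardCharsets.UTF_8);
        out.write(encoded, 0, encoded.length);
        return out.toByteArray();
    }

    /** Reading: hand the parser a binary stream so it detects the encoding itself. */
    public static String parseBytes(byte[] xml) {
        return textOf(new InputSource(new ByteArrayInputStream(xml)));
    }

    /** Parsing a character stream: the data is already Unicode, so no detection occurs. */
    public static String parseChars(String xml) {
        return textOf(new InputSource(new StringReader(xml)));
    }

    private static String textOf(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String fromBytes = parseBytes(writeUtf8());
        String fromChars = parseChars(XML_BODY);
        System.out.println("same text from both paths: " + fromBytes.equals(fromChars));
    }
}
```

Replacing the byte array with a FileOutputStream and FileInputStream would give the file-based variant; the point of the sketch is only that the encoding is named once when writing and left to the parser when reading.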