A comprehensive analysis of Oracle database character set conversion rules
As Oracle database users, we are all familiar with the Export and Import commands, since they are tools we use frequently for data backup and recovery. However, character set problems that arise while using these two commands often cause Oracle users unnecessary trouble and avoidable data loss. This article summarizes the rules that govern Oracle character set conversion during Export and Import, along with some notes on using the two commands.
Why character set conversion occurs
As the diagram above shows, four character sets are involved in the Export and Import process, and any inconsistency among them is what causes Oracle to perform character set conversion:
* the source database character set;
* the user session character set during Export;
* the user session character set during Import;
* the target database character set.
If any of these four character sets differ during Export and Import, Oracle character set conversion can occur, as follows.
During Export, if the source database character set differs from the Export user session character set, conversion takes place, and the ID of the Export user session character set is stored in the first few bytes of the exported binary dmp file. Data may be lost during this conversion.
Example 1: Suppose the source database uses ZHS16GBK and the Export user session uses US7ASCII. ZHS16GBK is a double-byte character set while US7ASCII is a 7-bit single-byte character set, so Chinese characters have no equivalent in US7ASCII. During conversion every Chinese character is lost and replaced with "??", which means the resulting dmp file has already lost data.
Example 2: Suppose the source database uses ZHS16GBK and the Export user session uses ZHS16CGB231280. Because ZHS16GBK is a superset of ZHS16CGB231280, most characters convert correctly, and only the few characters that fall outside ZHS16CGB231280 become "??". In the opposite case, where the source database uses ZHS16CGB231280 and the Export user session uses ZHS16GBK, the conversion succeeds completely.
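The two examples can be imitated outside Oracle. The following is a minimal Python sketch, not Oracle's own conversion code. It assumes that Python's ascii, gb2312 and gbk codecs are reasonable stand-ins for US7ASCII, ZHS16CGB231280 and ZHS16GBK, and the helper name convert is made up for illustration. Python substitutes one "?" per unmappable character, so the exact replacement text may differ from what Oracle writes.

```python
def convert(text, source_charset, target_charset):
    """Re-encode text from source_charset to target_charset.

    Characters with no equivalent in the target are replaced with '?'.
    The codec names are Python stand-ins for Oracle character sets
    (an assumption of this sketch).
    """
    stored = text.encode(source_charset)          # bytes as held in the source
    characters = stored.decode(source_charset)    # interpret them correctly
    converted = characters.encode(target_charset, errors="replace")
    return converted.decode(target_charset)


# Example 1: GBK data exported through a 7-bit ASCII session.
print(convert("东北大学 Oracle", "gbk", "ascii"))
# Example 2: the second character exists in GBK but not in GB2312.
print(convert("东镕", "gbk", "gb2312"))
# Subset to superset: nothing is lost.
print(convert("东北大学", "gb2312", "gbk"))
```

Only the superset-to-subset and multi-byte-to-ASCII directions damage the text; the subset-to-superset call returns its input unchanged.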
During Import into the target database, character set conversion mirrors what happens during Export, so the details are not repeated here.
The dmp file produced by Export records the Export user session character set. During Import, the first conversion is from the dmp file character set (that is, the Export user session character set) to the Import user session character set. If this conversion cannot be completed properly, the import into the target database cannot be completed either.
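The character set ID in the dmp header can be inspected before an import is attempted. The sketch below rests on assumptions the article does not state: that the ID occupies the second and third bytes of the file in big-endian order (as is commonly reported for classic Export dumps), and that the few IDs listed match what NLS_CHARSET_ID returns in your database. The function names are hypothetical; verify both assumptions against a dump whose character set you already know.

```python
# Assumption: a small, deliberately incomplete table of character set IDs.
# Check the values with NLS_CHARSET_ID in your own database.
KNOWN_CHARSET_IDS = {
    1: "US7ASCII",
    31: "WE8ISO8859P1",
    850: "ZHS16CGB231280",
    852: "ZHS16GBK",
}


def charset_id_from_header(header):
    """Read the character set ID from the first bytes of a dmp file.

    Assumes the ID is stored in bytes 2 and 3 (big-endian).
    """
    if len(header) < 3:
        raise ValueError("need at least the first three bytes of the file")
    return int.from_bytes(header[1:3], "big")


def describe_dump(path):
    """Return the character set name recorded in the dmp file at path."""
    with open(path, "rb") as dump:
        charset_id = charset_id_from_header(dump.read(3))
    return KNOWN_CHARSET_IDS.get(charset_id, f"unknown (ID {charset_id})")
```

Under these assumptions, a dump whose header begins with the bytes 03 03 54 would be reported as ZHS16GBK, since 0x0354 is 852.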
Performing character set conversion correctly
Normally we do not want any character conversion to happen during Export and Import, but sometimes conversion is necessary. For example, if ZHS16CGB231280 was chosen when the Oracle database was installed, some characters cannot be represented correctly because it is a small Chinese character set; solving this requires moving to ZHS16GBK, and that move involves character set conversion.
To make sure that no conversion, or no incorrect conversion, occurs during Export and Import, it is best to check three things beforehand: that the source database character set matches the Export user session character set, that the source and target database character sets match, and that the target database character set matches the Import user session character set. If all four character sets are identical, no Oracle character set conversion takes place during Export and Import.
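The check described above can be written down as a small helper. This is only an illustrative sketch: the function name conversion_hops and the stage labels are invented here, and the four character set names are assumed to have been collected by hand from the databases and sessions involved.

```python
def conversion_hops(source_db, export_session, import_session, target_db):
    """List the stages at which a character set conversion would occur.

    Each argument is a character set name such as "ZHS16GBK".
    An empty result means all four character sets agree.
    """
    stages = [
        ("source database -> Export session", source_db, export_session),
        ("dmp file -> Import session", export_session, import_session),
        ("Import session -> target database", import_session, target_db),
    ]
    return [
        f"{stage}: {before} to {after}"
        for stage, before, after in stages
        if before.upper() != after.upper()
    ]


for hop in conversion_hops("ZHS16GBK", "US7ASCII", "US7ASCII", "ZHS16GBK"):
    print(hop)
```

In the usage line, the two databases agree with each other but not with the sessions, so two conversions are reported, and the first of them is the lossy one from Example 1.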
The database character set can be checked in the following ways:
* by viewing the initXXXX.ora file;
* with a SQL statement: SELECT NAME, VALUE$ FROM SYS.PROPS$ WHERE NAME = 'NLS_CHARACTERSET';
The user session character set for Export and Import is controlled by NLS_LANG. On Windows it can be viewed or modified in the system registry; on Unix it can be viewed or modified through the user's NLS_LANG environment variable.
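An NLS_LANG value has the form LANGUAGE_TERRITORY.CHARACTERSET, for example AMERICAN_AMERICA.US7ASCII, so the session character set is the part after the dot. The sketch below extracts it from the environment; the function name session_charset is hypothetical, and on Windows the value may live only in the registry, which this sketch does not read.

```python
import os


def session_charset(nls_lang):
    """Return the character set part of an NLS_LANG value, or None if absent."""
    _, dot, charset = nls_lang.partition(".")
    charset = charset.strip()
    return charset.upper() if dot and charset else None


# Inspect the NLS_LANG setting of the current session, if there is one.
current = os.environ.get("NLS_LANG", "")
print(session_charset(current) or "NLS_LANG does not name a character set")
```

The returned name can then be compared with the value of NLS_CHARACTERSET read from the database.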
Note in particular that the Oracle database character set is normally fixed when the database is created and should not be changed once user data has been stored. The data is stored in that character set, so after a switch to another character set the existing data can no longer be interpreted correctly. If the character set really must be changed, it can be done with the following steps:
* Back up the database and delete the original data (the backup can be made with Export, for example; make sure that no character set conversion or data loss occurs during the backup);
* As the INTERNAL user, update the character set recorded in the sys.props$ table;
* Restart the database;
* Restore the data.
The following conversions between character sets are feasible:
* Conversion from a subset character set to its superset is feasible, for example from ZHS16CGB231280 to ZHS16GBK; conversion from a superset to its subset loses some data.
* Double-byte character set data that contains only English characters can also be converted to a single-byte character set; for example, ZHS16GBK data that is English only converts correctly to US7ASCII.
* Single-byte character sets with the same encoding range can usually be converted to one another.
Note that "no data loss" here means that after data is converted from character set A to character set B, it can be converted correctly from B back to A, or that character set B can correctly represent the data converted from character set A.
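This definition of "no data loss" is a round trip from A to B and back to A, which can be tested directly. The following sketch again uses Python codecs (gbk, gb2312, ascii) as assumed stand-ins for the Oracle character sets, and the function name survives_round_trip is invented for illustration.

```python
def survives_round_trip(text, charset_a, charset_b):
    """Return True if text stored in charset_a converts to charset_b and back unchanged."""
    in_a = text.encode(charset_a)
    # A -> B, replacing characters that B cannot represent with '?'.
    in_b = in_a.decode(charset_a).encode(charset_b, errors="replace")
    # B -> A.
    back_in_a = in_b.decode(charset_b).encode(charset_a, errors="replace")
    return back_in_a == in_a


print(survives_round_trip("东北大学", "gb2312", "gbk"))  # subset to superset
print(survives_round_trip("东镕", "gbk", "gb2312"))      # superset to subset
print(survives_round_trip("Oracle", "gbk", "ascii"))     # English-only data
```

The three calls correspond to the three cases discussed above: subset to superset, superset to subset with a character missing from the subset, and English-only double-byte data converted to a single-byte character set.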
Character set encoding schemes
Character sets can be divided into single-byte and multi-byte character sets according to how many bytes are needed to represent one character. Single-byte character sets are further divided into 7-bit and 8-bit character sets. US7ASCII is a single-byte 7-bit character set, and WE8ISO8859P1, which conforms to the ISO 8859-1 standard, is a single-byte 8-bit character set. Multi-byte encodings are divided into fixed-length schemes (with a length of 2 or more bytes) and variable-length schemes. Multi-byte character sets such as ZHS16GBK, ZHS16CGB231280 and JA16SJIS use two bytes to represent a character and are also known as double-byte character sets.
A letter of the alphabet is one character, but how many characters is a Chinese character? A Chinese character occupies two bytes, but how many characters it counts as depends on the database character set. If the database uses the single-byte character set US7ASCII, one Chinese character counts as two characters; if the database uses the double-byte character set ZHS16GBK, one Chinese character counts as one character. This can be verified with the Oracle SUBSTR function.
With the US7ASCII character set:
SELECT SUBSTR('东北大学', 1, 2) FROM dual;
The statement returns '东', the first Chinese character of 东北大学 ("Northeastern University").
With the ZHS16GBK character set:
SELECT SUBSTR('东北大学', 1, 2) FROM dual;
The statement returns '东北', the first two Chinese characters.
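The difference between the two results comes down to whether the string is cut by bytes or by characters. The sketch below imitates both views in Python with the Chinese string 东北大学 ("Northeastern University"); it assumes the data is held as GBK bytes and uses two hypothetical helpers, substr_bytes and substr_chars, rather than Oracle's SUBSTR itself.

```python
def substr_bytes(text, charset, start, length):
    """Cut text as a single-byte database would: every byte counts as one character."""
    raw = text.encode(charset)
    piece = raw[start - 1:start - 1 + length]
    return piece.decode(charset, errors="replace")


def substr_chars(text, start, length):
    """Cut text as a double-byte-aware database would: by whole characters."""
    return text[start - 1:start - 1 + length]


name = "东北大学"
print(substr_bytes(name, "gbk", 1, 2))  # two bytes, i.e. one Chinese character
print(substr_chars(name, 1, 2))         # two Chinese characters
```

Like SUBSTR, both helpers take a 1-based start position.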
Selecting an appropriate database character set
The following points should be considered when selecting the database character set:
1. Which languages the database needs to support
When choosing a database character set, you will often find that several character sets meet your current language requirements. For Simplified Chinese, for example, there are ZHS16GBK, ZHS16CGB231280 and others to choose from. Which should be selected? The choice should take the future requirements on the database into account. If you know that the database will later have to support additional languages, choosing a character set with a wider repertoire is the better idea.
2. Interoperability between system resources and applications
The chosen database character set should allow seamless interaction between the operating system and applications. If the chosen character set is not one that the operating system supports, the system has to convert between the two character sets, and some characters may be lost in that conversion. When converting from character set A to character set B, each character in A must have an equivalent character in B; otherwise it is replaced with "?". In this sense, two character sets can be converted to each other if their encoding ranges are the same.
Character set conversion also affects system performance, so the client and the server should use the same character set. This avoids conversion and improves system performance to some degree.
3. System performance requirements
The database character set has some effect on database performance. For the best performance, choose a database character set that avoids character conversion and encodes the required languages most efficiently. Typically, single-byte character sets perform better than multi-byte character sets and also need less storage space.
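Encoding efficiency can be estimated by encoding a representative sample of the data in each candidate character set and comparing the byte counts. The sketch below does this with Python codecs; ascii and gbk are assumed stand-ins for US7ASCII and ZHS16GBK, utf-8 is included only as an example of a wider multi-byte encoding that the article does not discuss, and the function name storage_bytes is hypothetical.

```python
def storage_bytes(text, charsets):
    """Return the encoded size of text in each charset, or None if it cannot be encoded."""
    sizes = {}
    for charset in charsets:
        try:
            sizes[charset] = len(text.encode(charset))
        except UnicodeEncodeError:
            sizes[charset] = None  # the charset cannot represent this sample
    return sizes


candidates = ["ascii", "gbk", "utf-8"]
print(storage_bytes("Oracle", candidates))
print(storage_bytes("东北大学", candidates))
```

For English-only data all three candidates need the same space, while for Chinese data the single-byte candidate cannot store the sample at all and the two multi-byte candidates differ in size.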
4. Other restrictions
Before settling on a database character set, consult the Oracle documentation for the version in use and check its restrictions on character sets. In Oracle 8.1.5, for example, the following character sets cannot be used: JA16EUCFIXED, ZHS16GBKFIXED, JA16DBCSFIXED, KO16DBCSFIXED, ZHS16DBCSFIXED, JA16SJISFIXED, ZHT32TRISFIXED.
In summary, a correct understanding of the Oracle character set conversion process lets us avoid unnecessary trouble and data loss. Used sensibly, the conversion process can also help us convert data correctly from one character set to another to meet the needs of different applications.