Extend GB18030 encoding conversion to cover full Unicode range.
Our previous code for GB18030 <-> UTF8 conversion only covered Unicode code points up to U+FFFF, but the actual spec defines conversions for all code points up to U+10FFFF. That would be rather impractical as a lookup table, but fortunately there is a simple algorithmic conversion between the additional code points and the equivalent GB18030 byte patterns.

Make use of the just-added callback facility in LocalToUtf/UtfToLocal to perform the additional conversions. Having created the infrastructure to do that, we can use the same code to map certain linearly-related subranges of the Unicode space below U+FFFF, allowing removal of the corresponding lookup table entries. This more than halves the lookup table size, which is a substantial savings; utf8_and_gb18030.so drops from nearly a megabyte to about half that.

In support of doing that, replace ISO10646-GB18030.TXT with the data file gb-18030-2000.xml (retrieved from http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ ) in which these subranges have been deleted from the simple lookup entries.

Per bug #12845 from Arjen Nienhuis. The conversion code added here is based on his proposed patch, though I whacked it around rather heavily.
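The "simple algorithmic conversion" for the supplementary planes can be sketched as follows. This is a minimal illustration, not the code from utf8_and_gb18030.c: it assumes the standard GB18030 four-byte layout (first and third bytes in 0x81..0xFE, second and fourth in 0x30..0x39), packs each four-byte code big-endian into a uint32, and uses the fact that U+10000 maps to 0x90308130 with consecutive code points taking consecutive four-byte codes. The helper names are hypothetical, and the callbacks actually wired into LocalToUtf/UtfToLocal may take differently encoded arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; a four-byte GB18030 code is packed big-endian,
 * e.g. bytes 90 30 81 30 become 0x90308130. */

#define GB_SUPP_BASE 0x90308130u /* GB18030 code for U+10000 */

/* Linear index of a four-byte code: a mixed-radix number with digit
 * ranges 126, 10, 126, 10 (bytes 1..4). */
static uint32_t gb_linear(uint32_t gb)
{
    uint32_t b1 = (gb >> 24) & 0xFF, b2 = (gb >> 16) & 0xFF;
    uint32_t b3 = (gb >> 8) & 0xFF, b4 = gb & 0xFF;

    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10
        + (b4 - 0x30);
}

/* Inverse of gb_linear: peel off the digits from least significant. */
static uint32_t gb_unlinear(uint32_t lin)
{
    uint32_t b1, b2, b3, b4;

    assert(lin < 126u * 10u * 126u * 10u); /* must fit in four bytes */
    b4 = 0x30 + lin % 10;  lin /= 10;
    b3 = 0x81 + lin % 126; lin /= 126;
    b2 = 0x30 + lin % 10;  lin /= 10;
    b1 = 0x81 + lin;
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
}

/* U+10000..U+10FFFF -> packed GB18030; returns 0 if out of range. */
static uint32_t ucs_to_gb18030_supp(uint32_t cp)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    return gb_unlinear(gb_linear(GB_SUPP_BASE) + (cp - 0x10000));
}

/* Packed GB18030 -> U+10000..U+10FFFF; returns 0 if not in that range. */
static uint32_t gb18030_supp_to_ucs(uint32_t gb)
{
    uint32_t b1 = gb >> 24, b2 = (gb >> 16) & 0xFF;
    uint32_t b3 = (gb >> 8) & 0xFF, b4 = gb & 0xFF;
    uint32_t offset;

    /* Supplementary codes start at first byte 0x90 and end at 0xE3. */
    if (b1 < 0x90 || b1 > 0xE3 || b2 < 0x30 || b2 > 0x39 ||
        b3 < 0x81 || b3 > 0xFE || b4 < 0x30 || b4 > 0x39)
        return 0;
    /* b1 >= 0x90 guarantees the index is at or above the base. */
    offset = gb_linear(gb) - gb_linear(GB_SUPP_BASE);
    if (offset > 0xFFFFF)
        return 0;
    return 0x10000 + offset;
}
```

For example, U+1F600 comes out as 0x9439FC36 and U+10FFFF as 0xE3329A35, the last supplementary code. The same linear-index arithmetic could plausibly serve the linearly-related BMP subranges mentioned above by substituting a per-range base pair, which is what lets their lookup table entries be dropped.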
- src/backend/utils/mb/Unicode/ISO10646-GB18030.TXT: 0 additions, 63488 deletions
- src/backend/utils/mb/Unicode/Makefile: 4 additions, 4 deletions
- src/backend/utils/mb/Unicode/UCS_to_GB18030.pl: 30 additions, 51 deletions
- src/backend/utils/mb/Unicode/gb-18030-2000.xml: 30916 additions, 0 deletions
- src/backend/utils/mb/Unicode/gb18030_to_utf8.map: 2 additions, 32631 deletions
- src/backend/utils/mb/Unicode/utf8_to_gb18030.map: 2 additions, 32629 deletions
- src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c: 157 additions, 2 deletions