AIX Version 4.3 General Programming Concepts: Writing and Debugging Programs
Converters Overview for Programming
National Language Support (NLS) provides a base for internationalization in which data can often be converted from one code set to another. The system supplies several standard converters for this purpose. This section introduces converters, describes the standard converter types and the libiconv subroutines, shows how to use and name converters, and lists the supported converters.
Converters Introduction
Data sent by one program to another program residing on a remote host may require conversion from the code set of the source machine to that of the receiver. For example, when communicating with a VM system, the workstation converts its ISO8859-1 data to an EBCDIC form.
Code sets define graphic characters and control character assignments to code points. These coded characters must also be converted when a program obtains data in one code set but displays it in another code set.
Two interfaces for conversions are provided:
iconv command |
Allows you to request a specific conversion by naming the FromCode and ToCode code sets. |
libiconv functions |
Allow applications to request converters by name. |
The system provides ready-to-use libraries of converters. You supply the name of the converter you want to use. The converter libraries are found in the /usr/lib/nls/loc/iconv/* and /usr/lib/nls/loc/iconvTable/* directories.
In addition to code set converters, the converter library also provides a set of network interchange converters. In a network environment, the code sets of the communications systems and the protocols of communication determine how the data should be converted.
Interchange converters are used to convert data sent from one system to another. Conversions from one internal code set to another require code set converters. When data must be converted from a sender's code set to a receiver's code set or from 8-bit data to 7-bit data, a uniform interface is required. The iconv subroutines provide this interface.
Standard Converters
The system supports standard converters for use with the iconv command and subroutines. The following list describes the different types of converters:
Code Set Converter Types |
Description |
Table converter |
Converts single-byte stateless code sets. Performs a table translation from one byte to another byte. |
Algorithmic converter |
Performs a conversion that cannot be implemented using a simple single-byte mapping table. All multibyte converters are currently implemented in this way. |
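The table translation performed by a table converter can be pictured as a 256-entry lookup. The following is a minimal sketch of the idea only, not the system's implementation; the names table_convert and table_identity are hypothetical.

```c
#include <stddef.h>

/* Translate n bytes from in to out through a 256-entry lookup table.
 * table[b] holds the target code point for source byte b. */
void table_convert(const unsigned char table[256],
                   const unsigned char *in, unsigned char *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = table[in[i]];
}

/* Fill a table with the identity mapping as a starting point;
 * individual entries can then be overridden for a given code set pair. */
void table_identity(unsigned char table[256])
{
    int b;

    for (b = 0; b < 256; b++)
        table[b] = (unsigned char)b;
}
```

Because each output byte depends only on the corresponding input byte, this approach works only for single-byte stateless code sets. Multibyte or state-dependent code sets need an algorithmic converter.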
Understanding libiconv
The iconv application programming interface (API) consists of three subroutines that accomplish conversion:
iconv_open |
Performs the initialization required to convert characters from the code set specified by the FromCode parameter to the code set specified by the ToCode parameter. The strings specified are dependent on the converters installed in the system. If initialization is successful, the converter descriptor, iconv_t, is returned in its initial state. |
iconv |
Invokes the converter function using the descriptor obtained from the iconv_open subroutine. The inbuf parameter points to the first character in the input buffer, and the inbytesleft parameter indicates the number of bytes to the end of the buffer being converted. The outbuf parameter points to the first available byte in the output buffer, and the outbytesleft parameter indicates the number of available bytes to the end of the buffer.
For state-dependent encoding, the subroutine is placed in its initial state by a call for which the inbuf value is a null pointer. Subsequent calls with the inbuf parameter as something other than a null pointer cause the internal state of the function to be altered as necessary. |
iconv_close |
Closes the conversion descriptor specified by the cd parameter and releases its resources, making the descriptor slot available for a subsequent iconv_open call. |
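The three subroutines are typically used together as open, convert, and close. The following is a minimal sketch that converts one in-memory buffer. The helper name convert_buffer is hypothetical, and the sketch assumes the whole input fits in one call. The cast on the input pointer assumes an iconv prototype that takes char **; some systems declare const char ** instead.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of in from code set `from` to code set `to`.
 * Returns the number of bytes written to out, or (size_t)-1 on failure. */
size_t convert_buffer(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outlen)
{
    iconv_t cd;
    char *ip = (char *)in;      /* iconv does not modify the input bytes */
    char *op = out;
    size_t ileft = inlen, oleft = outlen;
    size_t r;

    cd = iconv_open(to, from);  /* the ToCode argument comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    r = iconv(cd, &ip, &ileft, &op, &oleft);
    if (r != (size_t)-1)
        /* Null inbuf: emit any closing shift sequence and
         * return the converter to its initial state. */
        r = iconv(cd, NULL, NULL, &op, &oleft);

    iconv_close(cd);
    return (r == (size_t)-1) ? (size_t)-1 : outlen - oleft;
}
```

A caller might use convert_buffer("UTF-8", "ISO8859-1", text, len, out, sizeof out), provided that a converter for that pair is installed. On failure, errno distinguishes an invalid sequence (EILSEQ), a truncated input sequence (EINVAL), and a full output buffer (E2BIG).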
In a network environment, two factors determine how data should be converted:
- Code sets of the sender and the receiver
- Communication protocol (8-bit or 7-bit data)
The following table outlines the conversion methods and recommends how you should convert data in different situations. See the "List of Interchange Converters--7-bit" and the "List of Interchange Converters--8-bit" for more information.
Outline of Methods and Recommended Choices |
Method to choose | Same code set, 7-bit only protocol | Same code set, 8-bit protocol | Different or unknown code set, 7-bit only protocol | Different or unknown code set, 8-bit protocol |
as is | Not valid | Best choice | Not valid | Not valid if remote code set is unknown |
fold7 | OK | OK | Best choice | OK |
fold8 | Not valid | OK | Not valid | Best choice |
uucode | Best choice | OK | Not valid | Not valid |
If the sender uses the same code set as the receiver, there are two possibilities:
- When protocol allows 8-bit data, the data can be sent without conversions.
- When protocol allows only 7-bit data, the 8-bit code points must be mapped to 7-bit values. Use the iconv interface and one of the following methods:
uucode method |
Provides the same mapping as the uuencode and uudecode commands. This is the recommended method. |
fold7 method |
Converts internal code sets using 7-bit data. This method passes ASCII without any change. |
If the sender uses a code set different from the receiver, there are two possibilities:
- When protocol allows only 7-bit data, use the fold7 method.
- When protocol allows 8-bit data and you know the receiver's code set, use the iconv interface to convert the data. If you do not know the receiver's code set, use the following method:
fold8 method |
Converts internal code sets to standard interchange formats. The 8-bit data is transmitted and the information is preserved so that the receiver can reconstruct the data in its code set. |
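The recommendations above can be summarized as a small decision function. This is a sketch only: the function name choose_method, its parameters, and the "direct" label (meaning a direct conversion to the receiver's code set through the iconv interface) are illustrative and are not part of the system libraries.

```c
/* Return the recommended method for sending data.
 * same_code_set:            nonzero if sender and receiver share a code set
 * eight_bit_protocol:       nonzero if the protocol passes 8-bit data
 * receiver_code_set_known:  nonzero if the receiver's code set is known */
const char *choose_method(int same_code_set, int eight_bit_protocol,
                          int receiver_code_set_known)
{
    if (same_code_set)
        /* No conversion needed unless the protocol is 7-bit only. */
        return eight_bit_protocol ? "as is" : "uucode";
    if (!eight_bit_protocol)
        return "fold7";
    /* 8-bit protocol, different code sets. */
    return receiver_code_set_known ? "direct" : "fold8";
}
```

Each return value other than "as is" and "direct" is a name that can be passed to the iconv_open subroutine as the ToCode (sender) or FromCode (receiver) argument.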
Using the iconv_open Subroutine
The following examples illustrate how to use the iconv_open subroutine in different situations:
- Sender and receiver use the same code sets:
If the protocol allows 8-bit data, you can send data without converting it.
If the protocol allows only 7-bit data, do the following:
Sender:
cd = iconv_open("uucode", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "uucode");
- Sender and receiver use different code sets:
If the protocol allows 8-bit data and the receiver's code set is unknown, do the following:
Sender:
cd = iconv_open("fold8", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "fold8");
If the protocol allows only 7-bit data, do the following:
Sender:
cd = iconv_open("fold7", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "fold7");
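The sender and receiver calls in these examples differ only in argument order, so they can be wrapped in a pair of helpers. The following sketch uses the hypothetical names open_for_sending and open_for_receiving. It assumes the program has already called setlocale so that nl_langinfo(CODESET) reports the code set of the current locale.

```c
#include <iconv.h>
#include <langinfo.h>

/* Sender side: convert from the current locale's code set to `method`
 * (for example "uucode", "fold7", or "fold8" where those are installed). */
iconv_t open_for_sending(const char *method)
{
    return iconv_open(method, nl_langinfo(CODESET));
}

/* Receiver side: convert from `method` back to the locale's code set. */
iconv_t open_for_receiving(const char *method)
{
    return iconv_open(nl_langinfo(CODESET), method);
}
```

Both helpers return (iconv_t)-1 when no matching converter can be found, so callers should check the result before calling the iconv subroutine.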
How the iconv_open Subroutine Finds Converters
The iconv_open subroutine uses the LOCPATH environment variable to search for a converter whose name is in the form:
iconv/FromCodeSet_ToCodeSet
The FromCodeSet string represents the sender's code set, and the ToCodeSet string represents the receiver's code set. The underscore character separates the two strings.
Note: All setuid and setgid programs will ignore the LOCPATH environment variable.
Because the iconv converter is a loadable object module, a different object is required when running in the 64-bit environment. In the 64-bit environment, the iconv_open subroutine uses the LOCPATH environment variable to search for a converter whose name is in the form:
iconv/FromCodeSet_ToCodeSet__64
The iconv library automatically chooses whether to load the standard converter object or the 64-bit converter object.
If the iconv_open subroutine does not find the converter, it uses the from,to pair to search for a file that defines a table-driven conversion. The file contains a conversion table created by the genxlt command.
The iconvTable converter uses the LOCPATH environment variable to search for a file whose name is in the form:
iconvTable/FromCodeSet_ToCodeSet
If the converter is found, it performs a load operation and is initialized. The converter descriptor, iconv_t, is returned in its initial state.
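The naming rules described in this section can be expressed as a small path-building helper. The following is a sketch under stated assumptions: the function name converter_path is hypothetical, the dir argument stands for a single directory taken from LOCPATH (which may list several), and the real lookup is performed inside the iconv_open subroutine.

```c
#include <stdio.h>

/* Build "<dir>/<subdir>/<from>_<to>", appending "__64" when is64 is
 * nonzero. subdir is "iconv" for converter programs or "iconvTable"
 * for conversion tables. Returns 0 on success, -1 if buf is too small. */
int converter_path(char *buf, size_t size, const char *dir,
                   const char *subdir, const char *from, const char *to,
                   int is64)
{
    int n = snprintf(buf, size, "%s/%s/%s_%s%s",
                     dir, subdir, from, to, is64 ? "__64" : "");

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, with dir set to /usr/lib/nls/loc, the pair IBM-850 and ISO8859-1 yields iconv/IBM-850_ISO8859-1 for the converter program and iconvTable/IBM-850_ISO8859-1 for the table that is tried when no program is found.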
Converter Programs versus Tables
Converter programs are executable functions that convert data according to a set of rules. Converter tables are single-byte conversion tables that perform stateless conversions. Programs and tables are in separate directories:
/usr/lib/nls/loc/iconv |
Converter programs |
/usr/lib/nls/loc/iconvTable |
Converter tables. |
After a converter program is compiled and linked with the libiconv.a library, the program is placed in the /usr/lib/nls/loc/iconv directory.
To build a table converter, build a source converter table file. Use the genxlt command to compile translation tables into a format understood by the table converter. The output file is then placed in the /usr/lib/nls/loc/iconvTable directory.
Unicode and Universal Converters
Unicode (or UCS-2) conversion tables are found in:
$LOCPATH/uconvTable/*CodeSet*
The $LOCPATH/uconv/UCSTBL converter program performs the conversion to and from UCS-2 using the iconv utilities. For the iconv utilities to use these uconvTable conversion tables, links must be set up within the $LOCPATH/iconv directory. For example, for code set "X":
ln -s /usr/lib/nls/loc/uconv/UCSTBL /usr/lib/nls/loc/iconv/X_UCS-2
ln -s /usr/lib/nls/loc/uconv/UCSTBL /usr/lib/nls/loc/iconv/UCS-2_X
A "Universal converter" program is provided that can convert between any two code sets whose conversions to and from UCS-2 are defined. Given the following uconvTables:
X -> UCS-2
UCS-2 -> Y
a universal conversion can be defined that maps
X -> UCS-2 -> Y
by use of the $LOCPATH/iconv/Universal_UCS_Conv. The conversion X->Y is set by defining links to the universal converter, for example:
ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv /usr/lib/nls/loc/iconv/X_Y
Using Converters
The iconv interface is a set of subroutines used to open, perform, and close conversions: iconv_open, iconv, and iconv_close.
Code Set Conversion Filter Example
The following example shows how you can use these subroutines to create a code set conversion filter that accepts the ToCode and FromCode parameters as input arguments:
#include <stdio.h>
#include <stdlib.h>
#include <nl_types.h>
#include <iconv.h>
#include <string.h>
#include <errno.h>
#include <locale.h>

/* Classify the last iconv() call from its result r and saved errno err. */
#define ICONV_DONE()  (r != (size_t)-1)
#define ICONV_INVAL() ((r == (size_t)-1) && (err == EILSEQ))
#define ICONV_OVER()  ((r == (size_t)-1) && (err == E2BIG))
#define ICONV_TRUNC() ((r == (size_t)-1) && (err == EINVAL))

/* Message numbers in the message catalog */
#define USAGE  1
#define ERROR  2
#define INCOMP 3

char ibuf[BUFSIZ], obuf[BUFSIZ];

int main(int argc, char **argv)
{
    size_t ileft, oleft, r;
    nl_catd catd;
    iconv_t cd;
    int err;
    char *ip, *op;

    setlocale(LC_ALL, "");
    catd = catopen(argv[0], 0);
    if (argc != 3) {
        fprintf(stderr, "%s",
            catgets(catd, NL_SETD, USAGE, "usage: conv fromcode tocode\n"));
        exit(1);
    }
    /* iconv_open() takes the ToCode argument first. */
    cd = iconv_open(argv[2], argv[1]);
    if (cd == (iconv_t)-1) {
        perror("iconv_open");
        exit(1);
    }
    ileft = 0;
    while (!feof(stdin)) {
        /*
         * After the next operation, ibuf will
         * contain new data plus any truncated
         * data left from the previous read.
         */
        ileft += fread(ibuf + ileft, 1, BUFSIZ - ileft, stdin);
        do {
            ip = ibuf;
            op = obuf;
            oleft = BUFSIZ;
            r = iconv(cd, &ip, &ileft, &op, &oleft);
            err = errno;    /* save before later calls can change errno */
            if (ICONV_INVAL()) {
                fprintf(stderr, "%s",
                    catgets(catd, NL_SETD, ERROR, "invalid input\n"));
                exit(2);
            }
            fwrite(obuf, 1, BUFSIZ - oleft, stdout);
            if (ICONV_TRUNC() || ICONV_OVER())
                /*
                 * Data remaining in buffer; move it to the
                 * beginning (the regions may overlap).
                 */
                memmove(ibuf, ip, ileft);
            /*
             * Loop until all characters in the input
             * buffer have been converted.
             */
        } while (ICONV_OVER());
    }
    if (ileft != 0) {
        /*
         * This can only happen if the last call
         * to iconv() returned ICONV_TRUNC, meaning
         * the last data in the input stream was
         * incomplete.
         */
        fprintf(stderr, "%s",
            catgets(catd, NL_SETD, INCOMP, "input incomplete\n"));
        exit(3);
    }
    iconv_close(cd);
    exit(0);
}
Naming Converters
Code set names are in the form CodesetRegistry-CodesetEncoding where:
CodesetRegistry |
Identifies the registration authority for the encoding. The CodesetRegistry must be made of characters from the portable code set (usually A-Z and 0-9). |
CodesetEncoding |
Identifies the coded character set defined by the registered authority. |
The from,to variable used by the iconv command and iconv_open subroutine identifies a file whose name should be in the form /usr/lib/nls/loc/iconv/%f_%t or /usr/lib/nls/loc/iconvTable/%f_%t, where:
%f |
Represents the FromCode set name. |
%t |
Represents the ToCode set name. |
List of Converters
Converters change data from one code set to another. The sets of converters supported by the iconv library are listed in the following sections. All converters shipped with the BOS Runtime Environment are located in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.
These directories also contain private converters; that is, they are used by other converters. However, users and programs should only depend on the converters in the following lists.
Any converter shipped with the BOS Runtime Environment and not listed here should be considered private and subject to change or deletion. Converters supplied by other products can be placed in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.
Programmers are encouraged to use registered code set names or code set names associated with an application. The X Consortium maintains a registry of code set names for reference. See the "Code Set Overview" for more information about code sets.
List of PC, ISO, and EBCDIC Code Set Converters
These converters provide conversion between PC, ISO, and EBCDIC single-byte stateless code sets. The following types of conversions are supported: PC to/from ISO, PC to/from EBCDIC, and ISO to/from EBCDIC.
Conversion is provided between compatible code sets such as Latin-1 to Latin-1 and Greek to Greek. However, conversion between different EBCDIC national code sets is not supported. For information about converting between incompatible character sets refer to the "List of Interchange Converters--7-bit" and the "List of Interchange Converters--8-bit".
Conversion tables in the iconvTable directory are created by the genxlt command.
Compatible Code Set Names
The following table lists code set names that are compatible. Each line defines to/from strings that may be used when requesting a converter.
Note: The PC and ISO code sets are ASCII-based.
Code Set Compatibility |
Character Set | Languages | PC | ISO | EBCDIC |
Latin-1 | U.S. English, Portuguese, Canadian French | IBM-850 | ISO8859-1 | IBM-037 |
Latin-1 | Danish, Norwegian | IBM-850 | ISO8859-1 | IBM-277 |
Latin-1 | Finnish, Swedish | IBM-850 | ISO8859-1 | IBM-278 |
Latin-1 | Italian | IBM-850 | ISO8859-1 | IBM-280 |
Latin-1 | Japanese | IBM-850 | ISO8859-1 | IBM-281 |
Latin-1 | Spanish | IBM-850 | ISO8859-1 | IBM-284 |
Latin-1 | U.K. English | IBM-850 | ISO8859-1 | IBM-285 |
Latin-1 | German | IBM-850 | ISO8859-1 | IBM-273 |
Latin-1 | French | IBM-850 | ISO8859-1 | IBM-297 |
Latin-1 | Belgian, Swiss German | IBM-850 | ISO8859-1 | IBM-500 |
Latin-2 | Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene | IBM-852 | ISO8859-2 | IBM-870 |
Cyrillic | Bulgarian, Macedonian, Serbian Cyrillic, Russian | IBM-855 | ISO8859-5 | IBM-880, IBM-1025 |
Cyrillic | Russian | IBM-866 | ISO8859-5 | IBM-1025 |
Hebrew | Hebrew | IBM-856, IBM-862 | ISO8859-8 | IBM-424, IBM-803 |
Turkish | Turkish | IBM-857 | ISO8859-9 | IBM-1026 |
Arabic | Arabic | IBM-864, IBM-1046 | ISO8859-6 | IBM-420 |
Greek | Greek | IBM-869 | ISO8859-7 | IBM-875 |
Baltic | Lithuanian, Latvian, Estonian | IBM-921, IBM-922 | | IBM-1112, IBM-1122 |
Note: A character that exists in the source code set but does not exist in the target code set is converted to a converter-defined substitute character.
Files
The following table describes the iconvTable converters found in the /usr/lib/nls/loc/iconvTable directory:
iconvTable Converters |
|
Converter Table |
Description |
Language |
IBM-037_IBM-850 |
IBM-037 to IBM-850 |
U.S. English, Portuguese, Canadian-French |
IBM-273_IBM-850 |
IBM-273 to IBM-850 |
German |
IBM-277_IBM-850 |
IBM-277 to IBM-850 |
Danish, Norwegian |
IBM-278_IBM-850 |
IBM-278 to IBM-850 |
Finnish, Swedish |
IBM-280_IBM-850 |
IBM-280 to IBM-850 |
Italian |
IBM-281_IBM-850 |
IBM-281 to IBM-850 |
Japanese-Latin |
IBM-284_IBM-850 |
IBM-284 to IBM-850 |
Spanish |
IBM-285_IBM-850 |
IBM-285 to IBM-850 |
U.K. English |
IBM-297_IBM-850 |
IBM-297 to IBM-850 |
French |
IBM-420_IBM-1046 |
IBM-420 to IBM-1046 |
Arabic |
IBM-424_IBM-856 |
IBM-424 to IBM-856 |
Hebrew |
IBM-424_IBM-862 |
IBM-424 to IBM-862 |
Hebrew |
IBM-500_IBM-850 |
IBM-500 to IBM-850 |
Belgian, Swiss German |
IBM-803_IBM-856 |
IBM-803 to IBM-856 |
Hebrew |
IBM-803_IBM-862 |
IBM-803 to IBM-862 |
Hebrew |
IBM-850_IBM-037 |
IBM-850 to IBM-037 |
U.S. English, Portuguese, Canadian-French |
IBM-850_IBM-273 |
IBM-850 to IBM-273 |
German |
IBM-850_IBM-277 |
IBM-850 to IBM-277 |
Danish, Norwegian |
IBM-850_IBM-278 |
IBM-850 to IBM-278 |
Finnish, Swedish |
IBM-850_IBM-280 |
IBM-850 to IBM-280 |
Italian |
IBM-850_IBM-281 |
IBM-850 to IBM-281 |
Japanese-Latin |
IBM-850_IBM-284 |
IBM-850 to IBM-284 |
Spanish |
IBM-850_IBM-285 |
IBM-850 to IBM-285 |
U.K. English |
IBM-850_IBM-297 |
IBM-850 to IBM-297 |
French |
IBM-850_IBM-500 |
IBM-850 to IBM-500 |
Belgian, Swiss German |
IBM-856_IBM-424 |
IBM-856 to IBM-424 |
Hebrew |
IBM-856_IBM-803 |
IBM-856 to IBM-803 |
Hebrew |
IBM-856_IBM-862 |
IBM-856 to IBM-862 |
Hebrew |
IBM-862_IBM-424 |
IBM-862 to IBM-424 |
Hebrew |
IBM-862_IBM-803 |
IBM-862 to IBM-803 |
Hebrew |
IBM-862_IBM-856 |
IBM-862 to IBM-856 |
Hebrew |
IBM-864_IBM-1046 |
IBM-864 to IBM-1046 |
Arabic |
IBM-921_IBM-1112 |
IBM-921 to IBM-1112 |
Lithuanian, Latvian |
IBM-922_IBM-1122 |
IBM-922 to IBM-1122 |
Estonian |
IBM-1112_IBM-921 |
IBM-1112 to IBM-921 |
Lithuanian, Latvian |
IBM-1122_IBM-922 |
IBM-1122 to IBM-922 |
Estonian |
IBM-1046_IBM-420 |
IBM-1046 to IBM-420 |
Arabic |
IBM-1046_IBM-864 |
IBM-1046 to IBM-864 |
Arabic |
IBM-037_ISO8859-1 |
IBM-037 to ISO8859-1 |
U.S. English, Portuguese, Canadian French |
IBM-273_ISO8859-1 |
IBM-273 to ISO8859-1 |
German |
IBM-277_ISO8859-1 |
IBM-277 to ISO8859-1 |
Danish, Norwegian |
IBM-278_ISO8859-1 |
IBM-278 to ISO8859-1 |
Finnish, Swedish |
IBM-280_ISO8859-1 |
IBM-280 to ISO8859-1 |
Italian |
IBM-281_ISO8859-1 |
IBM-281 to ISO8859-1 |
Japanese-Latin |
IBM-284_ISO8859-1 |
IBM-284 to ISO8859-1 |
Spanish |
IBM-285_ISO8859-1 |
IBM-285 to ISO8859-1 |
U.K. English |
IBM-297_ISO8859-1 |
IBM-297 to ISO8859-1 |
French |
IBM-420_ISO8859-6 |
IBM-420 to ISO8859-6 |
Arabic |
IBM-424_ISO8859-8 |
IBM-424 to ISO8859-8 |
Hebrew |
IBM-500_ISO8859-1 |
IBM-500 to ISO8859-1 |
Belgian, Swiss German |
IBM-803_ISO8859-8 |
IBM-803 to ISO8859-8 |
Hebrew |
IBM-852_ISO8859-2 |
IBM-852 to ISO8859-2 |
Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene |
IBM-855_ISO8859-5 |
IBM-855 to ISO8859-5 |
Bulgarian, Macedonian, Serbian Cyrillic, Russian |
IBM-866_ISO8859-5 |
IBM-866 to ISO8859-5 |
Russian |
IBM-869_ISO8859-7 |
IBM-869 to ISO8859-7 |
Greek |
IBM-875_ISO8859-7 |
IBM-875 to ISO8859-7 |
Greek |
IBM-870_ISO8859-2 |
IBM-870 to ISO8859-2 |
Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene |
IBM-880_ISO8859-5 |
IBM-880 to ISO8859-5 |
Bulgarian, Macedonian, Serbian Cyrillic, Russian |
IBM-1025_ISO8859-5 |
IBM-1025 to ISO8859-5 |
Bulgarian, Macedonian, Serbian Cyrillic, Russian |
IBM-857_ISO8859-9 |
IBM-857 to ISO8859-9 |
Turkish |
IBM-1026_ISO8859-9 |
IBM-1026 to ISO8859-9 |
Turkish |
IBM-850_ISO8859-1 |
IBM-850 to ISO8859-1 |
Latin |
IBM-856_ISO8859-8 |
IBM-856 to ISO8859-8 |
Hebrew |
IBM-862_ISO8859-8 |
IBM-862 to ISO8859-8 |
Hebrew |
IBM-864_ISO8859-6 |
IBM-864 to ISO8859-6 |
Arabic |
IBM-1046_ISO8859-6 |
IBM-1046 to ISO8859-6 |
Arabic |
ISO8859-1_IBM-850 |
ISO8859-1 to IBM-850 |
Latin |
ISO8859-6_IBM-864 |
ISO8859-6 to IBM-864 |
Arabic |
ISO8859-6_IBM-1046 |
ISO8859-6 to IBM-1046 |
Arabic |
ISO8859-8_IBM-856 |
ISO8859-8 to IBM-856 |
Hebrew |
ISO8859-8_IBM-862 |
ISO8859-8 to IBM-862 |
Hebrew |
ISO8859-1_IBM-037 |
ISO8859-1 to IBM-037 |
U.S. English, Portuguese, Canadian French |
ISO8859-1_IBM-273 |
ISO8859-1 to IBM-273 |
German |
ISO8859-1_IBM-277 |
ISO8859-1 to IBM-277 |
Danish, Norwegian |
ISO8859-1_IBM-278 |
ISO8859-1 to IBM-278 |
Finnish, Swedish |
ISO8859-1_IBM-280 |
ISO8859-1 to IBM-280 |
Italian |
ISO8859-1_IBM-281 |
ISO8859-1 to IBM-281 |
Japanese-Latin |
ISO8859-1_IBM-284 |
ISO8859-1 to IBM-284 |
Spanish |
ISO8859-1_IBM-285 |
ISO8859-1 to IBM-285 |
U.K. English |
ISO8859-1_IBM-297 |
ISO8859-1 to IBM-297 |
French |
ISO8859-1_IBM-500 |
ISO8859-1 to IBM-500 |
Belgian, Swiss German |
ISO8859-2_IBM-852 |
ISO8859-2 to IBM-852 |
Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene |
ISO8859-2_IBM-870 |
ISO8859-2 to IBM-870 |
Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene |
ISO8859-5_IBM-855 |
ISO8859-5 to IBM-855 |
Bulgarian, Macedonian, Serbian Cyrillic, Russian |
ISO8859-5_IBM-880 |
ISO8859-5 to IBM-880 |
Bulgarian, Macedonian, Serbian Cyrillic, Russian |
ISO8859-5_IBM-1025 |
ISO8859-5 to IBM-1025 |
Bulgarian, Macedonian, Serbian Cyrillic, Russian |
ISO8859-6_IBM-420 |
ISO8859-6 to IBM-420 |
Arabic |
ISO8859-5_IBM-866 |
ISO8859-5 to IBM-866 |
Russian |
ISO8859-7_IBM-869 |
ISO8859-7 to IBM-869 |
Greek |
ISO8859-7_IBM-875 |
ISO8859-7 to IBM-875 |
Greek |
ISO8859-8_IBM-424 |
ISO8859-8 to IBM-424 |
Hebrew |
ISO8859-8_IBM-803 |
ISO8859-8 to IBM-803 |
Hebrew |
ISO8859-9_IBM-857 |
ISO8859-9 to IBM-857 |
Turkish |
ISO8859-9_IBM-1026 |
ISO8859-9 to IBM-1026 |
Turkish |
List of Multibyte Code Set Converters
Multibyte code-set converters convert characters among the following code-sets:
- PC multibyte code sets
- EUC multibyte code sets (ISO-based)
- EBCDIC multibyte code sets
The following table lists code set names that are compatible. Each line defines to/from strings that may be used when requesting a converter.
Code Set Compatibility |
Language | PC | ISO | EBCDIC |
Japanese | IBM-932 | IBM-eucJP | IBM-930, IBM-939 |
Korean | IBM-934 | IBM-eucKR | IBM-933 |
Traditional Chinese | IBM-938, big-5 | IBM-eucTW | IBM-937 |
Simplified Chinese | IBM-1381 | IBM-eucCN | IBM-935 |
- Conversions between Simplified and Traditional Chinese are provided (IBM-eucTW <--> IBM-eucCN and big5 <--> IBM-eucCN).
- UTF-8 is an additional code set. See "List of UTF-8 Interchange Converters" for more information.
Files
The following list describes the Multibyte Code Set converters that are found in the /usr/lib/nls/loc/iconv directory.
Converter |
Description |
IBM-eucJP_IBM-932 |
IBM-eucJP to IBM-932 |
IBM-eucJP_IBM-930 |
IBM-eucJP to IBM-930 |
IBM-eucCN_IBM-936(PC5550) |
IBM-eucCN to IBM-936(PC5550) |
IBM-eucCN_IBM-935 |
IBM-eucCN to IBM-935 |
IBM-eucJP_IBM-939 |
IBM-eucJP to IBM-939 |
IBM-eucCN_IBM-1381 |
IBM-eucCN to IBM-1381 |
IBM-930_IBM-932 |
IBM-930 to IBM-932 |
IBM-930_IBM-eucJP |
IBM-930 to IBM-eucJP |
IBM-932_IBM-eucJP |
IBM-932 to IBM-eucJP |
IBM-932_IBM-930 |
IBM-932 to IBM-930 |
IBM-936(PC5550)_IBM-935 |
IBM-936(PC5550) to IBM-935 |
IBM-936_IBM-935 |
IBM-936 to IBM-935 |
IBM-932_IBM-939 |
IBM-932 to IBM-939 |
IBM-939_IBM-932 |
IBM-939 to IBM-932 |
IBM-935_IBM-936(PC5550) |
IBM-935 to IBM-936(PC5550) |
IBM-935_IBM-936 |
IBM-935 to IBM-936 |
IBM-1381_IBM-935 |
IBM-1381 to IBM-935 |
IBM-935_IBM-1381 |
IBM-935 to IBM-1381 |
IBM-935_IBM-eucCN |
IBM-935 to IBM-eucCN |
IBM-936(PC5550)_IBM-eucCN |
IBM-936(PC5550) to IBM-eucCN |
IBM-eucTW_IBM-eucCN |
IBM-eucTW to IBM-eucCN |
big5_IBM-eucCN |
big5 to IBM-eucCN |
IBM-1381_IBM-eucCN |
IBM-1381 to IBM-eucCN |
IBM-939_IBM-eucJP |
IBM-939 to IBM-eucJP |
IBM-eucKR_IBM-934 |
IBM-eucKR to IBM-934 |
IBM-934_IBM-eucKR |
IBM-934 to IBM-eucKR |
IBM-eucKR_IBM-933 |
IBM-eucKR to IBM-933 |
IBM-933_IBM-eucKR |
IBM-933 to IBM-eucKR |
IBM-eucTW_IBM-937 |
IBM-eucTW to IBM-937 |
IBM-938_IBM-937 |
IBM-938 to IBM-937 |
big-5_IBM-937 |
big-5 to IBM-937 |
IBM-eucCN_IBM-eucTW |
IBM-eucCN to IBM-eucTW |
IBM-937_IBM-eucTW |
IBM-937 to IBM-eucTW |
IBM-937_IBM-938 |
IBM-937 to IBM-938 |
IBM-eucTW_IBM-938 |
IBM-eucTW to IBM-938 |
IBM-eucCN_big5 |
IBM-eucCN to big5 |
IBM-eucTW_big-5 |
IBM-eucTW to big-5 |
IBM-937_big-5 |
IBM-937 to big-5 |
CNS11643.1992-3_IBM-eucTW |
CNS11643.1992-3 to IBM-eucTW |
CNS11643.1992-3-GL_IBM-eucTW |
CNS11643.1992-3-GL to IBM-eucTW |
CNS11643.1992-3-GR_IBM-eucTW |
CNS11643.1992-3-GR to IBM-eucTW |
CNS11643.1992-4_IBM-eucTW |
CNS11643.1992-4 to IBM-eucTW |
CNS11643.1992-4-GL_IBM-eucTW |
CNS11643.1992-4-GL to IBM-eucTW |
CNS11643.1992-4-GR_IBM-eucTW |
CNS11643.1992-4-GR to IBM-eucTW |
IBM-eucTW_CNS11643.1992-3 |
IBM-eucTW to CNS11643.1992-3 |
IBM-eucTW_CNS11643.1992-3-GL |
IBM-eucTW to CNS11643.1992-3-GL |
IBM-eucTW_CNS11643.1992-3-GR |
IBM-eucTW to CNS11643.1992-3-GR |
IBM-eucTW_CNS11643.1992-4 |
IBM-eucTW to CNS11643.1992-4 |
IBM-eucTW_CNS11643.1992-4-GL |
IBM-eucTW to CNS11643.1992-4-GL |
IBM-eucTW_CNS11643.1992-4-GR |
IBM-eucTW to CNS11643.1992-4-GR |
IBM-eucCN_GB2312.1980-1 |
IBM-eucCN to GB2312.1980-1 |
IBM-eucCN_GB2312.1980-1-GL |
IBM-eucCN to GB2312.1980-1-GL |
IBM-eucCN_GB2312.1980-1-GR |
IBM-eucCN to GB2312.1980-1-GR |
IBM-937_csic |
IBM-937 to csic |
csic_IBM-937 |
csic to IBM-937 |
IBM-938_csic |
IBM-938 to csic |
csic_IBM-938 |
csic to IBM-938 |
IBM-eucTW_ccdc |
IBM-eucTW to ccdc |
ccdc_IBM-eucTW |
ccdc to IBM-eucTW |
IBM-eucTW_cns |
IBM-eucTW to cns |
cns_IBM-eucTW |
cns to IBM-eucTW |
IBM-eucTW_csic |
IBM-eucTW to csic |
csic_IBM-eucTW |
csic to IBM-eucTW |
IBM-eucTW_sops |
IBM-eucTW to sops |
sops_IBM-eucTW |
sops to IBM-eucTW |
IBM-eucTW_tca |
IBM-eucTW to tca |
tca_IBM-eucTW |
tca to IBM-eucTW |
big5_cns |
big5 to cns |
cns_big5 |
cns to big5 |
big5_csic |
big5 to csic |
csic_big5 |
csic to big5 |
big5_ttc |
big5 to ttc |
ttc_big5 |
ttc to big5 |
big5_ttcmin |
big5 to ttcmin |
ttcmin_big5 |
ttcmin to big5 |
big5_unicode |
big5 to unicode |
unicode_big5 |
unicode to big5 |
big5_wang |
big5 to wang |
wang_big5 |
wang to big5 |
ccdc_csic |
ccdc to csic |
csic_ccdc |
csic to ccdc |
csic_sops |
csic to sops |
sops_csic |
sops to csic |
CNS11643.1986-1_big5 |
CNS11643.1986-1 to big5 |
big5_CNS11643.1986-1 |
big5 to CNS11643.1986-1 |
CNS11643.1986-1-GR_big5 |
CNS11643.1986-1-GR to big5 |
big5_CNS11643.1986-1-GR |
big5 to CNS11643.1986-1-GR |
CNS11643.1986-2_big5 |
CNS11643.1986-2 to big5 |
big5_CNS11643.1986-2 |
big5 to CNS11643.1986-2 |
CNS11643.1986-2-GR_big5 |
CNS11643.1986-2-GR to big5 |
big5_CNS11643.1986-2-GR |
big5 to CNS11643.1986-2-GR |
CNS11643.CT-GR_big5 |
CNS11643.CT-GR to big5 |
big5_CNS11643.CT-GR |
big5 to CNS11643.CT-GR |
IBM-sbdTW-GR_big5 |
IBM-sbdTW-GR to big5 |
big5_IBM-sbdTW-GR |
big5 to IBM-sbdTW-GR |
IBM-sbdTW.CT-GR_big5 |
IBM-sbdTW.CT-GR to big5 |
big5_IBM-sbdTW.CT-GR |
big5 to IBM-sbdTW.CT-GR |
IBM-sbdTW_big5 |
IBM-sbdTW to big5 |
big5_IBM-sbdTW |
big5 to IBM-sbdTW |
IBM-udcTW-GR_big5 |
IBM-udcTW-GR to big5 |
big5_IBM-udcTW-GR |
big5 to IBM-udcTW-GR |
IBM-udcTW.CT-GR_big5 |
IBM-udcTW.CT-GR to big5 |
big5_IBM-udcTW.CT-GR |
big5 to IBM-udcTW.CT-GR |
ISO8859-1_big5 |
ISO8859-1 to big5 |
big5_ISO8859-1 |
big5 to ISO8859-1 |
big5_ASCII-GR |
big5 to ASCII-GR |
ASCII-GR_big5 |
ASCII-GR to big5 |
GBK_big5 |
GBK to big5 |
big5_GBK |
big5 to GBK |
GBK_IBM-eucTW |
GBK to IBM-eucTW |
IBM-eucTW_GBK |
IBM-eucTW to GBK |
CNS11643.1986-1_GBK |
CNS11643.1986-1 to GBK |
GBK_CNS11643.1986-1 |
GBK to CNS11643.1986-1 |
CNS11643.1986-2_GBK |
CNS11643.1986-2 to GBK |
GBK_CNS11643.1986-2 |
GBK to CNS11643.1986-2 |
CNS11643.1986-1-GR_GBK |
CNS11643.1986-1-GR to GBK |
GBK_CNS11643.1986-1-GR |
GBK to CNS11643.1986-1-GR |
CNS11643.1986-2-GR_GBK |
CNS11643.1986-2-GR to GBK |
GBK_CNS11643.1986-2-GR |
GBK to CNS11643.1986-2-GR |
CNS11643.1986-1-GL_GBK |
CNS11643.1986-1-GL to GBK |
GBK_CNS11643.1986-1-GL |
GBK to CNS11643.1986-1-GL |
CNS11643.1986-2-GL_GBK |
CNS11643.1986-2-GL to GBK |
GBK_CNS11643.1986-2-GL |
GBK to CNS11643.1986-2-GL |
CNS11643.CT-GR_GBK |
CNS11643.CT-GR to GBK |
GBK_CNS11643.CT-GR |
GBK to CNS11643.CT-GR |
GB2312.1980.CT-GR_GBK |
GB2312.1980.CT-GR to GBK |
GBK_GB2312.1980.CT-GR |
GBK to GB2312.1980.CT-GR |
GB2312.1980-0_GBK |
GB2312.1980-0 to GBK |
GBK_GB2312.1980-0 |
GBK to GB2312.1980-0 |
GB2312.1980-0-GR_GBK |
GB2312.1980-0-GR to GBK |
GBK_GB2312.1980-0-GR |
GBK to GB2312.1980-0-GR |
GB2312.1980-0-GL_GBK |
GB2312.1980-0-GL to GBK |
GBK_GB2312.1980-0-GL |
GBK to GB2312.1980-0-GL |
ASCII-GR_GBK |
ASCII-GR to GBK |
GBK_ASCII-GR |
GBK to ASCII-GR |
ISO8859-1_GBK |
ISO8859-1 to GBK |
GBK_ISO8859-1 |
GBK to ISO8859-1 |
IBM-eucCN_GBK |
IBM-eucCN to GBK |
GBK_IBM-eucCN |
GBK to IBM-eucCN |
List of Interchange Converters--7-bit
This converter provides conversion between internal code and 7-bit standard interchange formats (fold7). The fold7 name identifies encodings that can be used to pass text data through 7-bit mail protocols. The encodings are based on ISO2022. For more information about fold7 see "Understanding libiconv".
The fold7 converters convert characters from a code set to a canonical 7-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:
IBM-850 <--> ISO8859-1 |
Common Latin characters |
IBM-932 <--> IBM-eucJP |
Common Japanese characters |
The following escape sequences designate standard code sets:
01/11 02/04 04/00 |
GL JIS X0208.1978-0. |
01/11 02/04 02/08 04/01 |
GL left half of GB2312.1980-0. |
01/11 02/08 04/02 |
GL 7-bit ASCII or left half of ISO8859-1. |
01/11 02/14 04/01 |
GL right half of ISO8859-1. |
01/11 02/14 04/02 |
GL right half of ISO8859-2. |
01/11 02/14 04/03 |
GL right half of ISO8859-3. |
01/11 02/14 04/04 |
GL right half of ISO8859-4. |
01/11 02/14 04/06 |
GL right half of ISO8859-7. |
01/11 02/14 04/07 |
GL right half of ISO8859-6. |
01/11 02/14 04/08 |
GL right half of ISO8859-8. |
01/11 02/14 04/12 |
GL right half of ISO8859-5. |
01/11 02/14 04/13 |
GL right half of ISO8859-9. |
01/11 02/08 04/09 |
GL right half of JIS X0201.1976-0. |
01/11 02/08 04/10 |
GL left half of JIS X0201.1976. |
01/11 02/04 04/02 |
GL JIS X0208.1983-0. |
01/11 02/04 02/08 04/02 |
GL JIS X0208.1983-0. |
01/11 02/04 02/08 04/00 |
GL JISX0208.1978-0. |
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02 |
|
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence. |
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02 |
|
GL Japanese (IBM-udcJP) user-definable characters. |
01/11 02/04 02/08 04/03 |
GL KSC5601-1987. |
01/11 02/04 02/09 03/00 |
GL CNS11643-1986-1. |
01/11 02/04 02/10 03/01 |
GL CNS11643-1986-2. |
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/07 00/02 |
UCS-2 encoded as base64; used only for those characters not encoded by any of the other 7-bit escape sequences listed above. |
When converting from a code set to fold7, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 04/02 as the designation.
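The escape sequences in this list are written in column/row notation, where each cc/rr pair names the byte with value 16 * cc + rr. For instance, 01/11 02/04 04/02 (a JIS X0208.1983-0 designation in the list above) corresponds to the bytes 0x1B 0x24 0x42, that is, ESC $ B. The following sketch converts a purely numeric sequence to bytes. The function name escape_to_bytes is hypothetical, and sequences that contain the symbolic M and L entries are reported as malformed rather than interpreted.

```c
#include <stddef.h>
#include <stdio.h>

/* Parse "cc/rr cc/rr ..." column/row pairs into bytes (16 * cc + rr).
 * Returns the number of bytes stored in out, or -1 if the input is
 * malformed or needs more than max bytes. */
int escape_to_bytes(const char *notation, unsigned char *out, size_t max)
{
    size_t count = 0;
    unsigned int col, row;
    int used;

    while (sscanf(notation, " %2u/%2u%n", &col, &row, &used) == 2) {
        if (col > 15 || row > 15 || count == max)
            return -1;
        out[count++] = (unsigned char)(col * 16 + row);
        notation += used;
    }
    /* Anything left other than blanks is not a column/row pair. */
    while (*notation == ' ')
        notation++;
    return (*notation == '\0') ? (int)count : -1;
}
```

Printing the resulting bytes in hexadecimal makes it easier to compare a designation in this list with the escape sequences seen in captured fold7 data.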
Files
The following list describes the fold7 converters that are found in the /usr/lib/nls/loc/iconv directory:
Converter |
Description |
fold7_IBM-850 |
Interchange format to IBM-850 |
fold7_IBM-921 |
Interchange format to IBM-921 |
fold7_IBM-922 |
Interchange format to IBM-922 |
fold7_IBM-932 |
Interchange format to IBM-932 |
fold7_IBM-1124 |
Interchange format to IBM-1124 |
fold7_IBM-1129 |
Interchange format to IBM-1129 |
fold7_IBM-eucCN |
Interchange format to IBM-eucCN |
fold7_IBM-eucJP |
Interchange format to IBM-eucJP |
fold7_IBM-eucKR |
Interchange format to IBM-eucKR |
fold7_IBM-eucTW |
Interchange format to IBM-eucTW |
fold7_ISO8859-1 |
Interchange format to ISO8859-1 |
fold7_ISO8859-2 |
Interchange format to ISO8859-2 |
fold7_ISO8859-3 |
Interchange format to ISO8859-3 |
fold7_ISO8859-4 |
Interchange format to ISO8859-4 |
fold7_ISO8859-5 |
Interchange format to ISO8859-5 |
fold7_ISO8859-6 |
Interchange format to ISO8859-6 |
fold7_ISO8859-7 |
Interchange format to ISO8859-7 |
fold7_ISO8859-8 |
Interchange format to ISO8859-8 |
fold7_ISO8859-9 |
Interchange format to ISO8859-9 |
fold7_TIS-620 |
Interchange format to TIS-620 |
fold7_UTF-8 |
Interchange format to UTF-8 |
fold7_big5 |
Interchange format to big5 |
fold7_GBK |
Interchange format to GBK |
IBM-921_fold7 |
IBM-921 to interchange format |
IBM-922_fold7 |
IBM-922 to interchange format |
IBM-850_fold7 |
IBM-850 to interchange format |
IBM-932_fold7 |
IBM-932 to interchange format |
IBM-1124_fold7 |
IBM-1124 to interchange format |
IBM-1129_fold7 |
IBM-1129 to interchange format |
IBM-eucCN_fold7 |
IBM-eucCN to interchange format |
IBM-eucJP_fold7 |
IBM-eucJP to interchange format |
IBM-eucKR_fold7 |
IBM-eucKR to interchange format |
IBM-eucTW_fold7 |
IBM-eucTW to interchange format |
ISO8859-1_fold7 |
ISO8859-1 to interchange format |
ISO8859-2_fold7 |
ISO8859-2 to interchange format |
ISO8859-3_fold7 |
ISO8859-3 to interchange format |
ISO8859-4_fold7 |
ISO8859-4 to interchange format |
ISO8859-5_fold7 |
ISO8859-5 to interchange format |
ISO8859-6_fold7 |
ISO8859-6 to interchange format |
ISO8859-7_fold7 |
ISO8859-7 to interchange format |
ISO8859-8_fold7 |
ISO8859-8 to interchange format |
ISO8859-9_fold7 |
ISO8859-9 to interchange format |
TIS-620_fold7 |
TIS-620 to interchange format |
UTF-8_fold7 |
UTF-8 to interchange format |
big5_fold7 |
big5 to interchange format |
GBK_fold7 |
GBK to interchange format |
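The converter names in this list follow a FromCode_ToCode pattern. The following Python sketch builds the expected file path for a pair of code set names. The converter_path helper is hypothetical, and the sketch only formats a string without checking that the file exists.

```python
import posixpath

ICONV_DIR = "/usr/lib/nls/loc/iconv"  # directory named in the text above


def converter_path(from_code, to_code):
    """Return the path of the converter from from_code to to_code."""
    return posixpath.join(ICONV_DIR, from_code + "_" + to_code)


print(converter_path("IBM-850", "fold7"))  # /usr/lib/nls/loc/iconv/IBM-850_fold7
print(converter_path("fold7", "IBM-850"))  # /usr/lib/nls/loc/iconv/fold7_IBM-850
```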
List of Interchange Converters--8-bit
This converter provides conversions between internal code and 8-bit standard interchange formats (fold8). The fold8 name identifies encodings that can be used to pass text data through 8-bit mail protocols. The encodings are based on ISO2022. For more information about fold8, see "Understanding libiconv".
The fold8 converters convert characters from a specific code set encoding to a canonical 8-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:
IBM-850 <--> ISO8859-1 |
Common Latin characters |
IBM-932 <--> IBM-eucJP |
Common Japanese characters |
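To illustrate why a canonical encoding helps, the following Python sketch uses Python's built-in cp850 and latin-1 codecs as stand-ins for IBM-850 and ISO8859-1 (an assumption; these are not the AIX converters). The same character has a different code point in each code set.

```python
# Stand-in codecs: Python's 'cp850' for IBM-850, 'latin-1' for ISO8859-1.
char = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, present in both code sets

ibm850_bytes = char.encode("cp850")
iso8859_1_bytes = char.encode("latin-1")

print(ibm850_bytes.hex())     # 82
print(iso8859_1_bytes.hex())  # e9

# Converting between the two code sets goes through the shared character.
converted = ibm850_bytes.decode("cp850").encode("latin-1")
print(converted == iso8859_1_bytes)  # True
```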
The following escape sequences designate standard code sets.
01/11 02/04 02/09 04/01 |
GR right half of GB2312.1980-0. |
01/11 02/13 04/01 |
GR right half of ISO8859-1. |
01/11 02/13 04/02 |
GR right half of ISO8859-2. |
01/11 02/13 04/03 |
GR right half of ISO8859-3. |
01/11 02/13 04/04 |
GR right half of ISO8859-4. |
01/11 02/13 04/06 |
GR right half of ISO8859-7. |
01/11 02/13 04/07 |
GR right half of ISO8859-6. |
01/11 02/13 04/08 |
GR right half of ISO8859-8. |
01/11 02/13 04/12 |
GR right half of ISO8859-5. |
01/11 02/13 04/13 |
GR right half of ISO8859-9. |
01/11 02/09 04/09 |
GR right half of JIS X0201.1976-1. |
01/11 02/04 02/09 04/02 |
GR JIS X0208.1983-1. |
01/11 02/04 02/09 04/00 |
GR JISX0208.1978-1. |
01/11 02/09 04/02 |
GR 7-bit ASCII or left half of ISO8859-1. |
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02 |
GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence. |
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02 |
GR right half of Japanese user-definable characters. |
01/11 02/08 04/02 |
GL 7-bit ASCII or left half of ISO8859-1. |
01/11 02/14 04/01 |
GL right half of ISO8859-1. |
01/11 02/14 04/02 |
GL right half of ISO8859-2. |
01/11 02/14 04/03 |
GL right half of ISO8859-3. |
01/11 02/14 04/04 |
GL right half of ISO8859-4. |
01/11 02/14 04/06 |
GL right half of ISO8859-7. |
01/11 02/14 04/07 |
GL right half of ISO8859-6. |
01/11 02/14 04/08 |
GL right half of ISO8859-8. |
01/11 02/14 04/12 |
GL right half of ISO8859-5. |
01/11 02/14 04/13 |
GL right half of ISO8859-9. |
01/11 02/08 04/09 |
GL right half of JIS X0201.1976-0. |
01/11 02/08 04/10 |
GL left half of JIS X0201.1976. |
01/11 02/04 02/08 04/02 |
GL JIS X0208.1983-0. |
01/11 02/04 04/02 |
GL JIS X0208.1983-0. |
01/11 02/04 04/00 |
GL JIS X0208.1978-0. |
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02 |
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence. |
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02 |
GL Japanese (IBM-udcJP) user-definable characters. |
01/11 02/04 02/09 04/03 |
GR KSC5601-1987. |
01/11 02/04 02/09 03/00 |
GR CNS11643-1986-1. |
01/11 02/04 02/10 03/01 |
GR CNS11643-1986-2. |
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02 |
GR right half of Traditional Chinese user-definable characters. |
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02 |
GR right half of IBM-850 unique symbols. |
01/11 02/04 02/08 04/03 |
GL KSC5601-1987. |
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02 |
GL Traditional Chinese (IBM-udcTW) user-definable characters. |
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02 |
GL Traditional Chinese IBM-850 unique symbols (IBM-sbdTW) user-definable characters. |
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/08 00/02 |
UCS-2 encoded as UTF-8; used only for those characters not encoded by any of the escape sequences listed above. |
When converting from a code set to fold8, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 02/08 04/02 as the designation.
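The ordering rule can be pictured as a first-match search over the list. The following Python sketch uses a small excerpt of the entries above. The table layout and the first_designation helper are hypothetical and only illustrate the rule.

```python
# Excerpt of the fold8 list, kept in the order shown above:
# (escape sequence in column/row notation, target, code set).
DESIGNATIONS = [
    ("01/11 02/04 02/09 04/02", "GR", "JISX0208.1983-1"),
    ("01/11 02/08 04/02", "GL", "7-bit ASCII"),
    ("01/11 02/04 02/08 04/02", "GL", "JISX0208.1983-0"),
    ("01/11 02/04 04/02", "GL", "JISX0208.1983-0"),
]


def first_designation(code_set):
    """Return the first listed escape sequence for code_set, or None."""
    for sequence, _target, name in DESIGNATIONS:
        if name == code_set:
            return sequence
    return None


print(first_designation("JISX0208.1983-0"))  # 01/11 02/04 02/08 04/02
```

Because two sequences designate JISX0208.1983-0, the earlier entry is the one selected when converting to fold8.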
Files
The following list describes the fold8 converters found in the /usr/lib/nls/loc/iconv directory:
Converter |
Description |
fold8_IBM-850 |
Interchange format to IBM-850 |
fold8_IBM-921 |
Interchange format to IBM-921 |
fold8_IBM-922 |
Interchange format to IBM-922 |
fold8_IBM-932 |
Interchange format to IBM-932 |
fold8_IBM-1124 |
Interchange format to IBM-1124 |
fold8_IBM-1129 |
Interchange format to IBM-1129 |
fold8_IBM-eucCN |
Interchange format to IBM-eucCN |
fold8_IBM-eucJP |
Interchange format to IBM-eucJP |
fold8_IBM-eucKR |
Interchange format to IBM-eucKR |
fold8_IBM-eucTW |
Interchange format to IBM-eucTW |
fold8_ISO8859-1 |
Interchange format to ISO8859-1 |
fold8_ISO8859-2 |
Interchange format to ISO8859-2 |
fold8_ISO8859-3 |
Interchange format to ISO8859-3 |
fold8_ISO8859-4 |
Interchange format to ISO8859-4 |
fold8_ISO8859-5 |
Interchange format to ISO8859-5 |
fold8_ISO8859-6 |
Interchange format to ISO8859-6 |
fold8_ISO8859-7 |
Interchange format to ISO8859-7 |
fold8_ISO8859-8 |
Interchange format to ISO8859-8 |
fold8_ISO8859-9 |
Interchange format to ISO8859-9 |
fold8_TIS-620 |
Interchange format to TIS-620 |
fold8_UTF-8 |
Interchange format to UTF-8 |
fold8_big5 |
Interchange format to big5 |
fold8_GBK |
Interchange format to GBK |
IBM-921_fold8 |
IBM-921 to interchange format |
IBM-922_fold8 |
IBM-922 to interchange format |
IBM-850_fold8 |
IBM-850 to interchange format |
IBM-932_fold8 |
IBM-932 to interchange format |
IBM-1124_fold8 |
IBM-1124 to interchange format |
IBM-1129_fold8 |
IBM-1129 to interchange format |
IBM-eucCN_fold8 |
IBM-eucCN to interchange format |
IBM-eucJP_fold8 |
IBM-eucJP to interchange format |
IBM-eucKR_fold8 |
IBM-eucKR to interchange format |
IBM-eucTW_fold8 |
IBM-eucTW to interchange format |
ISO8859-1_fold8 |
ISO8859-1 to interchange format |
ISO8859-2_fold8 |
ISO8859-2 to interchange format |
ISO8859-3_fold8 |
ISO8859-3 to interchange format |
ISO8859-4_fold8 |
ISO8859-4 to interchange format |
ISO8859-5_fold8 |
ISO8859-5 to interchange format |
ISO8859-6_fold8 |
ISO8859-6 to interchange format |
ISO8859-7_fold8 |
ISO8859-7 to interchange format |
ISO8859-8_fold8 |
ISO8859-8 to interchange format |
ISO8859-9_fold8 |
ISO8859-9 to interchange format |
TIS-620_fold8 |
TIS-620 to interchange format |
UTF-8_fold8 |
UTF-8 to interchange format |
big5_fold8 |
big5 to interchange format |
GBK_fold8 |
GBK to interchange format |
List of Interchange Converters--Compound Text
Compound text interchange converters convert between compound text and internal code sets.
Compound text is an interchange encoding defined by the X Consortium. It is used to communicate text between X clients. Compound text is based on ISO2022 and can encode most character sets using standard escape sequences. It also provides extensions for encoding private character sets. The supported code sets provide a converter to and from compound text. The name used to identify the compound text encoding is ct.
The following escape sequences are used to designate standard code sets in the order listed below.
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02 |
GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence. |
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02 |
GR right half of Japanese user-definable characters. |
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02 |
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence. |
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02 |
GL Japanese (IBM-udcJP) user-definable characters. |
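In each sequence above, the bytes between the M L pair and the closing 00/02 spell the name of the private encoding in ASCII. The following Python sketch extracts that name from the column/row notation. The encoding_name helper is hypothetical. Reading M and L as length bytes follows the X Consortium compound text specification and is an assumption here, because this section does not define them.

```python
def encoding_name(sequence):
    """Return the ASCII name carried in an extended-segment sequence.

    The name is the run of bytes after the 'M L' placeholders and
    before the terminating 00/02 byte.
    """
    items = sequence.split()
    start = items.index("L") + 1
    end = items.index("00/02")
    chars = []
    for item in items[start:end]:
        column, row = item.split("/")
        chars.append(chr(int(column) * 16 + int(row)))
    return "".join(chars)


gr_850 = "01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02"
gl_udc = "01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02"

print(encoding_name(gr_850))  # IBM-850
print(encoding_name(gl_udc))  # ibm-udcJP
```

The GR sequences carry the name in uppercase (IBM-) and the GL sequences in lowercase (ibm-), which is the only difference between the paired entries.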
Files
The following list describes the compound text converters that are found in the /usr/lib/nls/loc/iconv directory:
Converter |
Description |
ct_IBM-850 |
Interchange format to IBM-850 |
ct_IBM-921 |
Interchange format to IBM-921 |
ct_IBM-922 |
Interchange format to IBM-922 |
ct_IBM-932 |
Interchange format to IBM-932 |
ct_IBM-1124 |
Interchange format to IBM-1124 |
ct_IBM-1129 |
Interchange format to IBM-1129 |
ct_IBM-eucCN |
Interchange format to IBM-eucCN |
ct_IBM-eucJP |
Interchange format to IBM-eucJP |
ct_IBM-eucKR |
Interchange format to IBM-eucKR |
ct_IBM-eucTW |
Interchange format to IBM-eucTW |
ct_ISO8859-1 |
Interchange format to ISO8859-1 |
ct_ISO8859-2 |
Interchange format to ISO8859-2 |
ct_ISO8859-3 |
Interchange format to ISO8859-3 |
ct_ISO8859-4 |
Interchange format to ISO8859-4 |
ct_ISO8859-5 |
Interchange format to ISO8859-5 |
ct_ISO8859-6 |
Interchange format to ISO8859-6 |
ct_ISO8859-7 |
Interchange format to ISO8859-7 |
ct_ISO8859-8 |
Interchange format to ISO8859-8 |
ct_ISO8859-9 |
Interchange format to ISO8859-9 |
ct_TIS-620 |
Interchange format to TIS-620 |
ct_big5 |
Interchange format to big5 |
ct_GBK |
Interchange format to GBK |
IBM-850_ct |
IBM-850 to interchange format |
IBM-921_ct |
IBM-921 to interchange format |
IBM-922_ct |
IBM-922 to interchange format |
IBM-932_ct |
IBM-932 to interchange format |
IBM-1124_ct |
IBM-1124 to interchange format |
IBM-1129_ct |
IBM-1129 to interchange format |
IBM-eucCN_ct |
IBM-eucCN to interchange format |
IBM-eucJP_ct |
IBM-eucJP to interchange format |
IBM-eucKR_ct |
IBM-eucKR to interchange format |
IBM-eucTW_ct |
IBM-eucTW to interchange format |
ISO8859-1_ct |
ISO8859-1 to interchange format |
ISO8859-2_ct |
ISO8859-2 to interchange format |
ISO8859-3_ct |
ISO8859-3 to interchange format |
ISO8859-4_ct |
ISO8859-4 to interchange format |
ISO8859-5_ct |
ISO8859-5 to interchange format |
ISO8859-6_ct |
ISO8859-6 to interchange format |
ISO8859-7_ct |
ISO8859-7 to interchange format |
ISO8859-8_ct |
ISO8859-8 to interchange format |
ISO8859-9_ct |
ISO8859-9 to interchange format |
TIS-620_ct |
TIS-620 to interchange format |
big5_ct |
big5 to interchange format |
GBK_ct |
GBK to interchange format |
List of Interchange Converters--uucode
This converter provides the same mapping as the uuencode and uudecode commands.
During conversion from uucode, 62 bytes at a time (including the new-line character trailing the record) are converted, generating 45 bytes in outbuf.
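The 62-to-45 ratio follows from the uuencode record layout: one length character, 60 encoded characters (45 bytes at 4 characters per 3 bytes), and one new-line character. The following Python sketch checks that arithmetic with the standard binascii module, used here as a stand-in for the uucode converter (an assumption; it is not the AIX implementation).

```python
import binascii

data = bytes(range(45))  # one full record of 45 input bytes

# b2a_uu returns one uuencoded line, including the trailing new-line.
record = binascii.b2a_uu(data)
print(len(record))  # 62

# Decoding the 62-byte record gives back the 45 original bytes.
decoded = binascii.a2b_uu(record)
print(len(decoded), decoded == data)  # 45 True
```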
Files
The following list describes the uucode converters found in the /usr/lib/nls/loc/iconv directory:
Converter |
Description |
IBM-850_uucode |
IBM-850 to uucode |
IBM-921_uucode |
IBM-921 to uucode |
IBM-922_uucode |
IBM-922 to uucode |
IBM-932_uucode |
IBM-932 to uucode |
IBM-1124_uucode |
IBM-1124 to uucode |
IBM-1129_uucode |
IBM-1129 to uucode |
IBM-eucJP_uucode |
IBM-eucJP to uucode |
IBM-eucKR_uucode |
IBM-eucKR to uucode |
IBM-eucTW_uucode |
IBM-eucTW to uucode |
IBM-eucCN_uucode |
IBM-eucCN to uucode |
ISO8859-1_uucode |
ISO8859-1 to uucode |
ISO8859-2_uucode |
ISO8859-2 to uucode |
ISO8859-3_uucode |
ISO8859-3 to uucode |
ISO8859-4_uucode |
ISO8859-4 to uucode |
ISO8859-5_uucode |
ISO8859-5 to uucode |
ISO8859-6_uucode |
ISO8859-6 to uucode |
ISO8859-7_uucode |
ISO8859-7 to uucode |
ISO8859-8_uucode |
ISO8859-8 to uucode |
ISO8859-9_uucode |
ISO8859-9 to uucode |
TIS-620_uucode |
TIS-620 to uucode |
big5_uucode |
big5 to uucode |
GBK_uucode |
GBK to uucode |
uucode_IBM-850 |
uucode to IBM-850 |
uucode_IBM-921 |
uucode to IBM-921 |
uucode_IBM-922 |
uucode to IBM-922 |
uucode_IBM-932 |
uucode to IBM-932 |
uucode_IBM-1124 |
uucode to IBM-1124 |
uucode_IBM-1129 |
uucode to IBM-1129 |
uucode_IBM-eucCN |
uucode to IBM-eucCN |
uucode_IBM-eucJP |
uucode to IBM-eucJP |
uucode_IBM-eucKR |
uucode to IBM-eucKR |
uucode_IBM-eucTW |
uucode to IBM-eucTW |
uucode_ISO8859-1 |
uucode to ISO8859-1 |
uucode_ISO8859-2 |
uucode to ISO8859-2 |
uucode_ISO8859-3 |
uucode to ISO8859-3 |
uucode_ISO8859-4 |
uucode to ISO8859-4 |
uucode_ISO8859-5 |
uucode to ISO8859-5 |
uucode_ISO8859-6 |
uucode to ISO8859-6 |
uucode_ISO8859-7 |
uucode to ISO8859-7 |
uucode_ISO8859-8 |
uucode to ISO8859-8 |
uucode_ISO8859-9 |
uucode to ISO8859-9 |
uucode_TIS-620 |
uucode to TIS-620 |
uucode_big5 |
uucode to big5 |
uucode_GBK |
uucode to GBK |
List of UCS-2 Interchange Converters
UCS-2 is a universal, 16-bit encoding described in the "Code Set Overview". Conversions for each code set are provided in both directions, between the code set and UCS-2.
UCS-2 converters are found in the /usr/lib/nls/loc/uconvTable and /usr/lib/nls/loc/uconv directories. The uconvdef command is used to generate new converters or to customize existing UCS-2 converters.
The /usr/lib/nls/loc/iconv/Universal_UCS_Conv converter is used to generate conversions from any code set X to code set Y by setting the proper links:
cd /usr/lib/nls/loc/iconv
ln -s /usr/lib/nls/loc/uconv/Universal_UCS_Conv X_Y
ln -s /usr/lib/nls/loc/uconv/UCSTBL X_UCS-2
ln -s /usr/lib/nls/loc/uconv/UCSTBL UCS-2_Y
ln -s /usr/lib/nls/loc/uconv/UCSTBL X
ln -s /usr/lib/nls/loc/uconv/UCSTBL Y
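For a concrete pair of code sets, X and Y are replaced by the code set names. The following Python sketch only prints the commands shown above with the names filled in. The link_commands helper is hypothetical, and ISO8859-1 and IBM-850 are example names chosen for illustration.

```python
UCONV_DIR = "/usr/lib/nls/loc/uconv"  # link targets used in the text above


def link_commands(x, y):
    """Return the ln commands from the text with X and Y filled in."""
    universal = UCONV_DIR + "/Universal_UCS_Conv"
    table = UCONV_DIR + "/UCSTBL"
    return [
        "cd /usr/lib/nls/loc/iconv",
        "ln -s %s %s_%s" % (universal, x, y),
        "ln -s %s %s_UCS-2" % (table, x),
        "ln -s %s UCS-2_%s" % (table, y),
        "ln -s %s %s" % (table, x),
        "ln -s %s %s" % (table, y),
    ]


for command in link_commands("ISO8859-1", "IBM-850"):
    print(command)
```

The sketch does not run the commands or check whether any of the links already exist.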
Converter |
Description |
ISO8859-1 |
UCS-2 <--> ISO Latin-1 |
ISO8859-2 |
UCS-2 <--> ISO Latin-2 |
ISO8859-3 |
UCS-2 <--> ISO Latin-3 |
ISO8859-4 |
UCS-2 <--> ISO Latin-4 |
ISO8859-5 |
UCS-2 <--> ISO Cyrillic |
ISO8859-6 |
UCS-2 <--> ISO Arabic |
ISO8859-7 |
UCS-2 <--> ISO Greek |
ISO8859-8 |
UCS-2 <--> ISO Hebrew |
ISO8859-9 |
UCS-2 <--> ISO Turkish |
JISX0201.1976-0 |
UCS-2 <--> Japanese JISX0201-0 |
JISX0208.1983-0 |
UCS-2 <--> Japanese JISX0208-0 |
CNS11643.1986-1 |
UCS-2 <--> Chinese CNS11643-1 |
CNS11643.1986-2 |
UCS-2 <--> Chinese CNS11643-2 |
KSC5601.1987-0 |
UCS-2 <--> Korean KSC5601-0 |
IBM-eucCN |
UCS-2 <--> Simplified Chinese EUC |
IBM-udcCN |
UCS-2 <--> Simplified Chinese user-defined characters |
IBM-sbdCN |
UCS-2 <--> Simplified Chinese IBM-specific characters |
GB2312.1980-0 |
UCS-2 <--> Simplified Chinese GB |
IBM-1381 |
UCS-2 <--> Simplified Chinese PC data code |
IBM-935 |
UCS-2 <--> Simplified Chinese EBCDIC |
IBM-936 |
UCS-2 <--> Simplified Chinese PC5550 |
IBM-eucJP |
UCS-2 <--> Japanese EUC |
IBM-eucKR |
UCS-2 <--> Korean EUC |
IBM-eucTW |
UCS-2 <--> Traditional Chinese EUC |
IBM-udcJP |
UCS-2 <--> Japanese user-defined characters |
IBM-udcTW |
UCS-2 <--> Traditional Chinese user-defined characters |
IBM-sbdTW |
UCS-2 <--> Traditional Chinese IBM-specific characters |
UTF-8 |
UCS-2 <--> UTF-8 |
IBM-437 |
UCS-2 <--> USA PC data code |
IBM-850 |
UCS-2 <--> Latin-1 PC data code |
IBM-852 |
UCS-2 <--> Latin-2 PC data code |
IBM-857 |
UCS-2 <--> Turkish PC data code |
IBM-860 |
UCS-2 <--> Portuguese PC data code |
IBM-861 |
UCS-2 <--> Icelandic PC data code |
IBM-863 |
UCS-2 <--> French Canadian PC data code |
IBM-865 |
UCS-2 <--> Nordic PC data code |
IBM-869 |
UCS-2 <--> Greek PC data code |
IBM-921 |
UCS-2 <--> Baltic Multilingual data code |
IBM-922 |
UCS-2 <--> Estonian data code |
IBM-932 |
UCS-2 <--> Japanese PC data code |
IBM-934 |
UCS-2 <--> Korea PC data code |
IBM-936 |
UCS-2 <--> People's Republic of China PC data code |
IBM-938 |
UCS-2 <--> Taiwanese PC data code |
IBM-942 |
UCS-2 <--> Extended Japanese PC data code |
IBM-944 |
UCS-2 <--> Korean PC data code |
IBM-946 |
UCS-2 <--> People's Republic of China SAA data code |
IBM-948 |
UCS-2 <--> Traditional Chinese PC data code |
IBM-1124 |
UCS-2 <--> Ukrainian PC data code |
IBM-1129 |
UCS-2 <--> Vietnamese PC data code |
TIS-620 |
UCS-2 <--> Thailand PC data code |
IBM-037 |
UCS-2 <--> USA, Canada EBCDIC |
IBM-273 |
UCS-2 <--> Germany, Austria EBCDIC |
IBM-277 |
UCS-2 <--> Denmark, Norway EBCDIC |
IBM-278 |
UCS-2 <--> Finland, Sweden EBCDIC |
IBM-280 |
UCS-2 <--> Italy EBCDIC |
IBM-284 |
UCS-2 <--> Spain, Latin America EBCDIC |
IBM-285 |
UCS-2 <--> United Kingdom EBCDIC |
IBM-297 |
UCS-2 <--> France EBCDIC |
IBM-500 |
UCS-2 <--> International EBCDIC |
IBM-875 |
UCS-2 <--> Greek EBCDIC |
IBM-930 |
UCS-2 <--> Japanese Katakana-Kanji EBCDIC |
IBM-933 |
UCS-2 <--> Korean EBCDIC |
IBM-937 |
UCS-2 <--> Traditional Chinese EBCDIC |
IBM-939 |
UCS-2 <--> Japanese Latin-Kanji EBCDIC |
IBM-1026 |
UCS-2 <--> Turkish EBCDIC |
IBM-1112 |
UCS-2 <--> Baltic Multilingual EBCDIC |
IBM-1122 |
UCS-2 <--> Estonian EBCDIC |
IBM-1124 |
UCS-2 <--> Ukrainian EBCDIC |
IBM-1129 |
UCS-2 <--> Vietnamese EBCDIC |
GBK |
UCS-2 <--> Simplified Chinese |
TIS-620 |
UCS-2 <--> Thailand EBCDIC |
List of UTF-8 Interchange Converters
UTF-8 is a universal, multibyte encoding described in "UCS-2 and UTF-8". Conversions for each code set are provided in both directions, between the code set and UTF-8.
UTF-8 conversions are usually done by using the Universal_UCS_Conv converter (see "List of UCS-2 Interchange Converters") and the /usr/lib/nls/loc/uconv/UTF-8 conversion.
Converter |
Description |
ISO8859-1 |
UTF-8 <--> ISO Latin-1 |
ISO8859-2 |
UTF-8 <--> ISO Latin-2 |
ISO8859-3 |
UTF-8 <--> ISO Latin-3 |
ISO8859-4 |
UTF-8 <--> ISO Latin-4 |
ISO8859-5 |
UTF-8 <--> ISO Cyrillic |
ISO8859-6 |
UTF-8 <--> ISO Arabic |
ISO8859-7 |
UTF-8 <--> ISO Greek |
ISO8859-8 |
UTF-8 <--> ISO Hebrew |
ISO8859-9 |
UTF-8 <--> ISO Turkish |
JISX0201.1976-0 |
UTF-8 <--> Japanese JISX0201-0 |
JISX0208.1983-0 |
UTF-8 <--> Japanese JISX0208-0 |
CNS11643.1986-1 |
UTF-8 <--> Chinese CNS11643-1 |
CNS11643.1986-2 |
UTF-8 <--> Chinese CNS11643-2 |
KSC5601.1987-0 |
UTF-8 <--> Korean KSC5601-0 |
IBM-eucCN |
UTF-8 <--> Simplified Chinese EUC |
IBM-eucJP |
UTF-8 <--> Japanese EUC |
IBM-eucKR |
UTF-8 <--> Korean EUC |
IBM-eucTW |
UTF-8 <--> Traditional Chinese EUC |
IBM-udcJP |
UTF-8 <--> Japanese user-defined characters |
IBM-udcTW |
UTF-8 <--> Traditional Chinese user-defined characters |
IBM-sbdTW |
UTF-8 <--> Traditional Chinese IBM-specific characters |
UCS-2 |
UTF-8 <--> UCS-2 |
IBM-437 |
UTF-8 <--> USA PC data code |
IBM-850 |
UTF-8 <--> Latin-1 PC data code |
IBM-852 |
UTF-8 <--> Latin-2 PC data code |
IBM-857 |
UTF-8 <--> Turkish PC data code |
IBM-860 |
UTF-8 <--> Portuguese PC data code |
IBM-861 |
UTF-8 <--> Icelandic PC data code |
IBM-863 |
UTF-8 <--> French Canadian PC data code |
IBM-865 |
UTF-8 <--> Nordic PC data code |
IBM-869 |
UTF-8 <--> Greek PC data code |
IBM-921 |
UTF-8 <--> Baltic Multilingual data code |
IBM-922 |
UTF-8 <--> Estonian data code |
IBM-932 |
UTF-8 <--> Japanese PC data code |
IBM-934 |
UTF-8 <--> Korea PC data code |
IBM-935 |
UTF-8 <--> Simplified Chinese EBCDIC |
IBM-936 |
UTF-8 <--> People's Republic of China PC data code |
IBM-938 |
UTF-8 <--> Taiwanese PC data code |
IBM-942 |
UTF-8 <--> Extended Japanese PC data code |
IBM-944 |
UTF-8 <--> Korean PC data code |
IBM-946 |
UTF-8 <--> People's Republic of China SAA data code |
IBM-948 |
UTF-8 <--> Traditional Chinese PC data code |
IBM-1124 |
UTF-8 <--> Ukrainian PC data code |
IBM-1129 |
UTF-8 <--> Vietnamese PC data code |
TIS-620 |
UTF-8 <--> Thailand PC data code |
IBM-037 |
UTF-8 <--> USA, Canada EBCDIC |
IBM-273 |
UTF-8 <--> Germany, Austria EBCDIC |
IBM-277 |
UTF-8 <--> Denmark, Norway EBCDIC |
IBM-278 |
UTF-8 <--> Finland, Sweden EBCDIC |
IBM-280 |
UTF-8 <--> Italy EBCDIC |
IBM-284 |
UTF-8 <--> Spain, Latin America EBCDIC |
IBM-285 |
UTF-8 <--> United Kingdom EBCDIC |
IBM-297 |
UTF-8 <--> France EBCDIC |
IBM-500 |
UTF-8 <--> International EBCDIC |
IBM-875 |
UTF-8 <--> Greek EBCDIC |
IBM-930 |
UTF-8 <--> Japanese Katakana-Kanji EBCDIC |
IBM-933 |
UTF-8 <--> Korean EBCDIC |
IBM-937 |
UTF-8 <--> Traditional Chinese EBCDIC |
IBM-939 |
UTF-8 <--> Japanese Latin-Kanji EBCDIC |
IBM-1026 |
UTF-8 <--> Turkish EBCDIC |
IBM-1112 |
UTF-8 <--> Baltic Multilingual EBCDIC |
IBM-1122 |
UTF-8 <--> Estonian EBCDIC |
IBM-1124 |
UTF-8 <--> Ukrainian EBCDIC |
IBM-1129 |
UTF-8 <--> Vietnamese EBCDIC |
IBM-1381 |
UTF-8 <--> Simplified Chinese PC data code |
GBK |
UTF-8 <--> Simplified Chinese |
TIS-620 |
UTF-8 <--> Thailand EBCDIC |
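To show how the two universal encodings relate, the following Python sketch encodes one character in ISO8859-1, UCS-2, and UTF-8. It uses Python's built-in codecs as stand-ins (an assumption; utf-16-be matches UCS-2 only for characters in the Basic Multilingual Plane, and the byte order shown is big-endian).

```python
char = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

latin1 = char.encode("latin-1")   # ISO8859-1: one byte
ucs2 = char.encode("utf-16-be")   # UCS-2 stand-in: one 16-bit unit
utf8 = char.encode("utf-8")       # UTF-8: multibyte form

print(latin1.hex(), ucs2.hex(), utf8.hex())  # e9 00e9 c3a9

# Converting ISO8859-1 data to UTF-8 through the shared character.
print(latin1.decode("latin-1").encode("utf-8") == utf8)  # True
```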
List of Miscellaneous Converters
A set of low-level converters, called miscellaneous converters, is provided for use by the code set and interchange converters. Direct use of these converters is discouraged because they are intended to support other converters.
Files
The following list describes the miscellaneous converters found in the /usr/lib/nls/loc/iconv and /usr/lib/nls/loc/iconvTable directories:
Converter |
Description |
IBM-932_JISX0201.1976-0 |
IBM-932 to JISX0201.1976-0 |
IBM-932_JISX0208.1983-0 |
IBM-932 to JISX0208.1983-0 |
IBM-932_IBM-udcJP |
IBM-932 to IBM-udcJP (Japanese user-defined characters) |
IBM-eucJP_JISX0201.1976-0 |
IBM-eucJP to JISX0201.1976-0 |
IBM-eucJP_JISX0208.1983-0 |
IBM-eucJP to JISX0208.1983-0 |
IBM-eucJP_IBM-udcJP |
IBM-eucJP to IBM-udcJP (Japanese user-defined characters) |
IBM-eucKR_KSC5601.1987-0 |
IBM-eucKR to KSC5601.1987-0 |
IBM-eucTW_CNS11643.1986-1 |
IBM-eucTW to CNS11643.1986-1 |
IBM-eucTW_CNS11643.1986-2 |
IBM-eucTW to CNS11643.1986-2 |
IBM-eucCN_GB2312.1980-0 |
IBM-eucCN to GB2312.1980-0 |
Related Information
National Language Support Overview for Programming, List of National Language Support Subroutines.
Code Sets Overview in AIX Kernel Extensions and Device Support Programming Concepts.
The iconv command, uuencode and uudecode commands.
The iconv_open subroutine, iconv subroutine, iconv_close subroutine.