General Programming Concepts: Writing and Debugging Programs

Converters Overview for Programming

National Language Support (NLS) provides a base for internationalization in which data can often be converted from one code set to another. Several standard converters are provided for this purpose. This section discusses the following aspects of conversion:

Converters Introduction

Data sent by one program to another program residing on a remote host may require conversion from the code set of the source machine to that of the receiver. For example, when communicating with a VM system, the workstation converts its ISO8859-1 data to an EBCDIC form.

Code sets define graphic characters and control character assignments to code points. These coded characters must also be converted when a program obtains data in one code set but displays it in another code set.

Two interfaces for conversions are provided:

iconv command	Allows you to request a specific conversion by naming the FromCode and ToCode code sets.
libiconv functions (Understanding libiconv)	Allow applications to request converters by name.

The system provides ready-to-use libraries of converters. You supply the name of the converter you want to use. The converter libraries are found in the /usr/lib/nls/loc/iconv/* and /usr/lib/nls/loc/iconvTable/* directories.

In addition to code set converters, the converter library also provides a set of network interchange converters. In a network environment, the code sets of the communications systems and the protocols of communication determine how the data should be converted.

Interchange converters are used to convert data sent from one system to another. Conversions from one internal code set to another require code set converters. When data must be converted from a sender's code set to a receiver's code set or from 8-bit data to 7-bit data, a uniform interface is required. The iconv subroutines provide this interface.

Standard Converters

The system supports standard converters for use with the iconv command and subroutines. The following list describes the different types of converters:

Code Set Converter Types	Description
Table converter (List of PC, ISO, and EBCDIC Code Set Converters)	Converts single-byte stateless code sets. Performs a table translation from one byte to another byte.
Algorithmic converter (List of Multibyte Code Set Converters)	Performs a conversion that cannot be implemented using a simple single-byte mapping table. All multibyte converters are currently implemented in this way.

Interchange Converter Types	Description
List of Interchange Converters--7-bit	Converts between internal code sets and ISO2022 standard interchange formats (7-bit).
List of Interchange Converters--8-bit	Converts between internal code sets and ISO2022 standard interchange formats (8-bit).
List of Interchange Converters--Compound Text	Converts between compound text and internal code sets.
List of Interchange Converters--uucode	Provides the same mapping as that defined in the uuencode and uudecode commands.
List of UCS-2 Interchange Converters	Converts between UCS-2 and other code sets.
List of UTF-8 Interchange Converters	Converts between UTF-8 and other code sets.

Miscellaneous Converters	Description
List of Miscellaneous Converters	Used by some of the converters listed above.

Understanding libiconv

The iconv application programming interface (API) consists of three subroutines that accomplish conversion:

iconv_open	Performs the initialization required to convert characters from the code set specified by the FromCode parameter to the code set specified by the ToCode parameter. The strings specified are dependent on the converters installed in the system. If initialization is successful, the converter descriptor, iconv_t, is returned in its initial state.
iconv	Invokes the converter function using the descriptor obtained from the iconv_open subroutine. The inbuf parameter points to the first character in the input buffer, and the inbytesleft parameter indicates the number of bytes to the end of the buffer being converted. The outbuf parameter points to the first available byte in the output buffer, and the outbytesleft parameter indicates the number of available bytes to the end of the buffer. For state-dependent encoding, the subroutine is placed in its initial state by a call for which the inbuf value is a null pointer. Subsequent calls with the inbuf parameter as something other than a null pointer cause the internal state of the function to be altered as necessary.
iconv_close	Closes the conversion descriptor specified by the cd variable and makes it usable again.
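
The three subroutines are normally used together. The following is a minimal sketch rather than code from this documentation: the helper name convert_buffer and its parameters are hypothetical, the code set names passed in must correspond to converters installed on the system, and the iconv prototype is assumed to follow POSIX (char **inbuf), although some systems declare that parameter as const char **.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: converts inlen bytes at `in` from code set `from`
 * to code set `to`, writing at most outcap bytes to `out`.
 * Returns the number of bytes written, or (size_t)-1 on any failure.
 */
size_t convert_buffer(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);   /* ToCode first, then FromCode */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = (char *)in;               /* iconv reads but does not modify the input */
    char *op = out;
    size_t ileft = inlen, oleft = outcap;

    size_t r = iconv(cd, &ip, &ileft, &op, &oleft);

    /* A null inbuf returns the descriptor to its initial state and, for
     * state-dependent encodings, writes any pending shift sequence. */
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &op, &oleft);

    iconv_close(cd);
    return (r == (size_t)-1) ? (size_t)-1 : outcap - oleft;
}
```

A caller that needs to distinguish an invalid character from a full output buffer would examine errno (EILSEQ, E2BIG, or EINVAL) instead of collapsing every failure into one return value.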

In a network environment, two factors determine how data should be converted:

Code sets of the sender and the receiver
Communication protocol (8-bit or 7-bit data)

The following table outlines the conversion methods and recommends how you should convert data in different situations. See the List of Interchange Converters--7-bit and the List of Interchange Converters--8-bit for more information.

Outline of Methods and Recommended Choices
Method to choose	Same code set, 7-bit only protocol	Same code set, 8-bit protocol	Different or unknown code set, 7-bit only protocol	Different or unknown code set, 8-bit protocol
as is	Not valid	Best choice	Not valid	Not valid if remote code set is unknown
fold7	OK	OK	Best choice	OK
fold8	Not valid	OK	Not valid	Best choice
uucode	Best choice	OK	Not valid	Not valid

If the sender uses the same code set as the receiver, there are two possibilities:

When protocol allows 8-bit data, the data can be sent without conversions.

When protocol allows only 7-bit data, the 8-bit code points must be mapped to 7-bit values. Use the iconv interface and one of the following methods:

List of Interchange Converters--uucode	Provides the same mapping as the uuencode and uudecode commands. This is the recommended method.
List of Interchange Converters--7-bit	Converts internal code sets using 7-bit data. This method passes ASCII without any change.
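
To show why the uucode mapping yields 7-bit data, the sketch below implements the core step of the uuencode scheme: three 8-bit bytes are regrouped into four 6-bit values, and each value is offset by 0x20 so that it becomes a printable ASCII character. The function name is hypothetical, and the sketch omits the line-length prefix and begin/end framing; some uuencode implementations also emit a grave accent instead of a space for the value zero. How the uucode converter frames its output is not described here.

```c
/*
 * Core uuencode mapping for one group of three input bytes.
 * out receives four printable 7-bit characters (not NUL-terminated).
 */
void uu_encode_triplet(const unsigned char in[3], char out[4])
{
    unsigned v[4];

    v[0] = in[0] >> 2;                              /* top 6 bits of byte 0      */
    v[1] = ((in[0] & 0x03) << 4) | (in[1] >> 4);    /* 2 bits + top 4 of byte 1  */
    v[2] = ((in[1] & 0x0F) << 2) | (in[2] >> 6);    /* 4 bits + top 2 of byte 2  */
    v[3] = in[2] & 0x3F;                            /* low 6 bits of byte 2      */

    for (int i = 0; i < 4; i++)
        out[i] = (char)(v[i] + 0x20);               /* shift into printable range */
}
```

For example, the three bytes of "Cat" map to the four characters "0V%T".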

If the sender uses a code set different from the receiver, there are two possibilities:

When protocol allows only 7-bit data, use the fold7 method.

When protocol allows 8-bit data and you know the receiver's code set, use the iconv interface to convert the data. If you do not know the receiver's code set, use the following method:

List of Interchange Converters--8-bit

Converts internal code sets to standard interchange formats. The 8-bit data is transmitted and the information is preserved so that the receiver can reconstruct the data in its code set.
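
The selection rules above can be summarized in code. This is only an illustrative sketch: the function name, its parameters, and the returned labels are hypothetical, and "direct" stands for a direct code set conversion through the iconv interface when the receiver's code set is known.

```c
#include <stdbool.h>

/*
 * Returns the recommended method for the given situation:
 *   same code set, 8-bit protocol        -> "as is"
 *   same code set, 7-bit only protocol   -> "uucode"
 *   different code set, 7-bit only       -> "fold7"
 *   different code set, 8-bit, known     -> "direct" (plain iconv conversion)
 *   different code set, 8-bit, unknown   -> "fold8"
 */
const char *choose_method(bool same_code_set, bool eight_bit, bool receiver_known)
{
    if (same_code_set)
        return eight_bit ? "as is" : "uucode";
    if (!eight_bit)
        return "fold7";
    return receiver_known ? "direct" : "fold8";
}
```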

Using the iconv_open Subroutine

The following examples illustrate how to use the iconv_open subroutine in different situations:

Sender and receiver use the same code sets:
If the protocol allows 8-bit data, you can send data without converting it.
If the protocol allows only 7-bit data, do the following:
```
Sender:
 cd = iconv_open("uucode", nl_langinfo(CODESET));
 
 
Receiver:
 cd = iconv_open(nl_langinfo(CODESET), "uucode"); 
```

Sender and receiver use different code sets:

If the protocol allows 8-bit data and the receiver's code set is unknown, do the following:

Sender:
 cd = iconv_open("fold8", nl_langinfo(CODESET));
 
 
Receiver:
 cd = iconv_open(nl_langinfo(CODESET),"fold8" );

If the protocol allows only 7-bit data, do the following:

Sender:
 cd = iconv_open("fold7", nl_langinfo(CODESET));
 
 
Receiver:
 cd = iconv_open(nl_langinfo(CODESET), "fold7" );
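
The fragments above omit error handling. The sketch below wraps the same calls in two hypothetical helper functions so that the argument order is written once; the interchange name passed in (for example "fold7", "fold8", or "uucode") is assumed to be a converter installed on the system.

```c
#include <iconv.h>
#include <langinfo.h>

/* Sender side: local code set -> interchange format. */
iconv_t open_sender(const char *interchange)
{
    return iconv_open(interchange, nl_langinfo(CODESET));
}

/* Receiver side: interchange format -> local code set. */
iconv_t open_receiver(const char *interchange)
{
    return iconv_open(nl_langinfo(CODESET), interchange);
}
```

Both helpers return (iconv_t)-1 when no matching converter is found, so callers should test for that value before calling iconv. Calling setlocale(LC_ALL, "") beforehand makes nl_langinfo(CODESET) report the code set of the user's locale rather than that of the default C locale.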

How the iconv_open Subroutine Finds Converters

The iconv_open subroutine uses the LOCPATH environment variable to search for a converter whose name is in the form:

iconv/FromCodeSet_ToCodeSet

The FromCodeSet string represents the sender's code set, and the ToCodeSet string represents the receiver's code set. The underscore character separates the two strings.

Note: All setuid and setgid programs will ignore the LOCPATH environment variable.

Since the iconv converter is a loadable object module, a different object is required when running in the 64-bit environment. In the 64-bit environment, the iconv_open routine will use the LOCPATH environment variable to search for a converter whose name is in the form:

iconv/FromCodeSet_ToCodeSet__64

The iconv library will automatically choose whether to load the standard converter object or the 64-bit converter object.

If the iconv_open subroutine does not find the converter, it uses the from,to pair to search for a file that defines a table-driven conversion. The file contains a conversion table created by the genxlt command.

The iconvTable converter uses the LOCPATH environment variable to search for a file whose name is in the form:

iconvTable/FromCodeSet_ToCodeSet

If the converter is found, it performs a load operation and is initialized. The converter descriptor, iconv_t, is returned in its initial state.
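
The naming scheme can be illustrated with a small helper that builds the file name searched for within one directory. This sketch shows only the naming convention described above; it does not reproduce how the library splits or walks LOCPATH, and the function name and parameters are hypothetical.

```c
#include <stdio.h>

/*
 * Builds "<dir>/<subdir>/<from>_<to>", appending "__64" when is64 is nonzero.
 * subdir is "iconv" for converter programs or "iconvTable" for tables.
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
int converter_path(char *buf, size_t size, const char *dir, const char *subdir,
                   const char *from, const char *to, int is64)
{
    int n = snprintf(buf, size, "%s/%s/%s_%s%s",
                     dir, subdir, from, to, is64 ? "__64" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With dir set to /usr/lib/nls/loc, a request to convert from IBM-850 to ISO8859-1 would first look for iconv/IBM-850_ISO8859-1 and, if that is not found, for iconvTable/IBM-850_ISO8859-1.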

Converter Programs versus Tables

Converter programs are executable functions that convert data according to a set of rules. Converter tables are single-byte conversion tables that perform stateless conversions. Programs and tables are in separate directories:

/usr/lib/nls/loc/iconv	Converter programs
/usr/lib/nls/loc/iconvTable	Converter tables

After a converter program is compiled and linked with the libiconv.a library, the program is placed in the /usr/lib/nls/loc/iconv directory.

To build a table converter, build a source converter table file. Use the genxlt command to compile translation tables into a format understood by the table converter. The output file is then placed in the /usr/lib/nls/loc/iconvTable directory.
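
Conceptually, a table converter is a 256-entry lookup in which each input byte selects its replacement. The sketch below illustrates that idea only; it does not show the file format produced by the genxlt command, and the function names are hypothetical.

```c
#include <stddef.h>

/* Starts from the identity mapping; the entries that differ between the
 * two code sets are then overwritten by the caller. */
void table_init_identity(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Stateless single-byte conversion: every input byte indexes the table,
 * and no state is carried from one byte to the next. */
void table_convert(const unsigned char table[256],
                   const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[in[i]];
}
```

For instance, a table from ISO8859-1 to IBM-850 would map 0xE9 (e with acute accent in ISO8859-1) to 0x82 (the same character in IBM-850), while leaving the ASCII range unchanged.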

Unicode and Universal Converters

Unicode (or UCS-2) conversion tables are found in:

$LOCPATH/uconvTable/*CodeSet*

The $LOCPATH/uconv/UCSTBL converter program is used to perform the conversion to and from UCS-2 using the iconv utilities. For the iconv utilities to use these uconvTable conversion tables, links must be set up within the $LOCPATH/iconv directory. For example, for code set X:

ln -s /usr/lib/nls/loc/uconv/UCSTBL /usr/lib/nls/loc/iconv/X_UCS-2
ln -s /usr/lib/nls/loc/uconv/UCSTBL /usr/lib/nls/loc/iconv/UCS-2_X

A "Universal converter" program is provided that can be used to convert between any two code sets whose conversions to and from UCS-2 are defined. Given the following uconvTables:

X     -> UCS-2
UCS-2 -> Y

a universal conversion can be defined that maps

X -> UCS-2 -> Y

by use of the $LOCPATH/iconv/Universal_UCS_Conv converter. The conversion X->Y is set by defining a link to the universal converter, for example:

ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv /usr/lib/nls/loc/iconv/X_Y
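
The idea of pivoting through UCS-2 can be sketched with two ordinary iconv conversions chained together. This is not how the Universal_UCS_Conv program is implemented; it only illustrates the X -> UCS-2 -> Y composition. The function names and the fixed-size pivot buffer are assumptions made for the example, and the iconv prototype is assumed to follow POSIX (char **inbuf).

```c
#include <iconv.h>
#include <stddef.h>

/* One complete conversion of a small buffer; returns bytes written
 * or (size_t)-1 on failure. */
static size_t one_step(const char *to, const char *from,
                       char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = in, *op = out;
    size_t ileft = inlen, oleft = outcap;
    size_t r = iconv(cd, &ip, &ileft, &op, &oleft);

    iconv_close(cd);
    return (r == (size_t)-1) ? (size_t)-1 : outcap - oleft;
}

/* Converts from code set `from` to code set `to` by way of UCS-2. */
size_t convert_via_ucs2(const char *to, const char *from,
                        const char *in, size_t inlen,
                        char *out, size_t outcap)
{
    char pivot[1024];   /* intermediate UCS-2 text; small inputs only */

    size_t n = one_step("UCS-2", from, (char *)in, inlen, pivot, sizeof pivot);
    if (n == (size_t)-1)
        return (size_t)-1;
    return one_step(to, "UCS-2", pivot, n, out, outcap);
}
```

Because every supported code set needs only a pair of conversions to and from UCS-2, adding a new code set in this scheme does not require a separate converter for each existing code set.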

Using Converters

The iconv interface is a set of subroutines (iconv_open, iconv, and iconv_close) used to open, perform, and close conversions.

Code Set Conversion Filter Example

The following example shows how you can use these subroutines to create a code set conversion filter that accepts the ToCode and FromCode parameters as input arguments:

#include <stdio.h>
#include <stdlib.h>
#include <nl_types.h>
#include <iconv.h>
#include <string.h>
#include <errno.h>
#include <locale.h>

/* iconv() returns (size_t)-1 on failure; err holds the errno it set. */
#define ICONV_DONE() (r!=(size_t)-1)
#define ICONV_INVAL() ((r==(size_t)-1) && (err==EILSEQ))
#define ICONV_OVER() ((r==(size_t)-1) && (err==E2BIG))
#define ICONV_TRUNC() ((r==(size_t)-1) && (err==EINVAL))

/* Message numbers in the message catalog */
#define USAGE 1
#define ERROR 2
#define INCOMP 3

char ibuf[BUFSIZ], obuf[BUFSIZ];

int main(int argc, char **argv)
{
 size_t  ileft,oleft;
 nl_catd catd;
 iconv_t cd;
 size_t  r;
 int err;
 char *ip,*op;

 setlocale(LC_ALL,"");
 catd = catopen (argv[0],0);

 if(argc!=3){
  fprintf(stderr,"%s",
   catgets (catd,NL_SETD,USAGE,"usage: conv fromcode tocode\n"));
  exit(1);
 }

 /* iconv_open takes the ToCode argument first, then FromCode. */
 cd=iconv_open(argv[2],argv[1]);
 if(cd==(iconv_t)-1){
  perror("iconv_open");
  exit(1);
 }

 ileft=0;

 while(!feof(stdin) && !ferror(stdin)) {

  /*
  * After the next operation, ibuf will
  * contain new data plus any truncated
  * data left from the previous read.
  */
  ileft+=fread(ibuf+ileft,1,BUFSIZ-ileft,stdin);
  do {
   ip=ibuf;
   op=obuf;
   oleft=BUFSIZ;

   r=iconv(cd,&ip,&ileft,&op,&oleft);
   /* Save errno before another library call can change it. */
   err=(r==(size_t)-1) ? errno : 0;

   if(ICONV_INVAL()){
    fprintf(stderr,"%s",
       catgets(catd,NL_SETD,ERROR,"invalid input\n"));
    exit(2);
   }

   fwrite(obuf,1,BUFSIZ-oleft,stdout);

   if(ICONV_TRUNC() || ICONV_OVER())
    /*
    * Data remaining in buffer - move it
    * to the beginning. The regions may
    * overlap, so memmove is used.
    */
    memmove(ibuf,ip,ileft);

   /*
   * Loop until all characters in the input
   * buffer have been converted.
   */
  } while(ICONV_OVER());
 }

 if(ileft!=0){
  /*
  * This can only happen if the last call
  * to iconv() returned ICONV_TRUNC, meaning
  * the last data in the input stream was
  * incomplete.
  */
  fprintf(stderr,"%s",
   catgets(catd,NL_SETD,INCOMP,"input incomplete\n"));
  exit(3);
 }

 iconv_close(cd);
 exit(0);
}

Naming Converters

Code set names are in the form CodesetRegistry-CodesetEncoding where:

CodesetRegistry	Identifies the registration authority for the encoding. The CodesetRegistry must be made of characters from the portable code set (usually A-Z and 0-9).
CodesetEncoding	Identifies the coded character set defined by the registered authority.

The from,to variable used by the iconv command and iconv_open subroutine identifies a file whose name should be in the form /usr/lib/nls/loc/iconv/%f_%t or /usr/lib/nls/loc/iconvTable/%f_%t, where:

%f	Represents the FromCode set name.
%t	Represents the ToCode set name.

List of Converters

Converters change data from one code set to another. The sets of converters supported with the iconv library are listed in the following sections. All converters shipped with the BOS Runtime Environment are located in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.

These directories also contain private converters; that is, they are used by other converters. However, users and programs should only depend on the converters in the following lists.

Any converter shipped with the BOS Runtime Environment and not listed here should be considered private and subject to change or deletion. Converters supplied by other products can be placed in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.

Programmers are encouraged to use registered code set names or code set names associated with an application. The X Consortium maintains a registry of code set names for reference. See the Code Set Overview for more information about code sets.

List of PC, ISO, and EBCDIC Code Set Converters

These converters provide conversion between PC, ISO, and EBCDIC single-byte stateless code sets. The following types of conversions are supported: PC to/from ISO, PC to/from EBCDIC, and ISO to/from EBCDIC.

Conversion is provided between compatible code sets such as Latin-1 to Latin-1 and Greek to Greek. However, conversion between different EBCDIC national code sets is not supported. For information about converting between incompatible character sets refer to the List of Interchange Converters--7-bit and the List of Interchange Converters--8-bit.

Conversion tables in the iconvTable directory are created by the genxlt command.

Compatible Code Set Names

The following table lists code set names that are compatible. Each line defines to/from strings that may be used when requesting a converter.

Note: The PC and ISO code sets are ASCII-based.

Code Set Compatibility
Character Set	Languages	PC	ISO	EBCDIC
Latin-1	U.S. English, Portuguese, Canadian French	IBM-850	ISO8859-1	IBM-037
Latin-1	Danish, Norwegian	IBM-850	ISO8859-1	IBM-277
Latin-1	Finnish, Swedish	IBM-850	ISO8859-1	IBM-278
Latin-1	Italian	IBM-850	ISO8859-1	IBM-280
Latin-1	Japanese	IBM-850	ISO8859-1	IBM-281
Latin-1	Spanish	IBM-850	ISO8859-1	IBM-284
Latin-1	U.K. English	IBM-850	ISO8859-1	IBM-285
Latin-1	German	IBM-850	ISO8859-1	IBM-273
Latin-1	French	IBM-850	ISO8859-1	IBM-297
Latin-1	Belgian, Swiss German	IBM-850	ISO8859-1	IBM-500
Latin-2	Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene	IBM-852	ISO8859-2	IBM-870
Cyrillic	Bulgarian, Macedonian, Serbian Cyrillic, Russian	IBM-855	ISO8859-5	IBM-880 IBM-1025
Cyrillic	Russian	IBM-866	ISO8859-5	IBM-1025
Hebrew	Hebrew	IBM-856 IBM-862	ISO8859-8	IBM-424 IBM-803
Turkish	Turkish	IBM-857	ISO8859-9	IBM-1026
Arabic	Arabic	IBM-864 IBM-1046	ISO8859-6	IBM-420
Greek	Greek	IBM-869	ISO8859-7	IBM-875
Baltic	Lithuanian, Latvian, Estonian	IBM-921 IBM-922		IBM-1112 IBM-1122

Note: A character that exists in the source code set but does not exist in the target code set is converted to a converter-defined substitute character.

Files

The following table describes the iconvTable converters found in the /usr/lib/nls/loc/iconvTable directory:

iconvTable Converters
Converter Table	Description	Language
IBM-037_IBM-850	IBM-037 to IBM-850	U.S. English, Portuguese, Canadian-French
IBM-273_IBM-850	IBM-273 to IBM-850	German
IBM-277_IBM-850	IBM-277 to IBM-850	Danish, Norwegian
IBM-278_IBM-850	IBM-278 to IBM-850	Finnish, Swedish
IBM-280_IBM-850	IBM-280 to IBM-850	Italian
IBM-281_IBM-850	IBM-281 to IBM-850	Japanese-Latin
IBM-284_IBM-850	IBM-284 to IBM-850	Spanish
IBM-285_IBM-850	IBM-285 to IBM-850	U.K. English
IBM-297_IBM-850	IBM-297 to IBM-850	French
IBM-420_IBM-1046	IBM-420 to IBM-1046	Arabic
IBM-424_IBM-856	IBM-424 to IBM-856	Hebrew
IBM-424_IBM-862	IBM-424 to IBM-862	Hebrew
IBM-500_IBM-850	IBM-500 to IBM-850	Belgian, Swiss German
IBM-803_IBM-856	IBM-803 to IBM-856	Hebrew
IBM-803_IBM-862	IBM-803 to IBM-862	Hebrew
IBM-850_IBM-037	IBM-850 to IBM-037	U.S. English, Portuguese, Canadian-French
IBM-850_IBM-273	IBM-850 to IBM-273	German
IBM-850_IBM-277	IBM-850 to IBM-277	Danish, Norwegian
IBM-850_IBM-278	IBM-850 to IBM-278	Finnish, Swedish
IBM-850_IBM-280	IBM-850 to IBM-280	Italian
IBM-850_IBM-281	IBM-850 to IBM-281	Japanese-Latin
IBM-850_IBM-284	IBM-850 to IBM-284	Spanish
IBM-850_IBM-285	IBM-850 to IBM-285	U.K. English
IBM-850_IBM-297	IBM-850 to IBM-297	French
IBM-850_IBM-500	IBM-850 to IBM-500	Belgian, Swiss German
IBM-856_IBM-424	IBM-856 to IBM-424	Hebrew
IBM-856_IBM-803	IBM-856 to IBM-803	Hebrew
IBM-856_IBM-862	IBM-856 to IBM-862	Hebrew
IBM-862_IBM-424	IBM-862 to IBM-424	Hebrew
IBM-862_IBM-803	IBM-862 to IBM-803	Hebrew
IBM-862_IBM-856	IBM-862 to IBM-856	Hebrew
IBM-864_IBM-1046	IBM-864 to IBM-1046	Arabic
IBM-921_IBM-1112	IBM-921 to IBM-1112	Lithuanian, Latvian
IBM-922_IBM-1122	IBM-922 to IBM-1122	Estonian
IBM-1112_IBM-921	IBM-1112 to IBM-921	Lithuanian, Latvian
IBM-1122_IBM-922	IBM-1122 to IBM-922	Estonian
IBM-1046_IBM-420	IBM-1046 to IBM-420	Arabic
IBM-1046_IBM-864	IBM-1046 to IBM-864	Arabic
IBM-037_ISO8859-1	IBM-037 to ISO8859-1	U.S. English, Portuguese, Canadian French
IBM-273_ISO8859-1	IBM-273 to ISO8859-1	German
IBM-277_ISO8859-1	IBM-277 to ISO8859-1	Danish, Norwegian
IBM-278_ISO8859-1	IBM-278 to ISO8859-1	Finnish, Swedish
IBM-280_ISO8859-1	IBM-280 to ISO8859-1	Italian
IBM-281_ISO8859-1	IBM-281 to ISO8859-1	Japanese-Latin
IBM-284_ISO8859-1	IBM-284 to ISO8859-1	Spanish
IBM-285_ISO8859-1	IBM-285 to ISO8859-1	U.K. English
IBM-297_ISO8859-1	IBM-297 to ISO8859-1	French
IBM-420_ISO8859-6	IBM-420 to ISO8859-6	Arabic
IBM-424_ISO8859-8	IBM-424 to ISO8859-8	Hebrew
IBM-500_ISO8859-1	IBM-500 to ISO8859-1	Belgian, Swiss German
IBM-803_ISO8859-8	IBM-803 to ISO8859-8	Hebrew
IBM-852_ISO8859-2	IBM-852 to ISO8859-2	Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene
IBM-855_ISO8859-5	IBM-855 to ISO8859-5	Bulgarian, Macedonian, Serbian Cyrillic, Russian
IBM-866_ISO8859-5	IBM-866 to ISO8859-5	Russian
IBM-869_ISO8859-7	IBM-869 to ISO8859-7	Greek
IBM-875_ISO8859-7	IBM-875 to ISO8859-7	Greek
IBM-870_ISO8859-2	IBM-870 to ISO8859-2	Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene
IBM-880_ISO8859-5	IBM-880 to ISO8859-5	Bulgarian, Macedonian, Serbian Cyrillic, Russian
IBM-1025_ISO8859-5	IBM-1025 to ISO8859-5	Bulgarian, Macedonian, Serbian Cyrillic, Russian
IBM-857_ISO8859-9	IBM-857 to ISO8859-9	Turkish
IBM-1026_ISO8859-9	IBM-1026 to ISO8859-9	Turkish
IBM-850_ISO8859-1	IBM-850 to ISO8859-1	Latin
IBM-856_ISO8859-8	IBM-856 to ISO8859-8	Hebrew
IBM-862_ISO8859-8	IBM-862 to ISO8859-8	Hebrew
IBM-864_ISO8859-6	IBM-864 to ISO8859-6	Arabic
IBM-1046_ISO8859-6	IBM-1046 to ISO8859-6	Arabic
ISO8859-1_IBM-850	ISO8859-1 to IBM-850	Latin
ISO8859-6_IBM-864	ISO8859-6 to IBM-864	Arabic
ISO8859-6_IBM-1046	ISO8859-6 to IBM-1046	Arabic
ISO8859-8_IBM-856	ISO8859-8 to IBM-856	Hebrew
ISO8859-8_IBM-862	ISO8859-8 to IBM-862	Hebrew
ISO8859-1_IBM-037	ISO8859-1 to IBM-037	U.S. English, Portuguese, Canadian French
ISO8859-1_IBM-273	ISO8859-1 to IBM-273	German
ISO8859-1_IBM-277	ISO8859-1 to IBM-277	Danish, Norwegian
ISO8859-1_IBM-278	ISO8859-1 to IBM-278	Finnish, Swedish
ISO8859-1_IBM-280	ISO8859-1 to IBM-280	Italian
ISO8859-1_IBM-281	ISO8859-1 to IBM-281	Japanese-Latin
ISO8859-1_IBM-284	ISO8859-1 to IBM-284	Spanish
ISO8859-1_IBM-285	ISO8859-1 to IBM-285	U.K. English
ISO8859-1_IBM-297	ISO8859-1 to IBM-297	French
ISO8859-1_IBM-500	ISO8859-1 to IBM-500	Belgian, Swiss German
ISO8859-2_IBM-852	ISO8859-2 to IBM-852	Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene
ISO8859-2_IBM-870	ISO8859-2 to IBM-870	Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene
ISO8859-5_IBM-855	ISO8859-5 to IBM-855	Bulgarian, Macedonian, Serbian Cyrillic, Russian
ISO8859-5_IBM-880	ISO8859-5 to IBM-880	Bulgarian, Macedonian, Serbian Cyrillic, Russian
ISO8859-5_IBM-1025	ISO8859-5 to IBM-1025	Bulgarian, Macedonian, Serbian Cyrillic, Russian
ISO8859-6_IBM-420	ISO8859-6 to IBM-420	Arabic
ISO8859-5_IBM-866	ISO8859-5 to IBM-866	Russian
ISO8859-7_IBM-869	ISO8859-7 to IBM-869	Greek
ISO8859-7_IBM-875	ISO8859-7 to IBM-875	Greek
ISO8859-8_IBM-424	ISO8859-8 to IBM-424	Hebrew
ISO8859-8_IBM-803	ISO8859-8 to IBM-803	Hebrew
ISO8859-9_IBM-857	ISO8859-9 to IBM-857	Turkish
ISO8859-9_IBM-1026	ISO8859-9 to IBM-1026	Turkish

List of Multibyte Code Set Converters

Multibyte code set converters convert characters among the following code sets:

PC multibyte code sets
EUC multibyte code sets (ISO-based)
EBCDIC multibyte code sets

The following table lists code set names that are compatible. Each line defines to/from strings that may be used when requesting a converter.

Code Set Compatibility
Language	PC	ISO	EBCDIC
Japanese	IBM-932	IBM-eucJP	IBM-930, IBM-939
Japanese (MS compatible)	IBM-943	IBM-eucJP	IBM-930, IBM-939
Korean	IBM-934	IBM-eucKR	IBM-933
Traditional Chinese	IBM-938, big-5	IBM-eucTW	IBM-937
Simplified Chinese	IBM-1381	IBM-eucCN	IBM-935

Conversions between Simplified and Traditional Chinese are provided (IBM-eucTW <--> IBM-eucCN and big5 <--> IBM-eucCN).
UTF-8 is an additional code set. See List of UTF-8 Interchange Converters for more information.

Files

The following list describes the Multibyte Code Set converters that are found in the /usr/lib/nls/loc/iconv directory.

Converter	Description
IBM-eucJP_IBM-932	IBM-eucJP to IBM-932
IBM-eucJP_IBM-943	IBM-eucJP to IBM-943
IBM-eucJP_IBM-930	IBM-eucJP to IBM-930
IBM-eucCN_IBM-936(PC5550)	IBM-eucCN to IBM-936(PC5550)
IBM-eucCN_IBM-935	IBM-eucCN to IBM-935
IBM-eucJP_IBM-939	IBM-eucJP to IBM-939
IBM-eucCN_IBM-1381	IBM-eucCN to IBM-1381
IBM-943_IBM-932	IBM-943 to IBM-932
IBM-932_IBM-943	IBM-932 to IBM-943
IBM-930_IBM-932	IBM-930 to IBM-932
IBM-930_IBM-943	IBM-930 to IBM-943
IBM-930_IBM-eucJP	IBM-930 to IBM-eucJP
IBM-932_IBM-eucJP	IBM-932 to IBM-eucJP
IBM-932_IBM-930	IBM-932 to IBM-930
IBM-943_IBM-eucJP	IBM-943 to IBM-eucJP
IBM-943_IBM-930	IBM-943 to IBM-930
IBM-936(PC5550)_IBM-935	IBM-936(PC5550) to IBM-935
IBM-936_IBM-935	IBM-936 to IBM-935
IBM-932_IBM-939	IBM-932 to IBM-939
IBM-939_IBM-932	IBM-939 to IBM-932
IBM-943_IBM-939	IBM-943 to IBM-939
IBM-939_IBM-943	IBM-939 to IBM-943
IBM-935_IBM-936(PC5550)	IBM-935 to IBM-936(PC5550)
IBM-935_IBM-936	IBM-935 to IBM-936
IBM-1381_IBM-935	IBM-1381 to IBM-935
IBM-935_IBM-1381	IBM-935 to IBM-1381
IBM-935_IBM-eucCN	IBM-935 to IBM-eucCN
IBM-936(PC5550)_IBM-eucCN	IBM-936(PC5550) to IBM-eucCN
IBM-eucTW_IBM-eucCN	IBM-eucTW to IBM-eucCN
big5_IBM-eucCN	big5 to IBM-eucCN
IBM-1381_IBM-eucCN	IBM-1381 to IBM-eucCN
IBM-939_IBM-eucJP	IBM-939 to IBM-eucJP
IBM-eucKR_IBM-934	IBM-eucKR to IBM-934
IBM-934_IBM-eucKR	IBM-934 to IBM-eucKR
IBM-eucKR_IBM-933	IBM-eucKR to IBM-933
IBM-933_IBM-eucKR	IBM-933 to IBM-eucKR
IBM-eucTW_IBM-937	IBM-eucTW to IBM-937
IBM-938_IBM-937	IBM-938 to IBM-937
big-5_IBM-937	big-5 to IBM-937
IBM-eucCN_IBM-eucTW	IBM-eucCN to IBM-eucTW
IBM-937_IBM-eucTW	IBM-937 to IBM-eucTW
IBM-937_IBM-938	IBM-937 to IBM-938
IBM-eucTW_IBM-938	IBM-eucTW to IBM-938
IBM-eucCN_big5	IBM-eucCN to big5
IBM-eucTW_big-5	IBM-eucTW to big-5
IBM-937_big-5	IBM-937 to big-5
CNS11643.1992-3_IBM-eucTW	CNS11643.1992-3 to IBM-eucTW
CNS11643.1992-3-GL_IBM-eucTW	CNS11643.1992-3-GL to IBM-eucTW
CNS11643.1992-3-GR_IBM-eucTW	CNS11643.1992-3-GR to IBM-eucTW
CNS11643.1992-4_IBM-eucTW	CNS11643.1992-4 to IBM-eucTW
CNS11643.1992-4-GL_IBM-eucTW	CNS11643.1992-4-GL to IBM-eucTW
CNS11643.1992-4-GR_IBM-eucTW	CNS11643.1992-4-GR to IBM-eucTW
IBM-eucTW_CNS11643.1992-3	IBM-eucTW to CNS11643.1992-3
IBM-eucTW_CNS11643.1992-3-GL	IBM-eucTW to CNS11643.1992-3-GL
IBM-eucTW_CNS11643.1992-3-GR	IBM-eucTW to CNS11643.1992-3-GR
IBM-eucTW_CNS11643.1992-4	IBM-eucTW to CNS11643.1992-4
IBM-eucTW_CNS11643.1992-4-GL	IBM-eucTW to CNS11643.1992-4-GL
IBM-eucTW_CNS11643.1992-4-GR	IBM-eucTW to CNS11643.1992-4-GR
IBM-eucCN_GB2312.1980-1	IBM-eucCN to GB2312.1980-1
IBM-eucCN_GB2312.1980-1-GL	IBM-eucCN to GB2312.1980-1-GL
IBM-eucCN_GB2312.1980-1-GR	IBM-eucCN to GB2312.1980-1-GR
IBM-937_csic	IBM-937 to csic
csic_IBM-937	csic to IBM-937
IBM-938_csic	IBM-938 to csic
csic_IBM-938	csic to IBM-938
IBM-eucTW_ccdc	IBM-eucTW to ccdc
ccdc_IBM-eucTW	ccdc to IBM-eucTW
IBM-eucTW_cns	IBM-eucTW to cns
cns_IBM-eucTW	cns to IBM-eucTW
IBM-eucTW_csic	IBM-eucTW to csic
csic_IBM-eucTW	csic to IBM-eucTW
IBM-eucTW_sops	IBM-eucTW to sops
sops_IBM-eucTW	sops to IBM-eucTW
IBM-eucTW_tca	IBM-eucTW to tca
tca_IBM-eucTW	tca to IBM-eucTW
big5_cns	big5 to cns
cns_big5	cns to big5
big5_csic	big5 to csic
csic_big5	csic to big5
big5_ttc	big5 to ttc
ttc_big5	ttc to big5
big5_ttcmin	big5 to ttcmin
ttcmin_big5	ttcmin to big5
big5_unicode	big5 to unicode
unicode_big5	unicode to big5
big5_wang	big5 to wang
wang_big5	wang to big5
ccdc_csic	ccdc to csic
csic_ccdc	csic to ccdc
csic_sops	csic to sops
sops_csic	sops to csic
CNS11643.1986-1_big5	CNS11643.1986-1 to big5
big5_CNS11643.1986-1	big5 to CNS11643.1986-1
CNS11643.1986-1-GR_big5	CNS11643.1986-1-GR to big5
big5_CNS11643.1986-1-GR	big5 to CNS11643.1986-1-GR
CNS11643.1986-2_big5	CNS11643.1986-2 to big5
big5_CNS11643.1986-2	big5 to CNS11643.1986-2
CNS11643.1986-2-GR_big5	CNS11643.1986-2-GR to big5
big5_CNS11643.1986-2-GR	big5 to CNS11643.1986-2-GR
CNS11643.CT-GR_big5	CNS11643.CT-GR to big5
big5_CNS11643.CT-GR	big5 to CNS11643.CT-GR
IBM-sbdTW-GR_big5	IBM-sbdTW-GR to big5
big5_IBM-sbdTW-GR	big5 to IBM-sbdTW-GR
IBM-sbdTW.CT-GR_big5	IBM-sbdTW.CT-GR to big5
big5_IBM-sbdTW.CT-GR	big5 to IBM-sbdTW.CT-GR
IBM-sbdTW_big5	IBM-sbdTW to big5
big5_IBM-sbdTW	big5 to IBM-sbdTW
IBM-udcTW-GR_big5	IBM-udcTW-GR to big5
big5_IBM-udcTW-GR	big5 to IBM-udcTW-GR
IBM-udcTW.CT-GR_big5	IBM-udcTW.CT-GR to big5
big5_IBM-udcTW.CT-GR	big5 to IBM-udcTW.CT-GR
ISO8859-1_big5	ISO8859-1 to big5
big5_ISO8859-1	big5 to ISO8859-1
big5_ASCII-GR	big5 to ASCII-GR
ASCII-GR_big5	ASCII-GR to big5
GBK_big5	GBK to big5
big5_GBK	big5 to GBK
GBK_IBM-eucTW	GBK to IBM-eucTW
IBM-eucTW_GBK	IBM-eucTW to GBK
CNS11643.1986-1_GBK	CNS11643.1986-1 to GBK
GBK_CNS11643.1986-1	GBK to CNS11643.1986-1
CNS11643.1986-2_GBK	CNS11643.1986-2 to GBK
GBK_CNS11643.1986-2	GBK to CNS11643.1986-2
CNS11643.1986-1-GR_GBK	CNS11643.1986-1-GR to GBK
GBK_CNS11643.1986-1-GR	GBK to CNS11643.1986-1-GR
CNS11643.1986-2-GR_GBK	CNS11643.1986-2-GR to GBK
GBK_CNS11643.1986-2-GR	GBK to CNS11643.1986-2-GR
CNS11643.1986-1-GL_GBK	CNS11643.1986-1-GL to GBK
GBK_CNS11643.1986-1-GL	GBK to CNS11643.1986-1-GL
CNS11643.1986-2-GL_GBK	CNS11643.1986-2-GL to GBK
GBK_CNS11643.1986-2-GL	GBK to CNS11643.1986-2-GL
CNS11643.CT-GR_GBK	CNS11643.CT-GR to GBK
GBK_CNS11643.CT-GR	GBK to CNS11643.CT-GR
GB2312.1980.CT-GR_GBK	GB2312.1980.CT-GR to GBK
GBK_GB2312.1980.CT-GR	GBK to GB2312.1980.CT-GR
GB2312.1980-0_GBK	GB2312.1980-0 to GBK
GBK_GB2312.1980-0	GBK to GB2312.1980-0
GB2312.1980-0-GR_GBK	GB2312.1980-0-GR to GBK
GBK_GB2312.1980-0-GR	GBK to GB2312.1980-0-GR
GB2312.1980-0-GL_GBK	GB2312.1980-0-GL to GBK
GBK_GB2312.1980-0-GL	GBK to GB2312.1980-0-GL
ASCII-GR_GBK	ASCII-GR to GBK
GBK_ASCII-GR	GBK to ASCII-GR
ISO8859-1_GBK	ISO8859-1 to GBK
GBK_ISO8859-1	GBK to ISO8859-1
IBM-eucCN_GBK	IBM-eucCN to GBK
GBK_IBM-eucCN	GBK to IBM-eucCN

List of Interchange Converters--7-bit

This converter provides conversion between internal code and 7-bit standard interchange formats (fold7). The fold7 name identifies encodings that can be used to pass text data through 7-bit mail protocols. The encodings are based on ISO2022. For more information about fold7 see Understanding libiconv.

The fold7 converters convert characters from a code set to a canonical 7-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:

IBM-850 <--> ISO8859-1	Common Latin characters
IBM-932 <--> IBM-eucJP	Common Japanese characters

The following escape sequences designate standard code sets:

Escape Sequence	Standard Code Set
01/11 02/04 04/00	GL JIS X0208.1978-0.
01/11 02/04 02/08 04/01	GL left half of GB2312.1980-0.
01/11 02/08 04/02	GL 7-bit ASCII or left half of ISO8859-1.
01/11 02/14 04/01	GL right half of ISO8859-1.
01/11 02/14 04/02	GL right half of ISO8859-2.
01/11 02/14 04/03	GL right half of ISO8859-3.
01/11 02/14 04/04	GL right half of ISO8859-4.
01/11 02/14 04/06	GL right half of ISO8859-7.
01/11 02/14 04/07	GL right half of ISO8859-6.
01/11 02/14 04/08	GL right half of ISO8859-8.
01/11 02/14 04/12	GL right half of ISO8859-5.
01/11 02/14 04/13	GL right half of ISO8859-9.
01/11 02/08 04/09	GL right half of JIS X0201.1976-0.
01/11 02/08 04/10	GL left half of JIS X0201.1976.
01/11 02/04 04/02	GL JIS X0208.1983-0.
01/11 02/04 02/08 04/02	GL JIS X0208.1983-0.
01/11 02/04 02/08 04/00	GL JIS X0208.1978-0.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02	GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02	GL Japanese (IBM-udcJP) user-definable characters.
01/11 02/04 02/08 04/03	GL KSC5601-1987.
01/11 02/04 02/09 03/00	GL CNS11643-1986-1.
01/11 02/04 02/10 03/01	GL CNS11643-1986-2.
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/07 00/02	UCS-2 encoded as base64; used only for those characters not encoded by any of the other 7-bit escape sequences listed above.

When converting from a code set to fold7, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JIS X0208.1983-0 characters use 01/11 02/04 04/02 as the designation.
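
The escape sequences in the table are written in column/row notation, in which the byte value is the column number multiplied by 16 plus the row number; 01/11 is therefore 0x1B, the ESC character. The following sketch turns such a string into bytes. The function name is hypothetical, and the sketch handles only the fixed-length sequences, not the entries that contain the variable M and L length bytes.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Parses a string such as "01/11 02/04 04/02" and stores each element
 * as column * 16 + row. Returns the number of bytes stored, or -1 if
 * the input is malformed or does not fit in out.
 */
int colrow_to_bytes(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    int col, row, used;

    /* %d reads the two-digit decimal fields; %n records how far we got. */
    while (sscanf(s, " %d/%d%n", &col, &row, &used) == 2) {
        if (col < 0 || col > 15 || row < 0 || row > 15 || n == cap)
            return -1;
        out[n++] = (unsigned char)(col * 16 + row);
        s += used;
    }

    /* Anything left other than trailing spaces is malformed input. */
    while (*s == ' ')
        s++;
    return (*s == '\0') ? (int)n : -1;
}
```

Applied to 01/11 02/04 04/02, the function yields the bytes 0x1B 0x24 0x42, that is, ESC followed by the ASCII characters $ and B.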

Files

The following list describes the fold7 converters that are found in the /usr/lib/nls/loc/iconv directory:

Converter	Description
fold7_IBM-850	Interchange format to IBM-850
fold7_IBM-921	Interchange format to IBM-921
fold7_IBM-922	Interchange format to IBM-922
fold7_IBM-932	Interchange format to IBM-932
fold7_IBM-943	Interchange format to IBM-943
fold7_IBM-1124	Interchange format to IBM-1124
fold7_IBM-1129	Interchange format to IBM-1129
fold7_IBM-eucCN	Interchange format to IBM-eucCN
fold7_IBM-eucJP	Interchange format to IBM-eucJP
fold7_IBM-eucKR	Interchange format to IBM-eucKR
fold7_IBM-eucTW	Interchange format to IBM-eucTW
fold7_ISO8859-1	Interchange format to ISO8859-1
fold7_ISO8859-2	Interchange format to ISO8859-2
fold7_ISO8859-3	Interchange format to ISO8859-3
fold7_ISO8859-4	Interchange format to ISO8859-4
fold7_ISO8859-5	Interchange format to ISO8859-5
fold7_ISO8859-6	Interchange format to ISO8859-6
fold7_ISO8859-7	Interchange format to ISO8859-7
fold7_ISO8859-8	Interchange format to ISO8859-8
fold7_ISO8859-9	Interchange format to ISO8859-9
fold7_TIS-620	Interchange format to TIS-620
fold7_UTF-8	Interchange format to UTF-8
fold7_big5	Interchange format to big5
fold7_GBK	Interchange format to GBK
IBM-921_fold7	IBM-921 to interchange format
IBM-922_fold7	IBM-922 to interchange format
IBM-850_fold7	IBM-850 to interchange format
IBM-932_fold7	IBM-932 to interchange format
IBM-943_fold7	IBM-943 to interchange format
IBM-1124_fold7	IBM-1124 to interchange format
IBM-1129_fold7	IBM-1129 to interchange format
IBM-eucCN_fold7	IBM-eucCN to interchange format
IBM-eucJP_fold7	IBM-eucJP to interchange format
IBM-eucKR_fold7	IBM-eucKR to interchange format
IBM-eucTW_fold7	IBM-eucTW to interchange format
ISO8859-1_fold7	ISO8859-1 to interchange format
ISO8859-2_fold7	ISO8859-2 to interchange format
ISO8859-3_fold7	ISO8859-3 to interchange format
ISO8859-4_fold7	ISO8859-4 to interchange format
ISO8859-5_fold7	ISO8859-5 to interchange format
ISO8859-6_fold7	ISO8859-6 to interchange format
ISO8859-7_fold7	ISO8859-7 to interchange format
ISO8859-8_fold7	ISO8859-8 to interchange format
ISO8859-9_fold7	ISO8859-9 to interchange format
TIS-620_fold7	TIS-620 to interchange format
UTF-8_fold7	UTF-8 to interchange format
big5_fold7	big5 to interchange format
GBK_fold7	GBK to interchange format

List of Interchange Converters--8-bit

This converter provides conversions between internal code and 8-bit standard interchange formats (fold8). The fold8 name identifies encodings that can be used to pass text data through 8-bit mail protocols. The encodings are based on ISO2022. For more information about fold8 see Understanding libiconv.

The fold8 converters convert characters from a specific code set encoding to a canonical 8-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:

IBM-850 <--> ISO8859-1	Common Latin characters
IBM-932 <--> IBM-eucJP	Common Japanese characters

The following escape sequences designate standard code sets.

Escape Sequence	Standard Code Set
01/11 02/04 02/09 04/01	GR right half of GB2312.1980-0.
01/11 02/13 04/01	GR right half of ISO8859-1.
01/11 02/13 04/02	GR right half of ISO8859-2.
01/11 02/13 04/03	GR right half of ISO8859-3.
01/11 02/13 04/04	GR right half of ISO8859-4.
01/11 02/13 04/06	GR right half of ISO8859-7.
01/11 02/13 04/07	GR right half of ISO8859-6.
01/11 02/13 04/08	GR right half of ISO8859-8.
01/11 02/13 04/12	GR right half of ISO8859-5.
01/11 02/13 04/13	GR right half of ISO8859-9.
01/11 02/09 04/09	GR right half of JIS X0201.1976-1.
01/11 02/04 02/09 04/02	GR JIS X0208.1983-1.
01/11 02/04 02/09 04/00	GR JISX0208.1978-1.
01/11 02/09 04/02	GR 7-bit ASCII or left half of ISO8859-1.
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02	GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02	GR right half of Japanese user-definable characters.
01/11 02/08 04/02	GL 7-bit ASCII or left half of ISO8859-1.
01/11 02/14 04/01	GL right half of ISO8859-1.
01/11 02/14 04/02	GL right half of ISO8859-2.
01/11 02/14 04/03	GL right half of ISO8859-3.
01/11 02/14 04/04	GL right half of ISO8859-4.
01/11 02/14 04/06	GL right half of ISO8859-7.
01/11 02/14 04/07	GL right half of ISO8859-6.
01/11 02/14 04/08	GL right half of ISO8859-8.
01/11 02/14 04/12	GL right half of ISO8859-5.
01/11 02/14 04/13	GL right half of ISO8859-9.
01/11 02/08 04/09	GL right half of JIS X0201.1976-0.
01/11 02/08 04/10	GL left half of JIS X0201.1976.
01/11 02/04 02/08 04/02	GL JIS X0208.1983-0.
01/11 02/04 04/02	GL JIS X0208.1983-0.
01/11 02/04 04/00	GL JIS X0208.1978-0.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02	GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02	GL Japanese (IBM-udcJP) user-definable characters.
01/11 02/04 02/09 04/03	GR KSC5601-1987.
01/11 02/04 02/09 03/00	GR CNS11643-1986-1.
01/11 02/04 02/10 03/01	GR CNS11643-1986-2.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02	GR right half of Traditional Chinese user-definable characters.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02	GR right half of IBM-850 unique symbols.
01/11 02/04 02/08 04/03	GL KSC5601-1987.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02	GL Traditional Chinese (IBM-udcTW) user-definable characters.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02	GL Traditional Chinese IBM-850 unique symbols (IBM-sbdTW) user-definable characters.
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/08 00/02	UCS-2 encoded as UTF-8; used only for those characters not encoded by any of the escape sequences listed above.

When converting from a code set to fold8, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 02/08 04/02 as the designation.

Files

The following list describes the fold8 converters found in the /usr/lib/nls/loc/iconv directory:

Converter	Description
fold8_IBM-850	Interchange format to IBM-850
fold8_IBM-921	Interchange format to IBM-921
fold8_IBM-922	Interchange format to IBM-922
fold8_IBM-932	Interchange format to IBM-932
fold8_IBM-943	Interchange format to IBM-943
fold8_IBM-1124	Interchange format to IBM-1124
fold8_IBM-1129	Interchange format to IBM-1129
fold8_IBM-eucCN	Interchange format to IBM-eucCN
fold8_IBM-eucJP	Interchange format to IBM-eucJP
fold8_IBM-eucKR	Interchange format to IBM-eucKR
fold8_IBM-eucTW	Interchange format to IBM-eucTW
fold8_ISO8859-1	Interchange format to ISO8859-1
fold8_ISO8859-2	Interchange format to ISO8859-2
fold8_ISO8859-3	Interchange format to ISO8859-3
fold8_ISO8859-4	Interchange format to ISO8859-4
fold8_ISO8859-5	Interchange format to ISO8859-5
fold8_ISO8859-6	Interchange format to ISO8859-6
fold8_ISO8859-7	Interchange format to ISO8859-7
fold8_ISO8859-8	Interchange format to ISO8859-8
fold8_ISO8859-9	Interchange format to ISO8859-9
fold8_TIS-620	Interchange format to TIS-620
fold8_UTF-8	Interchange format to UTF-8
fold8_big5	Interchange format to big5
fold8_GBK	Interchange format to GBK
IBM-921_fold8	IBM-921 to interchange format
IBM-922_fold8	IBM-922 to interchange format
IBM-850_fold8	IBM-850 to interchange format
IBM-932_fold8	IBM-932 to interchange format
IBM-943_fold8	IBM-943 to interchange format
IBM-1124_fold8	IBM-1124 to interchange format
IBM-1129_fold8	IBM-1129 to interchange format
IBM-eucCN_fold8	IBM-eucCN to interchange format
IBM-eucJP_fold8	IBM-eucJP to interchange format
IBM-eucKR_fold8	IBM-eucKR to interchange format
IBM-eucTW_fold8	IBM-eucTW to interchange format
ISO8859-1_fold8	ISO8859-1 to interchange format
ISO8859-2_fold8	ISO8859-2 to interchange format
ISO8859-3_fold8	ISO8859-3 to interchange format
ISO8859-4_fold8	ISO8859-4 to interchange format
ISO8859-5_fold8	ISO8859-5 to interchange format
ISO8859-6_fold8	ISO8859-6 to interchange format
ISO8859-7_fold8	ISO8859-7 to interchange format
ISO8859-8_fold8	ISO8859-8 to interchange format
ISO8859-9_fold8	ISO8859-9 to interchange format
TIS-620_fold8	TIS-620 to interchange format
UTF-8_fold8	UTF-8 to interchange format
big5_fold8	big5 to interchange format
GBK_fold8	GBK to interchange format

List of Interchange Converters--Compound Text

Compound text interchange converters convert between compound text and internal code sets.

Compound text is an interchange encoding defined by the X Consortium. It is used to communicate text between X clients. Compound text is based on ISO2022 and can encode most character sets using standard escape sequences. It also provides extensions for encoding private character sets. The supported code sets provide a converter to and from compound text. The name used to identify the compound text encoding is ct.

The following escape sequences are used to designate standard code sets in the order listed below.

01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02
	GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
	GR right half of Japanese user-definable characters.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
	GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
	GL Japanese (IBM-udcJP) user-definable characters.
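
In the X Consortium Compound Text encoding, a sequence that begins 01/11 02/05 02/15 introduces an extended segment for a private encoding. Assuming the sequences above follow that layout, M and L stand for two length bytes, and the bytes between them and the closing 00/02 (STX) spell the name of the encoding. The following Python sketch, offered only as an illustration, recovers that name from the column/row notation. The function name extended_segment_name is hypothetical.

```python
def extended_segment_name(sequence: str) -> str:
    """Return the encoding name spelled out in an extended-segment sequence."""
    tokens = sequence.split()
    # Skip ESC, 02/05, 02/15, the final byte, and the M and L placeholders.
    chars = []
    for token in tokens[6:]:
        column, row = token.split("/")
        value = int(column) * 16 + int(row)
        if value == 0x02:  # STX terminates the name
            break
        chars.append(chr(value))
    return "".join(chars)


gr = "01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02"
gl = "01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02"
print(extended_segment_name(gr))  # IBM-850
print(extended_segment_name(gl))  # ibm-udcJP
```

In these entries the GR sequences spell the name with an uppercase IBM prefix and the GL sequences with a lowercase ibm prefix, which appears to be how the two halves are told apart.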

Files

The following list describes the compound text converters that are found in the /usr/lib/nls/loc/iconv directory:

Converter	Description
ct_IBM-850	Interchange format to IBM-850
ct_IBM-921	Interchange format to IBM-921
ct_IBM-922	Interchange format to IBM-922
ct_IBM-932	Interchange format to IBM-932
ct_IBM-943	Interchange format to IBM-943
ct_IBM-1124	Interchange format to IBM-1124
ct_IBM-1129	Interchange format to IBM-1129
ct_IBM-eucCN	Interchange format to IBM-eucCN
ct_IBM-eucJP	Interchange format to IBM-eucJP
ct_IBM-eucKR	Interchange format to IBM-eucKR
ct_IBM-eucTW	Interchange format to IBM-eucTW
ct_ISO8859-1	Interchange format to ISO8859-1
ct_ISO8859-2	Interchange format to ISO8859-2
ct_ISO8859-3	Interchange format to ISO8859-3
ct_ISO8859-4	Interchange format to ISO8859-4
ct_ISO8859-5	Interchange format to ISO8859-5
ct_ISO8859-6	Interchange format to ISO8859-6
ct_ISO8859-7	Interchange format to ISO8859-7
ct_ISO8859-8	Interchange format to ISO8859-8
ct_ISO8859-9	Interchange format to ISO8859-9
ct_TIS-620	Interchange format to TIS-620
ct_big5	Interchange format to big5
ct_GBK	Interchange format to GBK
IBM-850_ct	IBM-850 to interchange format
IBM-921_ct	IBM-921 to interchange format
IBM-922_ct	IBM-922 to interchange format
IBM-932_ct	IBM-932 to interchange format
IBM-943_ct	IBM-943 to interchange format
IBM-1124_ct	IBM-1124 to interchange format
IBM-1129_ct	IBM-1129 to interchange format
IBM-eucCN_ct	IBM-eucCN to interchange format
IBM-eucJP_ct	IBM-eucJP to interchange format
IBM-eucKR_ct	IBM-eucKR to interchange format
IBM-eucTW_ct	IBM-eucTW to interchange format
ISO8859-1_ct	ISO8859-1 to interchange format
ISO8859-2_ct	ISO8859-2 to interchange format
ISO8859-3_ct	ISO8859-3 to interchange format
ISO8859-4_ct	ISO8859-4 to interchange format
ISO8859-5_ct	ISO8859-5 to interchange format
ISO8859-6_ct	ISO8859-6 to interchange format
ISO8859-7_ct	ISO8859-7 to interchange format
ISO8859-8_ct	ISO8859-8 to interchange format
ISO8859-9_ct	ISO8859-9 to interchange format
TIS-620_ct	TIS-620 to interchange format
big5_ct	big5 to interchange format
GBK_ct	GBK to interchange format

List of Interchange Converters--uucode

This converter provides the same mapping as the uuencode and uudecode commands.

During conversion from uucode, 62 bytes at a time (including the new-line character trailing the record) are converted, generating 45 bytes in outbuf.
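
The 62-to-45 ratio follows from the uuencode record layout: one length character, 60 characters that carry 45 bytes at 6 bits per character, and a trailing new-line character. The following Python sketch checks the arithmetic with the standard binascii module, which is used here only as a stand-in for the uucode converters and is not the system converter itself.

```python
import binascii

record = bytes(range(45))        # one full record of 45 data bytes
line = binascii.b2a_uu(record)   # length char + 60 encoded chars + new-line

print(len(record), len(line))    # 45 62

# Decoding the 62-byte line gives back the original 45 bytes.
decoded = binascii.a2b_uu(line)
print(decoded == record)         # True
```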

Files

The following list describes the uucode converters found in the /usr/lib/nls/loc/iconv directory:

Converter	Description
IBM-850_uucode	IBM-850 to uucode
IBM-921_uucode	IBM-921 to uucode
IBM-922_uucode	IBM-922 to uucode
IBM-932_uucode	IBM-932 to uucode
IBM-943_uucode	IBM-943 to uucode
IBM-1124_uucode	IBM-1124 to uucode
IBM-1129_uucode	IBM-1129 to uucode
IBM-eucJP_uucode	IBM-eucJP to uucode
IBM-eucKR_uucode	IBM-eucKR to uucode
IBM-eucTW_uucode	IBM-eucTW to uucode
IBM-eucCN_uucode	IBM-eucCN to uucode
ISO8859-1_uucode	ISO8859-1 to uucode
ISO8859-2_uucode	ISO8859-2 to uucode
ISO8859-3_uucode	ISO8859-3 to uucode
ISO8859-4_uucode	ISO8859-4 to uucode
ISO8859-5_uucode	ISO8859-5 to uucode
ISO8859-6_uucode	ISO8859-6 to uucode
ISO8859-7_uucode	ISO8859-7 to uucode
ISO8859-8_uucode	ISO8859-8 to uucode
ISO8859-9_uucode	ISO8859-9 to uucode
TIS-620_uucode	TIS-620 to uucode
big5_uucode	big5 to uucode
GBK_uucode	GBK to uucode
uucode_IBM-850	uucode to IBM-850
uucode_IBM-921	uucode to IBM-921
uucode_IBM-922	uucode to IBM-922
uucode_IBM-932	uucode to IBM-932
uucode_IBM-943	uucode to IBM-943
uucode_IBM-1124	uucode to IBM-1124
uucode_IBM-1129	uucode to IBM-1129
uucode_IBM-eucCN	uucode to IBM-eucCN
uucode_IBM-eucJP	uucode to IBM-eucJP
uucode_IBM-eucKR	uucode to IBM-eucKR
uucode_IBM-eucTW	uucode to IBM-eucTW
uucode_ISO8859-1	uucode to ISO8859-1
uucode_ISO8859-2	uucode to ISO8859-2
uucode_ISO8859-3	uucode to ISO8859-3
uucode_ISO8859-4	uucode to ISO8859-4
uucode_ISO8859-5	uucode to ISO8859-5
uucode_ISO8859-6	uucode to ISO8859-6
uucode_ISO8859-7	uucode to ISO8859-7
uucode_ISO8859-8	uucode to ISO8859-8
uucode_ISO8859-9	uucode to ISO8859-9
uucode_TIS-620	uucode to TIS-620
uucode_big5	uucode to big5
uucode_GBK	uucode to GBK

List of UCS-2 Interchange Converters

UCS-2 is a universal, 16-bit encoding described in the Code Set Overview. Conversions for each code set are provided in both directions, between the code set and UCS-2.

UCS-2 converters are found in the /usr/lib/nls/loc/uconvTable and /usr/lib/nls/loc/uconv directories. The uconvdef command is used to generate new converters or to customize existing UCS-2 converters.

The /usr/lib/nls/loc/iconv/Universal_UCS_Conv converter is used to generate conversions from any code set X to code set Y by setting the proper links:

cd /usr/lib/nls/loc/iconv
ln -s /usr/lib/nls/loc/uconv/Universal_UCS_Conv X_Y
ln -s /usr/lib/nls/loc/uconv/UCSTBL X_UCS-2
ln -s /usr/lib/nls/loc/uconv/UCSTBL UCS-2_Y
ln -s /usr/lib/nls/loc/uconv/UCSTBL X
ln -s /usr/lib/nls/loc/uconv/UCSTBL Y
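
The link names X_UCS-2 and UCS-2_Y suggest that a conversion from X to Y passes through UCS-2 as an intermediate form. The internals of Universal_UCS_Conv are not described here, so the following Python sketch only illustrates the pivot idea, using Python's own codecs (cp850 for IBM-850, latin-1 for ISO8859-1) as stand-ins for the system tables. The function name convert_via_ucs2 is hypothetical.

```python
def convert_via_ucs2(data: bytes, from_code: str, to_code: str) -> bytes:
    """Convert data from one code set to another by way of 16-bit UCS-2."""
    # Step 1: X -> UCS-2. UTF-16BE matches UCS-2 for characters in the
    # Basic Multilingual Plane, which covers this example.
    ucs2 = data.decode(from_code).encode("utf-16-be")
    # Step 2: UCS-2 -> Y.
    return ucs2.decode("utf-16-be").encode(to_code)


source = b"caf\x82"  # "cafe" with an acute accent on the e, in IBM-850
print(convert_via_ucs2(source, "cp850", "latin-1"))  # b'caf\xe9'
```

With a pivot of this kind, each code set needs only one table to and from UCS-2 rather than a table for every pair of code sets.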

Converter	Description
ISO8859-1	UCS-2 <--> ISO Latin-1
ISO8859-2	UCS-2 <--> ISO Latin-2
ISO8859-3	UCS-2 <--> ISO Latin-3
ISO8859-4	UCS-2 <--> ISO Latin-4
ISO8859-5	UCS-2 <--> ISO Cyrillic
ISO8859-6	UCS-2 <--> ISO Arabic
ISO8859-7	UCS-2 <--> ISO Greek
ISO8859-8	UCS-2 <--> ISO Hebrew
ISO8859-9	UCS-2 <--> ISO Turkish
JISX0201.1976-0	UCS-2 <--> Japanese JISX0201-0
JISX0208.1983-0	UCS-2 <--> Japanese JISX0208-0
CNS11643.1986-1	UCS-2 <--> Chinese CNS11643-1
CNS11643.1986-2	UCS-2 <--> Chinese CNS11643-2
KSC5601.1987-0	UCS-2 <--> Korean KSC5601-0
IBM-eucCN	UCS-2 <--> Simplified Chinese EUC
IBM-udcCN	UCS-2 <--> Simplified Chinese user-defined characters
IBM-sbdCN	UCS-2 <--> Simplified Chinese IBM-specific characters
GB2312.1980-0	UCS-2 <--> Simplified Chinese GB
IBM-1381	UCS-2 <--> Simplified Chinese PC data code
IBM-935	UCS-2 <--> Simplified Chinese EBCDIC
IBM-936	UCS-2 <--> Simplified Chinese PC5550
IBM-eucJP	UCS-2 <--> Japanese EUC
IBM-eucKR	UCS-2 <--> Korean EUC
IBM-eucTW	UCS-2 <--> Traditional Chinese EUC
IBM-udcJP	UCS-2 <--> Japanese user-defined characters
IBM-udcTW	UCS-2 <--> Traditional Chinese user-defined characters
IBM-sbdTW	UCS-2 <--> Traditional Chinese IBM-specific characters
UTF-8	UCS-2 <--> UTF-8
IBM-437	UCS-2 <--> USA PC data code
IBM-850	UCS-2 <--> Latin-1 PC data code
IBM-852	UCS-2 <--> Latin-2 PC data code
IBM-857	UCS-2 <--> Turkish PC data code
IBM-860	UCS-2 <--> Portuguese PC data code
IBM-861	UCS-2 <--> Icelandic PC data code
IBM-863	UCS-2 <--> French Canadian PC data code
IBM-865	UCS-2 <--> Nordic PC data code
IBM-869	UCS-2 <--> Greek PC data code
IBM-921	UCS-2 <--> Baltic Multilingual data code
IBM-922	UCS-2 <--> Estonian data code
IBM-932	UCS-2 <--> Japanese PC data code
IBM-943	UCS-2 <--> Japanese PC data code
IBM-934	UCS-2 <--> Korean PC data code
IBM-936	UCS-2 <--> People's Republic of China PC data code
IBM-938	UCS-2 <--> Taiwanese PC data code
IBM-942	UCS-2 <--> Extended Japanese PC data code
IBM-944	UCS-2 <--> Korean PC data code
IBM-946	UCS-2 <--> People's Republic of China SAA data code
IBM-948	UCS-2 <--> Traditional Chinese PC data code
IBM-1124	UCS-2 <--> Ukrainian PC data code
IBM-1129	UCS-2 <--> Vietnamese PC data code
TIS-620	UCS-2 <--> Thailand PC data code
IBM-037	UCS-2 <--> USA, Canada EBCDIC
IBM-273	UCS-2 <--> Germany, Austria EBCDIC
IBM-277	UCS-2 <--> Denmark, Norway EBCDIC
IBM-278	UCS-2 <--> Finland, Sweden EBCDIC
IBM-280	UCS-2 <--> Italy EBCDIC
IBM-284	UCS-2 <--> Spain, Latin America EBCDIC
IBM-285	UCS-2 <--> United Kingdom EBCDIC
IBM-297	UCS-2 <--> France EBCDIC
IBM-500	UCS-2 <--> International EBCDIC
IBM-875	UCS-2 <--> Greek EBCDIC
IBM-930	UCS-2 <--> Japanese Katakana-Kanji EBCDIC
IBM-933	UCS-2 <--> Korean EBCDIC
IBM-937	UCS-2 <--> Traditional Chinese EBCDIC
IBM-939	UCS-2 <--> Japanese Latin-Kanji EBCDIC
IBM-1026	UCS-2 <--> Turkish EBCDIC
IBM-1112	UCS-2 <--> Baltic Multilingual EBCDIC
IBM-1122	UCS-2 <--> Estonian EBCDIC
IBM-1124	UCS-2 <--> Ukrainian EBCDIC
IBM-1129	UCS-2 <--> Vietnamese EBCDIC
GBK	UCS-2 <--> Simplified Chinese
TIS-620	UCS-2 <--> Thailand EBCDIC

List of UTF-8 Interchange Converters

UTF-8 is a universal, multibyte encoding described in UCS-2 and UTF-8. Conversions for each code set are provided in both directions, between the code set and UTF-8.

UTF-8 conversions are usually done by using the Universal_UCS_Conv converter (see List of UCS-2 Interchange Converters) and the /usr/lib/nls/loc/uconv/UTF-8 conversion.
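
To show what multibyte means in practice, the following Python sketch prints the 16-bit UCS-2 form and the UTF-8 form of three characters. It uses Python's built-in codecs rather than the system converters, and the function name ucs2_and_utf8 is hypothetical.

```python
def ucs2_and_utf8(ch: str) -> tuple[str, str]:
    """Return the UCS-2 (big-endian) and UTF-8 bytes of one character as hex."""
    # UTF-16BE matches UCS-2 for Basic Multilingual Plane characters.
    return ch.encode("utf-16-be").hex(), ch.encode("utf-8").hex()


for ch in ("A", "\u00e9", "\u65e5"):
    ucs2, utf8 = ucs2_and_utf8(ch)
    print(f"U+{ord(ch):04X}  UCS-2={ucs2}  UTF-8={utf8}")
# U+0041  UCS-2=0041  UTF-8=41
# U+00E9  UCS-2=00e9  UTF-8=c3a9
# U+65E5  UCS-2=65e5  UTF-8=e697a5
```

The UCS-2 form is always two bytes for these characters, while the UTF-8 form grows from one to three bytes as the code point value increases.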

Converter	Description
ISO8859-1	UTF-8 <--> ISO Latin-1
ISO8859-2	UTF-8 <--> ISO Latin-2
ISO8859-3	UTF-8 <--> ISO Latin-3
ISO8859-4	UTF-8 <--> ISO Latin-4
ISO8859-5	UTF-8 <--> ISO Cyrillic
ISO8859-6	UTF-8 <--> ISO Arabic
ISO8859-7	UTF-8 <--> ISO Greek
ISO8859-8	UTF-8 <--> ISO Hebrew
ISO8859-9	UTF-8 <--> ISO Turkish
JISX0201.1976-0	UTF-8 <--> Japanese JISX0201-0
JISX0208.1983-0	UTF-8 <--> Japanese JISX0208-0
CNS11643.1986-1	UTF-8 <--> Chinese CNS11643-1
CNS11643.1986-2	UTF-8 <--> Chinese CNS11643-2
KSC5601.1987-0	UTF-8 <--> Korean KSC5601-0
IBM-eucCN	UTF-8 <--> Simplified Chinese EUC
IBM-eucJP	UTF-8 <--> Japanese EUC
IBM-eucKR	UTF-8 <--> Korean EUC
IBM-eucTW	UTF-8 <--> Traditional Chinese EUC
IBM-udcJP	UTF-8 <--> Japanese user-defined characters
IBM-udcTW	UTF-8 <--> Traditional Chinese user-defined characters
IBM-sbdTW	UTF-8 <--> Traditional Chinese IBM-specific characters
UCS-2	UTF-8 <--> UCS-2
IBM-437	UTF-8 <--> USA PC data code
IBM-850	UTF-8 <--> Latin-1 PC data code
IBM-852	UTF-8 <--> Latin-2 PC data code
IBM-857	UTF-8 <--> Turkish PC data code
IBM-860	UTF-8 <--> Portuguese PC data code
IBM-861	UTF-8 <--> Icelandic PC data code
IBM-863	UTF-8 <--> French Canadian PC data code
IBM-865	UTF-8 <--> Nordic PC data code
IBM-869	UTF-8 <--> Greek PC data code
IBM-921	UTF-8 <--> Baltic Multilingual data code
IBM-922	UTF-8 <--> Estonian data code
IBM-932	UTF-8 <--> Japanese PC data code
IBM-943	UTF-8 <--> Japanese PC data code
IBM-934	UTF-8 <--> Korean PC data code
IBM-935	UTF-8 <--> Simplified Chinese EBCDIC
IBM-936	UTF-8 <--> People's Republic of China PC data code
IBM-938	UTF-8 <--> Taiwanese PC data code
IBM-942	UTF-8 <--> Extended Japanese PC data code
IBM-944	UTF-8 <--> Korean PC data code
IBM-946	UTF-8 <--> People's Republic of China SAA data code
IBM-948	UTF-8 <--> Traditional Chinese PC data code
IBM-1124	UTF-8 <--> Ukrainian PC data code
IBM-1129	UTF-8 <--> Vietnamese PC data code
TIS-620	UTF-8 <--> Thailand PC data code
IBM-037	UTF-8 <--> USA, Canada EBCDIC
IBM-273	UTF-8 <--> Germany, Austria EBCDIC
IBM-277	UTF-8 <--> Denmark, Norway EBCDIC
IBM-278	UTF-8 <--> Finland, Sweden EBCDIC
IBM-280	UTF-8 <--> Italy EBCDIC
IBM-284	UTF-8 <--> Spain, Latin America EBCDIC
IBM-285	UTF-8 <--> United Kingdom EBCDIC
IBM-297	UTF-8 <--> France EBCDIC
IBM-500	UTF-8 <--> International EBCDIC
IBM-875	UTF-8 <--> Greek EBCDIC
IBM-930	UTF-8 <--> Japanese Katakana-Kanji EBCDIC
IBM-933	UTF-8 <--> Korean EBCDIC
IBM-937	UTF-8 <--> Traditional Chinese EBCDIC
IBM-939	UTF-8 <--> Japanese Latin-Kanji EBCDIC
IBM-1026	UTF-8 <--> Turkish EBCDIC
IBM-1112	UTF-8 <--> Baltic Multilingual EBCDIC
IBM-1122	UTF-8 <--> Estonian EBCDIC
IBM-1124	UTF-8 <--> Ukrainian EBCDIC
IBM-1129	UTF-8 <--> Vietnamese EBCDIC
IBM-1381	UTF-8 <--> Simplified Chinese PC data code
GBK	UTF-8 <--> Simplified Chinese
TIS-620	UTF-8 <--> Thailand EBCDIC

List of Miscellaneous Converters

A set of low-level converters, called miscellaneous converters, is provided for use by the code set and interchange converters. Direct use of these converters is discouraged because they are intended to support other converters.

Files

The following list describes the miscellaneous converters found in the /usr/lib/nls/loc/iconv and /usr/lib/nls/loc/iconvTable directories:

Converter	Description
IBM-932_JISX0201.1976-0	IBM-932 to JISX0201.1976-0
IBM-932_JISX0208.1983-0	IBM-932 to JISX0208.1983-0
IBM-932_IBM-udcJP	IBM-932 to IBM-udcJP (Japanese user-defined characters)
IBM-943_JISX0201.1976-0	IBM-943 to JISX0201.1976-0
IBM-943_JISX0208.1983-0	IBM-943 to JISX0208.1983-0
IBM-943_IBM-udcJP	IBM-943 to IBM-udcJP (Japanese user-defined characters)
IBM-eucJP_JISX0201.1976-0	IBM-eucJP to JISX0201.1976-0
IBM-eucJP_JISX0208.1983-0	IBM-eucJP to JISX0208.1983-0
IBM-eucJP_IBM-udcJP	IBM-eucJP to IBM-udcJP (Japanese user-defined characters)
IBM-eucKR_KSC5601.1987-0	IBM-eucKR to KSC5601.1987-0
IBM-eucTW_CNS11643.1986-1	IBM-eucTW to CNS11643.1986-1
IBM-eucTW_CNS11643.1986-2	IBM-eucTW to CNS11643.1986-2
IBM-eucCN_GB2312.1980-0	IBM-eucCN to GB2312.1980-0

Related Information

Chapter 16, National Language Support, List of National Language Support Subroutines.

Code Set Overview in AIX 5L Version 5.1 Kernel Extensions and Device Support Programming Concepts.

The iconv command, uuencode and uudecode commands.

The iconv_open subroutine, iconv subroutine, iconv_close subroutine.