Converters Overview

AIX Version 4.3 System Management Guide: Operating System and Devices

National Language Support (NLS) provides a base for internationalization that allows data to be converted from one code set to another. You may need to convert text files or message catalogs, and several standard converters are provided for this purpose.

This section introduces converters and then describes the standard converters, the iconv libraries and the iconv command, and the Universal UCS Converter.

Converters Introduction

When a program sends data to another program residing on a remote host, the data may need to be converted from the code set of the sending machine to that of the receiving machine. For example, when communicating with an IBM VM system, the AIX system converts its ISO8859-1 data to EBCDIC. Code sets define the assignment of characters and control functions to code points, so coded characters must be converted whenever a program receives data in one code set but displays it in another.

There are two interfaces for performing conversions:

iconv command
Allows you to request a specific conversion by naming the from and to code sets.
libiconv functions
Allow applications to request converters by name.

The system provides a ready-to-use library of converters; you supply the name of the converter you want to use. The converters are installed in the /usr/lib/nls/loc/iconv and /usr/lib/nls/loc/iconvTable directories.

In addition to code set converters, the converter library also provides a set of network interchange converters. In a network environment, the code sets of the communicating systems and the communication protocols determine how the data should be converted.

Interchange converters convert data sent from one system to another, whereas conversions from one internal code set to another require code set converters. Whether data must be converted from a sender's code set to a receiver's code set, or 8-bit data must be converted to 7-bit form, a uniform interface is required. The iconv subroutines provide this interface.

Standard Converters

Standard converters are provided for use with the iconv command and subroutines. The following lists describe the converter types.

Code Set Converter Types

Table converter
Converts between single-byte, stateless code sets by translating each byte to another byte through a lookup table.
Multibyte converter
Provides conversions between multibyte code sets; for example, between Japanese PC Code (IBM-932) or Japanese AIX Code (IBM-eucJP) and IBM Japanese Host Code Sets (IBM-930 and IBM-939).

Interchange Converter Types

7-bit converter
Converts between internal code sets and standard interchange formats (7-bit).
8-bit converter
Converts between internal code sets and standard interchange formats (8-bit).
Compound text converter
Converts between compound text and internal code sets.
uucode converter
Provides the same mapping as the uuencode and uudecode commands.
Miscellaneous converters
Used by some of the converters listed above.

Understanding iconv Libraries

The iconv facility consists of a set of functions that contain the data and logic needed to convert from one code set to another. The facility also includes the iconv command, which converts data. A single system can have several converters; the LOCPATH environment variable determines which converter the iconv subroutines use.

Note: All setuid and setgid programs ignore the LOCPATH environment variable.

Using the iconv Command

Any converter installed on the system can be used through the iconv command, which uses the iconv library. The iconv command acts as a filter that converts data from one code set to another. For example, the following command converts data from PC Code (IBM-850) to ISO8859-1 and uses the tftp command to send the result to the file /tmp/fo on the remote system host:

cat File | iconv -f IBM-850 -t ISO8859-1 | 
tftp -p - host /tmp/fo

The iconv command converts the encoding of characters read from either standard input or the specified file and then writes the results to standard output.

Universal UCS Converter

UCS-2 is a universal 16-bit encoding (see the code set overview in AIX Version 4.3 General Programming Concepts: Writing and Debugging Programs) that can be used as an interchange medium to provide conversion capability between virtually any code sets. The conversion can be accomplished using the Universal UCS Converter, which converts between any two code sets XXX and YYY as follows:

XXX <-> UCS-2 <-> YYY

Both the conversion between XXX and UCS-2 and the conversion between UCS-2 and YYY must be included in the supported List of UCS-2 Interchange Converters, and both converters must be installed on the system.

The universal converter is installed as the file /usr/lib/nls/loc/iconv/Universal_UCS_Conv. To support a new conversion, create a link to this file in the /usr/lib/nls/loc/iconv directory, named for the source and target code sets. For example, to support conversion in both directions between IBM-850 and IBM-437, run the following commands:

ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv \
      /usr/lib/nls/loc/iconv/IBM-850_IBM-437

ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv \
      /usr/lib/nls/loc/iconv/IBM-437_IBM-850

Attention: If a converter link is created for incompatible code sets (for example, ISO8859-1 and IBM-eucJP) and the source data contains characters that do not exist in the target code set, significant data loss can result.

Conversion between multibyte and wide character codes depends on the current locale setting. Do not exchange wide character codes between two processes unless you know that every locale that might be used handles wide character codes consistently. Most AIX locales use the Unicode character value as the wide character code; the exceptions are locales based on the IBM-850 and IBM-eucTW code sets.