System Management Concepts: Operating System and Devices

Converters Overview

National Language Support (NLS) provides a base for internationalization, including the ability to convert data from one code set to another. You might need to convert text files or message catalogs, and several standard converters are provided for this purpose.

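This section covers the following aspects of conversion: an introduction to converters, the standard converters, the iconv libraries, and the Universal UCS Converter.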

Converters Introduction

When a program sends data to another program residing on a remote host, the data can require conversion from the code set of the source machine to that of the receiver. For example, when communicating with an IBM VM system, the system converts its ISO8859-1 data to EBCDIC. Code sets define character and control function assignments to code points. These coded characters must be converted when a program receives data in one code set but displays it in another code set.

There are two interfaces for doing conversions:

iconv command
Allows you to request a specific conversion by naming the from and to code sets.
libiconv functions
Allow applications to request converters by name.

The system provides a library of converters that is ready to use. You supply the name of the converter you want to use. The converter libraries are found in the following directories: /usr/lib/nls/loc/iconv/* and /usr/lib/nls/loc/iconvTable/*.
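As a minimal sketch of the command interface, the following wraps the iconv command in a small function. The function name convert_text is hypothetical, and the example assumes that converters for ISO8859-1 and UTF-8 are installed; the code set names available on a given system can differ.

```shell
# Convert standard input from one code set to another with the iconv command.
# -f names the source (from) code set; -t names the target (to) code set.
# convert_text is an illustrative name, not a system command.
convert_text() {
    from=$1
    to=$2
    iconv -f "$from" -t "$to"
}

# Example: the byte 0xE9 is "e acute" in ISO8859-1; in UTF-8 the same
# character is encoded as the two bytes 0xC3 0xA9. od shows the result in hex.
printf 'caf\351\n' | convert_text ISO8859-1 UTF-8 | od -An -tx1
```

If no converter exists for the requested pair, the iconv command reports an error instead of producing output.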

In addition to code set converters, the converter library also provides a set of network interchange converters. In a network environment, the code sets of the communications systems and the protocols of communication determine how the data is converted.

Interchange converters are used to convert data sent from one system to another. Conversions done from one internal code set to another require code set converters. Whether data must be converted from a sender's code set to a receiver's code set, or 8-bit data must be converted into 7-bit data form, a uniform interface is required. The iconv subroutines provide this interface.

Standard Converters

There are standard converters for use with the iconv command and subroutines. The following list describes the different types of converters.

For a list of converters, see AIX 5L Version 5.1 General Programming Concepts: Writing and Debugging Programs.

Code Set Converter Types

Table converter
Converts single-byte stateless code sets. Performs a table translation from one byte to another byte.
Multibyte converter
Provides conversions between multibyte code sets; for example, between Japanese PC Code (IBM-943 and IBM-932) or Japanese Code (IBM-eucJP) and IBM Japanese Host Code Sets (IBM-930 and IBM-939).

Interchange Converter Types

7-bit converter
Converts between internal code sets and standard interchange formats (7-bit).
8-bit converter
Converts between internal code sets and standard interchange formats (8-bit).
Compound text converter
Converts between compound text and internal code sets.
uucode converter
Provides the same mapping as the uuencode and uudecode commands.
Miscellaneous converters
Used by some of the converters listed previously.

Understanding iconv Libraries

The iconv facility consists of a set of functions that contain the data and logic to convert from one code set to another. The facility also includes the iconv command, which converts data. A single system can have several converters; the LOCPATH environment variable determines which converter the iconv subroutines use.

Note: All setuid and setgid programs ignore the LOCPATH environment variable.

Universal UCS Converter

UCS-2 is a universal 16-bit encoding (see the code set overview in AIX 5L Version 5.1 General Programming Concepts: Writing and Debugging Programs) that can be used as an interchange medium to provide conversion between virtually any two code sets. The conversion is performed by the Universal UCS Converter, which converts between any two code sets XXX and YYY as follows:

XXX <-> UCS-2 <-> YYY

The XXX and YYY conversions must be included in the supported List of UCS-2 Interchange Converters, and must be installed on the system.
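The path through UCS-2 can be illustrated with two chained iconv commands. This is only a sketch of the idea: the Universal UCS Converter performs the conversion as a single converter, and nothing here describes its internals. The function name via_ucs2 is hypothetical, and the example assumes that ISO8859-1, UCS-2, and UTF-8 converters are installed.

```shell
# Illustrate XXX -> UCS-2 -> YYY with two separate iconv invocations.
# via_ucs2 is an illustrative name, not a system command.
via_ucs2() {
    from=$1
    to=$2
    # First stage converts the source code set to UCS-2;
    # second stage converts UCS-2 to the target code set.
    iconv -f "$from" -t UCS-2 | iconv -f UCS-2 -t "$to"
}

# Example: ISO8859-1 input, UTF-8 output, displayed as hex bytes.
printf 'caf\351\n' | via_ucs2 ISO8859-1 UTF-8 | od -An -tx1
```

A character that has no representation in the target code set cannot survive the second stage, which is why both code sets should cover the data being converted.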

The universal converter is installed as the file /usr/lib/nls/loc/iconv/Universal_UCS_Conv. A new conversion can be supported by creating a new link with the appropriate name in the /usr/lib/nls/loc/iconv directory. For example, to support new converters between IBM-850 and IBM-437, you can execute the following commands:

ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv /usr/lib/nls/loc/iconv/IBM-850_IBM-437

ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv /usr/lib/nls/loc/iconv/IBM-437_IBM-850

Attention: If a converter link is created for incompatible code sets (for example, ISO8859-1 and IBM-eucJP), and if the source data contains characters that do not exist in the target code set, significant data loss can result.
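Because a conversion in each direction needs its own link, the two ln commands can be wrapped in a small helper. The following is a sketch only: link_ucs_pair and the ICONV_DIR variable are hypothetical names, and the directory defaults to the one named in the text. Creating links in the system directory typically requires root authority.

```shell
# Hypothetical helper: create the two links that enable conversion in both
# directions between code sets A and B through the universal converter.
# ICONV_DIR is an illustrative override; it defaults to the system directory.
link_ucs_pair() {
    a=$1
    b=$2
    dir=${ICONV_DIR:-/usr/lib/nls/loc/iconv}
    # Stop if the universal converter is not installed in this directory.
    [ -e "$dir/Universal_UCS_Conv" ] || return 1
    ln -s "$dir/Universal_UCS_Conv" "$dir/${a}_${b}" &&
    ln -s "$dir/Universal_UCS_Conv" "$dir/${b}_${a}"
}

# Usage:
#   link_ucs_pair IBM-850 IBM-437
```

The helper does not check whether the two code sets are compatible, so the preceding Attention note still applies.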

The conversion between multibyte and wide character code depends on the current locale setting. Do not exchange wide character codes between two processes unless you know that each locale that might be used handles wide character codes in a consistent fashion. Most locales for this operating system use the Unicode character value as a wide character code, except locales based on the IBM-850 and IBM-eucTW code sets.