General Programming Concepts: Writing and Debugging Programs

Chapter 16. National Language Support

National Language Support (NLS) provides commands and library subroutines for a single worldwide system base. An internationalized system has no built-in assumptions or dependencies on language-specific or culture-specific conventions, such as:

Code sets
Character classifications
Character comparison rules
Character collation order
Numeric and monetary formatting
Date and time formatting
Message-text language

NLS Capabilities

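An application that runs in an international environment must not have built-in assumptions about:

Locale-specific and culture-specific conventions
User messages in native languages
Code set support
Input method support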

This information must be determined during application execution. NLS provides these capabilities and a base upon which new languages and code sets can be supported. As a result, programs can be ported across national language and locale boundaries. The POSIX.1 standard, the POSIX.2 standard, the ANSI/ISO C language standard, and the X/Open XPG specifications define standards for providing NLS support.

Locale-Specific and Culture-Specific Conventions

An internationalized program can process information correctly for different locations. For example, the conventions for specifying date and time differ between the United States and England. Similarly, the decimal point (radix character) and monetary symbols differ between the two countries. These language and cultural conventions for handling information are defined in a locale. For more information about locales, see Locale Overview for Programming.

User Messages in Native Languages

To make messages easier to translate, and to make translated messages available to a program based on the user's locale, messages are kept separate from the program in message catalogs that the program accesses at run time. The Message Facility provides commands and subroutines for this task. For more information, see Message Facility Overview for Programming.

Code Set Support

A character is any symbol used for the organization, control, or representation of data. A group of such symbols used to describe a particular language makes up a character set. A code set contains the encoding values for a character set. The encoding values in a code set provide the interface between the system and its input and output devices.

In the past, encoding efforts were directed at the English alphabet. A 7-bit encoding method was adequate for this purpose because the number of English characters is not large. The C language defined the char data type to indicate a 7-bit character. A byte is an 8-bit quantity, so a byte was used to hold a char data type value, and the eighth bit was typically used for parity.

To support larger character sets, such as those of Asian languages (for example, Chinese, Japanese, and Korean), additional code sets were developed that contain multibyte encodings. Because of multibyte encodings, the old concept of the char data type is no longer sufficient to represent a character. The C standard continues to use the char data type for character data. However, the char data type really means a byte, either signed or unsigned, and a single character can occupy more than one byte.

An internationalized program must accurately read data generated in different code set environments and process it correctly. You can use nl_langinfo(CODESET) to obtain the current code set of a process. The return value is a pointer to a string that contains the name of the code set. Because code set names are not standardized, programs should not depend on any specific value for this string. Knowing the current code set can aid in code-set conversion. NLS supplies converters that translate character encoding values between different code sets. For more information, see Converters Overview for Programming.

Input Method Support

Character input becomes complicated for languages that have large character sets. For example, in Chinese, Korean, and Japanese, where the number of characters is large, it is not possible to map each character to a single keystroke. Instead, a special input method enables the user to enter phonetic or stroke characters and have them converted into native-language characters. A keyboard map associated with each keyboard matches sequences of one or more keystrokes with the appropriate character encoding. For more information, see the Input Method Overview.

Overview of Chapter Contents

This NLS chapter contains the following information:

Related Information

For more information about code sets, see Code Set Overview in AIX 5L Version 5.1 Kernel Extensions and Device Support Programming Concepts.

For information on maintaining the system in an NLS environment, see National Language Support Overview for System Management.

Character Set Description (charmap) Source File Format, Locale Definition Source File Format.

The setlocale subroutine.