National Language Support Guide and Reference

National Language Support Overview

Many system variables are used to establish the language environment of the system. These variables and their supporting commands, files, and other tools, are referred to as National Language Support (NLS).

Topics covered in this chapter are:

NLS provides commands and Standard C Library subroutines for a single worldwide system base. An internationalized system has no built-in assumptions or dependencies on language-specific or cultural-specific conventions such as:

Code sets
Character classifications
Character comparison rules
Character collation order
Numeric and monetary formatting
Date and time formatting
Message-text language.

All information pertaining to cultural conventions and language is obtained at process run time.

The following capabilities are provided by NLS to maintain a system running in an international environment:

Separation of Messages from Programs

To facilitate translations of messages into various languages and to make the translated messages available to the program based on a user's locale, it is necessary to keep messages separate from the programs and provide them in the form of message catalogs that a program can access at run time. To aid in this task, commands and subroutines are provided by the message facility. For more information, see Message Facility.

Conversion between Code Sets

A character is any symbol used for the organization, control, or representation of data. A group of such symbols used to describe a particular language make up a character set. A code set contains the encoding values for a character set. It is the encoding values in a code set that provide the interface between the system and its input and output devices.

Historically, the effort was directed at encoding the English alphabet. It was sufficient to use a 7-bit encoding method for this purpose because the number of English characters is not large. To support larger alphabets, such as the Asian languages, such as Chinese, Japanese, and Korean, additional code sets were developed that contained multibyte encodings.

A character is any symbol used for the organization, control, or representation of data. A group of such symbols for describing a particular language make up a character set. A code set contains the encoding values for a character set. The encoding values in a code set provide the interface between the system and its input and output devices.

An internationalized program must accurately read data generated in different code set environments and process the information accurately. You can use nl_langinfo(CODESET) to obtain the current code set in a process. The return value is a char pointer that is the name of the code set in the system. Because code set names are not standard, programs should not depend on any specific value for this string. Knowing the current code set can aid in code-set conversion. NLS supplies converters that translate character encoding values found in different code sets. For more information, see Converters Overview for Programming.

Input Method Support

The input of characters becomes complicated for languages having large character sets. For example, in Chinese, Korean, and Japanese, where the number of characters is large, it is not possible to provide one-to-one key mapping for a keystroke to a character. However, a special input method enables the user to enter phonetic or stroke characters and have them converted into native-language characters. A keyboard map associated with each keyboard matches sequences of one or more keystrokes with the appropriate character encoding. For more information, see Input Methods.