General Programming Concepts: Writing and Debugging Programs

National Language Support Subroutines Overview

Programs that are internationalized with National Language Support (NLS) should follow a consistent set of guidelines. This section guides programmers in developing portable internationalized programs. An understanding of the concepts explained in Locale Overview for Programming is a prerequisite to this section.

Introducing Locale Subroutines

Programs that perform locale-dependent processing, including user messages, must call the setlocale subroutine at the beginning of the program. This call should be the first executable statement in the main program. Programs that do not call the setlocale subroutine in this way inherit the C or POSIX locale. Such programs perform as in the C locale regardless of the setting of the LC_* and LANG environment variables.

Other subroutines are provided to determine the current settings for locale data formatting. For more information about these subroutines, see Locale Subroutines.

Introducing Time Formatting Subroutines

Programs that need to format time into wide character code strings can use the wcsftime subroutine. Programs that need to convert multibyte strings into an internal time format can use the strptime subroutine. For more information about these subroutines, see Time Formatting Subroutines.

Introducing Monetary Formatting Subroutines

Programs that need to specify or access monetary quantities can call the strfmon subroutine. For more information about this subroutine, see Monetary Formatting Subroutines.

Introducing Multibyte and Wide Character Subroutines

The external representation of data is referred to as the file code representation of a character. When file code data is created in files or transferred between a computer and its I/O devices, a single character may be represented by one or several bytes. For processing strings of such characters, it is more efficient to convert these codes into a uniform-length representation. This converted form is intended for internal processing of characters. The internal representation of data is referred to as the process code or wide character code representation of the character.

NLS internationalization of programs is a blend of multibyte and wide character subroutines. A multibyte subroutine uses multibyte character sets. A wide character subroutine uses wide character sets. Multibyte subroutines have an mb prefix. Wide character subroutines have a wc prefix. The corresponding string-handling subroutines are indicated by the mbs and wcs prefixes, respectively. The decision to use multibyte or wide character subroutines can be made only after careful analysis of the program.

If a program primarily uses multibyte subroutines, it may be necessary to convert the multibyte character codes to wide character codes to use certain wide character subroutines. If a program uses wide character subroutines, data may need to be converted to multibyte form for invoking subroutines. Both methods have drawbacks, depending on the program and the availability of standard subroutines to perform the required processing. For instance, there is no corresponding standard multibyte subroutine for the wide character display-column-width subroutine.

If a program can process its characters in multibyte code, this method should be used instead of converting the characters to wide character code.

For more information about the subroutines provided for converting between multibyte code and wide character code form, see Multibyte Code and Wide Character Code Conversion Subroutines.

wchar.h Header File

The wchar.h header file declares information necessary for programming with multibyte and wide character subroutines. The wchar.h header file declares the wchar_t, wctype_t, and wint_t data types, as well as several functions for testing wide characters. Because the number of characters implemented as wide characters exceeds that of basic characters, it is not possible to classify all wide characters into the existing classes used for basic characters. Therefore, it is necessary to provide a way of defining additional classes specific to some locale. The action of these subroutines is affected by the current locale.

The wchar.h header file also declares subroutines for manipulating wide character strings (that is, wchar_t data type arrays). Array length is always determined in terms of the number of wchar_t elements in an array. A null wide character code ends an array. A pointer to a wchar_t or void array always points to the initial element of the array.

Note: If the number of wchar_t elements in an array exceeds the defined array length, unpredictable results can occur.

Introducing Internationalized Regular Expression Subroutines

Programs that contain internationalized regular expressions can use the regcomp, regexec, regerror, regfree, and fnmatch subroutines. For more information about these subroutines, see Internationalized Regular Expression Subroutines.