
Performance Management Guide


Programming Considerations

Historically, the C language has displayed a certain amount of provinciality in its interchangeable use of the words byte and character. Thus, an array declared char foo[10] is an array of 10 bytes. But not all of the languages in the world are written with characters that can be expressed in a single byte. Japanese and Chinese, for example, require two or more bytes to identify a particular graphic to be displayed. Therefore, we distinguish between a byte, which is 8 bits of data, and a character, which is the amount of information needed to represent a single graphic.

Two characteristics of each locale are the maximum number of bytes required to express a character in that locale and the maximum number of output display positions a single character can occupy. These values can be obtained with the MB_CUR_MAX and MAX_DISP_WIDTH macros. If both values are 1, the locale is one in which the equivalence of byte and character still holds. If either value is greater than 1, programs that do character-by-character processing, or that keep track of the number of display positions used, must use internationalization functions to do so.
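The check described above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the helper names locale_is_single_byte and bytes_for_chars are hypothetical, and only MB_CUR_MAX is tested because MAX_DISP_WIDTH is not part of standard C. On a system that provides MAX_DISP_WIDTH, it would be compared against 1 in the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 when every character in the current
 * locale fits in one byte, so byte-oriented code is still valid.
 * (A fuller check would also test the display width; that macro is
 * omitted here because it is not available in standard C.) */
int locale_is_single_byte(void)
{
    return MB_CUR_MAX == 1;
}

/* Hypothetical helper: number of bytes a buffer needs in order to hold
 * up to nchars characters in the current locale, plus a terminating
 * null byte. */
size_t bytes_for_chars(size_t nchars)
{
    /* Guard against overflow in the multiplication below. */
    assert(nchars < SIZE_MAX / MB_CUR_MAX);
    return nchars * MB_CUR_MAX + 1;
}
```

A program starts in the "C" locale until it calls setlocale(), so MB_CUR_MAX reflects the user's locale only after a call such as setlocale(LC_ALL, ""). Sizing buffers with a helper like bytes_for_chars avoids assuming that char foo[10] can hold 10 characters.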

Because multibyte encodings use a variable number of bytes per character, text in such an encoding cannot be processed as a simple array in which each element is one character. To allow efficient coding in situations where each character has to receive extensive processing, a fixed-byte-width data type, wchar_t, has been defined. A wchar_t is wide enough to contain a translated form of any supported character encoding. Programmers can therefore declare arrays of wchar_t and process them with (roughly) the same logic they would have used on an array of char, using the wide-character analogs of the traditional libc.a functions.
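The following sketch illustrates the pattern: a multibyte string is translated once with mbstowcs(), and the resulting wchar_t array is then processed element by element with a wide-character function (towupper() here). The helper names to_wide and upcase_wide are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: translate the multibyte string mb into the
 * wide-character buffer out, which has room for cap elements.
 * Returns the number of wide characters stored (excluding the
 * terminator), or (size_t)-1 if mb contains an invalid sequence.
 * Input longer than cap - 1 characters is truncated. */
size_t to_wide(const char *mb, wchar_t *out, size_t cap)
{
    assert(mb != NULL && out != NULL && cap > 0);
    size_t n = mbstowcs(out, mb, cap - 1);
    if (n == (size_t)-1)
        return n;
    out[n] = L'\0';   /* mbstowcs does not terminate a full buffer */
    return n;
}

/* Hypothetical helper: uppercase a wide string in place. Each array
 * element is exactly one character, so plain indexing is safe. */
void upcase_wide(wchar_t *ws)
{
    assert(ws != NULL);
    for (size_t i = 0; ws[i] != L'\0'; i++)
        ws[i] = (wchar_t)towupper((wint_t)ws[i]);
}
```

The loop in upcase_wide has the same shape as the equivalent loop over an array of char; only the element type and the library function differ.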

Unfortunately, translating text from the multibyte form (in which it is entered, stored on disk, or written to the display) to the wchar_t form is computationally quite expensive. Perform the translation only in situations in which the processing efficiency of the wchar_t form will more than compensate for the cost of translating to and from it.
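For a light task such as counting characters, one option is to leave the text in its multibyte form. The sketch below takes a byte-oriented fast path when MB_CUR_MAX is 1 and otherwise steps through the string with mblen(), without building a wchar_t array. The helper name count_chars_no_convert is hypothetical, and whether this approach is actually cheaper than a full translation depends on the implementation and should be measured.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the characters (not bytes) in s without
 * translating the string to wchar_t form. Returns (size_t)-1 if s
 * contains an invalid multibyte sequence. */
size_t count_chars_no_convert(const char *s)
{
    assert(s != NULL);

    /* Single-byte locale: one byte is one character. */
    if (MB_CUR_MAX == 1)
        return strlen(s);

    size_t count = 0;
    (void)mblen(NULL, 0);              /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX);
        if (len < 0)
            return (size_t)-1;         /* invalid sequence */
        if (len == 0)
            break;                     /* reached the terminator */
        s += len;
        count++;
    }
    return count;
}
```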

