Motif 2.1 Programmer's Guide

Issues in Internationalized Applications

There are several important issues to keep in mind when designing an application so that it takes advantage of Motif's internationalization capabilities.

Internationalization and Localization

An internationalized application contains no code that depends on the user's language, the characters needed to represent that language, or the formats (such as date and currency) that the user expects to see and interact with. Motif accomplishes this by storing language-dependent and culture-dependent information outside the application.

The following figure shows the kinds of information that should be external to an application to simplify internationalization.

Figure 7. Information External to the Application.


Because the language- and culture-dependent information is separate from the application source code, the application does not need to be rewritten or recompiled to be marketed in different countries. Instead, the only requirement is that the external information be localized to accommodate the local language and customs.

Localizing the application includes translating certain parts of the external information into the appropriate language and storing the translated information in files that the application then reads. In addition, the application may be told which formats to use to display time, date, and the other language- or culture-dependent items shown in the previous figure.
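As a minimal sketch of the two halves of localization described above, the C fragment below looks up a user-visible string in an XPG message catalog with catopen and catgets, and formats a date with the locale-dependent %x conversion of strftime. The catalog name, set and message numbers, and the helper names lookup_message and format_local_date are hypothetical. The sketch assumes the system provides the XSI <nl_types.h> interface and that the application has already called setlocale(LC_ALL, "") at start-up.

```c
#define _XOPEN_SOURCE 700  /* expose XSI interfaces such as catopen() */
#include <nl_types.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy message msg_id of set set_id from the
   catalog cat_name into buf, or copy default_text if the catalog or
   the message cannot be found. The translated text lives in the
   catalog file, not in the program. */
void lookup_message(const char *cat_name, int set_id, int msg_id,
                    const char *default_text, char *buf, size_t len)
{
    nl_catd catd = catopen(cat_name, NL_CAT_LOCALE);

    if (catd == (nl_catd)-1) {
        /* No catalog for this locale: fall back to the built-in text. */
        snprintf(buf, len, "%s", default_text);
        return;
    }
    /* catgets() returns default_text if the message is missing. */
    snprintf(buf, len, "%s", catgets(catd, set_id, msg_id, default_text));
    catclose(catd);
}

/* Hypothetical helper: format t as a date in the current locale's
   preferred representation (%x). Returns the number of characters
   written, or 0 on failure. */
size_t format_local_date(time_t t, char *buf, size_t len)
{
    struct tm *tm = localtime(&t);

    if (tm == NULL)
        return 0;
    return strftime(buf, len, "%x", tm);
}
```

Neither helper hard-codes a translation or a date layout: changing the locale (and installing a translated catalog) changes what the user sees without recompiling.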

Every language consists of a set of characters that, either individually or in combination, represents meaningful words or concepts in the language. The set of characters is called a character set. The set of binary values needed to represent all the characters in a language is called a coded character set or, more simply, a code set.

Efforts to standardize code sets began long ago and continue to this day. The most commonly used code set for English is the American Standard Code for Information Interchange (ASCII). It originally used a 7-bit encoding scheme plus an eighth bit for error control (parity). Using 7 bits for character representation allows 128 unique binary values. Later 8-bit extensions use the eighth bit as a code bit, allowing 256 unique values. Both are adequate for English and some other alphabetic languages, but neither is suitable for ideographic languages such as Chinese, Japanese, and Korean. Ideographic languages represent a concept or an idea as a single character; consequently, these languages have thousands of characters, and two or more bytes are needed to represent each character.
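A short C sketch of the arithmetic above: n bits give 2^n distinct values, so 7-bit ASCII has 128 codes, an 8-bit code set has 256, and two bytes give 65,536, enough room for thousands of ideographs. The helper names code_values and is_7bit_ascii are illustrative only.

```c
/* Number of distinct values representable in the given number of
   bits (valid for bits smaller than the width of unsigned long). */
unsigned long code_values(unsigned bits)
{
    return 1UL << bits;
}

/* Returns 1 if every byte of the NUL-terminated string s is in the
   7-bit ASCII range (0-127), or 0 if any byte uses the eighth bit. */
int is_7bit_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;
    }
    return 1;
}
```

For example, the ISO8859-1 encoding of "café" stores é as the single byte 0xE9, which sets the eighth bit, so is_7bit_ascii reports that the string is not plain ASCII.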

Other standard code sets have been developed to accommodate other languages; the ISO8859 standard is perhaps the most commonly used of these. Different parts of the ISO8859 standard exist for various areas of the world. The following table shows a typical relationship between language and code set for various areas. The code sets shown generally cover many more areas than are indicated, so Table 7 is merely meant as a guide. (For example, in addition to the languages indicated in the table, the ISO8859-3 code set covers Afrikaans, Esperanto, German, Italian, Maltese, and Turkish. It can also be used for English.)

Table 7. Areas and Typical Code Sets

Area or Language              Code Set
English                       ASCII, ISO8859-1
Western Europe                ISO8859-1
Eastern Europe                ISO8859-2
Dutch, Catalan, Spanish       ISO8859-3
Northern Europe               ISO8859-4
Russian, Ukrainian, Serbian   ISO8859-5
Greek                         ISO8859-7
Hebrew                        ISO8859-8
Japan                         Shift JIS, UJIS

See the specifications for the American National Standards Institute (ANSI) C programming language and the X/Open Portability Guide, Issue 3 (XPG3) for more information on standards involved in internationalization.

Obtaining Input

Special consideration must be given to how the user of an application enters characters in the local written language. Virtually all applications require some action on the part of the user, often asking for input in one form or another. For example, an application can ask the user to enter information in text form, such as a name or home address. The user then enters this information by typing it on the keyboard in the normal manner. This is relatively easy in an English-based application but can become more complex when text in another language is required.

Motif uses Xlib functions to provide the basic support for obtaining input in the Text widget.

The Problems

Many languages are written with an alphabet made up of characters, or letters, which are arranged in groups to form meaningful words. A keyboard suitable for the language contains all the letters of the alphabet, plus the standard numerals and punctuation marks. The first problem arises when, as in English, standard spelling and usage require two characters (an uppercase and a lowercase form) for each letter of the alphabet, while the standard keyboard contains only one key for each letter. The solution to this problem is the Shift key, which, when pressed together with another key, changes the character that key produces.

A somewhat more serious problem arises when the keyboard does not have all the characters of the alphabet. This can happen, for example, when a German user working on an English-based keyboard needs a German character that the keyboard does not provide.

A far more involved example is the case of defining a keyboard to use for the ideographic languages. Because thousands of characters are needed to represent an ideographic language, no reasonable keyboard can be constructed with a single key for each character.

The Solution

X and Motif solve these input problems by using an input method, which, in its simplest form, is a layer of mapping between the keys (or combinations of keys) that the user types and the text data that is passed to the application. For example, a Danish user with an English keyboard who needs the letter "ø" must enter a combination of keystrokes rather than a single keystroke. The sequence varies among vendors; one example is Extend char O /. This is very similar to using the Shift key to obtain uppercase letters.
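The mapping layer can be pictured as a lookup table from keystroke sequences to characters. The C sketch below is a toy: the table contents, the two-key sequences, and the function name compose_latin1 are hypothetical and do not reproduce any vendor's actual compose sequences or the Xlib input method interface. It returns the ISO8859-1 code of the composed character, so ø comes out as 0xF8.

```c
#include <stddef.h>

/* One entry of a toy compose table: two keystrokes produce one
   character, given here as its ISO8859-1 code. */
struct compose_entry {
    char first;
    char second;
    int latin1_code;
};

static const struct compose_entry compose_table[] = {
    { 'o', '/', 0xF8 },  /* o with stroke (Danish, Norwegian) */
    { 'O', '/', 0xD8 },  /* capital O with stroke */
    { 'a', '"', 0xE4 },  /* a with diaeresis (German) */
    { 's', 's', 0xDF },  /* sharp s (German) */
};

/* Returns the ISO8859-1 code for the keystroke pair, or -1 if the
   pair is not a known compose sequence. */
int compose_latin1(char first, char second)
{
    size_t i;

    for (i = 0; i < sizeof compose_table / sizeof compose_table[0]; i++) {
        if (compose_table[i].first == first &&
            compose_table[i].second == second)
            return compose_table[i].latin1_code;
    }
    return -1;
}
```

Because the table is data rather than logic, a different keyboard or vendor convention only requires a different table, which mirrors the idea of keeping locale-specific information outside the application.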

An ideographic language's input method is often based on the language's phonetics, but there are also input methods based on a graphical element shared by certain characters. The graphical method defines a key that maps to a common graphic symbol forming the basis of multiple characters. The phonetic method is more commonly used. It requires a phonetic (alphabet-based) writing system whose phonetic signs are few enough that a unique key can be assigned to each phoneme. Characters are entered by pressing the appropriate phonetic keys.

Note that the full definition of an input method actually includes the manner in which text is typed as well as the simple keyboard mapping. In one form of input method, text is simply typed at the spot where it is to appear. In another method, often used in languages where every character requires more than one keystroke, preliminary text appears in some secondary window on the screen until enough has been typed to uniquely specify a new character, which is then passed to the application. In several popular input methods, the user types a phonetic representation of a spoken word and the input method determines which characters are pronounced that way. If only one character meets this criterion, it is displayed. If more than one character meets the criterion, a list of all characters found is displayed and the user chooses the desired one. It is then passed to the application. See Section 11.4.1 for more information on input methods.
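The phonetic selection step can be sketched as a dictionary lookup that returns every character matching the typed reading. In this toy C version the dictionary, its romanized Japanese readings, and the function name lookup_candidates are hypothetical; a real input method uses a large conversion dictionary and its own selection interface. A count of 1 means the character can be passed to the application directly; a larger count means the user must choose from the list.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CANDIDATES 4

/* A phonetic reading and the characters pronounced that way,
   terminated by NULL. The character strings are UTF-8. */
struct reading {
    const char *phonetic;
    const char *candidates[MAX_CANDIDATES];
};

static const struct reading dictionary[] = {
    { "yama",  { "山", NULL } },              /* mountain */
    { "hashi", { "橋", "箸", "端", NULL } },  /* bridge, chopsticks, edge */
};

/* Copies up to max candidates for the typed reading into out and
   returns how many were found; 0 means the reading is unknown. */
size_t lookup_candidates(const char *typed, const char **out, size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof dictionary / sizeof dictionary[0]; i++) {
        if (strcmp(dictionary[i].phonetic, typed) != 0)
            continue;
        while (n < max && n < MAX_CANDIDATES &&
               dictionary[i].candidates[n] != NULL) {
            out[n] = dictionary[i].candidates[n];
            n++;
        }
        break;
    }
    return n;
}
```

With this table, typing "yama" yields one candidate that can be committed immediately, while "hashi" yields three, so the input method would display them for the user to choose from.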

Displaying Output

Displaying the output produced by an application intended for international use also requires some consideration. In order to display text, it must have the appropriate content, encoding and fonts. For example, many languages, especially ideographic ones, require more than one font. Bitmaps and pixmaps must be localized as well. An icon that is an appropriate or meaningful symbol in one country may be totally inappropriate or meaningless in another.

Locales and Localization

A locale is the language environment determined by the application at run time. The X/Open Portability Guide defines a locale as a means of specifying three characteristics of a language environment that may be needed for localization: language, territory, and code set. Motif supports only one locale per application; that is, an application can set the locale only once, at start-up time.
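Many systems encode these three characteristics in the locale name itself, using the convention language[_territory][.codeset][@modifier], as in ja_JP.eucJP. Naming varies between vendors, so treat this as a common convention rather than a rule. The hypothetical C helper parse_locale_name below splits such a name into its parts and ignores any @modifier.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three parts of a locale name. */
struct locale_parts {
    char language[32];
    char territory[32];
    char codeset[32];
};

/* Copies len bytes of src into dst as a NUL-terminated string.
   Returns -1 if the field does not fit. */
static int copy_field(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Splits a name of the form language[_territory][.codeset][@modifier].
   Missing parts are returned as empty strings; any @modifier is
   ignored. Returns 0 on success, -1 if a part is too long. */
int parse_locale_name(const char *name, struct locale_parts *out)
{
    const char *p = name;
    size_t len;

    out->language[0] = out->territory[0] = out->codeset[0] = '\0';

    len = strcspn(p, "_.@");
    if (copy_field(out->language, sizeof out->language, p, len) != 0)
        return -1;
    p += len;

    if (*p == '_') {
        p++;
        len = strcspn(p, ".@");
        if (copy_field(out->territory, sizeof out->territory, p, len) != 0)
            return -1;
        p += len;
    }
    if (*p == '.') {
        p++;
        len = strcspn(p, "@");
        if (copy_field(out->codeset, sizeof out->codeset, p, len) != 0)
            return -1;
    }
    return 0;
}
```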

Motif uses the locale to help find:

  1. Resource files

  2. UID files

  3. Bitmap files

  4. Fonts used to display text and labels

  5. Text input method

  6. Character size

The ANSI C method of setting the locale in an application is to call the function setlocale. How setlocale determines the language when the locale is not named explicitly in the call is system dependent. For example, on POSIX systems, a call such as setlocale(LC_ALL, "") takes the locale from the environment variables LC_ALL, the category-specific LC_* variables, and LANG, in that order of precedence. The locale name is also used to establish a path to the localized files of information. How this is actually accomplished is explained in Section 11.2.
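A minimal ANSI C sketch of setting the locale once at start-up, as described above. The helper names init_locale and max_char_bytes are hypothetical, and the fallback to the "C" locale is a design choice of this sketch, not Motif behavior. MB_CUR_MAX reports the largest number of bytes one character can occupy in the selected locale's code set, which corresponds to the character-size item in the list. In an Xt-based Motif program, the usual idiom is instead to call XtSetLanguageProc(NULL, NULL, NULL) before initializing the toolkit, which installs Xt's default language procedure to set the locale; that call is not shown here.

```c
#include <locale.h>
#include <stdlib.h>

/* Sets the locale for all categories from the environment. If the
   environment names a locale the system does not support, falls back
   to the portable "C" locale. Returns the locale name; the string may
   be overwritten by a later call to setlocale(). */
const char *init_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL)
        name = setlocale(LC_ALL, "C");
    return name;
}

/* Largest number of bytes one character can occupy in the current
   locale's code set: 1 for single-byte code sets such as ASCII or
   ISO8859-1, more for multibyte code sets such as UJIS. */
size_t max_char_bytes(void)
{
    return (size_t)MB_CUR_MAX;
}
```

Calling init_locale once, before any text is read or displayed, matches the one-locale-per-application rule stated earlier.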
