General Programming Concepts: Writing and Debugging Programs

Japanese Input Method (JIM)

The Japanese Input Method (JIM) is a sophisticated input method that provides Japanese input. The features include:

Supports Romaji to Kana character conversion (RKC).
Supports Kana to Kanji character conversion (KKC).
Includes Hankaku (half-width) and Zenkaku (full-width) character input.
Provides system and user dictionary lookup.
Provides runtime registration of a word to the user dictionary.
Requires Callback functions to support:
- Status and Pre-edit drawing
- All candidate menus
- JIS Kutan number input and IBM Kanji number input
Supports IBM-943, IBM-932 and IBM-eucJP code sets.
For internal processing, the JIM uses the IBM-932 code set. However, it supports any code set, such as IBM-eucJP, that can be converted from IBM-932.
Located in the /usr/lib/nls/loc/JP.im file.
All other localized input methods are aliases to this file.

The Japanese code sets consist of three character groups:

Katakana
Hiragana
Kanji

Katakana and Hiragana consist of about 50 characters each and form the set of phonetic characters referred to as Kana. All of the sounds in the Japanese language can be represented in Kana.

Kanji is a set of ideographs. A simple concept can be represented by a single Kanji character, while more complicated meanings can be formed with strings of Kanji characters. There are several thousand Kanji characters.

The Japanese also use the Roman alphabet. Called Romaji, the Roman alphabet consists of 26 characters. It is used mostly in technical and professional environments to represent technical vocabulary that does not exist in Japanese. A typical sentence is usually a mixture of Katakana, Hiragana, Kanji, Romaji, numbers, and other characters.

Japanese Character Processing

The Japanese Industrial Standard (JIS) specifies about 7000 Kanji characters processed by computer systems. Japanese products made by this manufacturer support all of the standard characters and more. Input of the characters is accomplished through:

Kana-to-Kanji conversion (KKC)
Romaji-to-Kana conversion (RKC)

The following special keys appear on the 106-key Japanese keyboard to allow for these conversions:

Special Japanese Keys
Key Function	Key Name	Description of Function
KKC Non-conversion key	muhenkan	Leaves Kana characters as is.
KKC Conversion key	henkan	Converts Kana to Kanji.
KKC All Candidates key	zenkouho	Shows all possible Kanji representatives.
RKC Romaji Mode key	romaji	Toggles RKC on and off.
Hiragana Shift key	hiragana	Becomes Hiragana shift state.
Katakana Shift key	katakana	Becomes Katakana shift state.
Romaji Shift key	eisu	Becomes Romaji shift state.

Note: Shift states are maintained until you press another shift key. The initial state is Romaji.

Kana-To-Kanji Conversion (KKC) Technology

The Japanese Input Method's (JIM) KKC technology is based on the fact that every Kanji character or set of Kanji characters has a phonetic sound or sounds that can be expressed by Katakana or Hiragana characters.

It is much easier to input Hiragana or Katakana characters than Kanji characters. The JIM analyzes the phonetic values of the Hiragana and Katakana characters to determine the best Kanji-character equivalent. Such phonetic analysis depends on the dictionary and tables provided to the JIM.

Input Modes

The JIM has three different modes that can be used to control the input processing:

Keyboard Mapping
Allows invocation of alphanumeric, Katakana, or Hiragana modes.
Character Size
Inputs in Zenkaku (full-width) or Hankaku (half-width) mode.
RKC off/on
Inputs Kana directly or invoke the pre-edit composing mode to input Kana with a combination of alphabetic characters. The pre-editing facility allows processing of characters before they are committed to the application.

When the keyboard mapping mode is alphanumeric and the character size mode is Hankaku, the JIM maps keys to Romaji characters. This mode combination is known as the "English" mode. Pre-editing is not needed in English mode and cannot be invoked regardless of the RKC mode setting. The other mode combinations may initiate pre-editing and characters generated in these modes are not ASCII.

The following keys are used to perform Kana-to-Kanji conversion by the JIM.

Keysym	Keyboard Mapping
Katakana	Katakana shift
Eisu_toggle	Alphanumeric shift
Hiragana	Hiragana shift

Keysym	Character Size
Zenkaku_Hankaku	Full-width or Half-width toggle
Hankaku	Half-width
Zenkaku	Full-width

Keysym	RKC on/off
Alt-Hiragana	Enables/Disables Romaji-to-Kana conversion
Romaji	*The same effect

* Keysyms unique to the manufacturer

The following keys are also used when the JIM is pre-editing a Kanji string.

Keysym	Kanji pre-edit
Muhenkan	Non-conversion - commit Kana
Henkan	Conversion - get next candidate
Kanji	Same as Henkan
BunsetsuYomi	*Moves back a phrase
MaeKouko	*Moves to previous candidate
LeftDouble	*Moves cursor two characters left
RightDouble	*Moves cursor two characters right
ErInput	*Discards the current pre-edit string

Keysym	Auxiliary pre-edit
Alt-Henkan	All candidates
Touroku	Runtime registration
ZenKouho	*All candidates (the same effect)
KanjiBangou	*Kanji Number Input
HenkanMenu	*Changes conversion mode

* Keysyms unique to the manufacturer

Keyboard Mapping

There are 3 possible keyboard mapping states: Alphanumeric (Romaji), Katakana and Hiragana. Each state is invoked by a keysym that acts as a locking shift key. The keysyms are Katakana, Eisu_toggle, and Hiragana shift.

When one of these keysyms is pressed, keyboard mapping enters the state associated with the key. This state is maintained until one of the other keysyms is pressed. The initial shift state is Eisu_toggle, which can be changed by customization.

When you invoke the Hiragana or Katakana state, each key is mapped to a phonetic character within the respective character set. For example, if you press q, a Hiragana character pronounced "ta" is produced during Hiragana shift state, a Katakana character pronounced "ta" is produced during Katakana shift state, or a Romaji "q" is produced during Eisu_toggle shift state. On Japanese IBM keyboards, the tops of keys show all three symbols.

Also, when keyboard mapping is in Hiragana state, the input method is automatically put into a composing pre-editing mode where each Hiragana character can be converted into a Kanji character. See Kanji Pre-edit for more information.

Some keys have two Hiragana or Katakana characters assigned. For example, the 7 key has large and small Hiragana characters both having the pronunciation "ya". These characters are not upper and lower case equivalents of each other since Kanji, Hiragana, and Katakana do not have uppercase and lowercases. The small characters are used to express special phonetic sounds. These characters can be distinguished by using the shift key.

Character Size

A subset of the Japanese character set is represented in both full-width and half-width. Kanji ideographic characters are usually full-width. The phonetic and ASCII characters have both full-width and half-width representations. The user controls character size by pressing the Zenkaku_Henkaku keysym which toggles between full-width and half-width.

Romaji-To-Kana Conversion (RKC)

For users familiar with alphanumeric keyboards, it is easier to key in the phonetic sounds rather than the Hiragana or Katakana characters. The JIM provides Romaji-to-Kana conversion (RKC), allowing the user to type in the phonetic sounds of Hiragana or Katakana characters on an alphanumeric keyboard.

Kanji Pre-edit

When operating in Romaji-To-Kana conversion mode, you must follow two steps to produce Kanji characters. First, the user inputs Hiragana characters by typing their Romaji phonetic characters. In this step, you produce a Hiragana character by typing 1 to 3 Romaji alphabetic keys that compose the phonetic sound of the Hiragana character. Second, convert the Hiragana characters to Kanji characters by pressing the Henkan key. Many Kanji characters may be associated with a single phonetic phrase. The Henkan key displays the most likely Kanji candidates. Repeated pressing of the Henkan key displays all the additional candidates.

For example, when entering the Kanji characters for the phonetic sound "k-a-n-j-i", you must do two things:

Set the keyboard mapping to the Hiragana state.
Enable Romaji-to-Kana mapping by pressing the Alt-Hiragana key. This action invokes the alphanumeric keyboard.

You may now press the keys that spell "kanji". As each phonetic sound is completed, a Hiragana character is displayed.

The Hiragana character is displayed with visual feedback to indicate that the JIM is composing in a pre-edit state. The character is underlined and shown in reverse video. This feedback facility is known as a callback. See Using Callbacks for more information.

To convert the Hiragana character within the pre-edit string to a Kanji character, press the Henkan key. The most likely candidate associated with the phonetic Hiragana sound is displayed. Pressing this key repeatedly shows other candidates.

During the composition process, the pre-edit string is partitioned into segments that can be considered Kanji words. Once a string of kana characters is converted into a candidate, it is treated as one of these convertible segments. While the pre-edit string is displayed, the JIM uses the cursor key and other keys to manipulate the string.

To commit the pre-edit string to the program, the user presses the Enter key. In this case, the Enter key code itself is not sent to the program, only the string.

The Muhenkan keysym can also be used to turn off pre-edit and commit the Hiragana or Katakana character directly to the program.

The Keyboard Shift-State Transition table depicts the shift state transition and the interaction of the RKC mode key with the shift states.

Table 16-7.

Character Encoding	Code Points	Description	Count
000xxxxx	00-1F	Controls	32
00100000	20	Space	1
0xxxxxxx	21-7E	7-bit ASCII	94
01111111	7F	Delete	1
10000000	80	Undefined	1
100xxxxx 01xxxxxx	[81-9F] [40-7E]	Double byte	1953
100xxxxx 1xxxxxxx	[81-9F] [80-FC]	Double byte	3844
10100000	A0	Undefined	1
1xxxxxxx	A1-DF	8-bit single byte	63
111xxxxx 01xxxxxx	[E0-FC] [40-7E]	Double byte	1827
111xxxxx 1xxxxxxx	[E0-FC] [80-FC]	Double byte	3596
11111101	FD	Undefined	1
11111110	FE	Undefined	1
11111111	FF	All ones	1

There are 4 types of auxiliary areas within the JIM.

All Candidates menu
Kanji Number Input dialog
Conversion Mode menu
Runtime Registration dialog

A Kana-to-Kanji conversion operation on a string of Hiragana or Katakana characters can yield from one to a hundred Kanji candidates. At worst, you would have to press the conversion key more than a hundred times to get the correct Kanji character.

In such cases, it is more convenient to find the correct character by requesting the All Candidates menu with the ZenKouho or the Alt-Henkan keysym. This menu appears if the current target (a Kanji word that the cursor is pointing to in the pre-edit area) has several alternative candidates associated with it. The menu contains multiple candidates for selection. The All Candidates menu disappears when the Reset keysym is pressed, the Enter key is pressed, or a candidate is selected.

A Kanji Number Input dialog prompts the user to select the Kanji character by entering 3 to 5 digits. The digits represent the code of the character. Online dictionaries allow a user to search for the code. The ordering formats for these dictionaries vary. For example, one dictionary lists codes by phonetic sound. Another dictionary orders codes by the number of strokes used to compose the character. The KanjiBangou keysym invokes this menu. The menu is terminated with either the Reset or Return keysym.

The HenkanMenu keysym invokes the Conversion Mode menu. Four items are displayed for selection. The most important items are the word-conversion mode and phrase-conversion mode. Make a selection by choosing a number and pressing the Return keysym. This menu is terminated when either a selection is made or the Reset keysym is pressed.

A runtime registration dialog prompts the user to input a Kana string and a Kanji string for registering the mapping of the strings in the user dictionary. Once the pair is registered, the JIM can use it as a conversion candidate. The menu is terminated with the Escape or Reset keysym.

The presentation of menus depends on the interface environment in which the JIM is operating. For example, some interfaces support scrolling menus that use the Page Down and Page Up keys. Discussion of these interfaces is outside the scope of this document.

Keymaps:

ja_JP.IBM-eucJP.imkeymap

Ja_JP.IBM-932.imkeymap

Ja_JP.IBM-943.imkeymap

Keysyms:

The JIM uses the keysyms in the XK_KATAKANA, XK_LATIN1, and XK_MISCELLANY groups.

Reserved Keysyms:

XK_BunsetsuYomi	0x1800ff05	Back a phrase to Yomi
XK_MaeKouho	0x1800ff04	Previous candidate
XK_ZenKouho	0x1800ff01	All candidates.
XK_KanjiBangou	0x1800ff02	Kanji number input.
XK_HenkanMenu	0x1800ff03	Changes conversion mode.
XK_LeftDouble	0x1800ff06	Moves cursor two characters left.
XK_RightDouble	0x1800ff07	Moves cursor two characters right.
XK_LeftPhrase	0x1800ff08	Reserved for future use.
XK_RightPhrase	0x1800ff09	Reserved for future use.
XK_ErInput	0x1800ff0a	Discards the current pre-edit string
XK_Resetreset	0x1800ff0b	Reset

The preceding keysyms are unique to the input method of this system.

XK_Kanji	Convert Hiragana to Kanji.
XK_Muhenkan	Cancels conversion.
XK_Romaji	Puts JIM in Romaji input mode.
XK_Hiragana	Puts JIM in Hiragana input mode.
XK_Katakana	Puts JIM in Katakana input mode.
XK_Zenkaku_Hankaku	Toggles between full-width and half-width character input mode.
XK_Touroku	Registers a word to the user dictionary.
XK_Eisu_toggle	Puts JIM in alphanumeric input mode.

Modifiers:

ShiftMask 0x01
LockMask 0x02
ControlMask 0x04
Mod1Mask (Left-Alt) 0x08
Mod2Mask (Right-Alt) 0x10
Internal Modifiers:

Kana 0x20
Romaji 0x40

Related Information

Understanding the ISO Code Sets (ISO Code Sets) in AIX 5L Version 5.1 Kernel Extensions and Device Support Programming Concepts.

ShiftMask	0x01
LockMask	0x02
ControlMask	0x04
Mod1Mask (Left-Alt)	0x08
Mod2Mask (Right-Alt)	0x10

Kana	0x20
Romaji	0x40