
General Programming Concepts: Writing and Debugging Programs


Japanese Input Method (JIM)

The Japanese Input Method (JIM) is a sophisticated input method that provides Japanese input. The features include:

The Japanese code sets consist of three character groups:

Katakana and Hiragana consist of about 50 characters each and form the set of phonetic characters referred to as Kana. All of the sounds in the Japanese language can be represented in Kana.

Kanji is a set of ideographs. A simple concept can be represented by a single Kanji character, while more complicated meanings can be formed with strings of Kanji characters. There are several thousand Kanji characters.

The Japanese also use the Roman alphabet. Called Romaji, the Roman alphabet consists of 26 characters. It is used mostly in technical and professional environments to represent technical vocabulary that does not exist in Japanese. A typical sentence is usually a mixture of Katakana, Hiragana, Kanji, Romaji, numbers, and other characters.

Japanese Character Processing

The Japanese Industrial Standard (JIS) specifies about 7000 Kanji characters for processing by computer systems. Japanese products made by this manufacturer support all of the standard characters and more. Input of the characters is accomplished through:

  1. Romaji-to-Kana conversion (RKC)
  2. Kana-to-Kanji conversion (KKC)

The following special keys appear on the 106-key Japanese keyboard to allow for these conversions:

Special Japanese Keys
Key Function               Key Name   Description of Function
KKC Non-conversion key     muhenkan   Leaves Kana characters as is.
KKC Conversion key         henkan     Converts Kana to Kanji.
KKC All Candidates key     zenkouho   Shows all possible Kanji candidates.
RKC Romaji Mode key        romaji     Toggles RKC on and off.
Hiragana Shift key         hiragana   Enters the Hiragana shift state.
Katakana Shift key         katakana   Enters the Katakana shift state.
Romaji Shift key           eisu       Enters the Romaji shift state.

Note: Shift states are maintained until you press another shift key. The initial state is Romaji.

Kana-To-Kanji Conversion (KKC) Technology

The Japanese Input Method's (JIM) KKC technology is based on the fact that every Kanji character or set of Kanji characters has a phonetic sound or sounds that can be expressed by Katakana or Hiragana characters.

It is much easier to input Hiragana or Katakana characters than Kanji characters. The JIM analyzes the phonetic values of the Hiragana and Katakana characters to determine the best Kanji-character equivalent. Such phonetic analysis depends on the dictionary and tables provided to the JIM.

Input Modes

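The JIM has three modes that control input processing:

  1. Keyboard mapping: Alphanumeric, Katakana, or Hiragana
  2. Character size: Zenkaku (full-width) or Hankaku (half-width)
  3. RKC: on or off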

When the keyboard mapping mode is alphanumeric and the character size mode is Hankaku, the JIM maps keys to Romaji characters. This mode combination is known as the "English" mode. Pre-editing is not needed in English mode and cannot be invoked regardless of the RKC mode setting. The other mode combinations may initiate pre-editing and characters generated in these modes are not ASCII.
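The rule for English mode can be sketched in a few lines. This is only an illustration of the paragraph above: the function names and the string values used for the modes are hypothetical, not part of the JIM interface.

```python
from itertools import product

# Hypothetical labels for the three input modes described in the text.
KEYBOARD_MAPPINGS = ("alphanumeric", "katakana", "hiragana")
CHARACTER_SIZES = ("zenkaku", "hankaku")
RKC_SETTINGS = (True, False)

def is_english_mode(keyboard_mapping, character_size):
    """English mode: alphanumeric mapping with half-width (Hankaku) size."""
    return keyboard_mapping == "alphanumeric" and character_size == "hankaku"

def may_preedit(keyboard_mapping, character_size, rkc_on):
    """Pre-editing is never invoked in English mode, whatever rkc_on is."""
    return not is_english_mode(keyboard_mapping, character_size)

# Enumerate all 3 x 2 x 2 combinations and keep those that cannot pre-edit.
no_preedit = [
    combo
    for combo in product(KEYBOARD_MAPPINGS, CHARACTER_SIZES, RKC_SETTINGS)
    if not may_preedit(*combo)
]
```

Only the two alphanumeric and Hankaku combinations (RKC on and RKC off) end up in `no_preedit`, which mirrors the statement that the RKC setting has no effect in English mode.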

The following keys are used to perform Kana-to-Kanji conversion by the JIM.

Keysym             Keyboard Mapping
Katakana           Katakana shift
Eisu_toggle        Alphanumeric shift
Hiragana           Hiragana shift

Keysym             Character Size
Zenkaku_Hankaku    Full-width or Half-width toggle
Hankaku            Half-width
Zenkaku            Full-width

Keysym             RKC on/off
Alt-Hiragana       Enables/Disables Romaji-to-Kana conversion
Romaji             *The same effect

* Keysyms unique to the manufacturer

The following keys are also used when the JIM is pre-editing a Kanji string.

Keysym          Kanji pre-edit
Muhenkan        Non-conversion - commit Kana
Henkan          Conversion - get next candidate
Kanji           Same as Henkan
BunsetsuYomi    *Moves back a phrase
MaeKouho        *Moves to previous candidate
LeftDouble      *Moves cursor two characters left
RightDouble     *Moves cursor two characters right
ErInput         *Discards the current pre-edit string

Keysym          Auxiliary pre-edit
Alt-Henkan      All candidates
Touroku         Runtime registration
ZenKouho        *All candidates (the same effect)
KanjiBangou     *Kanji Number Input
HenkanMenu      *Changes conversion mode

* Keysyms unique to the manufacturer

Keyboard Mapping

There are three possible keyboard mapping states: Alphanumeric (Romaji), Katakana, and Hiragana. Each state is invoked by a keysym that acts as a locking shift key. The keysyms are Katakana, Eisu_toggle, and Hiragana.

When one of these keysyms is pressed, keyboard mapping enters the state associated with the key. This state is maintained until one of the other keysyms is pressed. The initial shift state is Eisu_toggle, which can be changed by customization.

When you invoke the Hiragana or Katakana state, each key is mapped to a phonetic character within the respective character set. For example, if you press q, a Hiragana character pronounced "ta" is produced during Hiragana shift state, a Katakana character pronounced "ta" is produced during Katakana shift state, or a Romaji "q" is produced during Eisu_toggle shift state. On Japanese IBM keyboards, the tops of keys show all three symbols.
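The locking behavior and the q example can be sketched as a small state machine. The names `ShiftState`, `Q_KEY`, and `press` are hypothetical; only the keysym names and the characters produced by the q key come from the text. Character size is ignored here for brevity.

```python
from enum import Enum

class ShiftState(Enum):
    # Each value is the keysym that locks the state.
    ALPHANUMERIC = "Eisu_toggle"
    KATAKANA = "Katakana"
    HIRAGANA = "Hiragana"

LOCKING_KEYSYMS = {state.value: state for state in ShiftState}

# What the q key produces in each shift state (from the example above).
Q_KEY = {
    ShiftState.ALPHANUMERIC: "q",
    ShiftState.HIRAGANA: "た",
    ShiftState.KATAKANA: "タ",
}

def press(state, key):
    """Return (new_state, produced_text) for one key press."""
    if key in LOCKING_KEYSYMS:
        return LOCKING_KEYSYMS[key], ""   # locking shift: no text produced
    if key == "q":
        return state, Q_KEY[state]
    return state, key                      # other keys: not modeled here

state = ShiftState.ALPHANUMERIC            # default initial state
typed = []
for key in ["q", "Hiragana", "q", "q", "Katakana", "q"]:
    state, text = press(state, key)
    typed.append(text)
result = "".join(typed)
```

The second and third q presses both yield the Hiragana character because the state stays locked until another shift keysym arrives.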

Also, when keyboard mapping is in Hiragana state, the input method is automatically put into a composing pre-editing mode where each Hiragana character can be converted into a Kanji character. See Kanji Pre-edit for more information.

Some keys have two Hiragana or Katakana characters assigned. For example, the 7 key has large and small Hiragana characters, both pronounced "ya". These characters are not uppercase and lowercase equivalents of each other, since Kanji, Hiragana, and Katakana have no uppercase or lowercase forms. The small characters are used to express special phonetic sounds. Use the shift key to select between the two characters.

Character Size

A subset of the Japanese character set is represented in both full-width and half-width. Kanji ideographic characters are usually full-width. The phonetic and ASCII characters have both full-width and half-width representations. The user controls character size by pressing the Zenkaku_Hankaku keysym, which toggles between full-width and half-width.
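To make the full-width and half-width distinction concrete, the sketch below maps half-width ASCII characters to their full-width forms. It assumes Unicode text, where the full-width forms of 0x21 through 0x7E sit at a fixed offset of 0xFEE0 and the full-width space is U+3000. The JIM itself works with the Japanese code sets, so this shows the idea rather than its implementation; `to_fullwidth` is a hypothetical name.

```python
FULLWIDTH_OFFSET = 0xFEE0      # distance from "!" (U+0021) to "！" (U+FF01)
IDEOGRAPHIC_SPACE = "\u3000"   # full-width counterpart of the ASCII space

def to_fullwidth(text):
    """Replace half-width ASCII characters with their full-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append(IDEOGRAPHIC_SPACE)
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        else:
            out.append(ch)     # already full-width or not ASCII: unchanged
    return "".join(out)

sample = to_fullwidth("JIM 1")
```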

Romaji-To-Kana Conversion (RKC)

For users familiar with alphanumeric keyboards, it is easier to key in the phonetic sounds rather than the Hiragana or Katakana characters. The JIM provides Romaji-to-Kana conversion (RKC), allowing the user to type in the phonetic sounds of Hiragana or Katakana characters on an alphanumeric keyboard.

Kanji Pre-edit

When operating in Romaji-to-Kana conversion mode, you produce Kanji characters in two steps. First, input Hiragana characters by typing their Romaji phonetic spelling; each Hiragana character is produced by typing the 1 to 3 Romaji alphabetic keys that compose its phonetic sound. Second, convert the Hiragana characters to Kanji characters by pressing the Henkan key. Many Kanji characters may be associated with a single phonetic phrase. The Henkan key displays the most likely Kanji candidate. Pressing the Henkan key repeatedly displays the additional candidates.

For example, when entering the Kanji characters for the phonetic sound "k-a-n-j-i", you must do two things:

  1. Set the keyboard mapping to the Hiragana state.
  2. Enable Romaji-to-Kana mapping by pressing the Alt-Hiragana key. This action invokes the alphanumeric keyboard.

You may now press the keys that spell "kanji". As each phonetic sound is completed, a Hiragana character is displayed.
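A minimal sketch of this Romaji-to-Kana step follows. The table is a tiny hypothetical subset, and the longest-match strategy and the handling of "n" before a consonant are assumptions made for illustration; the actual JIM tables and rules may differ.

```python
# Hypothetical subset of a Romaji-to-Hiragana table (1 to 3 keys per entry).
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ji": "じ", "ta": "た", "na": "な", "ni": "に",
    "kya": "きゃ",
}
VOWELS = set("aiueo")

def romaji_to_hiragana(keys):
    """Convert typed Romaji keys to Hiragana, leaving incomplete sounds as typed."""
    out = []
    i = 0
    while i < len(keys):
        # Try the longest key sequence first: 3 keys, then 2, then 1.
        for length in (3, 2, 1):
            chunk = keys[i:i + length]
            if len(chunk) == length and chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += length
                break
        else:
            nxt = keys[i + 1:i + 2]
            if keys[i] == "n" and nxt and nxt not in VOWELS and nxt != "y":
                out.append("ん")      # "n" followed by a consonant
            else:
                out.append(keys[i])   # sound not yet complete: keep the key
            i += 1
    return "".join(out)

reading = romaji_to_hiragana("kanji")
```

Typing "kanji" produces three Hiragana characters in this sketch, one as each phonetic sound ("ka", "n", "ji") is completed.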

The Hiragana character is displayed with visual feedback to indicate that the JIM is composing in a pre-edit state. The character is underlined and shown in reverse video. This feedback facility is known as a callback. See Using Callbacks for more information.

To convert the Hiragana character within the pre-edit string to a Kanji character, press the Henkan key. The most likely candidate associated with the phonetic Hiragana sound is displayed. Pressing this key repeatedly shows other candidates.
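Stepping through candidates with repeated Henkan presses might look like the sketch below. The dictionary entry, the `Preedit` class, and the wrap-around after the last candidate are all assumptions for illustration; the text does not say what happens once every candidate has been shown.

```python
# Hypothetical dictionary entry: one reading, candidates in likelihood order.
CANDIDATES = {"かんじ": ["漢字", "感じ", "幹事"]}

class Preedit:
    """Holds one Kana reading and the candidate currently displayed."""

    def __init__(self, kana):
        self.kana = kana
        self.index = -1            # -1 means "not converted yet"

    def henkan(self):
        """Simulate one press of the Henkan key and return the new display."""
        candidates = CANDIDATES.get(self.kana, [])
        if candidates:
            # Assumed behavior: wrap around after the last candidate.
            self.index = (self.index + 1) % len(candidates)
        return self.display()

    def display(self):
        if self.index < 0:
            return self.kana       # still showing the Kana reading
        return CANDIDATES[self.kana][self.index]

preedit = Preedit("かんじ")
shown = [preedit.henkan() for _ in range(4)]
```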

During the composition process, the pre-edit string is partitioned into segments that can be considered Kanji words. Once a string of kana characters is converted into a candidate, it is treated as one of these convertible segments. While the pre-edit string is displayed, the JIM uses the cursor key and other keys to manipulate the string.

To commit the pre-edit string to the program, the user presses the Enter key. In this case, the Enter key code itself is not sent to the program, only the string.

The Muhenkan keysym can also be used to turn off pre-edit and commit the Hiragana or Katakana character directly to the program.
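The two ways of committing can be summarized in a short sketch. The function name and the use of plain strings for keysyms are hypothetical; the point is only that Enter delivers the displayed string without the Enter key code, while Muhenkan delivers the Kana as typed.

```python
def handle_commit_key(keysym, kana, displayed):
    """Return the text delivered to the program, or None if nothing is committed.

    kana:      the phonetic reading typed so far
    displayed: what the pre-edit area currently shows (Kana or a Kanji candidate)
    """
    if keysym == "Enter":
        return displayed    # the Enter key code itself is not passed on
    if keysym == "Muhenkan":
        return kana         # non-conversion: commit the Kana as typed
    return None             # any other key: keep composing

committed = handle_commit_key("Enter", "かんじ", "漢字")
```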

The Keyboard Shift-State Transition table depicts the shift state transition and the interaction of the RKC mode key with the shift states.

Table 16-7.

Character Encoding    Code Points        Description         Count
000xxxxx              00-1F              Controls            32
00100000              20                 Space               1
0xxxxxxx              21-7E              7-bit ASCII         94
01111111              7F                 Delete              1
10000000              80                 Undefined           1
100xxxxx 01xxxxxx     [81-9F] [40-7E]    Double byte         1953
100xxxxx 1xxxxxxx     [81-9F] [80-FC]    Double byte         3844
10100000              A0                 Undefined           1
1xxxxxxx              A1-DF              8-bit single byte   63
111xxxxx 01xxxxxx     [E0-FC] [40-7E]    Double byte         1827
111xxxxx 1xxxxxxx     [E0-FC] [80-FC]    Double byte         3596
11111101              FD                 Undefined           1
11111110              FE                 Undefined           1
11111111              FF                 All ones            1
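The byte ranges in the table are enough to split a byte string into single-byte and double-byte characters. The sketch below does that. The function names are hypothetical, and Python's `shift_jis` codec is used only to produce sample bytes; that it matches the code set the table describes is an assumption.

```python
def is_lead_byte(b):
    """First byte of a double-byte character: 81-9F or E0-FC."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_trail_byte(b):
    """Second byte of a double-byte character: 40-7E or 80-FC."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

def split_characters(data):
    """Split bytes into a list of 1-byte and 2-byte character slices."""
    chars = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            if i + 1 >= len(data) or not is_trail_byte(data[i + 1]):
                raise ValueError(f"invalid double-byte sequence at offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Controls, ASCII, A1-DF single bytes; the undefined values
            # (80, A0, FD-FF) are passed through unchecked in this sketch.
            chars.append(data[i:i + 1])
            i += 1
    return chars

sample = "漢字A".encode("shift_jis")   # assumption: comparable code set
pieces = split_characters(sample)
```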

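There are four types of auxiliary areas within the JIM:

  1. All Candidates menu
  2. Kanji Number Input dialog
  3. Conversion Mode menu
  4. Runtime registration dialog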

A Kana-to-Kanji conversion operation on a string of Hiragana or Katakana characters can yield from one to a hundred Kanji candidates. At worst, you would have to press the conversion key about a hundred times to reach the correct Kanji character.

In such cases, it is more convenient to find the correct character by requesting the All Candidates menu with the ZenKouho or the Alt-Henkan keysym. This menu appears if the current target (a Kanji word that the cursor is pointing to in the pre-edit area) has several alternative candidates associated with it. The menu contains multiple candidates for selection. The All Candidates menu disappears when the Reset keysym is pressed, the Enter key is pressed, or a candidate is selected.

A Kanji Number Input dialog prompts the user to select the Kanji character by entering 3 to 5 digits. The digits represent the code of the character. Online dictionaries allow a user to search for the code. The ordering formats for these dictionaries vary. For example, one dictionary lists codes by phonetic sound. Another dictionary orders codes by the number of strokes used to compose the character. The KanjiBangou keysym invokes this menu. The menu is terminated with either the Reset or Return keysym.

The HenkanMenu keysym invokes the Conversion Mode menu. Four items are displayed for selection. The most important items are the word-conversion mode and phrase-conversion mode. Make a selection by choosing a number and pressing the Return keysym. This menu is terminated when either a selection is made or the Reset keysym is pressed.

A runtime registration dialog prompts the user to input a Kana string and a Kanji string, and registers the mapping between them in the user dictionary. Once the pair is registered, the JIM can use it as a conversion candidate. The dialog is terminated with the Escape or Reset keysym.
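The effect of runtime registration can be pictured as adding an entry to a mapping from Kana readings to Kanji strings. The `UserDictionary` class below is a hypothetical in-memory sketch; it says nothing about how the JIM actually stores its user dictionary.

```python
class UserDictionary:
    """Maps a Kana reading to the Kanji strings registered for it."""

    def __init__(self):
        self._entries = {}

    def register(self, kana, kanji):
        """Register a Kana/Kanji pair; duplicate registrations are ignored."""
        candidates = self._entries.setdefault(kana, [])
        if kanji not in candidates:
            candidates.append(kanji)

    def candidates(self, kana):
        """Return the registered conversion candidates for a reading."""
        return list(self._entries.get(kana, []))

user_dict = UserDictionary()
user_dict.register("かんじ", "漢字")
user_dict.register("かんじ", "漢字")    # registering twice has no extra effect
found = user_dict.candidates("かんじ")
```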

The presentation of menus depends on the interface environment in which the JIM is operating. For example, some interfaces support scrolling menus that use the Page Down and Page Up keys. Discussion of these interfaces is outside the scope of this document.

Keymaps:

ja_JP.IBM-eucJP.imkeymap

Ja_JP.IBM-932.imkeymap

Ja_JP.IBM-943.imkeymap

Keysyms:

The JIM uses the keysyms in the XK_KATAKANA, XK_LATIN1, and XK_MISCELLANY groups.

Reserved Keysyms:


XK_BunsetsuYomi   0x1800ff05    Back a phrase to Yomi.
XK_MaeKouho       0x1800ff04    Previous candidate.
XK_ZenKouho       0x1800ff01    All candidates.
XK_KanjiBangou    0x1800ff02    Kanji number input.
XK_HenkanMenu     0x1800ff03    Changes conversion mode.
XK_LeftDouble     0x1800ff06    Moves cursor two characters left.
XK_RightDouble    0x1800ff07    Moves cursor two characters right.
XK_LeftPhrase     0x1800ff08    Reserved for future use.
XK_RightPhrase    0x1800ff09    Reserved for future use.
XK_ErInput        0x1800ff0a    Discards the current pre-edit string.
XK_Reset          0x1800ff0b    Reset.

The preceding keysyms are unique to the input method of this system.
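A program that receives raw keysym values could recognize these reserved values with a simple lookup. The sketch below copies a few name and value pairs from the list above; the dictionary and the function `describe_keysym` are hypothetical helpers, not part of any X11 or JIM interface.

```python
# A few reserved keysyms, with values copied from the list above.
RESERVED_KEYSYMS = {
    0x1800FF01: "XK_ZenKouho",
    0x1800FF02: "XK_KanjiBangou",
    0x1800FF03: "XK_HenkanMenu",
    0x1800FF04: "XK_MaeKouho",
    0x1800FF05: "XK_BunsetsuYomi",
}

def describe_keysym(value):
    """Return the reserved keysym name, or the value in hex if it is not listed."""
    name = RESERVED_KEYSYMS.get(value)
    return name if name is not None else f"0x{value:08x}"

label = describe_keysym(0x1800FF01)
```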

XK_Kanji              Converts Hiragana to Kanji.
XK_Muhenkan           Cancels conversion.
XK_Romaji             Puts JIM in Romaji input mode.
XK_Hiragana           Puts JIM in Hiragana input mode.
XK_Katakana           Puts JIM in Katakana input mode.
XK_Zenkaku_Hankaku    Toggles between full-width and half-width character input mode.
XK_Touroku            Registers a word to the user dictionary.
XK_Eisu_toggle        Puts JIM in alphanumeric input mode.

Related Information

Understanding the ISO Code Sets (ISO Code Sets) in AIX 5L Version 5.1 Kernel Extensions and Device Support Programming Concepts.

