Files Reference
Defines UCS-2 (Unicode)
conversion mappings for input to the uconvdef command.
Conversion mapping values are
defined using UCS-2 symbolic character names followed by character encoding
(code point) values for the multibyte code set. For example,
<U0020> \x20
represents the mapping between <U0020>, the UCS-2 symbolic character name for the space character, and \x20, the hexadecimal code point for the space character in ASCII.
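The following Python sketch shows how a single mapping line pairs a UCS-2 symbolic name with the bytes of the target code set. It is only an illustration. It assumes Python's built-in codecs (here `ascii`) can stand in for the multibyte code set's definition, and `charmap_line` is a hypothetical helper that is not part of uconvdef.

```python
def charmap_line(ch: str, codec: str = "ascii") -> str:
    """Build a '<Uxxxx> \\xNN...' mapping line for one character (hypothetical helper)."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the UCS-2 range")
    # Each byte of the encoded form becomes one \xNN constant, most significant first.
    encoding = "".join(f"\\x{byte:02x}" for byte in ch.encode(codec))
    return f"<U{code_point:04X}> {encoding}"


print(charmap_line(" "))  # <U0020> \x20
```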
In addition to the code set
mappings, directives are interpreted by the uconvdef command to
produce the compiled table. These directives must precede the code set
mapping section. They consist of the following keywords surrounded by
< > (angle brackets), starting in column 1, followed by white space and
the value to be assigned to the symbol:
<code_set_name>
The name of the coded character set for which the character set description file is defined, enclosed in quotation marks (" ").
<mb_cur_max>
The maximum number of bytes in a multibyte character. The default value is 1.
<mb_cur_min>
An unsigned positive integer value that defines the minimum number of bytes in a character for the encoded character set. The value must be less than or equal to <mb_cur_max>. If not specified, the minimum number is equal to <mb_cur_max>.
<escape_char>
The escape character, which indicates that the character following it is interpreted in a special way. The default is a backslash (\).
<comment_char>
The character that, when placed in column 1 of a charmap line, indicates that the line is ignored. The default is the number sign (#).
<char_name_mask>
A quoted string of format specifiers for the UCS-2 symbolic names. The value must be "AXXXX", indicating an alphabetic character followed by 4 hexadecimal digits. The alphabetic character must be a U, and the hexadecimal digits must represent the UCS-2 code point for the character. For example, <U0020>, the Unicode space character, is a symbolic character name based on this mask.
<uconv_class>
Specifies the type of the code set. It must be one of the following:
- SBCS: Single-byte encoding
- DBCS: Stateless double-byte, single-byte, or mixed encodings
- EBCDIC_STATEFUL: Stateful double-byte, single-byte, or mixed encodings
- MBCS: Stateless multibyte encoding
This type tells uconvdef which type of table to build. It is also stored in the table to indicate which processing algorithm the UCS conversion methods use.
<locale>
Specifies the default locale name to use if locale information is needed.
<subchar>
Specifies the encoding of the default substitute character in the multibyte code set.
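The directive rules and defaults above can be sketched in Python as follows. This is an illustration, not uconvdef's own parser: the function name `parse_directives`, splitting each directive at its first >, and stripping surrounding quotation marks from any value are assumptions.

```python
VALID_CLASSES = {"SBCS", "DBCS", "EBCDIC_STATEFUL", "MBCS"}


def parse_directives(lines):
    """Read the <keyword> value directives that precede CHARMAP (hypothetical helper)."""
    raw = {}
    for line in lines:
        if line.strip() == "CHARMAP":
            break  # directives must precede the mapping section
        if not line.startswith("<"):
            continue  # skip blank lines, comments, and text not starting in column 1
        keyword, sep, value = line[1:].partition(">")
        if not sep:
            raise ValueError(f"malformed directive: {line!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # e.g. code_set_name and char_name_mask are quoted
        raw[keyword] = value

    # Apply the documented defaults and the mb_cur_min <= mb_cur_max rule.
    mb_cur_max = int(raw.get("mb_cur_max", 1))
    mb_cur_min = int(raw.get("mb_cur_min", mb_cur_max))
    if not 0 < mb_cur_min <= mb_cur_max:
        raise ValueError("need 0 < mb_cur_min <= mb_cur_max")
    if "uconv_class" in raw and raw["uconv_class"] not in VALID_CLASSES:
        raise ValueError(f"unknown uconv_class: {raw['uconv_class']!r}")
    return {
        **raw,
        "mb_cur_max": mb_cur_max,
        "mb_cur_min": mb_cur_min,
        "escape_char": raw.get("escape_char", "\\"),
        "comment_char": raw.get("comment_char", "#"),
    }
```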
The mapping definition section
consists of a sequence of mapping definition lines preceded by a
CHARMAP declaration and terminated by an END CHARMAP
declaration. Empty lines and lines containing
<comment_char> in the first column are ignored.
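A minimal Python sketch of locating the mapping section, assuming the source file has already been read into a list of lines. The name `mapping_lines` is hypothetical, and recognizing the CHARMAP and END CHARMAP declarations after stripping surrounding white space is an assumption; otherwise the sketch applies only the rules stated above.

```python
from typing import Iterable, Iterator


def mapping_lines(lines: Iterable[str], comment_char: str = "#") -> Iterator[str]:
    """Yield the mapping definition lines between CHARMAP and END CHARMAP."""
    inside = False
    for line in lines:
        text = line.rstrip("\n")
        if not inside:
            # Everything before CHARMAP belongs to the directive section.
            inside = text.strip() == "CHARMAP"
            continue
        if text.strip() == "END CHARMAP":
            return
        if not text.strip() or text.startswith(comment_char):
            continue  # empty line, or comment_char in column 1
        yield text
    raise ValueError("no complete CHARMAP ... END CHARMAP section found")
```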
Symbolic character names in mapping lines must follow the pattern specified in the <char_name_mask>, except for the reserved symbolic name <unassigned>, which indicates that the associated code points are unassigned.
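The name check described above might look like this in Python. The function name `code_point_of` is hypothetical, and accepting lowercase as well as uppercase hexadecimal digits is an assumption, because the text does not say whether case matters.

```python
import re
from typing import Optional

# The only accepted mask is "AXXXX" with the letter U, so a valid name is
# a U and exactly four hexadecimal digits inside angle brackets.
_UCS2_NAME = re.compile(r"<U([0-9A-Fa-f]{4})>")


def code_point_of(name: str) -> Optional[int]:
    """Return the UCS-2 code point a symbolic name stands for, or None for <unassigned>."""
    if name == "<unassigned>":
        return None  # reserved name: the encodings on this line are unassigned
    match = _UCS2_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f'{name!r} does not follow <char_name_mask> "AXXXX"')
    return int(match.group(1), 16)
```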
Each noncomment line of the
character set mapping definition must be in one of the following
formats:
- "%s %s %s\n", <symbolic-name>, <encoding>, <comments>
For example:
<U3004> \x81\x57
This format defines a single symbolic character name and a corresponding
encoding.
The encoding part is expressed as
one or more concatenated decimal, hexadecimal, or octal constants in the
following formats:
- "%cd%d",
<escape_char>, <decimal byte value>
- "%cx%x",
<escape_char>, <hexadecimal byte value>
- "%c%o",
<escape_char>, <octal byte value>
Decimal constants are represented by two or more decimal digits preceded by the escape character and the lowercase letter d, as in \d97 or \d143. Hexadecimal constants are represented by two or more hexadecimal digits preceded by an escape character and the lowercase letter x, as in \x61 or \x8f. Octal constants are represented by two or more octal digits preceded by an escape character, as in \141 or \217.
Each constant represents a
single-byte value. When constants are concatenated for multibyte
character values, the last value specifies the least significant octet and
preceding constants specify successively more significant octets.
- "%s...%s %s %s\n", <symbolic-name>, <symbolic-name>, <encoding>, <comments>
For example:
<U3003>...<U3006> \x81\x56
This format defines a range of symbolic character names and corresponding
encodings. The range is interpreted as a series of symbolic names
formed from the alphabetic prefix and all the values in the range defined by
the numeric suffixes.
The listed encoding value is
assigned to the first symbolic name, and subsequent symbolic names in the
range are assigned corresponding incremental values. For example, the
line:
<U3003>...<U3006> \x81\x56
is interpreted as:
<U3003> \x81\x56
<U3004> \x81\x57
<U3005> \x81\x58
<U3006> \x81\x59
- "<unassigned> %s...%s %s\n", <encoding>, <encoding>, <comments>
This format defines a range of one
or more unassigned encodings. For example, the line:
<unassigned> \x9b...\x9c
is interpreted as:
<unassigned> \x9b
<unassigned> \x9c
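The three line formats and the constant syntax above can be combined into a small expander. The following Python sketch turns one mapping line into (code point, encoding bytes) pairs, with None as the code point on <unassigned> lines. It is not uconvdef's implementation. The function names are hypothetical, comments are assumed to be white-space-separated fields after the encoding, and carrying an incremented encoding from the last byte into the previous one is an assumption, because the text above says only that subsequent names receive incremental values.

```python
import re
from typing import List, Optional, Tuple

_NAME = re.compile(r"<U([0-9A-Fa-f]{4})>")


def decode_encoding(text: str, escape: str = "\\") -> bytes:
    """Decode concatenated constants such as \\x81\\x56; the first is most significant."""
    if not text.startswith(escape):
        raise ValueError(f"encoding must start with {escape!r}: {text!r}")
    out = bytearray()
    for const in text.split(escape)[1:]:
        if const.startswith("d"):
            value = int(const[1:], 10)  # decimal constant, e.g. \d97
        elif const.startswith("x"):
            value = int(const[1:], 16)  # hexadecimal constant, e.g. \x61
        else:
            value = int(const, 8)  # octal constant, e.g. \141
        if not 0 <= value <= 0xFF:
            raise ValueError(f"constant is not a single byte: {const!r}")
        out.append(value)
    return bytes(out)


def _code_point(name: str) -> int:
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow <char_name_mask>: {name!r}")
    return int(match.group(1), 16)


def _step(encoding: bytes, offset: int) -> bytes:
    # Assumption: the encoding is incremented as one big-endian integer,
    # so overflow of the last byte carries into the byte before it.
    value = int.from_bytes(encoding, "big") + offset
    return value.to_bytes(len(encoding), "big")


def expand_line(line: str, escape: str = "\\") -> List[Tuple[Optional[int], bytes]]:
    """Expand one mapping line into (code point, encoding) pairs; None marks <unassigned>."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a name and an encoding: {line!r}")
    names, encoding = fields[0], fields[1]  # any remaining fields are comments
    if names == "<unassigned>":
        low_text, _, high_text = encoding.partition("...")
        low = decode_encoding(low_text, escape)
        high = decode_encoding(high_text, escape) if high_text else low
        count = int.from_bytes(high, "big") - int.from_bytes(low, "big") + 1
        if len(low) != len(high) or count < 1:
            raise ValueError(f"bad unassigned range: {line!r}")
        return [(None, _step(low, i)) for i in range(count)]
    first, _, last = names.partition("...")
    start = _code_point(first)
    end = _code_point(last) if last else start
    if end < start:
        raise ValueError(f"bad symbolic name range: {line!r}")
    base = decode_encoding(encoding, escape)
    return [(start + i, _step(base, i)) for i in range(end - start + 1)]
```

For the range example above, expand_line(r"<U3003>...<U3006> \x81\x56") yields four pairs, from (0x3003, b"\x81\x56") through (0x3006, b"\x81\x59"), matching the expanded lines shown there.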
This command is part of Extended
Commands in BOS Extensions 1.
Related information:
- The uconvdef command.
- Code Set Overview in AIX 5L Version 5.1 Kernel Extensions and Device Support Programming Concepts.
- List of UCS-2 Interchange Converters in AIX 5L Version 5.1 General Programming Concepts: Writing and Debugging Programs.