gendict (1) Linux Manual Page
NAME
gendict – Compiles word list into ICU string trie dictionary
SYNOPSIS
gendict [ –uchars | –bytes –transform transform ] [ -h, -?, –help ] [ -V, –version ] [ -c, –copyright ] [ -v, –verbose ] [ -i, –icudatadir directory ] input-file output-file
DESCRIPTION
gendict reads the word list from dictionary-file and creates a string trie dictionary file. Normally this data file has the .dict extension.
Words begin at the beginning of a line and are terminated by the first whitespace. Lines that begin with whitespace are ignored.
OPTIONS
-h,-?,–help- Print help about usage and exit.
-V,–version- Print the version of
gendictand exit. -c,–copyright- Embeds the standard ICU copyright into the output-file.
-v,–verbose- Display extra informative messages during execution.
-i,–icudatadirdirectory- Look for any necessary ICU data files in directory. For example, the file
pnames.icumust be located when ICU’s data is not built as a shared library. The default ICU data directory is specified by the environment variableICU_DATA. Most configurations of ICU do not require this argument. –uchars- Set the output trie type to UChar. Mutually exclusive with
–bytes. –bytes- Set the output trie type to Bytes. Mutually exclusive with
–uchars. –transform- Set the transform type. Should only be specified with
–bytes.Currently supported transforms are:offset-<hex-number>,which specifies an offset to subtract from all input characters. It should be noted that the offset transform also maps U+200D to 0xFF and U+200C to 0xFE, in order to offer compatibility to languages that require these characters. A transform must be specified for a bytes trie, and when applied to the non-value characters in the input-file must produce output between 0x00 and 0xFF. input-file- The source file to read.
output-file- The file to write the output dictionary to.
CAVEATS
The input-file is assumed to be encoded in UTF-8. The integers in the input-file that are used as values must be made up of ASCII digits. They may be specified either in hex, by using a 0x prefix, or in decimal. Either –bytes or –uchars must be specified.
ENVIRONMENT
ICU_DATA- Specifies the directory containing ICU data. Defaults to
${prefix}/share/icu/70.1/. Some tools in ICU depend on the presence of the trailing slash. It is thus important to make sure that it is present ifICU_DATAis set.
AUTHORS
Maxime Serrano
VERSION
1.0
COPYRIGHT
Copyright (C) 2012 International Business Machines Corporation and others
SEE ALSO
http://www.icu-project.org/userguide/boundaryAnalysis.html
