estcmd (1) Linux Manual Page
NAME
estcmd – command line interface of the core API
SYNOPSIS
estcmd create [-tr] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-attr name type] db
estcmd put [-tr] [-cl] [-ws] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] db [file]
estcmd out [-cl] [-pc enc] db expr
estcmd edit [-pc enc] db expr name [value]
estcmd get [-nl|-nb] [-pidx path] [-pc enc] db expr [attr]
estcmd list [-nl|-nb] [-lp] db
estcmd uriid [-nl|-nb] [-pidx path] [-pc enc] db expr
estcmd meta db [name [value]]
estcmd inform [-nl|-nb] db
estcmd optimize [-onp] [-ond] db
estcmd merge [-cl] db target
estcmd repair [-rst|-rsh] db
estcmd search [-nl|-nb] [-pidx path] [-ic enc] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-sn wnum hnum anum] [-kn num] [-um] [-ec rn] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi] [-hs] [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [-sim id] db [phrase]
estcmd gather [-tr] [-cl] [-ws] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd] [-fz] [-fo] [-rm sufs] [-ic enc] [-il lang] [-bc] [-lt num] [-lf num] [-pc enc] [-px name] [-aa name value] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-ss name] [-sd] [-cm] [-cs num] [-ncm] [-kn num] [-um] db [file|dir]
estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]
estcmd extkeys [-no] [-fc] [-dfdb file] [-ncm] [-ni] [-kn num] [-um] [-attr expr] db [prefix]
estcmd words [-nl|-nb] [-dfdb file] [-kw|-kt] db
estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc] [-lt num] [-kn num] [-um] [file]
estcmd break [-ic enc] [-il lang] [-apn|-acc] [-wt] [file]
estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]
estcmd regex [-inv] [-repl str] expr [file]
estcmd scandir [-tf|-td] [-pa|-pu] [dir]
estcmd multi [-db db] [-nl|-nb] [-ic enc] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi] [-hs] [-hu] [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [phrase]
estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum
estcmd wicked db dnum
estcmd regression db
estcmd version
DESCRIPTION
estcmd is an aggregation of sub commands. The name of a sub command is specified by the first argument. Other arguments are parsed according to each sub command. The argument db specifies the path of an index.
estcmd create [-tr] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-attr name type] db - Create an index.
If -tr is specified, a new index is created regardless of whether one already exists.
If -apn is specified, N-gram analysis is performed on European text as well.
If -acc is specified, character category analysis is performed instead of N-gram analysis.
If -xs is specified, the index is tuned to register fewer than 50000 documents.
If -xl is specified, the index is tuned to register more than 300000 documents.
If -xh is specified, the index is tuned to register more than 1000000 documents.
If -xh2 is specified, the index is tuned to register more than 5000000 documents.
If -xh3 is specified, the index is tuned to register more than 10000000 documents.
If -sv is specified, scores are stored as void.
If -si is specified, scores are stored as 32-bit integers.
If -sa is specified, scores are stored as-is and marked not to be tuned when searching.
-attr specifies an attribute index and its data type. This option can be specified multiple times.
estcmd put [-tr] [-cl] [-ws] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] db [file] - Register a document written in document draft format to an index.
file specifies a target file. If it is omitted, the standard input is read.
If -tr is specified, a new index is created regardless of whether one already exists.
If -cl is specified, regions of an overwritten document are cleaned up.
If -ws is specified, scores are weighted statically with the score weighting attribute.
If -apn is specified, N-gram analysis is performed on European text as well.
If -acc is specified, character category analysis is performed instead of N-gram analysis.
If -xs is specified, the index is tuned to register fewer than 50000 documents.
If -xl is specified, the index is tuned to register more than 300000 documents.
If -xh is specified, the index is tuned to register more than 1000000 documents.
If -xh2 is specified, the index is tuned to register more than 5000000 documents.
If -xh3 is specified, the index is tuned to register more than 10000000 documents.
If -sv is specified, scores are stored as void.
If -si is specified, scores are stored as 32-bit integers.
If -sa is specified, scores are stored as-is and marked not to be tuned when searching.
estcmd out [-pc enc] [-cl] db expr - Remove information of a document from an index.
expr specifies the ID number, the URI, or the local path of a document.
If -cl is specified, regions of the document are cleaned up.
-pc specifies the encoding of file paths. By default, it is ISO-8859-1.
estcmd edit [-pc enc] db expr name [value] - Edit an attribute of a document in an index.
expr specifies the ID number, the URI, or the local path of a document.
name specifies the name of an attribute.
value specifies the value of the attribute. If it is omitted, the attribute is removed.
-pc specifies the encoding of the file path and the attribute value. By default, it is ISO-8859-1.
estcmd get [-nl|-nb] [-pidx path] [-pc enc] db expr [attr] - Output the document draft of a document in an index.
expr specifies the ID number, the URI, or the local path of a document.
If attr is specified, only the value of the attribute is output.
If -nl is specified, the index is opened without file locking.
If -nb is specified, file locking is performed without blocking.
-pidx specifies the path of a pseudo index. This option can be specified multiple times.
-pc specifies the encoding of file paths. By default, it is ISO-8859-1.
estcmd list [-nl|-nb] [-lp] db - Output a list of all documents in an index.
If -nl is specified, the index is opened without file locking.
If -nb is specified, file locking is performed without blocking.
If -lp is specified, the local path equivalent to each "file://" URL is output.
estcmd uriid [-nl|-nb] [-pidx path] [-pc enc] db expr - Output the ID number of a document specified by its URI.
expr specifies the URI or the local path of a document.
If -nl is specified, the index is opened without file locking.
If -nb is specified, file locking is performed without blocking.
-pidx specifies the path of a pseudo index. This option can be specified multiple times.
-pc specifies the encoding of file paths. By default, it is ISO-8859-1.
estcmd meta db [name [value]] - Handle meta data.
name specifies the name of a piece of meta data. If it is omitted, a list of all names is output.
value specifies the value of the meta data to be recorded. If it is omitted, the current value is output. If it is an empty string, the meta data is removed.
estcmd inform [-nl|-nb] db - Output the number of documents and the number of unique words in an index.
If -nl is specified, the index is opened without file locking.
If -nb is specified, file locking is performed without blocking.
estcmd optimize [-onp] [-ond] db - Optimize an index and clean up dispensable regions.
If -onp is specified, cleaning up dispensable regions is omitted.
If -ond is specified, optimizing the database files is omitted.
estcmd merge [-cl] db target - Merge another index into an index.
target specifies the path of another index.
If -cl is specified, regions of overwritten documents are cleaned up.
estcmd repair [-rst|-rsh] db - Repair a broken index.
If -rst is specified, a strict consistency check is performed.
If -rsh is specified, the consistency check is omitted.
estcmd search [-nl|-nb] [-pidx path] [-ic enc] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-sn wnum hnum anum] [-kn num] [-um] [-ec rn] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi] [-hs] [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [-sim id] db [phrase] - Search an index for documents.
phrase specifies the search phrase.
If -nl is specified, the index is opened without file locking.
If -nb is specified, file locking is performed without blocking.
-pidx specifies the path of a pseudo index. This option can be specified multiple times.
-ic specifies the input encoding. By default, it is UTF-8.
If -vu is specified, TSV of ID numbers and URIs is output.
If -va is specified, multipart format including attributes is output.
If -vf is specified, multipart format including document draft is output.
If -vs is specified, multipart format including attributes and snippets is output.
If -vh is specified, human-readable format including attributes and snippets is output.
If -vx is specified, XML including attributes and snippets is output.
If -dd is specified, document draft data are dumped and saved into separate files.
-sn specifies the overall width of each snippet (wnum), the width of text picked up from the beginning of the document (hnum), and the width of text picked up around each highlighted word (anum).
-kn specifies the number of keywords to be extracted. By default, keyword extraction is not performed.
If -um is specified, morphological analyzers are used for keyword extraction.
-ec specifies the lower limit of similarity eclipse.
If -gs is specified, every N-gram key is checked. By default, every second key is checked.
If -gf is specified, every third N-gram key is checked.
If -ga is specified, every fourth N-gram key is checked.
If -cd is specified, each document is checked to confirm that it definitely matches the search phrase.
If -ni is specified, TF-IDF tuning is omitted.
If -sf is specified, the phrase is treated as a simplified form.
If -sfr is specified, the phrase is treated as a rough form.
If -sfu is specified, the phrase is treated as a union form.
If -sfi is specified, the phrase is treated as an intersection form.
If -hs is specified, score information is output as an attribute.
-attr specifies an attribute search condition. This option can be specified multiple times.
-ord specifies the order expression. By default, documents are sorted by score in descending order.
-max specifies the maximum number of documents shown. A negative value means unlimited. By default, it is 10.
-sk specifies the number of documents to be skipped. By default, it is 0.
-aux specifies permission to adopt results of the auxiliary index. If it is 0 or less, the auxiliary index is not used. By default, it is 32.
-dis specifies the name of the distinct attribute.
-sim specifies the ID number of the seed document for similarity search.
estcmd gather [-tr] [-cl] [-ws] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd] [-fz] [-fo] [-rm sufs] [-ic enc] [-il lang] [-bc] [-lt num] [-lf num] [-pc enc] [-px name] [-aa name value] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-ss name] [-sd] [-cm] [-cs num] [-ncm] [-kn num] [-um] db [file|dir] - Scan the local file system and register documents into an index.
If the third argument is the name of a file, a list of paths of target documents is read from it. If it is "-", the list is read from the standard input.
If the third argument is the name of a directory, all files under the directory are treated as target documents.
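When a list file is used, any script can produce it. The Python sketch below writes one path per line, optionally followed by tab-separated attribute values for the -px option described below, and builds a matching argument list. The index name "casket", the paths, and the attribute names are hypothetical; passing -pc UTF-8 assumes the list is written in UTF-8, since the default path encoding is ISO-8859-1.

```python
def build_path_list(entries):
    """Render (path, [attribute values...]) pairs as TSV lines for estcmd gather.

    The first field is the document path; the remaining fields are attribute
    values whose names are given, in order, by repeated -px options.
    """
    lines = []
    for path, values in entries:
        fields = [path, *values]
        # A tab or newline inside a field would break the TSV layout.
        for field in fields:
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)


def gather_argv(db, list_file, attr_names):
    """Build gather arguments that read paths from list_file ("-" means stdin).

    -pc UTF-8 assumes the list file itself is UTF-8 encoded.
    """
    argv = ["estcmd", "gather", "-pc", "UTF-8"]
    for name in attr_names:
        argv += ["-px", name]
    argv += [db, list_file]
    return argv
```

Writing build_path_list([("/data/a.txt", ["Alice", "draft"])]) to a file named paths.tsv and running the argument list from gather_argv("casket", "paths.tsv", ["author", "status"]) would register /data/a.txt with author=Alice and status=draft, if the -px mapping works as described.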
If -tr is specified, a new index is created regardless of whether one already exists.
If -cl is specified, regions of overwritten documents are cleaned up.
If -ws is specified, scores are weighted statically with the score weighting attribute.
If -no is specified, operations are printed but not actually executed.
If -fe is specified, target files are treated as document draft. By default, the format of each document is detected by its suffix.
If -ft is specified, target files are treated as plain text.
If -fh is specified, target files are treated as HTML.
If -fm is specified, target files are treated as MIME.
If -fx is specified, target files with the specified suffixes are processed by the specified outer command. "*" matches any file. If the command is prefixed with "T@", the output of the command is treated as plain text. If it is prefixed with "H@", the output is treated as HTML. If it is prefixed with "M@", the output is treated as MIME. Otherwise, the output is treated as document draft. This option can be specified multiple times.
If -fz is specified, documents that do not match any -fx condition are ignored.
If -fo is specified, target files are not read. This is useful for efficient processing by the outer command.
If -rm is specified, target files with the specified suffixes are removed. "*" matches any file. This option can be specified multiple times.
-ic specifies the input encoding. By default, it is detected automatically.
-il specifies the preferred input language. By default, English is preferred.
If -bc is specified, binary files are detected and ignored.
-lt specifies the text size limit in kilobytes. By default, it is 128KB. If it is negative, the size is unlimited.
-lf specifies the file size limit in megabytes. By default, it is 32MB. If it is negative, the size is unlimited.
-pc specifies the encoding of file paths. By default, it is ISO-8859-1.
-px specifies the name of an attribute read from the list of paths. The list of paths can be in TSV format: the first field is treated as the path of a target document, and the second and subsequent fields are attribute values. -px gives the names of those values, in order from the second field. This option can be specified multiple times.
-aa specifies the name and the value of an additional attribute. This option can be specified multiple times.
If -apn is specified, N-gram analysis is performed on European text as well.
If -acc is specified, character category analysis is performed instead of N-gram analysis.
If -xs is specified, the index is tuned to register fewer than 50000 documents.
If -xl is specified, the index is tuned to register more than 300000 documents.
If -xh is specified, the index is tuned to register more than 1000000 documents.
If -xh2 is specified, the index is tuned to register more than 5000000 documents.
If -xh3 is specified, the index is tuned to register more than 10000000 documents.
If -sv is specified, scores are stored as void.
If -si is specified, scores are stored as 32-bit integers.
If -sa is specified, scores are stored as-is and marked not to be tuned when searching.
-ss specifies the name of an attribute used as a substitute score.
If -sd is specified, the modification date of each file is recorded as an attribute.
If -cm is specified, documents whose modification date has not changed are ignored.
-cs specifies the size of the cache memory in megabytes. By default, it is 64MB.
If -ncm is specified, checking the availability of virtual memory is omitted.
-kn specifies the number of keywords to be extracted. By default, keyword extraction is not performed.
If -um is specified, morphological analyzers are used for keyword extraction.
estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix] - Purge information of documents that no longer exist on the file system.
If prefix is specified, only documents whose URIs begin with it are targeted. It can also be given as the local path of a directory.
If -cl is specified, regions of the deleted documents are cleaned up.
If -no is specified, operations are printed but not actually executed.
If -fc is specified, information of all target documents is deleted.
-pc specifies the encoding of file paths. By default, it is ISO-8859-1.
-attr specifies an attribute search condition. This option can be specified multiple times.
estcmd extkeys [-no] [-fc] [-dfdb file] [-ncm] [-ni] [-kn num] [-um] [-attr expr] db [prefix] - Create a database of keywords extracted from documents.
If prefix is specified, only documents whose URIs begin with it are targeted.
If -no is specified, operations are printed but not actually executed.
If -fc is specified, all target documents are processed whether or not they already have records.
-dfdb specifies an outer database of document frequency. By default, document frequency is calculated dynamically from the index.
If -ncm is specified, checking the availability of virtual memory is omitted.
If -ni is specified, TF-IDF tuning is omitted.
-kn specifies the number of keywords to be extracted. By default, it is 32.
If -um is specified, morphological analyzers are used for keyword extraction.
-attr specifies an attribute search condition. This option can be specified multiple times.
estcmd words [-nl|-nb] [-dfdb file] [-kw|-kt] db - Output a list of all unique words and the record size of each, which is treated as document frequency.
If -nl is specified, the index is opened without file locking.
If -nb is specified, file locking is performed without blocking.
-dfdb specifies an outer database where the result is stored. By default, the result is written to the standard output as TSV. If the outer database already exists, the value of each record is incremented.
If -kw is specified, keywords and the numbers of corresponding documents are output.
If -kt is specified, keywords and their related terms are output.
estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc] [-lt num] [-kn num] [-um] [file] - For test and debug.
estcmd break [-ic enc] [-il lang] [-apn|-acc] [-wt] [file] - For test and debug.
estcmd iconv [-ic enc] [-il lang] [-oc enc] [file] - For test and debug.
estcmd regex [-inv] [-repl str] expr [file] - For test and debug.
estcmd scandir [-tf|-td] [-pa|-pu] [dir] - For test and debug.
estcmd multi [-db db] [-nl|-nb] [-ic enc] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi] [-hs] [-hu] [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [phrase] - For test and debug.
estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum - For test and debug.
estcmd wicked db dnum - For test and debug.
estcmd regression db - For test and debug.
estcmd version - Show the version information.
All sub commands return 0 if the operation succeeds, and 1 otherwise. The put, out, gather, purge, randput, wicked, and regression sub commands close the database before exiting when they catch signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), 13 (SIGPIPE), or 15 (SIGTERM).
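A minimal Python sketch of calling estcmd from a script while relying only on the exit status and signal behavior described above: a non-zero status is reported as an error, and a run that exceeds a timeout is sent SIGTERM (Popen.terminate() on POSIX) so that sub commands such as gather get the chance to close the database. The function name run_estcmd and its parameters are hypothetical.

```python
import subprocess


def run_estcmd(args, timeout=None, exe="estcmd"):
    """Run `exe` with args (a sub command and its options) and return stdout.

    Raises RuntimeError when the exit status is non-zero (estcmd uses 1 for
    failure) or when the timeout expires. On timeout the child receives
    SIGTERM via terminate(); put, out, gather, purge and the other sub
    commands listed above close the database when they catch it.
    """
    proc = subprocess.Popen(
        [exe, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()    # SIGTERM on POSIX
        proc.communicate()  # wait for the child to exit and drain its pipes
        raise RuntimeError(f"{exe} {args[0]} timed out (status {proc.returncode})")
    if proc.returncode != 0:
        raise RuntimeError(f"{exe} {args[0]} failed (status {proc.returncode}): {err.strip()}")
    return out
```

For example, run_estcmd(["gather", "-cl", "casket", "/data"], timeout=3600) would index a hypothetical /data directory into a hypothetical index named casket and stop it after an hour. The exe parameter exists only so the wrapper can be exercised with other programs.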
The data type of an attribute index specified by the -attr option of the create sub command should be "seq" for the sequential type, "str" for the string type, or "num" for the number type.
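The following Python sketch assembles a create command line from a list of attribute indexes, rejects types other than the three listed above, and picks a size-tuning option from an expected document count using the thresholds given for -xs through -xh3. The helper names, the attribute names, and the index name "casket" are hypothetical. Choosing the largest applicable threshold, and using no option between 50000 and 300000, is our reading of the option descriptions.

```python
# Attribute index types accepted by `estcmd create -attr name type`.
ATTR_TYPES = {"seq", "str", "num"}


def tuning_option(expected_docs):
    """Return the size-tuning option for an expected number of documents.

    Thresholds follow the -xs/-xl/-xh/-xh2/-xh3 descriptions; counts from
    50000 to 300000 get no option (None).
    """
    if expected_docs > 10_000_000:
        return "-xh3"
    if expected_docs > 5_000_000:
        return "-xh2"
    if expected_docs > 1_000_000:
        return "-xh"
    if expected_docs > 300_000:
        return "-xl"
    if expected_docs < 50_000:
        return "-xs"
    return None


def create_argv(db, attr_indexes, expected_docs):
    """Build `estcmd create` arguments; attr_indexes is a list of (name, type)."""
    argv = ["estcmd", "create"]
    option = tuning_option(expected_docs)
    if option is not None:
        argv.append(option)
    for name, dtype in attr_indexes:
        if dtype not in ATTR_TYPES:
            raise ValueError(f"unknown attribute index type {dtype!r} for {name!r}")
        argv += ["-attr", name, dtype]
    argv.append(db)
    return argv
```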
Each pseudo index specified by the -pidx option of the search sub command and others is a directory containing files of document draft. If you search a main index together with pseudo indexes, a meta search over the main index and the pseudo indexes is performed.
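A pseudo index can therefore be prepared without estcmd. This Python sketch writes one file per document into a directory. The draft layout it assumes (attribute lines such as "@uri=...", an empty line, then the body text, in UTF-8) follows Hyper Estraier's document draft format as we understand it and is not defined on this page; the .est file names and the example URIs are hypothetical.

```python
import tempfile
from pathlib import Path


def draft_text(attrs, body):
    """Render a document draft: "name=value" lines, an empty line, then the text.

    Assumption: system attribute names such as "@uri" and "@title" carry
    their leading "@" as part of the name.
    """
    lines = []
    for name, value in attrs.items():
        if "\n" in value or "=" in name:
            raise ValueError(f"invalid attribute {name!r}")
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n\n" + body + "\n"


def make_pseudo_index(directory, docs):
    """Create a pseudo index directory holding one draft file per document.

    docs is a list of (attrs, body) pairs; each attrs dict should include "@uri".
    Returns the sorted file names in the directory.
    """
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    for i, (attrs, body) in enumerate(docs, start=1):
        (root / f"{i:05d}.est").write_text(draft_text(attrs, body), encoding="utf-8")
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pidx = Path(tmp) / "pidx"
        docs = [({"@uri": "http://example.com/a", "@title": "A"}, "alpha text")]
        print(make_pseudo_index(pidx, docs))
        # The directory could then be passed as: estcmd search -pidx <pidx> casket alpha
```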
The encoding name specified by the -ic option should be a name registered with the IETF, such as UTF-8 or ISO-8859-1. The language name specified by the -il option should be one of "en" (English), "ja" (Japanese), "zh" (Chinese), or "ko" (Korean).
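Before passing user-supplied values to -ic or -il, a wrapper script might sanity-check them. This Python sketch uses the interpreter's codec registry as a rough stand-in for "registered with the IETF"; that is only an approximation, since estcmd may accept names Python does not know and vice versa. The function name input_options is hypothetical.

```python
import codecs

# Values accepted by -il, as listed on this manual page.
LANGUAGES = {"en": "English", "ja": "Japanese", "zh": "Chinese", "ko": "Korean"}


def input_options(encoding=None, language=None):
    """Return -ic/-il arguments after basic validation.

    codecs.lookup() raises LookupError for names Python does not recognize;
    this only approximates estcmd's own set of accepted encodings.
    """
    args = []
    if encoding is not None:
        codecs.lookup(encoding)
        args += ["-ic", encoding]
    if language is not None:
        if language not in LANGUAGES:
            raise ValueError(f"-il expects one of {sorted(LANGUAGES)}, got {language!r}")
        args += ["-il", language]
    return args
```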
The outer command specified by the -fx option of gather receives the path of the target document as the first argument and the path for its output as the second argument. The original path of the target document is given as the value of the environment variable "ESTORIGFILE".
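A sketch of an outer command written in Python, following the calling convention above: it reads the file named by its first argument, writes plain text to the path in its second argument, and uses ESTORIGFILE to record the original file name. Registering it with a "T@" prefix makes gather treat the output as plain text. The script name csv2text.py, its install path, and the choice of CSV input are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical -fx filter: render a CSV file as plain text for estcmd gather.

It might be registered as, e.g. (suffix-list syntax assumed):
    estcmd gather -fx .csv T@/usr/local/bin/csv2text.py casket /data
"""
import csv
import os
import sys


def convert(in_path, out_path):
    """Write a plain-text rendering of the CSV file in_path to out_path."""
    # Original document path as set by gather; fall back to the input path.
    orig = os.environ.get("ESTORIGFILE", in_path)
    with open(in_path, newline="", encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"source: {os.path.basename(orig)}\n")
        for row in csv.reader(src):
            dst.write(" ".join(cell.strip() for cell in row) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
    else:
        print("usage: csv2text.py INPUT OUTPUT", file=sys.stderr)
```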
Note that similarity search is very slow by default. To improve its performance, running "estcmd extkeys" beforehand is strongly recommended.
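Once keywords have been extracted, a script can run similarity searches by document ID. This Python sketch optionally runs extkeys, then runs search with -vu and -sim, and collects (ID, URI) pairs from the TSV lines. The function names are hypothetical, the subprocess runner is injectable so the logic can be exercised without estcmd, and which lines -vu prints around the TSV rows is an assumption, so lines that do not look like "ID<TAB>URI" are skipped.

```python
import subprocess


def parse_vu(output):
    """Collect (id, uri) pairs from `estcmd search -vu` output.

    Only lines shaped like "<digits><TAB><uri>" are kept; anything else
    printed around the result list is skipped.
    """
    hits = []
    for line in output.splitlines():
        doc_id, sep, uri = line.partition("\t")
        if sep and doc_id.isdigit():
            hits.append((int(doc_id), uri))
    return hits


def similar_documents(db, seed_id, max_docs=10, extract_keys=True, run=subprocess.run):
    """Optionally run extkeys, then a -sim search; return (id, uri) pairs."""
    steps = []
    if extract_keys:
        steps.append(["estcmd", "extkeys", db])
    steps.append(["estcmd", "search", "-vu", "-sim", str(seed_id), "-max", str(max_docs), db])
    output = ""
    for argv in steps:
        proc = run(argv, capture_output=True, text=True)
        if proc.returncode != 0:  # estcmd exits with 1 on failure
            raise RuntimeError(f"{' '.join(argv)} exited with status {proc.returncode}")
        output = proc.stdout
    return parse_vu(output)
```

Running extkeys before every search is wasteful on an index that has not changed; calling similar_documents(..., extract_keys=False) after an initial extraction is the more likely pattern.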
SEE ALSO
estconfig(1), estmaster(1), estcall(1), estwaver(1), estraier(3), estnode(3)
Please see http://hyperestraier.sourceforge.net/uguide-en.html for details.
