pdf2djvu (1) Linux Manual Page
pdf2djvu – creates DjVu files from PDF files
Synopsis
- pdf2djvu [{-o | --output} output-djvu-file] [option…] pdf-file…
- pdf2djvu {-i | --indirect} index-djvu-file [option…] pdf-file…
- pdf2djvu {--version | --help | -h}
Description
Options
pdf2djvu accepts the following options:
Document type, file names
-o, --output=output-djvu-file
- Generate a bundled multi-page document. Write the file into output-djvu-file instead of standard output.
-i, --indirect=index-djvu-file
- Generate an indirect multi-page document. Use index-djvu-file as the index file name; put the component files into the same directory. The directory must exist and be writable.
--pageid-template=template
- Specifies the naming scheme for page identifiers. Consult the “TEMPLATE LANGUAGE” section for the template language description.
The default template is “p{page:04*}.djvu”.
For portability reasons, page identifiers:
- • must consist only of lowercase ASCII letters, digits, _, +, - and dot,
- • cannot start with a +, - or a dot,
- • cannot contain two consecutive dots,
- • must end with the .djvu or the .djv extension.
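The four portability rules above can be checked mechanically. The following is a minimal sketch, not pdf2djvu's own code; the function name is_portable_page_id is hypothetical, and it assumes the dash in the character list denotes the ASCII hyphen-minus.

```python
import re

# Rule 1: only lowercase ASCII letters, digits, '_', '+', '-' and '.'
# (assumption: the dash in the rule list is the ASCII hyphen-minus).
_ALLOWED = re.compile(r"[a-z0-9_+.-]+")


def is_portable_page_id(page_id: str) -> bool:
    """Return True if page_id satisfies all four portability rules."""
    if not _ALLOWED.fullmatch(page_id):
        return False  # rule 1: restricted character set (also rejects "")
    if page_id[0] in "+-.":
        return False  # rule 2: must not start with '+', '-' or '.'
    if ".." in page_id:
        return False  # rule 3: no two consecutive dots
    # rule 4: must end with the .djvu or the .djv extension
    return page_id.endswith((".djvu", ".djv"))
```

For instance, an identifier produced by the default template, such as p0001.djvu, passes, while P0001.djvu (uppercase letter) and p..djvu (consecutive dots) do not.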
--pageid-prefix=prefix
- Equivalent to “--pageid-template=prefix{page:04*}.djvu”.
--page-title-template=template
- Specifies the template for page titles. Consult the “TEMPLATE LANGUAGE” section for the template language description.
The default is to set no page titles.
Resolution, page size
-d, --dpi=resolution
- Sets the desired resolution to resolution dots per inch. The default is 300 dpi. The allowed range is: 72 ≤ resolution ≤ 6000.
--media-box
- Use MediaBox to determine page size. CropBox is used by default.
--page-size=widthxheight
- Sets the preferred page size to width pixels × height pixels. The actual page size may be altered in order to respect aspect ratio and DjVu limitations on resolution. (This option takes precedence over -d/--dpi.)
--guess-dpi
- Try to guess native resolution by inspecting embedded images. Use with care.
Image quality
--bg-slices=n+…+n, --bg-slices=n,…,n
- Specifies the encoding quality of the IW44 background layer. This option is similar to the -slice option of c44. Consult the c44(1) manual page for details. The default is 72+11+10+10.
--bg-subsample=n
- Specifies the background subsampling ratio. The default is 3. Valid values are integers between 1 and 12, inclusive.
--fg-colors=default
- Try to preserve all the foreground layer colors. This is the default.
--fg-colors=web
- Reduce foreground layer colors to the web palette (216 colors). This option is not recommended.
--fg-colors=n
- Use GraphicsMagick to reduce the number of distinct colors in the foreground layer to n. Valid values are integers between 1 and 4080. This option is not recommended.
--fg-colors=black
- Discard any color information from the foreground layer.
--monochrome
- Render pages as monochrome bitmaps. With this option, --bg-… and --fg-… options are not respected.
--loss-level=n
- Specifies the aggressiveness of the lossy compression. The default is 0 (lossless). Valid values are integers between 0 and 200, inclusive. This option is similar to the -losslevel option of cjb2; consult the cjb2(1) manual page for details. This option is respected only along with the --monochrome option.
--lossy
- Synonym for --loss-level=100.
--anti-alias
- Enable font and vector anti-aliasing. This option is not recommended.
Extraction
--no-metadata
- Don’t extract the metadata.
By default:
- • The following entries of the document information dictionary are extracted: Title, Author, Subject, Creator, Producer, CreationDate, ModDate. Timestamps are formatted according to RFC 3339 [1], with date and time components separated by a single space.
- • The XMP metadata is extracted (or created) and updated accordingly.
- Note
If multiple input documents are specified, only metadata of the first one is taken into account.
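As an illustration of the timestamp layout described above (an RFC 3339-style date and time separated by a single space instead of the letter T), here is a small sketch using Python's standard library. It shows only the target format; the helper name format_timestamp is hypothetical and this is not how pdf2djvu itself parses PDF dates.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(dt: datetime) -> str:
    """Format an aware datetime with a space between date and time."""
    # isoformat() normally uses "T" as the separator; sep=" " swaps in a space.
    return dt.isoformat(sep=" ", timespec="seconds")


# Example value (made up for illustration): 17 May 2023, 14:30:00, UTC+02:00.
example = datetime(2023, 5, 17, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(format_timestamp(example))  # prints "2023-05-17 14:30:00+02:00"
```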
--verbatim-metadata
- Keep the original metadata intact.
--no-outline
- Don’t extract the document outline.
--hyperlinks=border-avis
- Make hyperlink borders always visible.
By default, a hyperlink border is visible only when the mouse is over the hyperlink.
--hyperlinks=#RRGGBB
- Force the specified border color for hyperlinks.
--no-hyperlinks, --hyperlinks=none
- Don’t extract hyperlinks.
--no-text
- Don’t extract the text.
--words
- Extract the text. Record the location of every word. This is the default.
--lines
- Extract the text. Record the location of every line, rather than every word.
--crop-text
- Extract no text outside the page boundary.
--no-nfkc
- Don’t NFKC [2]-normalize the text.
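To make the effect of NFKC normalization concrete, the sketch below uses Python's standard unicodedata module. It only illustrates what the normalization form does to compatibility characters such as ligatures; the helper name nfkc is hypothetical and the snippet says nothing about how pdf2djvu applies it internally.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return text in Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# U+FB01 (LATIN SMALL LIGATURE FI) is folded into the two letters "fi",
# and U+00B2 (SUPERSCRIPT TWO) into the plain digit "2".
print(nfkc("\ufb01ne"))  # prints "fine"
print(nfkc("x\u00b2"))   # prints "x2"
```

Folding of this kind generally makes the extracted text easier to search, which is why skipping it is opt-in.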
--filter-text=command-line
- Filter the text through command-line. The provided filter must preserve whitespace, control characters and decimal digits.
This option implies --no-nfkc.
-p, --pages=page-range
- Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each sub-range is either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are numbered from 1.
The default is to convert all pages.
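The page-range syntax described above can be expanded into an explicit list of page numbers. This is a minimal sketch rather than pdf2djvu's parser; the function name parse_page_range is hypothetical, and rejecting descending sub-ranges such as 42-37 is an assumption, since the text does not say how they are treated.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "17,37-42" into a list of page numbers."""
    pages: list[int] = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        low = int(first)                  # raises ValueError on empty/garbage
        high = int(last) if dash else low
        # Pages are numbered from 1; descending ranges are rejected here
        # (assumption, not confirmed by the manual page).
        if low < 1 or high < low:
            raise ValueError(f"invalid sub-range: {part!r}")
        pages.extend(range(low, high + 1))
    return pages


print(parse_page_range("17,37-42"))  # prints [17, 37, 38, 39, 40, 41, 42]
```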
Performance
-j, --jobs=n
- Use n threads to perform conversion. The default is to use one thread.
-j0, --jobs=0
- Determine automatically how many threads to use to perform conversion.
Verbosity, help
-v, --verbose
- Display more informational messages while converting the file.
-q, --quiet
- Don’t display informational messages while converting the file.
--version
- Output version information and exit.
-h, --help
- Display help and exit.
Environment
The following environment variables affect pdf2djvu on Unix systems:
OMP_*
- Details of runtime behaviour with respect to parallelism can be controlled by several environment variables. Please refer to the OpenMP API specification [3] for details.
TMPDIR
- pdf2djvu makes heavy use of temporary files. It will store them in a directory specified by this variable. The default is /tmp.
Template Language
Template syntax
The template language is roughly modelled on the Python string formatting syntax [4]. A template is a piece of text which contains fields, surrounded by curly braces {}. Fields are replaced with appropriately formatted values when the template is evaluated. Moreover, {{ is replaced with a single { and }} is replaced with a single }.
Field syntax
Each field consists of a variable name, optionally followed by a shift, optionally followed by a format specification.
The shift is a signed (i.e. starting with a + or - character) integer.
The format specification consists of a colon, followed by a width specification.
The width specification is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content. Preceding the width specification with a zero (0) character enables zero-padding.
The width specification is optionally followed by an asterisk (*) character, which increases the minimum field width to the width of the longest possible content of the variable.
Available variables
page, spage
- Page number in the PDF document.
dpage
- Page number in the DjVu document.
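The template rules can be made concrete with a small evaluator. This is a sketch under stated assumptions, not pdf2djvu's implementation: the names render, values and max_values are hypothetical, the largest possible value of each variable is supplied by the caller, padding without a leading zero is assumed to use spaces on the left, and malformed fields are simply left untouched.

```python
import re

# "{{" or "}}" escapes, or a field: name, optional signed shift,
# optional ":" + optional "0" + width + optional "*".
_TOKEN = re.compile(
    r"\{\{|\}\}|"
    r"\{(?P<var>[a-z]+)(?P<shift>[+-]\d+)?"
    r"(?::(?P<zero>0)?(?P<width>\d+)(?P<star>\*)?)?\}"
)


def render(template: str, values: dict, max_values: dict) -> str:
    """Evaluate template; max_values gives each variable's largest value."""

    def replace(match: re.Match) -> str:
        if match.group(0) == "{{":
            return "{"
        if match.group(0) == "}}":
            return "}"
        name = match.group("var")
        shift = int(match.group("shift") or 0)
        text = str(values[name] + shift)
        width = int(match.group("width") or 0)
        if match.group("star"):
            # "*" widens the field to fit the longest possible content.
            width = max(width, len(str(max_values[name] + shift)))
        fill = "0" if match.group("zero") else " "
        return text.rjust(width, fill)

    return _TOKEN.sub(replace, template)


# Default page identifier template, page 7 of a 120-page document.
print(render("p{page:04*}.djvu", {"page": 7}, {"page": 120}))  # p0007.djvu
```

With a 12345-page document the same template would yield p00007.djvu, because the asterisk raises the minimum width from 4 to 5.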
Implementation Details
Layer separation algorithm
Unless the --monochrome option is on, pdf2djvu uses the following naïve layer separation algorithm:
- For each page, do the following:
  - 1. Raster the page into a pixmap, in the usual manner.
  - 2. Raster the page into another pixmap, omitting the following page elements:
    - • text,
    - • 1 bit-per-pixel raster images,
    - • vector elements (except fills of large areas).
  - 3. Compare both pixmaps, pixel by pixel:
    - • If their colors match, classify the pixel as a part of the background layer.
    - • Otherwise, classify the pixel as a part of the foreground layer.
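The comparison step of the algorithm above can be sketched in a few lines. The sketch assumes both renderings are already available as equally sized two-dimensional lists of pixel colors; the function name separate_layers and the pixmap representation are hypothetical, and the rendering steps themselves are out of scope here.

```python
def separate_layers(full, stripped):
    """Classify each pixel by comparing two renderings of the same page.

    full     -- the page rendered in the usual manner
    stripped -- the page rendered without text, 1 bit-per-pixel images
                and most vector elements
    Returns a mask of booleans: True = foreground, False = background.
    """
    return [
        [pixel_full != pixel_stripped
         for pixel_full, pixel_stripped in zip(row_full, row_stripped)]
        for row_full, row_stripped in zip(full, stripped)
    ]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
full = [[WHITE, BLACK], [WHITE, WHITE]]      # black pixel comes from text
stripped = [[WHITE, WHITE], [WHITE, WHITE]]  # same page without the text
print(separate_layers(full, stripped))  # prints [[False, True], [False, False]]
```

One consequence of this approach is that an omitted element whose color happens to equal the color behind it ends up in the background layer, which is part of why the algorithm is described as naïve.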
Bug Reports
If you find a bug in pdf2djvu, please report it at the issue tracker [5].
See Also
djvu(1), djvudigital(1), csepdjvu(1)
Author
Jakub Wilk <jwilk [at] jwilk.net>
- Author.
Notes
- 1. RFC 3339
- 2. NFKC
- 3. OpenMP API specification
- 4. Python string formatting syntax
- 5. the issue tracker
