download-entities - download and parse XML Entity definitions


 $ perl -i # interactive
 $ perl >
 $ perl
 # instead of
 $ perl


This script downloads the definitions of XML entities from a default address or from whatever address you give it as an argument. The argument should be a URL (one that LWP::UserAgent::get can access) pointing to a document with (absolute or relative) references to files ending with the ".ent" suffix. These files are expected to be DTDs with lines like

 <!ENTITY amp "&#38;" >

The script parses these files and prints the resulting Perl module to the standard output. If you give a file name as another argument, the module is written to that file instead. You can also specify the output file in the environment variable "OUTPUT_FILE".
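The parsing step can be illustrated with a minimal sketch. It assumes that every declaration fits on one line and uses a double-quoted value, as in the example above; the helper name parse_entities is hypothetical and the real script may parse the files differently.

```perl
use strict;
use warnings;

# Hypothetical helper: extract name => value pairs from DTD text that
# contains declarations such as  <!ENTITY amp "&#38;" >
sub parse_entities {
    my ($dtd_text) = @_;
    my %entity;
    # Assumption: double-quoted values only, no parameter entities
    while ($dtd_text =~ /<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>/g) {
        $entity{$1} = $2;
    }
    return \%entity;
}

my $sample = <<'DTD';
<!ENTITY amp "&#38;" >
<!ENTITY lt "&#60;" >
DTD

my $entities = parse_entities($sample);
print "$_ => $entities->{$_}\n" for sort keys %$entities;
```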

The index and the output file are distinguished by the presence of the "://" substring. If you want to use a locally stored index file (the one with the .ent references), you can access it with a file URL:

 perl file:///path/to/index.html
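The "://" rule can be sketched as follows. The helper classify_args and the output name Entities.out are hypothetical and serve only as illustration; the sketch simply treats any argument containing "://" as the index and anything else as the output file.

```perl
use strict;
use warnings;

# Hypothetical helper: decide the role of each argument using the
# presence of "://" as the only criterion.
sub classify_args {
    my (@args) = @_;
    my %role;
    for my $arg (@args) {
        if (index($arg, '://') >= 0) {
            $role{index} = $arg;     # looks like a URL
        }
        else {
            $role{output} = $arg;    # anything else is an output file
        }
    }
    return \%role;
}

my $role = classify_args('file:///path/to/index.html', 'Entities.out');
print "index:  $role->{index}\n";
print "output: $role->{output}\n";
```

Because the test is purely textual, the order of the two arguments does not matter in this sketch.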

Note that the script currently distinguishes between relative and absolute links by checking whether the href contains the "://" substring. This can lead to crashes when the links look like href="/path/file.ent".

Also, the script assumes the links have exactly the format href="..." with double quotes.
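Both limitations can be seen in a small sketch. The helper ent_links, the base URL and the sample index page are hypothetical; the code only mimics the described behaviour (double-quoted hrefs, "://" meaning absolute) and is not taken from the script itself.

```perl
use strict;
use warnings;

# Hypothetical helper: collect .ent links from an index page.
# Only href="..." with double quotes is recognised.
sub ent_links {
    my ($html, $base) = @_;
    my @links;
    while ($html =~ /href="([^"]+\.ent)"/g) {
        my $href = $1;
        # "://" present: absolute URL; otherwise glued onto the base
        push @links, $href =~ m{://} ? $href : $base . $href;
    }
    return @links;
}

my $base = 'http://example.org/dtd/';
my $html = <<'HTML';
<a href="xhtml-lat1.ent">Latin 1</a>
<a href="http://example.org/other/special.ent">Special</a>
<a href='single-quoted.ent'>not recognised</a>
<a href="/path/file.ent">root-relative</a>
HTML

print "$_\n" for ent_links($html, $base);
```

In this sketch the single-quoted link is skipped, and the root-relative link is appended to the base to give http://example.org/dtd//path/file.ent, which is not the intended address. A URL-aware resolver such as the URI module's new_abs would be one way to avoid that, at the cost of another dependency.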

Interactive download

In case you run into problems downloading the documents, you can try to run the script with the "-i" or "--interactive" option. This will let you skip downloads or enter alternative URLs for individual documents.

The interactive mode is also triggered when the "INTERACTIVE" environment variable is set to a true value (in the Perl sense).
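What counts as true in the Perl sense may be surprising for environment variables, so here is a short sketch. The helper is_interactive is hypothetical; it only applies Perl's ordinary boolean test to a string value.

```perl
use strict;
use warnings;

# Hypothetical helper: Perl truthiness of a string value.
# undef, "" and "0" are false; everything else is true,
# including "false" and "00".
sub is_interactive {
    my ($value) = @_;
    return $value ? 1 : 0;
}

for my $value ('1', 'yes', 'false', '00', '0', '') {
    printf "INTERACTIVE=%-8s => %s\n", "'$value'",
        is_interactive($value) ? 'interactive' : 'non-interactive';
}
```

Setting INTERACTIVE=0 or leaving it empty therefore keeps the mode off, while INTERACTIVE=false would switch it on.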


Besides the "--interactive" option, this script also accepts the "--timeout" option. It specifies the timeout for LWP::UserAgent in seconds when downloading. The same setting is controlled by the "DOWNLOAD_TIMEOUT" environment variable. If neither is specified, the default timeout of 180 seconds is used.

 # 10 seconds timeout - croak on failure
 perl --timeout 10 > XML/Entities/
 # 5 seconds timeout - croak on failure
 DOWNLOAD_TIMEOUT=5 perl > XML/Entities/
 # 1 second timeout - ask on failure
 perl --interactive --timeout 1 > XML/Entities/
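The way the timeout is chosen can be sketched like this. The helper resolve_timeout is hypothetical, and the precedence shown (command-line option first, then the environment variable, then the default) is an assumption; the text above does not say which one wins when both are given.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: pick the timeout from --timeout, then from
# DOWNLOAD_TIMEOUT, then fall back to 180 seconds.
# Assumption: the option overrides the environment variable.
sub resolve_timeout {
    my ($argv, $env) = @_;
    my ($timeout, $interactive);
    GetOptionsFromArray(
        $argv,
        'timeout=i'     => \$timeout,
        'interactive|i' => \$interactive,
    ) or die "Invalid options\n";
    $timeout = $env->{DOWNLOAD_TIMEOUT} unless defined $timeout;
    $timeout = 180                      unless defined $timeout;
    return $timeout;
}

print resolve_timeout(['--timeout', '10'], {}), "\n";
print resolve_timeout([], { DOWNLOAD_TIMEOUT => 5 }), "\n";
print resolve_timeout(['--interactive'], {}), "\n";
```

The resulting number would then be handed to LWP::UserAgent, for example through its timeout attribute.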


This script has dependencies that the "XML::Entities" module does not have; they are therefore not mentioned in the META.yml file. These are "LWP::UserAgent", "File::Basename" and "Fatal".
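Since these modules are not declared in META.yml, it may be worth checking for them before running the script. The following sketch does that with a hypothetical helper, missing_modules, which tries to load each module and reports those that fail.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my (@modules) = @_;
    my @missing;
    for my $module (@modules) {
        # Turn Foo::Bar into Foo/Bar.pm so that require accepts it
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(LWP::UserAgent File::Basename Fatal));
print @missing
    ? "Missing: @missing\n"
    : "All dependencies are installed\n";
```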


Copyright 2010 Jan Oldrich Kruza <sixtease [at]>. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.