download-entities (1) - Linux Man Pages
download-entities - download and parse XML Entity definitions
$ perl download-entities.pl -i # interactive
$ perl download-entities.pl > output-file.pm
$ perl download-entities.pl output-file.pm
# instead of http://www.w3.org/2003/entities/iso9573-2003/
$ perl download-entities.pl http://my.server.com/entities.html
DESCRIPTION
This script downloads the definitions of XML entities from http://www.w3.org/2003/entities/iso9573-2003/ or from whatever address you give it as an argument. The argument should be a URL (one that LWP::UserAgent::get can access) pointing to a document with (absolute or relative) references to files ending with the ".ent" suffix. These files are expected to be DTDs with lines like
<!ENTITY amp "&" >
The script parses these files and prints a Perl module to the standard output. If you wish, you can give "file" as another argument to the script and it will then print the module to "file". You can also specify the output file in the environment variable "OUTPUT_FILE".
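As an illustration of the parsing step, here is a minimal sketch of how declarations of this shape could be collected into a hash. It is not taken from download-entities.pl: the helper name parse_entities and the regular expression are assumptions, and the sketch handles only simple general entities with a quoted value (no parameter entities, no external identifiers).

```perl
use strict;
use warnings;

# Hypothetical helper, not the script's own code: collect
# name => replacement-text pairs from the text of a .ent file.
sub parse_entities {
    my ($dtd_text) = @_;
    my %entity;
    # Match <!ENTITY name "value" > with either quote style; \2 makes
    # sure the closing quote is the same character as the opening one.
    while ($dtd_text =~ /<!ENTITY\s+([\w.:-]+)\s+(["'])(.*?)\2\s*>/gs) {
        $entity{$1} = $3;
    }
    return \%entity;
}

my $sample = <<'END_DTD';
<!ENTITY amp "&" >
<!ENTITY lt '<' >
END_DTD

my $entities = parse_entities($sample);
print "$_ => $entities->{$_}\n" for sort keys %$entities;
```

A real generator would go on to serialize such a hash as Perl source; that step is omitted here because the layout of the generated module is not described in this page.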
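The index and the output file are distinguished by the presence of the "://" substring: an argument that contains it is taken as the index, any other argument as the output file. If you want to use a locally stored index file (the one with the .ent references), you can access it with a file:// URL: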
perl download-entities.pl file:///path/to/index.html
Note that the script currently distinguishes between relative and absolute paths by checking whether the href contains a "://" substring. This can lead to crashes when a link looks like href="/path/file.ent".
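The "://" test described above can be sketched as follows. The helper names has_scheme and classify_args are hypothetical, and letting a command-line argument take precedence over "OUTPUT_FILE" is an assumption; this page does not say which one wins.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string contains "://".
sub has_scheme {
    my ($string) = @_;
    return index($string, '://') >= 0 ? 1 : 0;
}

# Sort command-line arguments into an index URL and an output file.
# Fallbacks follow the description: the W3C address for the index,
# the OUTPUT_FILE environment variable for the output file.
# Assumption: an explicit argument overrides OUTPUT_FILE.
sub classify_args {
    my (@args) = @_;
    my %choice = (
        index  => 'http://www.w3.org/2003/entities/iso9573-2003/',
        output => $ENV{OUTPUT_FILE},    # undef means standard output
    );
    for my $arg (@args) {
        my $slot = has_scheme($arg) ? 'index' : 'output';
        $choice{$slot} = $arg;
    }
    return \%choice;
}

my $choice = classify_args('file:///path/to/index.html', 'Data.pm');
printf "index:  %s\noutput: %s\n", $choice->{index}, $choice->{output};

# The same test applied to hrefs: a root-relative link has no "://",
# so it is treated like any other relative link.
for my $href ('http://example.org/a.ent', 'iso-amsa.ent', '/path/file.ent') {
    printf "%-26s %s\n", $href, has_scheme($href) ? 'absolute' : 'relative';
}
```

The last loop shows why href="/path/file.ent" is a problem case: the substring test cannot tell a root-relative link from a document-relative one.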
Interactive download
In case you run into problems downloading the documents, you can try to run the script with the "-i" or "--interactive" option. This will let you skip downloads or enter alternative URLs for individual documents.
Options
Besides the "--interactive" option, this script also accepts the "--timeout" option. It specifies the timeout for LWP::UserAgent in seconds when downloading. The same setting is controlled by the "DOWNLOAD_TIMEOUT" environment variable. The default timeout (180 s) is used when none is specified.
# 10 seconds timeout - croak on failure
perl download-entities.pl --timeout 10 > XML/Entities/Data.pm
# 5 seconds timeout - croak on failure
DOWNLOAD_TIMEOUT=5 perl download-entities.pl > XML/Entities/Data.pm
# 1 second timeout - ask on failure
perl download-entities.pl --interactive --timeout 1 > XML/Entities/Data.pm
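One way the timeout could be resolved from these sources is sketched below with the core Getopt::Long module. This is not the script's own option handling: the helper name resolve_options is hypothetical, and giving "--timeout" precedence over "DOWNLOAD_TIMEOUT" is an assumption, since this page does not state which of the two wins when both are set.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: take the timeout from --timeout, then from
# DOWNLOAD_TIMEOUT, then fall back to the documented 180 seconds.
# The relative order of the first two is an assumption.
sub resolve_options {
    my ($argv, $env) = @_;
    my %opt = (interactive => 0);
    GetOptionsFromArray(
        $argv,
        'interactive|i' => \$opt{interactive},
        'timeout=i'     => \$opt{timeout},
    ) or die "Invalid command-line options\n";
    $opt{timeout} //= $env->{DOWNLOAD_TIMEOUT} // 180;
    return \%opt;
}

my $opt = resolve_options([qw(--interactive --timeout 1)], {});
print "timeout=$opt->{timeout} interactive=$opt->{interactive}\n";
# The resulting number of seconds would then be handed to the HTTP
# client, e.g. LWP::UserAgent->new(timeout => $opt->{timeout}).
```

Passing the argument list and the environment in as parameters, rather than reading @ARGV and %ENV directly, keeps the helper easy to exercise in isolation.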
Dependencies
This script has dependencies that the "XML::Entities" module itself does not have, so they are not listed in the META.yml file. These are "LWP::UserAgent", "File::Basename" and "Fatal".
COPYRIGHT
Copyright 2010 Jan Oldrich Kruza <sixtease [at] cpan.org>. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.