download-entities (1) - Linux Man Pages

NAME

download-entities - download and parse XML Entity definitions

SYNOPSIS

 $ perl download-entities.pl -i # interactive
 $ perl download-entities.pl > output-file.pm
 $ perl download-entities.pl output-file.pm
 
 # instead of http://www.w3.org/2003/entities/iso9573-2003/
 $ perl download-entities.pl http://my.server.com/entities.html

DESCRIPTION

This script downloads the definitions of XML entities from http://www.w3.org/2003/entities/iso9573-2003/ or from whatever address you give it as an argument. The argument should be a URL (one that LWP::UserAgent::get can access) pointing to a document with (absolute or relative) references to files ending with the ".ent" suffix. These files are expected to be DTDs with lines like

 <!ENTITY amp "&#38;" >

The script parses these files and prints a Perl module to the standard output. If you wish, you can give ``file'' as another argument to the script and it will then print the module to ``file'' instead. You can also specify the output file in the environment variable "OUTPUT_FILE".
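The parsing step can be illustrated with a minimal sketch. The helper name parse_entities and the regular expression are assumptions made for this example; they are not taken from download-entities.pl, and the module the script actually prints may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from download-entities.pl): collect
# name => replacement pairs from the text of one .ent file.
sub parse_entities {
    my ($dtd_text) = @_;
    my %entity;
    # Matches declarations such as: <!ENTITY amp "&#38;" >
    while ($dtd_text =~ /<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>/g) {
        $entity{$1} = $2;
    }
    return \%entity;
}

my $sample = <<'END_DTD';
<!ENTITY amp "&#38;" >
<!ENTITY lt  "&#60;" >
END_DTD

# Print the collected pairs, one per line, sorted by entity name.
my $entities = parse_entities($sample);
printf "%s => %s\n", $_, $entities->{$_} for sort keys %$entities;
```

Redirecting the output of such a sketch to a file mirrors how the script itself is used with ``> output-file.pm''.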

The index and the output file are distinguished by the presence of a ``://'' substring: an argument that contains it is taken as the index URL. If you want to use a locally stored index file (the one with the .ent references), you can access it by saying

 perl download-entities.pl file:///path/to/index.html
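A minimal sketch of how arguments could be told apart by the ``://'' test follows. The helper name classify_args and the hash keys are hypothetical, and the sketch ignores the "OUTPUT_FILE" environment variable; the real script may handle its arguments differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split command-line arguments into the index URL
# and the output file, using the "://" test described above.
sub classify_args {
    my @args = @_;
    my %result = (
        index_url   => 'http://www.w3.org/2003/entities/iso9573-2003/',
        output_file => undef,    # undef here means "standard output"
    );
    for my $arg (@args) {
        if (index($arg, '://') >= 0) {
            $result{index_url} = $arg;      # looks like a URL
        }
        else {
            $result{output_file} = $arg;    # anything else is the output file
        }
    }
    return \%result;
}

my $r = classify_args('file:///path/to/index.html', 'output-file.pm');
print "index:  $r->{index_url}\n";
print "output: $r->{output_file}\n";
```

With this test, a local index must be given as a file:// URL, because a bare path would be taken for the output file.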

Note that the script currently distinguishes between relative and absolute paths by looking at whether the href contains a ``://'' substring. This can lead to crashes when the links look like href=``/path/file.ent''.

Also, the script assumes the links have exactly the format href=``...'' - with double quotes.
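Both limitations can be seen in a minimal sketch. The helper name ent_links, the regular expression and the way relative links are joined to the base address are assumptions made for this example, not code from download-entities.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: find href="....ent" links in an index page and
# resolve them with the naive "://" test described above.
sub ent_links {
    my ($base_url, $html) = @_;
    # Drop the file name of the index, keep everything up to the last slash.
    (my $base_dir = $base_url) =~ s{[^/]*\z}{};
    my @urls;
    while ($html =~ /href="([^"]+\.ent)"/g) {
        my $href = $1;
        # Links with "://" are used as-is, all others are appended to the base.
        push @urls, $href =~ m{://} ? $href : $base_dir . $href;
    }
    return @urls;
}

my $html = <<'END_HTML';
<a href="isoamsa.ent">relative</a>
<a href="http://example.org/dtd/isobox.ent">absolute</a>
<a href="/path/file.ent">root-relative</a>
<a href='isocyr1.ent'>single quotes, not matched</a>
END_HTML

print "$_\n" for ent_links('http://my.server.com/entities/index.html', $html);
```

In this sketch the root-relative link is appended to the base directory and becomes http://my.server.com/entities//path/file.ent, which is not the address the link refers to, and the single-quoted link is skipped entirely.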

Interactive download

In case you run into problems downloading the documents, you can try to run the script with the "-i" or "--interactive" option. This will let you skip downloads or enter alternative URLs for individual documents.

The interactive mode is also triggered when the "INTERACTIVE" environment variable is set to a true value (in the Perl sense).
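What counts as true in the Perl sense can be surprising for strings such as ``false''. The following minimal sketch uses a hypothetical helper, is_interactive, that only applies Perl's own truth test; how the script reads the variable is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a value of INTERACTIVE would count
# as true in Perl. Undefined, "" and "0" are false; any other string is true.
sub is_interactive {
    my ($value) = @_;
    return $value ? 1 : 0;
}

for my $value ('1', 'yes', 'false', '0', '') {
    printf "INTERACTIVE=%-7s => %s\n", "'$value'",
        is_interactive($value) ? 'interactive' : 'batch';
}
```

To switch the interactive mode off through the environment, unset the variable or set it to 0 or to the empty string.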

Options

Besides the "--interactive" option, this script also accepts the "--timeout" option. It specifies the timeout for LWP::UserAgent in seconds when downloading. The same setting is controlled by the "DOWNLOAD_TIMEOUT" environment variable. When neither is specified, the default timeout (180s) is used.

 # 10 seconds timeout - croak on failure
 perl download-entities.pl --timeout 10 > XML/Entities/Data.pm
 # 5 seconds timeout - croak on failure
 DOWNLOAD_TIMEOUT=5 perl download-entities.pl > XML/Entities/Data.pm
 # 1 second timeout - ask on failure
 perl download-entities.pl --interactive --timeout 1 > XML/Entities/Data.pm
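The interplay of options and environment variables can be sketched with the core module Getopt::Long. The helper name read_options is hypothetical, and the precedence shown (a command-line option overrides the environment variable) is an assumption; the text above does not say which one wins.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: read --interactive/-i and --timeout, falling back
# to the INTERACTIVE and DOWNLOAD_TIMEOUT environment variables.
# Assumption: a command-line option overrides the environment variable.
sub read_options {
    my ($argv, $env) = @_;
    my %opt = (
        interactive => $env->{INTERACTIVE} ? 1 : 0,
        timeout     => $env->{DOWNLOAD_TIMEOUT},  # undef: keep the LWP::UserAgent default
    );
    GetOptionsFromArray(
        $argv,
        'interactive|i' => \$opt{interactive},
        'timeout=i'     => \$opt{timeout},
    ) or die "Invalid options\n";
    return \%opt;
}

my $opt = read_options([ '--interactive', '--timeout', '1' ], { DOWNLOAD_TIMEOUT => 5 });
printf "interactive=%d timeout=%s\n", $opt->{interactive}, $opt->{timeout};
```

In a real script the two arguments would be \@ARGV and \%ENV; passing them in explicitly keeps the sketch easy to try with different values.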

Dependencies

This script has dependencies that the "XML::Entities" module itself does not have, so they are not mentioned in the META.yml file. These are "LWP::UserAgent", "File::Basename" and "Fatal".

COPYRIGHT

Copyright 2010 Jan Oldrich Kruza <sixtease [at] cpan.org>. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.