libraptor (3) Linux Manual Page
NAME
libraptor – Raptor RDF parser and serializer library
SYNOPSIS
#include <raptor.h>
raptor_init();
raptor_parser *p=raptor_new_parser("rdfxml");
raptor_set_statement_handler(p,NULL,print_triples);
raptor_uri *file_uri=raptor_new_uri("http://example.org/");
raptor_parse_file(p,file_uri,base_uri);
raptor_parse_uri(p,uri,NULL);
raptor_free_parser(p);
raptor_free_uri(file_uri);
raptor_finish();
cc file.c -lraptor
DESCRIPTION
The Raptor library provides a high-level interface to a set of parsers and serializers that generate Resource Description Framework (RDF) triples by parsing syntaxes, or serialize triples into syntaxes.
The supported parsing syntaxes are RDF/XML, N-Triples, Turtle and RSS tag soup (including Atom 0.3); the serializing syntaxes are RDF/XML, N-Triples and RSS 1.0. The RDF/XML parser can use either the expat or libxml XML parser to provide the SAX event stream. The library functions are arranged in an object-oriented style with constructors, destructors and method calls. Statements and error messages are delivered via callback functions.
Raptor contains a URI-reference parsing and resolving (not retrieval) class (raptor_uri) sufficient for dealing with URI-references inside RDF. This functionality is modular and can be transparently replaced with another existing and compatible URI implementation.
It also provides a URI-retrieval class (raptor_www) that wraps an existing library such as libcurl, libxml2 or BSD libfetch to provide full or partial retrieval of data from URIs, and an I/O stream abstraction (raptor_iostream) for supporting serializing to a variety of outputs.
Raptor uses Unicode strings for RDF literals and URIs and preserves them throughout the library. It uses the UTF-8 encoding of Unicode at the API for passing in or returning Unicode strings. It is intended that the preservation of Unicode for URIs will support Internationalized Resource Identifiers (IRIs) which are still under development and standardisation.
LIBRARY INITIALISATION AND CLEANUP
raptor_init()
raptor_finish()- Initialise and clean up the library. raptor_init() must be called before any raptor class such as raptor_parser or raptor_uri is created or used; raptor_finish() is called last, after those objects have been freed.
PARSER CLASS
This class provides the functionality of turning syntaxes into RDF triples – RDF parsing.
PARSER CONSTRUCTORS
raptor_parser* raptor_new_parser(name)- Create a new raptor parser object for the parser named name, currently one of "rdfxml", "ntriples", "turtle" or "rss-tag-soup" (the RSS Tag Soup parser).
raptor_parser* raptor_new_parser_for_content(raptor_uri *uri, const char *mime_type, const unsigned char *buffer, size_t len, const unsigned char *identifier)- Create a new raptor parser object for a syntax identified by URI uri, MIME type mime_type, some initial content buffer of size len or content with identifier identifier. See the raptor_guess_parser_name description for further details.
PARSER DESTRUCTOR
void raptor_free_parser(raptor_parser *parser)- Destroy a Raptor parser object.
PARSER MESSAGE CALLBACK METHODS
Several methods can be registered for the parser that return a variable-argument message in the style of printf(3). These also return a raptor_locator that can contain the URI, file, line, column and byte counts of the place the message is about. This structure can be used with the raptor_format_locator and raptor_print_locator functions below, or its fields, which are defined in raptor.h, can be read directly.
void raptor_set_fatal_error_handler(raptor_parser *parser, void *user_data, raptor_message_handler handler)- Set the parser fatal error handler callback.
void raptor_set_error_handler(raptor_parser *parser, void *user_data, raptor_message_handler handler)- Set the parser non-fatal error handler callback.
void raptor_set_warning_handler(raptor_parser *parser, void *user_data, raptor_message_handler handler)- Set the parser warning message handler callback.
void raptor_set_namespace_handler(raptor_parser *parser, void *user_data, raptor_namespace_handler handler)- Set the namespace declaration handler callback.
PARSER STATEMENT CALLBACK METHOD
The parser allows the registration of a callback function to return the statements to the application.
void raptor_set_statement_handler(raptor_parser *parser, void *user_data, raptor_statement_handler handler)- Set the statement callback function for the parser. The raptor_statement structure is defined in raptor.h and includes fields for the subject, predicate and object of the statement along with their types and, for literals, the language and datatype.
PARSER PARSING METHODS
These methods perform the entire parsing in one call. Statements, warnings, errors and fatal errors are delivered via the registered statement, error etc. handler functions.
In these methods, the base URI is required for the RDF/XML parser (name "rdfxml") and the Turtle parser (name "turtle"). The N-Triples parser (name "ntriples") and the RSS Tag Soup parser (name "rss-tag-soup") do not use it.
int raptor_parse_file(raptor_parser *parser, raptor_uri *uri, raptor_uri *base_uri)- Parse the given filename (a URI like file:filename) according to the optional base URI base_uri. If uri is NULL, read from standard input; base_uri is then required.
int raptor_parse_file_stream(raptor_parser *parser, FILE *stream, const char *filename, raptor_uri *base_uri)- Parse the given C FILE* stream according to the base URI base_uri (required). filename is optional and, if given, is used for error messages via the raptor_locator structure.
int raptor_parse_uri(raptor_parser *parser, raptor_uri *uri, raptor_uri *base_uri)- Parse the URI according to the base URI base_uri, or NULL if not needed. If no base URI is given, the uri is used. This method depends on the raptor_www subsystem (see the WWW Class section below) and an existing underlying URI retrieval implementation such as libcurl, libxml or BSD libfetch to retrieve the content.
PARSER CHUNKED PARSING METHODS
These methods perform the parsing in parts by working on multiple chunks of memory passed by the application. Statements, warnings, errors and fatal errors are delivered via the registered statement, error etc. handler functions.
int raptor_start_parse(raptor_parser *parser, const char *uri)- Start a parse of chunked content with the base URI uri, or NULL if not needed. The base URI is required for the RDF/XML parser (name "rdfxml") and the Turtle parser (name "turtle"). The N-Triples parser (name "ntriples") and the RSS Tag Soup parser (name "rss-tag-soup") do not use it.
int raptor_parse_chunk(raptor_parser *parser, const unsigned char *buffer, size_t len, int is_end)- Parse the memory at buffer of size len, returning statements via the statement handler callback. If is_end is non-zero, it indicates the end of the parsing stream. This method can only be called after raptor_start_parse().
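The chunked calling sequence can be sketched without the library itself. The following is a minimal sketch, not raptor code: chunk_sink is a hypothetical callback type whose last three arguments mirror raptor_parse_chunk (buffer, len, is_end), feed_in_chunks is a hypothetical helper that slices an in-memory document and sets is_end only on the final slice, and counting_sink is a hypothetical example sink. In a real program the sink would presumably forward each call to raptor_parse_chunk() once raptor_start_parse() has been called.

```c
#include <stddef.h>

/* Hypothetical callback type (not part of raptor): the last three
 * arguments mirror raptor_parse_chunk(parser, buffer, len, is_end).
 * It returns non-zero to report failure. */
typedef int (*chunk_sink)(void *user_data, const unsigned char *buffer,
                          size_t len, int is_end);

/* Hypothetical helper: pass data to sink in pieces of at most
 * chunk_size bytes, setting is_end only on the final piece.
 * Empty input still produces one zero-length call with is_end set.
 * Returns 0 on success, -1 on a bad argument, or the first non-zero
 * value returned by sink. */
int feed_in_chunks(const unsigned char *data, size_t len, size_t chunk_size,
                   chunk_sink sink, void *user_data)
{
  size_t offset = 0;

  if (!sink || !data || chunk_size == 0)
    return -1;

  do {
    size_t remaining = len - offset;
    size_t n = remaining < chunk_size ? remaining : chunk_size;
    int is_end = (offset + n == len);
    int rc = sink(user_data, data + offset, n, is_end);

    if (rc)
      return rc;
    offset += n;
  } while (offset < len);

  return 0;
}

/* Hypothetical example sink that only records what it was given. */
typedef struct {
  size_t calls;      /* number of chunks received */
  size_t bytes;      /* total bytes received */
  size_t end_calls;  /* chunks that had is_end set */
  int last_was_end;  /* whether the most recent chunk had is_end set */
} chunk_stats;

int counting_sink(void *user_data, const unsigned char *buffer,
                  size_t len, int is_end)
{
  chunk_stats *stats = (chunk_stats *)user_data;

  (void)buffer; /* a real sink would hand this to the parser */
  stats->calls++;
  stats->bytes += len;
  if (is_end)
    stats->end_calls++;
  stats->last_was_end = (is_end != 0);
  return 0;
}
```

Keeping the slicing separate from the sink means the end-of-stream logic can be checked on its own before a parser is attached.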
PARSER UTILITY METHODS
const char* raptor_get_mime_type(raptor_parser *rdf_parser)- Return the MIME type for the parser.
void raptor_set_parser_strict(raptor_parser *parser, int is_strict)- Set the parser to strict (is_strict not zero) or lax (default) mode. The detail of the strictness can be controlled by raptor_set_feature.
int raptor_set_feature(raptor_parser *parser, raptor_feature feature, int value)- Set a parser feature feature to a particular value. Returns non-zero on failure or if the feature is unknown. The currently defined parser features are:
Feature                                 Values
RAPTOR_FEATURE_ALLOW_BAGID              Boolean (non 0 true)
RAPTOR_FEATURE_ALLOW_NON_NS_ATTRIBUTES  Boolean (non 0 true)
RAPTOR_FEATURE_ALLOW_OTHER_PARSETYPES   Boolean (non 0 true)
RAPTOR_FEATURE_ALLOW_RDF_TYPE_RDF_LIST  Boolean (non 0 true)
RAPTOR_FEATURE_ASSUME_IS_RDF            Boolean (non 0 true)
RAPTOR_FEATURE_CHECK_RDF_ID             Boolean (non 0 true)
RAPTOR_FEATURE_HTML_LINK                Boolean (non 0 true)
RAPTOR_FEATURE_HTML_TAG_SOUP            Boolean (non 0 true)
RAPTOR_FEATURE_MICROFORMATS             Boolean (non 0 true)
RAPTOR_FEATURE_NON_NFC_FATAL            Boolean (non 0 true)
RAPTOR_FEATURE_NORMALIZE_LANGUAGE       Boolean (non 0 true)
RAPTOR_FEATURE_NO_NET                   Boolean (non 0 true)
RAPTOR_FEATURE_RELATIVE_URIS            Boolean (non 0 true)
RAPTOR_FEATURE_SCANNING                 Boolean (non 0 true)
RAPTOR_FEATURE_WARN_OTHER_PARSETYPES    Boolean (non 0 true)
RAPTOR_FEATURE_WWW_TIMEOUT              Integer
RAPTOR_FEATURE_WWW_HTTP_CACHE_CONTROL   String
RAPTOR_FEATURE_WWW_HTTP_USER_AGENT      String
If the allow_bagid feature is true (default true) then the RDF/XML parser will support the rdf:bagID attribute that was removed from the RDF/XML language when it was revised. This support may be removed in future.
If the allow_non_ns_attributes feature is true (default true), then the RDF/XML parser will accept non-XML namespaced attributes as well as rdf: namespaced ones. For example, 'about' and 'ID' will be interpreted as if they were rdf:about and rdf:ID respectively.
If the allow_other_parsetypes feature is true (default true) then the RDF/XML parser will allow unknown parsetypes to be present and will pass them on to the user. Unimplemented at present.
If the allow_rdf_type_rdf_list feature is true (default false) then the RDF/XML parser will generate the idList rdf:type rdf:List triple in the handling of rdf:parseType="Collection". This triple was removed during the revising of RDF/XML after collections were initially added.
If the assume_is_rdf feature is true (default false), then the RDF/XML parser will assume the content is RDF/XML, not require an rdf:RDF root element, and immediately interpret the content as RDF/XML.
If the check_rdf_id feature is true (default true) then rdf:ID values will be checked for duplicates and cause an error if any are found.
If the html_link feature is true (default true), the GRDDL parser looks for head <link> elements with type rdf/xml.
If the html_tag_soup feature is true (default true), the GRDDL parser uses a lax HTML parser if an XML parser fails when reading HTML.
If the microformats feature is true (default true), the GRDDL parser looks for microformats.
If the non_nfc_fatal feature is true (default false) then illegal Unicode Normal Form C in literals will give a fatal error, otherwise it gives a warning.
If the normalize_language feature is true (default true) then XML language values such as from xml:lang will be normalized to lowercase.
If the no_net feature is true (default false) then network requests are denied.
If the scanning feature is true (default false), then the RDF/XML parser will look for embedded rdf:RDF elements inside the XML content, and not require that the XML start with an rdf:RDF root element.
If the www_timeout feature is set to an integer larger than 0, it sets the timeout in seconds for internal WWW URI requests for the GRDDL parser.
If the www_http_cache_control feature is set to a string value (default none), it is sent as the value of the HTTP Cache-Control: header in requests.
If the www_http_user_agent feature is set to a string value, it is sent as the value of the HTTP User-Agent: header in requests.
int raptor_parser_set_feature_string(raptor_parser *parser, raptor_feature feature, const unsigned char *value)- Set a parser feature feature to a particular string value. Returns non-zero on failure or if the feature is unknown. The currently defined parser features are given in raptor_set_feature and at present only take integer values. If an integer value feature is set with this function, value is interpreted as an integer and then that value is used.
int raptor_get_feature(raptor_parser *parser, raptor_feature feature)- Get parser feature integer values. The allowed feature values and types are given under raptor_features_enumerate.
const unsigned char* raptor_parser_get_feature_string(raptor_parser *parser, raptor_feature feature)- Get parser feature string values. The allowed feature values and types are given under raptor_features_enumerate.
unsigned int raptor_get_feature_count(void)- Get the count of features defined. Preferred to the compile time-only symbol RAPTOR_FEATURE_LAST, which gives the maximum value, not the count.
int raptor_feature_value_type(const raptor_feature feature)- Get a raptor feature value type: integer or string.
raptor_locator* raptor_get_locator(raptor_parser *rdf_parser)- Return the current raptor_locator object for the parser. This is a public structure defined in raptor.h that can be used directly, or formatted via raptor_print_locator.
const char* raptor_get_name(raptor_parser *parser)- Return the string short name for the parser.
const char* raptor_get_label(raptor_parser *parser)- Return a string label for the parser.
void raptor_set_default_generate_id_parameters(raptor_parser *rdf_parser, char *prefix, int base)- Control the default method for generation of IDs for blank nodes and bags. The method uses a short string prefix and an integer base to generate the identifier, which is not guaranteed to be a strict concatenation. If prefix is NULL, the default is used. If base is less than 1, it is initialised to 1.
void raptor_set_generate_id_handler(raptor_parser *parser, void *user_data, raptor_generate_id_handler handler)- Allow full customisation of the generated IDs by setting a callback handler and associated user_data that is called whenever a blank node or bag identifier is required. The memory returned by the handler is deallocated inside raptor. Some systems require this to be allocated inside the same library, in which case the raptor_alloc_memory function may be useful.
void raptor_parser_set_uri_filter(raptor_parser *parser, raptor_uri_filter_func filter, void *user_data)- Set the URI filter function filter for URIs retrieved during parsing by the raptor_parser.
int raptor_get_need_base_uri(raptor_parser *rdf_parser)- Return a boolean indicating whether this parser needs a base URI to start parsing.
unsigned char* raptor_parser_generate_id(raptor_parser *rdf_parser, raptor_genid_type type)- Generate an ID for a parser of type type, either RAPTOR_GENID_TYPE_BNODEID or RAPTOR_GENID_TYPE_BAGID. This uses any configuration set by raptor_set_generate_id_handler.
void raptor_set_graph_handler(raptor_parser *parser, void *user_data, raptor_graph_handler handler)- Set the graph handler callback.
PARSER UTILITY FUNCTIONS
int raptor_parsers_enumerate(const unsigned int counter, const char **name, const char **label)- Return the parser name/label for a parser with a given integer counter, returning non-zero if no such parser at that offset exists. The counter should start from 0 and be incremented by 1 until the function returns non-zero.
int raptor_syntaxes_enumerate(const unsigned int counter, const char **name, const char **label, const char **mime_type, const unsigned char **uri_string)- Return the name, label, mime type or URI string (all optional) for a parser syntax with a given integer counter, returning non-zero if no such syntax parser at that offset exists. The counter should start from 0 and be incremented by 1 until the function returns non-zero.
int raptor_features_enumerate(const raptor_feature feature, const char **name, raptor_uri **uri, const char **label)- Return the name, URI, string label (all optional) for a parser feature, returning non-zero if no such feature exists. Raptor features have URIs that are constructed from the URI http://feature.librdf.org/raptor- and the name, so for example feature scanForRDF has URI http://feature.librdf.org/raptor-scanForRDF
int raptor_syntax_name_check(const char *name)- Check name is a known syntax name.
const char* raptor_guess_parser_name(raptor_uri *uri, const char *mime_type, const unsigned char *buffer, size_t len, const unsigned char *identifier)- Guess a parser name for a syntax identified by URI uri, MIME type mime_type, some initial content buffer of size len or with content identifier identifier. All of these parameters are optional and only used if not NULL. The parser is chosen by scoring the hints that are given.
raptor_feature raptor_feature_from_uri(raptor_uri *uri)- Turn a URI uri into a raptor feature identifier, or <0 if the feature is unknown. The feature URIs are described under raptor_features_enumerate above.
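The feature URI naming rule given under raptor_features_enumerate (a fixed prefix followed by the feature name) can be illustrated with plain string handling. This is a minimal sketch that assumes nothing beyond that rule: feature_uri_from_name and feature_name_from_uri are hypothetical helpers, not raptor functions, and they do not check that the name belongs to a feature the library actually defines. Application code would normally rely on raptor_features_enumerate and raptor_feature_from_uri instead.

```c
#include <stdio.h>
#include <string.h>

/* Prefix stated in the raptor_features_enumerate description. */
#define FEATURE_URI_PREFIX "http://feature.librdf.org/raptor-"

/* Hypothetical helper (not a raptor function): write the feature URI
 * for a feature name into buf. Returns 0 on success, or -1 if an
 * argument is NULL or buf is too small. */
int feature_uri_from_name(const char *name, char *buf, size_t size)
{
  int n;

  if (!name || !buf || size == 0)
    return -1;
  n = snprintf(buf, size, "%s%s", FEATURE_URI_PREFIX, name);
  if (n < 0 || (size_t)n >= size)
    return -1;
  return 0;
}

/* Hypothetical inverse: return the name part of a feature URI, or
 * NULL if uri does not start with the prefix. The result points
 * into uri and is not copied. */
const char *feature_name_from_uri(const char *uri)
{
  size_t prefix_len = strlen(FEATURE_URI_PREFIX);

  if (!uri || strncmp(uri, FEATURE_URI_PREFIX, prefix_len) != 0)
    return NULL;
  return uri + prefix_len;
}
```

With the example from the text, the name scanForRDF maps to http://feature.librdf.org/raptor-scanForRDF and back.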
STATEMENT UTILITY FUNCTIONS
int raptor_statement_compare(const raptor_statement *s1, const raptor_statement *s2)- Compare two statements and return an ordering between them.
void raptor_print_statement(const raptor_statement* const statement, FILE *stream)- Print a raptor statement object in a simple format for debugging only. The format of this output is not guaranteed to remain the same between releases.
void raptor_print_statement_as_ntriples(const raptor_statement *statement, FILE *stream)- Print a raptor statement object in N-Triples format, using all the escapes as defined in http://www.w3.org/TR/rdf-testcases/#ntriples
raptor_statement_part_as_counted_string(const void *term, raptor_identifier_type type, raptor_uri *literal_datatype, const unsigned char *literal_language, size_t *len_p)
char* raptor_statement_part_as_string(const void *term, raptor_identifier_type type, raptor_uri *literal_datatype, const unsigned char *literal_language)- Turn part of a raptor statement into N-Triples format, using all the escapes as defined in http://www.w3.org/TR/rdf-testcases/#ntriples The part (subject, predicate, object) of the raptor_statement is passed in as term, and the part type (subject_type, predicate_type, object_type) is passed in as type. When the part is a literal, the literal_datatype and literal_language fields are set, otherwise NULL (usually object_datatype, object_literal_language). If raptor_statement_part_as_counted_string is used, the length of the returned string is stored in *len_p if not NULL.
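To give a feel for the N-Triples string escapes these functions apply, here is a simplified sketch restricted to ASCII input. It is not raptor's implementation: ntriples_escape_ascii is a hypothetical helper that handles the backslash, double quote, newline, carriage return and tab escapes plus \uXXXX for the remaining control characters, and it rejects bytes of 0x80 and above, which would need UTF-8 decoding before being written as \u or \U escapes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a raptor function): escape the
 * NUL-terminated ASCII string in into out using N-Triples string
 * escapes. Returns the escaped length, or -1 if out is too small or
 * in contains a byte >= 0x80. */
int ntriples_escape_ascii(const char *in, char *out, size_t out_size)
{
  size_t used = 0;

  if (!in || !out || out_size == 0)
    return -1;

  for (; *in; in++) {
    unsigned char c = (unsigned char)*in;
    char piece[7]; /* longest piece is \uXXXX plus the NUL */
    int n;

    if (c >= 0x80)
      return -1; /* non-ASCII is outside this sketch */

    switch (c) {
      case '\\': n = snprintf(piece, sizeof(piece), "\\\\"); break;
      case '"':  n = snprintf(piece, sizeof(piece), "\\\""); break;
      case '\n': n = snprintf(piece, sizeof(piece), "\\n");  break;
      case '\r': n = snprintf(piece, sizeof(piece), "\\r");  break;
      case '\t': n = snprintf(piece, sizeof(piece), "\\t");  break;
      default:
        if (c < 0x20 || c == 0x7F)
          n = snprintf(piece, sizeof(piece), "\\u%04X", (unsigned int)c);
        else
          n = snprintf(piece, sizeof(piece), "%c", c);
        break;
    }

    /* Leave room for the terminating NUL. */
    if (n < 0 || used + (size_t)n + 1 > out_size)
      return -1;
    memcpy(out + used, piece, (size_t)n);
    used += (size_t)n;
  }

  out[used] = '\0';
  return (int)used;
}
```

The sketch produces only the escaped characters; surrounding quotes, language tags and datatype URIs are left to the caller.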
LOCATOR UTILITY FUNCTIONS
int raptor_format_locator(char *buffer, size_t length, raptor_locator *locator)- This method takes a raptor_locator object as passed to an error, warning or other handler callback and formats it into the buffer of size length bytes. If buffer is NULL or length is insufficient for the size of the formatted locator, returns the number of additional bytes required in the buffer to write the locator.
In particular, if this form is used:
length=raptor_format_locator(NULL, 0, locator)
it will return in length the size of a buffer that can be allocated for locator and a second call will perform the formatting:
raptor_format_locator(buffer, length, locator)
void raptor_print_locator(FILE *stream, raptor_locator *locator)- This method takes a raptor_locator object as passed to an error, warning or other handler callback, formats and prints it to the given stdio stream.
int raptor_locator_line(raptor_locator *locator)- Returns the line number in a locator structure or <0 if not available.
int raptor_locator_column(raptor_locator *locator)- Returns the column number in a locator structure or <0 if not available.
int raptor_locator_byte(raptor_locator *locator)- Returns the byte offset in a locator structure or <0 if not available.
const char * raptor_locator_file(raptor_locator *locator)- Returns the filename in a locator structure or NULL if not available. Note the returned pointer is to a shared string that must be copied if needed.
const char * raptor_locator_uri(raptor_locator *locator)- Returns the URI string in a locator structure or NULL if not available. Note this does not return a raptor_uri* pointer and the returned pointer is to a shared string that must be copied if needed.
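The size-then-format calling pattern described for raptor_format_locator can be sketched with stand-ins so that it compiles without the library. Everything here is hypothetical: demo_locator stands in for raptor_locator, demo_format_locator imitates the described contract, and the output format is invented for the sketch. The stand-in counts the terminating NUL in the size it reports; that is an assumption of this sketch, since this page does not say whether raptor's count includes it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for raptor_locator with only the fields
 * this sketch needs. */
typedef struct {
  const char *file;
  int line;
  int column;
} demo_locator;

/* Hypothetical stand-in for raptor_format_locator(): returns 0 after
 * writing the text, or the buffer size needed (including the NUL, an
 * assumption of this sketch) when buffer is NULL or too small. */
int demo_format_locator(char *buffer, size_t length, const demo_locator *loc)
{
  int needed = snprintf(NULL, 0, "file %s:%d column %d",
                        loc->file, loc->line, loc->column) + 1;

  if (!buffer || length < (size_t)needed)
    return needed;
  snprintf(buffer, length, "file %s:%d column %d",
           loc->file, loc->line, loc->column);
  return 0;
}

/* First call asks for the size, second call formats into a buffer of
 * that size. The caller frees the result; NULL is returned on
 * failure. */
char *demo_locator_to_string(const demo_locator *loc)
{
  int length = demo_format_locator(NULL, 0, loc);
  char *buffer;

  if (length <= 0)
    return NULL;
  buffer = malloc((size_t)length);
  if (!buffer)
    return NULL;
  if (demo_format_locator(buffer, (size_t)length, loc) != 0) {
    free(buffer);
    return NULL;
  }
  return buffer;
}
```

When only printing is needed, raptor_print_locator avoids the allocation entirely.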
N-TRIPLES UTILITY FUNCTIONS
void raptor_print_ntriples_string(FILE *stream, const char *string, const char delim)- This is a standalone function that prints the given string according to N-Triples escaping rules, expecting to be terminated by delimiter delim which is usually either ', " or <. If a null delimiter
