dirfile_open (3) - Linux Manuals

dirfile_cbopen, dirfile_open --- open or create a dirfile


#include <getdata.h>
DIRFILE* dirfile_cbopen(const char *dirfilename, unsigned long flags, gd_parser_callback_t sehandler, void *extra);
DIRFILE* dirfile_open(const char *dirfilename, unsigned long flags);


The dirfile_cbopen() function opens or creates the dirfile specified by dirfilename, returning a DIRFILE object associated with it. Opening a dirfile will cause the library to read and parse the dirfile's format file (see dirfile-format(5)).

If not NULL, sehandler should be a pointer to a function which will be called whenever a syntax error is encountered while parsing the format file. Specify NULL for this parameter if no callback function is to be used. The caller may use this function to correct the error or modify the error handling of the format file parser. See The Callback Function section below for details on this function. The extra argument allows the caller to pass data to the callback function; the pointer is passed to the callback function verbatim.

The dirfile_open() function is equivalent to dirfile_cbopen(), with sehandler and extra set to NULL.

The flags argument should include one of the access modes: GD_RDONLY (read-only) or GD_RDWR (read-write), and may also contain zero or more of the following flags, bitwise-or'd together:

GD_BIG_ENDIAN
Specifies that raw data on disk is stored as big-endian data (most significant byte first). Specifying this flag along with the contradictory GD_LITTLE_ENDIAN will cause the library to assume that the endianness of the data is opposite to that of the native architecture.

This flag is ignored completely if an ENDIAN directive occurs in the dirfile format file, unless GD_FORCE_ENDIAN is also specified.

GD_CREAT
An empty dirfile will be created if one does not already exist. This will create both the dirfile directory and an empty format file. The directory will have mode S_IRWXU | S_IRWXG | S_IRWXO (0777), modified by the caller's umask value (see umask(2)). The format file will have mode S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (0666), also modified by the caller's umask.

The owner of the dirfile directory and format file will be the effective user ID of the caller. Group ownership follows the rules outlined in mkdir(2).

GD_EXCL
Ensure that this call creates a dirfile: when specified along with GD_CREAT, the call will fail if the dirfile specified by dirfilename already exists. Behaviour of this flag is undefined if GD_CREAT is not specified. This flag suffers from all the limitations of the O_EXCL flag as indicated in open(2).

GD_FORCE_ENCODING
Specifies that ENCODING directives (see dirfile-format(5)) found in the dirfile format file should be ignored. The encoding scheme specified in flags will be used instead (see below).

GD_FORCE_ENDIAN
Specifies that ENDIAN directives (see dirfile-format(5)) found in the dirfile format file should be ignored. When specified with one of GD_BIG_ENDIAN or GD_LITTLE_ENDIAN, the endianness specified will be assumed. If this flag is specified with neither of those flags, the dirfile will be assumed to have the endianness of the native architecture.

GD_IGNORE_DUPS
If the dirfile format metadata specifies more than one field with the same name, all but one of them will be ignored by the parser. Without this flag, parsing would fail with the GD_E_FORMAT error, possibly resulting in invocation of the registered callback function. Which of the duplicate fields is kept is not specified. As a result, this flag is typically only useful in the case where identical copies of a field specification line are present.

No indication is given of whether a duplicate field has been discarded. If finer-grained control is required, the caller should handle GD_E_FORMAT_DUPLICATE suberrors itself with an appropriate callback function.

GD_LITTLE_ENDIAN
Specifies that raw data on disk is stored as little-endian data (least significant byte first). Specifying this flag along with the contradictory GD_BIG_ENDIAN will cause the library to assume that the endianness of the data is opposite to that of the native architecture.

This flag is ignored completely if an ENDIAN directive occurs in the dirfile format file, unless GD_FORCE_ENDIAN is also specified.

GD_PEDANTIC
Specifies that unrecognised lines found during the parsing of the format file should always cause a fatal error. Without this flag, if a VERSION directive (see dirfile-format(5)) indicates that the dirfile being opened conforms to a Standards Version newer than the version understood by the library, unrecognised lines will be silently ignored.

GD_PRETTY_PRINT
When dirfile metadata is flushed to disk (either explicitly via dirfile_metaflush() or dirfile_flush() or implicitly by closing the dirfile), an attempt will be made to create a nicer looking format file (from a human-readable standpoint). What this explicitly means is not part of the API, and any particular behaviour should not be relied on. If the dirfile is opened read-only, this flag is ignored.

GD_TRUNC
If dirfilename specifies an already existing dirfile, it will be truncated before opening. Since dirfile_cbopen() decides whether dirfilename specifies an existing dirfile before attempting to parse the dirfile, dirfilename is considered to specify an existing dirfile if it refers to a directory containing a regular file called format, regardless of the content or form of that file.

Truncation occurs by deleting every regular file in the specified directory, whether the files were referred to by the dirfile before truncation or not. Accordingly, this flag should be used with caution. Subdirectories are left untouched. Notably, this operation does not consider the presence of subdirfiles declared by INCLUDE directives. If the dirfile does not exist, this flag is ignored.

GD_VERBOSE
Specifies that whenever an error is triggered by the library when working on this dirfile, the corresponding error string, which can be retrieved by calling get_error_string(3), should be written on standard error by the library. Without this flag, GetData writes nothing to standard error. (GetData never writes to standard output.)
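
The interplay between GD_BIG_ENDIAN, GD_LITTLE_ENDIAN, GD_FORCE_ENDIAN and an ENDIAN directive can be summarised as a small decision function. The following is only an illustrative sketch of the rules stated above, not GetData's implementation: effective_byte_order(), byte_order_t and the numeric flag values are hypothetical stand-ins (real code would take the GD_* symbols from <getdata.h>), and falling back to the native byte order when there is no directive, no endianness flag and no GD_FORCE_ENDIAN is an assumption.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; real code would use
 * the symbols defined in <getdata.h>. */
enum {
  GD_BIG_ENDIAN    = 0x1,
  GD_LITTLE_ENDIAN = 0x2,
  GD_FORCE_ENDIAN  = 0x4
};

/* Hypothetical type, not part of the GetData API. */
typedef enum { ORDER_LITTLE, ORDER_BIG } byte_order_t;

/* Hypothetical helper: the byte order assumed for raw data, given the
 * open flags, an optional ENDIAN directive and the native byte order. */
byte_order_t effective_byte_order(unsigned long flags, bool has_directive,
                                  byte_order_t directive, byte_order_t native)
{
  const bool big = (flags & GD_BIG_ENDIAN) != 0;
  const bool little = (flags & GD_LITTLE_ENDIAN) != 0;

  /* An ENDIAN directive wins unless GD_FORCE_ENDIAN is given. */
  if (has_directive && !(flags & GD_FORCE_ENDIAN))
    return directive;

  /* Both contradictory flags: opposite of the native architecture. */
  if (big && little)
    return native == ORDER_BIG ? ORDER_LITTLE : ORDER_BIG;

  if (big)
    return ORDER_BIG;
  if (little)
    return ORDER_LITTLE;

  /* Neither flag: native order. Documented for the GD_FORCE_ENDIAN
   * case; assumed here for the remaining case. */
  return native;
}
```

For example, under these rules a caller passing GD_BIG_ENDIAN alone has that choice overridden by an ENDIAN directive in the format file, while GD_BIG_ENDIAN | GD_FORCE_ENDIAN keeps it.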

The flags argument may also be bitwise-or'd with one of the following symbols indicating the default encoding scheme of the dirfile. Like the endianness flags, the choice of encoding here is ignored if the encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is also specified. If none of these symbols is present, GD_AUTO_ENCODED is assumed, unless the dirfile_cbopen() call results in creation or truncation of the dirfile. In that case, GD_UNENCODED is assumed. See dirfile-encoding(5) for details on dirfile encoding schemes.

GD_AUTO_ENCODED
Specifies that the encoding type is not known in advance, but should be detected by the GetData library. Detection is accomplished by searching for raw data files with extensions appropriate to the encoding scheme. This method will notably fail if the library is called via putdata(3) to create a previously non-existent raw field unless a read is first successfully performed on the dirfile. Once the library has determined the encoding scheme for the first time, it remembers it for subsequent calls.

GD_BZIP2_ENCODED
Specifies that raw data files are compressed using the Burrows-Wheeler block sorting text compression algorithm and Huffman coding, as implemented in the bzip2 format.

GD_GZIP_ENCODED
Specifies that raw data files are compressed using Lempel-Ziv coding (LZ77) as implemented in the gzip format.

GD_LZMA_ENCODED
Specifies that raw data files are compressed using the Lempel-Ziv Markov Chain Algorithm (LZMA) as implemented in the xz container format.

GD_SLIM_ENCODED
Specifies that raw data files are compressed using the slimlib library.

GD_TEXT_ENCODED
Specifies that raw data files are encoded as text files containing one data sample per line.

GD_UNENCODED
Specifies that raw data files are not encoded, but written verbatim to disk.

The Callback Function

The caller-supplied sehandler function is called whenever the format file parser encounters a syntax error (i.e. whenever it would return the GD_E_FORMAT error). This callback may be used to correct the error, or to tell the parser how to recover from it.

This function should take two pointers as arguments, and return an int:

int sehandler(gd_parser_data_t *pdata, void *extra);
The extra parameter is the pointer supplied to dirfile_cbopen(), passed verbatim to this function. It can be used to pass caller data to the callback. GetData does not inspect this pointer, not even to check its validity. If the caller needs to pass no data to the callback, it may be NULL.

The gd_parser_data_t type is a structure with at least the following members:

typedef struct {
  const DIRFILE* dirfile;
  int suberror;
  int linenum;
  const char* filename;
  char* line;
} gd_parser_data_t;
The pdata->dirfile member will be a pointer to a DIRFILE object suitable only for passing to get_error_string(). Notably, the caller should not assume this pointer will be the same as the pointer eventually returned by dirfile_cbopen(), nor that it will be valid after the callback function returns.

The pdata->suberror member will be one of the following symbols indicating the type of syntax error encountered:

GD_E_FORMAT_BAD_LINE
The line was indecipherable. Typically this means that the line contained neither a reserved word, nor a field type.

GD_E_FORMAT_BAD_NAME
The specified field name was invalid.

GD_E_FORMAT_BAD_SPF
The samples-per-frame of a RAW field was out-of-range.

GD_E_FORMAT_BAD_TYPE
The data type of a RAW field was unrecognised.

GD_E_FORMAT_BITNUM
The first bit of a BIT field was out-of-range.

GD_E_FORMAT_BITSIZE
The last bit of a BIT field was out-of-range.

GD_E_FORMAT_CHARACTER
An invalid character was found in the line, or a character escape sequence was malformed.

GD_E_FORMAT_DUPLICATE
The specified field name already exists.

GD_E_FORMAT_ENDIAN
The byte sex specified by an ENDIAN directive was unrecognised.

GD_E_FORMAT_LITERAL
An unexpected character was encountered in a complex literal.

GD_E_FORMAT_LOCATION
The parent of a metafield was defined in another fragment.

GD_E_FORMAT_METARAW
An attempt was made to add a RAW metafield.

GD_E_FORMAT_N_FIELDS
The number of fields of a LINCOM field was out-of-range.

GD_E_FORMAT_N_TOK
An insufficient number of tokens was found on the line.

GD_E_FORMAT_NO_PARENT
The parent of a metafield was not found.

GD_E_FORMAT_NUMBITS
The number of bits of a BIT field was out-of-range.

GD_E_FORMAT_PROTECT
The protection level specified by a PROTECT directive was unrecognised.

GD_E_FORMAT_RES_NAME
A field was specified with the reserved name INDEX.

GD_E_FORMAT_UNTERM
The last token of the line was unterminated.

The pdata->filename and pdata->linenum members contain the name of the fragment and the line number where the syntax error was encountered. The first line in a fragment is line one.

The pdata->line member contains a copy of the line containing the syntax error. This line may be freely modified by the callback function. It will then be reparsed if the callback function returns the symbol GD_SYNTAX_RESCAN (see below). Space is available for at least GD_MAX_LINE_LENGTH characters, including the terminating NUL.
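
As an illustration of editing pdata->line in place, the sketch below blanks out bytes outside printable ASCII and asks the parser to rescan. It is a hedged example rather than recommended practice: blank_bad_bytes() and its repair strategy are hypothetical, and the structure, GD_MAX_LINE_LENGTH and the GD_* symbols are redeclared locally with placeholder values only so that the sketch compiles without GetData installed (real code would include <getdata.h> instead).

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins with placeholder values; real code would get all of
 * these from <getdata.h>. */
#define GD_MAX_LINE_LENGTH 4096
enum { GD_SYNTAX_ABORT, GD_SYNTAX_RESCAN };
enum { GD_E_FORMAT_CHARACTER = 1 };

typedef struct DIRFILE DIRFILE; /* opaque, as in the library */
typedef struct {
  const DIRFILE *dirfile;
  int suberror;
  int linenum;
  const char *filename;
  char *line;
} gd_parser_data_t;

/* Hypothetical callback: replace control and non-ASCII bytes with
 * spaces, then request a rescan of the modified line. */
int blank_bad_bytes(gd_parser_data_t *pdata, void *extra)
{
  size_t i;
  int changed = 0;

  (void)extra; /* unused in this sketch */
  assert(pdata != NULL && pdata->line != NULL);

  /* Only attempt a repair for invalid-character errors. */
  if (pdata->suberror != GD_E_FORMAT_CHARACTER)
    return GD_SYNTAX_ABORT;

  /* Stay within the buffer the parser provides. */
  for (i = 0; i < GD_MAX_LINE_LENGTH && pdata->line[i] != '\0'; ++i) {
    unsigned char c = (unsigned char)pdata->line[i];
    if ((c < 0x20 && c != '\t' && c != '\n') || c > 0x7e) {
      pdata->line[i] = ' ';
      changed = 1;
    }
  }

  /* If nothing was changed, rescanning would raise the same error
   * again and call back forever, so give up instead. */
  return changed ? GD_SYNTAX_RESCAN : GD_SYNTAX_ABORT;
}
```

Because the callback is invoked again whenever the rescanned line still contains a syntax error, a callback that returns GD_SYNTAX_RESCAN needs some guard of this kind against looping.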

The callback function should return one of the following symbols, which tells the parser how to subsequently handle the error:

GD_SYNTAX_ABORT
The parser should immediately abort parsing the format file and fail with the error GD_E_FORMAT. This is the default behaviour, if no callback function is provided (or if the parser is invoked by calling dirfile_open()).

GD_SYNTAX_CONTINUE
The parser should continue parsing the format file. However, once parsing has finished, the parser will fail with the error GD_E_FORMAT, even if no further syntax errors are encountered. This behaviour may be used by the caller to identify all lines containing syntax errors in the format file, instead of just the first one.

GD_SYNTAX_IGNORE
The parser should ignore the line containing the syntax error completely, and carry on parsing the format file. If no further errors are encountered, the dirfile will be successfully opened.

GD_SYNTAX_RESCAN
The parser should rescan the line member, which replaces the line which originally contained the syntax error. The line is assumed to have been corrected by the callback function. If the line still contains a syntax error, the callback function will be called again.

The callback function handles only syntax errors. The parser may still abort early, if a different kind of library error is encountered. Furthermore, although a line may contain more than one syntax error, the parser will only ever report one syntax error per line, even if the callback function returns GD_SYNTAX_CONTINUE.
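
A callback combining these responses might look like the following sketch, which skips duplicate field definitions, reports every other syntax error, and uses the extra pointer to hand counts back to the caller. The names tally_syntax_errors() and error_tally_t are hypothetical, and the structure and GD_* symbols are redeclared locally with placeholder values purely so the example compiles on its own; real code would include <getdata.h> instead.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins with placeholder values; real code would get all of
 * these from <getdata.h>. */
enum { GD_SYNTAX_ABORT, GD_SYNTAX_RESCAN, GD_SYNTAX_IGNORE, GD_SYNTAX_CONTINUE };
enum { GD_E_FORMAT_DUPLICATE = 1 };

typedef struct DIRFILE DIRFILE; /* opaque, as in the library */
typedef struct {
  const DIRFILE *dirfile;
  int suberror;
  int linenum;
  const char *filename;
  char *line;
} gd_parser_data_t;

/* Hypothetical caller-side record, reached through the extra pointer. */
typedef struct {
  int duplicates; /* lines skipped because the field already existed */
  int errors;     /* all other syntax errors seen */
} error_tally_t;

/* Hypothetical callback: ignore duplicates, report everything else and
 * keep parsing so that every bad line is seen in one pass. */
int tally_syntax_errors(gd_parser_data_t *pdata, void *extra)
{
  error_tally_t *tally = extra;

  assert(pdata != NULL);

  /* Without somewhere to record results, fall back to the default. */
  if (tally == NULL)
    return GD_SYNTAX_ABORT;

  if (pdata->suberror == GD_E_FORMAT_DUPLICATE) {
    tally->duplicates++;
    return GD_SYNTAX_IGNORE; /* drop the line; the open can still succeed */
  }

  tally->errors++;
  fprintf(stderr, "%s:%d: syntax error (suberror %d)\n",
          pdata->filename, pdata->linenum, pdata->suberror);
  return GD_SYNTAX_CONTINUE; /* the open will fail with GD_E_FORMAT */
}
```

Such a function would be passed as sehandler, with the address of an error_tally_t as extra, in the call to dirfile_cbopen(). Since at most one syntax error is reported per line, the counts are per-line counts.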


A call to dirfile_cbopen() or dirfile_open() always returns a pointer to a newly allocated DIRFILE object. The DIRFILE object is an opaque structure containing the parsed dirfile metadata. If an error occurred, the dirfile error will be set to a non-zero error value. The DIRFILE object will also be internally flagged as invalid. Possible error values are:

GD_E_ACCMODE
The library was asked to create or truncate a dirfile opened read-only (i.e. GD_CREAT or GD_TRUNC was specified in flags along with GD_RDONLY).

GD_E_ALLOC
The library was unable to allocate memory.

GD_E_BAD_REFERENCE
The reference field specified by a /REFERENCE directive in the format file (see dirfile-format(5)) was not found, or was not a RAW field.

GD_E_CALLBACK
The registered callback function, sehandler, returned an unrecognised response.

GD_E_CREAT
The library was unable to create the dirfile, or the dirfile exists and both GD_CREAT and GD_EXCL were specified.

GD_E_FORMAT
A syntax error occurred in the format file. See also The Callback Function section above.

GD_E_INTERNAL_ERROR
An internal error occurred in the library while trying to perform the task. This indicates a bug in the library. Please report the incident to the GetData developers.

GD_E_OPEN
The dirfile format file could not be opened, or dirfilename does not specify a valid dirfile.

GD_E_OPEN_INCLUDE
A file specified in an /INCLUDE directive could not be opened.

GD_E_TRUNC
The library was unable to truncate the dirfile.

The dirfile error may be retrieved by calling get_error(3). A descriptive error string for the last error encountered can be obtained from a call to get_error_string(3). When finished with it, the DIRFILE object should be deallocated with a call to dirfile_close(3), even if the open failed.


GetData's parser assumes it is running on an ASCII-compatible platform. Format file parsing will fail gloriously on an EBCDIC platform.