Handling Sparse Files on Linux

Sparse files are common on Linux/Unix and are also supported by Windows (e.g. NTFS) and macOS (e.g. APFS). Sparse files use storage efficiently when the files contain many holes (contiguous ranges of zero-valued bytes) by storing only metadata for the holes instead of allocating real disk blocks. They are especially useful in cases such as allocating VM disk images.

The following image illustrates the structure of a sparse file (image by User:Sven on Wikimedia).

In this post, we will discuss some common tools and libraries for handling sparse files in Linux environments.

Command line tools for handling sparse files

Linux has a set of tools that can create or handle sparse files.

Create sparse files

You may use truncate or the general-purpose dd to create sparse (almost empty) files.

truncate shrinks or extends a file to the specified size. If the file already exists and is smaller than that size, truncate appends a hole to its end; if it is larger, the extra data is discarded. If the file does not exist yet, truncate creates it by default. For example, the following command creates a 20GB empty sparse file, or extends/shrinks the file to 20GB if it already exists.

truncate -s 20g ./vmdisk0

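The common dd tool can create sparse files too, by seeking past the end of the output file instead of writing zeros to it. For example, to create a 20GB vmdisk0, dd can be used as follows. With count=0, dd copies no data and only sets the size of the output file to the seek offset (20480k blocks of 1KB each), so the file is exactly 20GB.

dd if=/dev/zero of=./vmdisk0 bs=1k seek=20480k count=0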

Archive or copy sparse files

To handle sparse files efficiently, the kernel and the tools should support the SEEK_HOLE/SEEK_DATA functionality. For details, please check SEEK_HOLE and SEEK_DATA: efficiently archive/copy large sparse files.

If you are using a Linux system with kernel version 3.1 or later, the kernel and the tools shipped with it will likely already support sparse files. Tools that may be used include rsync, tar, cp and more (for example rsync --sparse, tar --sparse and cp --sparse=always).
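As a rough sketch of how a tool might use SEEK_HOLE/SEEK_DATA, the C function below walks the data regions of a file and sums their sizes. The helper name count_data_bytes is hypothetical and chosen only for this illustration. On a filesystem that does not track holes, the kernel simply reports the whole file as one data region.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA and SEEK_HOLE in <unistd.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for headers that hide these constants; these are the Linux values. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: returns the number of bytes that lie in data
 * regions (i.e. not in holes) of the file at `path`, or -1 on error.
 */
off_t count_data_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }

    off_t end = lseek(fd, 0, SEEK_END);
    if (end == -1) {
        close(fd);
        return -1;
    }

    off_t total = 0;
    off_t pos = 0;
    while (pos < end) {
        /* Find the start of the next data region at or after pos. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == -1) {
            if (errno == ENXIO) {
                break; /* only a hole remains up to the end of the file */
            }
            close(fd);
            return -1;
        }
        /* Find where that data region ends (end of file counts as a hole). */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == -1) {
            close(fd);
            return -1;
        }
        if (hole <= data) {
            break; /* defensive: file changed underneath us */
        }
        total += hole - data;
        pos = hole;
    }

    close(fd);
    return total;
}
```

A copy or archive tool would follow the same loop, but read and write each [data, hole) range instead of only counting it, and skip over the holes.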

Library functions for handling sparse files programmatically

There is a set of C functions available for handling sparse files. Other programming libraries may be built on top of them. Some of them are as follows.

lseek()

If what you want is to create an empty sparse file, lseek could be enough.

off_t lseek(int fd, off_t offset, int whence);

Here is an example of a C function using lseek(). The idea is to create a file, seek to the last byte of the required size, write a single zero byte there and close the file. Everything before that byte naturally becomes a large hole.

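#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

// Returns 0 on success, -1 on failure.
int create_sparse_file(const char *path, uint64_t size)
{
    if (size == 0) {
        return -1; // size - 1 would wrap around
    }
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1) {
        return -1;
    }
    // Seek to the last byte and write one zero byte there, so the file
    // is `size` bytes long and everything before that byte is a hole.
    if (lseek(fd, (off_t)(size - 1), SEEK_SET) == -1 ||
        write(fd, "", 1) != 1) {
        close(fd); // do not leak the descriptor on failure
        return -1;
    }
    return close(fd);
}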

See the lseek() manual for more details.

truncate() and ftruncate()

The truncate() and ftruncate() functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.

If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as null bytes (‘\0’).

int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length); 

See the truncate() manual for more details.
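As a minimal sketch of how ftruncate() can be used for sparse files, the first function below creates or resizes a file to a given length, and the second reports how much disk space is actually allocated to it. Both helper names are hypothetical, and the 512-byte unit for st_blocks is the Linux convention. Whether the extended part really occupies no blocks depends on the filesystem.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` if needed and set its size to
 * exactly `length` bytes. Returns 0 on success, -1 on failure.
 */
int make_sparse_with_ftruncate(const char *path, off_t length)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    if (fd == -1) {
        return -1;
    }
    /* Extending the file adds a hole; shrinking it discards data. */
    if (ftruncate(fd, length) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical helper: bytes of disk space allocated to `path`,
 * or -1 on error. On Linux, st_blocks counts 512-byte units.
 */
off_t allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1) {
        return -1;
    }
    return (off_t)st.st_blocks * 512;
}
```

On a filesystem with sparse file support, comparing allocated_bytes() with st_size after make_sparse_with_ftruncate() should show an allocation far smaller than the apparent size.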

fallocate()

fallocate() allows the caller to directly manipulate the allocated disk space for the file referred to by fd for the byte range starting at offset and continuing for len bytes.

int fallocate(int fd, int mode, off_t offset, off_t len);

See the fallocate() manual for more details.
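For sparse files, the two most relevant uses of fallocate() are probably preallocating blocks and punching holes. The sketch below wraps both; the helper names are hypothetical, and the mode flags FALLOC_FL_PUNCH_HOLE and FALLOC_FL_KEEP_SIZE are Linux-specific. Not every filesystem supports them, in which case the call fails with EOPNOTSUPP.

```c
#define _GNU_SOURCE /* needed for the fallocate() declaration on glibc */
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure disk blocks are allocated for the
 * range [offset, offset + len). With mode 0 the file grows if the
 * range extends past its current end. Returns 0 or -1 (errno set).
 */
int preallocate_range(int fd, off_t offset, off_t len)
{
    return fallocate(fd, 0, offset, len);
}

/*
 * Hypothetical helper: deallocate the blocks in [offset, offset + len)
 * so that the range becomes a hole and reads back as zeros.
 * FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE,
 * so the file size does not change. Returns 0 or -1 (errno set).
 */
int punch_hole(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}
```

Callers should be prepared for either helper to fail on filesystems without the corresponding support and fall back to another approach, for example writing zeros explicitly.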

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.

2 comments:

  1. Worth mentioning, also, is probably the fallocate(1) shell command, and especially its -d (--dig-holes) command-line flag.

    `fallocate -d $file` will analyze and “re-sparsify” $file, by deallocating any of its disk blocks which contain runs of zeroes. Useful if it’s been accidentally expanded to its full size by running it through a tool that didn’t preserve the file’s sparseness.
