
Handling Sparse Files on Linux

Sparse files are common on Linux/Unix and are also supported by Windows (e.g. NTFS) and macOS (e.g. APFS). A sparse file uses storage efficiently when it has many holes (contiguous ranges of zero-valued bytes): the file system stores only metadata for the holes instead of allocating real disk blocks. Sparse files are especially useful in cases like allocating VM images.

The following image illustrates the structure of a sparse file (image by User:Sven on Wikimedia).

In this post, we will discuss some common tools and libraries for handling sparse files in Linux environments.

Command line tools for handling sparse files

Linux has a set of tools that can create or handle sparse files.

Create sparse files

You may use truncate or the general-purpose dd to create sparse (almost empty) files.

truncate shrinks or extends a file to the specified size. If the file already exists and is smaller than that size, truncate only appends a hole to its end. If the file does not exist yet, truncate creates it by default. For example, the following command creates a 20GB empty sparse file, or extends/shrinks the file to 20GB if it already exists.

truncate -s 20g ./vmdisk0

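The common dd tool can create sparse files too, by seeking past the end of the output file instead of writing zeros to it. For example, to create a 20GB vmdisk0, dd can be used as follows. With count=0, dd copies no data and only sets the file length to the seek offset (20480k blocks of 1k); with count=1 it would append one extra 1KB block of real zeros after the hole.

dd if=/dev/zero of=./vmdisk0 bs=1k seek=20480k count=0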

Archive or copy sparse files

To efficiently handle sparse files, the kernel and tools should support the SEEK_HOLE/SEEK_DATA functionalities. For details, please check SEEK_HOLE and SEEK_DATA: efficiently archive/copy large sparse files.

If you are using a Linux system with kernel version 3.1 or later, the kernel and the tools shipped with it likely already support sparse files. Tools that can be used include rsync (with -S), tar (with -S), cp (with --sparse=always) and more.
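As a rough illustration of what such tools do internally, here is a minimal sketch that walks the data extents of a file with SEEK_DATA and SEEK_HOLE. The helper name print_data_extents is hypothetical, and the sketch assumes Linux with glibc; on a file system without hole support, the kernel simply reports the whole file as one data extent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes SEEK_DATA and SEEK_HOLE on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback Linux values, in case the libc headers do not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical helper: prints every data extent of `path` and returns the
 * total number of data bytes, or -1 on error. */
long long print_data_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == -1) {
        close(fd);
        return -1;
    }

    long long total = 0;
    off_t pos = 0;
    while (pos < end) {
        /* Find the next offset at or after pos that holds data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == -1) {
            if (errno == ENXIO) {
                break; /* only a hole remains up to the end of the file */
            }
            close(fd);
            return -1;
        }
        /* Find where this data extent ends; the end of file counts as a hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == -1) {
            close(fd);
            return -1;
        }
        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        total += hole - data;
        pos = hole;
    }
    close(fd);
    return total;
}
```

A copy tool built on this idea would read and write only the reported ranges and seek over everything else, so the holes never have to be read or written.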

Library functions for handling sparse files programmatically

A set of C functions is available for handling sparse files, and other programming libraries may be built on top of them. Some of the functions that can be used are as follows.

lseek()

If what you want is to create an empty sparse file, lseek could be enough.

off_t lseek(int fd, off_t offset, int whence);

Here is an example of a C function using lseek(). The idea is to create a file, seek to the last byte of the required size, write a single zero byte there and close the file. Everything before that byte naturally becomes one large hole.

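#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

// Returns 0 on success, -1 on failure.
int create_sparse_file(const char *path, uint64_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1) {
        return -1;
    }
    // Seek to the last byte and write one zero byte there, so the file
    // becomes `size` bytes long and everything before that byte is a hole.
    // A size of 0 needs no seek or write at all.
    if (size > 0 &&
        (lseek(fd, (off_t)(size - 1), SEEK_SET) == -1 ||
         write(fd, "\0", 1) != 1)) {
        close(fd); // do not leak the descriptor on failure
        return -1;
    }
    return close(fd);
}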

See the lseek() manual for more details.

truncate() and ftruncate()

The truncate() and ftruncate() functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.

If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as null bytes (‘\0’).

int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length); 

See the truncate() manual for more details.
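As a minimal sketch of how ftruncate() can create a sparse file, the function below extends a file to the requested length and then uses fstat() to report how much disk space is actually allocated. The helper name make_sparse_with_ftruncate and its out-parameter are hypothetical, and the 512-byte unit for st_blocks is the Linux convention.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates `path` if needed, sets its length to `size`
 * bytes with ftruncate(), and stores the number of bytes actually allocated
 * on disk in *allocated. Returns 0 on success, -1 on failure. */
int make_sparse_with_ftruncate(const char *path, off_t size, long long *allocated)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    if (fd == -1) {
        return -1;
    }

    struct stat st;
    if (ftruncate(fd, size) == -1 || fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    /* On Linux, st_blocks counts 512-byte units regardless of the
     * file system block size. */
    *allocated = (long long)st.st_blocks * 512;
    return close(fd);
}
```

On a file system that supports sparse files, calling this on a new file with a large size would typically report an allocated size far below the requested length, because the extended part is a hole.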

fallocate()

fallocate() allows the caller to directly manipulate the allocated disk space for the file referred to by fd for the byte range starting at offset and continuing for len bytes.

int fallocate(int fd, int mode, off_t offset, off_t len);

See the fallocate() manual for more details.
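One use of fallocate() that is relevant to sparse files is punching a hole into an existing file. The following is a minimal sketch, assuming Linux with glibc and a file system that supports hole punching; the helper name punch_hole is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for fallocate() and the FALLOC_FL_* flags */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: deallocates the byte range [offset, offset + len)
 * of an open file. Afterwards the range reads back as zeros and the file
 * size is unchanged. Returns 0 on success, -1 on failure; errno is
 * EOPNOTSUPP if the file system cannot punch holes. */
int punch_hole(int fd, off_t offset, off_t len)
{
    /* FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE. */
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}
```

Only whole file system blocks inside the range are released; partial blocks at the edges of the range are zeroed instead.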


2 Comments

  1. Worth mentioning, also, is probably the fallocate(1) shell command, and especially its -d (--dig-holes) command-line flag.

    `fallocate -d $file` will analyze and “re-sparsify” $file, by deallocating any of its disk blocks which contain runs of zeroes. Useful if it’s been accidentally expanded to its full size by running it through a tool that didn’t preserve the file’s sparseness.
