PUMA: A MapReduce Benchmark Suite

MapReduce is a well-known programming model for generating and processing large data sets. There are various MapReduce implementations, of which Hadoop is probably the most widely known and used. As these frameworks see wider use, benchmarking them becomes important.

Faraz Ahmad et al. developed such a benchmark suite, the PUMA MapReduce Benchmark. They describe it as follows:

During our work on MapReduce, we developed a benchmark suite which represents a broad range of MapReduce applications exhibiting application characteristics with high/low computation and high/low shuffle volumes. There are a total of 13 benchmarks, out of which Tera-Sort, Word-Count, and Grep are from Hadoop distribution. The rest of the benchmarks were developed in-house and are currently not part of the Hadoop distribution.
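To make the "computation" and "shuffle volume" characteristics concrete, here is a minimal single-process sketch of Word-Count in the MapReduce style. It is not PUMA's or Hadoop's code and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. The number of intermediate pairs it reports is a rough stand-in for shuffle volume.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Shuffle: group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run all three phases; also return how many pairs crossed the shuffle."""
    pairs = list(map_phase(lines))
    counts = reduce_phase(shuffle_phase(pairs))
    return counts, len(pairs)


if __name__ == "__main__":
    counts, shuffled = word_count(["a b a", "b c"])
    print(counts, shuffled)
```

In this sketch the map step does very little work per record but emits one pair per word, so the intermediate data is about as large as the input. A Grep-like job would instead emit pairs only for matching lines, so far less data would cross the shuffle.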

A strength of the benchmark is that it provides both the source code and the datasets, which makes it easier to reproduce and compare benchmarking results.

The benchmark source code and datasets can be downloaded here.

Eric Ma



  1. Update: the links to the PUMA homepage and datasets have been updated in the post.
