MapReduce is a well-known programming model for processing and generating large data sets. There are various MapReduce implementations, and Hadoop is among the most widely known and used. As these frameworks spread, benchmarking them has become important.
Faraz Ahmad et al. developed one such benchmark suite, the PUMA MapReduce Benchmark. They describe it as follows:
During our work on MapReduce, we developed a benchmark suite which represents a broad range of MapReduce applications exhibiting application characteristics with high/low computation and high/low shuffle volumes. There are a total of 13 benchmarks, out of which Tera-Sort, Word-Count, and Grep are from Hadoop distribution. The rest of the benchmarks were developed in-house and are currently not part of the Hadoop distribution.
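To make "shuffle volume" in the description above concrete, here is a minimal in-memory sketch of a Word-Count style job. It is not PUMA or Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, everything runs in a single Python process, and the shuffle is approximated by counting the intermediate key/value pairs handed from the map step to the reduce step.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit one (word, 1) pair per word, as a Word-Count style mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group intermediate values by key and count how many pairs were moved."""
    groups = defaultdict(list)
    shuffled = 0
    for key, value in pairs:
        groups[key].append(value)
        shuffled += 1
    return groups, shuffled


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run map, shuffle, and reduce; return the counts and the shuffle size."""
    groups, shuffled = shuffle_phase(map_phase(lines))
    return reduce_phase(groups), shuffled


if __name__ == "__main__":
    counts, shuffled = word_count(["a b a", "B c"])
    print(counts, shuffled)
```

In this sketch the mapper emits one pair per input word, so the number of shuffled pairs grows with the input size. A job whose mapper discards most records would move far fewer pairs, which is the kind of difference the high/low shuffle categories are meant to capture.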
A strength of this benchmark suite is that it provides both the source code and the datasets, which makes it easier to reproduce and compare benchmarking results.
The benchmark source code and datasets can be downloaded here.