PUMA: A MapReduce Benchmark Suite

By Eric Ma | Dec 20, 2012 | Sep 5, 2020

MapReduce is a well-known programming model for generating and processing large data sets. Of its various implementations, Hadoop is probably the most widely known and used. As these frameworks have spread, benchmarking them has become important.

Faraz Ahmad et al. developed a benchmark suite for this purpose, the PUMA MapReduce benchmark suite, which they describe as follows:

During our work on MapReduce, we developed a benchmark suite which represents a broad range of MapReduce applications exhibiting application characteristics with high/low computation and high/low shuffle volumes. There are a total of 13 benchmarks, out of which Tera-Sort, Word-Count, and Grep are from Hadoop distribution. The rest of the benchmarks were developed in-house and are currently not part of the Hadoop distribution.
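
To make the "computation vs. shuffle volume" distinction concrete, here is a minimal single-process sketch of the MapReduce model in Python. It is not PUMA or Hadoop code: the names `run_job`, `map_word_count`, `map_grep`, and `reduce_sum` are made up for illustration, and "shuffle volume" is approximated simply as the number of intermediate key/value pairs passed from the map phase to the reduce phase.

```python
from collections import defaultdict


def map_word_count(line):
    """Map step for a toy word count: emit (word, 1) for every word."""
    return [(word.lower(), 1) for word in line.split()]


def map_grep(line, pattern):
    """Map step for a toy grep: emit (line, 1) only if the line matches."""
    return [(line, 1)] if pattern in line else []


def reduce_sum(key, values):
    """Reduce step shared by both toy jobs: sum the values of one key."""
    return key, sum(values)


def shuffle(pairs):
    """Shuffle step: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_job(lines, mapper, reducer):
    """Run map -> shuffle -> reduce in one process.

    Returns the job result and the number of intermediate pairs,
    used here as a rough stand-in for shuffle volume (an assumption
    of this sketch, not a PUMA metric).
    """
    intermediate = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(intermediate)
    result = dict(reducer(key, values) for key, values in grouped.items())
    return result, len(intermediate)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The end"]
    counts, wc_volume = run_job(lines, map_word_count, reduce_sum)
    matches, grep_volume = run_job(
        lines, lambda line: map_grep(line, "dog"), reduce_sum
    )
    print(counts, wc_volume)
    print(matches, grep_volume)
```

In this toy setup, word count emits one pair per input word, while grep with a selective pattern emits a pair only for matching lines, so the two jobs move very different amounts of data through the shuffle for the same input. A real framework distributes these phases across machines, which is why this difference matters for benchmarking.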

A strength of the suite is that it provides both the source code and the datasets, which makes it easier to reproduce and compare benchmarking results.

The benchmark source code and datasets can be downloaded here.


3 Comments

Eric Zhiqiang Ma says:

Feb 23, 2014 at 8:33 pm

Update: the post now has the new links for the PUMA homepage and datasets.

Ewan says:

Nov 16, 2015 at 1:01 pm

The links on Google don’t work.

1. Eric Zhiqiang Ma says:
  
  Nov 19, 2015 at 12:24 am
  
  Hi Ewan, thanks for reporting the broken links. I have updated the post with the new links.
  
