Eric Ma

Eric is a systems guy interested in building high-performance, scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.

Colossus: Successor to the Google File System (GFS)

Colossus is the successor to the Google File System (GFS), as mentioned in the Spanner paper at OSDI 2012. Spanner also uses Colossus to store its tablets. Information about Colossus is scarce compared with GFS, which was described in a paper at SOSP 2003. There is still some information about Colossus […]

Conference Ranking by Average Number of Citations in the Last 5 Years, 2012

I tried to find a ranking on the Internet of the top conferences by average number of citations in the last 5 years, but failed to find one. However, there are many rankings by overall citations and number of publications. Hence, it is not hard to calculate the average number of citations […]

Hadoop Installation Tutorial (Hadoop 1.x)

Update: If you are new to Hadoop and trying to install it, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce that was initially designed […]

Reading List for Distributed Systems and Cloud Computing

Understanding the literature is usually the first step in doing research, and systems research on cloud computing is no exception. A reading list can help those who are just starting out in cloud computing research. Prof. Lin Gu, my PhD supervisor, compiled a reading list for systems research on cloud computing. The reading […]

Conferences on Cloud Computing 2013

This post lists important conferences related to Cloud Computing in 2013. SOSP 2013: The 24th ACM Symposium on Operating Systems Principles. November 3-6, 2013, Nemacolin Woodlands Resort, Pennsylvania. The biennial ACM Symposium on Operating Systems Principles is the world’s premier forum for researchers, developers, programmers, and teachers of computer systems technology. Academic and […]

Setting Up a Git Server Using Gitosis

Update: Since gitosis is no longer maintained or supported, please check out gitolite for setting up a new git server (see the comment from Sitaram Chamarty, the author of gitolite). Gitosis is a piece of software written by Tommi Virtanen for hosting git repositories. It manages multiple repositories under the same user account. […]

Hadoop Default Ports

Hadoop’s namenode and datanodes expose a number of TCP ports that Hadoop’s daemons use to communicate with each other or to listen directly for users’ requests. Both Hadoop users and cluster administrators need this port information to write programs or configure firewalls/gateways accordingly. A post by Philip Zeyliger on Cloudera’s blog summarizes the […]

Conferences on Cloud Computing 2012

This post lists important conferences on Cloud Computing in 2012. OSDI 2012: 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’12), October 8–10, 2012, Hollywood, CA. “The tenth OSDI seeks to present innovative, exciting research in computer systems. OSDI brings together professionals from academic and industrial backgrounds in what has become a […]

Pitfalls and Lessons on Configuring and Tuning Hadoop

This post lists pitfalls and lessons learned when configuring and tuning Hadoop. Hadoop with IPv6: Hadoop doesn’t support IPv6 currently (up to 0.20.2 and 0.21.0): Hadoop and IPv6. Cluster performance may suffer when IPv6 is turned on: mail archive. One good practice is to disable IPv6 on servers in the Hadoop […]

Conferences on Cloud Computing 2011

This post lists important conferences on Cloud Computing in 2011. ACM Symposium on Cloud Computing: October 27 and 28, 2011, Cascais, Portugal. Submission deadline: April 30, 2011. 23rd ACM Symposium on Operating Systems Principles (SOSP): October 23-26, 2011, Cascais, Portugal. Submission deadline: March 18, 2011, 11:59 PM GMT. EuroSys 2011: April 10-13, 2011. Salzburg, […]

mrcc – A Distributed C Compiler System on MapReduce

The mrcc project’s homepage is here: mrcc project. Abstract: mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of a cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port to other cloud computing platforms, such as MRlite, […]