Designing Scalable Data Storage and Processing Architectures
Modern datacenters rely on distributed storage and processing systems designed to handle massive datasets across clusters of commodity hardware. Understanding these systems is essential for anyone working with infrastructure at scale. Here’s an overview of the major architectures and implementations you’ll encounter.
Storage Systems
Google File System (GFS) and successors
GFS established the foundation for distributed storage at scale. It prioritizes high throughput over latency, using replication across three or more nodes for fault tolerance. The design assumes node failures are routine and handles them transparently.
HDFS borrowed heavily from GFS principles and remains the standard distributed filesystem for Hadoop deployments. It uses a NameNode for metadata management and DataNodes for actual block storage. Be aware that HDFS works best for write-once, read-many workloads. For frequently updated data, you’ll want something else.
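As a concrete illustration, here is a minimal sketch of the write-once, read-many pattern from Python via pyarrow's HDFS binding. This assumes a reachable cluster and a local libhdfs installation; the NameNode address and paths are placeholders.

```python
# Hedged sketch of write-once/read-many HDFS access via pyarrow.
# The NameNode host/port and file paths below are placeholders.
from pyarrow import fs

hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Write once...
with hdfs.open_output_stream("/data/events/part-00000") as f:
    f.write(b"event-1\nevent-2\n")

# ...read many times. In-place updates are where HDFS fights you.
with hdfs.open_input_stream("/data/events/part-00000") as f:
    print(f.read().decode())
```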
Google’s Colossus (internally referred to as GFS2) improved on the original design with better scalability, metadata distribution, and handling of very large files.
Structured data stores
BigTable introduced a sparse, distributed, multi-dimensional sorted map as its data model, organized into column families. It handles large sparse datasets efficiently and powers many Google services. By layering tablet servers on top of GFS, the design separates storage from compute, allowing independent scaling.
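To make the data model concrete, here is a toy, single-process sketch of that core abstraction: a sparse map sorted by (row key, column, timestamp), with newer versions sorting first. All names and the `TinyTable` class are illustrative, not Bigtable's API.

```python
# Toy sketch of Bigtable's data model: a sparse, sorted map from
# (row_key, column, timestamp) -> value.
import bisect, time

class TinyTable:
    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._vals = {}

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        key = (row, column, -ts)          # newest version sorts first
        bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, row, column):
        # The first key in the (row, column) group is the newest version.
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._vals[self._keys[i]]
        return None

t = TinyTable()
t.put("com.example/index.html", "contents:", "<html>v1</html>")
t.put("com.example/index.html", "contents:", "<html>v2</html>")
print(t.get("com.example/index.html", "contents:"))  # newest: v2
```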
Spanner extended BigTable concepts with strong consistency guarantees across geographic regions through TrueTime, an API that exposes bounded clock uncertainty for timestamp-based multi-version concurrency control. It has become the model for systems requiring both scalability and ACID properties.
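The key mechanism is commit wait: a transaction's result is withheld until its timestamp is guaranteed to be in the past everywhere. A single-process sketch of the idea, with a made-up uncertainty bound standing in for TrueTime's epsilon:

```python
# Hedged sketch of Spanner's commit-wait rule. TrueTime exposes an
# interval [earliest, latest] bounding the true time; here the
# uncertainty bound is an illustrative constant, not a real API.
import time

CLOCK_UNCERTAINTY_S = 0.007   # assume a ~7 ms bound

def tt_now():
    """Return an (earliest, latest) interval around the true time."""
    t = time.time()
    return (t - CLOCK_UNCERTAINTY_S, t + CLOCK_UNCERTAINTY_S)

def commit(txn_apply):
    _, latest = tt_now()
    commit_ts = latest            # a timestamp known to be >= true now
    txn_apply(commit_ts)          # apply writes at commit_ts
    # Commit wait: don't acknowledge until commit_ts is definitely
    # past, which is what yields external consistency across replicas.
    while tt_now()[0] < commit_ts:
        time.sleep(0.001)
    return commit_ts

commit(lambda ts: print("applied at", ts))
```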
Megastore, which predates Spanner, layered ACID transactions within entity groups on top of BigTable, serving relational-style workloads that need consistency alongside high availability.
Key-value and in-memory systems
Dynamo pioneered highly available key-value storage through consistent hashing, eventual consistency, and peer-to-peer replication. Amazon's architecture heavily influenced modern NoSQL databases.
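Consistent hashing is the heart of the design: keys and nodes map onto one ring, and a key's replicas are the next distinct nodes clockwise. A minimal sketch, with illustrative node names and virtual-node counts:

```python
# Minimal consistent-hashing ring in the spirit of Dynamo. Virtual
# nodes smooth out load; replication walks the ring clockwise.
import bisect, hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        self._points = sorted((_hash(f"{n}#{i}"), n)
                              for n in nodes for i in range(vnodes))

    def preference_list(self, key, n_replicas=3):
        """First n_replicas distinct nodes clockwise from hash(key)."""
        i = bisect.bisect(self._points, (_hash(key), ""))
        out = []
        while len(out) < n_replicas:
            node = self._points[i % len(self._points)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42"))   # e.g. three distinct nodes
```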
RAMCloud optimized for ultra-low latency by keeping all data in DRAM across a cluster. Durability comes from log segments backed up on other servers' storage; after a crash, the lost server's data is reconstructed in parallel across the cluster within seconds, making DRAM-based storage practical for mission-critical data.
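The recovery idea, reduced to a toy sketch: the crashed server's backed-up log segments are partitioned by key range and replayed by many workers at once, so recovery time shrinks as the cluster grows. Segment contents here are invented.

```python
# Illustrative sketch of RAMCloud-style parallel recovery: segments
# are grouped by key-range partition and replayed concurrently.
from concurrent.futures import ThreadPoolExecutor

def recover_partition(segments):
    # Replay log segments for one key range; later entries win.
    rebuilt = {}
    for seg in segments:
        rebuilt.update(seg)
    return rebuilt

# Pretend backup segments, already grouped by key-range partition.
partitions = [
    [{"k1": "v1"}, {"k1": "v1b"}],
    [{"k2": "v2"}],
    [{"k3": "v3"}, {"k4": "v4"}],
]

with ThreadPoolExecutor() as pool:
    recovered = list(pool.map(recover_partition, partitions))
print(recovered)
```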
Compute Systems
Batch processing frameworks
MapReduce simplified distributed processing by hiding cluster management complexity. Users write map and reduce functions; the framework handles data distribution, fault tolerance, and aggregation. Hadoop's implementation is still widely deployed, though Spark has displaced it for most new projects.
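The programming model fits in a few lines. This single-process sketch shows the map and reduce functions plus the shuffle that the framework would normally perform across a cluster:

```python
# Word count as the classic map/reduce pair. In a real framework the
# shuffle, distribution, and fault tolerance between the two phases
# are handled for you; this sketch runs in one process.
from collections import defaultdict

def map_fn(doc):
    for word in doc.split():
        yield (word, 1)

def reduce_fn(word, counts):
    return (word, sum(counts))

def run(docs):
    groups = defaultdict(list)          # the "shuffle" phase
    for doc in docs:
        for k, v in map_fn(doc):
            groups[k].append(v)
    return [reduce_fn(k, vs) for k, vs in groups.items()]

print(run(["the quick fox", "the lazy dog"]))
```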
Spark improves on MapReduce with in-memory RDDs (Resilient Distributed Datasets) that persist across operations, dramatically accelerating iterative algorithms and interactive queries. Most teams migrating from Hadoop should consider Spark first.
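The same word count in PySpark, assuming a local Spark installation; the input path is a placeholder. Note the `.cache()` call, which is what makes repeated use of an RDD cheap:

```python
# Word count on an RDD. Caching keeps the result in memory, so
# iterative algorithms and interactive queries reuse it for free.
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")
lines = sc.textFile("hdfs:///data/corpus.txt")   # placeholder path
counts = (lines.flatMap(lambda l: l.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .cache())                 # persist in memory for reuse
print(counts.take(10))
```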
Specialized processing models
Dremel introduced columnar storage for fast interactive analysis. The query engine reads only the columns a query needs, achieving near-realtime analytics on massive datasets. BigQuery is the external productization of Dremel, and the approach influenced tools like Presto.
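Columnar reads in miniature: with a columnar format like Parquet, an engine pulls only the columns a query touches. The file and column names below are placeholders.

```python
# Reading just two columns from a Parquet file; the rest of the
# file's bytes are never touched.
import pyarrow.parquet as pq
import pyarrow.compute as pc

table = pq.read_table("events.parquet",
                      columns=["user_id", "latency_ms"])
print(pc.mean(table["latency_ms"]))
```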
Storm and modern stream processors (Kafka Streams, Flink) handle continuous data streams, processing events as they arrive rather than in batch windows. Choose stream processors when you need sub-second latency on unbounded data.
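The core of windowed stream processing, stripped of any framework: group events into tumbling windows as they arrive and emit each window when it closes. This sketch assumes in-order events; the event shape and window size are made up.

```python
# Bare-bones tumbling-window counting over an unbounded stream,
# the basic move behind Storm/Flink-style processing.
from collections import Counter

def tumbling_counts(events, window_s=60):
    """events: in-order iterable of (timestamp, key) pairs."""
    window_start, counts = None, Counter()
    for ts, key in events:               # process each event on arrival
        bucket = ts - (ts % window_s)
        if window_start is not None and bucket != window_start:
            yield (window_start, dict(counts))   # emit the closed window
            counts = Counter()
        window_start = bucket
        counts[key] += 1
    if window_start is not None:
        yield (window_start, dict(counts))

stream = [(0, "a"), (30, "b"), (61, "a"), (62, "a"), (130, "b")]
print(list(tumbling_counts(stream)))
```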
Pregel abstracted graph processing as iterative vertex-centric computation. For iterative algorithms on graphs with billions of vertices, this bulk synchronous parallel model is a far better fit than chains of MapReduce jobs.
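A toy superstep loop makes the model concrete: each vertex processes its inbox, updates local state, and sends messages; a barrier separates supersteps, and computation stops when no messages remain. Here, single-source shortest paths on an invented three-vertex graph:

```python
# Pregel-style supersteps: vertices exchange messages and run the
# same compute function until the system goes quiet.
INF = float("inf")

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
dist = {v: INF for v in graph}
messages = {"a": [0]}                  # seed the source vertex

while messages:                        # one iteration = one superstep
    outbox = {}
    for v, inbox in messages.items():
        best = min(inbox)
        if best < dist[v]:             # vertex acts only if it improves
            dist[v] = best
            for nbr, w in graph[v]:    # send candidate distances
                outbox.setdefault(nbr, []).append(best + w)
    messages = outbox                  # barrier between supersteps

print(dist)   # {'a': 0, 'b': 1, 'c': 2}
```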
FlumeJava and Pig Latin provided higher-level abstractions for pipeline construction. Pig particularly appealed to analysts with database backgrounds, who could express dataflows without writing Java, though SQL-on-Hadoop tools have since become the standard.
Dryad and DryadLINQ (Microsoft) offered flexible dataflow graphs before Spark. Dryad’s dynamic optimization influenced later systems.
Resource Management
Cluster schedulers
Mesos provides fine-grained resource sharing across multiple frameworks on a single cluster. Rather than one scheduler managing all resources, Mesos acts as a broker: each framework registers its own scheduler, the Mesos master offers available capacity, and the framework accepts or declines each offer. This two-level architecture enabled running MapReduce, Spark, and other workloads simultaneously.
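The offer cycle, reduced to a sketch: the master hands each framework's scheduler an offer, and the scheduler accepts or declines by its own policy. Resource sizes and framework policies here are invented for illustration.

```python
# Two-level scheduling in miniature: the master makes offers,
# framework schedulers decide.
cluster = {"agent-1": {"cpus": 8, "mem_gb": 32},
           "agent-2": {"cpus": 4, "mem_gb": 16}}

def spark_scheduler(offer):
    # Accept only offers big enough for an executor.
    return offer["cpus"] >= 4 and offer["mem_gb"] >= 16

def cron_scheduler(offer):
    return offer["cpus"] >= 1              # happy with scraps

frameworks = {"spark": spark_scheduler, "cron": cron_scheduler}

for agent, resources in cluster.items():   # master offers in turn
    for name, accepts in frameworks.items():
        if accepts(resources):
            print(f"{name} accepted {resources} on {agent}")
            break                          # offer consumed
        print(f"{name} declined offer on {agent}")
```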
YARN (Hadoop’s resource negotiator) became the more common choice for Hadoop-centric environments, though Kubernetes has since emerged as the dominant cluster orchestration platform for containerized workloads.
Kubernetes’ declarative scheduling, automatic bin-packing, and service discovery make it the standard for new infrastructure. Most new distributed systems integrate with Kubernetes rather than Mesos directly.
Practical Considerations for 2026
When building distributed storage and processing pipelines today:
- Choose Spark over Hadoop MapReduce for batch processing unless you have deeply entrenched legacy code
- Evaluate Kubernetes before Mesos for new cluster deployments
- Use managed services (e.g., BigQuery, Snowflake) instead of building custom systems when scale and budget permit
- Start with stream processing (Kafka Streams, Flink, or Spark Structured Streaming) if your workload has temporal structure
- Separate storage from compute to scale each independently: object storage (S3, GCS) with query engines (DuckDB, Presto) is often simpler than tightly coupled systems; see the sketch after this list
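The sketch promised above: DuckDB querying Parquet files directly in object storage. The bucket path is a placeholder, credential setup is omitted, and the `httpfs` extension supplies the S3 support in recent DuckDB versions.

```python
# Storage/compute split in miniature: Parquet lives in S3, DuckDB
# queries it in place with no cluster to manage.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")   # S3 support; credential config omitted
con.sql("LOAD httpfs")

top = con.sql("""
    SELECT user_id, avg(latency_ms) AS avg_latency
    FROM read_parquet('s3://my-bucket/events/*.parquet')
    GROUP BY user_id
    ORDER BY avg_latency DESC
    LIMIT 10
""")
print(top)
```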
The principles from these foundational papers remain relevant: replication for fault tolerance, separation of concerns, and embracing node failures as expected events rather than exceptions.

Facebook's Memcache and TAO are also interesting examples of scalable, real-world systems: http://www.systutorials.com/qa/364/cache-at-facebook