Colossus: Google’s Next-Generation Distributed File System
Colossus is Google's successor to the Google File System (GFS), designed to handle the massive scale of data storage across Google's infrastructure. While GFS was documented in detail in the 2003 SOSP paper, information about Colossus remains sparse in academic literature; much of what is known comes from talks and interviews with Google engineers. Colossus serves as the underlying storage layer for systems like Spanner, Google's globally distributed database.
Why Colossus Replaced GFS
GFS worked well for its time, but Google’s exponential growth created new constraints:
- Scale explosion: The volume of data stored exceeded GFS’s practical limits
- Metadata bottlenecks: A single master became a limitation for file count and namespace growth
- Cost pressure: Replication overhead (3x) became unsustainable at Google’s scale
- Complexity: Applications had to coordinate replication and erasure coding themselves at the application level
Colossus introduced a fundamentally different architecture to address these issues.
Colossus Architecture
Metadata Layer: Curators
The most critical innovation in Colossus is its decentralized metadata system, built on Curators: a horizontally scalable metadata service layered on top of Bigtable, Google's NoSQL database.
Unlike GFS’s single master, Curators enable:
- Horizontal scaling of metadata operations
- 100x+ larger namespace capacity compared to GFS
- Support for 100+ million files per curator instance
- Distributed namespace partitioning that applications don't need to know about (see the routing sketch after this list)
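The actual partitioning scheme is not public, but the idea can be sketched. In the hypothetical example below (the shard count, hash choice, and paths are invented for illustration), a client maps each file's parent directory to one of many curator shards, so no single metadata server has to hold the entire namespace:

```python
import hashlib

# Hypothetical illustration of metadata sharding: route a path to one of many
# curator shards by hashing its parent directory. The shard count, hash choice,
# and paths are invented for this sketch, not Colossus internals.
NUM_CURATOR_SHARDS = 128

def curator_for(path: str) -> int:
    parent_dir = path.rsplit("/", 1)[0]  # files in one directory share a curator
    digest = hashlib.sha256(parent_dir.encode()).hexdigest()
    return int(digest, 16) % NUM_CURATOR_SHARDS

print(curator_for("/example-cell/app1/logs/part-00001"))   # some shard in [0, 128)
print(curator_for("/example-cell/app2/tables/data.sst"))   # likely a different shard
```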
Client-Side Intelligence
The Colossus client handles significantly more responsibility than GFS clients:
- Software RAID: Encoding and replication logic runs on clients
- Data placement decisions: Clients choose which servers receive data
- Erasure coding: Applications select encoding schemes appropriate to their durability and cost requirements
- Replication strategy: Clients manage both replication and encoding across multiple failure domains
This pushes complexity into clients where it can be customized per application, rather than centralizing it in the filesystem.
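Google has not published the client library, but the division of labor it implies can be sketched. In the hypothetical example below (the policy names, failure-domain model, and server names are invented), the application selects either 3x replication or a Reed-Solomon-style stripe, and the client places every resulting piece in a distinct failure domain:

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of client-side placement. The policy names, failure-domain
# model, and server names are invented; they only illustrate the division of labor.

@dataclass
class StoragePolicy:
    data_pieces: int   # data chunks per stripe (1 for plain replication)
    code_pieces: int   # parity chunks per stripe (0 for plain replication)
    copies: int        # full copies of each piece (3 for 3x replication)

REPLICATION_3X = StoragePolicy(data_pieces=1, code_pieces=0, copies=3)
RS_6_3 = StoragePolicy(data_pieces=6, code_pieces=3, copies=1)

def place(policy: StoragePolicy, failure_domains: dict) -> list:
    """Pick one server per piece, every piece in a distinct failure domain."""
    pieces = (policy.data_pieces + policy.code_pieces) * policy.copies
    if pieces > len(failure_domains):
        raise ValueError("not enough failure domains to isolate every piece")
    chosen_domains = random.sample(list(failure_domains), pieces)
    return [random.choice(failure_domains[d]) for d in chosen_domains]

racks = {f"rack-{i}": [f"d-server-{i}{s}" for s in "ab"] for i in range(12)}
print(place(RS_6_3, racks))          # 9 servers, all in different racks
print(place(REPLICATION_3X, racks))  # 3 servers, all in different racks
```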
Storage Servers: D Servers and Custodians
D Servers are simple network-attached disk devices with minimal intelligence—just enough to store and retrieve data.
Custodians are background storage managers that handle:
- Disk space balancing and rebalancing
- RAID construction and reconstruction
- Ensuring system efficiency and durability
- Data lifecycle management as data ages
This separation allows D Servers to remain simple while Custodians handle complexity asynchronously.
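As a rough illustration of that asynchronous division of labor, the sketch below (with invented names, thresholds, and data model) shows the kind of pass a custodian-style manager might make: flag overfull disks for rebalancing and degraded stripes for reconstruction, while D Servers do nothing beyond serving reads and writes:

```python
import random

# Hypothetical sketch of a custodian pass: scan simple disk and stripe records
# and queue background work. Thresholds, names, and the data model are invented.

FULLNESS_THRESHOLD = 0.85   # rebalance away from disks that are more than 85% full
HEALTHY_PIECES = 9          # e.g. an RS(6,3) stripe has 9 pieces when fully intact

disk_used_fraction = {f"disk-{i}": random.uniform(0.3, 0.95) for i in range(8)}
stripe_live_pieces = {"stripe-1": 9, "stripe-2": 8}

def custodian_pass() -> list:
    tasks = []
    for disk, used in disk_used_fraction.items():
        if used > FULLNESS_THRESHOLD:
            tasks.append(("rebalance", disk))        # move data to emptier disks
    for stripe, live in stripe_live_pieces.items():
        if live < HEALTHY_PIECES:
            tasks.append(("reconstruct", stripe))    # rebuild lost pieces from parity
    return tasks

print(custodian_pass())   # e.g. [('rebalance', 'disk-5'), ('reconstruct', 'stripe-2')]
```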
Data Storage Optimization
Colossus differentiates between hot and cold data, using hybrid storage to optimize cost and performance:
Flash and Spinning Disk Mix
Hot data (recently written, frequently accessed) lives on flash to:
- Reduce latency for active workloads
- Provide sufficient I/O density without excessive hardware
Cold data (older, infrequently accessed) moves to:
- Large-capacity spinning disks
- Reduced-capacity devices optimized for sequential access
- Storage tiers selected to maximize utilization (a tier-selection sketch follows this list)
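The actual placement policy is not public; the sketch below is a hypothetical illustration (with invented thresholds) of the basic idea of routing data to flash or spinning disk based on its age and access rate:

```python
# Hypothetical tiering policy: keep newly written or frequently read data on
# flash and demote the rest to spinning disk. The thresholds are invented.

HOT_AGE_SECONDS = 24 * 3600     # treat data as hot for its first day
HOT_READS_PER_DAY = 100         # or while it is still read often

def choose_tier(age_seconds: float, reads_per_day: float) -> str:
    if age_seconds < HOT_AGE_SECONDS or reads_per_day >= HOT_READS_PER_DAY:
        return "flash"
    return "spinning-disk"

print(choose_tier(age_seconds=3600, reads_per_day=5))           # flash (just written)
print(choose_tier(age_seconds=30 * 86400, reads_per_day=500))   # flash (still busy)
print(choose_tier(age_seconds=30 * 86400, reads_per_day=0.1))   # spinning-disk (cold)
```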
Write Distribution
New writes are spread evenly across all disk spindles to:
- Maintain consistent bandwidth per disk
- Prevent hot spots that would waste disk capacity
- Keep disk spindles busy without overloading any single drive (a placement sketch follows this list)
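A minimal sketch of this idea, assuming a simple least-loaded heuristic and invented disk names, might look like the following; the real placement logic is not public:

```python
# Hypothetical sketch of spreading new writes: pick the disk with the least
# outstanding write bytes so every spindle stays similarly busy.

pending_write_bytes = {"disk-0": 4_000_000, "disk-1": 250_000, "disk-2": 9_000_000}

def pick_disk_for_write(size_bytes: int) -> str:
    disk = min(pending_write_bytes, key=pending_write_bytes.get)
    pending_write_bytes[disk] += size_bytes   # account for the new write
    return disk

for _ in range(4):
    print(pick_disk_for_write(2_000_000))
# disk-1, disk-1, disk-0, disk-1  (writes flow to the least-loaded spindle)
```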
Erasure Coding vs. Replication
Colossus standardized on Reed-Solomon erasure coding for most data because:
- Cost: 1.5x storage overhead vs. 3x for GFS replication (a worked example follows this list)
- Availability: Field data and simulations show improved MTTF (Mean Time To Failure) compared to replication
- Flexibility: Different data can use different coding schemes (e.g., higher redundancy for critical metadata, aggressive coding for bulk logs)
- Cross-cluster replication: Much more efficient than replicating full copies across geographically distributed clusters
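The cost bullet above can be made concrete with a small worked example. The specific Reed-Solomon parameters below are illustrative, since Google has not published which schemes each workload uses:

```python
# Worked numbers behind the cost comparison. RS(n, k) here means n data pieces
# plus k parity pieces; the (6, 3) and (12, 4) choices are illustrative.

def rs_overhead(data_pieces: int, parity_pieces: int) -> float:
    """Raw bytes stored per logical byte for a Reed-Solomon-style stripe."""
    return (data_pieces + parity_pieces) / data_pieces

replication_overhead = 3.0   # three full copies of every byte

print(f"3x replication: {replication_overhead}x raw bytes per logical byte")
print(f"RS(6, 3):  {rs_overhead(6, 3)}x")       # 1.5x, tolerates any 3 lost pieces
print(f"RS(12, 4): {rs_overhead(12, 4):.2f}x")  # ~1.33x, a more aggressive bulk scheme
# RS(6, 3) survives the loss of any 3 pieces while storing half as many raw
# bytes as 3x replication, which only survives the loss of 2 copies.
```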
Distributed Namespace and Multi-Cell Design
GFS evolved toward a “multi-cell” architecture that addressed some limitations:
- Multiple masters: Runs several GFS masters ("cells") on top of a shared pool of chunkservers, rather than one master per cluster
- Namespaces: A static partitioning mechanism that gives applications a unified view of the namespace while hiding which cell actually stores each file
- Namespace descriptors: Metadata files that describe how the namespace is partitioned
Colossus takes this further by making namespace partitioning and distribution automatic rather than static.
Key Takeaways for System Designers
When designing large-scale distributed storage systems:
- Push complexity to clients where policies can vary by application
- Horizontally scale metadata to avoid single points of contention
- Use erasure coding instead of replication for cost-sensitive bulk storage
- Separate concerns: Simple storage servers with intelligent management layers
- Tiered storage: Match data temperature to hardware cost and performance characteristics
- Distributed namespace: Hide distribution complexity from applications through abstraction
Colossus represents the natural evolution of GFS given Google’s scale. Its design principles—client-driven coordination, horizontally scalable metadata, and cost-aware storage—have influenced subsequent distributed storage systems across the industry.

Google Cloud Next 2020 (Infrastructure week) included a session covering Colossus: "A peek behind the VM at the Google Storage infrastructure," presented by Dean Hildebrand (Technical Director) and Denis Serenyi (Tech Lead, Google Cloud Storage). https://www.youtube.com/watch?v=q4WC_6SzBz4