Colossus: Google’s Next-Generation Distributed File System
Colossus is Google's successor to the Google File System (GFS), designed to handle the massive scale of data storage across Google's infrastructure. While GFS was documented in detail in the 2003 SOSP paper, information about Colossus remains sparse in academic literature; much of what is known comes from talks and interviews with Google engineers. Colossus serves as the underlying storage layer for systems like Spanner, Google's globally distributed database.
Why Colossus Replaced GFS
GFS worked well for its time, but Google’s exponential growth created new constraints:
- Scale explosion: The volume of data stored exceeded GFS’s practical limits
- Metadata bottlenecks: A single master became a limitation for file count and namespace growth
- Cost pressure: Replication overhead (3x) became unsustainable at Google’s scale
- Complexity: Applications had to coordinate replication and erasure coding themselves at the application level
Colossus introduced a fundamentally different architecture to address these issues.
Colossus Architecture
Metadata Layer: Curators
The most critical innovation in Colossus is its decentralized metadata system, built on Curators: a horizontally scalable metadata service layered on top of Bigtable, Google's NoSQL database.
Unlike GFS’s single master, Curators enable:
- Horizontal scaling of metadata operations
- 100x+ larger namespace capacity compared to GFS
- Support for 100+ million files per curator instance
- Distributed namespace partitioning that applications don't need to know about (see the routing sketch after this list)
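The actual partitioning scheme is not public, but the idea can be sketched. In the hypothetical example below (the shard count, hash choice, and paths are invented for illustration), a client maps each file's parent directory to one of many curator shards, so no single metadata server has to hold the entire namespace:

```python
import hashlib

# Hypothetical illustration of metadata sharding: route a path to one of many
# curator shards by hashing its parent directory. The shard count, hash choice,
# and paths are invented for this sketch, not Colossus internals.
NUM_CURATOR_SHARDS = 128

def curator_for(path: str) -> int:
    parent_dir = path.rsplit("/", 1)[0]  # files in one directory share a curator
    digest = hashlib.sha256(parent_dir.encode()).hexdigest()
    return int(digest, 16) % NUM_CURATOR_SHARDS

print(curator_for("/example-cell/app1/logs/part-00001"))   # some shard in [0, 128)
print(curator_for("/example-cell/app2/tables/data.sst"))   # likely a different shard
```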
Client-Side Intelligence
The Colossus client handles significantly more responsibility than GFS clients:
- Software RAID: Encoding and replication logic runs on clients
- Data placement decisions: Clients choose which servers receive data
- Erasure coding: Applications select encoding schemes appropriate to their durability and cost requirements
- Replication strategy: Clients manage both replication and encoding across multiple failure domains
This pushes complexity into clients where it can be customized per application, rather than centralizing it in the filesystem.
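Google has not published the client library, but the division of labor it implies can be sketched. In the hypothetical example below (the policy names, failure-domain model, and server names are invented), the application selects either 3x replication or a Reed-Solomon-style stripe, and the client places every resulting piece in a distinct failure domain:

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of client-side placement. The policy names, failure-domain
# model, and server names are invented; they only illustrate the division of labor.

@dataclass
class StoragePolicy:
    data_pieces: int   # data chunks per stripe (1 for plain replication)
    code_pieces: int   # parity chunks per stripe (0 for plain replication)
    copies: int        # full copies of each piece (3 for 3x replication)

REPLICATION_3X = StoragePolicy(data_pieces=1, code_pieces=0, copies=3)
RS_6_3 = StoragePolicy(data_pieces=6, code_pieces=3, copies=1)

def place(policy: StoragePolicy, failure_domains: dict) -> list:
    """Pick one server per piece, every piece in a distinct failure domain."""
    pieces = (policy.data_pieces + policy.code_pieces) * policy.copies
    if pieces > len(failure_domains):
        raise ValueError("not enough failure domains to isolate every piece")
    chosen_domains = random.sample(list(failure_domains), pieces)
    return [random.choice(failure_domains[d]) for d in chosen_domains]

racks = {f"rack-{i}": [f"d-server-{i}{s}" for s in "ab"] for i in range(12)}
print(place(RS_6_3, racks))          # 9 servers, all in different racks
print(place(REPLICATION_3X, racks))  # 3 servers, all in different racks
```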
Storage Servers: D Servers and Custodians
D Servers are simple network-attached disk devices with minimal intelligence—just enough to store and retrieve data.
Custodians are background storage managers that handle:
- Disk space balancing and rebalancing
- RAID construction and reconstruction
- Ensuring system efficiency and durability
- Data lifecycle management as data ages
This separation allows D Servers to remain simple while Custodians handle complexity asynchronously.
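As a rough illustration of that asynchronous division of labor, the sketch below (with invented names, thresholds, and data model) shows the kind of pass a custodian-style manager might make: flag overfull disks for rebalancing and degraded stripes for reconstruction, while D Servers do nothing beyond serving reads and writes:

```python
import random

# Hypothetical sketch of a custodian pass: scan simple disk and stripe records
# and queue background work. Thresholds, names, and the data model are invented.

FULLNESS_THRESHOLD = 0.85   # rebalance away from disks that are more than 85% full
HEALTHY_PIECES = 9          # e.g. an RS(6,3) stripe has 9 pieces when fully intact

disk_used_fraction = {f"disk-{i}": random.uniform(0.3, 0.95) for i in range(8)}
stripe_live_pieces = {"stripe-1": 9, "stripe-2": 8}

def custodian_pass() -> list:
    tasks = []
    for disk, used in disk_used_fraction.items():
        if used > FULLNESS_THRESHOLD:
            tasks.append(("rebalance", disk))        # move data to emptier disks
    for stripe, live in stripe_live_pieces.items():
        if live < HEALTHY_PIECES:
            tasks.append(("reconstruct", stripe))    # rebuild lost pieces from parity
    return tasks

print(custodian_pass())   # e.g. [('rebalance', 'disk-5'), ('reconstruct', 'stripe-2')]
```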
Data Storage Optimization
Colossus differentiates between hot and cold data, using hybrid storage to optimize cost and performance:
Flash and Spinning Disk Mix
Hot data (recently written, frequently accessed) lives on flash to:
- Reduce latency for active workloads
- Provide sufficient I/O density without excessive hardware
Cold data (older, infrequently accessed) moves to:
- Large-capacity spinning disks
- Reduced-capacity devices optimized for sequential access
- Storage tiers selected to maximize utilization (a tier-selection sketch follows this list)
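The actual placement policy is not public; the sketch below is a hypothetical illustration (with invented thresholds) of the basic idea of routing data to flash or spinning disk based on its age and access rate:

```python
# Hypothetical tiering policy: keep newly written or frequently read data on
# flash and demote the rest to spinning disk. The thresholds are invented.

HOT_AGE_SECONDS = 24 * 3600     # treat data as hot for its first day
HOT_READS_PER_DAY = 100         # or while it is still read often

def choose_tier(age_seconds: float, reads_per_day: float) -> str:
    if age_seconds < HOT_AGE_SECONDS or reads_per_day >= HOT_READS_PER_DAY:
        return "flash"
    return "spinning-disk"

print(choose_tier(age_seconds=3600, reads_per_day=5))           # flash (just written)
print(choose_tier(age_seconds=30 * 86400, reads_per_day=500))   # flash (still busy)
print(choose_tier(age_seconds=30 * 86400, reads_per_day=0.1))   # spinning-disk (cold)
```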
Write Distribution
New writes are spread evenly across all disk spindles to:
- Maintain consistent bandwidth per disk
- Prevent hot spots that would waste disk capacity
- Keep disk spindles busy without overloading any single drive (a placement sketch follows this list)
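A minimal sketch of this idea, assuming a simple least-loaded heuristic and invented disk names, might look like the following; the real placement logic is not public:

```python
# Hypothetical sketch of spreading new writes: pick the disk with the least
# outstanding write bytes so every spindle stays similarly busy.

pending_write_bytes = {"disk-0": 4_000_000, "disk-1": 250_000, "disk-2": 9_000_000}

def pick_disk_for_write(size_bytes: int) -> str:
    disk = min(pending_write_bytes, key=pending_write_bytes.get)
    pending_write_bytes[disk] += size_bytes   # account for the new write
    return disk

for _ in range(4):
    print(pick_disk_for_write(2_000_000))
# disk-1, disk-1, disk-0, disk-1  (writes flow to the least-loaded spindle)
```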
Erasure Coding vs. Replication
Colossus standardized on Reed-Solomon erasure coding for most data because:
- Cost: 1.5x storage overhead vs. 3x for GFS replication (a worked example follows this list)
- Availability: Field data and simulations show improved MTTF (Mean Time To Failure) compared to replication
- Flexibility: Different data can use different coding schemes (e.g., higher redundancy for critical metadata, aggressive coding for bulk logs)
- Cross-cluster replication: Much more efficient than replicating full copies across geographically distributed clusters
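The cost bullet above can be made concrete with a small worked example. The specific Reed-Solomon parameters below are illustrative, since Google has not published which schemes each workload uses:

```python
# Worked numbers behind the cost comparison. RS(n, k) here means n data pieces
# plus k parity pieces; the (6, 3) and (12, 4) choices are illustrative.

def rs_overhead(data_pieces: int, parity_pieces: int) -> float:
    """Raw bytes stored per logical byte for a Reed-Solomon-style stripe."""
    return (data_pieces + parity_pieces) / data_pieces

replication_overhead = 3.0   # three full copies of every byte

print(f"3x replication: {replication_overhead}x raw bytes per logical byte")
print(f"RS(6, 3):  {rs_overhead(6, 3)}x")       # 1.5x, tolerates any 3 lost pieces
print(f"RS(12, 4): {rs_overhead(12, 4):.2f}x")  # ~1.33x, a more aggressive bulk scheme
# RS(6, 3) survives the loss of any 3 pieces while storing half as many raw
# bytes as 3x replication, which only survives the loss of 2 copies.
```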
Distributed Namespace and Multi-Cell Design
GFS evolved toward a “multi-cell” architecture that addressed some limitations:
- Multiple masters: Runs several GFS masters ("cells") on top of a shared pool of chunkservers, rather than one master per cluster
- Namespaces: A static partitioning mechanism that gives applications a unified view of the namespace while hiding which cell actually stores each file
- Namespace descriptors: Metadata files that describe how the namespace is partitioned
Colossus takes this further by making namespace partitioning and distribution automatic rather than static.
Key Takeaways for System Designers
When designing large-scale distributed storage systems:
- Push complexity to clients where policies can vary by application
- Horizontally scale metadata to avoid single points of contention
- Use erasure coding instead of replication for cost-sensitive bulk storage
- Separate concerns: Simple storage servers with intelligent management layers
- Tiered storage: Match data temperature to hardware cost and performance characteristics
- Distributed namespace: Hide distribution complexity from applications through abstraction
Colossus represents the natural evolution of GFS given Google’s scale. Its design principles—client-driven coordination, horizontally scalable metadata, and cost-aware storage—have influenced subsequent distributed storage systems across the industry.

Google Cloud Next 2020 (Infrastructure week) included a session covering Colossus: "A peek behind the VM at the Google Storage infrastructure," presented by Dean Hildebrand (Technical Director) and Denis Serenyi (Tech Lead, Google Cloud Storage). https://www.youtube.com/watch?v=q4WC_6SzBz4