Cosmos is “Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data”.
No paper or technical report about Cosmos has been published yet, so I have compiled a list of the information about Cosmos that is available on the Web.
- What is Microsoft’s Cosmos service? by Yaron Y. Goland.
- Cosmos Big Data and Big Challenges by Pat Helland.
What Is COSMOS?
- Petabyte Store and Computation System
  - About 62 physical petabytes stored (~275 logical petabytes stored)
  - Tens of thousands of computers across many datacenters
- Massively parallel processing based on Dryad
  - Similar to MapReduce but can represent arbitrary DAGs of computation
  - Automatic computation placement with data
- SCOPE (Structured Computation Optimized for Parallel Execution)
  - SQL-like language with set-oriented record and column manipulation
  - Automatically compiled and optimized for execution over Dryad
- Management of hundreds of “Virtual Clusters” for computation allocation
  - Buy your machines and give them to COSMOS
  - Guaranteed that many compute resources
  - May use more when they are not in use
- Ubiquitous access to OSD’s data
  - Combining knowledge from different datasets is today’s secret sauce
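
To make the "arbitrary DAGs of computation" point above a little more concrete, here is a minimal single-machine sketch in Python. It is not Dryad's or SCOPE's API (neither is shown in the sources listed here); the function `run_dag`, the stage names, and the toy input are all hypothetical and exist only for illustration. A MapReduce job is a fixed two-stage pipeline, whereas here one stage fans out to two independent stages whose outputs are then joined.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(stages, deps, source):
    """Run each stage once all of its upstream stages have finished.

    stages: stage name -> function taking a list of upstream outputs
    deps:   stage name -> list of upstream stage names (empty for roots)
    source: input handed to stages that have no upstream dependencies
    (All names here are illustrative assumptions, not a real Cosmos API.)
    """
    results = {}
    # static_order() yields every stage after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        upstream = deps.get(name, [])
        inputs = [results[d] for d in upstream] if upstream else [source]
        results[name] = stages[name](inputs)
    return results


# A diamond-shaped job: tokenize -> (word_count, distinct) -> report.
lines = ["big data", "big challenges"]
stages = {
    "tokenize": lambda ins: [w for line in ins[0] for w in line.split()],
    "word_count": lambda ins: len(ins[0]),
    "distinct": lambda ins: len(set(ins[0])),
    "report": lambda ins: {"words": ins[0], "distinct": ins[1]},
}
deps = {
    "tokenize": [],
    "word_count": ["tokenize"],
    "distinct": ["tokenize"],
    "report": ["word_count", "distinct"],
}

out = run_dag(stages, deps, lines)
print(out["report"])  # {'words': 4, 'distinct': 3}
```

In a real system the two middle stages could run in parallel on different machines, ideally close to where their input data is stored; this sketch only shows the dependency ordering, not scheduling or data placement.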