Microsoft’s Cosmos Service

Cosmos is “Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data”.

No paper or technical report about Cosmos has been published yet. Below is a list of information about Cosmos that I compiled from the Web.

What is Microsoft’s Cosmos service? by Yaron Y. Goland.

Microsoft Cosmos: Petabytes perfectly processed perfunctorily by Seth Eliot.

Cosmos Big Data and Big Challenges by Pat Helland.

What Is COSMOS?

  • Petabyte Store and Computation System
    • About 62 physical petabytes stored (~275 logical petabytes stored)
    • Tens of thousands of computers across many datacenters
  • Massively parallel processing based on Dryad
    • Similar to MapReduce but can represent arbitrary DAGs of computation
    • Automatic computation placement with data
  • SCOPE (Structured Computation Optimized for Parallel Execution)
    • SQL-like language with set-oriented record and column manipulation
    • Automatically compiled and optimized for execution over Dryad
  • Management of hundreds of “Virtual Clusters” for computation allocation
    • Buy your machines and give them to COSMOS
    • Guaranteed that many compute resources
    • May use more when they are not in use
  • Ubiquitous access to OSD’s data
    • Combining knowledge from different datasets is today’s secret sauce
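
The “Virtual Clusters” bullets above describe a simple sharing policy: each team is guaranteed the machines it contributed, and may borrow more while other teams leave theirs idle. Here is a minimal sketch of how I read that policy, in Python. It is not Cosmos's actual scheduler, which is not publicly documented; the names VirtualCluster and allocate, the team names and machine counts, and the rule of lending idle machines in list order are all my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class VirtualCluster:
    name: str
    guaranteed: int  # machines the team bought and gave to the pool (hypothetical numbers)
    demand: int      # machines the team currently wants to use


def allocate(clusters: list[VirtualCluster]) -> dict[str, int]:
    """Grant each cluster up to its guarantee, then lend out idle machines."""
    # Step 1: every cluster gets what it asks for, capped at its guarantee.
    grants = {c.name: min(c.demand, c.guaranteed) for c in clusters}

    # Step 2: machines that are guaranteed to someone but not in use right now.
    idle = sum(c.guaranteed for c in clusters) - sum(grants.values())

    # Step 3: lend idle machines to clusters that want more than their guarantee.
    # List order is an assumed tie-break, not something the slides specify.
    for c in clusters:
        extra = min(c.demand - grants[c.name], idle)
        grants[c.name] += extra
        idle -= extra
    return grants


clusters = [
    VirtualCluster("search", guaranteed=100, demand=150),
    VirtualCluster("ads", guaranteed=50, demand=20),
    VirtualCluster("mail", guaranteed=30, demand=30),
]
result = allocate(clusters)
print(result)  # {'search': 130, 'ads': 20, 'mail': 30}
```

In this example "ads" uses only 20 of its 50 guaranteed machines, so the 30 idle ones are lent to "search", which ends up with 130. A real system would also need to take borrowed machines back when the owner's demand rises; the sketch leaves that out.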

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
