Microsoft’s Cosmos Service


Cosmos is “Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data”.

No paper or technical report about Cosmos has been published yet, so I have compiled the following list of information about Cosmos that is available on the Web.

What is Microsoft’s Cosmos service? by Yaron Y. Goland.

Microsoft Cosmos: Petabytes perfectly processed perfunctorily by Seth Eliot.

Cosmos Big Data and Big Challenges by Pat Helland.

What Is COSMOS?

  • Petabyte Store and Computation System
    • About 62 physical petabytes stored (~275 logical petabytes stored)
    • Tens of thousands of computers across many datacenters
  • Massively parallel processing based on Dryad
    • Similar to MapReduce but can represent arbitrary DAGs of computation
    • Automatic computation placement with data
  • SCOPE (Structured Computation Optimized for Parallel Execution)
    • SQL-like language with set-oriented record and column manipulation
    • Automatically compiled and optimized for execution over Dryad
  • Management of hundreds of “Virtual Clusters” for computation allocation
    • Teams buy their own machines and give them to COSMOS
    • Each team is guaranteed that many compute resources
    • A team may use more when other teams’ resources are not in use
  • Ubiquitous access to OSD’s data
    • Combining knowledge from different datasets is today’s secret sauce
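
The “Virtual Clusters” bullets above describe a guarantee-plus-borrowing scheme: a team is guaranteed as many machines as it contributed, and may use more when others are idle. Here is a minimal sketch of that idea in Python. None of it comes from the sources listed above: the names `VirtualCluster` and `allocate`, the counting in whole machines, and the round-robin hand-out of idle machines are all my own assumptions, and the real COSMOS scheduler may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class VirtualCluster:
    # Hypothetical model: these fields are illustrative, not COSMOS's own.
    name: str
    guaranteed: int  # machines the team contributed, hence its guarantee
    demand: int      # machines the team currently wants


def allocate(clusters):
    """Return {cluster name: machines granted} for one scheduling round."""
    total = sum(vc.guaranteed for vc in clusters)

    # Step 1: every cluster gets what it asks for, up to its guarantee.
    grants = {vc.name: min(vc.demand, vc.guaranteed) for vc in clusters}
    idle = total - sum(grants.values())

    # Step 2 (assumed policy): lend idle machines one at a time,
    # round-robin, to clusters whose demand is not yet met.
    waiting = [vc for vc in clusters if vc.demand > grants[vc.name]]
    while idle > 0 and waiting:
        for vc in list(waiting):
            if idle == 0:
                break
            grants[vc.name] += 1
            idle -= 1
            if grants[vc.name] == vc.demand:
                waiting.remove(vc)
    return grants


if __name__ == "__main__":
    vcs = [
        VirtualCluster("search", guaranteed=100, demand=40),
        VirtualCluster("ads", guaranteed=50, demand=120),
        VirtualCluster("mail", guaranteed=50, demand=50),
    ]
    print(allocate(vcs))  # {'search': 40, 'ads': 110, 'mail': 50}
```

In this made-up example, "search" uses only 40 of its 100 guaranteed machines, so the 60 idle ones are lent to "ads", which ends up with 110 even though it contributed only 50. A real system would also need a way to take borrowed machines back when the owner's demand rises, which this sketch leaves out.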

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
