SQL layers on NoSQL databases

What SQL layer solutions exist on top of NoSQL databases such as key/value stores?

Phoenix: A SQL layer on HBase:

https://github.com/forcedotcom/phoenix

The project also publishes some performance results:

https://github.com/forcedotcom/phoenix/wiki/Performance
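
To make the idea of a SQL layer over a key/value store concrete, here is a minimal sketch. It is not Phoenix's implementation: the `KVStore` and `SqlLayer` classes, the `<table>/<primary key>` key scheme, and the sample tables are all assumptions made up for illustration, with a plain dict standing in for a store such as HBase.

```python
# Toy illustration only: a dict stands in for the key/value store.
# All class names and the key scheme are hypothetical, not Phoenix's design.
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


class KVStore:
    """Hypothetical key/value store that supports ordered prefix scans."""

    def __init__(self) -> None:
        self._data: Dict[str, Row] = {}

    def put(self, key: str, value: Row) -> None:
        self._data[key] = value

    def scan(self, prefix: str) -> Iterator[Tuple[str, Row]]:
        # Yield every entry whose key starts with the prefix, in key order.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


class SqlLayer:
    """Maps table rows to keys of the form '<table>/<primary key>'."""

    def __init__(self, store: KVStore) -> None:
        self.store = store

    def insert(self, table: str, pk: str, row: Row) -> None:
        self.store.put(f"{table}/{pk}", row)

    def select(
        self,
        table: str,
        columns: List[str],
        where: Callable[[Row], bool] = lambda row: True,
    ) -> List[Row]:
        # Roughly: SELECT <columns> FROM <table> WHERE <where>
        return [
            {col: row[col] for col in columns}
            for _, row in self.store.scan(f"{table}/")
            if where(row)
        ]


layer = SqlLayer(KVStore())
layer.insert("users", "002", {"name": "bob", "age": 25})
layer.insert("users", "001", {"name": "alice", "age": 31})
layer.insert("orders", "001", {"total": 9.5})

# Roughly: SELECT name FROM users WHERE age > 30
result = layer.select("users", ["name"], lambda row: row["age"] > 30)
print(result)
```

A real layer would add a parser, a schema catalog, and predicate push-down into the store. The sketch only shows the core mapping from a table query to a key-range scan plus a filter.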


F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business:

http://research.google.com/pubs/pub38125.html

With F1, we have built a novel hybrid system that combines the
scalability, fault tolerance, transparent sharding, and cost benefits
so far available only in “NoSQL” systems with the usability,
familiarity, and transactional guarantees expected from an RDBMS.


Tenzing: A SQL Implementation on the MapReduce Framework:

http://research.google.com/pubs/pub37200.html

Tenzing is a query engine built on top of MapReduce for ad hoc
analysis of Google data. Tenzing supports a mostly complete SQL
implementation (with several extensions) combined with several key
characteristics such as heterogeneity, high performance, scalability,
reliability, metadata awareness, low latency, support for columnar
storage and structured data, and easy extensibility. Tenzing is
currently used internally at Google by 1000+ employees and serves
10000+ queries per day over 1.5 petabytes of compressed data. In this
paper, we describe the architecture and implementation of Tenzing, and
present benchmarks of typical analytical queries.
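
As a rough illustration of what running SQL on top of MapReduce means, the sketch below expresses a simple aggregation query as map, shuffle, and reduce steps. It is not taken from the Tenzing paper: the function names, the `country` and `clicks` columns, and the sample rows are assumptions chosen for the example.

```python
# Hypothetical sketch: SELECT country, SUM(clicks) FROM rows GROUP BY country
# written as map, shuffle and reduce phases. Not Tenzing's actual code.
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    # Emit (group key, value) pairs: the GROUP BY column and the summed column.
    for row in rows:
        yield row["country"], row["clicks"]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Collect all values that share a key, as the framework would between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Apply the aggregate (SUM) to each group.
    return {key: sum(values) for key, values in groups.items()}


rows = [
    {"country": "US", "clicks": 3},
    {"country": "DE", "clicks": 1},
    {"country": "US", "clicks": 4},
]

totals = reduce_phase(shuffle(map_phase(rows)))
print(totals)
```

In a real deployment the map and reduce functions would run in parallel across many machines. The sketch runs them in one process only to show how a GROUP BY decomposes.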


HAWQ from EMC:

http://www.emc.com/about/news/press/2013/20130225-04.htm

HAWQ (pronounced hawk) represents the EMC Greenplum engineering effort
that brings 10 years of large-scale data management research and
development to the Apache Hadoop framework. Leveraging the feature
richness and maturity of the industry leading Greenplum MPP analytical
database, this innovation has resulted in the world’s first true SQL
parallel database on top of the Hadoop Distributed File System (HDFS).

http://www.theregister.co.uk/2013/02/25/emc_pivotal_hd_hadoop_hawq_database/

Project Hawq, the SQL database layer that rides atop of HDFS rather
than trying to replace it with a NoSQL data store


Apache Hive: http://hive.apache.org/

It defines a SQL-like language called HiveQL.


Stinger Initiative: Making Apache Hive 100 Times Faster: http://hortonworks.com/blog/100x-faster-hive/


Cloudera Impala

http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/

Source code:

https://github.com/cloudera/impala

It uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user
interface (Hue Beeswax) as Apache Hive, providing a familiar and
unified platform for batch-oriented or real-time queries.


Spire:

Home: https://drawntoscalehq.com/

Spire is the first SQL database for large, user-facing applications
built on Hadoop. Spire is built to power large-scale websites, mobile
apps, and machine-to-machine data.

Unlike any other Hadoop and SQL solution, Spire scales to tens of
thousands of reads and writes per second, with full ANSI SQL and
intuitive management tools.

Architecturally similar to Google F1, Spire makes it simple to build
applications for the Big Data Era.


Hadapt: http://hadapt.com/

Hadapt unifies SQL and Hadoop, enabling customers to analyze all of their data (structured, unstructured, and multi-structured) in a single platform – no connectors, complexities, or rigid structure.
