Amazon Redshift (hosted DWH) - SQL for simplicity - Data Management & Data Architecture

NoSQL databases like Amazon DynamoDB have become quite popular in the OLTP market. Their marketing message emphasizes simplicity compared to an RDBMS. Without a declarative language like SQL, however, a lot of work has to be done in the application instead. Couchbase announced in a recent press release that its product is going to support an SQL dialect called N1QL (pronounced "nickel"). Other vendors may move in a similar direction.
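To make "work in the application" concrete, here is a minimal Python sketch: an aggregation that SQL expresses in one declarative statement has to be coded by hand when the data store only offers key-value access. The record layout and the function name `revenue_by_region` are hypothetical.

```python
from collections import defaultdict

# Hypothetical order records, as a key-value store might return them.
orders = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 80},
    {"region": "EU", "amount": 30},
]

def revenue_by_region(records):
    """Hand-written (hypothetical) equivalent of:
    SELECT region, SUM(amount) FROM orders GROUP BY region
    """
    totals = defaultdict(int)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)

print(revenue_by_region(orders))  # {'EU': 150, 'US': 80}
```

Joins, sorting, and filtering would each add more hand-written code of this kind to the application.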

Amazon also offers a hosted DWH solution, Amazon Redshift, as part of Amazon Web Services. Does Amazon follow a similar approach, without SQL support?

A recent paper, “Amazon Redshift and the Case for Simpler Data Warehouses”, published in the “Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data”, answers this question clearly. In their section on simplicity, the AWS authors point out: “We believe the success of SQL-based databases came in large part from the significant simplifications they brought to application development through the use of declarative query processing and a coherent model of concurrent execution” (page 1920).

The paper also describes Redshift's system architecture. Redshift uses familiar DWH techniques such as:

columnar layout
compression
co-locating compute and data
MPP processing
no traditional indexes
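A minimal Python sketch, assuming a toy in-memory table, of why columnar layout and compression go together: storing each column separately puts values of the same type, and often the same value, next to each other, so even a simple scheme like run-length encoding shrinks them, and a query that needs one column reads only that column. Run-length encoding is only one of several possible encodings, this is not Redshift's actual storage format, and the table data and function names are hypothetical.

```python
from itertools import groupby

# Hypothetical row-oriented table: (sale_date, region, amount).
rows = [
    ("2015-06-01", "EU", 120),
    ("2015-06-01", "EU", 30),
    ("2015-06-01", "US", 80),
    ("2015-06-02", "US", 45),
]

def to_columns(rows):
    """Pivot rows into one list per column (columnar layout)."""
    return [list(col) for col in zip(*rows)]

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

dates, regions, amounts = to_columns(rows)
print(rle_encode(dates))    # [('2015-06-01', 3), ('2015-06-02', 1)]
print(rle_encode(regions))  # [('EU', 2), ('US', 2)]
# A query that only needs 'amount' touches just that column.
print(sum(amounts))         # 275
```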

The database engine is based on ParAccel. Data and computation are distributed across several nodes: one leader node and at least one compute node. The leader node handles client connections and generates execution plans as C++ or machine code. The executable is sent to the compute nodes, which send their results back to the leader for final aggregation.
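The division of labor between leader and compute nodes can be sketched in Python as a two-phase aggregation: rows are spread across compute nodes, each node aggregates only its local rows, and the leader merges the partial results. This is a simplified model under stated assumptions (round-robin distribution, an in-process "cluster", hypothetical function names); it does not model plan compilation or network transfer.

```python
from collections import Counter

# Hypothetical fact rows: (customer_id, amount).
ROWS = [(1, 10), (2, 5), (3, 7), (1, 3), (4, 8), (2, 2)]

def distribute_round_robin(rows, n_nodes):
    """Spread rows evenly across compute nodes (assumed round-robin)."""
    slices = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        slices[i % n_nodes].append(row)
    return slices

def compute_node(local_rows):
    """A compute node aggregates only the rows stored locally."""
    partial = Counter()
    for customer_id, amount in local_rows:
        partial[customer_id] += amount
    return partial

def leader_node(rows, n_nodes=2):
    """The leader fans work out and merges the partial aggregates."""
    partials = [compute_node(s) for s in distribute_round_robin(rows, n_nodes)]
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts per key
    return dict(sorted(final.items()))

print(leader_node(ROWS))  # {1: 13, 2: 7, 3: 7, 4: 8}
```

Customer 1 ends up on both nodes here, which is why the leader's final merge step is needed.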

Continuous delivery (together with related practices such as continuous integration and DevOps) is becoming increasingly important for reducing the time to market of client code: frequent releases, sometimes with several deployments per day. Amazon applies the same idea to patching Redshift. Customer clusters are patched weekly within a configurable 30-minute window. Instead of the traditional approach of extensive patch sets containing many new functions, Amazon installs small patches. The patches are reversible and are rolled back if errors increase or performance degrades.
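The rollback rule (roll back if errors increase or performance degrades) can be sketched as a simple before/after comparison in Python. The paper does not publish concrete metrics or thresholds, so the `Metrics` fields, the tolerance values, and `should_roll_back` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    error_rate: float      # hypothetical: errors per 1,000 queries
    p95_latency_ms: float  # hypothetical: 95th-percentile query latency

# Hypothetical tolerances; not taken from the paper.
MAX_ERROR_INCREASE = 0.10    # tolerate up to +10% errors
MAX_LATENCY_INCREASE = 0.20  # tolerate up to +20% latency

def should_roll_back(before: Metrics, after: Metrics) -> bool:
    """Return True if the patch raised errors or degraded performance."""
    errors_up = after.error_rate > before.error_rate * (1 + MAX_ERROR_INCREASE)
    slower = after.p95_latency_ms > before.p95_latency_ms * (1 + MAX_LATENCY_INCREASE)
    return errors_up or slower

baseline = Metrics(error_rate=0.5, p95_latency_ms=800)
print(should_roll_back(baseline, Metrics(0.5, 850)))   # False
print(should_roll_back(baseline, Metrics(0.9, 800)))   # True
print(should_roll_back(baseline, Metrics(0.5, 1100)))  # True
```

Keeping patches small makes such a comparison meaningful: if a metric regresses, only a few changes are candidates for the cause.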

The paper is well worth reading and also contains many surprising details, such as “A meaningful percentage of Amazon Redshift customers delete their clusters every Friday and restore from backup each Monday” (page 1920).
