BigData – who does not think of Google who released white papers on BigtableGoogle File System (today: Colossus) and MapReduce. These papers contained lots of ideas that influenced the development of Apache Hadoop.
On what kind of databases is Google working today? Still Bigtable? Recently Google published a white paper describing their distributed database F1 (term comes from genetics “Filial 1 hybrid”) for the missing critical AdWords system. The AdWords database “is over 100 TB, serves up to hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day. Availability reaches five nines, even in the presence of unplanned outages, and observable latency on our web applications has not increased compared to the old MySQL system” [Google White Paper]. Google claims the database to support typical NoSQL features like high scalability and high availability. The database also supports typical relational features like ACID and SQL. IMO, the database is much more relational than NoSQL-style. Some of the main characteristics of the database are summarized below.

Basic Architecture

F1 is built on top of Spanner. Spanner is a low level data store and responsible for persistence, replication, caching, data sharding, transactions, etc. Spanner servers interacts with the Colossus File System (CFS).
As F1 servers do not contain data, F1 servers can easily be added. Data is synchronously replicated across multiple, widely distributed datacenters. Commit latencies are 50-150 ms are rather high though (see also “Latency and Throughput” below).

Data Model

F1 is not schemaless as many of those loud and trendy NoSQL databases today. F1 has a data model that is comparable to a relational model with some differences. Tables in a F1 schema can be organized as a hierarchy of parent and child tables. The child tables can be clustered with the parent table. Primary and foreign keys are an important part of efficient data manipulation and retrieval. Local and global indexes can be used to speed up queries.

Transactions

ACID transactions are a must for a critical system like AdWords. The authors reports about developers’ experience with eventual consistency systems at Google: “In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date” [Google White Paper; bold font added by me]. Therefore F1 supports three kinds of ACID transactions: snapshot, pessimistic and optimistic transactions.

Change History

Changes are logged in relational databases by e.g. application code or triggers. Change history is a basic database level feature in F1. The changes are not only used as a logging mechanism. Applications can use publish-and-subscribe to be notified if data is changed. Server caching also uses change history data to avoid having out of date entries.

Latency and Throughput

Commit latency is rather high with 50-150ms. The high system scalability comes with a cost. Application coding strategies have been applied to hide the latency of synchronous commits and to get similiar results compared to the original mySQL solution. But adding new servers for scaling up is much easier today.