References to blogs, tweets, discussions, etc. that caught my attention during the last week.

Data Modeling

Data independence, the key idea of the RDBMS, is illustrated in “Relational algebra-How it makes Relational Databases go faster” by Kyle Hailey. Algebraic optimization is done by the database system rather than by the programmer, who has to do it by hand in many NoSQL databases.
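To make the idea of algebraic optimization concrete, here is a minimal Python sketch of one classic rewrite, pushing a selection below a join. The relations, the column names and the helpers `select` and `join` are made up for illustration and are not taken from Kyle Hailey's article. A real optimizer picks between such equivalent plans on its own, which is the point of data independence.

```python
# Toy relations as lists of dicts (hypothetical data, for illustration only).
employees = [
    {"emp_id": 1, "name": "Ann", "dept_id": 10},
    {"emp_id": 2, "name": "Bob", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Research"},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Nested-loop equi-join on a column name shared by both relations."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def is_sales(row):
    return row["dept_name"] == "Sales"


def plan_naive():
    # Join everything first, filter afterwards.
    joined = join(employees, departments, "dept_id")
    return select(joined, is_sales), len(joined)


def plan_pushed_down():
    # Algebraically equivalent: filter departments first, then join.
    sales_only = select(departments, is_sales)
    joined = join(employees, sales_only, "dept_id")
    return joined, len(joined)


naive_rows, naive_intermediate = plan_naive()
pushed_rows, pushed_intermediate = plan_pushed_down()

# Both plans return the same rows; the second builds a smaller intermediate result.
print(naive_rows == pushed_rows, naive_intermediate, pushed_intermediate)
```

Both plans answer the same question. The query author only states what is wanted, and the choice of the cheaper plan can be left to the system.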

Fuss is regularly made about inefficient schema evolution in RDBMS. Just throwing data as text files into Hadoop is not really the solution. With Hadoop, you get many choices of file formats. Avro is one that supports schema evolution, as described by Gwen Shapira in “The problem of managing schemas”.
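As a rough illustration of what schema evolution means in Avro, the following Python sketch mimics the record resolution rules of the Avro specification: a reader field missing from the written data takes its default, a missing field without a default is an error, and fields the reader does not know are ignored. This is a simplified stand-in working on plain dicts, not the Avro library. The `Device` schema, the sample record and the function `resolve` are hypothetical.

```python
# A record written with an older schema version (hypothetical example data).
writer_v1_record = {"id": 7, "name": "sensor-a"}

# A newer reader schema in Avro's JSON style; "location" was added with a default.
reader_schema_v2 = {
    "type": "record",
    "name": "Device",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "location", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, reader_schema):
    """Simplified Avro-style resolution of a decoded record against a reader schema.

    Assumption: types are not checked or promoted here; only the
    field-presence rules are modeled.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field added after the data was written: fall back to the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    # Fields present in the record but absent from the reader schema are dropped.
    return resolved


print(resolve(writer_v1_record, reader_schema_v2))
```

Old files stay readable after a field is added, as long as the new field carries a default. That is the property plain text files in Hadoop do not give you.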

Data Architecture

Popular NoSQL and Hadoop blog articles 2014:

Data Storage

“Notes on machine-generated data, year-end 2014” by Curt Monash is a compact summary of the kinds of machine-generated data, their database structures, continuous events, and streaming plus memory-centric processing.