References to blogs, tweets, discussions, etc. that caught my attention during the last week.

Data Modeling

Steve Hoberman asked the question “Create a logical data model?” on his blog and shared his own and his responders’ opinions. There is general agreement that a logical data model is advisable for all database development, including NoSQL. The main arguments are business understanding, consistency and validation, flexibility, and support and maintenance.

Data Architecture

“Probabilistic techniques, data streams and online learning – Looking forward to a bigger 2015” by Debasish Ghosh argues that optimizing storage, algorithms and data structures for right-time analytics pays off before turning to more complex architectures such as Nathan Marz’ Lambda architecture. Probabilistic data structures such as Bloom filters (e.g. in Oracle Database) and HyperLogLog (e.g. for cardinality estimation in Cloudera Impala) are already in use to deliver faster results.
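To make the idea of a probabilistic data structure concrete, here is a minimal Bloom filter sketch in Python. It is not how Oracle Database implements its filters; the class name, the SHA-256 based double hashing and the sizing defaults are my own assumptions for illustration. The trade-off it shows: a compact bit array answers "definitely not present" or "probably present", so false positives are possible but false negatives are not.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical minimal Bloom filter: a bit array plus k hash positions per item.

    might_contain() can return false positives, but never false negatives.
    """

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hashes.
        m = -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        self.size = max(1, int(math.ceil(m)))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (assumption): derive k bit positions from two 64-bit
        # slices of a single SHA-256 digest instead of k independent hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never 0
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        # Set every bit position the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Only if all k bits are set can the item possibly have been added.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(expected_items=1000)
    for i in range(1000):
        bf.add(f"user-{i}")
    print(bf.might_contain("user-42"))  # always True for an added item
```

With 1,000 expected items and a 1% target rate, the filter uses roughly 9,600 bits and 7 hash positions per item, far less memory than storing the keys themselves; that saving is what makes such structures attractive for fast pre-filtering in query engines.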

Slides from Nathan Marz’ presentation “Using Simplicity to Make Hard Big Data Problems Easy” at Data Day Texas on 10-Jan-2015. His approach, known as the Lambda architecture, runs batch view processing and real-time view processing side by side.
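A toy sketch of that coexistence, assuming a hypothetical page-view counting use case: the batch layer recomputes a view from the full, immutable master dataset, the speed layer incrementally counts events that arrived since the last batch run, and a query merges both views. All function and class names are made up for illustration, and the design is heavily simplified; in particular, a real system would discard only the real-time state already covered by a finished batch run rather than clearing everything at once.

```python
from collections import Counter
from typing import Iterable, List


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer (hypothetical): recompute page-view counts from scratch."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer (hypothetical): incrementally counts events since the last batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.realtime_view[page] += 1

    def reset(self) -> None:
        # Simplification: called once a new batch view has absorbed all recent events.
        self.realtime_view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch view with the real-time view."""
    return batch[page] + speed.realtime_view[page]


if __name__ == "__main__":
    master: List[str] = ["/home", "/about", "/home"]
    batch = batch_view(master)
    speed = SpeedLayer()
    for event in ["/home", "/contact"]:
        master.append(event)   # new data always lands in the master dataset
        speed.on_event(event)  # and is counted immediately by the speed layer
    print(query("/home", batch, speed))  # 3: two from the batch view, one real-time
    batch = batch_view(master)           # next batch run covers everything
    speed.reset()
    print(query("/home", batch, speed))  # still 3, now served from the batch view alone
```

The point of the split is that the batch layer stays simple and easy to recompute after a bug fix, while the speed layer only has to cover the short window the batch layer has not processed yet.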

Data Storage

The site db-engines.com ranked MongoDB (1st), Redis (2nd) and Elasticsearch (3rd) by popularity gain in 2014, as reported in “MongoDB is the DBMS of the year, defending the title from last year” by Paul Andlinger and Matthias Gelbmann.

Data Analytics

Download link to the free eBook “Data Driven: Creating a Data Culture” by DJ Patil and Hilary Mason, published by O’Reilly in 2015.

“Cooking Watson-style: Supercomputer turns to recipes” by Niall Firth describes a beta cooking app that tests how well the supercomputer Watson can create new recipes.

Data Quotes

“I like to think of a search engine as a very fast ranking engine. If the problem requires me to rank something, then search engine technology is going to be hard to beat. If you need it to do all different kinds of joins across a large number of document types or constant large table scans, it may be appropriate to do in a search engine and it may not. It’s a classic “it depends” situation.” Quoted from the interview “On Solr and Mahout. Interview with Grant Ingersoll” by Roberto V. Zicari.

Data Divers

A worthwhile overview of the lightweight virtualization solution Docker: “Docker – Beginner’s tutorial” by talPor Solutions.