The European Data Modeling Zone (DMZ) took place in Düsseldorf from 23-OCT-2017 to 25-OCT-2017. The location was well chosen, as it was easy to reach by train or plane. Overall, the conference was worthwhile for me because it covered topics such as Data Vault and data integration architecture.
The first day started with a 3-hour session by Dirk Lerner on bitemporal data, “Send bi-temporal data from Ground to Vault to the Stars”. The topic is quite heavy. Bi-temporality looks innocent, but once you play through an example, the complexity becomes obvious. The session started with an overview of time periods (closed/closed, closed/open, open/closed, open/open) and Allen relationships. Allen defined the possible relations between time intervals, as shown in the picture.
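To make the interval relations a bit more tangible, here is a minimal Python sketch of my own (not taken from Dirk's session). It assumes half-open periods, where the start is included and the end is excluded, and it uses plain integers as time points. The names `Period` and `allen_relation` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Half-open period [start, end): start is included, end is excluded."""
    start: int
    end: int

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("a period needs start < end")


def allen_relation(a: Period, b: Period) -> str:
    """Name the Allen relation of period a with respect to period b."""
    # No shared time point: a lies completely before or after b.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"  # no gap and no overlap
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two periods share at least one time point.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


if __name__ == "__main__":
    # Two consecutive versions of a record: the first ends where the second begins.
    print(allen_relation(Period(1, 5), Period(5, 9)))
    print(allen_relation(Period(1, 6), Period(4, 9)))
```

With half-open periods, "meets" is simply the case where the end of one period equals the start of the next, so consecutive versions neither overlap nor leave a gap.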
Dirk introduced the terminology for bitemporal dimensions and criticized the inconsistent use of different words for the same concept. A distinction must be made between business time (which has a meaning to the business user) and technical time (which is generated by the database).
He prefers the terms state time and assertion time, and continued with non-temporal, uni-temporal, and bi-temporal Data Vault satellite examples (using the most common time period, [closed, open), and the Allen relationship “Meets”):
- Non-temporal satellite
  - Primary key: Hub-FK, EventTime
- Uni-temporal satellite
  - Primary key: Hub-FK, AssertionTimeFrom
  - Further columns: AssertionTimeBefore (note: AssertionTimeBefore rather than AssertionTimeTo, as the better name for the open end of an interval)
- Bi-temporal satellite
  - Primary key: Hub-FK, AssertionTimeFrom, StateTimeFrom
  - Further columns: AssertionTimeBefore, StateTimeBefore
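As an illustration of the bi-temporal satellite, here is a small Python sketch of my own, not code shown in the session. It assumes half-open periods on both time axes, reads state time as the business dimension and assertion time as the technical one (my interpretation), and uses `date.max` as a stand-in for "still open". The class `BiTemporalSatelliteRow`, the function `lookup`, the `payload` column, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

OPEN_END = date.max  # assumption: marker for a period that is still open


@dataclass(frozen=True)
class BiTemporalSatelliteRow:
    hub_fk: str
    assertion_time_from: date    # included
    state_time_from: date        # included
    assertion_time_before: date  # excluded
    state_time_before: date      # excluded
    payload: str                 # descriptive attribute, here a city

    @property
    def primary_key(self):
        return (self.hub_fk, self.assertion_time_from, self.state_time_from)


def lookup(rows, hub_fk, state_time, assertion_time):
    """Return the row valid at state_time according to what was asserted at assertion_time."""
    for row in rows:
        if (row.hub_fk == hub_fk
                and row.assertion_time_from <= assertion_time < row.assertion_time_before
                and row.state_time_from <= state_time < row.state_time_before):
            return row
    return None


# Recorded on 10 Jan: customer C1 lives in Berlin since 1 Jan.
# Recorded on 5 Mar: C1 actually moved to Düsseldorf on 1 Feb.
ROWS = [
    BiTemporalSatelliteRow("C1", date(2017, 1, 10), date(2017, 1, 1),
                           date(2017, 3, 5), OPEN_END, "Berlin"),
    BiTemporalSatelliteRow("C1", date(2017, 3, 5), date(2017, 1, 1),
                           OPEN_END, date(2017, 2, 1), "Berlin"),
    BiTemporalSatelliteRow("C1", date(2017, 3, 5), date(2017, 2, 1),
                           OPEN_END, OPEN_END, "Düsseldorf"),
]

if __name__ == "__main__":
    # Same state time, two different assertion times.
    print(lookup(ROWS, "C1", date(2017, 2, 15), date(2017, 2, 20)).payload)
    print(lookup(ROWS, "C1", date(2017, 2, 15), date(2017, 4, 1)).payload)
```

The first row is closed on the assertion axis when the correction arrives, and its end meets the start of the two rows that replace it. Asking for the same state time with two different assertion times then returns what was believed before and after the correction.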
He covered the path from staging to a Core Warehouse Layer (Data Vault layer) to the mart layer modeled as star schema including reporting requirements “as is”, “as was”, and “as of”.
Data Vault was a major topic and was covered in many sessions, for example:
- Hans Hultgren spoke about multi-structured data and how to model it in the context of tools like Hadoop or Kafka for streaming requirements. Data integration means change. Change is ever-present and unstoppable. Agility is a measure of an enterprise’s ability to adapt to change and a primary feature of the DWH. Germany is currently the fastest-growing Data Vault market.
- Kent Graziano (who also started the second day with a morning Chi Gung activity session) spoke about a Hybrid Data Vault case study with a migration from a 2-layered, Kimball-style DWH to Data Vault. He showed the downsides of a 2-layered DWH when new sources are added. The solution was a 3-layered architecture with a Core Warehouse Layer for data integration by business keys.
- John Giles covered multiple Data Vault challenges around business keys and the requirement to record existence over time. He compared record tracking and status tracking satellites and their suitability for his requirements.
- Roelant Vos had a one-hour session and a full-day workshop covering Data Warehouse Automation. The ultimate goal is a persistent staging area with virtualization on top.
Data and data-driven business are in high demand. Steve Hoberman showed the results of a survey about the skills required of a data modeler. The data modeling skill itself only ranks fifth. Other skills are much more important for architecting data products. Communication comes first, and knowledge of databases comes second. Knowing RDBMS is not enough anymore: NoSQL, Hadoop, Elastic, Blockchain, etc. are already mainstream. Value proposition and agility are challenges that must be addressed.
Fittingly, Martijn Evers covered “Data modeling must DIE” in a separate session. He also held a full-day workshop on data architecture. His main message was that an architecture has to cover a variety of concerns and that data modeling alone is not enough. The following mindmap shows a selection of his topics, and below are some impressions from his workshop.
Tweet by TheRealDataArchitect (@FSDataArchitect), 27 October 2017