Dr. Carsten Bange of the Business Application Research Center (BARC) presented trends in the Big Data, Advanced Analytics, and Cloud market at the Oracle Data Vision 2017 conference in Neuss. He focused on three areas:
- Data Management
- Explorative BI
Corporate users are currently engaged with 8 trends in Data Management. These trends are driven by the desire for flexibility. In the end, flexibility leads to heterogeneity, for example many different database and storage systems.
Data Preparation
Data Preparation is the process of integrating, cleaning, and harmonizing data for subsequent analysis. Self-Service BI is being enriched with data preparation capabilities for power users and Data Scientists.
Data Catalog
Data is an asset that needs to be managed so that end users can find the data they need wherever it is stored. This requires a complete view of the data inventory stored in a Data Lake, including its data quality. Ideally, this inventory can also supply internal or external data marketplaces.
Data Lake Management
Data Lakes tend to become complex, with many different tools that are poorly integrated and offer low productivity. Best practices and high-productivity tools help handle data ingestion, data preparation, and data delivery within a Data Lake.
Data Vault Modeling
Data Vault modeling allows flexible data integration into relational DWHs or NoSQL/Big Data systems. Many projects have already implemented Data Vault models successfully. Data Vault 2.0 is not just a modeling approach, but also an architecture and a methodology.
Data Warehouse Automation
The loading of data from a staging layer or an HDFS landing zone into Data Vault structures follows some standard patterns. DWH automation tools generate code for ingesting data into those tables.
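Because the load patterns are so regular, the code for them can be rendered from metadata. The following is a minimal sketch of that idea, not the output of any particular DWH automation tool. The `HubSpec` class, the `_hk` hash-key naming, and the `load_date` and `record_source` columns are assumptions made for illustration, and the staging table is assumed to already carry a precomputed hash key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical metadata describing one hub and its staging source."""
    hub_table: str
    staging_table: str
    business_key: str
    record_source: str


def generate_hub_load(spec: HubSpec) -> str:
    """Render an INSERT that adds only business keys not yet in the hub."""
    # Assumed naming convention: the hash key column is "<hub_table>_hk".
    hash_key = f"{spec.hub_table}_hk"
    return (
        f"INSERT INTO {spec.hub_table} "
        f"({hash_key}, {spec.business_key}, load_date, record_source)\n"
        f"SELECT DISTINCT stg.{hash_key}, stg.{spec.business_key}, "
        f"CURRENT_TIMESTAMP, '{spec.record_source}'\n"
        f"FROM {spec.staging_table} stg\n"
        f"WHERE NOT EXISTS (\n"
        f"    SELECT 1 FROM {spec.hub_table} hub "
        f"WHERE hub.{hash_key} = stg.{hash_key}\n"
        f");"
    )


if __name__ == "__main__":
    # One spec per hub; the same template serves every hub in the model.
    print(generate_hub_load(
        HubSpec("hub_customer", "stg_customer", "customer_no", "crm")
    ))
```

The `NOT EXISTS` filter makes the generated statement safe to rerun: a second load of the same staging data inserts nothing.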
Data Virtualization
Heterogeneity grows as the number of relational databases and NoSQL databases, including Hadoop-based systems, increases. In reality, many companies have not just one DWH but several. The same appears to be becoming true of Data Lakes. Data Virtualization systems make data residing in different databases accessible through a single interface or query.
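The idea of querying several databases through one interface can be illustrated on a very small scale. The sketch below uses SQLite's `ATTACH DATABASE` as a stand-in: two separate in-memory databases are joined in a single query over one connection. Real data virtualization products federate across heterogeneous engines, which this example does not attempt. The schema names `sales` and `crm` and the sample rows are invented for illustration.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Open one connection that exposes two separate databases."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates its own independent database.
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
    # Illustrative sample data only.
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def revenue_per_customer(conn: sqlite3.Connection) -> list:
    """Join tables that live in different databases with a single query."""
    query = """
        SELECT c.name, SUM(o.amount) AS revenue
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, revenue in revenue_per_customer(build_virtual_layer()):
        print(name, revenue)
```

The caller sees one connection and one SQL statement, while the data stays in its original databases.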
Meta Data Management
Meta Data Management is not new and regularly resurfaces as a hot topic. It is a hot topic once again because of the attention around Big Data. There’s a strong need for tools that capture business, technical, and operational metadata automatically, because manual input by users, developers, or others tends to lack completeness and accuracy.
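For technical metadata, automatic capture can be as simple as reading the database's own catalog instead of asking people to document tables by hand. The sketch below does this for SQLite using `sqlite_master` and `PRAGMA table_info`. It covers only technical metadata; business and operational metadata would need other sources. The function name and the shape of the returned records are assumptions made for illustration.

```python
import sqlite3


def capture_technical_metadata(conn: sqlite3.Connection) -> list:
    """Collect table and column definitions from the SQLite catalog."""
    metadata = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, column, col_type, notnull, _default, pk in columns:
            metadata.append({
                "table": table,
                "column": column,
                "type": col_type,
                "not_null_constraint": bool(notnull),
                "primary_key": bool(pk),
            })
    return metadata


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    for entry in capture_technical_metadata(conn):
        print(entry)
```

Running such a scan on a schedule keeps the captured definitions in step with the actual schema, which manual documentation rarely achieves.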
Data Governance
DWHs and especially Data Lakes need to be managed properly. Otherwise, they will turn into Data Swamps. Data governance activities such as defining HDFS layouts and establishing processes are necessary to avoid disorganized, undocumented, and insecure data storage.