Data Blog

Materialization examples of Data Engineering with dbt

dbt offers several materialization options for building ETL/ELT processes. The article shows and compares various approaches to using dbt for ETL/ELT. A previous post contains an introduction to dbt: Data Engineering with dbt – first steps using PostgreSQL and...

PostgreSQL application_name

The PostgreSQL application_name can be set in the connection string. The view pg_stat_activity then shows the application_name, which helps to identify sessions. The article shows how to set application_name and how to benefit from it. It is highly recommended to set the...
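
As a rough illustration of the idea in this teaser, the sketch below builds a libpq-style connection URI that carries application_name as a query parameter, plus a query string for looking the session up in pg_stat_activity. The host, database, user, and application names are hypothetical placeholders, and the helper `build_dsn` is an assumption made for this example, not something taken from the article.

```python
from urllib.parse import urlencode


def build_dsn(host: str, dbname: str, user: str,
              application_name: str, port: int = 5432) -> str:
    """Build a PostgreSQL connection URI that sets application_name."""
    # urlencode escapes characters that are not safe inside a URI query.
    query = urlencode({"application_name": application_name})
    return f"postgresql://{user}@{host}:{port}/{dbname}?{query}"


# Hypothetical values, chosen only for illustration.
dsn = build_dsn("localhost", "analytics", "etl_user", "nightly_load")

# Query a DBA could run to find the sessions opened with that name.
FIND_SESSIONS_SQL = (
    "SELECT pid, usename, application_name, state "
    "FROM pg_stat_activity "
    "WHERE application_name = 'nightly_load';"
)

print(dsn)
```

The resulting string can be handed to any driver that accepts libpq connection URIs; the code itself does not open a connection.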

PostgreSQL columnar extension cstore_fdw

The PostgreSQL columnar extension cstore_fdw is a storage extension that is well suited to OLAP-/DWH-style queries and data-intensive applications. Columnar analytical databases have unique characteristics compared to row-oriented data access. Many commercial products exist:...

PostgreSQL partitioning guide

PostgreSQL partitioning is a powerful feature when dealing with huge tables. Partitioning allows breaking a table into smaller chunks, aka partitions. Logically, there appears to be only one table when accessing the data, but physically there are several partitions....

Anonymization techniques and data privacy

Anonymization techniques are essential for data analytics and for test/dev databases. Anonymization and pseudonymization are very different but often confused. GDPR no longer applies to anonymized data. GDPR is still applicable for pseudonymized data that can be...
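
To make the distinction in this teaser concrete, here is a minimal sketch, not taken from the article. Pseudonymization is shown as a keyed hash: the identifier is replaced by a stable token that whoever holds the key can still link back to the person. Anonymization is shown as dropping the identifier and generalizing the remaining attributes. The key, the field names, and the ten-year age bands are all hypothetical, and whether a generalized record counts as anonymous in practice depends on the remaining re-identification risk.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live in a key store.
SECRET_KEY = b"hypothetical-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed token (still linkable)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(record: dict) -> dict:
    """Drop the direct identifier and generalize age into a 10-year band."""
    decade = (record["age"] // 10) * 10
    return {"age_band": f"{decade}-{decade + 9}", "country": record["country"]}


# Hypothetical record used only for illustration.
person = {"email": "jane@example.com", "age": 37, "country": "DE"}

token = pseudonymize(person["email"])   # same input always yields the same token
generalized = anonymize(person)         # no direct identifier remains
print(generalized)
```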

Log-based Change Data Capture - lessons learnt

My article on Medium summarizes experiences from various projects with log-based change data capture (CDC). There are many use cases for which CDC is beneficial. Some databases even have CDC functionality built in, without requiring a separate tool. The article first...

Calvin: distributed ACID transactions

Most distributed databases do not offer ACID transactions. Supporting linear scalability is the main reason why distributed NoSQL databases like MongoDB, Cassandra, AWS DynamoDB and many others offer only reduced transactional support. Abadi et al. propose in a paper...

Study on Knowledge Sharing – Spotify Guilds / CoPs

Communications of the ACM published a study on Spotify Guilds / CoPs (Communities of Practice). A CoP is a group of people with similar interests who share their knowledge, solve problems or establish standards. The study examines the challenge of knowledge sharing...

The Zettabyte challenge

IDC published a white paper about the challenge of Big Data volume in a data-driven world. IDC expects that the data volume will grow from 45 zettabytes (ZB) in 2020 to 175 ZB in 2025. The data will be produced in various forms like transactional data, text, voices,...
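
Taking the two figures quoted in this teaser at face value, the implied compound annual growth rate can be worked out with a few lines. This is only arithmetic on the quoted numbers (45 ZB in 2020, 175 ZB in 2025), not a figure taken from the IDC paper; it comes out at roughly 31% per year.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Figures as quoted in the teaser: 45 ZB (2020) to 175 ZB (2025).
rate = cagr(45, 175, 2025 - 2020)
print(f"Implied growth: {rate:.1%} per year")
```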

Columnar analytical databases for DWH and Data Analytics

The German magazine BI Spektrum published my article on analytical databases for DWH and Data Analytics. The article discusses the characteristics of columnar databases and some categories of analytical databases. This blog post contains a very brief summary....

Q&A on Data Integration and Big Data

Roberto Zicari did a Q&A with me about Data Integration and Big Data. Topics covered include data integration, Big Data architecture, ETL, SQL, Hadoop, Data Lake, Data Catalog, Data Quality, and education. The interview is available on odbms.org with the following...
