Every single Data Warehouse solution in our companies have its own architecture version. Sometimes you can find different names for the same component, and that is good. The real problem comes when you find new components/layers with redundant or without a specific function in the data flow. I hear about solutions where the ODS is part of the transformation process, Master Data with huge catalogs is used as data source, projects without Staging Area and others typical conceptual errors.
In this post we will discuss the basics and what are the components in a “standard” Data Warehouse Architecture.
Transactional/Data Source Layer
This represents the different data sources that will be used for feed the data warehouse, including any database platform, Excel Files, Flat Files and others.
- Always consider the inclusion business user Excel Files, there are a lot of these files with amazing data that is not managed by Software Applications.
ETL – Extraction
This step is a simple “copy and paste” from the different data sources. It’s quite common to extract the complete table without any business rules transformations. When we have some “big” tables, data should be filtered by date or other columns in order to prevent performance issues.
This is a database that is used as a mechanism to centralize the different sources in a single platform, removing the technical difficulties of multiplatform environments, in that way, you can join tables using direct queries.
- Don’t create foreign keys
- Use Truncate Table or Drop/Create Table to clean up the table
- Don’t copy your OLTP indexes: the columns are the same but the queries that are going to run against these tables are different
ETL – Transformation
This is the most important step/layer in the Data Warehouse process. It is where the business rules, standardization, data integration and data quality take place. There are no rules to transform your data, your data doesn’t have rules, you have to fix that mess.
- Consider to use Master Data and Data Quality tools to support your transformations
- Use mapping tables and others auxiliary tables
This is the magic place where everything is “perfect”, we have transformed and cleaned the data. The tables are denormalized and thanks to the ETL process everything is working perfectly.
– Denormalization: this is a fancy word to say “define your tables focused on performance and user usability”
In the past, this layer was considered “the place where business rules are stored”, nowadays this is considered a mechanism to hide technical details to the business user; no more tables, columns or keys, now we only have dimensions and measures.
PowerBI, Tableau, QlikView and including Excel are kind of tools which are involved in this layer, now we are talking about Data Visualization. The business rules and technical issues are hidden behind the OLAP component.
Metadata is just data about data. This layer is basically tables which manage information about the data stored in the data warehouse.
ODS – Operational Data Store
Everybody hates ODS, almost nobody understands the reason to create an ODS. In simple words, an Operational Data Store is a database that is outside the primary Data Warehouse Data Flow but shares some of the benefits: it’s uses the Staging Area as the unique data source and tables are designed to support a specific set of reports.
This Data Warehouse Architecture post is not a recipe, this is just a guide of the primary involved components. We have to understand that there are a lot of different implementations and interpretations about how to build your DW.