Amazing world of Data : Building Blocks of Data Warehouse
In the previous blog, we discussed how data became the new oil of the 21st Century, also explained about concept data warehouses in brief. Now let’s dive deeper into the concept of the Data warehouse. If you haven’t read the previous blog then go ahead and give it a read,
Before moving ahead, one needs to understand the basic difference between a data warehouse and a database, In a data warehouse you “integrate and transform enterprise data into information” suitable for strategic decision making, whereas a database is a “collection of related data” which represents some elements of the real world. What makes a data warehouse different is the defining features of data in the data warehouse. The data in a data warehouse is,
- Subject oriented
- Nonvolatile
- Integrated
- Time variant
Now let’s have a look at the architecture of a data warehouse. Here, architecture means the structure that brings all the components of a data warehouse together. The centerpiece of this structure is the need to deliver integrated data, so we design everything not just to prepare and store the data but to transform it into suitable strategic information. So the architecture provides the overall framework for developing and deploying a data warehouse.
Major building blocks of the data warehouse are,
- Source data
- Data staging
- Data storage
- Information delivery
- Management and control
- Metadata
Each component has a definite function and provides specific services. The architecture of a data warehouse is wide, complex, and expansive and consists of distinct components.
Source data coming into the data warehouse can be grouped into four categories: Production, Internal, External and Archived data.
- Production data comes from various operational systems of the enterprise which include financial systems, manufacturing systems, and systems along with supply chain management.
- Internal data refers to an organization’s private user data like customer profit, departmental databases.
- Archived data refers to periodically taken old operational system data which is stored in an archived file.
- But most of the time executives depend on data from external sources, which contain statistics relating to their industry produced by external agencies and national statistical offices.
This data coming from different sources will be given to the Data Staging Components. Once data is extracted from several discrete sources, major functions like extraction, transformation, and loading take place in the staging area. The staging area consists of a workbench to perform all functions. so to sum up we can say that a data staging area provides a place and an area with a set of functions to clean, change, combine, convert, duplicate and prepare source data for staging and to use in a data warehouse.
While building Data Storage for a data warehouse, some key points must be remembered, large volumes of historical data that is being stored must have a structure so that it is suitable for analysis and should easily accessible and retrievable, so most data warehouses employ relational database management systems but some of the data warehouses also employ multidimensional databases which usually are proprietary products.
Now the data is extracted, transformed, and loaded, it is sent to an Information Delivery Mechanism, which provides for queries and reports through intranet services or internet services like email or some dashboard applications. The users for this information be can a novice user who has no training, therefore, needs prefabricated reports and preset queries or can be a business analyst with the ability to do complex analysis using the information in the data warehouse. Information delivery components of a data warehouse cater to all these needs, it transform data into information.
These all components needs to run in accordance with one another, this crucial task is done by Management and Control Component, this component sits on top of all the other components, and coordinates the services and activities within the data warehouse. The management and control component also controls the data transformation and monitors the movement of data into the staging area and from there into the data storage. It works with the database management system and enables data to properly stored in repositories and also moderates the information delivery to the user.
These all components having a specific tasks come together and form a complex and wide network which we call a Data Warehouse
Wait I didn’t explain what is meant by “Metadata ‘’, that you need to stay tuned for the next blog.Where we will discuss more in depth about the data features mentioned above and also the interesting topic of metadata.
Until then you can connect with me via LinkedIn, send an invite to