By Hari Mailvaganam
Data Warehousing is open to an almost limitless range of definitions. Simply put, Data Warehouses store an aggregation of a company's data.
Data Warehouses are an important asset for organizations to maintain efficiency, profitability and competitive advantages. Organizations collect data through many sources - Online, Call Center, Sales Leads, Inventory Management. The data collected have degrees of value and business relevance. As data is collected, it is passed through a 'conveyor belt', call the Data Life Cycle Management.
An organization's data life cycle management's policy will dictate the data warehousing design and methodology.
Figure 1. Overview of Data Warehousing Infrastructure
The goal of Data Warehousing is to generate front-end analytics that will support business executives and operational managers.
The pre-Data Warehouse zone provides the data for data warehousing. Data Warehouse designers determine which data contains business value for insertion.
OLTP databases are where operational data are stored. OLTP databases can reside in transactional software applications such as Enterprise Resource Management (ERP), Supply Chain, Point of Sale, Customer Serving Software. OLTPs are design for transaction speed and accuracy.
Metadata ensures the sanctity and accuracy of data entering into the data lifecycle process. Meta-data ensures that data has the right format and relevancy. Organizations can take preventive action in reducing cost for the ETL stage by having a sound Metadata policy. The commonly used terminology to describe meta data is "data about data".
Before data enters the data warehouse, the extraction, transformation and cleaning (ETL) process ensures that the data passes the data quality threshold.
ETLs are also responsible for running scheduled tasks that extract data from OLTPs.
The Data Warehouse repository is the database that stores active data of business value for an organization. The Data Warehouse modeling design is optimized for data analysis.
There are variants of Data Warehouses - Data Marts and ODS. Data Marts are not physically any different from Data Warehouses. Data Marts can be though of as smaller Data Warehouses built on a departmental rather than on a company-wide level.
Data Warehouses collects data and is the repository for historical data. Hence it is not always efficient for providing up-to-date analysis. This is where ODS, Operational Data Stores, come in. ODS are used to hold recent data before migration to the Data Warehouse.
ODS are used to hold data that have a deeper history that OLTPs. Keep large amounts of data in OLTPs can tie down computer resources and slow down processing - imagine waiting at the ATM for 10 minutes between the prompts for inputs. .
The last and most critical potion of the Data Warehouse overview are the front-end applications that business users will use to interact with data stored in the repositories.
Data Mining is the discovery of useful patterns in data. Data Mining are used for prediction analysis and classification - e.g. what is the likelihood that a customer will migrate to a competitor.
OLAP, Online Analytical Processing, is used to analyze historical data and slice the business information required. OLAPs are often used by marketing managers. Slices of data that are useful to marketing managers can be - How many customers between the ages 24-45, that live in New York state, buy over $2000 worth of groceries a month?
Reporting tools are used to provide reports on the data. Data are displayed to show relevancy to the business and keep track of key performance indicators (KPI).
Data Visualization tools is used to display data from the data repository. Often data visualization is combined with Data Mining and OLAP tools. Data visualization can allow the user to manipulate data to show relevancy and patterns.
Please contact us if you have any questions or suggestions.