By Hari Mailvaganam
"There is a tide in the affairs of men, which taken at the flood, leads on to fortune."
Julius Caesar (Act IV, Scene 3, line 218, Brutus) by William Shakespeare
The primary rationale for data warehousing is to provide businesses with analytical results from data mining, OLAP, Scorecarding and reporting. The cost of producing front-end analytics is lowered when data quality is kept consistent along the entire pipeline, from data source to analytical reporting.
Figure 1. Overview of Data Warehousing Infrastructure
Metadata policies control the quality of data entering the data stream. Batch processes can be run to address data degradation or changes to data policy, and metadata policies are enhanced by using metadata repositories.
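As a rough illustration of the idea, a batch process can check incoming records against metadata-defined rules before they enter the warehouse. The field names and rules below are hypothetical, invented for the example; they are not from the client engagement described in this article.

```python
# Hypothetical sketch: validating incoming records against metadata rules
# before they enter the data stream. Field names and rules are illustrative.

METADATA_RULES = {
    "policy_id": {"type": str, "required": True},
    "premium":   {"type": float, "required": True, "min": 0.0},
    "state":     {"type": str, "required": False, "allowed": {"CA", "NY", "TX"}},
}

def validate(record):
    """Return a list of rule violations for one record."""
    errors = []
    for field, rule in METADATA_RULES.items():
        value = record.get(field)
        if value is None:
            # Only flag missing values when the metadata marks them required.
            if rule.get("required"):
                errors.append(f"{field}: missing required field")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: not in allowed set")
    return errors
```

A nightly batch job could run such checks over each day's loads and route violations back to the owning department for correction.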
One of the projects we recently worked on was with a major insurance company in North America. The company had amalgamated over the years with acquisitions and also had developed external back-end data integrations to banks and reinsurance partners.
Figure 2. Disparate Data Definition Policies in an Insurance Company
The client approached DWreview because it felt it was not obtaining a sufficient return on investment from its data warehouse. Prediction analysis, profit-loss ratios and OLAP reports were labor- and time-intensive to produce. The publicly listed insurance company was also in the process of implementing a financial Scorecarding application to monitor compliance with the Sarbanes-Oxley Act.
In consultation with the company's IT managers, we analyzed the trade-offs of different design changes.
The first step in realigning the data warehousing policies was to examine the metadata policies and derive a unified view that could work for all stakeholders. As the company was embarking on a new Scorecarding initiative, it became feasible to bring the departments together and propose a new enterprise-wide metadata policy.
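One concrete piece of such a unified view is a catalog that maps each department's local field names onto a single canonical definition. The catalog below is a hypothetical sketch; the department and field names are invented for illustration, not taken from the client's systems.

```python
# Illustrative sketch of a unified metadata view: each canonical field
# records its business definition and the local alias used by each
# department. All names are hypothetical.

CANONICAL = {
    "customer_id": {
        "definition": "Unique identifier for a policyholder",
        "aliases": {
            "claims":       "clmnt_no",
            "underwriting": "cust_num",
            "finance":      "client_id",
        },
    },
}

def to_canonical(department, record, catalog=CANONICAL):
    """Translate a department's record keys to canonical field names."""
    # Build a department-specific reverse lookup: local alias -> canonical name.
    reverse = {
        meta["aliases"][department]: name
        for name, meta in catalog.items()
        if department in meta["aliases"]
    }
    # Keys with no catalog entry pass through unchanged.
    return {reverse.get(key, key): value for key, value in record.items()}
```

With a catalog like this agreed upon by all stakeholders, data from any department can be translated into the shared vocabulary before it reaches the warehouse.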
Departments had created their own data marts for quick access to reports because they felt the central data warehouse was not responsive to their needs. This also created a bottleneck, as data was not always replicated between the repositories.
With the IT managers' approval and the buy-in of departmental managers, a company-wide metadata initiative was phased in gradually. Big-bang approaches rarely work, and the consequences of failure are extremely high in competitive industries such as insurance.
The metaphor we used for the project was the quote from Shakespeare's Julius Caesar given at the start of this article. We felt that this was a potentially disruptive move, but that if the challenges were met positively, the rewards would be well worth it.
Figure 3. Company-wide Metadata Policy
Industry metadata standards exist in verticals such as insurance, banking and manufacturing. OMG's Common Warehouse Metadata Initiative (CWMI) is a vendor-backed proposal to enable easy interchange of metadata between data warehousing tools and metadata repositories in distributed heterogeneous environments.
Figure 4. Partial Schematic Overview of Data Flow after Company-wide Metadata Implementation
In the months since the implementation, the project has been moving along smoothly. Training seminars were given to keep staff abreast of developments, and the responses were overwhelmingly positive.
The implementation of the Sarbanes-Oxley Scorecarding initiative was on time and relatively painless. Many of the challenges that would have been faced without a metadata policy were avoided.
With a unified data source and definition, the company is taking its analysis further. OLAP reporting is being rolled out more widely, with greater access for all employees. Data mining models are now more accurate, as the model sets can be scored and trained on larger data sets.
Text mining is being used to evaluate claims examiners' comments regarding insurance claims made by customers. The text mining tool was custom developed by DWreview for the client's unique requirements. Without metadata policies in place, it would be next to impossible to perform coherent text mining. The metadata terminologies used in claims examination were developed in conjunction with insurance partners and brokers. Using the text mining application, the client can now monitor consistency in claims examination, identify trends for potential fraud analysis and provide feedback for insurance policy development.
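The actual tool is proprietary, but the dependence on a controlled vocabulary can be sketched simply: with an agreed set of claims terms in the metadata, examiner comments can be scanned for those terms and the frequencies compared across examiners. The terms below are invented for the example.

```python
# Hedged sketch of vocabulary-based text mining over examiner comments.
# The controlled terms are hypothetical; in practice they would come from
# the metadata terminologies agreed with insurance partners and brokers.

CONTROLLED_TERMS = {"water damage", "total loss", "pre-existing", "subrogation"}

def term_frequencies(comments):
    """Count how many comments mention each controlled vocabulary term."""
    counts = {term: 0 for term in CONTROLLED_TERMS}
    for comment in comments:
        text = comment.lower()
        for term in CONTROLLED_TERMS:
            if term in text:
                counts[term] += 1
    return counts
```

Comparing such frequency profiles across examiners or time periods is one simple way to monitor consistency and surface unusual patterns for fraud review.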
Developing metadata policies for an organization falls into three project management spheres: generating project support, developing suitable guidelines and setting technical goals. For a successful metadata implementation, strong executive backing and support must be obtained.
A tested method for gathering executive sponsorship is to first set departmental metadata standards and measure the resulting difference in efficiency. As metadata is an abstract concept, a concrete demonstration can be helpful. It will also help in gaining trust from departments that may be reluctant to hand over their metadata policies.
Please contact us if you have any questions.