By Hari Mailvaganam
After working out the business information of interest to extract from a data warehouse, and designing the OLAP cube with the relevant dimensions and measures, the next step is to choose the type of storage for the OLAP cube.
The correct design decisions made at the inception of the data warehousing project will lower implementation and ongoing maintenance costs. And provide quick and easy to access analytics information to business users.
The designer can store the OLAP data in the data warehouse itself or in a separate repository optimized for cube storage. Depending on the size of your data warehousing solution and the expected connection load, the ideal storage method can be chosen. For larger implementations with frequent queries, a separate optimized OLAP repository will be better suited.
Queries made against an OLAP cube use large amounts of computer resources. In a perfect world, we can store all measures against each level of every combination of dimensions of the OLAP cube. Hence on each occasion a query is made, it can obtain the value from the pre-calculated aggregations. This can save time as the source data will not have to be queried.
However this is not practical in most real-world applications. The number of aggregations increases exponentially with the number of OLAP levels and dimensions:
a = l^d
a - Aggregations, l - Levels, d - Dimensions
Table 1. Number of OLAP Aggregations Increases Exponentially
Many cubes used in practice have over ten dimensions, and can hold on average of four or more levels. These would require over 1 048 576 aggregations. To pre-calculate a million aggregations or more will require an inordinate amount of time and computer resources. Data Warehouse administrators often pre-calculate the more popular aggregations to reduce demand on the database. Companies also typically will restrict the use of OLAP to a select set of users.
In an implementation DWreview recently made with a major retail chain, OLAP aggregations were pre-calculated for popular queries. These were accessible by a broad section of users. Users were given freedom to conducting OLAP queries depending on their levels of permission.
Figure 1. Freedom to Query OLAP Database
Using this design strategy saved considerable amount of hardware and software cost. Together with sound data management policies, the company saved substantially in the data warehousing implementation using this design strategy, without loosing functionality and reliability.
OLAP storage is one of the critical choices to be made when designing the solution. OLAP storage comes in three forms:
MOLAP - Multidimensional OLAP. In MOLAP, both the source data and the aggregations are stores in a multidimensional format. MOLAP is the fastest option for data retrieval, but requires the most disk space. Disk space is less of a concern these days with lowering storage and processing cost.
ROLAP - Relational OLAP. All data, including the aggregations are stored within the source relational database. This will be a concern for larger data warehousing implementations which have higher usage needs. ROLAP is the slowest for data retrieval. Whether an aggregation exists or not, a ROLAP database must access the data warehouse itself. ROLAP is best suited for smaller data warehousing implementations.
HOLAP - Hybrid OLAP. HOLAP is a combination of both the above storage methodologies. HOLAP databases store the aggregations that exist within a multidimensional structure, leaving the cell-level data itself in a relational form. Where the data is pre aggregated, HOLAP offers the performance of MOLAP, where the data must be fetched from the tables. HOLAP is as slow as ROLAP.
Due to shrinking hardware and processing cost, MOLAP are generally most often used. HOLAP is a better solution if the solution is accessing a stand-alone database. ROLAP are more convenient to set up when the query demands are relatively low and also on a stand-alone database.
Please contact us if you have any questions on your OLAP implementation.