Friday, August 27, 2010

OLAP Council & Codd's Rules

As promised in the previous blog, I am back with a brief on the now dead OLAP council and Codd's OLAP rules.


OLAP council was formed in January,1995 with the aim of setting a common platform for OLAP vendors and serve as Industry guide. The mission as seen on their website- http://www.olapcouncil.org/index.html.





The council came up with Multi-Dimensional Application Programming Interface (MDAPI) and its revisions to allow standardisation and interoperability between various OLAP products.  Several vendors  joined OLAP council including IBM, Business Objects and so on while Microsoft choose to stay away and launched its OLE DB Data access API with multi dimensional capabilities. Ultimately, OLAP council  was shut down (though their website still exists) and was replaced by Analytical Solutions Forum (ASF) in October 1999 which was even more inefficient than OLAP council and soon passed away. Microsoft then became the pioneer in Business Intelligence world.


Too much of a dose of History...yeah? Lets bring on the floor Codd's rules. 


Edgar Frank "Ted" Codd coined the term OLAP and gave 12 rules for online analytical processing. However, there is some controversy in the BI industry on the number of these rules as one can witness quite some redundancy in these rules.


Rule 1- Multidimensional conceptual view
Rule 2- Transparency
Rule 3- Accessibility 
Rule 4- Consistent reporting performance
Rule 5- Client/server architecture
Rule 6- Generic dimensionality
Rule 7- Dynamic sparse matrix handling
Rule 8- Multi-user support
Rule 9- Unrestricted cross-dimensional operations
Rule 10- Intuitive data manipulation
Rule 11- Flexible reporting
Rule 12- Unlimited dimensions and aggregation levels.
    Learnt from POD that dimensions usually are 5-6 and can go up to 12 ( not infinite as Todd stated!). He also mentioned that sufficient efforts must be put into handling sparsity of the data as the same algorithm might not be applicable in all scenarios. 

Finally the suspense of Cubes, Dimensions, Facts and OLAP is over!

Week 6 Already!...How is it possible? Didn't we just start?
Gotta put on my running shoes and sprint for the sake of Assignments!

After many years of being a Mircrosoft patron, I finally bought my Mac and thought I would never need that windows XYZ(umpteen possible OS versions) again but much to my dislike, have to install n numbers of Microsoft products(Windows OS, MS office, MS SQL Server, Visual Studio .NET etc.)  for my assignment.  Still struggling with the installations and wondering if I will ever have a working development environment on my laptop for my Assignment.... :(

Sorry, I got offtrack ....we are not here to discuss about my trifle with technology but to talk about Business Intelligence concepts.

Finally I got to know the much spoken jargons of the BI world... OLAP, MOLAP, ROLAP, HOLAP, MDM, Cubes, Dimensions, Facts and so on.

I used to think this all Data warehousing, mining and Business Intelligence and all thing is pretty recent but POD mentioned that the terms Dimensions and Facts were coined in 1960's by General Mills together with Dartmouth college. WOW!!

Below is a short and sweet description of the above mentioned jargons not in the same order though.

Facts - Measurable data which is of interest to the user.

Dimension - A particular angle or perspective to view the facts. For example - if facts are the sales per day for 10 various products and as a user if I want to see the sale of a particular product or sale in a week. Time and products will be dimensions here and fact will the sale figures.

The dimensions can have hierarchy for example country can have child as states and states can have cities and so on.

The position in the hierarchy is identified with level.

I learnt that there is always a Time Dimension!

There are Dimension tables to store information(attributes) about that dimension example a customer dimension table can have customer name, address and other information along with a unique key to identify a customer which we can call as customer key. So, a dimension table has Key, Attributes and Values.

A Fact table has measurable Facts(measures) and keys to the Dimension tables.  Measures are numeric and describe the quantitative aspect of a fact.  They can be either calculated or based on column of a table. Example sale units, revenue, cost etc.

Multi-Dimensional Model (MDM) is a representation of data in terms of dimensions, measures, dimension attributes, hierarchy, level.





 Figure - Diagram of the Logical Multidimensional Model
(Reference- 
Oracle® OLAP Application Developer's Guide10g Release 1 (10.1))


Now the C word( Cube!) hated by POD and logically so. The cube is a kind of metaphor used in BI world and using it front of a lay man should be punishable crime.

Cube is an amalgam of dimensions and measures. The edges of the cube mark the dimensions while the body contains measures.

OLAP stands for On-Line Analytical Processing and it is an approach to answer multi-dimensional queries.


MOLAP stands for Multi-dimensional OLAP and it involves data being stored in form of multi dimensional cubes. Fast as cubes are pre-calculated but can not handle much data.

ROLAP stands for Relational OLAP and the underlying data being is stored in form of  in form of  relational database. Can handle large volumes of data but it is slow to retrieve the data.

HOLAP stands for Hybrid OLAP and it combines best of the MOLAP and ROLAP models. For summary type information, MOLAP's cube are leveraged and for drilling down details, ROLAP is used.
I found one more flavour of OLAP called as DOLAP (Desktop OLAP) in which the cubes are stored in desktops and the analysis can be done in offline mode. 


ROLAP, MOLAP and DOLAP are represented as Option 2, 3 and 4 respectively below.




                                           Options for Supporting OLAP by Colin White 

(Source-OLAP in the Database,Intelligent Business Strategies,Information Management Magazine, June 2003)


Also learnt the CODD's  12(Controversial!) rules but will cover that in next blog along with the introduction of now dead OLAP council.

Saturday, August 21, 2010

BIA-fit5093

I know I am starting late...Week 5 is over and I am just starting with my blogs! Well,  I was tossing around the idea of a blog in my mind since last couple of days but being totally new to this Blogging world, had my own inhibitions. This being my first semester of coursework after a gap of 5 years since undergrad, I did feel like a kid going school for the first time, shy and lost! I owe it to you POD for being so encouraging and making this subject so interesting that I had to take the plunge.

I hope I will catch up fast. Lets get going.....


I have been exposed to various databases like Oracle 8i, IBM DB2, MS SQL Server and not to forget the good old MS Access and reporting tools like Crystal Reports and Jasper reports during my work experience. Always heard about jargons like OLAP , Data Warehouse, cubes etc. and was fascinated by them but never got lucky to know them better. This is the reason why FIT5093 is a unit in my Moodle (I can pat myself for making such a Good Choice).

Already subscribed to the podcasts, posted my first blog and got TweetDeck installed on my mac...I m already more techsavvy :) ...THANKS POD!

Friday, August 20, 2010

My First Pivot Table!

I had my first stint with Pivot Table in lab. for Business Intelligence Applications and I must say, it is as Cool as it sounds.  Having missed my previous week's tutorial work, I thought it would be some really difficult task this week as it is supposed to be in continuation with last week's work, but surprisingly it was good fun. I must credit the well written specifications from the tutor as they really served as a launch pad.

Pivot tables are a smart way to display your data. It was a feature of our very own MS Excel which I had always known it exists but never ever explored before. I admit I wasn't disappointed and it is really as simple as drag and drop.  The pivot table needs data( after all you need to display some data, right?) which can be imported either from a simple source as another excel file or any database through ODBC connection. You can drag drop the fields from your data in rows and columns and pivot table will smartly create summary for you. It intelligently automates for you the summary of your data in a readable grid format.

If you are connected to a database through network, you can also save an offline copy in .cub format. Another interesting way of creating pivot table is through Panorama (some fancy name eh?) using google docs widget. 

Pivot Table using Panorama widget


You can play around with the features of pivot table but yes, every good thing has a limit. Pivot table has few limitations for example - it does not automatically refreshes itself if the underline raw data has changed (I witnessed it in lab.). However, I am sure some intelligent macros are possible which  can make pivot table refresh automatically. 

Let me know your good and bad point list about Pivot table.