BIA Exam is on 8th November... I can only hope POD has not set the question paper with more creativity than the past years' exam papers. :)
With this, hopefully the last blog, I have a confession to make. Although this blog was purely greedy in motive (that 3% bonus!), it was not just about scribbling a few words every week. There's more to it... to write something useful here, I had to read up on the topics from POD's lecture content, and when I open the lecture notes before the exams, I will have a feeling of 'deja vu' which will hopefully calm my brain down.
For the other units, where there was no regular dose of recap (no bonus on offer... so no blogs... which means no RECAP :)), my brain cries out loud... "HELLLOOO... THIS IS SOMETHING NEW... DON'T PRESSURISE ME TO REMEMBER IT".
All the BEST to me and my fellow mates of FIT5093!!!
All the BEST to POD too, who has to bear with assessing our answer sheets :)
Friday, October 29, 2010
Sunday, October 10, 2010
Week 11
Couldn't attend the Week 10 lecture due to bad health, but I did go through POD's lecture material and got the message he wanted to convey -
It's the Interface Stupid!
Apparently, interfaces don't get much attention in the BIA world and are often neglected by vendors.
Pivot tables are the most popular interface, but people find them complex and not apt for many situations.
Tufte's rules for good visualisation emphasise simplification - reports should show the data, not complex aggregations, and the data presented should be comparable.
Learnt a few tips from Frees and Miller (1998) about interface design: choose appropriate charts (ignore pie charts!), keep layout and colours consistent, use the right kind of scale, use 3D only when really needed (read: never!), and maintain honesty and integrity (moral science lessons suggesting we not cook the data up!!).
Ehrenberg focused on the objective of a presentation being either to tell a 'story' or to serve as a record. He suggested a few rules for presenting data: round figures to two significant digits; use row and column averages to help see whether a figure is up or down; remember that comparison is easier down columns (use this to decide how to lay out the data); order rows and columns by size; place figures that need to be compared side by side; avoid grid lines; and use figures (not just tables) to compare data. Also, colour needs to be used effectively.
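A quick sketch (my own toy numbers, not from the lecture) of two of Ehrenberg's rules in action: round to two significant digits, and add row averages so it's easy to see whether a figure is above or below the norm.

```python
from math import floor, log10

def round_sig(x, sig=2):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0
    return round(x, -int(floor(log10(abs(x)))) + (sig - 1))

# Made-up quarterly sales figures per region
sales = {
    "North": [1234, 1567, 1411, 1820],
    "South": [987, 1043, 1190, 1275],
}

for region, figures in sales.items():
    rounded = [round_sig(v) for v in figures]
    avg = round_sig(sum(figures) / len(figures))
    print(region, rounded, "avg:", avg)
```

The rounded figures lose nothing a decision maker cares about, and the row average gives an instant "up or down" reference point.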
Also learnt some tips for report and interface design. Nielsen suggested testing a BIA by analysing whether users are actually using the system the way we designed it, and that a review exercise should be done every quarter if possible.
Just realised how useful the screenr links from POD are, especially for assignments. Got a few tips by viewing last year's assignment-related help. Edward Tufte developed sparklines, which can be added to Excel through an add-in (I am gonna try it for sure).
So, yeah it's the Interface stupid!
P.S. - Thanks POD for your GIFT of an extension of ** days :)
Tuesday, October 5, 2010
BIA Assignment!!
This blog is about the BIA assignment - not that the assignment isn't good, just that I am struggling to understand the case study.
Read the Ralph Kimball chapter on the airline case study and did get some hints about various ways to handle role-playing dimensions through super-dimensions, junk dimensions, etc. However, the book did not help much with how to implement them in SQL Server Business Intelligence Development Studio.
After various attempts to do the assignment and get past the data source view stage, I finally went to POD for help. He made it (as always) so easy by giving hints and encouraging me to do the simple things first, and assured us that there will be a penalty-free window after the due date for assignment submission.
Soon after getting confirmation from POD on my understanding of building a super-dimension through a named query, as we did in the tutorial's Aroma case study, I was charged up and created my 'citypair' and 'classofservice' super-dimensions.
So yeah the assignment is moving ahead. Just hoping I will submit one good BI Application for SkyWay people and get good marks as well :)
Tuesday, September 21, 2010
Week 9- Strategic Business Functions
This blog will cover the Strategic Business Functions approach in detail, with a brief look at the Critical Success Factors approach.
CSF - Critical Success Factors - Different functional managers are asked what important things they would want to know about their business on returning from a vacation. The answers are then analysed to define the critical success factors. CSFs are difficult to define because they can conflict between different functional divisions - for example, an inventory manager would like to keep inventory low and maximise deliveries, while at the same time a transport manager would like to minimise deliveries and increase inventory.
Strategic Business Functions approach - This is another technique in use, but it is not well known. The failures of the CSF technique led to the rise of the SBF approach. It does not canvass individuals one by one for their views; instead the analysts drive the process.
Characteristics-
1. Develops an understanding of the executive roles
2. Focuses on which functions of the business are strategically important
3. Identifies linkages between the strategic functions
4. Allows for prototyping
5. Aids in well prioritised development
6. More of a wish list.
Talk to executives in groups so that conflicts are not ignored but resolved. Such meetings are often arranged through retreats (excursions, workshops, brainstorming sessions). The discussion is about what is important to the business and the linkages between functions. Prioritise the SBFs. Once the SBFs are identified, the data sources are identified, and then development takes place as per the prioritised SBFs.
The SBF approach is easier to start with and less demanding of executives' time. Real strategic business objectives are identified, the inter-dependencies between functions are clarified, and the result is a clear understanding of the critical needs of the organisation and more consistent reports. Achieve the SBFs by questioning WHY we need something rather than focusing on WHAT we need.
A cause-and-effect understanding is achieved... how a change in one function unit affects the others.
TQM - Total Quality Management
Collect data, use simple statistical tools like histograms, Pareto charts, run charts, Shewhart charts, etc. to analyse the data, and take actions to improve the system. Interestingly, there is no evidence that graphs improve a decision maker's understanding of data. Wow! Why use them then??
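The idea behind a Pareto chart, at least, is simple enough to sketch in a few lines (my own toy defect counts, not course data): sort the categories by count and track the cumulative percentage, so the "vital few" causes that account for most of the problems stand out.

```python
from collections import Counter

# Hypothetical defect counts from a quality inspection
defects = Counter({"scratch": 48, "dent": 27, "misprint": 15,
                   "wrong colour": 7, "other": 3})

total = sum(defects.values())
cumulative = 0
for cause, count in defects.most_common():   # sorted, largest first
    cumulative += count
    print(f"{cause:12s} {count:3d}  cumulative {100 * cumulative / total:5.1f}%")
```

Here the first two causes alone cover 75% of all defects - exactly the kind of prioritisation signal TQM uses the chart for.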
Business Performance Measurement -
1) Productivity Measurement - the measurement of a firm's ability to create something having exchange value.
EIS/DSS/BI applications often report productivity by measuring inputs being transformed into outputs through some production process - and this is not limited to manufacturing organisations.
Steps - measure inputs, measure outputs, and know the process of transformation in detail.
Productivity is increased by getting more output from the same inputs.
Law of diminishing returns - varying one input while keeping the others constant eventually yields smaller and smaller increases in output.
A production function is a mathematical expression stating the maximum amount of output that can be produced from a given set of inputs. Shifting to a new production function represents a change in productivity.
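A toy production function makes the diminishing-returns point concrete. The function and numbers below are my own illustration (output = 10 × labour^0.5), not anything from the lecture: vary one input, hold everything else constant, and watch each extra unit of input add less output than the one before.

```python
def output(labour_hours: float) -> float:
    """A made-up production function with diminishing returns."""
    return 10 * labour_hours ** 0.5

prev = output(0)
for hours in (1, 2, 3, 4):
    extra = output(hours) - prev     # marginal product of one more hour
    print(f"hour {hours}: +{extra:.2f} units")
    prev = output(hours)
```

The marginal products come out as roughly 10, 4.1, 3.2, 2.7 - strictly decreasing, which is the law of diminishing returns in miniature. Shifting to a new function (say, 12 × labour^0.5 after buying better machines) would be a genuine productivity change.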
The exam timetable is up and is giving me goosebumps... I am still struggling with assignments... Phew! Never realised time flies this fast. I had promised myself a head-start on the BIA assignment, and well, I did - if you count connecting to the database and checking out the data. I have been punctual with my tutorials, though, and the Aroma case study is going well. Will try to give it more time this weekend and post my updates.
P.S - If you mistake the blog for a 'lecture note', then apologies in advance. Actually, I typed it during the lecture itself to save some time.
Wednesday, September 15, 2010
Week 8 -IBM Cognos' 24 Ways
Learnt about IBM Cognos' 24 ways to design multi-dimensional models, which cover different functional domains - Sales, Marketing, Human Resources, Distribution, Purchasing, Production, Customer Service and, last but not least, IT Systems. These models are a good reference point for someone who wants to develop a multi-dimensional model for a particular functional area.
Cognos' 24 ways are based on some assumptions, listed below with my personal comments in brackets -
1. Good quality data is available.
(High hopes!)
2. Iterative implementation.
(BI systems generally follow iterative approach)
3. Believe that organisations have a lot in common.
(True for some functional domains like Finance, HR, Sales etc.)
4. Dimensions are shared between functional domains
(Good approach)
5. Focus is on business people and not IT people
(A pragmatic approach, though not necessarily beneficial)
They promote the idea that sharing dimensions will lead to better coordination in decision making across functional areas, and that following the 24 ways will result in better ROI (Return on Investment). They start with a consultant-led 90-day implementation of the FINANCE domain, which is a pragmatic approach, as Finance is the one division where a company is likely to have good quality data readily available.
Lectures are going well in terms of content... only slightly behind schedule, but never mind, I'm sure one day we will catch up! POD literally gives away so much information in one lecture that you feel rich in terms of knowledge.
Tutorials are going fine as well, and I don't face difficulty in doing things as the specifications are so descriptive. However, people in our lab face a challenge in 'processing' the dimensions due to some stupid privilege issue which crops up if we try to save our workspace on the C:/ drive. The people who save on their flash drives can apparently run it successfully. I will make sure to bring my flash drive from now on. And of course it works fine in my laptop's work environment as well.
Unfortunately, I have not started on my assignment yet. There is just too much writing work for other units, and I am still slogging to meet the due dates. Hopefully, I will have made some progress by the time I write my next blog. AMEN!!
Friday, September 3, 2010
Multi-Dimensional Design
Learnt about multi-dimensional database design this week. Assuming we all know what a Cube... errr, a multi-dimensional structure is, let us learn more about their design. A multi-dimensional database is a collection of many such related cubes. The dimensions of the multi-dimensional structure provide slice-and-dice features for the data. The following points should be kept in mind while designing such a database -
1) Identify additive and non-additive items (or facts) - for example, the price of a product is non-additive, while the number of units sold is additive.
2) Sparsity and Density of the dimension -
A sparse dimension means that only a small fraction of the available data positions are filled. For example, if a cube has a dimension "Top 10 customers", then only a few data positions will be filled, as only a few customers fit in that dimension and all the other customers leave their data positions empty.
A dense dimension has a larger percentage of the available data positions occupied.
3) Star schema or snowflake schema for storing the data.
A star schema has one main table (the fact table) with a composite primary key, connected to the dimension tables through foreign key - primary key relationships. A star schema requires a separate fact table, called an aggregate table, for each hierarchy level.
[Figure: star schema example]
A snowflake schema is similar to a star schema but with normalised dimensions. There is still one fact table joined to various dimension tables, which are further broken into closely related tables after proper database normalisation.
A snowflake schema is useful for handling sparse dimensions, but a star schema is good if the queries are expected to be simple, as fewer joins are required than in a snowflake schema.
[Figure: snowflake schema example]
4) Calculating measures from other measures - a careful decision has to be made whether a calculated measure should be stored pre-calculated in the fact table or calculated on the fly as requested. Storage space, computational time and the cross-dimensional nature of such measures have to be kept in mind.
5) Keep room for changing dimensions like customer addresses
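Two of the design points above - additivity and sparsity - are easy to see in code. Below is my own toy example (a tiny (product, day) cube as a dict, not anything from the tutorials): units sold are additive and can be summed across a dimension, unit price is non-additive and should be averaged or recalculated instead, and sparsity is just the fraction of possible cells left empty.

```python
# (product, day) -> (units_sold, unit_price); only 3 of 4 possible cells filled
cells = {
    ("tea", "mon"): (30, 2.5),
    ("tea", "tue"): (45, 2.5),
    ("coffee", "mon"): (20, 4.0),
}

products = {"tea", "coffee"}
days = {"mon", "tue"}

# Additive fact: summing units across all dimensions is meaningful.
total_units = sum(units for units, _ in cells.values())

# Non-additive fact: summing prices is meaningless, so average instead.
avg_price = sum(price for _, price in cells.values()) / len(cells)

# Sparsity: fraction of possible (product, day) positions that are empty.
sparsity = 1 - len(cells) / (len(products) * len(days))

print(total_units, avg_price, sparsity)
```

With only one cell of four empty this cube is fairly dense; a "Top 10 customers" dimension over thousands of customers would push the sparsity figure close to 1.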
Various tools used for designing multi-dimensional databases are MS Analysis Services (which we are using in the tutorials :)), Palo, Pentaho, etc.
There are various methods to conceptualise a multi-dimensional model, like -
1) Thomsen diagrams - a few vertical and horizontal lines are all you need to represent the model. Its beauty is its simplicity. Also POD's favourite!
2) ADAPT diagrams - a symbolic representation of the model. Microsoft is one of its patrons, so it might gain popularity.
3) Microsoft PivotDiagrams - a graphical representation.
Too much for today! Time to say goodbye and work on the assignments which are running faster than me towards the due date.
Cognos' 24 ways coming up in the next blog. That reminds me to order the free book from Cognos before they figure out from their BI reports that Monash students are among their TOP 10 'buyers' and stop shipping freebies!
Keep Blogging!
Friday, August 27, 2010
OLAP Council & Codd's Rules
As promised in the previous blog, I am back with a brief on the now-dead OLAP Council and Codd's OLAP rules.
The OLAP Council was formed in January 1995 with the aim of providing a common platform for OLAP vendors and serving as an industry guide. Their mission statement can still be seen on their website - http://www.olapcouncil.org/index.html.
The council came up with the Multi-Dimensional Application Programming Interface (MDAPI) and its revisions to allow standardisation and interoperability between various OLAP products. Several vendors joined the OLAP Council, including IBM, Business Objects and so on, while Microsoft chose to stay away and launched its OLE DB data access API with multi-dimensional capabilities. Ultimately, the OLAP Council was shut down (though their website still exists) and was replaced in October 1999 by the Analytical Solutions Forum (ASF), which was even more ineffective than the OLAP Council and soon passed away as well. Microsoft then became the dominant force in the Business Intelligence world.
Too much of a dose of history... yeah? Let's bring Codd's rules onto the floor.
Edgar Frank "Ted" Codd coined the term OLAP and gave 12 rules for online analytical processing. However, there is some controversy in the BI industry over the number of these rules, as one can witness quite some redundancy in them.
Rule 1- Multidimensional conceptual view
Rule 2- Transparency
Rule 3- Accessibility
Rule 4- Consistent reporting performance
Rule 5- Client/server architecture
Rule 6- Generic dimensionality
Rule 7- Dynamic sparse matrix handling
Rule 8- Multi-user support
Rule 9- Unrestricted cross-dimensional operations
Rule 10- Intuitive data manipulation
Rule 11- Flexible reporting
Rule 12- Unlimited dimensions and aggregation levels.
Learnt from POD that there are usually 5-6 dimensions, and they can go up to 12 (not unlimited, as Codd stated!). He also mentioned that sufficient effort must be put into handling sparsity of the data, as the same algorithm might not be applicable in all scenarios.
Finally the suspense of Cubes, Dimensions, Facts and OLAP is over!
Week 6 Already!...How is it possible? Didn't we just start?
Gotta put on my running shoes and sprint for the sake of Assignments!
After many years of being a Microsoft patron, I finally bought my Mac and thought I would never need that Windows XYZ (umpteen possible OS versions) again, but much to my dislike, I have to install n number of Microsoft products (Windows OS, MS Office, MS SQL Server, Visual Studio .NET, etc.) for my assignment. Still struggling with the installations and wondering if I will ever have a working development environment on my laptop for my assignment.... :(
Sorry, I got off track... we are not here to discuss my trifles with technology but to talk about Business Intelligence concepts.
Finally I got to know the much-spoken jargon of the BI world... OLAP, MOLAP, ROLAP, HOLAP, MDM, Cubes, Dimensions, Facts and so on.
I used to think all this data warehousing, mining and Business Intelligence stuff was pretty recent, but POD mentioned that the terms Dimensions and Facts were coined in the 1960s by General Mills together with Dartmouth College. WOW!!
Below is a short and sweet description of the above-mentioned jargon, though not in the same order.
Facts - Measurable data which is of interest to the user.
Dimension - a particular angle or perspective from which to view the facts. For example, if the facts are the sales per day for 10 various products, and as a user I want to see the sales of a particular product or the sales in a week, then Time and Product will be the dimensions here and the facts will be the sales figures.
Dimensions can have a hierarchy - for example, Country can have States as children, States can have Cities, and so on. The position in the hierarchy is identified by its level.
I learnt that there is always a Time Dimension!
There are dimension tables to store information (attributes) about each dimension - for example, a customer dimension table can have the customer name, address and other information, along with a unique key to identify a customer, which we can call the customer key. So, a dimension table has keys, attributes and values.
A fact table has measurable facts (measures) and keys to the dimension tables. Measures are numeric and describe the quantitative aspect of a fact. They can be either calculated or based on a column of a table - for example, units sold, revenue, cost, etc.
A Multi-Dimensional Model (MDM) is a representation of data in terms of dimensions, measures, dimension attributes, hierarchies and levels.
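To make the fact-table/dimension-table split concrete, here is my own minimal sketch (toy data, not from the unit): a dimension table keyed by a surrogate key, a fact table carrying that key plus measures, and a "slice" of the facts along the Product dimension.

```python
# Dimension table: customer-style key -> attributes
product_dim = {
    1: {"name": "tea", "category": "drinks"},
    2: {"name": "soap", "category": "household"},
}

# Fact table: (product_key, day, units_sold, revenue)
fact_table = [
    (1, "mon", 30, 75.0),
    (2, "mon", 10, 40.0),
    (1, "tue", 45, 112.5),
]

def slice_by_product(name):
    """Slice: keep only fact rows whose key matches a dimension attribute."""
    keys = {k for k, attrs in product_dim.items() if attrs["name"] == name}
    return [row for row in fact_table if row[0] in keys]

print(slice_by_product("tea"))
```

Slicing on "tea" keeps the two tea rows; dicing would simply apply a second such filter (say, day == "mon") at the same time.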
Now the C word (Cube!), hated by POD, and logically so. The cube is a kind of metaphor used in the BI world, and using it in front of a layman should be a punishable crime.
A cube is an amalgam of dimensions and measures. The edges of the cube mark the dimensions, while the body contains the measures.
OLAP stands for On-Line Analytical Processing and is an approach to answering multi-dimensional queries.
MOLAP stands for Multi-dimensional OLAP and involves data being stored in the form of multi-dimensional cubes. It is fast, as the cubes are pre-calculated, but it cannot handle very large volumes of data.
ROLAP stands for Relational OLAP, with the underlying data stored in a relational database. It can handle large volumes of data, but it is slower to retrieve.
HOLAP stands for Hybrid OLAP and combines the best of the MOLAP and ROLAP models: for summary-type information, MOLAP's cubes are leveraged, and for drilling down into details, ROLAP is used. I also found one more flavour of OLAP called DOLAP (Desktop OLAP), in which the cubes are stored on desktops and the analysis can be done in offline mode.
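The MOLAP/ROLAP trade-off fits in a few lines. This is my own toy illustration (not SSAS internals): MOLAP-style pre-computes aggregates into a "cube" once and answers by lookup, while ROLAP-style leaves the data relational and aggregates at query time.

```python
# Relational-style rows: (product, day, units_sold)
rows = [("tea", "mon", 30), ("tea", "tue", 45), ("coffee", "mon", 20)]

# MOLAP-style: pre-calculate totals per product once; queries are lookups.
cube = {}
for product, _, units in rows:
    cube[product] = cube.get(product, 0) + units

# ROLAP-style: scan the rows and aggregate on the fly for every query.
def rolap_total(product):
    return sum(units for p, _, units in rows if p == product)

print(cube["tea"], rolap_total("tea"))
```

Both give the same answer; the difference is where the work happens - up front (fast queries, storage and pre-calculation cost grow with the cube) versus at query time (handles big data, slower per query). HOLAP simply mixes the two.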
[Figure: ROLAP, MOLAP and DOLAP shown as Options 2, 3 and 4 respectively]
Also learnt Codd's 12 (controversial!) rules, but will cover those in the next blog along with an introduction to the now-dead OLAP Council.
Gotta put on my running shoes and sprint for the sake of Assignments!
After many years of being a Mircrosoft patron, I finally bought my Mac and thought I would never need that windows XYZ(umpteen possible OS versions) again but much to my dislike, have to install n numbers of Microsoft products(Windows OS, MS office, MS SQL Server, Visual Studio .NET etc.) for my assignment. Still struggling with the installations and wondering if I will ever have a working development environment on my laptop for my Assignment.... :(
Sorry, I got offtrack ....we are not here to discuss about my trifle with technology but to talk about Business Intelligence concepts.
Finally I got to know the much spoken jargons of the BI world... OLAP, MOLAP, ROLAP, HOLAP, MDM, Cubes, Dimensions, Facts and so on.
I used to think that data warehousing, data mining and Business Intelligence were all pretty recent, but POD mentioned that the terms Dimensions and Facts were coined in the 1960s by General Mills together with Dartmouth College. WOW!!
Below is a short and sweet description of the above-mentioned jargon, though not in the same order.
Facts - Measurable data which is of interest to the user.
Dimension - A particular angle or perspective from which to view the facts. For example, if the facts are the daily sales of 10 different products and as a user I want to see the sales of a particular product or the sales in a week, then Time and Product are the dimensions here and the sales figures are the facts.
The dimensions can have a hierarchy: for example, a country can have states as children, states can have cities, and so on.
The position in the hierarchy is identified by a level.
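A minimal sketch in Python of how a dimension hierarchy with levels might be modelled (the place names and the `roll_up` helper are my own made-up illustration, not anything from the lecture):

```python
# A tiny dimension hierarchy: each member records its level and its parent.
# "Rolling up" means walking from a member to its ancestor at a higher level.
HIERARCHY = {
    # member: (level, parent)
    "Australia": ("country", None),
    "Victoria":  ("state", "Australia"),
    "Melbourne": ("city", "Victoria"),
    "Geelong":   ("city", "Victoria"),
}

def roll_up(member, target_level):
    """Walk up the hierarchy until we reach the requested level."""
    while member is not None:
        level, parent = HIERARCHY[member]
        if level == target_level:
            return member
        member = parent
    raise ValueError(f"no ancestor at level {target_level!r}")

print(roll_up("Melbourne", "state"))   # Victoria
print(roll_up("Geelong", "country"))   # Australia
```

So a member's level is just its position in the parent chain, which is why the same city rolls up to one state and one country.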
I learnt that there is always a Time Dimension!
There are Dimension tables to store information (attributes) about each dimension. For example, a customer dimension table can have the customer name, address and other information, along with a unique key to identify a customer, which we can call the customer key. So, a dimension table has Keys, Attributes and Values.
A Fact table has measurable Facts (measures) and keys to the Dimension tables. Measures are numeric and describe the quantitative aspect of a fact. They can be either calculated or based on a column of a table. Examples: sales units, revenue, cost etc.
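The dimension-table and fact-table idea can be sketched with Python's built-in sqlite3 module. This is just my toy illustration (the table names, columns and figures are all made up), assuming one customer dimension and one sales fact table:

```python
import sqlite3

# A toy star schema: one dimension table (key + attributes) and one
# fact table (measures + a foreign key pointing at the dimension).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name TEXT, city TEXT);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    sale_units INTEGER, revenue REAL);
""")
con.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Alice", "Melbourne"), (2, "Bob", "Sydney")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 3, 30.0), (1, 2, 20.0), (2, 5, 55.0)])

# Join the facts to the dimension and aggregate a measure per attribute.
rows = con.execute("""
    SELECT d.city, SUM(f.revenue)
    FROM fact_sales f JOIN dim_customer d USING (customer_key)
    GROUP BY d.city ORDER BY d.city
""").fetchall()
print(rows)   # [('Melbourne', 50.0), ('Sydney', 55.0)]
```

The measures live only in the fact table; the dimension table exists so the numeric facts can be grouped and filtered by human-readable attributes like city.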
Multi-Dimensional Model (MDM) is a representation of data in terms of dimensions, measures, dimension attributes, hierarchies and levels.
Figure - Diagram of the Logical Multidimensional Model
(Reference - Oracle® OLAP Application Developer's Guide, 10g Release 1 (10.1))
Now the C word (Cube!), hated by POD and logically so. The cube is a kind of metaphor used in the BI world, and using it in front of a layman should be a punishable crime.
A cube is an amalgam of dimensions and measures. The edges of the cube mark the dimensions while the body contains the measures.
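The "edges are dimensions, body is measures" picture can be sketched as a plain Python mapping from dimension coordinates to a measure. The products, months and numbers here are made up by me for illustration:

```python
from collections import defaultdict

# A 2-dimensional "cube": (product, month) coordinates -> units sold.
cube = {
    ("widget", "Jan"): 10, ("widget", "Feb"): 12,
    ("gadget", "Jan"): 7,  ("gadget", "Feb"): 9,
}

def slice_cube(cube, month):
    """Fix the month dimension: a 'slice' through the cube body."""
    return {prod: units for (prod, m), units in cube.items() if m == month}

def roll_up_product(cube):
    """Aggregate away the month dimension, leaving totals per product."""
    totals = defaultdict(int)
    for (prod, _month), units in cube.items():
        totals[prod] += units
    return dict(totals)

print(slice_cube(cube, "Jan"))   # {'widget': 10, 'gadget': 7}
print(roll_up_product(cube))     # {'widget': 22, 'gadget': 16}
```

Slicing and rolling up are exactly the operations a pivot table performs when you filter on one field or collapse another.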
OLAP stands for On-Line Analytical Processing and it is an approach to answer multi-dimensional queries.
MOLAP stands for Multi-dimensional OLAP and it involves data being stored in the form of multi-dimensional cubes. It is fast because the cubes are pre-calculated, but it cannot handle very large volumes of data.
ROLAP stands for Relational OLAP and the underlying data is stored in a relational database. It can handle large volumes of data, but it is slower to retrieve the data.
HOLAP stands for Hybrid OLAP and it combines the best of the MOLAP and ROLAP models. For summary-type information, MOLAP cubes are leveraged, and for drilling down into details, ROLAP is used.
I also found one more flavour of OLAP called DOLAP (Desktop OLAP), in which the cubes are stored on desktops and the analysis can be done in offline mode.
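The MOLAP-vs-ROLAP trade-off boils down to pre-calculation versus scanning at query time. Here is a toy contrast under my own made-up data (real engines are vastly more sophisticated, this only shows the shape of the trade):

```python
# Relational rows: (product, state, units) -- all figures made up.
rows = [("widget", "VIC", 3), ("widget", "NSW", 5),
        ("gadget", "VIC", 2), ("gadget", "NSW", 4)]

# MOLAP-style: pre-calculate every per-product total once, up front.
# Queries are then constant-time lookups, at the cost of storing
# (and rebuilding) the aggregates.
precomputed = {}
for product, _state, units in rows:
    precomputed[product] = precomputed.get(product, 0) + units

def molap_query(product):
    return precomputed[product]          # O(1) lookup in the "cube"

def rolap_query(product):
    # ROLAP-style: scan the relational rows at query time.
    # No pre-built aggregates, so it scales to more data but is slower.
    return sum(u for p, _s, u in rows if p == product)

print(molap_query("widget"), rolap_query("widget"))   # 8 8
```

Both answers agree; the difference is purely where the aggregation work happens, which is exactly why HOLAP keeps summaries MOLAP-style and details ROLAP-style.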
Options for Supporting OLAP by Colin White - ROLAP, MOLAP and DOLAP are represented as Options 2, 3 and 4 respectively in the figure.
(Source - OLAP in the Database, Intelligent Business Strategies, Information Management Magazine, June 2003)
I also learnt Codd's 12 (controversial!) rules but will cover those in the next blog, along with an introduction to the now-defunct OLAP Council.
Saturday, August 21, 2010
BIA-fit5093
I know I am starting late... Week 5 is over and I am just starting with my blogs! Well, I had been tossing around the idea of a blog in my mind for the last couple of days, but being totally new to this blogging world, I had my own inhibitions. This being my first semester of coursework after a gap of 5 years since undergrad, I did feel like a kid going to school for the first time, shy and lost! I owe it to you, POD, for being so encouraging and making this subject so interesting that I had to take the plunge.
I hope I will catch up fast. Let's get going.....
During my work experience I have been exposed to various databases like Oracle 8i, IBM DB2, MS SQL Server and, not to forget, the good old MS Access, as well as reporting tools like Crystal Reports and Jasper Reports. I always heard jargon like OLAP, Data Warehouse, cubes etc. and was fascinated by it, but never got lucky enough to know it better. This is the reason why FIT5093 is a unit in my Moodle (I can pat myself on the back for making such a good choice).
Already subscribed to the podcasts, posted my first blog and got TweetDeck installed on my Mac... I'm already more tech-savvy :) ... THANKS POD!
Friday, August 20, 2010
My First Pivot Table!
I had my first stint with a Pivot Table in the lab for Business Intelligence Applications, and I must say it is as cool as it sounds. Having missed the previous week's tutorial work, I thought it would be a really difficult task this week, as it was supposed to be a continuation of last week's work, but surprisingly it was good fun. I must credit the well-written specifications from the tutor, as they really served as a launch pad.
Pivot tables are a smart way to display your data. It is a feature of our very own MS Excel which I had always known existed but had never explored before. I admit I wasn't disappointed; it really is as simple as drag and drop. The pivot table needs data (after all, you need to display some data, right?), which can be imported either from a source as simple as another Excel file or from any database through an ODBC connection. You can drag and drop the fields from your data into rows and columns, and the pivot table will smartly summarise your data for you in a readable grid format.
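Mechanically, that drag-and-drop summary is just grouping rows by one field, columns by another, and aggregating the values in each cell. A minimal sketch with the Python standard library, on made-up sales rows:

```python
from collections import defaultdict

# Raw rows: (region, product, sales) -- made-up data.
data = [("North", "widget", 10), ("North", "gadget", 5),
        ("South", "widget", 7),  ("North", "widget", 3)]

# "Pivot": regions become rows, products become columns,
# and each cell holds the SUM of sales for that pair.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, sales in data:
    pivot[region][product] += sales

for region, cols in sorted(pivot.items()):
    print(region, dict(cols))
# North {'widget': 13, 'gadget': 5}
# South {'widget': 7}
```

Swapping the row and column fields, or changing the aggregation from SUM to COUNT or AVERAGE, is the programmatic equivalent of dragging a field to a different area of the pivot table.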
If you are connected to a database over a network, you can also save an offline copy in .cub format. Another interesting way of creating a pivot table is through Panorama (some fancy name, eh?) using a Google Docs widget.
Pivot Table using Panorama widget
You can play around with the features of a pivot table, but yes, every good thing has its limits. Pivot tables have a few limitations. For example, a pivot table does not automatically refresh itself when the underlying raw data has changed (I witnessed this in the lab). However, I am sure some intelligent macros are possible which can make a pivot table refresh automatically.
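The stale-data behaviour makes sense once you see the pivot table as a cached summary of the source rows. A tiny sketch of the same effect, with made-up numbers:

```python
# A pivot table is a cached summary of the source rows: if the raw
# data changes after the summary is built, the cache goes stale
# until it is explicitly refreshed.
raw_data = [("widget", 10), ("gadget", 5)]

def build_summary(rows):
    """Aggregate units per product -- the 'pivot table' build step."""
    totals = {}
    for product, units in rows:
        totals[product] = totals.get(product, 0) + units
    return totals

summary = build_summary(raw_data)     # summary built once
raw_data.append(("widget", 4))        # underlying data changes...

print(summary["widget"])              # 10 -> stale, not 14
summary = build_summary(raw_data)     # manual "refresh"
print(summary["widget"])              # 14 -> up to date
```

The refresh macro I speculated about would simply be something that re-runs the build step whenever the source changes.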
Let me know your good and bad point list about Pivot table.