Stars and Cubes -- <One> Dimension Table Features

Surrogate Keys and Natural Keys
In a star schema, each dimension table is assigned a surrogate key. This key is not a carryover from an operational system. It is created especially for the data warehouse or data mart. Surrogate keys are usually integers, generated and managed as part of the extract, transform, load (ETL) process that loads the star schema. The key values have no intrinsic meaning and are not of interest to users of the data warehouse. In each row of the dimension table, the surrogate has a unique value.

Separate and distinct from surrogate keys, one or more natural keys will also be present in most dimension tables. The natural keys are identifiers carried over from source systems. They may not uniquely identify a row in the data warehouse, but they do identify a corresponding entity in the source system. Unlike surrogate key values, the values in natural key columns may have meaning to users of the data warehouse. Even when they do not carry significant meaning, their presence is necessary for the ETL routines that load fact tables.

The use of surrogate keys as unique identifiers allows the data warehouse to respond to changes in source data in whatever manner best fits analytic requirements. Because the dimension table does not rely on the natural key to identify unique rows, it can maintain history even if the source system does not. For example, an order entry system might contain a record for customer_id 404777, which includes the customer’s address. If the system overwrites the address when a customer moves, it is not tracking history. Were the customer dimension table to use the same customer_id to identify unique rows, it would be able to store only one row for customer_id 404777. It would be unable to maintain the history of the address. By using a surrogate key, it becomes possible to maintain two versions of customer_id 404777.

=============================================================================================

Rich Set of Dimensions
Dimensions provide context for facts. Without context, facts are impossible to interpret. For example, I might tell you, “Order dollars are $40,000.” This statement is of no use to you. It presents a fact (order dollars) without any explanation. Is this orders for one product or all products? Is it one day’s worth of orders or one year’s? To understand what “$40,000” means, you need more information.

Dimensions and their values add meaning in many ways:
• They are used to filter queries or reports. 
• They are used to control the scope of aggregation for facts.
• They are used to order or sort information. 
• They accompany facts to provide context on reports. 
• They are used to define master–detail organization, grouping, subtotaling, and summarization.
Put to these uses, dimensions unlock the value of facts. Dimensions can be added to queries in different combinations to answer a wide variety of questions. The larger the set of dimension attributes, the more ways that facts can be analyzed.

Dimension tables with a large number of attributes maximize analytic value. They can be thought of as wide.

posted on 2012-02-27 18:13  Jia  阅读(230)  评论(0编辑  收藏  举报