Data Warehouse
Knowledge Discovery Process
OLTP & OLAP
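Online transactional processing (OLTP) systems: cover most of an organization's day-to-day operations, such as purchasing, inventory, banking, manufacturing, payroll, registration, and accounting.
Online analytical processing (OLAP) systems: organize and present data in different formats to meet the varied needs of different users, serving data analysis and decision making.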
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries
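The difference in access patterns can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration. A real OLTP system and a real warehouse would normally be separate systems.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [("north", 2022, 100.0), ("north", 2023, 150.0),
     ("south", 2022, 80.0), ("south", 2023, 120.0)],
)

# OLTP-style access: a short transaction that updates one current, detailed record.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE id = ?", (1,))

# OLAP-style access: a read-only query that consolidates historical data.
olap_rows = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

The update touches a single row by key, while the analytical query scans and aggregates the whole table without modifying it.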
Data Warehouse
DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
Data Warehouse:
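A data warehouse integrates business data scattered across separate information islands on the enterprise network and stores it in a single integrated relational database. This integrated information makes access easier for users and lets decision makers analyze historical data over a period of time to study how the business is trending.
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
Data stored in a data warehouse has been processed through extraction, cleaning, transformation, loading (sort, summarize, ...), and refresh.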
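The processing steps can be sketched as a tiny pipeline. This is a minimal illustration in plain Python, not a real ETL tool: the source records, field names, and the dictionary standing in for the warehouse are all assumptions made for the example.

```python
# Hypothetical raw records from two source systems; field names are illustrative.
source_a = [
    {"customer": " Alice ", "amount": "100.5", "date": "2023-01-05"},
    {"customer": "Bob", "amount": None, "date": "2023-01-07"},
]
source_b = [{"customer": "alice", "amount": "20", "date": "2023-02-01"}]

def extract():
    """Extract: gather records from all sources."""
    return source_a + source_b

def clean(records):
    """Clean: drop records with a missing amount and normalize customer names."""
    return [
        {**r, "customer": r["customer"].strip().lower()}
        for r in records
        if r["amount"] is not None
    ]

def transform(records):
    """Transform: convert amounts to numbers and derive a month key from the date."""
    return [
        {"customer": r["customer"], "amount": float(r["amount"]), "month": r["date"][:7]}
        for r in records
    ]

def load(records, warehouse):
    """Load: summarize amounts per (customer, month) into the warehouse dict."""
    for r in records:
        key = (r["customer"], r["month"])
        warehouse[key] = warehouse.get(key, 0.0) + r["amount"]
    return warehouse

warehouse = load(transform(clean(extract())), {})
```

In this sketch a refresh would amount to calling `load` again with newly extracted records and the existing `warehouse` dictionary.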
Data warehouse model: dimensions and measures. Dimensions locate a data cell; measures are the values you read from it.
Conceptual model: star schema, snowflake schema (a refinement of the star schema), fact constellation (a collection of stars)
Example of Star Schema:
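A star schema can be sketched as one central fact table whose foreign keys point at surrounding dimension tables. The example below uses Python's built-in sqlite3 module; the table names, columns, and rows (`fact_sales`, `dim_time`, `dim_item`, `dim_location`) are hypothetical and are not taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, with descriptive attributes.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: foreign keys to each dimension plus the numeric measures.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);

INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO dim_item VALUES (1, 'laptop', 'computer'), (2, 'phone', 'mobile');
INSERT INTO dim_location VALUES (1, 'Beijing', 'China'), (2, 'Shanghai', 'China');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 2, 2000.0),
    (1, 2, 2, 5, 2500.0),
    (2, 1, 2, 1, 1000.0);
""")

# Locate data by dimensions (quarter, category) and read the measure (dollars_sold).
rows = conn.execute("""
    SELECT t.quarter, i.category, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t ON f.time_key = t.time_key
    JOIN dim_item i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.category
    ORDER BY t.quarter, i.category
""").fetchall()
```

A snowflake schema would further normalize the dimension tables, for example by moving `country` out of `dim_location` into its own table.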
Typical OLAP Operations :
Roll up: summarize data by climbing up a hierarchy or by dimension reduction; rolling a dimension up to "all" removes that dimension
Drill down: the reverse of roll up, from a higher-level summary to a lower-level summary or detailed data
Slice and dice: project and select
Pivot (rotate): reorient the cube for visualization, for example turning a 3D cube into a series of 2D planes
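These operations can be sketched on a toy cube held in a plain Python dictionary. The dimension names, the city-to-country hierarchy, the cell values, and all helper function names below are assumptions made for illustration; real OLAP engines implement these operations very differently.

```python
from collections import defaultdict

# Toy cube: cells keyed by (quarter, city, item), holding a units-sold measure.
DIMS = ("quarter", "city", "item")
cube = {
    ("Q1", "Beijing", "laptop"): 10,
    ("Q1", "Beijing", "phone"): 20,
    ("Q1", "Shanghai", "laptop"): 5,
    ("Q2", "Beijing", "laptop"): 7,
    ("Q2", "Shanghai", "phone"): 8,
}
CITY_TO_COUNTRY = {"Beijing": "China", "Shanghai": "China"}  # hypothetical hierarchy

def roll_up_reduce(cube, dim):
    """Roll up by dimension reduction: aggregate `dim` to 'all' and drop it."""
    idx = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:idx] + key[idx + 1:]] += value
    return dict(out)

def roll_up_hierarchy(cube, dim, mapping):
    """Roll up by climbing a hierarchy: replace each member by its parent."""
    idx = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:idx] + (mapping[key[idx]],) + key[idx + 1:]] += value
    return dict(out)

def slice_cube(cube, dim, member):
    """Slice: select one member of `dim` and project that dimension away."""
    idx = DIMS.index(dim)
    return {k[:idx] + k[idx + 1:]: v for k, v in cube.items() if k[idx] == member}

def dice(cube, allowed):
    """Dice: select a sub-cube by restricting several dimensions at once."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in members for d, members in allowed.items())
    }

def pivot(cube, new_order):
    """Pivot: reorder the dimensions of every cell key."""
    positions = [DIMS.index(d) for d in new_order]
    return {tuple(k[p] for p in positions): v for k, v in cube.items()}

by_quarter_item = roll_up_reduce(cube, "city")
by_country = roll_up_hierarchy(cube, "city", CITY_TO_COUNTRY)
q1_plane = slice_cube(cube, "quarter", "Q1")
beijing_cells = dice(cube, {"quarter": {"Q1", "Q2"}, "city": {"Beijing"}})
rotated = pivot(cube, ("item", "city", "quarter"))
```

Drill down has no function of its own here: it means going back from `by_country` or `by_quarter_item` to the finer-grained `cube`, which is only possible because the detailed cells are still stored.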
References
University of Chinese Academy of Sciences, "Data Mining" (《数据挖掘》) course slides