Introduction to Data Mining - 2

Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.

In this book, we focus on:

1) summary statistics

2) visualization

3) online analytical processing (OLAP)


 

UCI Machine Learning Repository
http://www.ics.uci.edu/~mlearn/MLRepository.html


1. Summary statistics

1) The mean is very sensitive to outliers, so the median or a trimmed mean is also commonly used.

2) The variance is also sensitive to outliers, since it is computed from the mean and squares each deviation from it.
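A minimal sketch of this sensitivity, using only Python's standard library. The data values and the `trimmed_mean` helper are made up for illustration; the helper drops a fixed fraction of values from each end, which is one common way to define a trimmed mean.

```python
from statistics import mean, median, pvariance

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the values from each end (illustrative helper)."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values cut from each end
    return mean(ordered[k:len(ordered) - k])

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outlier = clean[:-1] + [1000]  # replace the largest value with an outlier

# The mean and the variance shift a lot; the median and trimmed mean do not.
print("mean:        ", mean(clean), mean(with_outlier))
print("median:      ", median(clean), median(with_outlier))
print("trimmed mean:", trimmed_mean(clean, 0.1), trimmed_mean(with_outlier, 0.1))
print("variance:    ", pvariance(clean), pvariance(with_outlier))
```

A single extreme value moves the mean and inflates the variance, while the median and the 10% trimmed mean stay where they were.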

 

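Average absolute deviation (AAD): the mean of the absolute deviations from the mean, AAD(x) = (1/m) Σ |x_i - mean(x)|, summed over the m values. Deviations are not squared, so it is less sensitive to outliers than the variance.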
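As a small sketch, the average absolute deviation can be computed as the mean of the absolute differences between each value and the mean. The function name and the sample data below are illustrative, not taken from the book.

```python
from statistics import mean

def average_absolute_deviation(values):
    """Mean of |x - mean(values)| over all values."""
    center = mean(values)
    return mean([abs(x - center) for x in values])

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5
print(average_absolute_deviation(data))
```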


2. Visualization

box plot:
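A box plot is drawn from a handful of order statistics: the quartiles form the box, the median is the line inside it, and whiskers and outlier marks show the tails. The sketch below computes those numbers with the standard library. It assumes Tukey's 1.5 × IQR rule for the whiskers and outliers, which is only one convention (some texts put the whiskers at fixed percentiles instead), and the function name and data are made up.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Numbers needed to draw a box plot, assuming the 1.5 * IQR whisker rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value inside the fences
        "whisker_high": max(inside),   # largest value inside the fences
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```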

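Parallel coordinates:

Instead of one shared vertical axis, the attributes are placed side by side along the horizontal axis (their order affects how the plot reads). Each sample's value for each attribute is marked at a height above that attribute's position, and the marks are joined, so every sample appears as one line.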
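The construction can be sketched without a plotting library by computing the polyline for each sample. This is an illustration under assumptions: each attribute is min-max scaled to [0, 1] on its own (a common choice, not something the notes specify), and the function name, attribute names, and values are made up.

```python
def parallel_coordinate_lines(samples, attributes):
    """Return one polyline per sample as a list of (x, y) points.

    x is the attribute's position on the horizontal axis;
    y is the sample's value, min-max scaled within that attribute.
    """
    lows = {a: min(s[a] for s in samples) for a in attributes}
    highs = {a: max(s[a] for s in samples) for a in attributes}
    lines = []
    for s in samples:
        line = []
        for x, a in enumerate(attributes):
            span = highs[a] - lows[a]
            # A constant attribute has no spread; place it mid-height.
            y = (s[a] - lows[a]) / span if span else 0.5
            line.append((x, y))
        lines.append(line)
    return lines

samples = [
    {"a": 0, "b": 10, "c": 5},
    {"a": 2, "b": 20, "c": 5},
    {"a": 4, "b": 30, "c": 5},
]
lines = parallel_coordinate_lines(samples, ["a", "b", "c"])
for line in lines:
    print(line)
```

Passing the attributes in a different order changes which attributes sit next to each other, and therefore which relationships are visible as line crossings.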


 

3. OLAP

OLAP uses a multidimensional array representation.
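To make the multidimensional array idea concrete, here is a minimal sketch that turns a small fact table into a 3-dimensional array (product × location × quarter) and then aggregates over one dimension. The table contents, dimension names, and function names are hypothetical, and nested lists stand in for a real array library.

```python
# Hypothetical fact table: (product, location, quarter, sales)
facts = [
    ("pen", "east", "Q1", 10),
    ("pen", "west", "Q1", 5),
    ("pen", "east", "Q2", 7),
    ("ink", "east", "Q1", 3),
    ("ink", "west", "Q2", 8),
]

def build_cube(rows):
    """Return (dims, cube), where cube[p][l][q] is total sales for that cell."""
    dims = [sorted({row[i] for row in rows}) for i in range(3)]
    index = [{value: pos for pos, value in enumerate(d)} for d in dims]
    cube = [[[0] * len(dims[2]) for _ in dims[1]] for _ in dims[0]]
    for product, location, quarter, sales in rows:
        cube[index[0][product]][index[1][location]][index[2][quarter]] += sales
    return dims, cube

def roll_up_location(cube):
    """Sum out the location dimension, leaving a product x quarter table."""
    return [[sum(cell) for cell in zip(*by_location)] for by_location in cube]

dims, cube = build_cube(facts)
rolled = roll_up_location(cube)
print(dims[0], dims[2])
print(rolled)
```

Each attribute chosen as a dimension becomes one axis of the array, and summing along an axis gives the aggregated view over that dimension.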

 

posted @ 2017-03-11 13:31  陆离可