BK: Data mining, Chapter 2 - getting to know your data

Why: real-world data are typically noisy, enormous in volume, and may originate from a hodgepodge of heterogeneous sources. 

Basic statistics: mean; median; mode (most common value); distribution.
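A minimal Python sketch of the statistics listed above, using the standard-library `statistics` module (`multimode` needs Python 3.8+). The `ages` list and the `describe` helper are hypothetical, made up for illustration; "distribution" is shown here simply as a frequency table.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical attribute values (e.g. ages); illustrative data only.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30]


def describe(values):
    """Return basic central-tendency statistics and a frequency table."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns every most-common value, since data can be multimodal
        "mode": multimode(values),
        # value -> count, sorted by value, as a simple view of the distribution
        "distribution": dict(sorted(Counter(values).items())),
    }


stats = describe(ages)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Here the mean is pulled slightly below the median and mode by the few small values, which is the kind of asymmetry these summaries make visible at a glance.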

Knowing such basic statistics regarding each attribute makes it easier to fill in missing values, smooth noisy values, and spot outliers during data preprocessing.
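A rough sketch of how these statistics feed into preprocessing, under assumptions that are not in the notes: missing entries are represented as `None`, missing values are filled with the median, outliers are flagged with the common 1.5 × IQR rule of thumb, and smoothing uses equal-frequency bin means. The data and all helper names (`fill_missing`, `iqr_outliers`, `smooth_by_bin_means`) are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical attribute with missing entries (None) and one suspicious value.
raw = [12.0, 14.5, None, 13.2, 15.1, 98.0, 14.0, None, 13.8, 12.9]


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (a rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def smooth_by_bin_means(values, bin_size=3):
    """Sort values, split into equal-frequency bins, replace each by its bin mean.

    Note: the result is in sorted order, not the original order.
    """
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_ = ordered[i:i + bin_size]
        smoothed.extend([sum(bin_) / len(bin_)] * len(bin_))
    return smoothed


filled = fill_missing(raw)
print("filled:", filled)
print("outliers:", iqr_outliers(filled))
print("smoothed:", smooth_by_bin_means(filled))
```

The median is used for filling because it is robust: the single 98.0 value would drag the mean of the observed values up to about 24.2, while the median stays at 13.9, close to the bulk of the data.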

posted @ 2020-02-11 02:23  keeps_you_warm