[Machine Learning with Python] Get Familiar with Your Data

Here I list some useful functions in Python for getting familiar with your data. As an example, we load a dataset named housing, which is a pandas DataFrame object. Usually, the first thing to do is look at the top five rows of the dataset with the head() method:

housing = load_housing_data()
housing.head()
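
If you don't have the housing data at hand, here is a minimal sketch on a small made-up DataFrame (housing_demo and its values are my own stand-ins, not the real dataset). It shows that head() returns five rows by default and that you can pass a different number:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; names and values are made up.
housing_demo = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.7, 3.1, 2.1],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND",
                        "INLAND", "<1H OCEAN", "<1H OCEAN", "ISLAND"],
})

first_five = housing_demo.head()    # default is n=5
first_three = housing_demo.head(3)  # ask for a different number of rows

print(first_five)
print(first_three)
```

There is also tail(), which does the same thing for the last rows.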

The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.

housing.info()
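
One thing to note is that info() prints its report and returns None. As a rough sketch on a made-up DataFrame (demo and its columns are assumptions for illustration), you can capture the report through the buf parameter, and count the missing values per column with isnull().sum():

```python
import io
import pandas as pd

# Hypothetical data with some missing values; not the real housing dataset.
demo = pd.DataFrame({
    "total_bedrooms": [129.0, None, 190.0, None],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "ISLAND"],
})

# info() writes to stdout by default; pass a buffer to keep the text.
buffer = io.StringIO()
demo.info(buf=buffer)
summary_text = buffer.getvalue()
print(summary_text)

# Number of missing values in each column.
missing_per_column = demo.isnull().sum()
print(missing_per_column)
```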

The describe() method returns summary statistics for each numerical feature: count, mean, std, min, max, and the 25th, 50th (median), and 75th percentiles.

housing.describe()
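
A small sketch on made-up data (demo and its values are assumptions) shows what comes back. By default describe() only covers the numerical columns, and the median is the row labelled 50%. Calling describe() on a text column gives count, unique, top and freq instead:

```python
import pandas as pd

# Hypothetical data; not the real housing dataset.
demo = pd.DataFrame({
    "housing_median_age": [10.0, 20.0, 30.0, 40.0, 50.0],
    "ocean_proximity": ["INLAND", "INLAND", "NEAR BAY", "ISLAND", "INLAND"],
})

# Numerical columns only by default.
numeric_stats = demo.describe()
print(numeric_stats)

# The median is stored under the "50%" label.
median_age = numeric_stats.loc["50%", "housing_median_age"]

# On a text column, describe() reports count, unique, top and freq.
categorical_stats = demo["ocean_proximity"].describe()
print(categorical_stats)
```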

For categorical variables, we usually want to see the labels and the count of each label. The value_counts() method works here:

housing["ocean_proximity"].value_counts()
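
Two options of value_counts() are handy as well. The sketch below uses a made-up Series (proximity is an assumption, not the real column): normalize=True gives proportions instead of counts, and dropna=False also counts the missing values, which are skipped by default:

```python
import pandas as pd

# Hypothetical categorical column with one missing value.
proximity = pd.Series(
    ["INLAND", "INLAND", "NEAR BAY", None, "INLAND", "NEAR BAY"],
    name="ocean_proximity",
)

# Counts per label, sorted from most to least frequent; missing values are skipped.
counts = proximity.value_counts()

# Proportions instead of raw counts.
shares = proximity.value_counts(normalize=True)

# Include missing values as their own entry.
counts_with_missing = proximity.value_counts(dropna=False)

print(counts)
print(shares)
print(counts_with_missing)
```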

That’s it. I’ll add more functions as I come across them in further study.

posted @ 2018-12-29 04:37  Sherrrry