[Machine Learning with Python] Data Visualization with the Matplotlib Library

 

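Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option in Jupyter is the magic command %matplotlib inline. It tells Jupyter to set up Matplotlib so that it uses Jupyter's own backend, which renders each figure inline, below the cell that produces it. The examples below assume the data has already been loaded into a pandas DataFrame named housing. save_fig is not part of Matplotlib or pandas: it is a user-defined helper (its definition is not shown here) that, as its name suggests, saves the current figure to an image file.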
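Outside Jupyter, for example in a plain Python script, the magic command is not available. A minimal sketch, assuming you only need figures written to files: select the non-interactive Agg backend with matplotlib.use() before importing pyplot, and save figures with plt.savefig(). The save_fig function below is a hypothetical stand-in for the helper used in this post; its IMAGES_PATH directory and its parameters are assumptions, not the original definition.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to image files, opens no window
import matplotlib.pyplot as plt

IMAGES_PATH = "images"  # hypothetical output directory


def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    """Hypothetical helper: save the current figure as IMAGES_PATH/<fig_id>.<ext>."""
    os.makedirs(IMAGES_PATH, exist_ok=True)
    path = os.path.join(IMAGES_PATH, f"{fig_id}.{fig_extension}")
    if tight_layout:
        plt.tight_layout()  # shrink margins so labels are not clipped
    plt.savefig(path, format=fig_extension, dpi=resolution)
    return path
```

In a script you would call save_fig("some_name") right after building a plot, instead of relying on inline display.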

Scatter Plot

A basic scatter plot of longitude against latitude shows where the districts are located:

housing.plot(kind="scatter", x="longitude", y="latitude")

Setting the alpha parameter (marker transparency) to a low value makes dense areas easier to spot, because overlapping points add up to darker regions:

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
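To see what alpha does without the housing data, here is a small sketch on synthetic random coordinates (the demo DataFrame and its values are made up for illustration). It uses the accessor form DataFrame.plot.scatter(), which is equivalent to plot(kind="scatter"), and inspects the transparency stored on the drawn points.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing DataFrame (random coordinates, NOT real data)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "longitude": rng.normal(-120.0, 1.5, size=2000),
    "latitude": rng.normal(36.0, 1.5, size=2000),
})

# Equivalent to demo.plot(kind="scatter", ...); returns the Matplotlib Axes
ax = demo.plot.scatter(x="longitude", y="latitude", alpha=0.1)

# pandas draws all points as one PathCollection; alpha is stored on it
points = ax.collections[0]
print(points.get_alpha(), len(points.get_offsets()))
```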

A plot can convey more information by varying the color, size, and shape of its markers. As an example, we plot house prices by location: the size of each circle represents the district's population (option s), and its color represents the median house value (option c). The colors come from the predefined color map jet (option cmap), which runs from blue for low values to red for high values.

%matplotlib inline
import matplotlib.pyplot as plt

# s: marker size scaled from population; c: marker color from median house value (jet colormap)
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
    s=housing["population"]/100, label="population", figsize=(10,7),
    c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
    sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")  # user-defined helper, not part of Matplotlib

 

Note that the argument sharex=False works around a pandas display bug in which the x-axis values and the legend were not displayed. It is only a temporary workaround (see: https://github.com/pandas-dev/pandas/issues/10611).
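For comparison, the same kind of plot can be built directly with Matplotlib's ax.scatter(), which is what pandas calls under the hood. The sketch below uses a small synthetic DataFrame (made-up values, not the housing data) with the same column names. One detail worth knowing: s sets the marker area in points squared, so a district with four times the population gets a circle with twice the radius.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in with the same column names as housing (random values, NOT real data)
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "longitude": rng.uniform(-124.0, -114.0, n),
    "latitude": rng.uniform(32.0, 42.0, n),
    "population": rng.integers(100, 5000, n),
    "median_house_value": rng.uniform(15_000, 500_000, n),
})

fig, ax = plt.subplots(figsize=(10, 7))
sc = ax.scatter(
    demo["longitude"], demo["latitude"],
    s=demo["population"] / 100,    # marker AREA in points^2, not radius
    c=demo["median_house_value"],  # numeric values mapped to colors through cmap
    cmap="jet", alpha=0.4, label="population",
)
fig.colorbar(sc, ax=ax, label="median_house_value")  # color scale for c
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend()
```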

Scatter Matrix

from pandas.plotting import scatter_matrix

# Only a few attributes are selected to keep the grid of plots readable
attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")  # user-defined helper, not part of pandas

 

Histogram

A histogram is a useful way to study the distribution of a numerical attribute: it shows how many instances (vertical axis) fall within each range of values (horizontal axis).

%matplotlib inline
import matplotlib.pyplot as plt

# One histogram per numerical attribute, each with 50 bins
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")  # user-defined helper, not part of Matplotlib
plt.show()
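DataFrame.hist() draws one subplot per numerical column and silently skips non-numeric ones. The sketch below checks this on a synthetic DataFrame (made-up values; the text column ocean_proximity is included only as a hypothetical example of a non-numeric attribute) and looks at the returned NumPy array of Axes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
# Synthetic stand-in: four numeric columns plus one text column (made-up values)
demo = pd.DataFrame({
    "median_income": rng.gamma(2.0, 2.0, n),
    "housing_median_age": rng.integers(1, 53, n),
    "total_rooms": rng.lognormal(7.5, 0.7, n),
    "median_house_value": rng.uniform(15_000, 500_000, n),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY"], n),
})

# Returns a NumPy array of Axes; each subplot is titled with its column name
axes = demo.hist(bins=50, figsize=(12, 8))
titles = sorted(ax.get_title() for ax in axes.ravel())
print(axes.shape, titles)
```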

For a single attribute, call hist() on the corresponding column (a pandas Series):

housing["median_income"].hist()
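Each bar of a histogram is simply the number of values that fall into one bin. As a sketch on a synthetic Series (made-up values standing in for housing["median_income"]), the bar heights drawn by Series.hist() can be compared with the counts returned by numpy.histogram() for the same number of bins. Series.hist() uses 10 bins unless bins is given.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for housing["median_income"] (made-up values)
rng = np.random.default_rng(7)
income = pd.Series(rng.gamma(2.0, 2.0, 1000), name="median_income")

fig, ax = plt.subplots()
income.hist(bins=20, ax=ax)  # draws one bar (Rectangle patch) per bin
bar_heights = [patch.get_height() for patch in ax.patches]

# Same binning computed without plotting: counts per bin and the bin edges
counts, edges = np.histogram(income, bins=20)
print(bar_heights[:5], counts[:5])
```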

Correlation Plot

We can compute the standard correlation coefficient (Pearson's r) between every pair of attributes with the corr() method, and then use sort_values() to rank how strongly each attribute correlates with the median house value:

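# numeric_only=True skips text columns; the parameter needs pandas >= 1.5,
# and pandas >= 2.0 raises an error on non-numeric columns without it
corr_matrix = housing.corr(numeric_only=True)
corr_matrix["median_house_value"].sort_values(ascending=False)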
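Pearson's r divides the covariance of two attributes by the product of their standard deviations, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²), giving a value between −1 and 1. It only captures linear relationships, so a coefficient close to 0 does not rule out a nonlinear one. The sketch below computes r by hand on a synthetic DataFrame (made-up values in which the house value is built to rise with income) and compares it with corr().

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (made-up values): value rises with income, age is unrelated
rng = np.random.default_rng(3)
n = 1000
income = rng.uniform(1.0, 10.0, n)
demo = pd.DataFrame({
    "median_income": income,
    "housing_median_age": rng.uniform(1.0, 52.0, n),
    "median_house_value": 40_000 * income + rng.normal(0.0, 30_000, n),
})

# Pearson's r by hand: centered cross-products over the product of the spreads
x = demo["median_income"] - demo["median_income"].mean()
y = demo["median_house_value"] - demo["median_house_value"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

corr_matrix = demo.corr()  # method="pearson" is the default
r_pandas = corr_matrix.loc["median_income", "median_house_value"]

# Attributes ranked by correlation with the target (the target itself comes first, r = 1)
ranking = corr_matrix["median_house_value"].sort_values(ascending=False)
print(r_manual, r_pandas)
print(ranking)
```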

 

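Another way to check for correlations is the scatter_matrix() function, which plots every numerical attribute against every other numerical attribute. Because plotting an attribute against itself would only produce a straight line, the diagonal displays a histogram of each attribute instead.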

# pandas.tools.plotting was deprecated and later removed; use pandas.plotting
from pandas.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
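scatter_matrix() returns an n × n NumPy array of Axes: panel [i, j] plots column j on the x-axis against column i on the y-axis, and the diagonal panels hold histograms. The sketch below checks this layout on synthetic normally distributed data (made-up values with the same column names) and passes hist_kwds to set the number of histogram bins on the diagonal.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(5)
n = 300
# Synthetic stand-in (made-up values) with the same column names as above
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
demo = pd.DataFrame(rng.normal(size=(n, len(attributes))), columns=attributes)

axes = scatter_matrix(demo, figsize=(12, 8), diagonal="hist", hist_kwds={"bins": 20})

diagonal_bars = len(axes[1, 1].patches)          # histogram bars on the diagonal
off_diagonal_sets = len(axes[0, 1].collections)  # one scatter PathCollection per panel
print(axes.shape, diagonal_bars, off_diagonal_sets)
```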

 

posted @ 2018-12-29 05:06  Sherrrry