Simple Data Visualization with Python

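Python, an interpreted language, is winning over more and more people; its rich library support makes programming much simpler.

I am a long-time C programmer, and I plan to get to know Python's strengths step by step.

Today I will learn how to visualize data with Python.

Reference: https://mp.weixin.qq.com/s/Nb2ci6d5MhoRoepu6G3YdQ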

1 Installing the dependencies
——

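◈ NumPy, which simplifies working with arrays and matrices

◈ SciPy, for data science

◈ Matplotlib, for plotting

On Windows I use PyCharm as my IDE, which makes installing libraries easy: just add them in its package manager. If the download times out, you can switch to a domestic (China-based) mirror, as described in one of my earlier posts.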
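Before going further, it can help to confirm that the three libraries are actually importable from the interpreter your IDE uses. The following is a minimal sketch using only the standard library; the helper name missing_packages is made up for this illustration.

```python
import importlib.util

def missing_packages(names):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The three libraries this post relies on (import names).
required = ["numpy", "scipy", "matplotlib"]
print("Missing packages:", missing_packages(required) or "none")
```

If the list is not empty, install the missing libraries through the package manager before running the rest of the code.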

2 Importing the packages
——

import numpy as np               # "as" gives the module a shorter alias
from scipy import stats          # import only one part of a package
import matplotlib.pyplot as plt  # same as: from matplotlib import pyplot as plt

3 Defining variables
——

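In Python, a variable is created the first time it is assigned, and its type is determined by the value assigned to it. By convention, variable names are written in lowercase.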

input_file_name = "anscombe.csv"
delimiter = "\t"                 # separator between values
skip_header = 3                  # lines to skip at the top of the file (must match your file)
column_x = 0
column_y = 1
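As a small illustration of how a variable's type follows its value, here is a sketch with a made-up name, sample, that is not part of the script above.

```python
# A name is created by its first assignment; no separate declaration is needed.
sample = 3
kind_before = type(sample).__name__

# Rebinding the same name to a different value changes the type it refers to.
sample = "3"
kind_after = type(sample).__name__

print(kind_before, kind_after)
```

This is quite different from C, where the type is fixed by the declaration.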

4 Reading the data
——

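Of course, we first need the data that we want to visualize:

(Here we only process the first of the four data sets.)

Reading a CSV file is easy with NumPy's genfromtxt() function, which returns a NumPy array: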

data = np.genfromtxt(input_file_name, delimiter = delimiter, skip_header = skip_header)
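To see what delimiter and skip_header do without needing the real file, here is a minimal sketch that feeds genfromtxt() an in-memory text. The contents of sample_text are invented for the demonstration and are not the actual anscombe.csv.

```python
import io
import numpy as np

# Hypothetical file contents: three header lines, then tab-separated numbers.
sample_text = (
    "Demo data\n"
    "units: none\n"
    "x\ty\n"
    "1.0\t2.5\n"
    "2.0\t4.5\n"
    "3.0\t6.5\n"
)

# Skip the three header lines and split each remaining line on tabs.
demo = np.genfromtxt(io.StringIO(sample_text), delimiter="\t", skip_header=3)
print(demo.shape)
```

If skip_header is too small for a given file, the leftover header text cannot be parsed as numbers and shows up as nan values in the array, so it is worth checking the first rows of the result.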

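In Python, a function can take a variable number of arguments, and you can pass just a subset of them by naming the ones you need. Arrays are very powerful matrix-like objects that can easily be sliced into smaller arrays:

A bare : here means "select everything" along that axis.

x = data[:, column_x]           # all rows of column column_x
y = data[:, column_y]           # all rows of column column_y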
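The slicing notation is easier to follow on a tiny array. The sketch below uses a made-up 3x2 array named table in place of data.

```python
import numpy as np

# Made-up array: one row per point, one column per variable.
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

first_column = table[:, 0]      # every row, column 0
second_column = table[:, 1]     # every row, column 1
first_two_rows = table[:2, :]   # rows 0 and 1, every column

print(first_column)
print(second_column)
print(first_two_rows.shape)
```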

5 Fitting the data
——

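SciPy provides convenient data-fitting functions. For example, linregress() returns several important values related to the fit, such as the slope, the intercept, and the correlation coefficient of the two data sets: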

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))

Because linregress() returns several pieces of information, the result can be unpacked into several variables at once.
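To get a feeling for what these three numbers mean, the sketch below computes them from the textbook least-squares formulas with plain NumPy. This is not SciPy's implementation, only an illustration; the function name simple_linear_fit and the demo points are made up.

```python
import numpy as np

def simple_linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    slope = np.sum(dx * dy) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    r_value = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return slope, intercept, r_value

# Made-up points lying exactly on y = 2x + 1.
demo_x = [0.0, 1.0, 2.0, 3.0]
demo_y = [1.0, 3.0, 5.0, 7.0]
demo_slope, demo_intercept, demo_r = simple_linear_fit(demo_x, demo_y)
print("Slope: {:f}".format(demo_slope))
print("Intercept: {:f}".format(demo_intercept))
print("Correlation coefficient: {:f}".format(demo_r))
```

The function returns a tuple, so it is unpacked into three variables in the same way as the result of linregress().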

6 Plotting
——

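Matplotlib only plots data points, so you have to define the coordinates of the points you want to draw. The x and y arrays are already defined, so you can plot them directly, but you need more points to draw the fitted line.

The linspace() function conveniently generates a set of evenly spaced values between two endpoints. The ordinates can then be computed easily thanks to NumPy arrays, which can be used in formulas just like ordinary numeric variables:

fit_x = np.linspace(x.min() - 1, x.max() + 1, 100)  # 100 evenly spaced values (not random)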
fit_y = slope * fit_x + intercept
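A tiny example makes both points concrete. The names grid and line below are made up for the illustration.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values, both endpoints included
line = 2.0 * grid + 1.0           # the formula is applied to every element at once

print(grid)
print(line)
```

No loop is needed: multiplying and adding on the array works element by element.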

To plot, first define a figure object that will contain all the graphics:

fig_width = 7 #inch
fig_height = fig_width / 16 * 9 #inch
fig_dpi = 100
fig = plt.figure(figsize = (fig_width, fig_height), dpi = fig_dpi)

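The parameters are easy to understand: the figure size is given in inches and dpi sets the resolution. Calling figure() then creates the figure.

A figure can contain several plots; in Matplotlib these plots are called axes. This example defines a single axes object to draw the data points: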

ax = fig.add_subplot(111)
ax.plot(fit_x, fit_y, label = "Fit", linestyle = '-')
ax.plot(x, y, label = "Data", marker = '.', linestyle = '')
ax.legend()
ax.set_xlim(min(x) - 1, max(x) + 1)
ax.set_ylim(min(y) - 1, max(y) + 1)
ax.set_xlabel('x')
ax.set_ylabel('y')
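The argument 111 of add_subplot() means a grid of 1 row and 1 column, using the first slot. As a sketch of how one figure can hold more than one axes, the example below puts two of them side by side. The data and titles are made up, and the Agg backend is only an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

demo_fig = plt.figure(figsize=(7, 3))
left_ax = demo_fig.add_subplot(1, 2, 1)    # 1 row, 2 columns, first slot
right_ax = demo_fig.add_subplot(1, 2, 2)   # 1 row, 2 columns, second slot

# Same made-up data drawn two ways: markers only, then a line only.
left_ax.plot([0, 1, 2], [0, 1, 4], marker=".", linestyle="")
left_ax.set_title("points only")
right_ax.plot([0, 1, 2], [0, 1, 4], linestyle="-")
right_ax.set_title("line only")

print(len(demo_fig.axes))
```

Each axes object has its own plot(), set_xlabel(), and set_ylabel() methods, so the two plots can be configured independently.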

To save the figure to a file, use:

fig.savefig('fit_python.png')

To display the plot (instead of saving it), call:

plt.show()

7 Results
——

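Terminal output: (screenshot)

Generated figure: (image)

Not bad, is it? Python really is a handy tool, and I will post more practical examples in the future~

Complete code: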

import numpy as np               # "as" gives the module a shorter alias
from scipy import stats          # import only one part of a package
import matplotlib.pyplot as plt  # same as: from matplotlib import pyplot as plt

input_file_name = "anscombe.csv"
delimiter = "\t"                 # separator between values
skip_header = 2                  # lines to skip at the top of the file (must match your file)
column_x = 0
column_y = 1

print("#### Anscombe's first set with Python ####")

data = np.genfromtxt(input_file_name, delimiter = delimiter, skip_header = skip_header)
x = data[:, column_x]           # all rows of column column_x
y = data[:, column_y]           # all rows of column column_y

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))

fit_x = np.linspace(x.min() - 1, x.max() + 1, 100)  # 100 evenly spaced values (not random)
fit_y = slope * fit_x + intercept

fig_width = 7 #inch
fig_height = fig_width / 16 * 9 #inch
fig_dpi = 100
fig = plt.figure(figsize = (fig_width, fig_height), dpi = fig_dpi)

ax = fig.add_subplot(111)
ax.plot(fit_x, fit_y, label = "Fit", linestyle = '-')
ax.plot(x, y, label = "Data", marker = '.', linestyle = '')
ax.legend()
ax.set_xlim(min(x) - 1, max(x) + 1)
ax.set_ylim(min(y) - 1, max(y) + 1)
ax.set_xlabel('x')
ax.set_ylabel('y')

plt.show()

posted @ 2020-02-29 17:18 cnwanglu
