NumPy random arrays (random)

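The numpy.random module supplements Python's built-in random module. It efficiently generates whole arrays of samples drawn from many kinds of probability distributions.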

In [1]: import numpy as np

In [2]: from random import normalvariate

# The comparison below shows numpy.random running roughly 27 times faster than the built-in random module (872 ms vs. 31.6 ms in this measurement)
In [4]: %timeit np.random.normal(size=1000000)
31.6 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit samples = [normalvariate(0,1) for i in range(1000000)]
872 ms ± 9.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
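
The %timeit magic only works inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module. The sample size n and the helper names with_numpy and with_stdlib are illustrative choices, not part of the original session, and the absolute timings will differ from machine to machine.

```python
import timeit
from random import normalvariate

import numpy as np

n = 100_000  # assumption: smaller than the 1,000,000 above so the script finishes quickly


def with_numpy():
    # One vectorized call that fills an array of n normal samples
    return np.random.normal(size=n)


def with_stdlib():
    # One Python-level call per sample
    return [normalvariate(0, 1) for _ in range(n)]


runs = 3
t_numpy = timeit.timeit(with_numpy, number=runs) / runs
t_stdlib = timeit.timeit(with_stdlib, number=runs) / runs
print(f"numpy:  {t_numpy:.4f} s per run")
print(f"stdlib: {t_stdlib:.4f} s per run")
print(f"ratio:  {t_stdlib / t_numpy:.1f}x")
```

The printed ratio is a measurement, not a guarantee; it depends on the NumPy version, the hardware, and the sample size.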

Pseudorandom numbers

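NumPy's random numbers are pseudorandom: a deterministic algorithm produces them from an initial state. Setting the seed with numpy.random.seed() fixes that state, so the same sequence of random numbers can be reproduced.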

# Set the seed of the global random number generator
In [6]: np.random.seed(1234)

# RandomState() creates an independent random number generator that does not touch the global state
In [7]: rng = np.random.RandomState(1234)

In [8]: rng.randn(10)
Out[8]:
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
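
A short sketch of what seeding buys you: two generators created from the same seed yield identical streams, and re-seeding the global generator replays its sequence. The last lines use np.random.default_rng, which is available in NumPy 1.17 and later; it is shown only as an alternative and is not used in the original session.

```python
import numpy as np

# Two independent generators built from the same seed produce the same stream.
rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)
same = np.array_equal(rng_a.randn(5), rng_b.randn(5))

# Re-seeding the global generator replays the same sequence.
np.random.seed(1234)
first = np.random.randn(3)
np.random.seed(1234)
second = np.random.randn(3)
replayed = np.array_equal(first, second)

# Alternative (NumPy 1.17+): the Generator API. Its stream differs from
# RandomState's even when the seed is the same.
new_rng = np.random.default_rng(1234)
draws = new_rng.standard_normal(5)

print(same, replayed, draws.shape)
```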

Python's built-in random module

In [1]: import random

In [2]: position = 0

In [3]: walks = [position]

In [4]: steps = 1000
# Build a random walk as a list of positions
In [5]: for i in range(steps):
   ...:     # random.randint(0, 1) returns the integer 0 or 1 at random
   ...:     step = 1 if random.randint(0,1) else -1
   ...:     position += step
   ...:     walks.append(position)
   ...:

In [7]: import matplotlib.pyplot as plt

In [8]: plt.plot(walks)
Out[8]: [<matplotlib.lines.Line2D at 0x14beae2a948>]

In [9]: plt.plot(walks[:100])
Out[9]: [<matplotlib.lines.Line2D at 0x14bf0817648>]
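
The loop above can be wrapped in a function so that it is reusable and reproducible. This is a minimal sketch: the function name random_walk and its seed parameter are illustrative, and a private random.Random instance is used so the global generator is left alone.

```python
import random


def random_walk(nsteps, seed=None):
    """Return a list of nsteps + 1 positions, starting at 0, moving +1 or -1 each step."""
    rng = random.Random(seed)  # private generator; seed=None means unpredictable
    position = 0
    walk = [position]
    for _ in range(nsteps):
        # rng.randint(0, 1) returns 0 or 1, each with probability 1/2
        step = 1 if rng.randint(0, 1) else -1
        position += step
        walk.append(position)
    return walk


walk = random_walk(1000, seed=42)
print(len(walk), min(walk), max(walk))
```

Calling random_walk twice with the same seed returns the same list, which makes plots and debugging repeatable.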

NumPy's random module

# A single walk (1-D array)
In [11]: import numpy as np
In [12]: import matplotlib.pyplot as plt
    
# Draw a 1-D array of 1000 integers from [0, 2), i.e. each element is 0 or 1
In [13]: walks = np.random.randint(0,2,size=1000)
In [14]: walks
Out[14]:
array([1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0,
       1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0,
       0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1,
       0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1,
       1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1,
       0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
       1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1,
       0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1,
       0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0,
       0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1,
       0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
       1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
       0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
       0, 1, 0, 0, 1, 1, 1, 0, 1, 0])

# Element-wise test: where an element is greater than 0 use 1, otherwise use -1
In [18]: walks = np.where(walks > 0,1,-1)

# Cumulative sum of the steps gives the position after each step
In [20]: walks = walks.cumsum()
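
To make the two transformations concrete, here is a tiny hand-checkable sketch on five fixed draws instead of random ones. The variable names draws, steps, and positions are illustrative.

```python
import numpy as np

draws = np.array([1, 0, 0, 1, 1])      # fixed 0/1 draws instead of random ones
steps = np.where(draws > 0, 1, -1)     # map 1 -> +1 and 0 -> -1
positions = steps.cumsum()             # running total = position after each step

print(steps)      # [ 1 -1 -1  1  1]
print(positions)  # [ 1  0 -1  0  1]
```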

In [22]: plt.plot(walks)
Out[22]: [<matplotlib.lines.Line2D at 0x14bf0920588>]

# Minimum position reached (furthest excursion in the negative direction)
In [25]: walks.min()
Out[25]: -6
# Maximum position reached (furthest excursion in the positive direction)
In [26]: walks.max()
Out[26]: 44
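# Index of the first step at which the walk is more than 10 away from the origin (argmax returns the position of the first True)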
In [27]: (np.abs(walks) > 10 ).argmax()
Out[27]: 78
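
One caveat with the argmax idiom: if the walk never exceeds the threshold, the boolean array is all False and argmax still returns 0, which looks like a crossing at the first step. A minimal sketch of a guarded version follows; the helper name first_crossing is hypothetical and not a NumPy function.

```python
import numpy as np


def first_crossing(walk, threshold):
    """Index of the first element with |walk| > threshold, or None if there is none."""
    mask = np.abs(walk) > threshold
    if not mask.any():
        return None  # argmax would misleadingly return 0 here
    return int(mask.argmax())


rng = np.random.RandomState(1234)
steps = np.where(rng.randint(0, 2, size=1000) > 0, 1, -1)
walk = steps.cumsum()

print(first_crossing(walk, 10))      # an index, or None if the walk stays within 10
print(first_crossing(walk, 10_000))  # None: 1000 unit steps cannot reach 10,000
```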

# Simulating many walks at once (2-D array)
In [29]: nwalks = 5000
In [30]: nsteps = 1000
# size=(nwalks, nsteps) produces an array with nwalks rows and nsteps columns, one walk per row
In [31]: draws = np.random.randint(0,2,size=(nwalks,nsteps))

In [32]: steps = np.where(draws > 0, 1, -1)

# Cumulative sum along axis 1, i.e. across the steps within each row
In [33]: walks = steps.cumsum(1)

In [34]: walks
Out[34]:
array([[  1,   2,   1, ...,  -8,  -7,  -6],
       [  1,   0,   1, ..., -66, -67, -66],
       [ -1,   0,  -1, ...,  50,  49,  50],
       ...,
       [  1,   2,   3, ..., -36, -37, -38],
       [  1,   0,  -1, ...,  62,  61,  62],
       [ -1,   0,   1, ...,  10,   9,  10]], dtype=int32)

In [35]: walks.max()
Out[35]: 123

In [36]: walks.min()
Out[36]: -129

# For each row (along axis 1), True if any element has absolute value >= 30, otherwise False
In [37]: hits30 = (np.abs(walks) >= 30).any(1)

In [38]: hits30
Out[38]: array([ True,  True,  True, ...,  True,  True, False])

# Count how many walks reach an absolute value >= 30
In [39]: hits30.sum()
Out[39]: 3374

# Select the rows that reach an absolute value >= 30 and, for each, return the index of the first element that does
In [40]: crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)

In [41]: crossing_times
Out[41]: array([283, 461, 339, ..., 989, 427, 525], dtype=int64)

In [42]: crossing_times.mean()
Out[42]: 503.4712507409603
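
The whole multi-walk experiment can be collected into one function. This is a sketch under assumptions: the name simulate_walks, its parameters, and the smaller problem size are illustrative, and a seeded RandomState is used so the result is repeatable.

```python
import numpy as np


def simulate_walks(nwalks, nsteps, threshold, seed=None):
    """Simulate nwalks random walks of nsteps each; report which ones reach the threshold."""
    rng = np.random.RandomState(seed)
    draws = rng.randint(0, 2, size=(nwalks, nsteps))  # 0 or 1 per step
    steps = np.where(draws > 0, 1, -1)                # map to +1 / -1
    walks = steps.cumsum(axis=1)                      # positions, one walk per row
    crossed = np.abs(walks) >= threshold              # boolean, same shape as walks
    hits = crossed.any(axis=1)                        # which walks ever cross
    crossing_times = crossed[hits].argmax(axis=1)     # first crossing index per hit
    return walks, hits, crossing_times


walks, hits, times = simulate_walks(200, 500, 30, seed=0)
print(walks.shape, int(hits.sum()))
if len(times):
    print(times.mean())
```

Restricting argmax to the rows selected by hits matters: for a walk that never crosses, argmax would return 0 and drag the mean crossing time down.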

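# plt.plot draws one line per column of a 2-D array, so transpose to get one line per walk; only the first 10 walks are drawn to keep the figure readable
In [43]: plt.plot(walks[:10].T)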

Selected numpy.random functions

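Function      Description
seed          Seed the random number generator
permutation   Return a random permutation of a sequence, or a randomly permuted range
shuffle       Randomly permute a sequence in place
rand          Draw samples from a uniform distribution on [0, 1)
randint       Draw random integers from the given range [low, high)
randn         Draw samples from the standard normal distribution (mean 0, standard deviation 1)
binomial      Draw samples from a binomial distribution
normal        Draw samples from a normal (Gaussian) distribution
beta          Draw samples from a Beta distribution
chisquare     Draw samples from a chi-square distribution
gamma         Draw samples from a Gamma distribution
uniform       Draw samples from a uniform distribution on [low, high) (default [0, 1))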
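
A few of these functions in action, as a quick sketch. The seed value and the parameter choices are arbitrary, and the drawn values are not shown because they depend on the seed and the NumPy version.

```python
import numpy as np

np.random.seed(0)  # arbitrary fixed seed so reruns give the same draws

perm = np.random.permutation(5)           # shuffled copy of 0..4
arr = np.arange(5)
np.random.shuffle(arr)                    # shuffles arr in place and returns None
u01 = np.random.rand(3)                   # uniform on [0, 1)
ints = np.random.randint(0, 10, size=4)   # integers in [0, 10)
uni = np.random.uniform(-1, 1, size=3)    # uniform on [-1, 1)
coin = np.random.binomial(n=10, p=0.5, size=3)   # successes out of 10 fair trials
norm = np.random.normal(loc=5, scale=2, size=3)  # mean 5, standard deviation 2

for name, value in [("perm", perm), ("arr", arr), ("u01", u01), ("ints", ints),
                    ("uni", uni), ("coin", coin), ("norm", norm)]:
    print(name, value)
```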
posted @ 2021-12-23 12:56  溪奇的数据