Implementing Simple Linear Regression in Python
In earlier posts I derived both simple (one-variable) and multiple linear regression; today let's implement simple linear regression in Python.
First, recall the results of that derivation.
For the model y = a·x + b fitted to points (x_i, y_i), i = 1, …, n, the least-squares solution is

$$a = \frac{\sum_i y_i (x_i - \bar{x})}{\sum_i x_i^2 - \frac{(\sum_i x_i)^2}{n}}, \qquad b = \frac{1}{n} \sum_i (y_i - a x_i)$$
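As a quick sanity check on these formulas (a minimal sketch; the three data points are made up), points that lie exactly on y = 2x + 1 should recover a = 2 and b = 1:

```python
import numpy as np

# three points lying exactly on y = 2x + 1 (illustrative data)
x = np.array([1.0, 2.0, 3.0])
y = 2 * x + 1

n = len(x)
# slope and intercept straight from the closed-form solution above
a = (y * (x - x.mean())).sum() / ((x ** 2).sum() - x.sum() ** 2 / n)
b = (y - a * x).sum() / n
print(a, b)  # → 2.0 1.0
```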
The first approach computes everything with explicit loops. Here x and y are NumPy arrays (np.ndarray).
    import numpy as np
    from time import time

    def sum(x):                      # element sum via a loop (shadows the built-in sum)
        sum1 = 0
        for i in x:
            sum1 += i
        return sum1

    def sub(x, y):                   # subtract the scalar y from each element of x
        ret = []
        for i in range(len(x)):
            ret.append(x[i] - y)
        return np.array(ret)

    def mean(num):
        total = 0
        for i in num:
            total += i
        return total / len(num)

    def multiply(x, y):              # element-wise product
        ret = []
        for i in range(len(x)):
            ret.append(x[i] * y[i])
        return np.array(ret)

    def square(x):
        ret = []
        for i in range(len(x)):
            ret.append(x[i] * x[i])
        return np.array(ret)

    def linearRegression(x, y):
        length = len(x)
        x_mean1 = mean(x)
        a = sum(multiply(y, sub(x, x_mean1))) / (sum(square(x)) - sum(x) ** 2 / length)
        sum1 = 0
        for i in range(length):
            sum1 += (y[i] - a * x[i])
        b = sum1 / length
        return a, b
The second approach uses vectorization:
    def linearRegression_(x, y):
        length = len(x)
        x_mean = x.mean()
        a = (y * (x - x_mean)).sum() / ((x ** 2).sum() - x.sum() ** 2 / length)
        b = (y - a * x).sum() / length
        return a, b
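Before benchmarking, it is worth confirming the result is actually correct. One way (a sketch; the noisy random data here is made up) is to compare against NumPy's own degree-1 least-squares fit, np.polyfit, which returns the slope and intercept:

```python
import numpy as np

def linearRegression_(x, y):
    length = len(x)
    x_mean = x.mean()
    a = (y * (x - x_mean)).sum() / ((x ** 2).sum() - x.sum() ** 2 / length)
    b = (y - a * x).sum() / length
    return a, b

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 5.0 + rng.normal(0, 0.5, 200)  # noisy line, true slope 3, intercept 5

a, b = linearRegression_(x, y)
slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns (slope, intercept)
print(abs(a - slope) < 1e-6, abs(b - intercept) < 1e-6)  # → True True
```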
To compare the performance of the two, generate 10,000 random data points and time each method:
    x = np.random.randint(0, 100, 10000)
    y = np.random.randint(0, 100, 10000)

    t1 = time()
    linearRegression(x, y)
    t2 = time()
    print(t2 - t1)

    t1 = time()
    linearRegression_(x, y)
    t2 = time()
    print(t2 - t1)
This gives the following results:
    0.1349632740020752
    0.0009996891021728516
The first number is the runtime of the loop-based version, the second that of the vectorized one. On 10,000 points, vectorization is clearly faster, here by roughly two orders of magnitude.
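A single time() pair is sensitive to scheduler and warm-up noise, so the numbers vary run to run. The standard-library timeit module gives more stable measurements by repeating each call several times and taking the best. A minimal sketch (lr_loop and lr_vec are just condensed stand-ins for the two implementations above):

```python
import timeit
import numpy as np

def lr_loop(x, y):
    # condensed loop-based fit, same arithmetic as linearRegression above
    n = len(x)
    xm = 0.0
    for v in x:
        xm += v
    xm /= n
    num = sq = s = 0.0
    for i in range(n):
        num += y[i] * (x[i] - xm)
        sq += x[i] * x[i]
        s += x[i]
    a = num / (sq - s * s / n)
    total = 0.0
    for i in range(n):
        total += y[i] - a * x[i]
    return a, total / n

def lr_vec(x, y):
    # vectorized fit, same arithmetic as linearRegression_ above
    n = len(x)
    a = (y * (x - x.mean())).sum() / ((x ** 2).sum() - x.sum() ** 2 / n)
    return a, (y - a * x).sum() / n

x = np.random.randint(0, 100, 10000).astype(float)
y = np.random.randint(0, 100, 10000).astype(float)

# best of 3 repeats of 5 calls each, which filters out transient system noise
t_loop = min(timeit.repeat(lambda: lr_loop(x, y), number=5, repeat=3)) / 5
t_vec = min(timeit.repeat(lambda: lr_vec(x, y), number=5, repeat=3)) / 5
print(f"loop: {t_loop:.6f}s  vectorized: {t_vec:.6f}s")
```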