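Notes on the Pearson Correlation Coefficient: A Similarity Comparison Algorithm
The Pearson correlation coefficient, also called the Pearson product-moment correlation coefficient, is a statistic that measures how strongly two variables are linearly related. Its value ranges from -1 (a perfect inverse linear relationship) through 0 (no linear relationship) to 1 (a perfect linear relationship). It can also be used to compute the similarity of two vectors, which is why it appears in text classification based on the vector space model and in recommender systems built on user preferences.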
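For the recommender-system use mentioned above, ratings are usually sparse, so the coefficient is typically computed only over the items both users have rated. The following is a minimal sketch under that assumption: ratings are stored as dicts mapping item to score, and the function name `sim_pearson`, the sample users, and the convention of returning 0.0 when there is too little overlap are all hypothetical choices for illustration.

```python
from math import sqrt

def sim_pearson(ratings_a, ratings_b):
    """Pearson correlation between two users, over the items both have rated."""
    common = [item for item in ratings_a if item in ratings_b]
    n = len(common)
    if n < 2:
        return 0.0  # assumed convention: too little overlap to measure correlation
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred cross-products (covariance term) and centred squares (spread terms)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings on a 1-5 scale
alice = {"A": 5, "B": 3, "C": 4, "D": 1}
bob = {"A": 4, "B": 2, "C": 3, "E": 5}
carol = {"A": 1, "B": 3, "C": 2, "D": 5}

print(sim_pearson(alice, bob))    # 1.0: on shared items A-C, Bob rates 1 lower than Alice
print(sim_pearson(alice, carol))  # -1.0: Carol's ratings are 6 minus Alice's
```

A score near 1 means the two users tend to rate the same items high or low together; a score near -1 means their tastes run in opposite directions. Requiring at least two shared items is an arbitrary threshold here; in practice a larger minimum overlap may be preferable before trusting the score.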
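At work I'm currently comparing the probabilities that students have mastered two exam topics, and using those statistics to recommend suitable study material.
I'm slow and haven't read much, so I couldn't follow the theory; I'm just writing it down here to keep as homework.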
The formula is as follows:
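In its standard form, r divides the covariance of the two variables by the product of their standard deviations. For n paired samples (x_i, y_i) with means x̄ and ȳ, the 1/n factors in the covariance and in the standard deviations cancel, leaving:

```latex
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```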
Breaking the formula down:
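Expanding each centred sum removes the means. For example, the numerator expands to Σx_i y_i - x̄Σy_i - ȳΣx_i + n x̄ȳ = Σx_i y_i - Σx_i Σy_i / n. As a result, r can be computed from five running totals: Σx, Σy, Σx², Σy² and Σxy, which correspond to `sum_object1`, `sum_object2`, `square_sum1`, `square_sum2` and `product` in the Python implementation.

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) &= \sum_{i=1}^{n}x_i y_i - \frac{\sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n} \\
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n} \\
r &= \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}
         {\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}
\end{aligned}
```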
Python implementation:
# Input: two equal-length sequences of numbers (or numeric strings)
# Output: Pearson correlation score between -1 and 1; 0 if either input is constant
def pearson_correlation(object1, object2):
    if len(object1) != len(object2):
        raise ValueError("both objects must have the same number of attributes")
    # Convert every value to float once, so the sums and the squares agree
    x = [float(v) for v in object1]
    y = [float(v) for v in object2]
    n = len(x)

    # Summation over all attributes for both objects
    sum_object1 = sum(x)
    sum_object2 = sum(y)

    # Sum the squares
    square_sum1 = sum(v ** 2 for v in x)
    square_sum2 = sum(v ** 2 for v in y)

    # Add up the products
    product = sum(a * b for a, b in zip(x, y))

    # Calculate the Pearson correlation score
    numerator = product - sum_object1 * sum_object2 / n
    denominator = ((square_sum1 - sum_object1 ** 2 / n) *
                   (square_sum2 - sum_object2 ** 2 / n)) ** 0.5

    # Can't divide by 0 (one object is constant), so return 0
    if denominator == 0:
        return 0
    return numerator / denominator
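The single-pass formula subtracts two large, nearly equal quantities (such as Σx² and (Σx)²/n) when the values share a large offset, and in floating point this can lose most of the precision. A common alternative is a two-pass computation that subtracts the means first, as in the standard definition. The sketch below assumes that approach; the name `pearson_two_pass` is hypothetical, and returning 0.0 for constant input simply mirrors the convention of `pearson_correlation`.

```python
from math import sqrt

def pearson_two_pass(xs, ys):
    """Pearson correlation from mean-centred values (two passes over the data)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    # First pass: the means
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Second pass: centred values, so no large nearly equal terms are subtracted
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        return 0.0  # constant input: same convention as pearson_correlation
    return numerator / denominator

# Values sharing a large offset; shifting the data does not change r
xs = [1e9 + 1, 1e9 + 2, 1e9 + 3]
ys = [1.0, 2.0, 3.0]
print(pearson_two_pass(xs, ys))  # 1.0
```

The cost is a second pass over the data, which is usually negligible for vectors of the size used in similarity comparisons.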
When I have time, I'll write a C# version of the algorithm.
Sources and references:
http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/sphilip/pear.html
https://segmentfault.com/q/1010000000094674
http://www.cnblogs.com/zhangchaoyang/articles/2631907.html