Notes on the Pearson Correlation Coefficient, a Similarity Comparison Algorithm

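The Pearson correlation coefficient, also called the Pearson product-moment correlation coefficient, is a statistic that measures how strongly two variables are linearly related. It ranges from -1 (perfectly opposite) through 0 (no linear relationship) to 1 (perfectly aligned). Put another way, it can be used to compute the similarity of two vectors, and it is applied in text classification based on the vector space model and in user-preference recommender systems.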
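To make the "similarity of two vectors" reading concrete, here is a minimal sketch. It relies on the standard identity that the Pearson coefficient equals the cosine similarity of the two vectors after each has had its own mean subtracted. The helper names (`cosine_similarity`, `center`) and the rating vectors are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def center(v):
    """Subtract the vector's own mean from every component."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]

# Hypothetical ratings given by two users to the same four items
user_a = [5.0, 3.0, 4.0, 4.0]
user_b = [3.0, 1.0, 2.0, 3.0]

plain_cosine = cosine_similarity(user_a, user_b)
pearson_like = cosine_similarity(center(user_a), center(user_b))

# Shifting every rating of one user by a constant leaves the centered score unchanged
shifted_a = [a + 10.0 for a in user_a]
pearson_shifted = cosine_similarity(center(shifted_a), center(user_b))

print(plain_cosine, pearson_like, pearson_shifted)
```

Because of the centering step, a user who rates everything a few points higher than another still comes out as similar, which is one reason this measure shows up in recommender systems.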

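At work I currently use it to compare the probabilities that students have mastered each of two exam topics, so that suitable study content can be recommended.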

I'm not well read enough to follow all of the math, so I'm just writing it down here and keeping it as homework.

 

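The formula is:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

Expanded into sums over $n$ paired samples $(x_i, y_i)$:

$$r = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right)\left(\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right)}}$$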
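As a sanity check on the decomposition, the sketch below computes the coefficient two ways: from the mean-centered definition (covariance over the product of standard deviations) and from the sum-based expansion that the Python implementation in this post uses. The function names and sample data are assumptions made for this example only.

```python
import math

def pearson_definition(xs, ys):
    """Mean-centered form: sum of co-deviations over the root of the squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)

def pearson_expanded(xs, ys):
    """Sum-based form: needs only running sums, so no separate pass to compute the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator

# Made-up sample data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_definition = pearson_definition(xs, ys)
r_expanded = pearson_expanded(xs, ys)
print(r_definition, r_expanded)
```

The two forms agree up to floating-point rounding. The expanded form is convenient for a single pass over the data, though it can lose precision when the values are large relative to their spread.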

 

 

Python implementation:

import math

# Input: 2 objects (equal-length sequences of numbers)
# Output: Pearson Correlation Score in [-1, 1], or 0 when it is undefined
def pearson_correlation(object1, object2):
    n = len(object1)
    if n != len(object2):
        raise ValueError("both objects must have the same number of attributes")
    if n == 0:
        return 0

    # Summation over all attributes for both objects
    sum_object1 = sum(float(x) for x in object1)
    sum_object2 = sum(float(y) for y in object2)

    # Sum the squares
    square_sum1 = sum(float(x) ** 2 for x in object1)
    square_sum2 = sum(float(y) ** 2 for y in object2)

    # Add up the products
    product = sum(float(x) * float(y) for x, y in zip(object1, object2))

    # Calculate Pearson Correlation score
    numerator = product - (sum_object1 * sum_object2 / n)
    variance_term = ((square_sum1 - sum_object1 ** 2 / n) *
                     (square_sum2 - sum_object2 ** 2 / n))

    # Can't have division by 0; rounding can also push the term slightly below 0
    if variance_term <= 0:
        return 0

    return numerator / math.sqrt(variance_term)
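A short usage sketch, tied to the exam-topic use case mentioned earlier. The mastery probabilities are invented for illustration, and the function is a condensed restatement of the one above so that the snippet runs on its own.

```python
import math

def pearson_correlation(object1, object2):
    # Condensed restatement of the function above (same sum-based formula)
    n = len(object1)
    s1, s2 = sum(object1), sum(object2)
    numerator = sum(a * b for a, b in zip(object1, object2)) - s1 * s2 / n
    term = ((sum(a * a for a in object1) - s1 * s1 / n) *
            (sum(b * b for b in object2) - s2 * s2 / n))
    return numerator / math.sqrt(term) if term > 0 else 0

# Hypothetical mastery probabilities of five students on three exam topics
topic_a = [0.9, 0.7, 0.4, 0.8, 0.3]
topic_b = [0.8, 0.6, 0.5, 0.9, 0.2]
topic_c = [0.5, 0.5, 0.5, 0.5, 0.5]   # every student at the same level

similar = pearson_correlation(topic_a, topic_b)    # close to 1: mastery moves together
constant = pearson_correlation(topic_a, topic_c)   # 0: a constant vector has no variance
opposite = pearson_correlation(topic_a, [1 - p for p in topic_a])  # close to -1

print(similar, constant, opposite)
```

A score near 1 suggests that students who have mastered one topic tend to have mastered the other. Returning 0 for a constant input is this implementation's convention, since the coefficient is mathematically undefined in that case.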

I'll write a C# version of the algorithm when I have time.

 

Sources and references:

http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/sphilip/pear.html

https://segmentfault.com/q/1010000000094674

http://www.cnblogs.com/zhangchaoyang/articles/2631907.html

 

posted @ 2017-04-01 15:59  easeyeah