LSH: p-stable distributions
1:Cauchy distribution
Probability density function
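The Cauchy distribution has the probability density function
$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi}\,\frac{\gamma}{(x - x_0)^2 + \gamma^2}$$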
where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter, which specifies the half-width at half-maximum (HWHM). γ is also equal to half the interquartile range and is sometimes called the probable error. Cauchy himself exploited such a density function in 1827, with an infinitesimal scale parameter, in defining a Dirac delta function.
[Figure: probability density function of the Cauchy distribution. The purple curve is the standard Cauchy distribution.]
The special case when x0 = 0 and γ = 1 is called the standard Cauchy distribution, with the probability density function
$$f(x; 0, 1) = \frac{1}{\pi\,(1 + x^2)}$$
Cumulative distribution function
The cumulative distribution function (cdf) is:
$$F(x; x_0, \gamma) = \frac{1}{\pi}\arctan\!\left(\frac{x - x_0}{\gamma}\right) + \frac{1}{2}$$
[Figure: cumulative distribution function of the Cauchy distribution.]
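The density and cdf described above can be checked numerically. The following is a minimal sketch using the standard closed forms of the Cauchy distribution; the function names `cauchy_pdf` and `cauchy_cdf` are chosen here for illustration and do not come from any library.

```python
import math

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Density of the Cauchy distribution with location x0 and scale gamma."""
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """Cumulative distribution function of the same distribution."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

# Standard Cauchy: x0 = 0, gamma = 1.
peak = cauchy_pdf(0.0)   # density at the peak x0
half = cauchy_pdf(1.0)   # density at x0 + gamma, the half-maximum point
q1 = cauchy_cdf(-1.0)    # cdf at x0 - gamma
q3 = cauchy_cdf(1.0)     # cdf at x0 + gamma
```

Here `half` is half of `peak`, which is the HWHM property of γ, and `q1` and `q3` are the first and third quartiles, which is why γ equals half the interquartile range.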
2:p-stable distributions
From the definition of p-stability, it is easy to prove that the standard normal distribution is 2-stable.
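As a rough numerical illustration of the 2-stable claim, the sketch below assumes the usual definition: a distribution is p-stable if, for i.i.d. draws X_1, ..., X_n from it and any real v_1, ..., v_n, the sum Σ v_i X_i has the same distribution as (Σ |v_i|^p)^(1/p) · X. For the normal case this follows because a sum of independent normals is normal with variance Σ v_i². The code only compares the sample standard deviation of a·v with the Euclidean norm of v, so it is a sanity check and not a proof; `projection_std` is a name made up for this example.

```python
import math
import random

def projection_std(v, trials=20000, seed=0):
    """Sample standard deviation of a.v, with each a_i drawn i.i.d. from N(0, 1)."""
    rng = random.Random(seed)
    samples = [sum(rng.gauss(0.0, 1.0) * vi for vi in v) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return math.sqrt(var)

v = [3.0, 4.0]
l2_norm = math.sqrt(sum(x * x for x in v))  # Euclidean norm of v
estimate = projection_std(v)                # should land close to l2_norm
```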
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
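Questions:
1:How to choose the value of k in advance
Randomly draw a small sample of points from the dataset and run the algorithm on it once for each candidate k, increasing k step by step. Keep the k that gives the smallest running time.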
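The search over k described above might be sketched as follows. This is an illustration under assumptions, not the tuning routine of any particular LSH package: `choose_k` and `time_call` are hypothetical helpers, and `toy_cost` is a made-up cost curve standing in for a real timing run on the sample.

```python
import time

def time_call(fn, *args):
    """Wall-clock seconds taken by one call of fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def choose_k(candidate_ks, measure):
    """Try each k in increasing order and return the one with the smallest measured cost."""
    best_k, best_cost = None, float("inf")
    for k in sorted(candidate_ks):
        cost = measure(k)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

def toy_cost(k):
    # Hypothetical shape only: one term falls with k, the other grows with k.
    return 100.0 / k + 2.0 * k

best = choose_k(range(1, 21), toy_cost)
```

In a real run, `measure` would wrap the actual query routine, for example `lambda k: time_call(run_queries, sample, k)`, where `run_queries` and `sample` are placeholders for whatever the implementation at hand provides.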
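2:How points are placed into buckets
Each point gets L k-dimensional vectors. Every element of these vectors is of the same kind; the elements differ only in which hash function produced them. How the vectors are then spread over the buckets depends on the function h1.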
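The bucketing just described can be sketched as below. Several pieces are assumptions for illustration: each elementary hash is taken to be the common p-stable form floor((a·v + b) / w) with Gaussian a, and the stand-in for h1 is a random linear combination of the k values modulo a large prime and then modulo the table size. The real h1 may differ, so its definition should be checked in the implementation being used.

```python
import math
import random

def make_hash(dim, w, rng):
    """One hash h(v) = floor((a.v + b) / w), with a ~ N(0, 1)^dim and b ~ U[0, w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

def build_tables(points, dim, k, L, w, table_size, seed=0):
    """Build L tables; each maps a bucket slot to the indices of the points stored there."""
    rng = random.Random(seed)
    prime = 2**32 - 5  # assumed modulus for the illustrative h1
    tables = []
    for _ in range(L):
        hashes = [make_hash(dim, w, rng) for _ in range(k)]
        coeffs = [rng.randrange(1, prime) for _ in range(k)]  # assumed h1 coefficients
        buckets = {}
        for idx, p in enumerate(points):
            key = tuple(h(p) for h in hashes)  # the k-dimensional vector for this table
            slot = (sum(c * x for c, x in zip(coeffs, key)) % prime) % table_size
            buckets.setdefault(slot, []).append(idx)
        tables.append(buckets)
    return tables

points = [[0.0, 0.0], [0.1, 0.0], [50.0, 50.0]]
tables = build_tables(points, dim=2, k=4, L=3, w=4.0, table_size=1024)
```

Each of the L tables stores every point exactly once, under the slot that the assumed h1 derives from that table's k-dimensional vector.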
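3:How accuracy is guaranteed
The manual explains this in detail. The author chose the standard normal distribution precisely because it is 2-stable, which gives a mathematical guarantee on accuracy.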
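One way to see what 2-stability buys is to estimate how often a single hash agrees on two points as their distance grows. The sketch below assumes the hash form floor((a·v + b) / w) with a drawn from N(0, 1), and it places the two points on a line at 0 and `dist` for simplicity. By 2-stability, the projected difference for general points depends only on their Euclidean distance, so the one-dimensional case is representative. The name `collision_rate` and the parameter values are illustrative choices.

```python
import math
import random

def collision_rate(dist, w=4.0, trials=20000, seed=0):
    """Fraction of random hashes that give the points 0 and `dist` the same value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.gauss(0.0, 1.0)
        b = rng.uniform(0.0, w)
        if math.floor(b / w) == math.floor((a * dist + b) / w):
            hits += 1
    return hits / trials

near = collision_rate(1.0)  # two points 1 unit apart
far = collision_rate(8.0)   # two points 8 units apart
```

The estimate for the nearby pair should come out well above the estimate for the distant pair, which is the property an LSH scheme relies on.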