LSH: p-stable distributions

1:Cauchy distribution

Probability density function

The Cauchy distribution has the probability density function

      f(x; x_0,\gamma) = \frac{1}{\pi\gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi} \left[ \frac{\gamma}{(x - x_0)^2 + \gamma^2} \right],

 

 

where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter, which specifies the half-width at half-maximum (HWHM). γ is also equal to half the interquartile range and is sometimes called the probable error. Cauchy himself exploited such a density function in 1827, with an infinitesimal scale parameter, to define a Dirac delta function.

[Figure: probability density function for the Cauchy distribution. The purple curve is the standard Cauchy distribution.]

 

 

The special case when x0 = 0 and γ = 1 is called the standard Cauchy distribution with the probability density function

 f(x; 0,1) = \frac{1}{\pi (1 + x^2)}.

 

Cumulative distribution function

The cumulative distribution function (cdf) is:

F(x; x_0,\gamma)=\frac{1}{\pi} \arctan\left(\frac{x-x_0}{\gamma}\right)+\frac{1}{2}

[Figure: cumulative distribution function for the Cauchy distribution.]
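The two formulas above translate directly into code. The following is a minimal sketch, not a reference implementation: the function names `cauchy_pdf`, `cauchy_cdf` and `cauchy_sample` are chosen here for illustration, and the sampler simply inverts the cdf given above.

```python
import math
import random

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Density f(x; x0, gamma) of the Cauchy distribution."""
    return gamma / (math.pi * ((x - x0) ** 2 + gamma ** 2))

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """Cumulative distribution function F(x; x0, gamma)."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

def cauchy_sample(rng, x0=0.0, gamma=1.0):
    """Draw one sample by inverting the cdf: x = x0 + gamma * tan(pi * (u - 1/2))."""
    u = rng.random()
    return x0 + gamma * math.tan(math.pi * (u - 0.5))

# Illustrative parameters (not from the text above).
x0, gamma = 2.0, 3.0
peak = cauchy_pdf(x0, x0, gamma)
half = cauchy_pdf(x0 + gamma, x0, gamma)                              # density one gamma from the peak
iqr = 2 * gamma                                                       # claimed interquartile range
q1, q3 = cauchy_cdf(x0 - gamma, x0, gamma), cauchy_cdf(x0 + gamma, x0, gamma)
print(peak, half, q1, q3, iqr)
```

With these parameters the density at x0 + γ is half the peak value (γ is the HWHM), and the cdf takes the values 1/4 and 3/4 at x0 − γ and x0 + γ, which is why γ equals half the interquartile range.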

 

2:p-stable distributions

[Images not reproduced: the definition of p-stable distributions.]

From the principle above, it is easy to show that the standard normal distribution is 2-stable.
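The definition this relies on was given as an image. Assuming it was the usual definition of p-stability, the argument can be sketched as follows. The symbols v_i, X_i and X are notation chosen here, not taken from the original figure.

```latex
% Assumed definition: a distribution D is p-stable (p > 0) if, for any real
% numbers v_1, ..., v_n and i.i.d. random variables X_1, ..., X_n, X drawn from D,
\sum_{i=1}^{n} v_i X_i \;\overset{d}{=}\; \Bigl(\sum_{i=1}^{n} |v_i|^p\Bigr)^{1/p} X .

% Standard normal, p = 2: a linear combination of independent N(0,1)
% variables is normal with mean 0 and variance \sum v_i^2, so both sides agree:
\sum_{i=1}^{n} v_i X_i \sim N\Bigl(0, \sum_{i=1}^{n} v_i^2\Bigr),
\qquad
\Bigl(\sum_{i=1}^{n} v_i^2\Bigr)^{1/2} X \sim N\Bigl(0, \sum_{i=1}^{n} v_i^2\Bigr).
```

By the same definition, the standard Cauchy distribution from section 1 is 1-stable.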

 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 

Questions:

1:How to compute the value of k in advance

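Randomly take a small number of points from the dataset and run the algorithm on them once. Repeat this while increasing k, and pick the value of k for which the running time is smallest.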
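The tuning loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `sample_points`, `timed` and `choose_k` are hypothetical helper names, and `run_with_k` stands for whatever routine builds the index with a given k on the sample and answers some queries, which this post does not specify.

```python
import random
import time

def sample_points(dataset, m, rng):
    """Randomly take a small sample of m points from the dataset."""
    return rng.sample(dataset, m)

def timed(run_with_k):
    """Wrap a routine so that calling the wrapper with k returns its running time in seconds."""
    def cost(k):
        start = time.perf_counter()
        run_with_k(k)
        return time.perf_counter() - start
    return cost

def choose_k(cost_of_k, k_values):
    """Evaluate the cost for each candidate k (in increasing order) and keep the cheapest."""
    costs = {k: cost_of_k(k) for k in sorted(k_values)}
    best_k = min(costs, key=costs.get)
    return best_k, costs

# Usage with a made-up cost curve standing in for real timings:
best_k, costs = choose_k(lambda k: (k - 6) ** 2 + 1, range(1, 13))
print(best_k)
```

In practice `cost_of_k` would be `timed(run_with_k)` applied to the sampled points. The synthetic cost curve is used here only so the example is deterministic.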

2:How points are placed into buckets

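Each point has L vectors of k elements. Every element of these vectors is the same kind of quantity; each one is simply produced by a different hash function. How the points are actually distributed over the buckets depends on the function h1.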
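As a rough illustration of the L vectors of k elements, here is a minimal sketch. It assumes the standard p-stable hash h(v) = floor((a · v + b) / w) with Gaussian a and uniform b; the bucket width `w` and all function names are hypothetical. The function h1 is not spelled out in this post, so a Python dict keyed by the k-tuple stands in for it.

```python
import math
import random

def make_hash(dim, w, rng):
    """One hash h(v) = floor((a . v + b) / w), with a ~ N(0, 1)^dim and b uniform on [0, w]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

def make_tables(dim, k, L, w, rng):
    """L groups of k independent hash functions each."""
    return [[make_hash(dim, w, rng) for _ in range(k)] for _ in range(L)]

def signatures(v, tables):
    """The L k-tuples of a point: one tuple of k integers per group."""
    return [tuple(h(v) for h in group) for group in tables]

def build_index(points, tables):
    """Place each point index into one bucket per table (a dict stands in for h1)."""
    buckets = [dict() for _ in tables]
    for idx, v in enumerate(points):
        for table, sig in zip(buckets, signatures(v, tables)):
            table.setdefault(sig, []).append(idx)
    return buckets

rng = random.Random(0)
tables = make_tables(dim=3, k=4, L=5, w=4.0, rng=rng)
points = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [40.0, -25.0, 9.0]]
index = build_index(points, tables)
print(signatures(points[0], tables))
```

Identical points always share a bucket in every table; nearby points share one with high probability, and distant points rarely do.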

3:How accuracy is guaranteed

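The manual explains this in detail. The reason the author chose the standard normal distribution is precisely that it is 2-stable, which gives a mathematical guarantee on accuracy.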
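The link between 2-stability and accuracy can be checked empirically. If a has independent standard normal components, then a · u − a · v = a · (u − v) should be distributed like ||u − v||_2 times a standard normal variable, so the spread of the projections tracks the Euclidean distance. The sketch below is only a sanity check with made-up vectors `u` and `v`, not part of any LSH package.

```python
import math
import random
import statistics

def projection_differences(u, v, trials, rng):
    """Sample a . u - a . v for fresh Gaussian vectors a ~ N(0, 1)^d."""
    out = []
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(len(u))]
        out.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))
    return out

rng = random.Random(42)
u = [1.0, -2.0, 0.5, 3.0]   # made-up example vectors
v = [0.0, 1.0, 2.5, -1.0]
dist = math.dist(u, v)       # Euclidean distance ||u - v||_2
samples = projection_differences(u, v, 20000, rng)
spread = statistics.pstdev(samples)
print(f"||u - v||_2 = {dist:.3f}, std of projections = {spread:.3f}")
```

The two printed numbers should agree up to sampling noise, which is the property that lets the projected values stand in for the true distance.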

posted @ 2012-03-13 13:23  jiejnan