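Two Probability Algorithm Problems

Problem 1. Given a file with n lines of data (n unknown), design an algorithm that outputs one line uniformly at random while reading the file only once. That is, every line must be output with the same probability (1/n, with n unknown). Assume the file may be very large and memory is limited, so the whole file cannot be kept in memory.

Problem 2. n data items are arranged in a row, with item Ci at position Li, 1<=i<=n. Design an algorithm that shuffles the items so that every item is equally likely to end up at every position, i.e. p(Ci, Lj)=1/n, where 1<=i<=n, 1<=j<=n.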

 

 

 

 

 

 

-------------------------------------------------------------------

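Solution 1.

[Algorithm] Open the file, read the first line and keep it in memory, i.e. R=line(1). Read the second line and generate a random number in (0,1); if it is less than 0.5, replace the kept data with the second line, R=line(2). ... Read line k and generate a random number in (0,1); if it is less than 1/k, replace the kept data with line k, i.e. R=line(k). ... Continue until all n lines have been read, then output R. The probability that R holds line(i) is 1/n, 1<=i<=n.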
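The steps above can be sketched in Python as follows. This is a minimal sketch, not the author's code: the function name `pick_random_line`, the `rng` parameter, and the use of `io.StringIO` as a stand-in for a real file are all assumptions made for illustration.

```python
import io
import random


def pick_random_line(lines, rng=random):
    """Return one item from `lines`, each with probability 1/n, in a single pass.

    `lines` is any iterable (for example an open file); `rng` is a hypothetical
    parameter that only needs a random() method returning a float in [0, 1).
    """
    chosen = None
    for k, line in enumerate(lines, start=1):
        # Replace the kept line with probability 1/k. For k == 1 the test
        # always succeeds, so the first line is always recorded.
        if rng.random() < 1.0 / k:
            chosen = line
    return chosen


# A file object yields its lines lazily, so only one line is held at a time.
# io.StringIO stands in for a real file here.
sample_file = io.StringIO("a\nb\nc\nd\n")
print(pick_random_line(sample_file).rstrip("\n"))
```

Only the current line and the kept line are in memory at any time, which is what the memory constraint in Problem 1 asks for. An empty input returns `None` in this sketch.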

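[Proof] Suppose the file has n lines, n>=1. Let P(k) be the statement that, once line k has been read, R holds line(i) with probability 1/k for each 1<=i<=k, k>=1.

  basis step: after the first line is read, R holds line(1) with probability 1, so P(1) is true.

      induction step: assume P(k) is true for some k>=1. When line k+1 is read, if the random number is less than 1/(k+1), then R=line(k+1), so R=line(k+1) with probability 1/(k+1). Otherwise, with probability k/(k+1), R keeps its old value R_old; by the hypothesis, the probability that R is line(i) (1<=i<=k) is then k/(k+1)*1/k=1/(k+1). So P(k+1) is true.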

Since both the basis step and the induction step hold, P(k) is true for every positive integer k; in particular P(n) is true.
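The same result can also be checked by unrolling the induction into a direct calculation: line(i) is the final output exactly when it is selected at step i (probability 1/i) and then not replaced at any later step j (probability 1-1/j each). The product telescopes:

```latex
\begin{aligned}
\Pr[R = \mathrm{line}(i)]
  &= \frac{1}{i}\prod_{j=i+1}^{n}\left(1-\frac{1}{j}\right)
   = \frac{1}{i}\prod_{j=i+1}^{n}\frac{j-1}{j} \\
  &= \frac{1}{i}\cdot\frac{i}{i+1}\cdot\frac{i+1}{i+2}\cdots\frac{n-1}{n}
   = \frac{1}{n}, \qquad 1 \le i \le n.
\end{aligned}
```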

 

[Postscript] I came across this problem in "Expert C Programming" (《C专家编程》).

 

 

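Solution 2.

[Algorithm] Suppose the n elements are stored in an array and each item C[i] starts at position L[i], i.e. p(C[i], L[i])=1. Start at position L[k], k=1.

  step1, if k==n, stop. Otherwise, generate a random number in (0,1); if it is less than 1/(n+1-k), go to step3 (the element at L[k] stays where it is), otherwise go to step2

  step2, generate a random integer m in [k+1, n] and swap the elements at positions L[k] and L[m]

  step3, move to the next position, i.e. k=k+1, go to step1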
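A minimal Python sketch of these steps follows. It uses the convention from the proof, where the element at the current position stays put with probability 1/(n+1-k) and is otherwise swapped with a uniformly chosen later position. The name `shuffle_in_place` and the `rng` parameter are assumptions for illustration, and the code uses 0-based indices rather than the 1-based positions in the text.

```python
import random


def shuffle_in_place(items, rng=random):
    """Shuffle the list `items` in place with the stay-or-swap steps above."""
    n = len(items)
    # k is the 0-based index of the current position. The last position needs
    # no decision, which matches the "k == n, stop" rule.
    for k in range(n - 1):
        remaining = n - k  # equals n + 1 - k for the 1-based k used in the text
        if rng.random() < 1.0 / remaining:
            continue  # step3: keep the current element where it is
        # step2: pick a later position uniformly (randint is inclusive on both ends)
        m = rng.randint(k + 1, n - 1)
        items[k], items[m] = items[m], items[k]
    return items


data = list(range(1, 6))
shuffle_in_place(data)
print(data)
```

Taken together, the two random draws select one index uniformly from the current position through the end of the array, so this is the standard Fisher-Yates shuffle written with two draws per position instead of one.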

[Proof] Let the propositional function Q(k) be: p(C[i], L[k])=1/n for i=1,2...n, and p(C[k], L[j])=1/n for j=1,2...n, for every positive integer k.

  basis step: we will show Q(1) is true. If the random number is smaller than 1/n, C[1] stays at L[1], i.e. p(C[1], L[1])=1/n. Otherwise, a random integer m in [2, n] is generated and C[m] is swapped with C[1]. The probability that m=i (2<=i<=n) is 1/(n-1), so p(C[i], L[1]) = (1-1/n)*1/(n-1)=1/n.

At the same time, this also means p(C[1], L[j])=1/n, j=1,2...n

  induction step: we will prove [Q(1)^Q(2)^...^Q(k)]->Q(k+1), k>=1. So the hypothesis is as follows.

  a) p(C[i], L[j]) = 1/n, where i=1,2...n, and j=1,2...k

  b) p(C[i], L[j]) = 1/n, where i=1,2...k, and j=1,2...n

  Firstly, we show p(C[i], L[k+1]) = 1/n for k+1<=i<=n. The only way for C[k+1] to be at L[k+1] is that it is still at L[k+1] before the (k+1)th pass (probability 1-k/n by hypothesis a, since it has not been moved into L[1]...L[k]) and stays there during this pass, so p(C[k+1], L[k+1]) = (1-k/n)*1/(n-k)=1/n. For k+2<=i<=n, the only way for C[i] to be at L[k+1] is that C[i] is still at L[i] before the (k+1)th pass and is swapped with the element at L[k+1] during this pass, so p(C[i], L[k+1]) = (1-k/n)*1/(n-k)=1/n. So p(C[i], L[k+1]) = 1/n, where k+1<=i<=n.

    Secondly, we show p(C[i], L[k+1]) = 1/n for 1<=i<=k after the (k+1)th pass. One way for C[i] to be at L[k+1] after the (k+1)th pass is that C[i] was at L[k+1] before this pass and stays there. The probability is 1/n*1/(n-k). The other way is that C[i] was at some L[j] (k+2<=j<=n) and the element at L[j] is swapped with the element at L[k+1], which has probability 1/n*1/(n-k)*(n-(k+2)+1). Adding them together gives p(C[i], L[k+1]) = 1/n, where 1<=i<=k.

  Thirdly, we show p(C[k+1], L[j]) = 1/n where k+2<=j<=n. The way for C[k+1] to be at L[j] is that C[k+1] was still at L[k+1] before the (k+1)th pass and is swapped with the element at L[j], so the probability is p(C[k+1], L[j]) = (n-k)/n*1/(n-k)=1/n. Note that p(C[k+1], L[k+1]) = 1/n has already been proved.

  Finally, we show p(C[k+1], L[j]) = 1/n where 1<=j<=k. This has already been determined by the passes before the (k+1)th pass (hypothesis a).

In conclusion, we have proved both the basis step and the induction step, so Q(k) is true for every k from 1 to n. This means Q(1)^Q(2)^...^Q(n) is true, in other words, p(C[i], L[j]) = 1/n, where i=1,2...n, and j=1,2...n.
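As a sanity check on the conclusion, the probabilities can be computed exactly for small n by enumerating every outcome of the stay-or-swap procedure with rational arithmetic. This is only an illustrative sketch: the function name `position_probabilities` is hypothetical, indices are 0-based, and it assumes the convention used in the proof (stay with probability 1/(n+1-k), which makes each of the remaining positions, the current one included, equally likely at every pass).

```python
from fractions import Fraction


def position_probabilities(n):
    """Return table[i][j], the exact probability that element i ends at position j."""
    # Each state is an arrangement (tuple of element indices) with its probability.
    states = {tuple(range(n)): Fraction(1)}
    for k in range(n - 1):
        remaining = n - k
        next_states = {}
        for arrangement, prob in states.items():
            # m == k means "stay"; m > k means "swap with position m".
            # Every choice has probability 1/remaining.
            for m in range(k, n):
                new = list(arrangement)
                new[k], new[m] = new[m], new[k]
                key = tuple(new)
                next_states[key] = next_states.get(key, Fraction(0)) + prob / remaining
        states = next_states
    table = [[Fraction(0)] * n for _ in range(n)]
    for arrangement, prob in states.items():
        for position, element in enumerate(arrangement):
            table[element][position] += prob
    return table


table = position_probabilities(4)
print(all(p == Fraction(1, 4) for row in table for p in row))  # prints True
```

The enumeration grows factorially with n, so it is only practical for very small arrays; it complements the induction proof rather than replacing it.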

[Postscript] This problem seems to come from "Programming Pearls" (《编程珠玑》).

 

 

 

 

 

posted on 2012-08-18 15:20  Torstan