Two Probability Algorithm Problems

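Problem 1. Given a file containing n lines of data (n unknown), design an algorithm that outputs one line uniformly at random while reading the file only once. That is, every line must be output with the same probability (1/n, n unknown). Assume the file may be very large and memory is limited, so the whole file cannot be kept in memory.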

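Problem 2. n data items are arranged in a row, with item Ci at position Li, 1<=i<=n. Design an algorithm that shuffles the items so that every item appears at every position with equal probability, i.e. p(Ci, Lj)=1/n, where 1<=i<=n, 1<=j<=n.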

The answers are below.

-------------------------------------------------------------------

Solution 1.

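[Algorithm] Open the file, read the first line and keep it in memory, i.e. R=line(1). Read the second line and generate a random number in (0,1); if it is less than 0.5, replace the stored data with the second line, R=line(2). In general, on reading the k-th line, generate a random number in (0,1); if it is less than 1/k, replace the stored data with the k-th line, i.e. R=line(k). Continue until all n lines have been read, then output R. The probability that R holds line(i) is 1/n for every 1<=i<=n.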
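The procedure above can be sketched in Python as follows. This is only a minimal illustration: the function name pick_random_line and the optional rng parameter are assumptions introduced here, not part of the original post.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def pick_random_line(lines: Iterable[T],
                     rng: Optional[random.Random] = None) -> Optional[T]:
    """Return one item chosen uniformly at random, reading `lines` once.

    Only the currently chosen item is kept in memory (the R of the text).
    Returns None when `lines` is empty.
    """
    if rng is None:
        rng = random.Random()
    chosen = None
    for k, line in enumerate(lines, start=1):
        # On the k-th line, replace R with probability 1/k.
        # For k == 1 the test always succeeds, so R = line(1).
        if rng.random() < 1.0 / k:
            chosen = line
    return chosen
```

Because a file object is itself an iterable of lines, a call such as `pick_random_line(open(path))` (with path a hypothetical file name) would read the file exactly once.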

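[Proof] Suppose the file has n lines, n>=1. Let P(k), k>=1, be the statement: after the k-th line has been read, R holds line(i) with probability 1/k for every 1<=i<=k.

Basis step: after the first line has been read, R holds line(1) with probability 1, so P(1) is true.

Induction step: suppose P(k) is true for some k>=1. When line k+1 is read, the random number is less than 1/(k+1) with probability 1/(k+1), and in that case R=line(k+1). Otherwise, with probability k/(k+1), R keeps its old value R_old. By the hypothesis, R_old is line(i) with probability 1/k for each 1<=i<=k, so R is line(i) with probability k/(k+1)*1/k=1/(k+1). Hence P(k+1) is true.

Since both the basis step and the induction step hold, P(n) is true for every positive integer n.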
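The same result can also be read off directly, without induction, as a telescoping product. The following is a worked restatement of the argument, using only the replacement probabilities 1/k from the algorithm: line(i) is the final output exactly when it is chosen at step i and is not replaced at any later step.

```latex
\Pr[R = \mathrm{line}(i)]
  = \frac{1}{i}\prod_{j=i+1}^{n}\left(1-\frac{1}{j}\right)
  = \frac{1}{i}\cdot\frac{i}{i+1}\cdot\frac{i+1}{i+2}\cdots\frac{n-1}{n}
  = \frac{1}{n},
  \qquad 1 \le i \le n .
```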

[Postscript] I came across this problem in the book "Expert C Programming" (《c专家编程》).

Solution 2.

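[Algorithm] Suppose the n elements are stored in an array, with each item C[i] initially at position L[i], i.e. p(C[i], L[i])=1. Start at position L[k] with k=1.

step1: if k==n, stop. Otherwise, generate a random number in (0,1). If it is less than 1/(n+1-k), the element at L[k] stays where it is, so go to step3; otherwise go to step2.

step2: generate a uniformly distributed random integer m in [k+1, n] and swap the elements at positions L[k] and L[m].

step3: move to the next position, i.e. k=k+1, and go to step1.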
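A minimal Python sketch of these steps is given below. It keeps the element at L[k] in place with probability 1/(n+1-k), which is the behaviour the proof relies on, and swaps otherwise. The name shuffle_in_place and the rng parameter are assumptions made for illustration.

```python
import random
from typing import MutableSequence, Optional


def shuffle_in_place(items: MutableSequence,
                     rng: Optional[random.Random] = None) -> None:
    """Shuffle `items` in place following step1 to step3 of the text."""
    if rng is None:
        rng = random.Random()
    n = len(items)
    # k is 1-based to match the text: items[k - 1] sits at position L[k].
    # The loop ends before k == n, which is the stop condition of step1.
    for k in range(1, n):
        # step1: with probability 1/(n+1-k) leave position L[k] untouched.
        if rng.random() < 1.0 / (n + 1 - k):
            continue
        # step2: otherwise pick m uniformly from [k+1, n] (both inclusive)
        # and swap the elements at positions L[k] and L[m].
        m = rng.randint(k + 1, n)
        items[k - 1], items[m - 1] = items[m - 1], items[k - 1]
        # step3: the for-loop advances k.
```

Taken together, step1 and step2 give every position in [k, n] the same chance 1/(n+1-k) of supplying the element for L[k], so this appears to be the Fisher-Yates shuffle written with two random draws instead of one.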

[Proof] Let the propositional function Q(k), 1<=k<=n, be the statement that after the k-th pass p(C[i], L[k])=1/n for i=1,2...n, and p(C[k], L[j])=1/n for j=1,2...n.

Basis step: we show that Q(1) is true. If the random number is smaller than 1/n, C[1] stays at L[1], so p(C[1], L[1])=1/n. Otherwise, a random integer m in [2, n] is generated and C[m] is swapped with C[1]. The probability that m=i (2<=i<=n) is 1/(n-1), so p(C[i], L[1]) = (1-1/n)*1/(n-1)=1/n.

The same swap moves C[1] to L[i], so this also means p(C[1], L[j])=1/n for j=1,2...n.

Induction step: we prove [Q(1)^Q(2)^...^Q(k)]->Q(k+1), k>=1. The hypothesis, which describes the state after the k-th pass, is as follows.

a) p(C[i], L[j]) = 1/n, where i=1,2...n, and j=1,2...k

b) p(C[i], L[j]) = 1/n, where i=1,2...k, and j=1,2...n

Firstly, we show p(C[i], L[k+1]) = 1/n for k+1<=i<=n. By a), an item C[i] with i>=k+1 has been moved into one of L[1]...L[k] with probability k/n, and otherwise it is still at L[i]. The only way for C[k+1] to end at L[k+1] is that it is still at L[k+1] before the (k+1)-th pass and stays there during this pass, so p(C[k+1], L[k+1]) = (1-k/n)*1/(n-k)=1/n. For k+2<=i<=n, the only way for C[i] to end at L[k+1] is that it is still at L[i] before the (k+1)-th pass and is swapped into L[k+1] during this pass, so p(C[i], L[k+1]) = (1-k/n)*1/(n-k)=1/n. So p(C[i], L[k+1]) = 1/n, where k+1<=i<=n.

Secondly, we show p(C[i], L[k+1]) = 1/n for 1<=i<=k after the (k+1)-th pass. One way for C[i] to end at L[k+1] is that it was at L[k+1] before this pass and stays there; by b) the probability is 1/n*1/(n-k). The other way is that C[i] was at some L[j] (k+2<=j<=n) and the element at L[j] is swapped into L[k+1]; summed over the n-(k+2)+1 possible values of j, this has probability 1/n*1/(n-k)*(n-(k+2)+1). Adding the two gives p(C[i], L[k+1]) = 1/n, where 1<=i<=k. A similar count shows that b) still holds for these items at positions L[j], k+2<=j<=n: either C[i] was at L[j] and is not swapped away, or it was at L[k+1] and is swapped to L[j], giving 1/n*(1-1/(n-k)) + 1/n*1/(n-k) = 1/n.

Thirdly, we show p(C[k+1], L[j]) = 1/n where k+2<=j<=n. The only way for C[k+1] to end at L[j] is that it was still at L[k+1] before the (k+1)-th pass and is swapped with the element at L[j], so the probability is p(C[k+1], L[j]) = (n-k)/n*1/(n-k)=1/n. Note that p(C[k+1], L[k+1]) = 1/n has already been proved.

Finally, we show p(C[k+1], L[j]) = 1/n where 1<=j<=k. This was already determined by the passes before the (k+1)-th pass (hypothesis a), because positions L[1]...L[k] are not touched again.

In conclusion, we have proved both the basis step and the induction step, so Q(k) is true for every k with 1<=k<=n. This means Q(1)^Q(2)^...^Q(n) is true, in other words, p(C[i], L[j]) = 1/n, where i=1,2...n, and j=1,2...n.
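For small n the conclusion can be checked exactly by following every branch of the algorithm with rational arithmetic. The sketch below is a sanity check under that approach, not part of the original argument; the function name placement_probabilities is hypothetical. It uses the fact that, at pass k, staying and each of the n-k possible swaps all have probability 1/(n+1-k).

```python
from fractions import Fraction
from typing import Dict, Tuple


def placement_probabilities(n: int) -> Dict[Tuple[int, int], Fraction]:
    """Exact p(C[i], L[j]) after the shuffle, keyed by (i, j), 1-based."""
    # Probability of every arrangement; initially C[i] is at L[i].
    dist = {tuple(range(1, n + 1)): Fraction(1)}
    for k in range(1, n):
        branch = Fraction(1, n + 1 - k)
        nxt: Dict[Tuple[int, ...], Fraction] = {}
        for arr, p in dist.items():
            # m == k means "stay"; m > k means "swap L[k] with L[m]".
            for m in range(k, n + 1):
                new = list(arr)
                new[k - 1], new[m - 1] = new[m - 1], new[k - 1]
                key = tuple(new)
                nxt[key] = nxt.get(key, Fraction(0)) + p * branch
        dist = nxt
    # Sum arrangement probabilities into per-(item, position) totals.
    table: Dict[Tuple[int, int], Fraction] = {}
    for arr, p in dist.items():
        for j, i in enumerate(arr, start=1):
            table[(i, j)] = table.get((i, j), Fraction(0)) + p
    return table
```

If the proof is right, `all(p == Fraction(1, 4) for p in placement_probabilities(4).values())` should evaluate to True, and likewise for other small n.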

[Postscript] This problem seems to be from "Programming Pearls" (《编程珠玑》).

posted on 2012-08-18 15:20 Torstan
