2014-04-24 21:06
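Problem: Suppose a file contains 4 billion distinct non-negative integers and you have 1 GB of memory. How would you find an integer that is not in the file? What if the file contains 1 billion integers and you have only 10 MB of memory?
Solution: In the first case memory is sufficient, so you can use a bit vector to mark which integers appear, then scan it sequentially to find the first number that never appeared. In the second case memory is insufficient, so you split the integer range into several segments and scan the whole file once per segment, checking whether every integer in that segment appears. If any is missing, you have the answer; otherwise move on to the next segment.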
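The bit-vector idea for the first case can be sketched roughly as follows. This is a minimal illustration, not the original author's code: the function name `findMissingBitVector` and the `range` parameter are assumptions made for the example, and the input is an in-memory vector standing in for the file.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the smallest value in [0, range) that does not
// appear in `nums`, or `range` itself if every value in that range is present.
std::uint64_t findMissingBitVector(const std::vector<std::uint32_t>& nums,
                                   std::uint64_t range) {
    // One bit per candidate value, packed eight to a byte.
    std::vector<std::uint8_t> bits((range + 7) / 8, 0);

    // Pass over the input: set the bit of every value that occurs.
    for (std::uint32_t x : nums) {
        if (x < range) {
            bits[x >> 3] |= static_cast<std::uint8_t>(1u << (x & 7));
        }
    }

    // Pass over the bit vector: the first clear bit is a missing value.
    for (std::uint64_t v = 0; v < range; ++v) {
        if ((bits[v >> 3] & (1u << (v & 7))) == 0) {
            return v;
        }
    }
    return range;
}
```

If the integers are assumed to be 32-bit, `range` would be 2^32 and the vector would take 512 MiB, which fits in the 1 GB budget.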
Code:
// 10.3 Assume you have a list of 4 billion non-negative integers, and you're given 1GB of memory. Try to find a number that is not contained in the list.
// Answer:
//    1GB means 8Gbit, thus we can use a 4-billion-bit vector, of actual size 500MB. Scan the list in one pass and mark each existing number as '1'.
//    Scan the bit vector to find the first '0' bit.
// Extension: if you have only 10MB of memory, what would you do then?
// Answer:
//    10MB means 80Mbit, thus we have to scan the list multiple times. Each time the bit vector covers only part of the range,
//    such as 0~999, 1000~1999, ..., 3999999000~3999999999.
//    For each interval, we scan all the numbers and check whether there is a missing element in the bit vector. If found, return it. Otherwise go on until the last interval.
//    The length of the interval should be carefully chosen, to minimize the total number of scans.
int main()
{
    return 0;
}
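The multi-pass approach described in the comments above could look something like the sketch below. It is only an illustration under stated assumptions: `findMissingChunked`, `range`, and `chunk` are names invented for this example, the input vector stands in for the file (each loop over `nums` corresponds to one full scan of the file), and `chunk` is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: looks for the smallest value in [0, range) missing from
// `nums`, using a bit vector of only `chunk` bits. Returns `range` if nothing
// in [0, range) is missing. Precondition (assumed): chunk > 0.
std::uint64_t findMissingChunked(const std::vector<std::uint32_t>& nums,
                                 std::uint64_t range, std::uint64_t chunk) {
    std::vector<bool> seen(chunk);  // reused for every interval

    for (std::uint64_t lo = 0; lo < range; lo += chunk) {
        const std::uint64_t hi = std::min(lo + chunk, range);  // interval [lo, hi)
        std::fill(seen.begin(), seen.end(), false);

        // One full scan of the input, marking only values inside the interval.
        for (std::uint32_t x : nums) {
            if (x >= lo && x < hi) {
                seen[x - lo] = true;
            }
        }

        // Any unmarked slot in this interval is an answer.
        for (std::uint64_t v = lo; v < hi; ++v) {
            if (!seen[v - lo]) {
                return v;
            }
        }
    }
    return range;
}
```

With 10 MB, that is roughly 80 million bits per pass, so covering a range of 4 billion values takes about 50 passes in the worst case. A larger `chunk` means fewer scans, which is why the interval length should be as large as memory allows.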