Solving the top-K problem with a heap (C++)
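The top-K problem is to pick the K smallest (or largest) numbers out of a large collection of unsorted numbers. One way to solve it is to maintain a heap of size K: to find the K largest numbers, use a min-heap; to find the K smallest, use a max-heap. The heap top is then the weakest of the K candidates kept so far. Fill the heap with the first K numbers, then scan the rest of the array. When looking for the K largest, any element larger than the heap top replaces it: remove the top and push the larger element. When looking for the K smallest, the comparison is reversed. Each replacement costs O(log K), so the whole scan takes O(n log K) time and needs only O(K) extra space for the heap.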
The code below finds the 100 smallest numbers in data.txt:
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

const size_t K = 100;  // how many of the smallest numbers to keep

template <typename type>
inline void print(const vector<type>& cala)
{
    typename vector<type>::const_iterator first = cala.begin();
    for (; first != cala.end(); ++first) {
        cout << *first << endl;
    }
}

int main(void)
{
    vector<long> vec;
    vector<long> result;  // holds the result

    ifstream in("data.txt");
    if (!in) {
        cerr << "can not open data.txt." << endl;
        return 1;
    }

    long i;
    // Test the extraction itself. A loop on !in.eof() reads the last
    // value twice and then needs a pop_back() to undo it.
    while (in >> i) {
        vec.push_back(i);
    }
    if (vec.size() < K) {
        cerr << "need at least " << K << " numbers." << endl;
        return 1;
    }

    // Seed result with the first K numbers.
    result.assign(vec.begin(), vec.begin() + K);
    // Build a max-heap, so result.front() is the largest number kept.
    make_heap(result.begin(), result.end());

    vector<long>::iterator dfirst;
    for (dfirst = vec.begin() + K; dfirst < vec.end(); ++dfirst) {
        // A number smaller than the heap top replaces the heap top.
        if (*dfirst < result.front()) {
            pop_heap(result.begin(), result.end());   // move the top to the back
            result.pop_back();
            result.push_back(*dfirst);
            push_heap(result.begin(), result.end());  // restore the heap
        }
    }

    // Print the result (in heap order, not sorted).
    print(result);
    return 0;
}
$root@localhost: g++ -g -Wall topk.cc -o topk
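The program above covers the K-smallest case with a max-heap. The opposite case from the description, the K largest numbers with a min-heap, could look roughly like the sketch below. It uses std::priority_queue instead of the make_heap family, and the function name top_k_largest is made up for this illustration; it is not part of the original program.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not from the original program): returns the k largest
// values of `data` in ascending order. If data has fewer than k values, all
// of them are returned.
std::vector<long> top_k_largest(const std::vector<long>& data, std::size_t k)
{
    // std::greater turns priority_queue into a min-heap, so top() is the
    // smallest of the values kept so far.
    std::priority_queue<long, std::vector<long>, std::greater<long> > heap;

    for (std::size_t i = 0; i < data.size(); ++i) {
        if (heap.size() < k) {
            heap.push(data[i]);        // still filling the first k slots
        } else if (k > 0 && data[i] > heap.top()) {
            heap.pop();                // drop the smallest kept value
            heap.push(data[i]);        // and keep the larger one instead
        }
    }

    std::vector<long> result;
    while (!heap.empty()) {
        result.push_back(heap.top());  // a min-heap drains in ascending order
        heap.pop();
    }
    return result;
}
```

Calling top_k_largest(vec, 100) on the numbers read from data.txt would then give the 100 largest values instead of the 100 smallest.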
The following program generates the random data:
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main(void)
{
    // static keeps the array off the stack: with an 8-byte long it takes
    // about 8 MB, which can overflow a default-sized stack.
    static long array[1000000];
    long i = 0;
    FILE *fp = NULL;

    srand((unsigned)time(NULL));
    for (; i < 1000000; i++) {
        array[i] = rand() % 1000000;
        // also copy the value into a random slot
        array[rand() % 1000000] = array[i];
    }
    // random_shuffle was removed in C++17; std::shuffle replaces it there.
    random_shuffle(array, array + 1000000);

    if ((fp = fopen("data.txt", "w")) == NULL) {
        cerr << "can not open file." << endl;
        exit(1);
    }
    for (i = 0; i < 1000000; i++) {
        fprintf(fp, "%ld\n", array[i]);
    }
    fclose(fp);
    return 0;
}
$root@localhost: g++ -g -Wall srand.cc -o srand
$root@localhost: ./srand
$root@localhost: ./topk
This prints the result.
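The generator relies on rand() % 1000000. The C standard only guarantees that RAND_MAX is at least 32767, so on some platforms that expression may never produce the larger values. As a possible alternative, here is a minimal sketch that uses the C++11 <random> header and a vector. The function name make_random_data and its parameters are assumptions made for this example, and it needs a compiler in C++11 mode or later.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns `count` pseudo-random values in [0, limit).
// Assumes limit > 0. The seed is a parameter so that a run can be repeated.
std::vector<long> make_random_data(std::size_t count, long limit, unsigned seed)
{
    std::mt19937 engine(seed);
    // Both bounds of uniform_int_distribution are inclusive.
    std::uniform_int_distribution<long> dist(0, limit - 1);

    std::vector<long> data;
    data.reserve(count);  // heap storage, so the stack size does not matter
    for (std::size_t i = 0; i < count; ++i) {
        data.push_back(dist(engine));
    }
    return data;
}
```

For the data set used here the call would be make_random_data(1000000, 1000000, seed), with the values then written to data.txt one per line as before.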
The following shell script checks the result:
#!/bin/sh
./srand
./topk > sort.txt
sort -n data.txt | head -n 100 > file2.txt
sort -n sort.txt > file1.txt
diff file1.txt file2.txt
If diff prints nothing, the two files are identical and the result is correct.
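The same comparison can also be done in memory, without the intermediate files. The sketch below repeats the heap steps of topk.cc in a function and compares the outcome with a full sort, which is what the diff in the script checks. Both function names are invented for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: the k smallest values of `data`, found with the same
// make_heap / pop_heap / push_heap steps as topk.cc.
// Assumes k <= data.size().
std::vector<long> smallest_k_heap(const std::vector<long>& data, std::size_t k)
{
    std::vector<long> heap(data.begin(), data.begin() + k);
    std::make_heap(heap.begin(), heap.end());          // max-heap: front() is the largest
    for (std::size_t i = k; i < data.size(); ++i) {
        if (!heap.empty() && data[i] < heap.front()) {
            std::pop_heap(heap.begin(), heap.end());   // largest moves to the back
            heap.back() = data[i];                     // overwrite it with the smaller value
            std::push_heap(heap.begin(), heap.end());  // sift the new value into place
        }
    }
    return heap;
}

// Hypothetical helper: true when the heap result equals the first k values
// of the fully sorted data.
bool matches_full_sort(const std::vector<long>& data, std::size_t k)
{
    std::vector<long> from_heap = smallest_k_heap(data, k);
    std::sort(from_heap.begin(), from_heap.end());     // like: sort -n sort.txt

    std::vector<long> sorted(data);
    std::sort(sorted.begin(), sorted.end());           // like: sort -n data.txt
    sorted.resize(k);                                  // like: head -100

    return from_heap == sorted;
}
```

matches_full_sort(vec, 100) returning true corresponds to diff printing nothing.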