Counting how often each word occurs in an article

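The idea behind the algorithm is:

Traverse the file from beginning to end, reading each word in turn.
Put each word into a hash map and count how many times it occurs.
Traverse the hash map, pushing each word together with its count into a priority queue ordered by count.
Whenever the priority queue holds more than k elements, remove the lowest-priority element (the one with the smallest count), so that the queue never holds more than k elements.
Once the whole hash map has been traversed, the queue contains the k words that occur most often.

The implementation is as follows: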

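// Top-K most frequent words: prints the ten most frequent words in a text file.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

void top_k_words() // print the ten most frequent words
{
    const size_t k = 10;
    // std::chrono is used here in place of boost::timer so the program needs only the standard library.
    chrono::steady_clock::time_point start = chrono::steady_clock::now();
    ifstream fin;
    fin.open("modern c.txt");
    if (!fin)
    {
        cout << "can not open file" << endl;
        return; // nothing to count without the input file
    }
    string s;
    // std::unordered_map is the standard replacement for the non-standard hash_map.
    unordered_map<string, int> countwords;
    // Test the extraction itself, so that a failed read at end of file is not counted as a word.
    while (fin >> s)
    {
        countwords[s]++;
    }
    cout << "Number of distinct words: " << countwords.size() << endl;
    // Min-heap ordered by count: the top is always the least frequent word kept so far.
    priority_queue<pair<int, string>, vector<pair<int, string>>, greater<pair<int, string>>> countmax;
    for (unordered_map<string, int>::const_iterator i = countwords.begin();
         i != countwords.end(); ++i)
    {
        countmax.push(make_pair(i->second, i->first));
        if (countmax.size() > k)
        {
            countmax.pop(); // evict the least frequent word so that only k remain
        }
    }
    // The remaining words are printed in ascending order of count.
    while (!countmax.empty())
    {
        cout << countmax.top().second << " " << countmax.top().first << endl;
        countmax.pop();
    }
    chrono::duration<double> elapsed = chrono::steady_clock::now() - start;
    cout << "time elapsed " << elapsed.count() << endl;
}

int main(int argc, char* argv[])
{
    top_k_words();
    return 0;
}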
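
The same steps can also be packaged as two small functions, which makes them easy to try on an in-memory string and returns the words most frequent first. This is only a minimal sketch, not the program above: the helper names `count_words` and `top_k` and the `Entry` alias are made up for the example, and `std::unordered_map` is assumed as the hash map.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <istream>
#include <queue>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias for this sketch: (count, word), so pairs compare by count first.
using Entry = std::pair<int, std::string>;

// Read whitespace-separated words from any input stream and count each one.
std::unordered_map<std::string, int> count_words(std::istream& in) {
    std::unordered_map<std::string, int> counts;
    std::string word;
    while (in >> word) {  // stops cleanly when no further word can be read
        ++counts[word];
    }
    return counts;
}

// Keep at most k entries in a min-heap keyed on count, then return them
// ordered from most frequent to least frequent.
std::vector<Entry> top_k(const std::unordered_map<std::string, int>& counts,
                         std::size_t k) {
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.push(Entry(kv.second, kv.first));
        if (heap.size() > k) {
            heap.pop();  // drop the entry with the smallest count
        }
    }
    std::vector<Entry> result;
    while (!heap.empty()) {
        result.push_back(heap.top());  // ascending order of count
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // most frequent first
    return result;
}
```

For instance, passing a `std::istringstream` holding `a b a c b a d` to `count_words` and the result to `top_k` with k = 2 gives `(3, "a")` followed by `(2, "b")`. Because the heap never holds more than k + 1 entries, the selection step costs roughly O(n log k) for n distinct words, instead of the O(n log n) needed to sort every count.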

posted @ 2016-09-08 16:10 程序员修练之路
