Counting the Chinese and English Characters in a Text File

To count Chinese characters and English characters, each character must first be classified as one or the other.

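On a non-Unicode system, English characters are stored as ASCII codes, which are always below 0x80.

If the unsigned value of a byte just read is greater than 0x80, that byte and the following char together form one double-byte character. This holds for double-byte encodings such as GB2312/GBK, where a Chinese character takes two bytes. It is not Unicode: UTF-8 text, where most Chinese characters take three bytes, would be split incorrectly.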

So the code that tells a Chinese character from an ASCII character looks like this:

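if ((unsigned char)strRead[i] > 0x80 && strRead[i + 1] != '\0')  // Chinese character
{
    // Double-byte character: keep the lead byte and the byte after it.
    nd.map[0] = strRead[i++];
    nd.map[1] = strRead[i++];
}
else
{
    // ASCII character (or a lone lead byte at the end of the line): a single byte.
    nd.map[0] = strRead[i++];
    nd.map[1] = 0;
}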
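To see the byte test in isolation, here is a minimal sketch that splits one line into single-byte and double-byte characters. It assumes the text is in a double-byte encoding such as GB2312/GBK, and `splitChars` is a hypothetical helper name that does not appear in the original program.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a line into characters, assuming a double-byte
// encoding (e.g. GB2312/GBK) where a byte above 0x80 starts a 2-byte character.
std::vector<std::string> splitChars(const std::string& line)
{
    std::vector<std::string> out;
    for (std::size_t i = 0; i < line.size();)
    {
        // Take two bytes only if a second byte is actually available.
        bool twoBytes = (unsigned char)line[i] > 0x80 && i + 1 < line.size();
        std::size_t n = twoBytes ? 2 : 1;
        out.push_back(line.substr(i, n));
        i += n;
    }
    return out;
}
```

Given the bytes `a`, `0xD6 0xD0` (one GB2312 character) and `b`, the helper returns three entries, and a lone lead byte at the end of a line comes back as a one-byte entry instead of reading past the end.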

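The next step is to choose a data structure for the counts.

I used a std::map keyed by a struct to do the counting and comparing.

This guarantees that each ASCII character or Chinese character has exactly one node, and each find during counting costs O(log n).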
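The one-node-per-key idea can be sketched on its own. To stay short, this version counts single bytes only and keys the map on `char` rather than a struct; both simplifications, and the name `countChars`, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: count how often each single-byte character occurs.
// std::map keeps one node per distinct key; lookup and insertion are O(log n).
std::map<char, int> countChars(const std::string& text)
{
    std::map<char, int> counts;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        // operator[] inserts the key with a count of 0 the first time it is seen.
        ++counts[text[i]];
    }
    return counts;
}
```

For the input `"abca"` the map ends up with three nodes, and the node for `a` holds 2.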

The struct is designed as follows:

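struct Node
{
    char map[4];  // one character (1 or 2 bytes) followed by a terminating '\0'
    // Ordering used by std::map. Because it tests strcmp(...) > 0,
    // keys are kept in descending byte order.
    bool operator<(const Node& t) const
    {
        return strcmp(map, t.map) > 0;
    }
};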
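Because `operator<` returns true when `strcmp` is positive, a map keyed on `Node` iterates from larger bytes to smaller ones, so double-byte characters come before ASCII ones. The sketch below checks this on a few keys. `makeNode` and `keysInOrder` are hypothetical helpers added for the demonstration.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

struct Node
{
    char map[4];
    bool operator<(const Node& t) const
    {
        return std::strcmp(map, t.map) > 0;  // "less" means larger bytes first
    }
};

// Hypothetical helper: build a key from at most 3 bytes of s.
Node makeNode(const char* s)
{
    Node nd;
    std::strncpy(nd.map, s, 3);  // pads with '\0' when s is shorter than 3 bytes
    nd.map[3] = '\0';
    return nd;
}

// Hypothetical helper: insert the keys and return them in map iteration order.
std::vector<std::string> keysInOrder(const std::vector<std::string>& keys)
{
    std::map<Node, int> mp;
    for (std::size_t i = 0; i < keys.size(); ++i)
        ++mp[makeNode(keys[i].c_str())];

    std::vector<std::string> out;
    for (std::map<Node, int>::const_iterator it = mp.begin(); it != mp.end(); ++it)
        out.push_back(it->first.map);
    return out;
}
```

Inserting `a`, the two bytes `0xD6 0xD0`, `b`, and `a` again leaves three distinct keys, visited as the double-byte character, then `b`, then `a`.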

The complete program follows. An iterator is used to print the final results:

#include <cstdio>
#include <cstring>
#include <iostream>
#include <map>
// Compiled successfully under Dev-C++.
// Reads in.txt from the same directory, counts its characters, and writes the result to out.txt.
using namespace std;

struct Node
{
    char map[4];  // one character (1 or 2 bytes) followed by a terminating '\0'
    // strcmp(...) > 0 keeps keys in descending byte order.
    bool operator<(const Node& t) const
    {
        return strcmp(map, t.map) > 0;
    }
};

int main()
{
    char strRead[1000];  // one line of input, at most 999 bytes
    map<Node, int> mp;   // occurrence count of each Chinese character and ASCII character
    map<Node, int>::iterator it;
    Node nd;
    freopen("in.txt", "r", stdin);
    freopen("out.txt", "w", stdout);
    while (cin.getline(strRead, 1000))
    {
        size_t len = strlen(strRead);  // computed once per line, not on every iteration
        for (size_t i = 0; i < len;)
        {
            /* Uncomment to skip spaces:
            if (strRead[i] == ' ')
            {
                i++;
                continue;
            }
            */
            if ((unsigned char)strRead[i] > 0x80 && strRead[i + 1] != '\0')  // Chinese character
            {
                nd.map[0] = strRead[i++];
                nd.map[1] = strRead[i++];
            }
            else
            {
                nd.map[0] = strRead[i++];
                nd.map[1] = 0;
            }
            nd.map[2] = 0;
            it = mp.find(nd);
            if (it != mp.end())
            {
                it->second++;
            }
            else
            {
                mp.insert(pair<Node, int>(nd, 1));
            }
        }
    }
    for (it = mp.begin(); it != mp.end(); ++it)
    {
        cout << it->first.map << "-->" << it->second << endl;
    }
    return 0;  // 0 reports success; the original returned 1
}
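The program prints one count per distinct character. If the two overall totals named in the title are wanted, they can be summed from those per-character counts. A minimal sketch, assuming the counts are held in a `std::map<std::string, int>` with 1-byte or 2-byte keys and that the text is GB2312/GBK; `Totals` and `sumTotals` are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical result type: overall totals for the two categories.
struct Totals
{
    int chinese;  // double-byte characters (this also counts full-width punctuation)
    int english;  // ASCII letters a-z and A-Z
};

// Hypothetical helper: sum per-character counts into two totals.
// Keys are 1 byte (ASCII) or 2 bytes (double-byte character).
Totals sumTotals(const std::map<std::string, int>& counts)
{
    Totals t = {0, 0};
    std::map<std::string, int>::const_iterator it;
    for (it = counts.begin(); it != counts.end(); ++it)
    {
        const std::string& key = it->first;
        if (key.size() == 2 && (unsigned char)key[0] > 0x80)
            t.chinese += it->second;
        else if (key.size() == 1 && std::isalpha((unsigned char)key[0]))
            t.english += it->second;
        // Digits, spaces and ASCII punctuation fall into neither total.
    }
    return t;
}
```

Whether digits or punctuation should count as "English characters" is a design choice; this sketch counts letters only.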

Posted 2010-10-09 12:10 by Eric.wei
