1071 Speech Patterns (25 points)
1. Problem
People often have a preference among synonyms of the same word. For example, some may prefer "the police", while others may prefer "the cops". Analyzing such patterns can help to narrow down a speaker's identity, which is useful when validating, for example, whether it's still the same person behind an online avatar.
Now given a paragraph of text sampled from someone's speech, can you find the person's most commonly used word?
Input Specification:
Each input file contains one test case. For each case, there is one line of text no more than 1048576 characters in length, terminated by a carriage return \n. The input contains at least one alphanumerical character, i.e., one character from the set [0-9 A-Z a-z].
Output Specification:
For each test case, print in one line the most commonly occurring word in the input text, followed by a space and the number of times it has occurred in the input. If there are more than one such words, print the lexicographically smallest one. The word should be printed in all lower case. Here a "word" is defined as a continuous sequence of alphanumerical characters separated by non-alphanumerical characters or the line beginning/end.
Note that words are case insensitive.
Sample Input:
Can1: "Can a can can a can? It can!"
Sample Output:
can 5
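2. Problem summary
Given a string, find the word that occurs most often and output it in lowercase together with its number of occurrences; if several words tie, output the lexicographically smallest one. Note: a word consists of characters from [0-9 A-Z a-z], words are separated by non-alphanumeric characters, and words are case insensitive.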
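To make the word definition concrete, here is a minimal sketch of splitting a line into lowercase words. The helper name splitWords is hypothetical and is not part of the solution in this post; it only illustrates what counts as a word.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): splits a line into
// lowercase words, where a word is a maximal run of alphanumeric characters.
std::vector<std::string> splitWords(const std::string& line) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : line) {
        if (std::isalnum(ch)) {
            // Part of a word: store it in lowercase.
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // A separator closes the word collected so far.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);  // word that ends the line
    return words;
}
```

For the sample input, this yields the nine words can1, can, a, can, can, a, can, it, can, so "can" occurs 5 times and "can1" is counted as a different word.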
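3. Approach: string + map
- Since words are case insensitive and the output must be lowercase, convert every uppercase letter in the string to lowercase up front.
- Create a map whose key-value pairs are <word, occurrence count> to count the words.
- Use temp as a helper string that collects the current word. Traverse the string: on a character that is neither a letter nor a digit, count the word collected in temp and reset temp to empty; otherwise, append the character to temp. After the traversal, check whether the last character of the string is a letter or a digit. If it is, the loop has not yet counted the last word, so one extra counting step is needed.
- Finally, output the most frequent word and its count.
4. Code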
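The counting and selection steps can be sketched on their own, assuming the words have already been extracted and lowercased. The function name mostCommon is hypothetical. This sketch picks the winner by scanning the map after counting, which is one possible way to handle ties and not necessarily how the code in this post does it.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption for illustration): given lowercased
// words, returns the most frequent word and its count, or {"", 0} if the
// list is empty.
std::pair<std::string, int> mostCommon(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;  // <word, occurrence count>
    for (const std::string& w : words) ++counts[w];

    std::pair<std::string, int> best("", 0);
    // std::map iterates its keys in ascending order, so with a strict '>'
    // the lexicographically smallest word wins any tie.
    for (const auto& entry : counts) {
        if (entry.second > best.second) {
            best = std::make_pair(entry.first, entry.second);
        }
    }
    return best;
}
```

Because the map is ordered, no explicit string comparison is needed for the tie-break; with an unordered container, the comparison would have to be written out.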
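#include <iostream>
#include <string>
#include <map>
#include <cctype>
using namespace std;
int main()
{
    string str;
    getline(cin, str);
    // Convert every uppercase letter in the input to lowercase.
    // The cast to unsigned char avoids undefined behavior for negative char values.
    for (size_t i = 0; i < str.length(); ++i)
        str[i] = tolower(static_cast<unsigned char>(str[i]));
    string temp = "";
    int maxCnt = 0;
    string maxStr = "";
    map<string, int> res;
    for (size_t i = 0; i < str.length(); ++i)
    {
        if (!isalnum(static_cast<unsigned char>(str[i])))
        {
            // Skip empty words so they are never counted. An empty temp
            // arises from a leading separator or from several consecutive
            // characters that are neither letters nor digits.
            if (temp.empty())
                continue;
            // A separator ends the current word: count it, then reset temp
            // to start collecting the next word.
            res[temp] += 1;
            // On a tie, keep the lexicographically smaller word.
            if (res[temp] > maxCnt || (res[temp] == maxCnt && temp < maxStr))
            {
                maxCnt = res[temp];
                maxStr = temp;
            }
            temp = "";
        } else
        {
            temp += str[i];
        }
    }
    // Make sure the last word is counted: if the line ends with a letter or
    // a digit, the loop above finishes without counting that word.
    if (!temp.empty())
    {
        res[temp] += 1;
        if (res[temp] > maxCnt || (res[temp] == maxCnt && temp < maxStr))
        {
            maxCnt = res[temp];
            maxStr = temp;
        }
    }
    cout << maxStr << " " << maxCnt << endl;
    return 0;
}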