Implementing a Simple Code Word Counter (Part 3)

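In the previous article we implemented a simple program that counts the words in a piece of code: it splits the code on delimiters (spaces and any other character that cannot appear in a name) and counts how often each word occurs. But consider identifiers written in camel case: several words joined together, where every word except the first begins with an uppercase letter. We would like to split these as well, because they may contain the keywords we are looking for. For example, suppose the code contains many identifiers such as countVec, countLink, countInt, and countDouble. With the plain approach each of them is counted exactly once, which tells us nothing about the code. Once they are split, we get four occurrences of count, and we have good reason to believe the code has something to do with counting.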

Previously we split the code like this:

auto symbols = std::vector<std::string>{};
boost::split(symbols, code, isDelimiter);
symbols.erase(std::remove(begin(symbols), end(symbols), ""), end(symbols));

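That is, we split on delimiters only. Now we need to change the program to meet two requirements: 1. Determine the extent of a single word: a word ends at the next uppercase letter or at the next delimiter, so the spaces between words still act as separators. 2. Loop to find the next word.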

Determining the extent of a word

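We can use two iterators: beginWord points to the first letter of the word, and endWord points one past its last letter, that is, to the uppercase letter or delimiter that ends it. A "word" here means a run of characters bounded by an uppercase letter or a delimiter: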

auto const beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
auto const endWord = std::find_if(std::next(beginWord), end(code), [](char c){ return isDelimiter(c) || isupper(c); });

Once the range is known, we store the word in words for later: words.emplace_back(beginWord, endWord)
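The two-iterator step above can be sketched as a small standalone function. This is only an illustration: firstWord is a hypothetical helper that does not appear in the article, and the check for an input without any word and the casts to unsigned char are precautions added here, not part of the original snippet.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Same rule as the article's isDelimiter: anything that is not a letter,
// a digit, or an underscore separates words. The cast to unsigned char
// avoids undefined behaviour in <cctype> functions for negative char values.
bool isDelimiter(char c)
{
    auto const isAllowedInName = std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    return !isAllowedInName;
}

// Hypothetical helper (not in the article): returns the first word of
// `code`, or an empty string when `code` contains no word at all.
std::string firstWord(std::string const& code)
{
    auto const beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
    if (beginWord == end(code))
    {
        return {}; // calling std::next on end(code) would be undefined behaviour
    }
    // The word ends at the next delimiter or at the next uppercase letter.
    auto const endWord = std::find_if(std::next(beginWord), end(code), [](char c) {
        return isDelimiter(c) || std::isupper(static_cast<unsigned char>(c));
    });
    return std::string(beginWord, endWord);
}
```

Under these assumptions, firstWord("  countVec = 0;") yields "count": the leading spaces are skipped and the scan stops at the uppercase V.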

Looping over the words

auto beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
while (beginWord != end(code))
{
    auto endWord = std::find_if(std::next(beginWord), end(code), [](char c){ return isDelimiter(c) || isupper(c); });
    words.emplace_back(beginWord, endWord);
    beginWord = std::find_if_not(endWord, end(code), isDelimiter);
}
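To check the loop against the example from the introduction, here is a minimal self-contained sketch. getCaseWordsFromCode follows the loop above; countCaseWords is a hypothetical helper introduced only for this illustration, and the casts to unsigned char are an added precaution.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Anything that is not a letter, a digit, or an underscore is a delimiter.
bool isDelimiter(char c)
{
    auto const isAllowedInName = std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    return !isAllowedInName;
}

// Splits `code` at delimiters and in front of every uppercase letter.
std::vector<std::string> getCaseWordsFromCode(std::string const& code)
{
    auto words = std::vector<std::string>{};
    auto beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
    while (beginWord != end(code))
    {
        auto const endWord = std::find_if(std::next(beginWord), end(code), [](char c) {
            return isDelimiter(c) || std::isupper(static_cast<unsigned char>(c));
        });
        words.emplace_back(beginWord, endWord);
        // Skip the delimiters (if any) before the next word.
        beginWord = std::find_if_not(endWord, end(code), isDelimiter);
    }
    return words;
}

// Hypothetical helper (not in the article): counts the split words directly.
std::map<std::string, std::size_t> countCaseWords(std::string const& code)
{
    auto wordCount = std::map<std::string, std::size_t>{};
    for (auto const& word : getCaseWordsFromCode(code))
    {
        ++wordCount[word];
    }
    return wordCount;
}
```

Calling countCaseWords("countVec countLink countInt countDouble") gives count a frequency of 4, while Vec, Link, Int, and Double each occur once. One limitation of this splitting rule is worth keeping in mind: every uppercase letter starts a new word, so an identifier such as HTTPServer is split into H, T, T, P, and Server.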

The complete code:

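#include<iostream>
#include<iomanip>
#include<string>
#include<map>
#include<vector>
#include<iterator>
#include<algorithm>   // std::find_if, std::find_if_not, std::sort, std::max_element
#include<cctype>      // isalnum, isupper
#include<cstdlib>     // system
#include<utility>     // std::pair
#include<boost/algorithm/string.hpp>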

using WordCount = std::vector<std::pair<std::string, size_t>>;
WordCount getWordCount(std::string const& code);

bool isDelimiter(char c)
{
	auto const isAllowedInName = isalnum(c) || c == '_';
	return !isAllowedInName;
}

std::map<std::string, size_t> countWords(std::vector<std::string> const& words)
{
	auto wordCount = std::map<std::string, size_t>{};
	for (auto const& word : words)
	{
		++wordCount[word];
	}
	return wordCount;
}


std::vector<std::string> getCaseWordsFromCode(std::string const& code)
{
	auto words = std::vector<std::string>{};
	auto beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
	while (beginWord != end(code))
	{
		auto endWord = std::find_if(std::next(beginWord), end(code), [](char c) { return isDelimiter(c) || isupper(c); });
		words.emplace_back(beginWord, endWord);
		beginWord = std::find_if_not(endWord, end(code), isDelimiter);
	}
	return words;
}


WordCount getWordCount(std::string const& code)
{
	/*auto symbols = std::vector<std::string>{};
	boost::split(symbols, code, isDelimiter);
	symbols.erase(std::remove(begin(symbols), end(symbols), ""), end(symbols));*/

	auto const symbols = getCaseWordsFromCode(code);

	auto const wordCount = countWords(symbols);

	auto sortedWordCount = WordCount(begin(wordCount), end(wordCount));  // convert the map into a vector of pairs so it can be sorted
	std::sort(begin(sortedWordCount), end(sortedWordCount), [](auto const& p1, auto const& p2) { return p1.second > p2.second; });

	return sortedWordCount;
}

//void print(WordCount const& entries)
//{
//	for (auto const& entry : entries)
//	{
//		std::cout << std::setw(30) << std::left << entry.first << '|' << std::setw(10) << std::right << entry.second << '\n';
//	}
//}

void print(WordCount const& entries)
{
	if (entries.empty()) return;
	auto const longestWord = *std::max_element(begin(entries), end(entries), [](auto const& p1, auto const& p2) { return p1.first.size() < p2.first.size(); });
	auto const longestWordSize = longestWord.first.size();
	for (auto const& entry : entries)
	{
		std::cout << std::setw(longestWordSize + 1) << std::left << entry.first << '|' << std::setw(10) << std::right << entry.second << '\n';
	}
}

static constexpr auto code = R"(
bool isDelimiter(char c)
{
auto const isAllowedInName = isalnum(c) || c == '_';
return !isAllowedInName;
}
std::map<std::string, size_t> countWords(std::vector<std::string> const& words)
{
auto wordCount = std::map<std::string, size_t>{};
for (auto const& word : words)
{
++wordCount[word];
}
return wordCount;
}
WordCount getWordCount(std::string const& code)
{
auto symbols = std::vector<std::string>{};
boost::split(symbols, code, isDelimiter);
symbols.erase(std::remove(begin(symbols), end(symbols), ""), end(symbols));
auto const wordCount = countWords(symbols);
auto sortedWordCount = WordCount(begin(wordCount), end(wordCount));
std::sort(begin(sortedWordCount), end(sortedWordCount), [](auto const& p1, auto const& p2){ return p1.second > p2.second; });
return sortedWordCount;
}
})";

int main()
{
	print(getWordCount(code));
	system("pause");
}

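The output in the figure below shows that the word Count occurs most often, so the program being analyzed is most likely a counting program, which matches our starting point.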

[Figure: screenshot of the program output (image.png)]
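Note that the program counts Count and count as two different words, because std::map compares strings case-sensitively. If the two should be merged, one possible tweak is to lower-case every word before counting. The sketch below is not part of the original program: toLower and countWordsIgnoringCase are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns a lower-cased copy of `word`.
// The lambda takes unsigned char so std::tolower never sees a negative value.
std::string toLower(std::string word)
{
    std::transform(begin(word), end(word), begin(word), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return word;
}

// Hypothetical variant of countWords that ignores letter case,
// so that "Count" and "count" land in the same bucket.
std::map<std::string, std::size_t> countWordsIgnoringCase(std::vector<std::string> const& words)
{
    auto wordCount = std::map<std::string, std::size_t>{};
    for (auto const& word : words)
    {
        ++wordCount[toLower(word)];
    }
    return wordCount;
}
```

Whether merging is desirable depends on the goal: it gives a clearer picture of the dominant keywords, at the cost of no longer distinguishing a type name such as WordCount from a variable such as wordCount.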

posted @ 2018-12-25 15:03  MrYun