PTA 5-9 Huffman Codes (30) - Trees - Huffman Tree
PTA - Data Structures and Algorithms (English) - 5-9
In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.
Output Specification:
For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
Sample Input:
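7 // number of nodes (num)
A 1 B 1 C 1 D 3 E 3 F 6 G 6 // each node's data and its frequency (weight)
4 // number of submissions to check (checkNum)
A 00000 // the following 4*7 lines give each node's character ch and its code s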
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No
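This problem is about Huffman coding, but there is no need to actually build the Huffman tree. As the problem notes: the optimal solution is not necessarily generated by the Huffman algorithm.
- Input: the first line is the number of nodes num; the second line gives each node's data and its frequency (weight); the third line is the number of submissions checkNum; from the fourth line on, each line gives a node's character ch and its code s.
- Output: for each submission, print Yes if its codes form a correct optimal code, otherwise print No.
- A submission is correct only if it meets two conditions: ① its WPL (weighted path length) equals the minimum possible value; ② no code is a prefix of any other code.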
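To make condition ① concrete, here is a minimal sketch that computes the WPL of a single submission as the sum of frequency times code length. The function name `submissionWpl` and the container types are assumptions chosen for illustration; they are not part of the original solution.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: WPL of a submission = sum over characters of
// (frequency of the character) * (length of its code).
int submissionWpl(const std::map<char, int>& freq,
                  const std::vector<std::pair<char, std::string> >& codes)
{
    int total = 0;
    for (const auto& entry : codes) {
        // at() throws std::out_of_range if the character has no frequency entry
        total += freq.at(entry.first) * static_cast<int>(entry.second.size());
    }
    return total;
}
```

For the sample weights, the first submission costs 5 + 5 + 4 + 9 + 6 + 12 + 12 = 53, while the third submission (every code 3 bits long) costs 3 * 21 = 63, so the third one fails condition ① before any prefix check is needed.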
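1) A map stores the pairs A 1 B 1 C 1 D 3 E 3 F 6 G 6, that is, each node's data and its frequency (weight).
2) Use the priority queue from the C++ standard library, priority_queue, declared in the header <queue>. It is implemented on top of a heap, so elements always come out in priority order regardless of the order in which they were pushed.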
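#include <map>
#include <queue>
#include <vector>
#include <functional> // greater

map<char, int> myMap;
priority_queue<int, vector<int>, greater<int> > pq; // min-heap
for (int i = 0; i < num; i++) // read each node's character c[i] and weight f[i]
{
    cin >> c[i] >> f[i];
    myMap[c[i]] = f[i]; // map the character to its weight
    pq.push(f[i]);      // push the weight into the queue
}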
3) Compute the minimum WPL: repeatedly take the two smallest elements out of the priority_queue, add them, and push the sum back.
// Compute the minimum WPL
int myWpl = 0;
while (!pq.empty())
{
    int myTop = pq.top();
    pq.pop();
    if (!pq.empty())
    {
        int myTop2 = pq.top();
        pq.pop();
        pq.push(myTop + myTop2);
        int m = myTop + myTop2;
        myWpl += m; // each leaf weight is re-added once per merge above it, which equals weight * path length
    }
}
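The same computation can be packaged as a standalone function, which makes it easy to check by hand. This is a sketch under assumptions: the name `optimalWpl` and the `vector<int>` parameter are made up for illustration, and the loop condition `heap.size() > 1` is a slightly different way of writing the loop above, not the original code.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: minimum WPL for the given weights, computed without
// building the Huffman tree. Every merge adds the merged sum to the total.
int optimalWpl(const std::vector<int>& weights)
{
    // min-heap initialised from the whole weight list
    std::priority_queue<int, std::vector<int>, std::greater<int> > heap(
        weights.begin(), weights.end());
    int wpl = 0;
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop(); // smallest remaining weight
        int b = heap.top(); heap.pop(); // second smallest
        wpl += a + b;                   // cost of this internal node
        heap.push(a + b);
    }
    return wpl;
}
```

For the sample weights 1 1 1 3 3 6 6 the merged sums are 2, 3, 6, 9, 12 and 21, which total 53. For the frequencies 4 2 1 1 of "aaaxuaxz" the result is 14, matching the 14 bits mentioned in the problem statement.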
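4) Each submission has to be sorted by code length. std::sort cannot be applied to a map, so we store pair values in a vector instead: this keeps the key/value pairing of a map while still allowing the vector to be sorted.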
#include <algorithm> // sort()
#include <string>
#include <vector>

typedef pair<char, string> PAIR; // PAIR stands for pair<char, string> (the code is a string)

// cmp(): custom ordering for sort(); here, shorter codes come first
bool cmp(const PAIR& x, const PAIR& y)
{
    return x.second.size() < y.second.size();
}

// vector + pair<,> in place of a map
vector<PAIR> checkVec;
checkVec.push_back(make_pair(ch, s));        // add an element to the vector
sort(checkVec.begin(), checkVec.end(), cmp); // sort by code length
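5) Check the prefix condition: use substr to take the leading part of each longer code and compare it with the current (shorter or equal-length) code.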
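bool flag = true; // condition 1 (minimum WPL) already holds
for (int i = 0; i < num; i++)
{
    string tmp = checkVec[i].second;
    for (int j = i + 1; j < num; j++)
    {
        // checkVec[j] is at least as long as tmp, so tmp is a prefix iff the leading part matches
        if (checkVec[j].second.substr(0, tmp.size()) == tmp)
            flag = false;
    }
}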
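As an alternative to the pairwise loop, the prefix condition can also be checked by sorting the codes lexicographically and comparing only neighbours: if some code is a prefix of another, it is also a prefix of the code that immediately follows it in sorted order. The sketch below is not the approach used in this post; the name `isPrefixFree` is an assumption for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if no code is a prefix of (or equal to) another.
bool isPrefixFree(std::vector<std::string> codes) // taken by value, sorted locally
{
    std::sort(codes.begin(), codes.end()); // lexicographic order
    for (std::size_t i = 0; i + 1 < codes.size(); i++) {
        const std::string& a = codes[i];
        const std::string& b = codes[i + 1];
        // compare() looks at the first a.size() characters of b (or all of b if shorter)
        if (b.compare(0, a.size(), a) == 0)
            return false; // b starts with a
    }
    return true;
}
```

On the sample, the first submission passes, while the fourth fails because E = 00 is a prefix of 00000. The sort makes this O(N log N) string comparisons instead of O(N²), which hardly matters for N ≤ 63 but keeps the check short.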
The complete program:
#include <iostream>
#include <algorithm>  // sort()
#include <functional> // greater
#include <map>
#include <queue>
#include <string>
#include <vector>
using namespace std;

typedef pair<char, string> PAIR; // used with vector in place of a map

bool cmp(const PAIR& x, const PAIR& y) // ordering for sort(): shorter codes first
{
    return x.second.size() < y.second.size();
}

int main()
{
    int num;
    cin >> num;
    vector<char> c(num);
    vector<int> f(num);
    map<char, int> myMap; // maps each node's character to its weight
    // min-heap of weights
    priority_queue<int, vector<int>, greater<int> > pq;
    for (int i = 0; i < num; i++) // read each node and its frequency (weight)
    {
        cin >> c[i] >> f[i];
        myMap[c[i]] = f[i];
        pq.push(f[i]); // push the weight into the priority queue
    }

    // Compute the minimum WPL
    int myWpl = 0;
    while (!pq.empty())
    {
        int myTop = pq.top();
        pq.pop();
        if (!pq.empty())
        {
            int myTop2 = pq.top();
            pq.pop();
            pq.push(myTop + myTop2);
            myWpl += myTop + myTop2;
        }
    }

    // Read the submissions
    int checkNum; // number of submissions
    cin >> checkNum;
    for (int i = 0; i < checkNum; i++)
    {
        int wpl = 0;
        char ch;
        string s;
        vector<PAIR> checkVec; // vector + PAIR in place of a map, so it can be sorted
        for (int j = 0; j < num; j++)
        {
            cin >> ch >> s;
            checkVec.push_back(make_pair(ch, s)); // store the character and its code
            wpl += static_cast<int>(s.size()) * myMap[ch];
        }
        if (wpl != myWpl) // condition 1 fails: WPL is not the minimum
        {
            cout << "No" << endl;
            continue;
        }
        sort(checkVec.begin(), checkVec.end(), cmp); // sort by code length

        // Condition 2: no code may be a prefix of another code
        bool flag = true;
        for (int a = 0; a < num; a++)
        {
            string tmp = checkVec[a].second;
            for (int b = a + 1; b < num; b++)
            {
                if (checkVec[b].second.substr(0, tmp.size()) == tmp)
                    flag = false;
            }
        }
        cout << (flag ? "Yes" : "No") << endl;
    }
    return 0;
}