PTA 5-9 Huffman Codes (30) - Trees - Huffman Tree

Problem: http://pta.patest.cn/pta/test/16/exam/4/question/671

PTA - Data Structures and Algorithms (English) - 5-9

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:

For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:

7                             // number of nodes, num
A 1 B 1 C 1 D 3 E 3 F 6 G 6   // each node's data and its frequency (weight)
4                             // number of submissions to check, checkNum
A 00000                       // the next 4*7 lines give each character ch and its code s
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

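Analysis:

This problem tests Huffman coding, but there is no need to actually build the Huffman tree. As the problem notes: "The optimal solution is not necessarily generated by Huffman algorithm."

- Input: the first line is the number of nodes, num; the second line gives each node's data and its frequency (weight); the third line is the number of submissions to check, checkNum; from the fourth line on, each line gives a character ch and its code s.

- Output: for each submission, print Yes if its codes form a correct optimal code, otherwise print No.

- A submission is correct when it meets two conditions: (1) its WPL (weighted path length) is minimal; (2) no code is a prefix of any other code.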
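As a small worked illustration of condition (1), the WPL of a submission can be written as below. The notation follows the problem statement (f[i], code[i]); the vertical bars for "length of the code" are an assumption of this sketch, not notation from the original post.

```latex
\mathrm{WPL} = \sum_{i=1}^{N} f[i] \cdot \lvert code[i] \rvert

% "aaaxuaxz" with {'a'=0, 'x'=10, 'u'=110, 'z'=111}:
4 \cdot 1 + 2 \cdot 2 + 1 \cdot 3 + 1 \cdot 3 = 14

% Third sample submission (every code has 3 bits):
(1 + 1 + 1 + 3 + 3 + 6 + 6) \cdot 3 = 63 > 53
```

The value 53 is the minimum WPL for the sample frequencies, so the third submission fails condition (1) even though it is a valid prefix code.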

Solution adapted from: http://www.cnblogs.com/clevercong/p/4193370.html

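1) A map stores the input A 1 B 1 C 1 D 3 E 3 F 6 G 6, that is, each node's data and its frequency (weight).

2) Use priority_queue from the C++ standard library (#include <queue>). It is backed by a heap, so the element with the highest priority is always at the top; with greater<int> as the comparator, that element is the smallest weight.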

#include <map>
#include <queue>

map<char, int> myMap;
priority_queue<int, vector<int>, greater<int> >pq;  // min-heap

for(int i=0; i<num; i++)  // read each node's data c[i] and weight f[i]
{
    cin >> c[i] >> f[i];
    myMap[c[i]] = f[i];  // map character -> weight
    pq.push(f[i]);  // push the weight onto the queue
}
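To make the min-heap behaviour concrete, here is a minimal sketch. The helper name popAllAscending is hypothetical and not part of the original solution; it only shows that popping from a priority_queue with greater<int> yields the weights from smallest to largest.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: push all weights into a min-heap, then pop them.
// The result comes out in ascending order because top() is always the minimum.
std::vector<int> popAllAscending(const std::vector<int>& weights)
{
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
    for (int w : weights)
        pq.push(w);

    std::vector<int> out;
    while (!pq.empty())
    {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```

Note that the heap does not keep its contents fully sorted internally; only the top element is guaranteed to be the smallest.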

3) Compute the minimum WPL: repeatedly pop the two smallest elements from the priority_queue, add them, and push the sum back.

// compute the minimum WPL
int myWpl = 0;
while(!pq.empty())
{
    int myTop = pq.top();
    pq.pop();
    if(!pq.empty())
    {
        int myTop2 = pq.top();
        pq.pop();
        pq.push(myTop + myTop2);
        int m = myTop + myTop2;
        myWpl += m;  // adding m at every merge counts each leaf weight once per level, which equals path length * weight
    }
}
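The loop above can be wrapped into a function and checked against the numbers in the problem. This is a sketch under the assumption that the frequencies are passed as a vector; the name minWpl is hypothetical.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: minimum weighted path length for the given frequencies.
// Each merge of the two smallest weights adds one bit to every leaf below the
// new node, so summing the merged weights gives sum(weight * depth).
int minWpl(const std::vector<int>& freq)
{
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(freq.begin(), freq.end());
    int wpl = 0;
    while (pq.size() > 1)
    {
        int a = pq.top(); pq.pop();
        int b = pq.top(); pq.pop();
        wpl += a + b;
        pq.push(a + b);
    }
    return wpl;
}
```

For the sample frequencies 1 1 1 3 3 6 6 the merges are 2, 3, 6, 9, 12 and 21, which sum to 53. For "aaaxuaxz" (4, 2, 1, 1) the result is 14, matching the 14 bits mentioned in the problem statement.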

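4) Each submission needs to be sorted by code length, but a map cannot be reordered with sort(). We therefore store pair values in a vector: it plays the role of the map and can still be sorted with the standard sort().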

#include <algorithm>  // sort()
typedef pair<char, string> PAIR;  // PAIR stands for pair<char, string> (the code is a string)

// cmp(): custom comparison that defines the sort order
// here: order by code length, shortest first
bool cmp(const PAIR& x, const PAIR& y)
{
    return x.second.size() < y.second.size();
}
// vector + pair<,> imitates a map
vector<PAIR> checkVec;
checkVec.push_back(make_pair(ch, s));  // add an element to the vector
sort(checkVec.begin(), checkVec.end(), cmp);  // sort by code length
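The same ordering can also be written with a lambda instead of a named comparison function. The sketch below is one possible variant, not the original author's code; the function name sortByCodeLength is hypothetical, and std::stable_sort is swapped in for sort() so that codes of equal length keep their input order.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<char, std::string> PAIR;

// Hypothetical helper: sort (character, code) pairs by code length, shortest first.
// stable_sort preserves the relative order of codes that have the same length.
void sortByCodeLength(std::vector<PAIR>& v)
{
    std::stable_sort(v.begin(), v.end(),
        [](const PAIR& x, const PAIR& y) { return x.second.size() < y.second.size(); });
}
```

Stability is not required for the prefix check itself; it only makes the resulting order deterministic, which is convenient when debugging.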

5) Prefix check: use substr() to take the leading part of each longer code and compare it with the current code.

bool flag = true;  // condition 1 already holds: wpl is minimal
for(int i=0; i<num; i++)
{
    string tmp = checkVec[i].second;
    for(int j=i+1; j<num; j++)
    {
        if(checkVec[j].second.substr(0,tmp.size())==tmp)
            flag = false;
    }
}
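The check can be packaged as a standalone function. The sketch below follows the same idea (sort by length, then compare each code with the start of every longer or equal-length code) but returns as soon as a conflict is found and uses string::compare to avoid building temporary substrings. The name isPrefixFree is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if no code is a prefix of another code.
// Two identical codes also count as a conflict.
bool isPrefixFree(std::vector<std::string> codes)
{
    // Shortest first, so codes[i] is never longer than codes[j] for j > i.
    std::sort(codes.begin(), codes.end(),
        [](const std::string& x, const std::string& y) { return x.size() < y.size(); });

    for (std::size_t i = 0; i < codes.size(); i++)
    {
        for (std::size_t j = i + 1; j < codes.size(); j++)
        {
            // Compare the first codes[i].size() characters of codes[j] with codes[i].
            if (codes[j].compare(0, codes[i].size(), codes[i]) == 0)
                return false;
        }
    }
    return true;
}
```

For the codes in the problem statement, {0, 10, 110, 111} passes, while {0, 01, 011, 001} fails because 0 is a prefix of 01.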

Full code:

#include <iostream>
#include <algorithm>  // sort()
#include <map>
#include <queue>
#include <string>
#include <vector>
using namespace std;

typedef pair<char, string> PAIR;  // used with vector to imitate a map

bool cmp(const PAIR& x, const PAIR& y)  // tells sort() to order by code length
{
    return x.second.size() < y.second.size();
}

int main()
{
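    int num;
    cin >> num;
    vector<char> c(num);  // vectors instead of new[] so nothing leaks
    vector<int> f(num);
    map<char, int> myMap;  // maps each character to its weight
    // min-heap priority queue
    priority_queue<int, vector<int>, greater<int> >pq;

    for(int i=0; i<num; i++)  // read each character and its frequency (weight)
    {
        cin >> c[i] >> f[i];
        myMap[c[i]] = f[i];
        pq.push(f[i]);  // push the weight onto the priority queue
    }
    // compute the minimum WPL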
    int myWpl = 0;
    while(!pq.empty())
    {
        int myTop = pq.top();
        pq.pop();
        if(!pq.empty())
        {
            int myTop2 = pq.top();
            pq.pop();
            pq.push(myTop + myTop2);
            int m = myTop + myTop2;
            myWpl += m;
        }
    }
    // read the submissions
    int checkNum;  // number of submissions
    cin >> checkNum;
    for(int i=0; i<checkNum; i++)
    {
        int wpl = 0;
        char ch;
        string s;
        // vector + PAIR imitates a map while remaining sortable
        vector<PAIR> checkVec;
        for(int j=0; j<num; j++)
        {
            cin >> ch >> s;
            checkVec.push_back(make_pair(ch, s));  // store the character and its code
            wpl += s.size() * myMap[ch];
        }
        sort(checkVec.begin(), checkVec.end(), cmp);  // sort by code length
        if(wpl != myWpl)
        {
            cout << "No" << endl;
            continue;
        }
        else
        {
            bool flag = true;  // condition 1 already holds: wpl is minimal (wpl == myWpl)

            // condition 2: no code may be a prefix of another code, checked with substr()
            for(int i=0; i<num; i++)
            {
                string tmp = checkVec[i].second;
                for(int j=i+1; j<num; j++)
                {
                    if(checkVec[j].second.substr(0,tmp.size())==tmp)
                        flag = false;
                }
            }
            if(flag)
                cout << "Yes" << endl;
            else
                cout << "No" << endl;
        }
    }
    return 0;
}

posted @ 2015-09-21 16:08 claremz
