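Huffman Tree

1. Introduction

Huffman coding is mainly used for data compression. A Huffman tree solves the problem of finding a binary code that is both as short as possible and unambiguous, and it is the foundation of Huffman coding. Given how often each character occurs, a Huffman tree yields a variable-length binary code. The resulting Huffman code is an optimal prefix code: it minimizes the length of the encoded message while guaranteeing that no character's code is a prefix of the code of another character in the same character set.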

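2. The concept of a Huffman tree

Among all binary trees whose leaves carry a given set of weights, a Huffman tree is one with the minimum weighted path length.

Some basic concepts:

Path length of a node: the number of edges on the path from the root to that node.

Path length of a tree: the sum of the path lengths of all leaf nodes in the tree.

Weighted path length of a node: the product of the node's path length and the node's weight.

Weighted path length of a tree: the weighted path length (WPL, Weighted Path Length) of a tree is the sum of the weighted path lengths of all its leaf nodes.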
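The definitions above translate directly into a few lines of code. The following is only a sketch: the `Leaf` struct and the function name `weightedPathLength` are hypothetical names introduced here, and the tree is assumed to be described simply by the weight and depth of each leaf.

```cpp
#include <vector>

// Hypothetical helper type: a leaf described by its weight and its
// depth (the number of edges between the root and the leaf).
struct Leaf {
    unsigned weight;
    unsigned depth;
};

// Weighted path length of a tree: the sum of weight * depth over all leaves.
unsigned long weightedPathLength(const std::vector<Leaf>& leaves) {
    unsigned long wpl = 0;
    for (const Leaf& leaf : leaves) {
        wpl += static_cast<unsigned long>(leaf.weight) * leaf.depth;
    }
    return wpl;
}
```

For instance, with illustrative leaf weights 7, 5, 4 and 2 at depths 1, 2, 3 and 3, the function returns 7*1 + 5*2 + 4*3 + 2*3 = 35, whereas placing all four leaves at depth 2 gives 36.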

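3. Constructing a Huffman tree

Algorithm:

(1) From the weights w1, w2, w3, ..., wn, build a set of n binary trees F = {T1, T2, ..., Tn}, where each Ti consists of a single node with weight wi.

(2) Select from F the two trees Ti and Tj whose roots have the smallest weights, and create a new root R with Ti and Tj as its left and right subtrees. The weight of R is wi + wj.

(3) Remove Ti and Tj from F and add the new tree rooted at R to F.

(4) Repeat steps (2) and (3) until only one tree remains in F.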
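The four steps can be sketched with a min-heap standing in for the set F. This is an illustration rather than the implementation listed later in this post: the `Node` struct, the function name `buildHuffmanTree` and the use of `std::priority_queue` are assumptions made for the example, and -1 is used here to mean "no parent / no child".

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical node layout: index -1 means "no child / no parent".
struct Node {
    unsigned weight;
    int parent;
    int left;
    int right;
};

// Builds a Huffman tree for the given weights. Leaves occupy indices
// 0..n-1 and the root is the last element of the returned vector.
std::vector<Node> buildHuffmanTree(const std::vector<unsigned>& weights) {
    std::vector<Node> nodes;
    using Entry = std::pair<unsigned, int>;  // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> forest;

    // Step (1): one single-node tree per weight.
    for (unsigned w : weights) {
        nodes.push_back(Node{w, -1, -1, -1});
        forest.push(Entry(w, static_cast<int>(nodes.size()) - 1));
    }
    // Steps (2)-(4): merge the two lightest trees until one remains.
    while (forest.size() > 1) {
        Entry a = forest.top(); forest.pop();
        Entry b = forest.top(); forest.pop();
        int root = static_cast<int>(nodes.size());
        nodes.push_back(Node{a.first + b.first, -1, a.second, b.second});
        nodes[a.second].parent = root;
        nodes[b.second].parent = root;
        forest.push(Entry(nodes[root].weight, root));
    }
    return nodes;
}
```

For n weights the result always has 2n-1 nodes, and the weight of the root equals the sum of all input weights. The heap makes each selection O(log n); a linear scan over the forest is simpler but costs O(n) per selection.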

[Figure: Huffman tree construction diagram]

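4. Huffman coding and its implementation

A Huffman code is read off a Huffman tree. The basic idea is that frequently used symbols get shorter codes and rarely used symbols get longer ones, which minimizes the total number of bits in the encoded message and thereby achieves compression.

For example, take the message: ABCD ABCD CBD BD B DBCB

The message uses the character set {A, B, C, D}. B occurs most often (7 times), followed by D (5 times), then C (4 times) and A (2 times). Encoding 4 characters with a fixed-length code takes at least two bits per character. First, without compression, give each character a two-bit code, A:00 B:01 C:10 D:11. Encoding the message then gives:

00011011 00011011 100111 0111 01 11011001 (encoded length: 36 bits)

With Huffman coding the codes are B:0 D:10 C:110 A:111

Encoded message: 111011010 111011010 110010 010 0 1001100 (encoded length: 35 bits)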
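The two bit counts can be checked mechanically. The sketch below assumes a hypothetical helper `encode` that looks each character up in a code table and skips characters without an entry (the spaces in the sample message).

```cpp
#include <map>
#include <string>

// Concatenates the code of every symbol in the message. Characters that
// have no entry in the table (the spaces here) are skipped.
std::string encode(const std::string& message,
                   const std::map<char, std::string>& table) {
    std::string bits;
    for (char ch : message) {
        auto it = table.find(ch);
        if (it != table.end()) {
            bits += it->second;
        }
    }
    return bits;
}
```

Calling `encode` on "ABCD ABCD CBD BD B DBCB" with the fixed-length table A:00 B:01 C:10 D:11 yields a string of 36 bits, and with the table B:0 D:10 C:110 A:111 a string of 35 bits.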

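The Huffman code of a message can be obtained by building a Huffman tree as follows:

(1) Count how often each symbol occurs. In the example above the frequencies of A, B, C and D are {2/18, 7/18, 4/18, 5/18}.

(2) Arrange these frequencies from left to right in ascending order.

(3) Each time, take the two smallest values as the two children of a new node whose value is their sum. The two chosen nodes no longer take part in the comparison; the new node does.

(4) Repeat the previous step until a root with value 1 is obtained.

(5) Label each left branch of the resulting binary tree 0 and each right branch 1. Concatenating the 0s and 1s met on the way from the root down to a leaf gives the code of that symbol.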
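Steps (1) to (5) can be sketched end to end as below. This is an illustrative version, not the listing that follows later: `CodeNode`, `assignCodes` and `huffmanCodes` are hypothetical names, raw occurrence counts are used instead of fractions (dividing by the total does not change the order of the weights), and a min-heap replaces the sorted row of step (2).

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node layout; -1 marks a missing child.
struct CodeNode {
    unsigned weight;
    int left;
    int right;
    char symbol;  // meaningful only for leaves
};

// Step (5): walk down from `index`, appending '0' for a left branch and
// '1' for a right branch; at a leaf the accumulated prefix is the code.
static void assignCodes(const std::vector<CodeNode>& nodes, int index,
                        const std::string& prefix,
                        std::map<char, std::string>& codes) {
    const CodeNode& n = nodes[index];
    if (n.left < 0 && n.right < 0) {
        // A lone symbol has an empty path; give it the one-bit code "0".
        codes[n.symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(nodes, n.left, prefix + '0', codes);
    assignCodes(nodes, n.right, prefix + '1', codes);
}

std::map<char, std::string> huffmanCodes(const std::map<char, unsigned>& freq) {
    std::map<char, std::string> codes;
    if (freq.empty()) return codes;
    std::vector<CodeNode> nodes;
    using Entry = std::pair<unsigned, int>;  // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : freq) {
        nodes.push_back(CodeNode{kv.second, -1, -1, kv.first});
        heap.push(Entry(kv.second, static_cast<int>(nodes.size()) - 1));
    }
    // Steps (3)-(4): repeatedly merge the two smallest weights.
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        nodes.push_back(CodeNode{a.first + b.first, a.second, b.second, '\0'});
        heap.push(Entry(a.first + b.first, static_cast<int>(nodes.size()) - 1));
    }
    assignCodes(nodes, static_cast<int>(nodes.size()) - 1, "", codes);
    return codes;
}
```

For the counts A:2, B:7, C:4, D:5 this produces code lengths 3, 1, 3 and 2, for a total of 35 bits. Which of two equally deep symbols gets the 0 branch depends on tie-breaking and child order, so the exact bit patterns of A and C may be swapped relative to the table B:0 D:10 C:110 A:111; the encoded length is the same either way.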

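The more skewed the symbol frequencies are, that is, the more a few symbols dominate the message, the higher the compression ratio Huffman coding achieves. Huffman coding is used not only for text but also in areas such as image coding. Its advantages are:

a) For a given message it produces the shortest encoding among all prefix codes that encode one symbol at a time.

b) For any two distinct characters A and B, the code of A is never a prefix of the code of B. With the character set {C1, C2, ..., Cn}, every character sits at a leaf, so the path from the root to a leaf Ci never passes through another leaf and therefore cannot be a prefix of another code. As a result, no separator is needed between encoded characters.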
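The prefix property in b) is what makes decoding without separators possible. A minimal sketch, assuming a hypothetical `decode` helper and a table that maps code strings back to characters:

```cpp
#include <map>
#include <string>

// Decodes a bit string by growing a candidate code one bit at a time.
// Because no code is a prefix of another, the first match is the only
// possible one, so no separators are needed.
std::string decode(const std::string& bits,
                   const std::map<std::string, char>& codeToSymbol) {
    std::string text;
    std::string current;
    for (char bit : bits) {
        current += bit;
        auto it = codeToSymbol.find(current);
        if (it != codeToSymbol.end()) {
            text += it->second;
            current.clear();
        }
    }
    return text;  // bits left over in `current` would indicate truncated input
}
```

With the table B:0 D:10 C:110 A:111, the bits 111011010 decode to ABCD. Looking up growing strings in a map keeps the sketch short; walking the tree bit by bit avoids the repeated string lookups.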

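As described above, producing a Huffman encoding requires two scans of the original data. The first scan counts how often each value occurs; the Huffman tree is then built from those counts, and the second scan encodes the data. Because a binary tree has to be built and traversed to generate the codes, compression and decompression are relatively slow.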
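The first scan amounts to a frequency count. A compact sketch, in which the function name `countSymbols` is hypothetical and spaces are assumed to be separators rather than symbols, as in the sample message:

```cpp
#include <map>
#include <string>

// First pass: count how often each symbol occurs.
// Assumption: spaces only separate groups and are not encoded.
std::map<char, unsigned> countSymbols(const std::string& text) {
    std::map<char, unsigned> counts;
    for (char ch : text) {
        if (ch != ' ') {
            ++counts[ch];
        }
    }
    return counts;
}
```

For "ABCD ABCD CBD BD B DBCB" the counts are A:2, B:7, C:4, D:5, which sum to 18.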

HuffmanTree.cpp


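#include <iostream>
#include <cstring>   // strlen, strcpy
#include "HuffmanTree.h"

using namespace std;
#define  maxLength  100

int main()
{
    HuffmanTree *tree = NULL;
    HuffmanCode code = NULL;   // code[1..count] will hold the code strings
    char data[maxLength];
    char *letter;
    int *weight;
    int count;
    cout<<"Enter a line of text:"<<endl;
    cin.getline(data, maxLength);   // read the whole line, bounded by the buffer size
    cout<<endl;
    OutputWeight(data,strlen(data),&letter,&weight,&count);
    HuffmanCodeing(tree, &code, weight, count);
    cout<<"Char  Frequency  Code"<<endl;
    // code stays NULL when fewer than two distinct characters were entered
    for(int i=0;i<count && code!=NULL;i++)
    {
        cout<<letter[i]<<"     ";
        cout<<weight[i]/10.0<<"%\t";   // weights are stored in thousandths
        cout<<code[i+1]<<endl;         // codes are indexed from 1
    }
    cout<<endl;
    // release everything allocated by OutputWeight and HuffmanCodeing
    for(int i=1;i<=count && code!=NULL;i++)
        delete [] code[i];
    delete [] code;
    delete [] letter;
    delete [] weight;
    return 0;
}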

/* Among nodes 1..count that have no parent yet, find the two with the
   smallest weights; store their indices in *s1 (smallest) and *s2 */
void Select(HuffmanTree *tree, int count, int *s1, int *s2)
{
    *s1 = 0;
    *s2 = 0;
    for(int i=1; i<=count; i++)
    {
        if(tree[i].parent != 0)
            continue;                   // already merged into a larger subtree
        if(*s1 == 0 || tree[i].weight < tree[*s1].weight)
        {
            *s2 = *s1;                  // the old minimum becomes the second smallest
            *s1 = i;
        }
        else if(*s2 == 0 || tree[i].weight < tree[*s2].weight)
        {
            *s2 = i;
        }
    }
}

/* Huffman coding function: builds the tree from weight[0..count-1] and
   stores the code strings in (*code)[1..count] */
void HuffmanCodeing(HuffmanTree *tree, HuffmanCode *code, int *weight, int count)
{
    int i;
    int s1,s2;
    int totalLength;
    char *cd;
    unsigned int c;
    unsigned int f;
    int start;
    if(count<=1)  return;
    totalLength = count*2-1;
    tree = new HuffmanTree[totalLength+1];   // index 0 is unused

    // leaves occupy 1..count; every node starts without parent or children
    for(i=1;i<=totalLength;i++)
    {
        tree[i].parent = 0;
        tree[i].lChild = 0;
        tree[i].rChild = 0;
        tree[i].weight = (i<=count) ? weight[i-1] : 0;
    }
    // build the Huffman tree: internal nodes occupy count+1..totalLength
    for(i=count+1;i<=totalLength;++i)
    {
        Select(tree,i-1,&s1,&s2);
        tree[s1].parent = i;
        tree[s2].parent = i;
        tree[i].lChild = s1;
        tree[i].rChild = s2;
        tree[i].weight = tree[s1].weight + tree[s2].weight;
    }

    // generate the codes by walking from each leaf up to the root
    (*code) = new char*[count+1];
    (*code)[0] = NULL;
    cd = new char[count];          // a code has at most count-1 bits
    cd[count-1]='\0';
    for(i=1;i<=count;++i)
    {
        start=count-1;
        for(c=i,f=tree[i].parent;f!=0;c=f,f=tree[f].parent)
        {
            if(tree[f].lChild == c)
                cd[--start] = '0';
            else
                cd[--start] = '1';
        }
        // copy the finished code once the root has been reached
        (*code)[i] = new char[count - start];
        strcpy((*code)[i], &cd[start]);
    }
    delete [] tree;
    delete [] cd;
}

/* Look for a character among the first count characters of a string;
   return its position if found, -1 otherwise */
int LookFor(char *str, char letter, int count)
{
    int i;
    for(i=0;i<count;i++)
    {
        if(str[i] == letter)
            return i;
    }
    return -1;
}
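
/* Collect the distinct characters of data in *letter, their relative
   frequencies (in thousandths) in *weight, and their number in *count */
void OutputWeight(char *data, int length, char **letter, int **weight, int *count)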
{
    int i;
    char *letterArray = new char[length];
    int *letterCount = new int[length];
    int allCount=0;
    int index;
    int sum=0;
    float persent = 0;
    for(i=0;i<length;i++)
    {
        if(i==0)
        {
            letterArray[0] = data[i];
            letterCount[0] = 1;
            allCount++;
        }
        else
        {
            index=LookFor(letterArray,data[i],allCount);
            if(index == -1)
            {
                letterArray[allCount] = data[i];
                letterCount[allCount]=1;
                allCount++;
            }
            else
            {
                letterCount[index]++;
            }
        }
    }
    for(i=0;i<allCount;i++)
    {
        sum=sum+letterCount[i];
    }
    *weight = new int[allCount];
    *letter = new char[allCount];
    for(i=0;i<allCount;i++)
    {
        persent = (float)letterCount[i]/(float)sum;
        (*weight)[i] = (int)(1000 * persent);
        (*letter)[i]=letterArray[i];
    }
    *count = allCount;
    delete [] letterArray;
    delete [] letterCount;
}

HuffmanTree.h


#if !defined(HUFFMANTREE_H)
#define HUFFMANTREE_H

/*
   Huffman tree node; nodes are stored in an array and refer to each
   other by index, with 0 meaning "none"
*/
class   HuffmanTree
{
    public:
        unsigned int weight;
        unsigned int parent;
        unsigned int lChild;
        unsigned int rChild;
};

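/* A Huffman code table: an array of C strings, one per character */
typedef char **HuffmanCode;

/* Among the nodes that have no parent yet, select the two with the smallest weights and store their indices in s1 and s2 */
void Select(HuffmanTree *tree, int count, int *s1, int *s2);
/* Huffman coding function */
void HuffmanCodeing(HuffmanTree *tree, HuffmanCode *code, int *weight, int count);
/* Look for a character in a string; return its position if found, -1 otherwise */
int LookFor(char *str, char letter, int count);
/* Collect the distinct characters of data and their relative frequencies */
void OutputWeight(char *data, int length, char **letter, int **weight, int *count);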

#endif

posted on 2010-10-25 22:35 by Blanche