20192313 2020-2021-1 《数据结构与面向对象程序设计》 哈夫曼编码实践报告
实践内容
设有字符集:S={a,b,c,d,e,f,g,h,i,j,k,l,m,n.o.p.q,r,s,t,u,v,w,x,y,z}。
给定一个包含26个英文字母的文件,统计每个字符出现的概率,根据计算的概率构造一颗哈夫曼树。
并完成对英文文件的编码和解码。
要求:
(1)准备一个包含26个英文字母的英文文件(可以不包含标点符号等),统计各个字符的概率
(2)构造哈夫曼树
(3)对英文文件进行编码,输出一个编码后的文件
(4)对编码文件进行解码,输出一个解码后的文件
(5)撰写博客记录实验的设计和实现过程,并将源代码传到码云
(6)把实验结果截图上传到云班课
哈夫曼树相关知识
概念
-
路径:在一棵树中从一个结点往下到孩子或孙子结点之间的通路
-
结点的路径长度:从根节点到该节点的路径上分支的数目
-
树的路径长度:树中每个结点的路径长度之和
-
结点的权:给树中的结点赋予一个某种含义的值,则该值为该节点的权
-
结点的带权路径长度:结点的路径长度乘以结点的权
-
树的带权路径长度(WPL):树中所有叶子结点的带权路径长度 (Weight Path Length)
-
最优二叉树(哈夫曼树):带权路径长度最小的二叉树
构造哈夫曼树
- 给定n个权值{w1,w2,…wn},则构造出的哈夫曼树有n个叶子结点,构造过程如下:
1.将w1,w2…wn按从小到大排序,并将他们看做n棵只有一个结点的树组成的森林;
2.选出两个根节点权值最小的树合并,作为新树的左右子树,新树的根节点权值是左右子树根节点权值之和
3.从森林中删除选取的两棵树,将新树加入森林
4.重复2,3,直到只剩一棵树,所得即为最优二叉树
相关代码
所需实现代码的方法和要实现Comparable接口,比较权重好确定放的位置,编码是0还是1
public class HuffmanNode<T> implements Comparable<HuffmanNode<T>>{
private T name;
private double length;
private HuffmanNode<T> left;
private HuffmanNode<T> right;
String code;
public HuffmanNode(T name, double length){
this.name = name;
this.length = length;
code = "";
}
public T getName() {
return name;
}
public void setName(T name) {
this.name = name;
}
public double getLength() {
return length;
}
public void setLength(double length) {
this.length = length;
}
public HuffmanNode<T> getLeft() {
return left;
}
public void setLeft(HuffmanNode<T> left) {
this.left = left;
}
public HuffmanNode<T> getRight() {
return right;
}
public void setRight(HuffmanNode<T> right) {
this.right = right;
}
public String getCode(){
return code;
}
public void setCode(String str){
code = str;
}
@Override
public String toString(){
return "name:"+this.name+";length:"+this.length+";编码为: "+this.code;
}
@Override
//确定位置
public int compareTo(HuffmanNode<T> other) {
if(other.getLength() > this.getLength()){
return 1;
}
if(other.getLength() < this.getLength()){
return -1;
}
return 0;
}
}
第二步,创建树。当还有结点时,对结点进行排序,然后左孩子为数组中的个数-2的结点,右孩子为数组中的个数-1的结点(用数组实现树的那一章说过左右孩子在数组中的索引),赋予左孩子的编码为0,右孩子的编码为1,双亲结点则为左右孩子相加的权重(也就是左右孩子的概率和),把双亲结点加入链表中,从链表中把旧的左右孩子删除,直至链表中的结点只剩一个(也就是根结点)
public HuffmanNode<T> createTree(List<HuffmanNode<T>> nodes) {
while (nodes.size() > 1) {
Collections.sort(nodes);
HuffmanNode<T> left = nodes.get(nodes.size() - 2);
left.setCode(0 + "");
HuffmanNode<T> right = nodes.get(nodes.size() - 1);
right.setCode(1 + "");
HuffmanNode<T> parent = new HuffmanNode<T>(null, left.getLength() + right.getLength());
parent.setLeft(left);
parent.setRight(right);
nodes.remove(left);
nodes.remove(right);
nodes.add(parent);
}
return nodes.get(0);
}
确定相应字符的编码值
public List<HuffmanNode> breadth(HuffmanNode root) {
List<HuffmanNode> list = new ArrayList<HuffmanNode>();
Queue<HuffmanNode> queue = new ArrayDeque<HuffmanNode>();
if (root != null) {
queue.offer(root);
root.getLeft().setCode(root.getCode() + "0");
root.getRight().setCode(root.getCode() + "1");
}
while (!queue.isEmpty()) {
list.add(queue.peek());
HuffmanNode node = queue.poll();
if (node.getLeft() != null)
node.getLeft().setCode(node.getCode() + "0");
if (node.getRight() != null)
node.getRight().setCode(node.getCode() + "1");
if (node.getLeft() != null) {
queue.offer(node.getLeft());
}
if (node.getRight() != null) {
queue.offer(node.getRight());
}
}
return list;
}
建立编码文件,将要编码的字符串中的字符逐一与预先生成哈夫曼树时保存的字符编码对照表进行比较,找到以后,将字符的编码写入代码文件,直至所有的字符处理完为止
public static void main(String[] args) throws IOException {
List<String> list = new ArrayList<String>();
list.add("a");
list.add("b");
list.add("c");
list.add("d");
list.add("e");
list.add("f");
list.add("g");
list.add("h");
list.add("i");
list.add("j");
list.add("k");
list.add("l");
list.add("m");
list.add("n");
list.add("o");
list.add("p");
list.add("q");
list.add("r");
list.add("s");
list.add("t");
list.add("u");
list.add("v");
list.add("w");
list.add("x");
list.add("y");
list.add("z");
list.add(" ");
int[] number = new int[]{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
File file = new File("C:\\developer\\Huffman", "cyf.txt");
if (!file.exists()) {
file.createNewFile();
}
BufferedReader br = new BufferedReader(new FileReader(file));
String s;
String message = "";
while((s = br.readLine()) != null){
message += s;
}
得到每个字符(包括空格)出现的次数(类似于累加的形式)
for (int n = 0;n < result.length; n++){
for (int i = 0; i < 27; i++){
if (result[n].equals(list.get(i))){
number[i] += 1;
}
}
}
计算相应字母、空格出现的概率
List<HuffmanNode> nodeList = new ArrayList<HuffmanNode>();
DecimalFormat df = new DecimalFormat( "0.0000000");
double wei;
double sum = result.length;
for(int i = 0;i<27;i++){
wei = ((double) number[i]/sum);
System.out.println(list.get(i) + "出现" + number[i] + "次,概率为" + df.format(wei));
nodeList.add(new HuffmanNode(list.get(i),number[i]));
}
译码的基本思想是:读文件中的编码,并与原生成的哈夫曼编码表比较,遇到相等时即取出其对应的字符存入一个新串中
Collections.sort(nodeList);
HuffmanTree huffmanTree = new HuffmanTree();
HuffmanNode node = huffmanTree.createTree(nodeList);
List<HuffmanNode> inlist = new ArrayList<HuffmanNode>();
inlist = huffmanTree.breadth(node);
String[] name = new String[number.length];
String[] code = new String[number.length];
File file1 = new File("编码文件.txt");
File file2 = new File("解码文件.txt");
FileWriter fileWriter1 = new FileWriter(file1);
FileWriter fileWriter2 = new FileWriter(file2);
int temp = 0;
for(int e = 0;e<inlist.size();e++){
if(inlist.get(e).getName() != null){
System.out.println(inlist.get(e).getName()+"的编码为"+ inlist.get(e).getCode()+" ");
name[temp] = (String) inlist.get(e).getName();
code[temp] = inlist.get(e).getCode();
temp++;
}
}
String res = "";
for(int f = 0; f < sum; f++){
for(int j = 0;j<name.length;j++){
if(message.charAt(f) == name[j].charAt(0))
res += code[j];
}
}
System.out.println("编码后:"+ res);
List<String> putlist = new ArrayList<String>();
for(int i = 0;i < res.length();i++){
putlist.add(res.charAt(i)+"");
}
String string1 = "";
String string2 = "";
for(int h = putlist.size(); h > 0; h--){
string1 = string1 + putlist.get(0);
putlist.remove(0);
for(int i=0;i<code.length;i++){
if (string1.equals(code[i])) {
string2 = string2+""+ name[i];
string1 = "";
}
}
}
System.out.println("解码后:" + string2);
fileWriter1.write(res);
fileWriter2.write(string2);
fileWriter1.close();
fileWriter2.close();
}
private static int getFileLineCount(File file) {
int cnt = 0;
InputStream is = null;
try {
is = new BufferedInputStream(new FileInputStream(file));
byte[] c = new byte[1024];
int readChars = 0;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++cnt;
}
}
}
} catch (Exception ex) {
cnt = -1;
ex.printStackTrace();
} finally {
try {
is.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
return cnt;
}
测试结果
文本为
hello
nice to meet you
huffman is yyds
测试截图