
Here I introduce a way to compress text. When we face a large amount of data that shares a similar structure, we can exploit that structure to compress it better. In most cases, the compression method for a text should be chosen according to the structure of its data.

Dictionary methods fall into two categories: static dictionary methods and adaptive (dynamic) dictionary methods.

I will briefly describe the first category using a typical example.

If we know a lot about the structure of a text, a static dictionary method is a good fit. The method can be implemented in many ways depending on the situation, but one approach popular with programmers is digram coding (coding pairs of letters).

To make this clearer, here is a simple example.

Suppose a source emits signals made up of five letters: 'a', 'b', 'c', 'd' and 'r'. From what we know about the source, we build the following dictionary:

Code   Entry
000    a
001    b
010    c
011    d
100    r
101    ab
110    ac
111    ad
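The dictionary has eight entries, so every entry fits in a fixed-length code of ceil(log2 8) = 3 bits. The sketch below shows one way such a table could be generated from a list of entries. The helper name `build_fixed_length_dict` and assigning codes in list order are illustrative assumptions, not part of the original method.

```python
import math

def build_fixed_length_dict(entries):
    """Hypothetical helper: give each entry a fixed-length binary code.

    The code width is the smallest number of bits that can index every
    entry, i.e. ceil(log2(len(entries))), with a minimum of 1 bit.
    """
    width = max(1, math.ceil(math.log2(len(entries))))
    # format(index, '03b') turns 5 into '101', and so on.
    return {entry: format(index, f'0{width}b') for index, entry in enumerate(entries)}

# Five single letters followed by three frequent pairs, as in the table above.
entries = ['a', 'b', 'c', 'd', 'r', 'ab', 'ac', 'ad']
code_dict = build_fixed_length_dict(entries)
for entry, code in code_dict.items():
    print(code, entry)
```

With this ordering the generated codes match the table exactly; adding a ninth entry would push the width to 4 bits.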

Now let us encode the sequence 'abracadabra'.

First, the coder reads the first two letters, 'ab', and checks whether this pair is in the dictionary. If it is, the coder outputs the pair's code and moves on to the next two letters; otherwise, it outputs the code of the first letter alone and moves forward by only one letter. Here 'ab' is in the dictionary, so the coder outputs '101'. Next it reads 'ra', but 'ra' is not a key in the dictionary, so it outputs the code of 'r', which is '100', and moves forward one letter; the 'a' after 'r' and the following 'c' form the new pair 'ac', for which the coder outputs '110'. Then it reads 'ad' and outputs '111'. ...
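The rule in the walk-through above (try a pair first, fall back to a single letter) can be traced mechanically, which makes it easy to check the remaining steps against the final output. This is a minimal sketch; the function name `trace_digram_coding` is hypothetical, and it assumes every single letter of the input is in the dictionary.

```python
def trace_digram_coding(text, code_dict):
    """Hypothetical helper: encode text with a digram dictionary, recording each step."""
    steps = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in code_dict:
            # The pair is in the dictionary: emit its code and advance two letters.
            steps.append((pair, code_dict[pair]))
            i += 2
        else:
            # No matching pair: emit the first letter's code and advance one letter.
            steps.append((text[i], code_dict[text[i]]))
            i += 1
    return steps

code_dict = {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'r': '100',
             'ab': '101', 'ac': '110', 'ad': '111'}
steps = trace_digram_coding('abracadabra', code_dict)
for symbol, code in steps:
    print(f'{symbol!r:>5} -> {code}')
print(''.join(code for _, code in steps))
```

The trace shows seven codes: 'ab', 'r', 'ac', 'ad', 'ab', 'r' and finally the lone trailing 'a', which has no partner and is coded as '000'.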

  The output is '101100110111101100000'.
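To see what the pairs gain, compare this output with coding each letter separately using the same 3-bit codes: the 11 letters of 'abracadabra' would need 11 × 3 = 33 bits, while the digram output has 7 codes, i.e. 21 bits. A quick check, assuming plain letter-by-letter 3-bit coding as the baseline:

```python
# Compare digram coding with letter-by-letter coding, both using 3-bit codes.
text = 'abracadabra'
digram_output = '101100110111101100000'  # the digram coder's output for this text

letter_bits = 3 * len(text)       # every letter gets its own 3-bit code
digram_bits = len(digram_output)  # 7 codes of 3 bits each

print(f'letter by letter: {letter_bits} bits')
print(f'digram coding:    {digram_bits} bits')
print(f'saving:           {1 - digram_bits / letter_bits:.1%}')
```

It reports 33 bits against 21 bits, a saving of about 36% on this particular string.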

The routine, written in Python, is as follows:

def getCodeDict():
    # Static dictionary built from prior knowledge of the source:
    # five single letters plus three frequent pairs, each with a 3-bit code.
    return {
        'a': '000',
        'b': '001',
        'c': '010',
        'd': '011',
        'r': '100',
        'ab': '101',
        'ac': '110',
        'ad': '111',
    }

def compress(code):
    print('start to compress')
    codeDict = getCodeDict()
    result = []
    i = 0
    while i < len(code):
        pair = code[i:i + 2]
        if len(pair) == 2 and pair in codeDict:
            # found the pair in the dictionary: emit its code and move two steps
            result.append(codeDict[pair])
            i += 2
        else:
            # no matching pair: emit the first letter's code and move only one step
            # (a letter missing from the dictionary raises KeyError)
            result.append(codeDict[code[i]])
            i += 1
    print('complete to compress')
    return ''.join(result)

if __name__ == '__main__':
    signals = 'abracadabra'
    result = compress(signals)
    print(result)  # 101100110111101100000
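The routine above only compresses. Because every dictionary entry has a fixed 3-bit code, decompression can be sketched simply: cut the bit string into 3-bit chunks and look each one up in the inverted dictionary. The name `decompress` and the `width` parameter are hypothetical, and the width of 3 holds only for this eight-entry dictionary.

```python
def decompress(bits, code_dict, width=3):
    """Hypothetical decoder for the fixed-length digram code.

    Every dictionary entry has a code of exactly `width` bits, so the bit
    string can be split into equal chunks and each chunk looked up directly.
    """
    if len(bits) % width != 0:
        raise ValueError('bit string length is not a multiple of the code width')
    # Invert the dictionary: code -> letter or pair of letters.
    decode_dict = {code: entry for entry, code in code_dict.items()}
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return ''.join(decode_dict[chunk] for chunk in chunks)

# Same entries as getCodeDict() in the routine above.
code_dict = {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'r': '100',
             'ab': '101', 'ac': '110', 'ad': '111'}
print(decompress('101100110111101100000', code_dict))
```

Decoding '101100110111101100000' gives back 'abracadabra'. No extra markers are needed because all codes have the same length, so chunk boundaries are unambiguous.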

 

Posted by 转瞬之夏