[IR] Arithmetic Coding
Statistical methods的除了huffman外的另一种常见压缩方式。
Huffman coding的非连续数值特性成为了无法达到香农极限的先天无法弥补的缺陷,但Arithmetic coding给出了better solution。
当然,最好的东西往往伴随着各种专利。
2012年之后,貌似可以有一部分可以用了呢。
Encoding:
每个字符分配一个Range,size就是其比例(Probability)。
Algorithm:
Set low to 0.0
Set high to 1.0
While there are still input symbols do
get an input symbol
code_range = high - low.
high = low + range*high_range(symbol)
low = low + range*low_range (symbol)
End of While
output low or a number within the range
Decoding:
第四行:0.72167752, Low:0.6, High:0.8, 那么,下一个char会是什么?
range=0.8-0.6=0.2
encoded number = (0.72167752-0.6)/0.2 = 0.6083876 --> L
Algorithm:
get encoded number
Do
find symbol whose range straddles the encoded number
output the symbol
range = symbol high value - symbol low value
subtract symbol low value from encoded number
divide encoded number by range
until no more symbols
优化技巧:
其实,0.45即能解码成功。
大大地提高了压缩率。
Bzip2 and JPG use Huffman as AC protected by patents
PackJPG using AC shows 25% of size saving
关于专利:
U.S. Patent 4,122,440 — (IBM) Filed 4 March 77, Granted 24 October 78 (Now expired)
U.S. Patent 4,286,256 — (IBM) Granted 25 August 81 (Now expired)
U.S. Patent 4,467,317 — (IBM) Granted 21 August 84 (Now expired)
U.S. Patent 4,652,856 — (IBM) Granted 4 February 86 (Now expired)
U.S. Patent 4,891,643 — (IBM) Filed 15 September 86, granted 2 January 90 (Now expired)
U.S. Patent 4,905,297 — (IBM) Filed 18 November 88, granted 27 February 90 (Now expired)
U.S. Patent 4,933,883 — (IBM) Filed 3 May 88, granted 12 June 90 (Now expired)
U.S. Patent 4,935,882 — (IBM) Filed 20 July 88, granted 19 June 90 (Now expired)
U.S. Patent 4,989,000 — Filed 19 June 89, granted 29 January 91 (Now expired)
U.S. Patent 5,099,440 — (IBM) Filed 5 January 90, granted 24 March 92 (Now expired)
U.S. Patent 5,272,478 — (Ricoh) Filed 17 August 92, granted 21 December 93 (Now expired)
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· 上周热点回顾(3.3-3.9)
· winform 绘制太阳,地球,月球 运作规律