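Using Python's zlib module
What the zlib module is for:
Compressing data so that it takes up less space when stored on disk, in memory, or on other devices.
1. Compressing and decompressing data in memory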
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zlib
import binascii

original_data = b'This is the original text.'
print('Original data: length : {}, content : {}'.format(len(original_data), original_data))

# Compress the data
compressed_data = zlib.compress(original_data)
# binascii.hexlify renders the bytes as hexadecimal so they can be displayed
print('Compressed data: length : {}, content : {}'.format(len(compressed_data), binascii.hexlify(compressed_data)))

# Decompress the data
decompressed_data = zlib.decompress(compressed_data)
print('Decompressed data: length : {}, content : {}'.format(len(decompressed_data), decompressed_data))

Output

[root@ mnt]# python3 zlib_memory.py
Original data: length : 26, content : b'This is the original text.'
Compressed data: length : 32, content : b'789c0bc9c82c5600a2928c5485fca2ccf4ccbcc41c8592d48a123d007f2f097e'
Decompressed data: length : 26, content : b'This is the original text.'

Note that compressing a small input does not necessarily make it smaller: here 26 bytes grew to 32.
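Part of the reason a short input grows is the fixed wrapper that the zlib format puts around the DEFLATE data: a 2-byte header in front and a 4-byte Adler-32 checksum of the uncompressed data at the end. The following is a minimal sketch that takes a compressed stream apart to show this; the variable names are only for illustration.

```python
import struct
import zlib

data = b'This is the original text.'
stream = zlib.compress(data)

header = stream[:2]    # CMF and FLG bytes of the zlib wrapper
trailer = stream[-4:]  # Adler-32 of the uncompressed data, big-endian
body = stream[2:-4]    # the raw DEFLATE data in between

# Decode the 4 trailer bytes as an unsigned big-endian integer
(stored_checksum,) = struct.unpack('>I', trailer)

print('header :', header.hex())
print('body   :', len(body), 'bytes')
print('trailer matches adler32:', stored_checksum == zlib.adler32(data))

# A negative wbits value tells zlib to expect raw DEFLATE data with no wrapper
print('body alone decompresses:', zlib.decompress(body, -15) == data)
```

So 6 bytes of every zlib stream are overhead regardless of the input size, which matters most when the input is only a few dozen bytes long.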
2. Finding how large the input must be before compression pays off
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zlib

original_data = b'This is the original text.'

template = '{:>15} {:>15}'
print(template.format('Original', 'Compressed'))
print(template.format('-' * 15, '-' * 15))

for i in range(5):
    data = original_data * i  # repeat the input i times
    compressed = zlib.compress(data)  # compress it
    # Conditional expression: mark the row with * when the compressed form is larger than the input
    highlight = '*' if len(data) < len(compressed) else ''
    print(template.format(len(data), len(compressed)), highlight)

Output

[root@ mnt]# python3 zlib_lengths.py
       Original      Compressed
--------------- ---------------
              0               8 *
             26              32 *
             52              35
             78              35
            104              36

Rows marked with * are larger after compression. From 52 bytes onward, compression starts to pay off.
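One way to act on this result is to compress a payload only when doing so actually makes it smaller, and to record the choice in a marker byte. The sketch below is one possible approach, not a standard format: the `pack` and `unpack` helpers and the two marker values are hypothetical names invented for this example.

```python
import zlib

RAW = b'\x00'       # hypothetical marker: payload stored as-is
DEFLATED = b'\x01'  # hypothetical marker: payload is zlib-compressed


def pack(data):
    """Prefix data with a marker byte, compressing only if that is smaller."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return DEFLATED + compressed
    return RAW + data


def unpack(blob):
    """Reverse pack(): inspect the marker and decompress if needed."""
    marker, payload = blob[:1], blob[1:]
    if marker == DEFLATED:
        return zlib.decompress(payload)
    if marker == RAW:
        return payload
    raise ValueError('unknown marker: {!r}'.format(marker))


short = b'This is the original text.'
longer = short * 4
for sample in (short, longer):
    blob = pack(sample)
    kind = 'compressed' if blob[:1] == DEFLATED else 'stored'
    print(len(sample), '->', len(blob), kind)
```

The cost of this design is one extra byte per payload, plus the time spent on a compression attempt whose result may be thrown away.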
3. Compressing data at different compression levels
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zlib

original_data = b'This is the original text.' * 1024

template = '{:>15} {:>15}'
print(template.format('Level', 'Size'))
print(template.format('-' * 15, '-' * 15))

for i in range(0, 10):
    data = zlib.compress(original_data, i)  # compress at level i
    print(template.format(i, len(data)))

Output

[root@python-mysql mnt]# python3 zlib_compresslevel.py
          Level            Size
--------------- ---------------
              0           26635
              1             215
              2             215
              3             215
              4             118
              5             118  <== recommended
              6             118  <== recommended
              7             118
              8             118
              9             118
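The table above only shows size. Higher levels generally trade more CPU time for a smaller result, so it can help to measure both on data that resembles your own. The following is a rough sketch using `time.perf_counter`; the timings depend heavily on the machine and on the input, so treat the numbers as indicative only.

```python
import time
import zlib

data = b'This is the original text.' * 1024

results = {}  # level -> (compressed size, seconds taken)
for level in range(10):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    results[level] = (len(compressed), elapsed)
    print('level {}: {:>6} bytes, {:.6f} s'.format(level, len(compressed), elapsed))
```

A single run on such a small, repetitive input is noisy. Repeating each measurement several times and using real data would give a more reliable picture.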
4. Incremental compression and decompression with zlib
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zlib
import binascii

compressor = zlib.compressobj(1)
chunks = []  # every piece of compressed output, in order

with open('content.txt', 'rb') as infile:
    while True:
        block = infile.read(64)  # read 64 bytes at a time
        if not block:
            break
        compressed = compressor.compress(block)
        if compressed:
            chunks.append(compressed)
            print('Compressed: {}'.format(binascii.hexlify(compressed)))
        else:
            print('Buffering...')

remaining = compressor.flush()  # flush returns whatever is still buffered
chunks.append(remaining)
print('Flushed: {}'.format(binascii.hexlify(remaining)))

# Decompress everything in one call. Incremental compression does not drop the
# zlib header: it is the b'7801' returned by the first compress() call. The
# complete stream is therefore all of the chunks joined together, not only the
# flushed part.
decompress_data = zlib.decompress(b''.join(chunks))
print(decompress_data)

Output

[root@ mnt]# python3 zlib_incremental.py
Compressed: b'7801'
Buffering...
Buffering...
Buffering...
Buffering...
Buffering...
Flushed: b'55904b6ac4400c44f73e451da0f129b20c2110c85e696b8c40ddedd167ce1f7915025a087daa9ef4be8c07e4f21c38962e834b800647435fd3b90747b2810eb9c4bbcc13ac123bded6e4bef1c91ee40d3c6580e3ff52aad2e8cb2eb6062dad74a89ca904cbb0f2545e0db4b1f2e01955b8c511cb2ac08967d228af1447c8ec72e40c4c714116e60cdef171bb6c0feaa255dff1c507c2c4439ec9605b7e0ba9fc54bae39355cb89fd6ebe5841d673c7b7bc68a46f575a312eebd220d4b32441bdc1b36ebf0aedef3d57ea4b26dd986dd39af57dfb05d32279de'
# The decompressed data
b'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec\negestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo, a\nelementum elit tortor eu quam. Duis tincidunt nisi ut ante. Nulla\nfacilisi. Sed tristique eros eu libero. Pellentesque vel arcu. Vivamus\npurus orci, iaculis ac, suscipit sit amet, pulvinar eu,\nlacus.\n'
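The script above compresses incrementally but decompresses in a single call. Decompression can be done block by block as well, using `zlib.decompressobj()`. The sketch below assumes an in-memory `io.BytesIO` standing in for a compressed file, and the 16-byte block size is arbitrary.

```python
import io
import zlib

original = b'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. ' * 20
source = io.BytesIO(zlib.compress(original))  # stands in for a compressed file

decompressor = zlib.decompressobj()
pieces = []
while True:
    block = source.read(16)  # read a small block of compressed bytes
    if not block:
        break
    piece = decompressor.decompress(block)
    if piece:  # may be empty while the decompressor is still buffering
        pieces.append(piece)

pieces.append(decompressor.flush())  # collect anything still buffered
restored = b''.join(pieces)
print('restored matches original:', restored == original)
```

Because only one block of compressed input is handed to the decompressor at a time, the same loop works for files that are too large to read into memory in one go.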
5. Decompressing a mix of compressed and uncompressed data
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zlib

with open('zlib_mixed.py', 'rb') as f:
    lorem = f.read()
compressed = zlib.compress(lorem)

# Concatenate the compressed data with the uncompressed data
combined = compressed + lorem

# Create a decompression object
decompressor = zlib.decompressobj()
decompressed = decompressor.decompress(combined)  # only the compressed stream is decompressed

decompressed_matches = decompressed == lorem
print('Decompressed matches lorem:', decompressed_matches)

# Whatever follows the end of the compressed stream is left in unused_data
unused_matches = decompressor.unused_data == lorem
print('Unused data matches lorem :', unused_matches)

Output

[root@ mnt]# python3 zlib_mixed.py
Decompressed matches lorem: True
Unused data matches lorem : True
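The same `unused_data` attribute can be used to walk through several compressed streams stored back to back. This is a minimal sketch under the assumption that the input consists only of complete zlib streams; `split_streams` is a hypothetical helper name, and trailing bytes that are not a zlib stream would raise `zlib.error`.

```python
import zlib


def split_streams(blob):
    """Decompress every zlib stream stored back to back in blob."""
    results = []
    while blob:
        decompressor = zlib.decompressobj()
        results.append(decompressor.decompress(blob) + decompressor.flush())
        # Bytes after the end of this stream belong to the next one
        blob = decompressor.unused_data
    return results


messages = [b'first message', b'second message', b'third message']
blob = b''.join(zlib.compress(m) for m in messages)
print(split_streams(blob))
```

A fresh decompression object is created for each stream because an object that has reached the end of its stream cannot be reused for another one.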
6. Verifying data integrity with the CRC32 and Adler-32 checksums
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zlib

with open('test.py', 'rb') as f:
    data = f.read()

cksum = zlib.adler32(data)
print('Adler32: {:12d}'.format(cksum))
# Passing a previous checksum as the second argument continues a running checksum
print('       : {:12d}'.format(zlib.adler32(data, cksum)))

cksum = zlib.crc32(data)
print('CRC-32 : {:12d}'.format(cksum))
print('       : {:12d}'.format(zlib.crc32(data, cksum)))

Output

[root@ mnt]# python3 zlib_checksums.py
Adler32:   4272063592
       :    539822302
CRC-32 :   2072120480
       :   1894987964
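The second argument is what makes it possible to checksum data that arrives in pieces: feeding each chunk together with the checksum so far gives the same result as checksumming everything at once. A short sketch, assuming a 64-byte chunk size chosen only for illustration:

```python
import zlib

data = b'This is the original text.' * 100

running_crc = 0    # default starting value of zlib.crc32
running_adler = 1  # default starting value of zlib.adler32
for start in range(0, len(data), 64):
    chunk = data[start:start + 64]
    running_crc = zlib.crc32(chunk, running_crc)
    running_adler = zlib.adler32(chunk, running_adler)

print('CRC-32 matches :', running_crc == zlib.crc32(data))
print('Adler32 matches:', running_adler == zlib.adler32(data))
```

This also explains the second number printed for each algorithm above: it is the checksum of the file's contents followed by a second copy of the same contents.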
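7. Compressing and decompressing data sent over the network with zlib (at the end, the example reads the file locally and checks that it equals the copy received from the server)
Contents of content.txt, the file being transferred: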
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo, a
elementum elit tortor eu quam. Duis tincidunt nisi ut ante. Nulla
facilisi. Sed tristique eros eu libero. Pellentesque vel arcu. Vivamus
purus orci, iaculis ac, suscipit sit amet, pulvinar eu,
lacus.
Client (zlib_client.py):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import binascii
import logging
import socket
import zlib
from io import BytesIO

# Number of bytes to read per block
BLOCK_SIZE = 64

if __name__ == '__main__':
    logging.basicConfig(
        level=logging.DEBUG,
        format='%(name)s : %(message)s'
    )
    logger = logging.getLogger('Client')
    ip_port = ('127.0.0.1', 8000)
    logging.info('Connecting to server: {}'.format(ip_port[0] + ':' + str(ip_port[1])))

    # Create the socket
    sk = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Connect to the server
    sk.connect(ip_port)

    # Name of the file the server should read
    request_file = 'content.txt'
    logging.debug('Sending filename: {}'.format(request_file))
    sk.sendall(request_file.encode('utf-8'))

    # Buffer for the data received from the server
    buffer = BytesIO()
    # Create a decompression object
    decompressor = zlib.decompressobj()

    while True:
        response = sk.recv(BLOCK_SIZE)
        if not response:
            break
        logger.debug('Read from server (hex): {}'.format(binascii.hexlify(response)))
        to_decompress = decompressor.unconsumed_tail + response
        while to_decompress:
            decompressed = decompressor.decompress(to_decompress)
            if decompressed:
                logger.debug('Decompressed: {}'.format(decompressed))
                buffer.write(decompressed)
                to_decompress = decompressor.unconsumed_tail
            else:
                logger.debug('Buffering...')
                to_decompress = None

    remainder = decompressor.flush()
    if remainder:
        logger.debug('Flushed: {}'.format(remainder))
        buffer.write(remainder)

    # All of the decompressed data
    full_response = buffer.getvalue()
    with open(request_file, 'rb') as f:
        read_file = f.read()
    logger.debug('Response matches local file: {}'.format(full_response == read_file))
    sk.close()

Server (zlib_server.py):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import binascii
import logging
import socketserver
import zlib

# Number of bytes to read per block
BLOCK_SIZE = 64


class ZlibRequestHandler(socketserver.BaseRequestHandler):
    logger = logging.getLogger('Server')

    def handle(self):
        # Create a compression object
        compressor = zlib.compressobj(1)

        # Receive the filename from the client
        filename = self.request.recv(1024)
        self.logger.debug('Client requested file: {}'.format(filename))

        with open(filename, 'rb') as rf:
            while True:
                block = rf.read(BLOCK_SIZE)
                if not block:
                    break
                self.logger.debug('Read from file: {}'.format(block))
                # Compress the block
                compressed = compressor.compress(block)
                if compressed:
                    self.logger.debug('Sending (hex): {}'.format(binascii.hexlify(compressed)))
                    self.request.sendall(compressed)
                else:
                    self.logger.debug('Buffering...')

        # Fetch whatever is left in the compressor's buffer
        remaining = compressor.flush()
        while remaining:
            # Send the flushed data in BLOCK_SIZE pieces until none is left
            to_send = remaining[:BLOCK_SIZE]
            remaining = remaining[BLOCK_SIZE:]
            self.logger.debug('Flushing (hex): {}'.format(binascii.hexlify(to_send)))
            self.request.sendall(to_send)


if __name__ == '__main__':
    logging.basicConfig(
        level=logging.DEBUG,
        format='%(name)s : %(message)s'
    )
    ip_port = ('127.0.0.1', 8000)
    socketserver.TCPServer.allow_reuse_address = True
    server = socketserver.TCPServer(ip_port, ZlibRequestHandler)
    server.serve_forever()

Output

[root@ mnt]# python3 zlib_server.py
Server : Client requested file: b'content.txt'
Server : Read from file: b'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec\n'
Server : Sending (hex): b'7801'
Server : Read from file: b'egestas, enim et consectetuer ullamcorper, lectus ligula rutrum '
Server : Buffering...
Server : Read from file: b'leo, a\nelementum elit tortor eu quam. Duis tincidunt nisi ut ant'
Server : Buffering...
Server : Read from file: b'e. Nulla\nfacilisi. Sed tristique eros eu libero. Pellentesque ve'
Server : Buffering...
Server : Read from file: b'l arcu. Vivamus\npurus orci, iaculis ac, suscipit sit amet, pulvi'
Server : Buffering...
Server : Read from file: b'nar eu,\nlacus.\n\n'
Server : Buffering...
Server : Flushing (hex): b'55904b4a05410c45e7b58abb80a257e15044109cc7eaf808a4aada7cdefa4d8f44c820e473ef495eb7f1845c9e13e7d66d7009d0e4e8187b398fe04836d02997'
Server : Flushing (hex): b'f890f500abc48197bd78347eb00779072f99e0f8bf94aa34c7b68bad434b2b1d2a8f54826558792aef0e6aac3c7945156e71c4b60a70e2276996578a23640d39'
Server : Flushing (hex): b'730596b8200b73051f78bb5dda370dd1aa1ff8e01361e2213fc960db7e0ba97c557ae09d55cb89fd6e3e594136f2c0a73c69a6b72bad18b70de9101a5992a0d1'
Server : Flushing (hex): b'e159b75f85f6f79e2bf5298b6eccdeb466fd68ed174d1979e8'

[root@p mnt]# python zlib_client.py
root : Connecting to server: 127.0.0.1:8000
root : Sending filename: content.txt
Client : Read from server (hex): 780155904b4a05410c45e7b58abb80a257e15044109cc7eaf808a4aada7cdefa4d8f44c820e473ef495eb7f1845c9e13e7d66d7009d0e4e8187b398fe04836d0
Client : Decompressed: Lorem ipsum dolor sit amet, consectetuer a
Client : Read from server (hex): 2997f890f500abc48197bd78347eb00779072f99e0f8bf94aa34c7b68bad434b2b1d2a8f54826558792aef0e6aac3c7945156e71c4b60a70e2276996578a2364
Client : Decompressed: dipiscing elit. Donec egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo, a elementum elit tortor eu quam. Duis ti
Client : Read from server (hex): 0d39730596b8200b73051f78bb5dda370dd1aa1ff8e01361e2213fc960db7e0ba97c557ae09d55cb89fd6e3e594136f2c0a73c69a6b72bad18b70de9101a5992
Client : Decompressed: ncidunt nisi ut ante. Nulla facilisi. Sed tristique eros eu libero. Pellentesque vel arcu. Vivamus purus orci, iaculis
Client : Read from server (hex): a0d1e159b75f85f6f79e2bf5298b6eccdeb466fd68ed174d1979e8
Client : Decompressed:  ac, suscipit sit amet, pulvinar eu, lacus.
Client : Response matches local file: True
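The client reads `unconsumed_tail`, but that attribute only receives data when `decompress()` is called with a `max_length` argument that caps how much output one call may produce. The client does not pass one, so its tail should stay empty. The sketch below shows how the two fit together when the cap is used, for example to bound memory use with highly compressible input; `MAX_CHUNK` is a hypothetical limit chosen for illustration.

```python
import zlib

original = b'This is the original text.' * 1024
compressed = zlib.compress(original)

MAX_CHUNK = 256  # hypothetical cap on the bytes produced per call

decompressor = zlib.decompressobj()
pieces = []
pending = compressed
while pending:
    piece = decompressor.decompress(pending, MAX_CHUNK)
    pieces.append(piece)
    # Input left unprocessed because of the cap must be fed back in
    pending = decompressor.unconsumed_tail

pieces.append(decompressor.flush())  # anything still buffered, not capped
restored = b''.join(pieces)
print('pieces:', len(pieces), 'restored matches:', restored == original)
```

Every piece except the final flushed one is at most `MAX_CHUNK` bytes long, so a consumer can process the output in bounded steps even though the compressed input is tiny.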