Encoding

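ASCII: English letters, digits and basic symbols only
    character: 00000000  8 bits, 1 byte per character (ASCII itself uses only the low 7 bits)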
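
A small Python 3 sketch of the ASCII row, using only the built-in str.encode: each ASCII character becomes one byte, and a Chinese character cannot be encoded at all.

```python
# Each ASCII character encodes to exactly one byte.
data = "ppd".encode("ascii")
print(data, len(data))  # b'ppd' 3

# Characters outside ASCII (such as Chinese) raise UnicodeEncodeError.
try:
    "中".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```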

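Unicode (in its fixed-width 32-bit form, UTF-32): every character is 32 bits
    English character: 00000000 00000000 00000000 00000000  32 bits, 4 bytes per character
    Chinese character: 00000000 00000000 00000000 00000000  32 bits, 4 bytes per character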
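
Python has no codec literally named "unicode"; the closest match to this fixed 32-bit layout is UTF-32. The sketch below assumes the "utf-32-be" codec (big-endian, so Python does not prepend a 4-byte byte-order mark) and shows that English and Chinese characters both take 4 bytes.

```python
# UTF-32 stores every character in exactly 4 bytes.
# "utf-32-be" is used instead of "utf-32" so that no 4-byte BOM is added.
for ch in ("p", "中"):
    encoded = ch.encode("utf-32-be")
    print(ch, encoded, len(encoded))
```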

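UTF-8 (variable-length, 1 to 4 bytes per character):
    English character: 00000000  8 bits, 1 byte per character
    Chinese character: 00000000 00000000 00000000  24 bits, 3 bytes per character (common characters)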
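
A quick check of UTF-8's variable length in Python 3. The 3-byte figure holds for common Chinese characters; rare ones outside the Basic Multilingual Plane, such as U+20000 (picked here only as an example), take 4 bytes.

```python
# UTF-8 is variable-length: 1 byte for ASCII, 3 bytes for common Chinese
# characters, 4 bytes for characters beyond U+FFFF.
for ch in ("p", "中", "\U00020000"):
    print(repr(ch), len(ch.encode("utf-8")))
```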

GBK:
    English character: 00000000  8 bits, 1 byte per character
    Chinese character: 00000000 00000000  16 bits, 2 bytes per character
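
The same check with Python's "gbk" codec; the bytes for 中 match the first two bytes of the GBK example in section ④.

```python
# GBK: ASCII characters stay 1 byte, Chinese characters take 2 bytes.
for ch in ("p", "中"):
    encoded = ch.encode("gbk")
    print(ch, encoded, len(encoded))
```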

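① The binary produced by one encoding cannot be read correctly as another encoding; decoding with the wrong encoding produces garbled text (mojibake).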
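
A sketch of how mojibake arises: GBK bytes decoded as UTF-8. In strict mode (the default) Python raises UnicodeDecodeError; with errors="replace" it substitutes U+FFFD replacement characters, so the text comes out garbled instead.

```python
gbk_bytes = "中国".encode("gbk")  # b'\xd6\xd0\xb9\xfa'

# Decoding with the matching encoding restores the text.
print(gbk_bytes.decode("gbk"))  # 中国

# Decoding the same bytes as UTF-8 fails in strict mode...
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# ...and with errors="replace" yields garbled text instead of 中国.
garbled = gbk_bytes.decode("utf-8", errors="replace")
print(garbled)
```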

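② Files cannot be stored or transmitted as abstract Unicode text; they must be encoded into bytes with a concrete encoding such as utf-8, utf-16, gbk, gb2312 or ascii.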
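
To see that a file holds encoded bytes rather than abstract Unicode, this sketch writes a str in text mode with an explicit encoding and reads the file back in binary mode. The temporary file from tempfile.mkstemp is only for illustration.

```python
import os
import tempfile

text = "中国"

# Create a temporary file path for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Text mode with an explicit encoding: Python encodes the str to bytes on write.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Binary mode shows what is actually stored on disk: UTF-8 bytes.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'\xe4\xb8\xad\xe5\x9b\xbd'

os.remove(path)
```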

③ In Python 3:

str is held in memory as Unicode, so it cannot be stored or transmitted directly; it must first be converted to the bytes type.
    For English:
        str: s = "ppd"    print(s, type(s))    # ppd <class 'str'>
        encoding: 010101010 (Unicode)
        bytes: s1 = b"ppd"    print(s1, type(s1))    # b'ppd' <class 'bytes'>
        encoding: 000101010 (utf-8, gbk, ...)

    For Chinese:
        str: s2 = "中国"    print(s2, type(s2))    # 中国 <class 'str'>
        encoding: 010101010 (Unicode)
        bytes: s3 = b"\xe4\xb8\xad\xe5\x9b\xbd"    print(s3, type(s3))    # b'\xe4\xb8\xad\xe5\x9b\xbd' <class 'bytes'>  (the UTF-8 bytes of 中国)
        writing the characters directly, s3 = b"中国", raises SyntaxError: bytes can only contain ASCII literal characters
        encoding: 000101010 (utf-8, gbk, ...)
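
The bytes-literal rule can be checked without crashing a script by passing the source to compile(). The sketch also writes the Chinese bytes with \x escapes and shows that str and bytes are distinct types.

```python
# A bytes literal may only contain ASCII characters; compile() is used so the
# SyntaxError can be caught instead of stopping the whole script.
try:
    compile('b"中国"', "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# Non-ASCII bytes must be written as \x escapes (or produced by str.encode).
s3 = b"\xe4\xb8\xad\xe5\x9b\xbd"
print(s3, type(s3))                # b'\xe4\xb8\xad\xe5\x9b\xbd' <class 'bytes'>
print(s3 == "中国".encode("utf-8"))  # True

# str and bytes never compare equal, and indexing bytes yields an int.
print("ppd" == b"ppd")  # False
print(b"ppd"[0])       # 112
```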

④ encode(): converting str to bytes

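s = "ppd"
s1 = s.encode("utf-8")    # encode the str as UTF-8 bytes
print(s1)    # b'ppd'
s2 = s.encode("gbk")    # ASCII characters encode identically in GBK
print(s2)    # b'ppd'

s = '中国'
s1 = s.encode("utf-8")    # 3 bytes per Chinese character
print(s1)    # b'\xe4\xb8\xad\xe5\x9b\xbd'
s2 = s.encode("gbk")    # 2 bytes per Chinese character
print(s2)    # b'\xd6\xd0\xb9\xfa'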
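
The reverse of encode() is bytes.decode(), which needs the same encoding that produced the bytes. A minimal round-trip sketch over the two encodings used above:

```python
s = "中国"

for encoding in ("utf-8", "gbk"):
    data = s.encode(encoding)          # str -> bytes
    restored = data.decode(encoding)   # bytes -> str, with the same encoding
    print(encoding, data, restored == s)
```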

posted @ 2018-08-27 22:27 就俗人一个