Using protobuf 2.6.1 in Python: UTF-8 and unicode errors with string fields
Version information:
protobuf: v2.6.1
python: 2.7
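On the encoding of string fields when using protobuf in Python
In Python, text is most often stored as UTF-8 encoded byte strings. protobuf, however, as its official documentation notes, expects the value of a string field to be either 7-bit ASCII or a unicode object.
If nothing is done about this, then once a message defines a field of type string, assigning a UTF-8 encoded str to that field raises the following error: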
ERROR: ValueError: '\xe5\x94\x90\xe6\x9e\x9c' has type bytes, but isn't in 7-bit ASCII encoding. Non-ASCII strings must be converted to unicode objects before being added.
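To see why the check fails, the minimal sketch below (plain Python, no protobuf required) takes the byte string from the error message and tries both codecs. The helper name is_7bit_ascii is made up for illustration; it only mimics the kind of ASCII test the error message describes and is not protobuf's own code.

```python
# -*- coding: utf-8 -*-
# The byte string quoted in the error message above.
raw = b'\xe5\x94\x90\xe6\x9e\x9c'


def is_7bit_ascii(data):
    """Return True if the byte string decodes cleanly as ASCII (illustrative helper)."""
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


# The bytes are valid UTF-8 but not ASCII, which is what triggers the ValueError.
ascii_ok = is_7bit_ascii(raw)
text = raw.decode('utf-8')  # a two-character unicode string
```

The bytes are perfectly valid UTF-8; the field only rejects them because they arrive as a byte string rather than as a unicode object.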
There are two ways to solve this:
1) The once-and-for-all approach: modify the protobuf source
a. File ../google/protobuf/internal/decoder.py
def StringDecoder(field_number, is_repeated, is_packed, key, new_default):
  """Returns a decoder for a string field."""

  local_DecodeVarint = _DecodeVarint
  local_unicode = unicode

  def _ConvertToUnicode(byte_str):
    try:
      # return local_unicode(byte_str, 'utf-8')  # commented out: do not decode
      return byte_str
    except UnicodeDecodeError, e:
      # add more information to the error message and re-raise it.
      e.reason = '%s in field: %s' % (e, key.full_name)
      raise
b. File ../google/protobuf/internal/type_checkers.py
class UnicodeValueChecker(object):
  """Checker used for string fields.

  Always returns a unicode value, even if the input is of type str.
  """

  def CheckValue(self, proposed_value):
    if not isinstance(proposed_value, (bytes, unicode)):
      message = ('%.1024r has type %s, but expected one of: %s' %
                 (proposed_value, type(proposed_value), (bytes, unicode)))
      raise TypeError(message)

    # If the value is of type 'bytes' make sure that it is in 7-bit ASCII
    # encoding.
    # if isinstance(proposed_value, bytes):
    #   try:
    #     proposed_value = proposed_value.decode('ascii')
    #   except UnicodeDecodeError:
    #     raise ValueError('%.1024r has type bytes, but isn\'t in 7-bit ASCII '
    #                      'encoding. Non-ASCII strings must be converted to '
    #                      'unicode objects before being added.' %
    #                      (proposed_value))
    return proposed_value
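2) The tedious approach: decode manually
Whenever you assign a UTF-8 encoded str to a string field of a message, call decode("utf-8") on it first.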
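One way to make the manual approach less tedious is to wrap the decode in a small helper. This is only a sketch: to_text is a hypothetical name, and the commented-out assignment at the end assumes a generated module my_pb2 with a message Person that has a string field called name, none of which appear in the original post.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3 has no separate unicode type


def to_text(value, encoding='utf-8'):
    """Return value as a unicode object, decoding byte strings if needed."""
    if isinstance(value, text_type):
        return value  # already unicode, nothing to do
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError('expected bytes or unicode, got %r' % type(value))


raw = b'\xe5\x94\x90\xe6\x9e\x9c'  # UTF-8 bytes, e.g. read from a file or socket
name = to_text(raw)

# Hypothetical usage with a generated message class (all names are assumptions):
# person = my_pb2.Person()
# person.name = to_text(raw)
```

Routing every assignment through one helper keeps the decode in a single place and leaves the installed protobuf package untouched, at the cost of having to remember to call it.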