Working around pbjs failing to encode bytes fields

Background

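I previously wrote a post, "Sending and receiving protobuf messages with scripts" (《使用脚本收发 protobuf 协议数据》), showing that the pbjs command can convert protobuf binary data to JSON: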

> pbjs msg.proto --decode ProbeIpv6Response < response.bin
{
  "selfAddr": {
    "addrV6": "2409:8900:7900:8f0d:ecd9:4aee:aa3:7ad",
    "portV6": 46066
  },
  "brosAddr": [
    {
      "addrV6": "2409:8a34:4405:6624:5250:9d04:cf77:d",
      "portV6": 18720
    },
    {
      "addrV6": "2409:8a34:401a:4151:59e6:69b4:37ad:dea2",
      "portV6": 18679
    },
    {
      "addrV6": "2409:8a20:2a02:20c0:7d11:9a6b:6b51:a9bb",
      "portV6": 18824
    },
    {
      "addrV6": "2409:8a20:e0d:7773:50d4:93b0:680a:b555",
      "portV6": 18968
    },
    {
      "addrV6": "2409:8a44:5b20:edf2:7c09:a5e1:cdbf:69c6",
      "portV6": 18008
    }
  ]
}

Going the other way, encoding the JSON back into binary data, works just as well:

> pbjs msg.proto --encode ProbeIpv6Response < response.json > response2.bin
> xxd response2.bin
00000000: 122b 0a25 3234 3039 3a38 3930 303a 3739  .+.%2409:8900:79
00000010: 3030 3a38 6630 643a 6563 6439 3a34 6165  00:8f0d:ecd9:4ae
00000020: 653a 6161 333a 3761 6410 f2e7 021a 2a0a  e:aa3:7ad.....*.
00000030: 2432 3430 393a 3861 3334 3a34 3430 353a  $2409:8a34:4405:
00000040: 3636 3234 3a35 3235 303a 3964 3034 3a63  6624:5250:9d04:c
00000050: 6637 373a 6410 a092 011a 2d0a 2732 3430  f77:d.....-.'240
00000060: 393a 3861 3334 3a34 3031 613a 3431 3531  9:8a34:401a:4151
00000070: 3a35 3965 363a 3639 6234 3a33 3761 643a  :59e6:69b4:37ad:
00000080: 6465 6132 10f7 9101 1a2d 0a27 3234 3039  dea2.....-.'2409
00000090: 3a38 6132 303a 3261 3032 3a32 3063 303a  :8a20:2a02:20c0:
000000a0: 3764 3131 3a39 6136 623a 3662 3531 3a61  7d11:9a6b:6b51:a
000000b0: 3962 6210 8893 011a 2c0a 2632 3430 393a  9bb.....,.&2409:
000000c0: 3861 3230 3a65 3064 3a37 3737 333a 3530  8a20:e0d:7773:50
000000d0: 6434 3a39 3362 303a 3638 3061 3a62 3535  d4:93b0:680a:b55
000000e0: 3510 9894 011a 2d0a 2732 3430 393a 3861  5.....-.'2409:8a
000000f0: 3434 3a35 6232 303a 6564 6632 3a37 6330  44:5b20:edf2:7c0
00000100: 393a 6135 6531 3a63 6462 663a 3639 6336  9:a5e1:cdbf:69c6
00000110: 10d8 8c01

The re-encoded response2.bin is byte-for-byte identical to the original response.bin.
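
A check like this can also be scripted instead of being read off xxd dumps. The following is a minimal sketch; the helper name firstDifference is made up for illustration, and the sample bytes are simply the first four bytes of the dump above, once unchanged and once with a single byte altered.

```javascript
// Hypothetical helper: index of the first differing byte, or -1 if both
// buffers are identical.
function firstDifference(a, b) {
  const common = Math.min(a.length, b.length);
  for (let i = 0; i < common; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Same prefix: the buffers are equal only if the lengths match too.
  return a.length === b.length ? -1 : common;
}

const sampleOriginal = Buffer.from('122b0a25', 'hex');
const sampleSame = Buffer.from('122b0a25', 'hex');
const sampleChanged = Buffer.from('122b0b25', 'hex');

console.log(firstDifference(sampleOriginal, sampleSame));
console.log(firstDifference(sampleOriginal, sampleChanged));
```

With real files, the two buffers would come from fs.readFileSync('response.bin') and fs.readFileSync('response2.bin').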

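Later, however, when I encoded a different message type, the regenerated bin file differed substantially from the original, so pbjs could not be used to turn that JSON into binary data.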

Symptoms

To make the problem concrete, start with the message definitions:

message common
{
    required uint32 mem1 = 1;
    required uint32 mem2 = 2;
    required bytes  mem3 = 3;
    required uint32 mem4 = 4;
    required uint64 mem5 = 5;
    optional uint32 mem6 = 6;
    optional bytes  mem7 = 7;
    optional uint32 mem8 = 8;
    optional uint64 mem9 = 9;
}

message query_md5
{
    required common mema = 1;
    required uint32 memb = 2;
    required bytes  memc = 3;
    required uint32 memd = 4; 
    required uint64 meme = 5; 
    repeated bytes  memf = 6; 
}

To keep the protocol confidential, every field name has been replaced with a memX placeholder. Here is the raw data for this proto message:

> xxd tmp/resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a  .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba  ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32  ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607  7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028  !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0  ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96  K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca  xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf  .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b  6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210  C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1  ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd  2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307  .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0  ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832  ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4  ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4                 ...?.`(\..

Decoding it with pbjs yields the following JSON:

> pbjs query_md5.proto --decode query_md5 < tmp/resp.bin > resp.json
> jq -c '.' resp.json
{"mema":{"mem1":2,"mem2":1048643,"mem3":{"type":"Buffer","data":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]},"mem4":11,"mem5":{"low":1695456564,"high":11,"unsigned":true},"mem6":0,"mem7":{"type":"Buffer","data":[50,46,50,46,49,48,49,46,50,55]},"mem8":3,"mem9":{"low":-1872613904,"high":0,"unsigned":true}},"memb":0,"memc":{"type":"Buffer","data":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]},"memd":0,"meme":{"low":22177440,"high":0,"unsigned":true},"memf":[{"type":"Buffer","data":[209,91,243,38,71,8,191,199,1,224,75,61,198,36,56,163]},{"type":"Buffer","data":[49,149,68,243,47,50,27,150,120,101,107,130,253,184,149,96]},{"type":"Buffer","data":[154,117,23,53,252,202,230,111,116,134,233,250,220,106,159,171]},{"type":"Buffer","data":[40,76,235,191,54,224,29,87,92,166,147,222,57,27,122,125]},{"type":"Buffer","data":[62,11,67,156,98,165,164,1,195,255,207,0,50,153,188,126]},{"type":"Buffer","data":[246,185,151,70,156,230,149,85,82,211,245,11,108,163,142,177]},{"type":"Buffer","data":[152,82,231,241,37,48,203,107,122,160,85,105,251,205,10,92]},{"type":"Buffer","data":[211,51,51,177,213,22,216,104,57,56,243,7,191,254,212,192]},{"type":"Buffer","data":[166,70,12,223,40,116,72,106,11,192,237,241,111,81,181,158]},{"type":"Buffer","data":[30,238,230,121,91,241,8,50,213,167,252,79,96,207,72,171]},{"type":"Buffer","data":[196,70,150,99,246,164,135,205,252,63,213,96,40,92,14,164]}]}

The output is long, so jq -c squeezes it onto a single line. Encoding that JSON again produces a bin file with the following content:

> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin
> xxd resp.bin
0000000: 0a08 0802 10c3 8040 1a00 1000 1a00       .......@......

The length alone makes it obvious that this differs from the original: 14 bytes instead of 282.
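
To see what actually ended up in those 14 bytes, the protobuf wire format can be walked by hand. Below is a minimal sketch that understands only varint (wire type 0) and length-delimited (wire type 2) fields, which is all this message uses. The function names are illustrative and not part of pbjs.

```javascript
// Read a base-128 varint starting at pos; returns the value and the next position.
function readVarint(buf, pos) {
  let value = 0;
  let scale = 1;
  while (pos < buf.length) {
    const byte = buf[pos++];
    value += (byte & 0x7f) * scale;
    scale *= 128;
    if ((byte & 0x80) === 0) return { value, pos };
  }
  throw new Error('truncated varint');
}

// Walk one message level and list its fields (wire types 0 and 2 only).
function walkFields(buf) {
  const fields = [];
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    const field = Math.floor(tag.value / 8); // upper bits: field number
    const wireType = tag.value % 8;          // lower 3 bits: wire type
    pos = tag.pos;
    if (wireType === 0) {
      const v = readVarint(buf, pos);
      fields.push({ field, wireType, value: v.value });
      pos = v.pos;
    } else if (wireType === 2) {
      const len = readVarint(buf, pos);
      fields.push({ field, wireType, value: buf.subarray(len.pos, len.pos + len.value) });
      pos = len.pos + len.value;
    } else {
      throw new Error('unsupported wire type ' + wireType);
    }
  }
  return fields;
}

// The 14 bytes produced by the re-encode step above.
const reencoded = Buffer.from('0a08080210c380401a0010001a00', 'hex');
const topLevel = walkFields(reencoded);
const nested = walkFields(topLevel[0].value); // field 1 is the embedded mema message

for (const f of topLevel) console.log('top ', f.field, f.wireType, f.value);
for (const f of nested) console.log('mema', f.field, f.wireType, f.value);
```

Read this way, field 3 of the nested message (mem3) and field 3 at the top level (memc) are both present but empty, and every field after them is missing, which already points at the bytes fields.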

Initial analysis

Since pbjs restored the binary data correctly before, the tool itself is probably mostly sound. Recall the format of the first message:

> cat msg.proto
message ProbeIpv6Request {
    string xxxxx     = 1;
    string xxxx      = 2;
    string xxxxxxxx  = 3;
    string xxxxxxx   = 4;
}

message V6AddrType {
    string addrV6 = 1;
    uint32 portV6 = 2;
}

message ProbeIpv6Response {
    string              xxxxx    = 1;
    V6AddrType          selfAddr = 2;
    repeated V6AddrType brosAddr = 3;
}

The main difference from the failing message: the former uses string fields, while the latter uses bytes.

bytes vs string

Could the bytes type be the culprit? Try replacing bytes with string in the second message:

message common
{
    required uint32 mem1 = 1;
    required uint32 mem2 = 2;
    required string mem3 = 3;
    required uint32 mem4 = 4;
    required uint64 mem5 = 5;
    optional uint32 mem6 = 6;
    optional string mem7 = 7;
    optional uint32 mem8 = 8;
    optional uint64 mem9 = 9;
}

message query_md5
{
    required common mema = 1;
    required uint32 memb = 2;
    required string memc = 3;
    required uint32 memd = 4; 
    required uint64 meme = 5; 
    repeated string memf = 6; 
}

Hoping that pbjs treats the two types compatibly, decode the binary data directly with the string-typed schema:

> pbjs query_md5.proto --decode query_md5 < tmp/resp.bin > resp.json
> cat resp.json
{
  "mema": {
    "mem1": 2,
    "mem2": 1048643,
    "mem3": "�8���z��\u0019g+���k\\",
    "mem4": 11,
    "mem5": {
      "low": 1695456564,
      "high": 11,
      "unsigned": true
    },
    "mem6": 0,
    "mem7": "2.2.101.27",
    "mem8": 3,
    "mem9": {
      "low": -1872613904,
      "high": 0,
      "unsigned": true
    }
  },
  "memb": 0,
  "memc": "g�\u0007!^G��%'-m��\u0002-",
  "memd": 0,
  "meme": {
    "low": 22177440,
    "high": 0,
    "unsigned": true
  },
  "memf": [
    "�[�&G\b��\u0001�K=�$8�",
    "1�D�/2\u001b�xek����`",
    "�u\u00175���ot����j��",
    "(L��6�\u001dW\\���9\u001bz}",
    ">\u000bC�b��\u0001���\u00002��~",
    "���F���UR��\u000bl���",
    "�R��%0�kz�Ui��\n\\",
    "�33��\u0016�h98�\u0007����",
    "�F\f�(tHj\u000b���oQ��",
    "\u001e��y[�\b2է�O`�H�",
    "�F�c�����?�`(\\\u000e�"
  ]
}

Ha, it actually decoded, although the former bytes fields came out garbled. Encoding it back unchanged should be fine, right?

> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin
> xxd resp.bin
0000000: 0a49 0802 10c3 8040 1a22 efbf bd38 efbf  .I.....@."...8..
0000010: bdef bfbd efbf bd7a efbf bdef bfbd 1967  .......z.......g
0000020: 2bef bfbd efbf bdef bfbd 6b5c 200b 28b4  +.........k\ .(.
0000030: baba a8b6 0130 003a 0a32 2e32 2e31 3031  .....0.:.2.2.101
0000040: 2e32 3740 0348 f0db 8883 0910 001a 1a67  .27@.H.........g
0000050: efbf bd07 215e 47ef bfbd efbf bd25 272d  ....!^G......%'-
0000060: 6def bfbd efbf bd02 2d20 0028 a0cd c90a  m.......- .(....
0000070: 321e efbf bd5b efbf bd26 4708 efbf bdef  2....[...&G.....
0000080: bfbd 01ef bfbd 4b3d efbf bd24 38ef bfbd  ......K=...$8...
0000090: 321e 31ef bfbd 44ef bfbd 2f32 1bef bfbd  2.1...D.../2....
00000a0: 7865 6bef bfbd efbf bdef bfbd efbf bd60  xek............`
00000b0: 3224 efbf bd75 1735 efbf bdef bfbd efbf  2$...u.5........
00000c0: bd6f 74ef bfbd efbf bdef bfbd efbf bd6a  .ot............j
00000d0: efbf bdef bfbd 321c 284c efbf bdef bfbd  ......2.(L......
00000e0: 36ef bfbd 1d57 5cef bfbd efbf bdef bfbd  6....W\.........
00000f0: 391b 7a7d 3220 3e0b 43ef bfbd 62ef bfbd  9.z}2 >.C...b...
0000100: efbf bd01 efbf bdef bfbd efbf bd00 32ef  ..............2.
0000110: bfbd efbf bd7e 3226 efbf bdef bfbd efbf  .....~2&........
0000120: bd46 efbf bdef bfbd efbf bd55 52ef bfbd  .F.........UR...
0000130: efbf bd0b 6cef bfbd efbf bdef bfbd 321e  ....l.........2.
0000140: efbf bd52 efbf bdef bfbd 2530 efbf bd6b  ...R......%0...k
0000150: 7aef bfbd 5569 efbf bdef bfbd 0a5c 3222  z...Ui.......\2"
0000160: efbf bd33 33ef bfbd efbf bd16 efbf bd68  ...33..........h
0000170: 3938 efbf bd07 efbf bdef bfbd efbf bdef  98..............
0000180: bfbd 321e efbf bd46 0cef bfbd 2874 486a  ..2....F....(tHj
0000190: 0bef bfbd efbf bdef bfbd 6f51 efbf bdef  ..........oQ....
00001a0: bfbd 321c 1eef bfbd efbf bd79 5bef bfbd  ..2........y[...
00001b0: 0832 d5a7 efbf bd4f 60ef bfbd 48ef bfbd  .2.....O`...H...
00001c0: 3222 efbf bd46 efbf bd63 efbf bdef bfbd  2"...F...c......
00001d0: efbf bdef bfbd efbf bd3f efbf bd60 285c  .........?...`(\
00001e0: 0eef bfbd                                ....

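It runs, but the result still differs a lot from the original data: this time a great deal of extra content appeared, which poured cold water on my enthusiasm. On the off chance that it would still work, I sent this binary data to the server, and sure enough it reported an error: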

{"error_code":196608,"error_msg":"fgid not find","request_id":3933672364}

It looks like the server failed while parsing the bytes fields.
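
The efbf bd runs in the dump are a strong hint about what went wrong. EF BF BD is the UTF-8 encoding of U+FFFD, the replacement character that decoders substitute for byte sequences that are not valid UTF-8, so every such byte of the original data is lost once it passes through a string. A minimal sketch with Node's Buffer, using the first four bytes of memc (this illustrates the general effect and is not pbjs's actual code path):

```javascript
// First four bytes of memc from the original dump: 67 c6 07 21.
const memcPrefix = Buffer.from([0x67, 0xc6, 0x07, 0x21]);

// 0xc6 announces a two-byte UTF-8 sequence, but 0x07 is not a valid
// continuation byte, so decoding replaces 0xc6 with U+FFFD.
const decodedText = memcPrefix.toString('utf8');

// Encoding the string again writes U+FFFD as ef bf bd; the 0xc6 is gone.
const reencodedText = Buffer.from(decodedText, 'utf8');

console.log(memcPrefix.toString('hex'));    // 67c60721
console.log(reencodedText.toString('hex')); // 67efbfbd0721
```

The second line matches the start of memc in the re-encoded dump above (67 efbf bd07 21).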

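In my scenario, pbjs is used mainly to build the protobuf request from JSON and send it to the server, receive the protobuf response, decode that response to JSON with pbjs, and finally feed the JSON to jq to extract whatever information is needed.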

If this step fails, everything after it is blocked, even though converting back and forth locally with the string type works.

json unicode

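At first I suspected that some characters in the string values were not being converted back to the right bytes. Take the memc field from the example above: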

"memc":{"type":"Buffer","data":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]}

After switching to string it becomes:

"memc": "g�\u0007!^G��%'-m��\u0002-",

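The garbled characters look suspicious. How can the binary form of a character be expressed in JSON? A search turned up JSON's \u Unicode escape, which must be followed by exactly four hex digits, so I converted the value accordingly: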

"memc": "\u0067\u00c6\u0007\u0021\u005e\u0047\u00ae\u0089\u0025\u0027\u002d\u006d\u00a0\u00f6\u0002\u002d",

The other string fields get the same treatment:

{
  "mema": {
    "mem1": 2,
    "mem2": 1048643,
    "mem3": "\u00ba\u0038\u00ba\u0093\u00af\u007a\u00da\u00e8\u0019\u0067\u002b\u0089\u00dd\u00d2\u006b\u005c",
    "mem4": 11,
    "mem5": {
      "low": 1695456564,
      "high": 11,
      "unsigned": true
    },
    "mem6": 0,
    "mem7": "2.2.101.27",
    "mem8": 3,
    "mem9": {
      "low": -1872613904,
      "high": 0,
      "unsigned": true
    }
  },
  "memb": 0,
  "memc": "\u0067\u00c6\u0007\u0021\u005e\u0047\u00ae\u0089\u0025\u0027\u002d\u006d\u00a0\u00f6\u0002\u002d",
  "memd": 0,
  "meme": {
    "low": 22177440,
    "high": 0,
    "unsigned": true
  },
  "memf": [
    "\u00d1\u005b\u00f3\u0026\u0047\u0008\u00bf\u00c7\u0001\u00e0\u004b\u003d\u00c6\u0024\u0038\u00a3",
    "\u0031\u0095\u0044\u00f3\u002f\u0032\u001b\u0096\u0078\u0065\u006b\u0082\u00fd\u00b8\u0095\u0060",
    "\u009a\u0075\u0017\u0035\u00fc\u00ca\u00e6\u006f\u0074\u0086\u00e9\u00fa\u00dc\u006a\u009f\u00ab",
    "\u0028\u004c\u00eb\u00bf\u0036\u00e0\u001d\u0057\u005c\u00a6\u0093\u00de\u0039\u001b\u007a\u007d",
    "\u003e\u000b\u0043\u009c\u0062\u00a5\u00a4\u0001\u00c3\u00ff\u00cf\u0000\u0032\u0099\u00bc\u007e",
    "\u00f6\u00b9\u0097\u0046\u009c\u00e6\u0095\u0055\u0052\u00d3\u00f5\u000b\u006c\u00a3\u008e\u00b1",
    "\u0098\u0052\u00e7\u00f1\u0025\u0030\u00cb\u006b\u007a\u00a0\u0055\u0069\u00fb\u00cd\u000a\u005c",
    "\u00d3\u0033\u0033\u00b1\u00d5\u0016\u00d8\u0068\u0039\u0038\u00f3\u0007\u00bf\u00fe\u00d4\u00c0",
    "\u00a6\u0046\u000c\u00df\u0028\u0074\u0048\u006a\u000b\u00c0\u00ed\u00f1\u006f\u0051\u00b5\u009e",
    "\u001e\u00ee\u00e6\u0079\u005b\u00f1\u0008\u0032\u00d5\u00a7\u00fc\u004f\u0060\u00cf\u0048\u00ab",
    "\u00c4\u0046\u0096\u0063\u00f6\u00a4\u0087\u00cd\u00fc\u003f\u00d5\u0060\u0028\u005c\u000e\u00a4"
  ]
}
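
The escaped strings above can be produced mechanically. A minimal sketch of how the data array from the earlier pbjs output could be turned into such a string (the function name toUnicodeEscapes is hypothetical):

```javascript
// Turn an array of byte values into a run of \uXXXX escapes, one per byte.
function toUnicodeEscapes(bytes) {
  return bytes
    .map((b) => '\\u' + b.toString(16).padStart(4, '0'))
    .join('');
}

// The data array of memc from the Buffer-style JSON shown earlier.
const memcBytes = [103, 198, 7, 33, 94, 71, 174, 137, 37, 39, 45, 109, 160, 246, 2, 45];
const memcEscaped = toUnicodeEscapes(memcBytes);

console.log('"memc": "' + memcEscaped + '",');
```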

Try encoding the new JSON file with pbjs:

> pbjs query_md5.proto --encode query_md5 < resp.uni.json > resp.uni.bin
> xxd resp.uni.bin
0000000: 0a40 0802 10c3 8040 1a19 c2ba 38c2 bac2  .@.....@....8...
0000010: 93c2 af7a c39a c3a8 1967 2bc2 89c3 9dc3  ...z.....g+.....
0000020: 926b 5c20 0b28 b4ba baa8 b601 3000 3a0a  .k\ .(......0.:.
0000030: 322e 322e 3130 312e 3237 4003 48f0 db88  2.2.101.27@.H...
0000040: 8309 1000 1a15 67c3 8607 215e 47c2 aec2  ......g...!^G...
0000050: 8925 272d 6dc2 a0c3 b602 2d20 0028 a0cd  .%'-m.....- .(..
0000060: c90a 3217 c391 5bc3 b326 4708 c2bf c387  ..2...[..&G.....
0000070: 01c3 a04b 3dc3 8624 38c2 a332 1731 c295  ...K=..$8..2.1..
0000080: 44c3 b32f 321b c296 7865 6bc2 82c3 bdc2  D../2...xek.....
0000090: b8c2 9560 321a c29a 7517 35c3 bcc3 8ac3  ...`2...u.5.....
00000a0: a66f 74c2 86c3 a9c3 bac3 9c6a c29f c2ab  .ot........j....
00000b0: 3216 284c c3ab c2bf 36c3 a01d 575c c2a6  2.(L....6...W\..
00000c0: c293 c39e 391b 7a7d 3218 3e0b 43c2 9c62  ....9.z}2.>.C..b
00000d0: c2a5 c2a4 01c3 83c3 bfc3 8f00 32c2 99c2  ............2...
00000e0: bc7e 321b c3b6 c2b9 c297 46c2 9cc3 a6c2  .~2.......F.....
00000f0: 9555 52c3 93c3 b50b 6cc2 a3c2 8ec2 b132  .UR.....l......2
0000100: 17c2 9852 c3a7 c3b1 2530 c38b 6b7a c2a0  ...R....%0..kz..
0000110: 5569 c3bb c38d 0a5c 3219 c393 3333 c2b1  Ui.....\2...33..
0000120: c395 16c3 9868 3938 c3b3 07c2 bfc3 bec3  .....h98........
0000130: 94c3 8032 17c2 a646 0cc3 9f28 7448 6a0b  ...2...F...(tHj.
0000140: c380 c3ad c3b1 6f51 c2b5 c29e 3218 1ec3  ......oQ....2...
0000150: aec3 a679 5bc3 b108 32c3 95c2 a7c3 bc4f  ...y[...2......O
0000160: 60c3 8f48 c2ab 3219 c384 46c2 9663 c3b6  `..H..2...F..c..
0000170: c2a4 c287 c38d c3bc 3fc3 9560 285c 0ec2  ........?..`(\..
0000180: a4                                       .

The new version does differ from the previous attempt: it is somewhat shorter, yet the server still reports the same error.
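
The dump also shows why this cannot work: \u00c6 names the code point U+00C6, not the byte 0xC6, and UTF-8 encodes every code point from U+0080 to U+07FF as two bytes. A short sketch with the first four characters of memc:

```javascript
// The first four escapes of memc, parsed the way any JSON parser would.
const escapedPrefix = JSON.parse('"\\u0067\\u00c6\\u0007\\u0021"');

// A string field is serialized as UTF-8: U+00C6 becomes c3 86.
const asUtf8 = Buffer.from(escapedPrefix, 'utf8');

// Only a one-byte-per-code-point encoding would give the original bytes back.
const asLatin1 = Buffer.from(escapedPrefix, 'latin1');

console.log(asUtf8.toString('hex'));   // 67c3860721
console.log(asLatin1.toString('hex')); // 67c60721
```

The latin1 line shows that the string does carry the original byte values; the extra bytes appear only because string fields go onto the wire as UTF-8, which matches the 67c3 8607 21 seen in the dump.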

This shows the approach is not viable: replacing the bytes type with string is a dead end.

Solutions

Since the bytes type is unavoidable and pbjs has trouble with it, are there other conversion tools?

protobufjs

Normally, the pbjs help output looks like this:

> pbjs
Usage: pbjs [options] <schema_path>

Options:
  -V, --version        output the version number
  --es5 <js_path>      Generate ES5 JavaScript code
  --es6 <js_path>      Generate ES6 JavaScript code
  --ts <ts_path>       Generate TypeScript code
  --decode <msg_type>  Decode standard input to JSON
  --encode <msg_type>  Encode standard input to JSON
  -h, --help           output usage information

By accident, my pbjs once printed the following instead:

> pbjs
protobuf.js v1.1.2 CLI for JavaScript

Translates between file formats and generates static code.

  -t, --target     Specifies the target format. Also accepts a path to require a custom target.

                   json          JSON representation
                   json-module   JSON representation as a module
                   proto2        Protocol Buffers, Version 2
                   proto3        Protocol Buffers, Version 3
                   static        Static code without reflection (non-functional on its own)
                   static-module Static code without reflection as a module

  -p, --path       Adds a directory to the include path.

  --filter         Set up a filter to configure only those messages you need and their dependencies to compile, this will effectively reduce the final file size
                   Set A json file path, Example of file content: {"messageNames":["mypackage.messageName1", "messageName2"] }

  -o, --out        Saves to a file instead of writing to stdout.

  --sparse         Exports only those types referenced from a main file (experimental).

  Module targets only:

  -w, --wrap       Specifies the wrapper to use. Also accepts a path to require a custom wrapper.

                   default   Default wrapper supporting both CommonJS and AMD
                   commonjs  CommonJS wrapper
                   amd       AMD wrapper
                   es6       ES6 wrapper (implies --es6)
                   closure   A closure adding to protobuf.roots where protobuf is a global

  --dependency     Specifies which version of protobuf to require. Accepts any valid module id

  -r, --root       Specifies an alternative protobuf.roots name.

  -l, --lint       Linter configuration. Defaults to protobuf.js-compatible rules:

                   eslint-disable block-scoped-var, id-length, no-control-regex, no-magic-numbers, no-prototype-builtins, no-redeclare, no-shadow, no-var, sort-vars

  --es6            Enables ES6 syntax (const/let instead of var)

  Proto sources only:

  --keep-case      Keeps field casing instead of converting to camel case.
  --alt-comment    Turns on an alternate comment parsing mode that preserves more comments.

  Static targets only:

  --no-create      Does not generate create functions used for reflection compatibility.
  --no-encode      Does not generate encode functions.
  --no-decode      Does not generate decode functions.
  --no-verify      Does not generate verify functions.
  --no-convert     Does not generate convert functions like from/toObject
  --no-delimited   Does not generate delimited encode/decode functions.
  --no-typeurl     Does not generate getTypeUrl function.
  --no-beautify    Does not beautify generated code.
  --no-comments    Does not output any JSDoc comments.
  --no-service     Does not output service classes.

  --force-long     Enforces the use of 'Long' for s-/u-/int64 and s-/fixed64 fields.
  --force-number   Enforces the use of 'number' for s-/u-/int64 and s-/fixed64 fields.
  --force-message  Enforces the use of message instances instead of plain objects.

  --null-defaults  Default value for optional fields is null instead of zero value.

usage: pbjs [options] file1.proto file2.json ...  (or pipe)  other | pbjs [options] -

It turns out there are two different pbjs commands: one comes from npm install pbjs, the other from npm install protobufjs[-cli]. The latter is meant for generating JavaScript code that handles protobuf data.

If one of them is already installed, installing the other fails:

$ sudo npm install protobufjs -g
npm ERR! code EEXIST
npm ERR! path /usr/local/bin/pbjs
npm ERR! EEXIST: file already exists
npm ERR! File exists: /usr/local/bin/pbjs
npm ERR! Remove the existing file and try again, or run npm
npm ERR! with --force to overwrite files recklessly.

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2023-09-24T03_19_13_647Z-debug-0.log

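The existing one has to be uninstalled first. Search results for "pbjs" sometimes describe the first tool and sometimes the second, depending on which package the author installed, so be careful not to confuse the two.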

There is a way to keep both: install the second one locally instead of globally:

> npm install protobufjs-cli

added 84 packages in 2m
> ls node_modules/
acorn           brace-expansion  entities              esutils           inherits      lodash              minimatch         protobufjs      strip-json-comments  underscore
acorn-jsx       catharsis        escape-string-regexp  fast-levenshtein  js2xmlparser  long                minimist          @protobufjs     supports-color       word-wrap
ansi-styles     chalk            escodegen             fs.realpath       jsdoc         lru-cache           mkdirp            protobufjs-cli  tmp                  wrappy
argparse        color-convert    eslint-visitor-keys   glob              @jsdoc        markdown-it         once              requizzle       type-check           xmlcreate
@babel          color-name       espree                graceful-fs       klaw          markdown-it-anchor  optionator        rimraf          @types               yallist
balanced-match  concat-map       esprima               has-flag          levn          marked              path-is-absolute  semver          uc.micro
bluebird        deep-is          estraverse            inflight          linkify-it    mdurl               prelude-ls        source-map      uglify-js
> find . -type f -name "pbjs"
./node_modules/protobufjs-cli/bin/pbjs
> ./node_modules/protobufjs-cli/bin/pbjs
protobuf.js v1.1.2 CLI for JavaScript

Translates between file formats and generates static code.
......
usage: pbjs [options] file1.proto file2.json ...  (or pipe)  other | pbjs [options] -

The drawback is that it can only be invoked by its path:

> ./node_modules/protobufjs-cli/bin/pbjs

For protobufjs, the feature of interest here is converting the proto messages into a JSON descriptor that JS code can use directly:

> ./node_modules/protobufjs-cli/bin/pbjs -t json query_md5.proto > query_md5.json
> cat query_md5.json
{
  "nested": {
    "common": {
      "fields": {
        "mem1": {
          "rule": "required",
          "type": "uint32",
          "id": 1
        },
        "mem2": {
          "rule": "required",
          "type": "uint32",
          "id": 2
        },
        "mem3": {
          "rule": "required",
          "type": "bytes",
          "id": 3
        },
        "mem4": {
          "rule": "required",
          "type": "uint32",
          "id": 4
        },
        "mem5": {
          "rule": "required",
          "type": "uint64",
          "id": 5
        },
        "mem6": {
          "type": "uint32",
          "id": 6
        },
        "mem7": {
          "type": "bytes",
          "id": 7
        },
        "mem8": {
          "type": "uint32",
          "id": 8
        },
        "mem9": {
          "type": "uint64",
          "id": 9
        }
      }
    },
    "query_md5": {
      "fields": {
        "mema": {
          "rule": "required",
          "type": "common",
          "id": 1
        },
        "memb": {
          "rule": "required",
          "type": "uint32",
          "id": 2
        },
        "memc": {
          "rule": "required",
          "type": "bytes",
          "id": 3
        },
        "memd": {
          "rule": "required",
          "type": "uint32",
          "id": 4
        },
        "meme": {
          "rule": "required",
          "type": "uint64",
          "id": 5
        },
        "memf": {
          "rule": "repeated",
          "type": "bytes",
          "id": 6
        }
      }
    }
  }
}

This file will be used shortly.

javascript

Both protobufjs and pbjs can generate JavaScript code from a proto file. Recall the pbjs help text:

> pbjs
Usage: pbjs [options] <schema_path>

Options:
  -V, --version        output the version number
  --es5 <js_path>      Generate ES5 JavaScript code
  --es6 <js_path>      Generate ES6 JavaScript code
  --ts <ts_path>       Generate TypeScript code
  --decode <msg_type>  Decode standard input to JSON
  --encode <msg_type>  Encode standard input to JSON
  -h, --help           output usage information

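This is done mainly through the --es5/--es6 options; protobufjs offers comparable targets (such as -t static-module in its help text above). For convenience, the description below refers to both simply as pbjs.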

Running JS code to convert the binary data to JSON is also a workable solution. Based on posts found online, I ended up with the following JS code:

const pbroot = require("protobufjs").Root;
const json = require("./query_md5.json");   // JSON descriptor of the proto file
const root = pbroot.fromJSON(json);         // build reflection types from the descriptor
// console.log(root);

const fs = require('fs');
// Read the raw protobuf response and decode it as a query_md5 message.
fs.readFile('./tmp/resp.bin', function (err, data) {
    if (err) {
        console.log(err);
    } else {
        console.log(data);
        console.log(data.length + ' bytes');

        const Message = root.lookupType("query_md5");
        try {
            const message = Message.decode(data);
            console.log(message);
        } catch (e) {
            console.log(e);
        }
    }
});

Note that the query_md5.json file on line 2 is the one generated with protobufjs in the previous section. A brief walkthrough of the code:

  • Load the proto type (query_md5) defined in query_md5.json
  • Read the binary data (tmp/resp.bin) and decode it
  • Print the decoded result

Running the JS code produces the following output:

> node index.js
<Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 232 more bytes>
282 bytes
query_md5 {
  memf: [
    <Buffer d1 5b f3 26 47 08 bf c7 01 e0 4b 3d c6 24 38 a3>,
    <Buffer 31 95 44 f3 2f 32 1b 96 78 65 6b 82 fd b8 95 60>,
    <Buffer 9a 75 17 35 fc ca e6 6f 74 86 e9 fa dc 6a 9f ab>,
    <Buffer 28 4c eb bf 36 e0 1d 57 5c a6 93 de 39 1b 7a 7d>,
    <Buffer 3e 0b 43 9c 62 a5 a4 01 c3 ff cf 00 32 99 bc 7e>,
    <Buffer f6 b9 97 46 9c e6 95 55 52 d3 f5 0b 6c a3 8e b1>,
    <Buffer 98 52 e7 f1 25 30 cb 6b 7a a0 55 69 fb cd 0a 5c>,
    <Buffer d3 33 33 b1 d5 16 d8 68 39 38 f3 07 bf fe d4 c0>,
    <Buffer a6 46 0c df 28 74 48 6a 0b c0 ed f1 6f 51 b5 9e>,
    <Buffer 1e ee e6 79 5b f1 08 32 d5 a7 fc 4f 60 cf 48 ab>,
    <Buffer c4 46 96 63 f6 a4 87 cd fc 3f d5 60 28 5c 0e a4>
  ],
  mema: common {
    mem1: 2,
    mem2: 1048643,
    mem3: <Buffer ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c>,
    mem4: 11,
    mem5: Long { low: 1695456564, high: 11, unsigned: true },
    mem6: 0,
    mem7: <Buffer 32 2e 32 2e 31 30 31 2e 32 37>,
    mem8: 3,
    mem9: Long { low: -1872613904, high: 0, unsigned: true }
  },
  memb: 0,
  memc: <Buffer 67 c6 07 21 5e 47 ae 89 25 27 2d 6d a0 f6 02 2d>,
  memd: 0,
  meme: Long { low: 22177440, high: 0, unsigned: true }
}
<Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 232 more bytes>

The binary data is parsed correctly. Now modify the code slightly:

...
            let buffer = Message.encode(Message.create(message)).finish();
            console.log(buffer);
            fs.writeFile('./resp.bin', buffer, function (err) {
                if (err) {
                    console.log(err);
                } else {
                    console.log('success');
                }
            });
...

This re-encodes the decoded data (message) to binary (buffer) and writes it to a file (resp.bin):

...
<Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 52 more bytes>
success
> xxd resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a  .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba  ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32  ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607  7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028  !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0  ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96  K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca  xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf  .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b  6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210  C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1  ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd  2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307  .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0  ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832  ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4  ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4                 ...?.`(\..

Compared with the original data, it is identical! This approach works; it is just somewhat cumbersome.
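
Part of the inconvenience is that the script starts from binary data, while the workflow described earlier starts from JSON. The JSON decoded earlier represents each bytes field as {"type":"Buffer","data":[...]}, which is the shape Node's Buffer.toJSON() produces. A minimal sketch of a JSON.parse reviver that turns such objects back into Buffers follows. The name reviveBuffers is hypothetical, only bytes fields are handled, and whether the revived object can be passed straight to Message.create is not verified here.

```javascript
// JSON.parse reviver: convert {"type":"Buffer","data":[...]} back into a Buffer
// and leave every other value untouched.
function reviveBuffers(key, value) {
  const looksLikeBuffer =
    value !== null &&
    typeof value === 'object' &&
    value.type === 'Buffer' &&
    Array.isArray(value.data);
  return looksLikeBuffer ? Buffer.from(value.data) : value;
}

// A cut-down sample in the same shape as the decoded JSON shown earlier.
const sampleJson =
  '{"memb":0,' +
  '"memc":{"type":"Buffer","data":[103,198,7,33]},' +
  '"memf":[{"type":"Buffer","data":[209,91]}]}';

const revived = JSON.parse(sampleJson, reviveBuffers);
console.log(revived);
```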

protoc

When it comes to encoding and decoding binary data from a proto file, shouldn't the best tool be protoc, which ships with protobuf itself?

$ ./protoc --help
Usage: ./protoc [OPTION] PROTO_FILES
Parse PROTO_FILES and generate output based on the options given:
  -IPATH, --proto_path=PATH   Specify the directory in which to search for
                              imports.  May be specified multiple times;
                              directories will be searched in order.  If not
                              given, the current working directory is used.
  --version                   Show version info and exit.
  -h, --help                  Show this text and exit.
  --encode=MESSAGE_TYPE       Read a text-format message of the given type
                              from standard input and write it in binary
                              to standard output.  The message type must
                              be defined in PROTO_FILES or their imports.
  --decode=MESSAGE_TYPE       Read a binary message of the given type from
                              standard input and write it in text format
                              to standard output.  The message type must
                              be defined in PROTO_FILES or their imports.
  --decode_raw                Read an arbitrary protocol message from
                              standard input and write the raw tag/value
                              pairs in text format to standard output.  No
                              PROTO_FILES should be given when using this
                              flag.
  -oFILE,                     Writes a FileDescriptorSet (a protocol buffer,
    --descriptor_set_out=FILE defined in descriptor.proto) containing all of
                              the input files to FILE.
  --include_imports           When using --descriptor_set_out, also include
                              all dependencies of the input files in the
                              set, so that the set is self-contained.
  --include_source_info       When using --descriptor_set_out, do not strip
                              SourceCodeInfo from the FileDescriptorProto.
                              This results in vastly larger descriptors that
                              include information about the original
                              location of each decl in the source file as
                              well as surrounding comments.
  --dependency_out=FILE       Write a dependency output file in the format
                              expected by make. This writes the transitive
                              set of input file paths to FILE
  --error_format=FORMAT       Set the format in which to print errors.
                              FORMAT may be 'gcc' (the default) or 'msvs'
                              (Microsoft Visual Studio format).
  --print_free_field_numbers  Print the free field numbers of the messages
                              defined in the given proto files. Groups share
                              the same field number space with the parent
                              message. Extension ranges are counted as
                              occupied fields numbers.

  --plugin=EXECUTABLE         Specifies a plugin executable to use.
                              Normally, protoc searches the PATH for
                              plugins, but you may specify additional
                              executables not in the path using this flag.
                              Additionally, EXECUTABLE may be of the form
                              NAME=PATH, in which case the given plugin name
                              is mapped to the given executable even if
                              the executable's own name differs.
  --cpp_out=OUT_DIR           Generate C++ header and source.
  --csharp_out=OUT_DIR        Generate C# source file.
  --java_out=OUT_DIR          Generate Java source file.
  --javanano_out=OUT_DIR      Generate Java Nano source file.
  --js_out=OUT_DIR            Generate JavaScript source.
  --objc_out=OUT_DIR          Generate Objective C header and source.
  --php_out=OUT_DIR           Generate PHP source file.
  --python_out=OUT_DIR        Generate Python source file.
  --ruby_out=OUT_DIR          Generate Ruby source file.

Let's try it:

> ./protoc --decode=query_md5 query_md5.proto < tmp/resp.bin > resp.pb
[libprotobuf WARNING ../../src/google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: query_md5.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
> cat resp.pb
mema {
  mem1: 2
  mem2: 1048643
  mem3: "\2728\272\223\257z\332\350\031g+\211\335\322k\\"
  mem4: 11
  mem5: 48940096820
  mem6: 0
  mem7: "2.2.101.27"
  mem8: 3
  mem9: 2422353392
}
memb: 0
memc: "g\306\007!^G\256\211%\'-m\240\366\002-"
memd: 0
meme: 22177440
memf: "\321[\363&G\010\277\307\001\340K=\306$8\243"
memf: "1\225D\363/2\033\226xek\202\375\270\225`"
memf: "\232u\0275\374\312\346ot\206\351\372\334j\237\253"
memf: "(L\353\2776\340\035W\\\246\223\3369\033z}"
memf: ">\013C\234b\245\244\001\303\377\317\0002\231\274~"
memf: "\366\271\227F\234\346\225UR\323\365\013l\243\216\261"
memf: "\230R\347\361%0\313kz\240Ui\373\315\n\\"
memf: "\32333\261\325\026\330h98\363\007\277\376\324\300"
memf: "\246F\014\337(tHj\013\300\355\361oQ\265\236"
memf: "\036\356\346y[\361\0102\325\247\374O`\317H\253"
memf: "\304F\226c\366\244\207\315\374?\325`(\\\016\244"

The generated file is not JSON; it is the generic text format defined by protobuf. Encode it back unchanged:

> ./protoc --encode=query_md5 query_md5.proto < resp.pb > resp.bin
[libprotobuf WARNING ../../src/google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: query_md5.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
> xxd resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a  .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba  ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32  ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607  7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028  !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0  ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96  K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca  xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf  .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b  6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210  C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1  ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd  2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307  .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0  ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832  ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4  ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4                 ...?.`(\..

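A comparison with the original data shows that this matches as well. The drawback of this approach, however, is that the pb text file cannot be handed to jq for processing, which adds considerable work during later integration.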

Root cause

The installed pbjs command is actually a symlink:

> which pbjs
/usr/local/bin/pbjs
> ls -lh /usr/local/bin/pbjs
lrwxrwxrwx 1 root root 31 Sep 24 11:33 /usr/local/bin/pbjs -> ../lib/node_modules/pbjs/cli.js
> ls /usr/local/lib/node_modules/pbjs/
cli.js                        index.d.ts                    node_modules/                 test.js                       test.proto.js
cli.ts                        index.js                      package.json                  test.proto                    test.proto.ts
generate.js                   index.ts                      protocol-buffers-schema.d.ts  test.proto.es5.js             test.ts
generate.ts                   LICENSE.md                    README.md                     test.proto.es6.js             tsconfig.json

It points to cli.js. Out of curiosity, I looked at how it handles encoding of the bytes type; that logic lives mainly in generate.js:

function encodeValue(name, buffer, value, nested = 'nested') {
    let type;
    let write;
    switch (name) {
        case 'bool':
            type = TYPE_VAR_INT;
            write = [`writeByte(${buffer}, ${value} ? 1 : 0)`];
            break;
        case 'bytes':
            type = TYPE_SIZE_N;
            write = [`writeVarint32(${buffer}, ${value}.length), writeBytes(${buffer}, ${value})`];
            break;
        case 'int32':
            type = TYPE_VAR_INT;
            write = [`writeVarint64(${buffer}, intToLong(${value}))`];
            break;
        case 'int64':
            type = TYPE_VAR_INT;
            write = [`writeVarint64(${buffer}, ${value})`];
            break;
        case 'string':
            type = TYPE_SIZE_N;
            write = [`writeString(${buffer}, ${value})`];
            break;
        ....
    }
    return { type, write };
}

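The code has been trimmed to highlight the key parts. Compared with the other types, the bytes case first encodes the length of the array and only then the array contents.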
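The length-then-contents layout can be checked against the xxd dump above. The following is a minimal sketch, not pbjs's own code: `encodeVarint32` and `encodeBytesField` are hypothetical helper names, and the field number 3 for mem3 is an assumption inferred from the tag byte `1a` in the dump, since the .proto file is not shown here.

```javascript
// Encode an unsigned 32-bit integer as a protobuf varint: 7 bits per byte,
// with the high bit set on every byte except the last.
function encodeVarint32(n) {
  const out = [];
  n = n >>> 0;
  while (n > 0x7f) {
    out.push((n & 0x7f) | 0x80);
    n >>>= 7;
  }
  out.push(n);
  return out;
}

// Encode one bytes field: tag, then length, then the raw bytes.
function encodeBytesField(fieldNumber, bytes) {
  const WIRE_TYPE_LEN = 2; // length-delimited wire type
  const tag = (fieldNumber << 3) | WIRE_TYPE_LEN;
  return [...encodeVarint32(tag), ...encodeVarint32(bytes.length), ...bytes];
}

// The mem3 value from the decoded message (field number 3 is assumed).
const mem3 = [186, 56, 186, 147, 175, 122, 218, 232, 25, 103, 43, 137, 221, 210, 107, 92];
const hex = encodeBytesField(3, mem3)
  .map((b) => b.toString(16).padStart(2, '0'))
  .join('');
console.log(hex); // 1a10ba38ba93af7adae819672b89ddd26b5c
```

The printed bytes line up with `1a10 ba38 ba93 ...` near the start of the dump: `1a` is the tag, `10` is the length (16), and the 16 payload bytes follow.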

The array contents are written by a routine called writeBytes:

lines.push(`function writeBytes(bb${ts('ByteBuffer')}, buffer${ts('Uint8Array')})${ts('void')} {`);
lines.push(`  ${varOrLet} offset = grow(bb, buffer.length);`);
lines.push(`  bb.bytes.set(buffer, offset);`);
lines.push(`}`);

Its implementation first grows the underlying buffer so that the array is guaranteed to fit, then copies the whole array in at once.
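To make the grow-then-copy behaviour concrete, here is a simplified stand-in for the generated helpers. It is a sketch under assumptions: the `grow` shown here is written for illustration and the real generated helper may differ in its details (for example in how it tracks capacity).

```javascript
// Simplified stand-in for the generated grow(): reserve `count` bytes and
// return the offset at which the caller should write them.
function grow(bb, count) {
  const offset = bb.offset;
  const end = offset + count;
  if (end > bb.bytes.length) {
    const bigger = new Uint8Array(end * 2); // over-allocate to amortize copies
    bigger.set(bb.bytes);
    bb.bytes = bigger;
  }
  bb.offset = end;
  return offset;
}

// Same shape as the generated writeBytes(): reserve space, then copy in one go.
function writeBytes(bb, buffer) {
  const offset = grow(bb, buffer.length);
  bb.bytes.set(buffer, offset);
}

const bb = { bytes: new Uint8Array(4), offset: 0 };
writeBytes(bb, [1, 2, 3]);                 // fits in the initial 4 bytes
writeBytes(bb, new Uint8Array([4, 5, 6])); // forces the buffer to grow
console.log(Array.from(bb.bytes.subarray(0, bb.offset))); // [ 1, 2, 3, 4, 5, 6 ]
```

Both helpers rely on `buffer.length` and on `buffer` being array-like, so the shape of the value passed in matters.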

Recall what the output looked like after pbjs decoded the binary data. Here it is again:

"mem3":{"type":"Buffer","data":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]},

The data is wrapped in an object, whereas writeBytes expects a bare array. Could the mismatch arise at this step?
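The wrapper has the same shape that Node.js produces when a Buffer passes through JSON.stringify. Whether the pbjs CLI takes exactly this path is an assumption here, not something verified against its source; the sketch only shows why such a wrapper cannot stand in for an array.

```javascript
// A Node.js Buffer serializes to JSON as an object, not as a bare array.
const bytes = Buffer.from([186, 56, 186, 147]);
const json = JSON.stringify({ mem3: bytes });
console.log(json); // {"mem3":{"type":"Buffer","data":[186,56,186,147]}}

// After parsing, the wrapper is a plain object without a length property,
// so code that reads value.length sees undefined.
const parsed = JSON.parse(json);
console.log(Array.isArray(parsed.mem3)); // false
console.log(parsed.mem3.length);         // undefined
console.log(parsed.mem3.data.length);    // 4
```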

Take the JSON that pbjs decoded from the binary data and modify it slightly, removing the object wrapped around each bytes value:

> jq -c '.' resp.json
{"mema":{"mem1":2,"mem2":1048643,"mem3":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92],"mem4":11,"mem5":{"low":1695456564,"high":11,"unsigned":true},"mem6":0,"mem7":[50,46,50,46,49,48,49,46,50,55],"mem8":3,"mem9":{"low":-1872613904,"high":0,"unsigned":true}},"memb":0,"memc":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45],"memd":0,"meme":{"low":22177440,"high":0,"unsigned":true},"memf":[[209,91,243,38,71,8,191,199,1,224,75,61,198,36,56,163],[49,149,68,243,47,50,27,150,120,101,107,130,253,184,149,96],[154,117,23,53,252,202,230,111,116,134,233,250,220,106,159,171],[40,76,235,191,54,224,29,87,92,166,147,222,57,27,122,125],[62,11,67,156,98,165,164,1,195,255,207,0,50,153,188,126],[246,185,151,70,156,230,149,85,82,211,245,11,108,163,142,177],[152,82,231,241,37,48,203,107,122,160,85,105,251,205,10,92],[211,51,51,177,213,22,216,104,57,56,243,7,191,254,212,192],[166,70,12,223,40,116,72,106,11,192,237,241,111,81,181,158],[30,238,230,121,91,241,8,50,213,167,252,79,96,207,72,171],[196,70,150,99,246,164,135,205,252,63,213,96,40,92,14,164]]}
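The same edit can be scripted instead of done by hand. Below is a minimal Node.js sketch; `unwrapBuffers` is a hypothetical helper name, and it assumes that any object with `type: "Buffer"` and an array-valued `data` is a bytes wrapper. A real message with fields that happen to look like that would be unwrapped by mistake.

```javascript
// Recursively replace {type: "Buffer", data: [...]} wrappers with the bare
// byte array, leaving every other value untouched.
function unwrapBuffers(value) {
  if (Array.isArray(value)) {
    return value.map(unwrapBuffers);
  }
  if (value !== null && typeof value === 'object') {
    if (value.type === 'Buffer' && Array.isArray(value.data)) {
      return value.data;
    }
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = unwrapBuffers(child);
    }
    return out;
  }
  return value; // numbers, strings, booleans, null
}

// Small sample in the shape of the decoded message.
const decoded = {
  mema: { mem1: 2, mem3: { type: 'Buffer', data: [186, 56] } },
  memf: [{ type: 'Buffer', data: [209, 91] }, { type: 'Buffer', data: [49, 149] }],
};
console.log(JSON.stringify(unwrapBuffers(decoded)));
// {"mema":{"mem1":2,"mem3":[186,56]},"memf":[[209,91],[49,149]]}
```

Objects such as the `{low, high, unsigned}` representation of 64-bit integers pass through unchanged, because they do not match the wrapper shape.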

Then encode this JSON:

> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin
> xxd resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a  .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba  ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32  ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607  7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028  !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0  ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96  K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca  xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf  .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b  6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210  C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1  ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd  2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307  .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0  ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832  ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4  ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4                 ...?.`(\..

This looks promising! A comparison with the original data shows that the two are completely identical.

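Conclusion

This article described some problems that pbjs, a JavaScript tool for protobuf, has when encoding and decoding the bytes type. After several attempts I ended up with three solutions:

  • Use pbjs & protobufjs to generate JS code that encodes the JSON into binary data
  • Use protoc to encode the pb text into binary data
  • Modify the decoded JSON to remove the object layer wrapped around each bytes array, then use pbjs to encode the modified JSON into binary data

Approach I is slightly more complex. Approach II relies on pb text, which is not a general-purpose format and in particular cannot be passed to a downstream jq for preprocessing. Approach III balances convenience and compatibility and is the best choice.

In particular, modifying the JSON to remove the object wrapper is trivial for jq: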

local req=$(jq -c ".mema.mem3=${mem3}|.mema.mem4=${mem4}|.mema.mem5.low=${mem5_lo}|.mema.mem5.high=${mem5_hi}|.mema.mem7=${mem7}|.mema.mem8=${mem8}|.mema.mem9.low=${mem9_lo}|.mema.mem9.high=${mem9_hi}|.memc=${memc}" query_md5.json)

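jq first reads the template JSON (query_md5.json in the command above; it is only a template and does not contain the data the request needs), then assigns each field through a chain of pipes. For bytes fields, directly setting a value of the following form:

[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]

replaces the default object with a byte array. Values can also be substituted through jq variables, but I ran into some trouble when changing a field's type, as in the following: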

local req=$(jq --arg mm3 "[${mem3}]" --arg mm4 "${mem4}" --arg mm5_hi "${mem5_hi}" --arg mm5_lo "${mem5_lo}" --arg mm7 "[${mem7}]" --arg mm8 "${mem8}" --arg mm9 "${mem9}" --arg mmc "[${memc}]"  -c '{ mema: { mem1 : .mema.mem1, mem2: .mema.mem2, mem3: $mm3, mem4: $mm4, mem5: { low: $mm5_lo, high: $mm5_hi, unsigned: true }, mem6: .mema.mem6, mem7: $mm7, mem8: $mm8, mem9: { low: $mm9, high: 0, unsigned: true } }, memb: .memb, memc: $mmc, memd: .memd, meme: .meme, memf: .memf }' query_md5.json)

The updated JSON ends up like this:

req={"mema":{"mem1":2,"mem2":1048642,"mem3":"[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]","mem4":"11","mem5":{"low":"1695625406","high":"11","unsigned":true},"mem6":0,"mem7":"2.2.101.27","mem8":"3","mem9":{"low":"2422353392","high":0,"unsigned":true}},"memb":0,"memc":"[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]","memd":"0",...}

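Every byte array is now wrapped in double quotes and has become a string! This happens because --arg always binds its value as a JSON string (jq's --argjson option parses the value as JSON instead). Since this approach is also more verbose, I do not recommend it.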
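For comparison, the same template fill can be sketched in Node.js, where values keep their JSON types because they are never passed through a string. This is an illustration only: `toLongParts` and `fillRequest` are hypothetical names, and the `{low, high, unsigned}` layout is taken from the decoded JSON shown earlier.

```javascript
// Split an unsigned 64-bit value into the {low, high, unsigned} shape seen in
// the decoded JSON. BigInt avoids precision loss above 2^53.
function toLongParts(value) {
  const v = BigInt.asUintN(64, BigInt(value));
  return {
    low: Number(BigInt.asIntN(32, v)),         // signed 32-bit low word
    high: Number(BigInt.asIntN(32, v >> 32n)), // signed 32-bit high word
    unsigned: true,
  };
}

// Fill a copy of the template; bytes fields are assigned as plain arrays.
function fillRequest(template, fields) {
  const req = JSON.parse(JSON.stringify(template)); // deep copy of the template
  req.mema.mem3 = Array.from(fields.mem3);
  req.mema.mem4 = fields.mem4;
  req.mema.mem5 = toLongParts(fields.mem5);
  req.memc = Array.from(fields.memc);
  return req;
}

const template = {
  mema: { mem3: { type: 'Buffer', data: [] }, mem4: 0, mem5: { low: 0, high: 0, unsigned: true } },
  memc: { type: 'Buffer', data: [] },
};
const req = fillRequest(template, {
  mem3: [186, 56],
  mem4: 11,
  mem5: 48940096820n,
  memc: [103, 198],
});
console.log(JSON.stringify(req));
// {"mema":{"mem3":[186,56],"mem4":11,"mem5":{"low":1695456564,"high":11,"unsigned":true}},"memc":[103,198]}
```

The mem5 value 48940096820 splits into low 1695456564 and high 11, the same pair that appears in the decoded JSON above.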

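Postscript

The root-cause analysis was somewhat sloppy. I remember that I did spot something suspicious at the time, but when I reviewed things later I could not recall what had triggered the suspicion, so please bear with it, haha.

Looking back, this is probably a bug in pbjs: when decoding, the Uint8Array is written out directly through a wrapper class, which adds the object layer, while encoding accepts only a plain byte array. Because of this mismatch, the data never makes it into the binary output.

If you only use the JS/TS code generated by pbjs, you should not be affected, and even producing pb text directly with protoc works fine. The problem above appears only when using pbjs to convert between binary data and JSON. I hope the pbjs author fixes this soon.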


posted @ 2023-09-25 15:12  goodcitizen