Protobuf for the future

Modern distributed systems often exchange values that are 32, 64, 128, 256 or 512 bits long, because they rely heavily on cryptographic primitives. These systems frequently run on mobile devices with limited network and computational resources, so they need a data encoding that is both space-efficient and reasonably fast to encode and decode. One such widely used encoding is Protobuf.

Protobuf uses type-length-value (TLV) encoding. Every field starts with a tag, which is itself a varint. In its shortest form the tag is one byte long and assigns meaning to its bits in the following way:

  • 3 bits encode the wire type
  • 4 bits encode the field number
  • 1 bit is the continuation marker

Therefore a one-byte tag can encode:

  • 16 field numbers, of which 15 are usable because field number 0 is not allowed
  • 8 wire types
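
As a minimal sketch of the bit layout above, the following Python packs a field number and a wire type into a tag and splits a one-byte tag back into its parts. The function names `encode_tag` and `decode_one_byte_tag` are illustrative only and do not come from any Protobuf library.

```python
from typing import Tuple


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """Encode a tag as the varint of (field_number << 3) | wire_type."""
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    value = (field_number << 3) | wire_type
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # continuation bit clear: last byte
            return bytes(out)


def decode_one_byte_tag(tag: int) -> Tuple[int, int]:
    """Split a one-byte tag into (field_number, wire_type)."""
    if tag & 0x80:
        raise ValueError("continuation bit set: tag is longer than one byte")
    return tag >> 3, tag & 0x07
```

With this sketch, field number 15 still fits in a single byte, while field number 16 spills into a second byte because the continuation bit has to be set.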

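Using the current Protobuf wire types, we can construct the following table of message sizes required to encode fields that hold values of specific lengths. The table shows message lengths for one occurrence of a (repeated) field and for 100 occurrences of a repeated field. We assume packed encoding where it is available: a packed field is a single LEN record made of one tag, a varint length prefix and the values back to back. Values of 128 bits and more have no fixed-width wire type, so each one is encoded as a separate LEN record with its own tag and length prefix.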

Length of value | Number of values | Message length for 1 occurrence | Message length for 100 occurrences
32 bits         | 1                | 5 bytes                         | 403 bytes
64 bits         | 1                | 9 bytes                         | 803 bytes
128 bits        | 1                | 18 bytes                        | 1800 bytes
256 bits        | 1                | 34 bytes                        | 3400 bytes
512 bits        | 1                | 66 bytes                        | 6600 bytes
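
The arithmetic behind these sizes can be sketched as follows. The sketch assumes a one-byte tag (field number 1 to 15), 32- and 64-bit values stored as I32/I64 and packed when repeated, and wider values stored as individual LEN records. `varint_len` and `current_size` are hypothetical helper names. Note that the length prefix is itself a varint, so a payload longer than 127 bytes needs a prefix of at least two bytes.

```python
TAG_LEN = 1  # assumption: field number in 1..15, so the tag is one byte


def varint_len(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    length = 1
    while value > 0x7F:
        value >>= 7
        length += 1
    return length


def current_size(value_bits: int, count: int) -> int:
    """Message size in bytes under the current wire types."""
    value_bytes = value_bits // 8
    if value_bits in (32, 64):
        if count == 1:
            # A single I32/I64 record: tag followed by the raw value.
            return TAG_LEN + value_bytes
        # Packed repeated field: one tag, one length prefix, all values.
        payload = value_bytes * count
        return TAG_LEN + varint_len(payload) + payload
    # No fixed-width wire type: every value is its own LEN record.
    return count * (TAG_LEN + varint_len(value_bytes) + value_bytes)


for bits in (32, 64, 128, 256, 512):
    print(bits, current_size(bits, 1), current_size(bits, 100))
```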

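We can conceive the following revision to the Protobuf wire type table, designed to encode values of 128, 256 and 512 bits more efficiently. Compared with the current table, it drops the deprecated SGROUP and EGROUP wire types, renumbers the remaining ones and adds three new fixed-width types.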

ID | Name
0  | VARINT
1  | LEN
2  | I32
3  | I64
4  | I128
5  | I256
6  | I512
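
For illustration, the revised table could be written down as an enum together with the payload width of each fixed-width type. This is only a sketch of the proposal in this post: `RevisedWireType`, `FIXED_WIDTH_BYTES` and `wire_type_for` are made-up names, and nothing here exists in Protobuf itself.

```python
from enum import IntEnum


class RevisedWireType(IntEnum):
    """Hypothetical wire type IDs from the revised table (3 bits, so at most 8)."""
    VARINT = 0
    LEN = 1
    I32 = 2
    I64 = 3
    I128 = 4
    I256 = 5
    I512 = 6


# Payload width in bytes for each fixed-width wire type.
FIXED_WIDTH_BYTES = {
    RevisedWireType.I32: 4,
    RevisedWireType.I64: 8,
    RevisedWireType.I128: 16,
    RevisedWireType.I256: 32,
    RevisedWireType.I512: 64,
}


def wire_type_for(value_bits: int) -> RevisedWireType:
    """Pick the fixed-width wire type matching a value, else fall back to LEN."""
    for wire_type, width in FIXED_WIDTH_BYTES.items():
        if width * 8 == value_bits:
            return wire_type
    return RevisedWireType.LEN
```

Seven of the eight available IDs are used, which leaves one ID free.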

With this revision in mind, we can revisit the table of message sizes. This time I have annotated each size with a percentage showing how large it is relative to the corresponding entry in the previous table.

Length of value | Number of values | Message length for 1 occurrence | Message length for 100 occurrences
32 bits         | 1                | 5 bytes (100%)                  | 403 bytes (100%)
64 bits         | 1                | 9 bytes (100%)                  | 803 bytes (100%)
128 bits        | 1                | 17 bytes (94%)                  | 1603 bytes (89%)
256 bits        | 1                | 33 bytes (97%)                  | 3203 bytes (94%)
512 bits        | 1                | 65 bytes (98%)                  | 6403 bytes (97%)
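
The comparison can be sketched in a few lines of Python. The only difference between the two schemes in this model is which value widths have a fixed-width wire type, and therefore which values can be stored without a per-value length prefix and packed when repeated. As before, a one-byte tag is assumed, and `size`, `relative_size`, `CURRENT` and `REVISED` are hypothetical names.

```python
TAG_LEN = 1  # assumption: field number in 1..15, so the tag is one byte

CURRENT = {32, 64}                  # widths with a fixed-width wire type today
REVISED = {32, 64, 128, 256, 512}   # widths covered by the revised table


def varint_len(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    length = 1
    while value > 0x7F:
        value >>= 7
        length += 1
    return length


def size(value_bits: int, count: int, fixed_widths: set) -> int:
    """Message size in bytes, given the widths that have a fixed-width type."""
    value_bytes = value_bits // 8
    if value_bits in fixed_widths:
        if count == 1:
            return TAG_LEN + value_bytes
        payload = value_bytes * count
        return TAG_LEN + varint_len(payload) + payload  # packed
    # Fallback: one LEN record per value.
    return count * (TAG_LEN + varint_len(value_bytes) + value_bytes)


def relative_size(value_bits: int, count: int) -> float:
    """Revised size as a fraction of the current size."""
    return size(value_bits, count, REVISED) / size(value_bits, count, CURRENT)


for bits in (32, 64, 128, 256, 512):
    for count in (1, 100):
        print(bits, count, size(bits, count, REVISED),
              f"{relative_size(bits, count):.0%}")
```

In this model the saving for a single value is exactly the one-byte length prefix, while for 100 values it is roughly two bytes per value, since both the repeated tag and the repeated length prefix disappear.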

The new wire type table reduces the size of messages that hold values of 128, 256 or 512 bits. While the savings are not groundbreaking, I believe they are not insignificant either.

2023-03-10