The "1 MB message size rule of thumb" in the docs


Itamar Katz

Aug 12, 2025, 9:57:53 AM
to Protocol Buffers
I read this thread (https://groups.google.com/g/protobuf/c/f-mKyzyeySI/m/Yzn0yywKAgAJ), but I wonder what the current best practice is.

The protocol buffer docs say:
"Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy."

Does this refer to a message containing a single data-block (as opposed to a repeated message), or to any message defined in the schema?
For example, if my schema is:
message Location {
  float x = 1;
  float y = 2;
  float z = 3;
}
message LocationArray {
  repeated Location row = 1;
}
message Locations {
  repeated LocationArray car = 1;
  repeated LocationArray boat = 2;
}

Does the size-limit best practice refer to each "Location", or to the whole top-level "Locations"?



Cassondra Foesch

Aug 12, 2025, 10:10:33 AM
to Itamar Katz, Protocol Buffers
> Does this refer to a message containing a single data-block

Protobuf's bytes type is encoded efficiently on the wire as a length (in bytes) followed by the raw byte values, so it avoids many of the pitfalls of other large data sets. In the proto JSON format, however, bytes are encoded to base64 and decoded back, which incurs some penalty. It is best not to stuff too much into a single data block. (You can always stream the data as smaller pages.)
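To put a rough number on the base64 penalty, here is a minimal sketch using only the Python standard library. The 1,000,000-byte zero-filled payload is an arbitrary stand-in, and `base64_len` is a helper named here for illustration; it counts only the base64 text itself, not JSON quoting or field names.

```python
import base64

def base64_len(n: int) -> int:
    """Length of padded base64 text for n raw bytes: 4 chars per 3-byte group."""
    return 4 * ((n + 2) // 3)

raw = bytes(1_000_000)            # stand-in payload, not real data
encoded = base64.b64encode(raw)   # what a JSON string would have to carry

# The closed-form length matches what the encoder actually produces.
assert len(encoded) == base64_len(len(raw))

overhead = len(encoded) / len(raw)
print(f"raw={len(raw)} bytes, base64={len(encoded)} chars, ratio={overhead:.3f}")
```

The ratio comes out at about 4/3, so a bytes field costs roughly a third more in JSON than in the binary wire format, before counting the encode and decode work.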


> or to any message defined in the schema?

This best-practice guideline applies to all messages, recursively. In fact, embedded messages are usually even more taxing than a simple flat message, because allocations have to be made for every submessage present.


Em Rauch

Aug 12, 2025, 10:13:55 AM
to Cassondra Foesch, Itamar Katz, Protocol Buffers
That 1 MB rule of thumb is a guideline about the total serialized wire size of the top-level message.
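As a rough way to see where the schema from the question lands relative to 1 MB, the sketch below hand-encodes it following the documented wire format (a tag byte per field, little-endian 32-bit floats, a varint length prefix for each embedded message). It is a stand-in for the real protobuf library, not its API: the helper names are made up here, it assumes all three coordinates are non-zero (proto3 omits zero-valued scalars), and the counts of 1000 points, 40 cars and 30 boats are arbitrary.

```python
import struct

WIRE_LEN = 2      # length-delimited: embedded messages, bytes, strings
WIRE_FIXED32 = 5  # fixed 32-bit: float

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)

def embed(field_number: int, body: bytes) -> bytes:
    """One embedded message: tag, varint length, then the body."""
    return tag(field_number, WIRE_LEN) + encode_varint(len(body)) + body

def encode_location(x: float, y: float, z: float) -> bytes:
    fields = ((1, x), (2, y), (3, z))
    return b"".join(tag(n, WIRE_FIXED32) + struct.pack("<f", v) for n, v in fields)

def encode_location_array(rows) -> bytes:
    return b"".join(embed(1, encode_location(*row)) for row in rows)

def encode_locations(cars, boats) -> bytes:
    car_bytes = b"".join(embed(1, encode_location_array(a)) for a in cars)
    boat_bytes = b"".join(embed(2, encode_location_array(a)) for a in boats)
    return car_bytes + boat_bytes

track = [(1.0, 2.0, 3.0)] * 1000   # hypothetical 1000-point track
one_location = len(encode_location(1.0, 2.0, 3.0))
one_array = len(encode_location_array(track))
total = len(encode_locations([track] * 40, [track] * 30))
print(one_location, one_array, total, total > 1 << 20)
```

Under these assumptions each Location is 15 bytes on its own and 17 bytes once wrapped in the repeated field, so a single top-level Locations message crosses 1 MB at roughly sixty thousand points in total.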

If you have a very large data set and want each point to be a message, there are a number of approaches for storing a top-level set of messages instead of a single top-level message, including length-prefixed serialization (the functions with names like parseDelimited), RecordIO, Riegeli, etc.
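The length-prefixed approach can be sketched in a few lines: each record is written as a varint byte count followed by the serialized message, so a reader can pull records off a stream one at a time without ever holding the whole set in memory. The code below is a pure-Python illustration of that framing, as I understand the delimited helpers (for example Java's `writeDelimitedTo` / `parseDelimitedFrom`) to work. The function names are hypothetical, and the payloads are plain byte strings standing in for serialized messages.

```python
import io

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def read_varint(stream):
    """Return the next varint, or None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None          # no more records
            raise EOFError("stream ended inside a length prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result
        shift += 7

def write_delimited(stream, payload: bytes) -> None:
    stream.write(encode_varint(len(payload)))
    stream.write(payload)

def iter_delimited(stream):
    """Yield each length-prefixed payload in turn."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a record")
        yield payload

# Stand-in payloads; in practice each would be one serialized message.
records = [b"first", b"", b"x" * 300]
buffer = io.BytesIO()
for record in records:
    write_delimited(buffer, record)

round_tripped = list(iter_delimited(io.BytesIO(buffer.getvalue())))
assert round_tripped == records
```

With this framing the 1 MB guideline applies to each record rather than to the file, which is why a stream of small LocationArray messages scales better than one large Locations message.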

Note that the true limit on encoded size is 2 GB, and many people do use Protobuf to serialize individual messages that are much larger than 1 MB while still being under 2 GB. The documentation quoted above is only about what Protobuf is best optimized for; notably, we don't really try to support parsing a subset of a `LocationArray` from your example. Other formats would let you do parallelized parsing or MapReduce-style operations over the data, and support data sets much larger than 2 GB.
