Size limits for Proto Messages https://protobuf.dev/programming-guides/proto-limits/


Somak Dutta

Jul 3, 2025, 4:21:40 AM
to Protocol Buffers
Hello, 

From https://protobuf.dev/programming-guides/proto-limits/ I understand that, across all ecosystems:

Any proto in serialized form must be <2GiB, as that is the maximum size supported by all implementations. It’s recommended to bound request and response sizes.


However, I wanted to check where exactly the limitation is set up, specifically in the protobuf-java library.

I can only see safety checks in message_lite.cc, but I don't think this would be reflected across ecosystems?

if (size > INT_MAX) {
  GOOGLE_LOG(ERROR) << "Exceeded maximum protobuf size of 2GB: " << size;
  return false;
}
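For comparison, here is a plain-Java sketch (my own names, not actual protobuf-java code) of the same kind of guard: track the candidate size as a long and reject anything over Integer.MAX_VALUE (~2 GiB), which mirrors the INT_MAX check in message_lite.cc.

```java
// Hypothetical sketch of the 2 GiB guard in Java terms; not real protobuf-java code.
public class SizeGuard {
    // 2^31 - 1 bytes, the largest value a signed 32-bit int can carry.
    static final long MAX_SERIALIZED_SIZE = Integer.MAX_VALUE;

    // Size is computed as a long so the comparison itself cannot overflow.
    static boolean fitsInProtobufMessage(long size) {
        return size <= MAX_SERIALIZED_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInProtobufMessage(1L << 30)); // 1 GiB -> true
        System.out.println(fitsInProtobufMessage(1L << 31)); // 2 GiB -> false
    }
}
```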

Regards


Cassondra Foesch

Jul 4, 2025, 5:57:21 AM
to Somak Dutta, Protocol Buffers
I’m pretty sure that since 2 GiB (2^31 − 1 bytes) is the maximum value an int32 can carry, that is where the requirement is coming from. It’s entirely possible that it is not actually enforced across the whole ecosystem, but is essentially enforced by “if you exceed this boundary, some code will not work with your protobuf.”

Like, for instance, it is impossible for a 32-bit Golang implementation to deal with more than 2 GiB of data in a single slice. (Since the length of the slice is stored as a 32-bit signed integer.)


Somak Dutta

Jul 4, 2025, 10:40:26 AM
to Protocol Buffers
Exactly, could not agree more. The current limit is set to Integer.MAX_VALUE in CodedInputStream.

Maybe a bit of context here would help: I am coming from the point of view of https://groups.google.com/g/protobuf/c/vvP4uajRE60

If the potential fix for it was to set the limit to 2 GiB in message_lite.cc, then in a memory-safe language like Java the limit already defaults to 2 GiB anyway. I wonder whether the vulnerability data out there that marks Java as impacted is really overestimating the risk.

```

Somak Dutta
Jul 3, 2025, 1:51:24 PM (yesterday) 
to Protocol Buffers
Hi,

I am writing to ask about the reported vulnerability GHSA-jwvw-v7c5-m82h for protobuf-java, which specifically says "protobuf allows remote authenticated attackers to cause a heap-based buffer overflow."

Specifically, I want to ask about earlier versions (< 3.4.0).
Take, for example, version 2.5.0. Based on all the code I see for CodedInputStream:
- methods such as readRawBytes/refillBuffer, which perform copies to/from buffers or resizing, all look pretty safe from integer overflows.
- there is also a slow path, where the buffer is read in chunks, potentially to prevent out-of-memory issues.

First question:
I am not seeing any evidence that the package can be vulnerable to a buffer overflow issue. Additionally, given that Java is a memory-safe language, I am failing to see how the Java ecosystem is susceptible to the aforementioned vulnerability.

Second question:
There is a related question along the same vein here: https://github.com/protocolbuffers/protobuf/issues/760?reload=1#issuecomment-847162817 . The potential fix also suggests the issue might be present only in C/C++ ecosystems.

```

Regards,
Somak

Em Rauch

Jul 7, 2025, 9:46:00 AM
to Somak Dutta, Protocol Buffers
The context is that Message-typed fields (and string-typed fields) are encoded with an int32 of "the number of bytes that follow are this message". It's not possible to encode a message larger than that from a binary wire-format technical point of view, and that's an ecosystem-wide implication.

This leaves some wiggle room, though: notably, top-level messages are not encoded with a length prefix, which means they don't have any such technical constraint. But also more notably, if you just construct a message in memory and then call some setters, you can build up an arbitrarily large message.
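To make the length-prefix point concrete, here is a plain-Java sketch (no protobuf dependency; the field number and payload are made up) of how a length-delimited field is laid out on the wire: a tag, a varint length prefix, then the payload. The length prefix is what imposes the size ceiling on nested messages.

```java
import java.io.ByteArrayOutputStream;

public class WireSketch {
    // Encode an unsigned 32-bit value as a protobuf base-128 varint.
    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode a field as length-delimited (wire type 2): tag, length prefix, payload.
    static byte[] encodeLengthDelimited(int fieldNumber, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint32(out, (fieldNumber << 3) | 2); // tag byte(s)
        writeVarint32(out, payload.length);         // the length prefix in question
        out.write(payload, 0, payload.length);      // payload bytes
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeLengthDelimited(1, "hello".getBytes());
        // 1 tag byte (0x0A) + 1 length byte (5) + 5 payload bytes = 7 bytes
        System.out.println(encoded.length); // 7
    }
}
```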

> May be a bit of context here would help, I am coming from the point of view  https://groups.google.com/g/protobuf/c/vvP4uajRE60
> If the potential fix for it was to set limit to 2g in message_lite.c,

Without other context and doing more archeology, I actually suspect the 'attack' was more that e.g. sufficiently smart attackers could send a string whose length is "2GB minus one byte", and then know that the service boxes up that input in a protobuf message (adding a few bytes of overhead), and then encodes that to the next backend server.

And the fix for the C++ issue back then was not simply to try to enforce a conceptual 2GB limit; it instead required changing the C++ API to use a `long` (int64) for the encoded size of messages instead of an int32 (the size getter is called `ByteSizeLong()`). That made it much easier to write correct behavior against 2GB limits: with an `int EncodedLength()` function, once you have, say, 10 strings that are each 512MB, set them all as separate fields on the same parent message, and then try to see what the `int` serialized size should be, there's no way to handle it gracefully. By making it a `long` instead, it is able to return the actual size without a 2GB limit, and then if you try to serialize a message where that size is too large, it will fail to serialize (serialize has a bool return value on it).
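The overflow hazard in that "ten 512MB strings" scenario is easy to demonstrate in plain Java (this is just an arithmetic sketch, not protobuf code): summing the field sizes in 32-bit int arithmetic silently wraps, while a long holds the true total.

```java
public class ByteSizeOverflow {
    // Sum `count` field sizes with 32-bit int arithmetic: wraps past 2^31 - 1.
    static int intTotal(int fieldSize, int count) {
        int total = 0;
        for (int i = 0; i < count; i++) total += fieldSize;
        return total;
    }

    // Same sum with 64-bit long arithmetic: returns the real size.
    static long longTotal(int fieldSize, int count) {
        long total = 0;
        for (int i = 0; i < count; i++) total += fieldSize;
        return total;
    }

    public static void main(String[] args) {
        int fieldSize = 512 * 1024 * 1024; // 512 MiB per field, still fits in an int
        System.out.println(intTotal(fieldSize, 10));  // 1073741824 (wrapped, wrong)
        System.out.println(longTotal(fieldSize, 10)); // 5368709120 (actual, > 2 GiB)
    }
}
```

This is the same reason an `int`-returning size getter cannot report such a message's size gracefully, while `ByteSizeLong()` can.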

Somak Dutta

Jul 13, 2025, 10:12:13 AM
to Protocol Buffers
Thanks, Em Rauch, for your response.

Your suspicion about the attack surface actually piqued my interest, and I wanted to check whether it's possible:
```
Without other context and doing more archeology, I actually suspect the 'attack' was more that e.g. sufficiently smart attackers could send a string whose length is "2GB minus one byte", and then know that the service boxes up that input in a protobuf message (adding a few bytes of overhead), and then encodes that to the next backend server.

```
Given that data is read via CodedInputStream, a small unit test is able to probe this possibility:

1. For test purposes, reduce the size limit to, say, 12 and the buffer size to 4.
2. Interestingly, there are already defensive measures in place to check that size limits are respected; e.g., in version 2.5 we have this check:

// if (totalBytesRetired + bufferPos + size > currentLimit) { throw an exception }

public void testCurrentLimitExceededOld() throws Exception {
    byte[] bytes = "123456789999".getBytes("UTF-8");
    ByteArrayOutputStream rawOutput = new ByteArrayOutputStream();
    CodedOutputStream output = CodedOutputStream.newInstance(rawOutput, bytes.length);
    // Write a length-delimited payload: varint length followed by the raw bytes.
    output.writeRawVarint32(bytes.length);
    output.writeRawBytes(bytes);
    output.flush();
    byte[] rawInput = rawOutput.toByteArray();
    CodedInputStream input = CodedInputStream.newInstance(
            new ByteArrayInputStream(rawInput));
    // Slightly larger than the whole rawInput (1 length byte + 12 payload bytes).
    input.setSizeLimit(14);
    System.out.println(input.readString());
}
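The guard itself can be sketched in plain Java without a protobuf dependency (the parameter names follow the 2.5-era CodedInputStream fields; this is my simplification, not the real code). The key detail is doing the comparison in long arithmetic so the sum cannot itself overflow:

```java
public class LimitCheck {
    // Simplified mirror of the quoted currentLimit guard in CodedInputStream 2.5.
    static boolean wouldExceedLimit(int totalBytesRetired, int bufferPos,
                                    int size, int currentLimit) {
        // Promote to long before adding, so sizes near Integer.MAX_VALUE
        // cannot wrap around and sneak past the limit.
        return (long) totalBytesRetired + bufferPos + size > currentLimit;
    }

    public static void main(String[] args) {
        System.out.println(wouldExceedLimit(0, 0, 12, 14)); // false: 12 <= 14
        System.out.println(wouldExceedLimit(4, 2, 12, 14)); // true: 18 > 14
        // A huge size near the int ceiling is still caught:
        System.out.println(wouldExceedLimit(Integer.MAX_VALUE, 0, 1,
                Integer.MAX_VALUE)); // true
    }
}
```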

By the way, there are definitely JNI calls for memory copies starting from version 3.0; however, all these calls are well protected via length and index checks.
So it does appear this version is immune to the vulnerability in question.



Regards,
Somak