Cloud Load Balancing callouts overview

Service Extensions lets you instruct supported Application Load Balancers to send a callout from the load balancing data path to user-managed callout services or Google services.

Callouts data flow

A load balancer communicates with a callout by using Envoy's ext_proc gRPC API. This API lets the extension service respond to events in the lifecycle of an HTTP request by examining and modifying the headers or the body of the request.

An abbreviated version of the API is as follows.

    // The gRPC API to be implemented by the external processing server
    service ExternalProcessor {
      rpc Process(stream ProcessingRequest) returns (stream ProcessingResponse) {
      }
    }

    // Envoy sets one of these fields depending on the processing stage.
    message ProcessingRequest {
      oneof request {
        HttpHeaders request_headers = 2;
        HttpHeaders response_headers = 3;
        HttpBody request_body = 4;
        HttpBody response_body = 5;
      }
    }

    // Depending on the processing mode configuration, the server might operate in
    // one of two ways. One way is to send a ProcessingResponse response for each
    // message received. The other is to buffer and process many body chunks before
    // splitting the processed body into smaller chunks, with each response chunk
    // sent in a separate ProcessingResponse message.
    message ProcessingResponse {
      // The server must set one of these fields corresponding to the field set in
      // the ProcessingRequest message. Alternatively, the server can set the
      // immediate_response field to make the load balancer terminate request
      // processing and send the specified response back to the client.
      oneof response {
        HeadersResponse request_headers = 1;
        HeadersResponse response_headers = 2;
        BodyResponse request_body = 3;
        BodyResponse response_body = 4;
        ImmediateResponse immediate_response = 7;
      }
    }

Figure 3 shows how you can deploy the callout backend service with a gRPC server on a user-managed compute resource such as virtual machine (VM) instances or Google Kubernetes Engine (GKE) and represent it to the load balancer as a regular backend service.

**Figure 3.** Application Load Balancers send Service Extensions callouts to callout backend services, which supply custom logic.

For example, on receiving the headers for an HTTP request, the load balancer sends the ProcessingRequest message to the extension service with the request_headers field set to the HTTP headers from the client. The extension service must respond with a suitable ProcessingResponse message with any configured changes to the headers or body.
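The request and response pairing described above can be sketched in Python. This is a minimal illustration, not a working ext_proc server: the two dataclasses are hypothetical stand-ins for the classes that a protobuf compiler would generate, the `handle` function and `FIELDS` tuple are names invented for this sketch, and an empty dict is assumed to mean "no changes". Only the field names come from the API excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessingRequest:
    """Stand-in for the generated message; exactly one field is set."""
    request_headers: Optional[dict] = None
    response_headers: Optional[dict] = None
    request_body: Optional[bytes] = None
    response_body: Optional[bytes] = None


@dataclass
class ProcessingResponse:
    """Stand-in for the generated message; exactly one field is set."""
    request_headers: Optional[dict] = None
    response_headers: Optional[dict] = None
    request_body: Optional[dict] = None
    response_body: Optional[dict] = None
    immediate_response: Optional[dict] = None


# Fields of the request oneof, as listed in the API excerpt.
FIELDS = ("request_headers", "response_headers", "request_body", "response_body")


def handle(request: ProcessingRequest) -> ProcessingResponse:
    """Reply with the response field that matches the field set in the request."""
    set_fields = [name for name in FIELDS if getattr(request, name) is not None]
    if len(set_fields) != 1:
        raise ValueError("expected exactly one request field to be set")
    # An empty dict stands for "no changes" in this sketch.
    return ProcessingResponse(**{set_fields[0]: {}})
```

In a real service, this dispatch would sit inside the streaming `Process` handler, which reads each incoming message and writes one reply per message.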

For REQUEST_HEADERS and RESPONSE_HEADERS events, the extension service can manipulate the HTTP headers in the request or response. The service can add, modify, or delete headers by setting the request_headers or response_headers field in the ProcessingResponse message appropriately. Set header values in the raw_value field, not the value field.
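As a rough illustration of a header change, the following Python sketch builds a plain dict shaped like a header mutation. The nested names `set_headers`, `remove_headers`, `header`, and `key` are assumptions based on Envoy's HeaderMutation and HeaderValue messages and should be checked against the proto definitions you compile; the `header_mutation` helper and the `x-example-*` header names are hypothetical.

```python
from typing import Dict, List


def header_mutation(set_headers: Dict[str, str], remove_headers: List[str]) -> dict:
    """Build a dict shaped like a header mutation (assumed field names)."""
    return {
        "set_headers": [
            # Put the value in raw_value (bytes) rather than in value.
            {"header": {"key": key, "raw_value": value.encode("utf-8")}}
            for key, value in set_headers.items()
        ],
        "remove_headers": list(remove_headers),
    }


# Hypothetical header names, used only for illustration.
mutation = header_mutation({"x-example-tenant": "blue"}, ["x-example-debug"])
```

The helper deliberately never emits a `value` key, so a caller cannot populate the wrong field by accident.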

You can deploy the ext_proc gRPC service on VM instances or on GKE and configure an instance group or network endpoint group (NEG) to represent the endpoints of this service.

Traffic extensions allow changing the headers and the body of both requests and responses. The extension server can also override the processing mode dynamically, which lets it enable or disable the extension for subsequent phases of request processing.

The other extensions have the following restrictions:

They allow changing only the request headers. So, the extension service must not set anything other than request_headers in the ProcessingResponse message.
They can't override the processing mode of the ext_proc stream. Load balancers call them only for request headers.
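A callout service that serves several extension types might guard against setting a field that the calling extension does not allow. The Python sketch below reads the restriction above literally: for anything other than a traffic extension, only `request_headers` is accepted. The function name, the dict representation of a response, and the `is_traffic_extension` flag are all hypothetical.

```python
from typing import List


def disallowed_fields(response: dict, is_traffic_extension: bool) -> List[str]:
    """Return the names of set response fields that the extension type does not allow."""
    if is_traffic_extension:
        # Traffic extensions may change headers and bodies of requests and responses.
        return []
    # Other extensions: only request_headers may be set.
    return [name for name in response if name != "request_headers"]
```

For example, `disallowed_fields({"request_headers": {}, "response_body": {}}, False)` reports `response_body` as a field to drop before replying.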

Load balancers don't re-evaluate route rules after calling a traffic extension.

Supported backends for user-managed callout backend services

You can host user-managed callout extensions on a backend service that uses one of the following types of backends that run the ext_proc gRPC service:

All managed and unmanaged instance group backends
All zonal NEGs
All hybrid connectivity NEGs
Private Service Connect NEGs pointing to VPC services
Serverless NEGs pointing to Cloud Run services

Recommended optimizations for callouts

Integrating an extension into the load balancing processing path incurs additional latency for requests and responses. Each type of data that the extension service processes—including request headers, request body, response headers, and response body—adds latency.

Consider the following optimizations to minimize the latency:

Configure the extension to process only the data that you need. For example, to modify only request headers, set the supported_events field in the extension to REQUEST_HEADERS.
Deploy callouts in the same zones as the regular destination backend service for the load balancer. When using a cross-region internal Application Load Balancer, place the extension service backends in the same region as the load balancer's proxy-only subnets.
When using a global external Application Load Balancer, place the callout service backends in the geographic regions where the load balancer's regular destination VMs, GKE workloads, and Cloud Run functions are located.

Limitations

This section lists some limitations with callouts.

Limitations with header manipulation

You cannot change some headers. The following are the limitations with header manipulation:

Header manipulation is not supported for the following headers:
- X-user-IP
- CDN-Loop
- Headers starting with X-Forwarded, X-Google, X-GFE, or X-Amz-
- connection
- keep-alive
- transfer-encoding, te
- upgrade
- proxy-connection, proxy-authenticate, proxy-authorization
- trailers
For LbTrafficExtension, header manipulation is also not supported for the :method, :authority, :scheme, and host headers.
When the ext_proc server specifies header values in HeaderMutation, the load balancer ignores the value field. Use the raw_value field instead.
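The header restrictions listed above can be encoded as a small pre-flight check in the callout service. The following Python sketch is one way to do that; the function and constant names are hypothetical, and matching header names case-insensitively is an assumption based on general HTTP conventions rather than something this page states.

```python
# Headers that cannot be changed, lowercased for comparison.
BLOCKED_NAMES = {
    "x-user-ip", "cdn-loop", "connection", "keep-alive", "transfer-encoding",
    "te", "upgrade", "proxy-connection", "proxy-authenticate",
    "proxy-authorization", "trailers",
}
BLOCKED_PREFIXES = ("x-forwarded", "x-google", "x-gfe", "x-amz-")
# Additionally blocked for LbTrafficExtension only.
TRAFFIC_ONLY_BLOCKED = {":method", ":authority", ":scheme", "host"}


def can_mutate(name: str, traffic_extension: bool = False) -> bool:
    """Return True if the listed restrictions do not cover this header."""
    lowered = name.lower()
    if lowered in BLOCKED_NAMES or lowered.startswith(BLOCKED_PREFIXES):
        return False
    if traffic_extension and lowered in TRAFFIC_ONLY_BLOCKED:
        return False
    return True
```

Filtering mutations through a check like this before replying avoids sending changes that the load balancer does not support.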

Limitations with HTTP/1.1 clients and backends

The following are the limitations with HTTP/1.1 clients and backends:

When you configure either REQUEST_BODY or RESPONSE_BODY for an extension, if the load balancer receives a matching request, it removes the Content-Length header from the response and switches to chunked body encoding.
While streaming a message body to the ext_proc server, the load balancer might send a final trailing ProcessingRequest message that has an empty body and end_stream set to true to indicate that the stream has ended.
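A callout service that buffers a streamed body therefore needs to tolerate a final chunk that carries no data. The Python sketch below shows one way to handle this, assuming each incoming body message has already been reduced to a `(body, end_stream)` pair; the `collect_body` name and that tuple shape are hypothetical.

```python
from typing import Iterable, Tuple


def collect_body(chunks: Iterable[Tuple[bytes, bool]]) -> bytes:
    """Join body chunks until one arrives with end_stream set."""
    parts = []
    for body, end_stream in chunks:
        parts.append(body)  # The final chunk may be empty.
        if end_stream:
            return b"".join(parts)
    raise ValueError("stream ended without end_stream")
```

The loop keys off `end_stream` rather than an empty body, so it handles both a data-carrying last chunk and an empty trailing one. Buffering the whole body holds it in memory, which is a trade-off to weigh for large payloads.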

Other limitations

Callouts also have the following limitations:

The maximum size of one ProcessingResponse message is 128 KB. If a received message exceeds this limit, the stream is closed with a RESOURCE_EXHAUSTED error.
The callout backend service cannot use Cloud Armor, IAP, or Cloud CDN policies.
The callout backend service must use HTTP/2 as the protocol.
The callout backend service used by route extensions cannot override the processing mode of the ext_proc stream.
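To stay under the ProcessingResponse size limit when returning a large processed body, a service can split the body across several messages. The Python sketch below shows the splitting step only. It assumes that KB means 1,024 bytes and reserves a hypothetical 4 KB of headroom for the rest of the message; both values, and the `split_body` name, are assumptions to adjust for your own service.

```python
from typing import List

MAX_MESSAGE_BYTES = 128 * 1024  # Assumes 1 KB = 1,024 bytes.
HEADROOM_BYTES = 4 * 1024       # Hypothetical margin for non-body message overhead.


def split_body(body: bytes, max_chunk: int = MAX_MESSAGE_BYTES - HEADROOM_BYTES) -> List[bytes]:
    """Split a processed body into chunks, one per ProcessingResponse message."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    if not body:
        return [b""]
    return [body[i:i + max_chunk] for i in range(0, len(body), max_chunk)]
```

Each returned chunk would then be sent in its own ProcessingResponse message.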

What's next

Configure a callout backend service.

This is a prerequisite to configuring route, authorization, and user-managed traffic extensions by using callouts.
Configure a route extension
Configure an authorization extension
Configure a traffic extension
Configure an extension to a Google service