Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class PerformanceStats.
Performance statistics for a model deployment.
Generated from protobuf message google.cloud.gkerecommender.v1.PerformanceStats
Namespace
Google \ Cloud \ GkeRecommender \ V1
Methods
__construct
Constructor.
data
array
Optional. Data for populating the Message object.
↳ queries_per_second
float
Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
↳ output_tokens_per_second
int
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
↳ ntpot_milliseconds
int
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
↳ ttft_milliseconds
int
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
↳ cost
Output only. The cost of running the model deployment.
getQueriesPerSecond
Output only. The number of queries per second.
Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
float
setQueriesPerSecond
Output only. The number of queries per second.
Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
var
float
$this
getOutputTokensPerSecond
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
int
setOutputTokensPerSecond
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
var
int
$this
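The throughput formula quoted above (total_output_tokens_generated_by_server / elapsed_time_in_seconds) can be illustrated with a minimal standalone sketch. The helper name `outputTokensPerSecond` and the rounding to the nearest integer are assumptions made for illustration; they are not part of the client library, and because the field is output only, the service reports the value itself.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): derives an
 * output-tokens-per-second figure from raw counters, following the
 * formula in the field description.
 */
function outputTokensPerSecond(int $totalOutputTokens, float $elapsedSeconds): int
{
    if ($elapsedSeconds <= 0.0) {
        throw new InvalidArgumentException('Elapsed time must be positive.');
    }

    // The field is typed int; rounding to the nearest integer is an assumption.
    return (int) round($totalOutputTokens / $elapsedSeconds);
}

// 12000 tokens generated over 8 seconds.
echo outputTokensPerSecond(12000, 8.0), PHP_EOL;
```

Because the measurement divides by elapsed wall-clock time on the server, the sketch rejects non-positive durations rather than returning a misleading value.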
getNtpotMilliseconds
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.
This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
int
setNtpotMilliseconds
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.
This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
var
int
$this
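The NTPOT definition above (request_latency / total_output_tokens) can be sketched the same way. The helper name `ntpotMilliseconds`, the assumption that latency is supplied in milliseconds, and the rounding to an integer are illustrative choices, not library behavior.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): normalizes a
 * request latency by the number of output tokens the request produced.
 */
function ntpotMilliseconds(float $requestLatencyMs, int $totalOutputTokens): int
{
    if ($totalOutputTokens <= 0) {
        throw new InvalidArgumentException('Output token count must be positive.');
    }

    // The field is typed int; rounding to the nearest integer is an assumption.
    return (int) round($requestLatencyMs / $totalOutputTokens);
}

// A 2400 ms request that produced 120 output tokens.
echo ntpotMilliseconds(2400.0, 120), PHP_EOL;
```

Unlike TTFT, which measures only the time until the first token is generated, this value spreads the whole request latency across every output token.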
getTtftMilliseconds
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
int
setTtftMilliseconds
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
var
int
$this
getCost
Output only. The cost of running the model deployment.
setCost
Output only. The cost of running the model deployment.
$this

