Google Cloud Gke Recommender V1 Client - Class PerformanceStats (0.2.0)

Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class PerformanceStats.

Performance statistics for a model deployment.

Generated from protobuf message google.cloud.gkerecommender.v1.PerformanceStats

Namespace

Google \ Cloud \ GkeRecommender \ V1

Methods

__construct

Constructor.

Parameters

| Name | Description |
| --- | --- |
| data | array. Optional. Data for populating the Message object. |
| ↳ queries_per_second | float. Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput. |
| ↳ output_tokens_per_second | int. Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds. |
| ↳ ntpot_milliseconds | int. Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens. |
| ↳ ttft_milliseconds | int. Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request. |
| ↳ cost | array<Cost>. Output only. The cost of running the model deployment. |
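
As a rough illustration of the array form of the constructor, the sketch below populates a PerformanceStats message by hand. Every field is marked output only, so the service normally fills these values; building a message locally is mainly useful for tests or fixtures. The sample values and the `vendor/autoload.php` path are assumptions, not values taken from the service.

```php
<?php
// Assumes the client library was installed with Composer; the autoload path is an assumption.
require 'vendor/autoload.php';

use Google\Cloud\GkeRecommender\V1\PerformanceStats;

// Hypothetical sample values, chosen only for illustration.
$stats = new PerformanceStats([
    'queries_per_second' => 2.5,
    'output_tokens_per_second' => 1200,
    'ntpot_milliseconds' => 40,
    'ttft_milliseconds' => 350,
]);

// Read the values back through the generated getters.
printf("QPS: %.1f\n", $stats->getQueriesPerSecond());
printf("Output tokens/s: %d\n", $stats->getOutputTokensPerSecond());
printf("NTPOT: %d ms\n", $stats->getNtpotMilliseconds());
printf("TTFT: %d ms\n", $stats->getTtftMilliseconds());
```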

getQueriesPerSecond

Output only. The number of queries per second.

Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Returns

| Type | Description |
| --- | --- |
| float | |

setQueriesPerSecond

Output only. The number of queries per second.

Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Parameter

| Name | Description |
| --- | --- |
| var | float |

Returns

| Type | Description |
| --- | --- |
| $this | |
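
Because each setter returns `$this`, calls can be chained. The following is a minimal sketch of that pattern; the values are hypothetical, the autoload path is an assumption, and since the fields are output only this is mostly relevant when constructing test data.

```php
<?php
// Assumes the client library was installed with Composer; the autoload path is an assumption.
require 'vendor/autoload.php';

use Google\Cloud\GkeRecommender\V1\PerformanceStats;

// Each setter returns the same message instance, so the calls chain.
$stats = (new PerformanceStats())
    ->setQueriesPerSecond(2.5)       // hypothetical value
    ->setOutputTokensPerSecond(1200) // hypothetical value
    ->setNtpotMilliseconds(40)       // hypothetical value
    ->setTtftMilliseconds(350);      // hypothetical value

printf("QPS: %.1f\n", $stats->getQueriesPerSecond());
```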

getOutputTokensPerSecond

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Returns

| Type | Description |
| --- | --- |
| int | |
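
The throughput formula above can be worked through with plain arithmetic. The helper below is hypothetical and not part of the client library; rounding to the nearest whole number is an assumption made here only because the field is typed as int.

```php
<?php
/**
 * Hypothetical helper: total output tokens generated by the server
 * divided by the elapsed time in seconds.
 */
function outputTokensPerSecond(int $totalOutputTokens, float $elapsedSeconds): int
{
    if ($elapsedSeconds <= 0.0) {
        throw new InvalidArgumentException('elapsedSeconds must be positive.');
    }
    // Rounding to the nearest integer is an assumption of this sketch.
    return (int) round($totalOutputTokens / $elapsedSeconds);
}

// 36,000 output tokens generated over 30 seconds.
echo outputTokensPerSecond(36000, 30.0), "\n"; // prints 1200
```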

setOutputTokensPerSecond

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Parameter

| Name | Description |
| --- | --- |
| var | int |

Returns

| Type | Description |
| --- | --- |
| $this | |

getNtpotMilliseconds

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.

This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Returns

| Type | Description |
| --- | --- |
| int | |
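
To make the NTPOT definition concrete, here is a small worked example of request_latency / total_output_tokens. The function name and the rounding choice are assumptions of this sketch; the library itself only reports the resulting value.

```php
<?php
/**
 * Hypothetical helper: request latency (in milliseconds) normalized by
 * the number of output tokens produced for that request.
 */
function ntpotMilliseconds(float $requestLatencyMs, int $totalOutputTokens): int
{
    if ($totalOutputTokens <= 0) {
        throw new InvalidArgumentException('totalOutputTokens must be positive.');
    }
    // Rounding to the nearest integer is an assumption of this sketch.
    return (int) round($requestLatencyMs / $totalOutputTokens);
}

// A request that took 8 seconds end to end and produced 200 output tokens.
echo ntpotMilliseconds(8000.0, 200), "\n"; // prints 40
```

Because the full request latency is divided by the token count, the time spent before the first token is spread across all output tokens in this figure.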

setNtpotMilliseconds

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.

This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Parameter

| Name | Description |
| --- | --- |
| var | int |

Returns

| Type | Description |
| --- | --- |
| $this | |

getTtftMilliseconds

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Returns

| Type | Description |
| --- | --- |
| int | |
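
For intuition about what TTFT captures, the sketch below times how long a token stream takes to yield its first token. This is a client-side illustration only, not how the service computes the reported value; `measureTtftMilliseconds` and `fakeTokenStream` are hypothetical names, and the 20 ms delay is an arbitrary stand-in for model latency.

```php
<?php
/**
 * Hypothetical helper: elapsed milliseconds until the stream yields its
 * first token, or null if the stream is empty.
 */
function measureTtftMilliseconds(iterable $tokenStream): ?int
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    foreach ($tokenStream as $token) {
        // Stop at the first token and convert nanoseconds to milliseconds.
        return intdiv(hrtime(true) - $start, 1000000);
    }
    return null;
}

/** Stand-in for a streaming response; the delay simulates time before the first token. */
function fakeTokenStream(): Generator
{
    usleep(20000); // 20 ms
    yield 'Hello';
    yield ' world';
}

$ttft = measureTtftMilliseconds(fakeTokenStream());
echo "TTFT: {$ttft} ms\n";
```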

setTtftMilliseconds

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Parameter

| Name | Description |
| --- | --- |
| var | int |

Returns

| Type | Description |
| --- | --- |
| $this | |

getCost

Output only. The cost of running the model deployment.

Returns

| Type | Description |
| --- | --- |
| Google\Protobuf\RepeatedField<Cost> | |

setCost

Output only. The cost of running the model deployment.

Parameter

| Name | Description |
| --- | --- |
| var | array<Cost> |

Returns

| Type | Description |
| --- | --- |
| $this | |
