Choose a transcription function

This document compares the two transcription functions available in BigQuery ML: ML.GENERATE_TEXT and ML.TRANSCRIBE.

You can use the information in this document to help you decide which function to use in cases where the functions have overlapping capabilities.

At a high level, the difference between these functions is as follows:

ML.GENERATE_TEXT is a good choice for transcribing audio clips that are 10 minutes or shorter, and you can also use it to perform natural language processing (NLP) tasks. With the gemini-1.5-flash model, audio transcription with ML.GENERATE_TEXT is less expensive than with ML.TRANSCRIBE.
ML.TRANSCRIBE is a good choice for transcribing audio clips that are longer than 10 minutes. It also supports a wider range of languages than ML.GENERATE_TEXT.
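
The rule of thumb above can be sketched as a small helper. This is only an illustration: the name suggest_function is hypothetical and not part of BigQuery ML, and the sketch looks at clip length alone, ignoring the cost, language, and NLP considerations mentioned above.

```python
# Threshold taken from the high-level comparison above. The helper name
# suggest_function is hypothetical and is not a BigQuery ML API.
SHORT_CLIP_MAX_MINUTES = 10


def suggest_function(duration_minutes: float) -> str:
    """Suggest a transcription function based only on clip length."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    if duration_minutes <= SHORT_CLIP_MAX_MINUTES:
        return "ML.GENERATE_TEXT"
    return "ML.TRANSCRIBE"


print(suggest_function(4))   # ML.GENERATE_TEXT
print(suggest_function(45))  # ML.TRANSCRIBE
```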

Supported models

Supported models are as follows:

ML.GENERATE_TEXT: you can use a subset of the Vertex AI Gemini models to generate text. For more information on supported models, see the ML.GENERATE_TEXT syntax.
ML.TRANSCRIBE: you use the default model of the Speech-to-Text API. Using the Speech-to-Text API gives you access to transcription with the Chirp speech model.

Supported tasks are as follows:

ML.GENERATE_TEXT: you can perform audio transcription and natural language processing (NLP) tasks.
ML.TRANSCRIBE: you can perform audio transcription.

Pricing is as follows:

ML.GENERATE_TEXT: for pricing of the Vertex AI models that you use with this function, see Vertex AI pricing. Supervised tuning of supported models is charged per node hour. For more information, see Vertex AI custom training pricing.
ML.TRANSCRIBE: for pricing of the Cloud AI service that you use with this function, see Speech-to-Text API pricing.

Supervised tuning support is as follows:

Queries per minute (QPM) limits are as follows:

ML.GENERATE_TEXT: 60 QPM in the default us-central1 region for gemini-1.5-pro models, and 200 QPM in the default us-central1 region for gemini-1.5-flash models. For more information, see Generative AI on Vertex AI quotas.
ML.TRANSCRIBE: 900 QPM per project. For more information, see Quotas and limits.

To increase your quota, see Request a quota adjustment.
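
To get a feel for what these quotas mean for a batch job, the following sketch computes a rough lower bound on the time needed to submit a batch of clips. It assumes one request per audio clip and that the quota is the only bottleneck, ignoring processing latency and retries. The helper name min_minutes_for_batch and the batch size of 1,800 clips are illustrative assumptions.

```python
import math

# Default quotas quoted above, in requests per minute.
QPM_LIMITS = {
    "ML.GENERATE_TEXT (gemini-1.5-pro)": 60,
    "ML.GENERATE_TEXT (gemini-1.5-flash)": 200,
    "ML.TRANSCRIBE": 900,
}


def min_minutes_for_batch(num_clips: int, qpm: int) -> int:
    """Lower bound on minutes to submit num_clips, assuming one request per clip."""
    if num_clips < 0 or qpm <= 0:
        raise ValueError("num_clips must be >= 0 and qpm must be > 0")
    return math.ceil(num_clips / qpm)


# Hypothetical batch of 1,800 clips.
for name, qpm in QPM_LIMITS.items():
    print(f"{name}: at least {min_minutes_for_batch(1800, qpm)} minutes")
```

Under these assumptions, the batch needs at least 30, 9, and 2 minutes respectively.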

Token limits are as follows:

ML.GENERATE_TEXT: 700 input tokens, and 8,192 output tokens. This output token limit means that ML.GENERATE_TEXT has a limit of approximately 39 minutes for an individual audio clip.
ML.TRANSCRIBE: no token limit. However, this function does have a limit of 480 minutes for an individual audio clip.
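
The per-clip limits above can be used as a quick feasibility check before choosing a function. The sketch below is illustrative only: functions_that_fit is a hypothetical helper, the 39-minute figure is approximate, and treating both limits as inclusive is an assumption.

```python
# Per-clip duration limits quoted above, in minutes. The ML.GENERATE_TEXT
# value is approximate because it is derived from the output token limit.
MAX_CLIP_MINUTES = {
    "ML.GENERATE_TEXT": 39,
    "ML.TRANSCRIBE": 480,
}


def functions_that_fit(duration_minutes: float) -> list:
    """Return the functions whose per-clip limit covers the given duration."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    return [
        name
        for name, limit in MAX_CLIP_MINUTES.items()
        if duration_minutes <= limit  # inclusive boundary is an assumption
    ]


print(functions_that_fit(30))   # ['ML.GENERATE_TEXT', 'ML.TRANSCRIBE']
print(functions_that_fit(60))   # ['ML.TRANSCRIBE']
print(functions_that_fit(500))  # []
```

A clip that fits neither limit would need to be split into shorter segments before transcription.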

Supported languages are as follows:

Region availability is as follows:

ML.GENERATE_TEXT: available in all Generative AI for Vertex AI regions.
ML.TRANSCRIBE: available in the EU and US multi-regions for all speech recognizers.