Choose a document processing function
This document provides a comparison of the document processing functions
available in BigQuery ML, which are `ML.GENERATE_TEXT` and
`ML.PROCESS_DOCUMENT`. You can use the information in this document to help
you decide which function to use in cases where the functions have
overlapping capabilities.
At a high level, the difference between these functions is as follows:

- `ML.GENERATE_TEXT` is a good choice for performing natural
  language processing (NLP) tasks where some of the content resides in
  documents. This function offers the following benefits:

  - Lower costs
  - More language support
  - Faster throughput
  - Model tuning capability
  - Availability of multimodal models

- `ML.PROCESS_DOCUMENT` is a good choice for performing document processing
  tasks that require document parsing and a predefined, structured response.

Function comparison

Use the following table to compare the `ML.GENERATE_TEXT` and
`ML.PROCESS_DOCUMENT` functions:

Purpose

`ML.GENERATE_TEXT`: Perform any document-related NLP task by passing a prompt
to a Gemini or partner model, or to an open model.
For example, given a financial document for a company, you can retrieve
document information by providing a prompt such as `What is
the quarterly revenue for each division?`.
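As a sketch of this approach, the following query passes that prompt to a remote Gemini model over an object table of documents. The model name `mydataset.gemini_model` and object table `mydataset.financial_docs` are hypothetical placeholders; this assumes you have already created a remote model over a Gemini endpoint and an object table pointing at your document files:

```sql
SELECT *
FROM ML.GENERATE_TEXT(
  -- Hypothetical remote model created over a Gemini endpoint
  MODEL `mydataset.gemini_model`,
  -- Hypothetical object table referencing PDF files in Cloud Storage
  TABLE `mydataset.financial_docs`,
  STRUCT(
    'What is the quarterly revenue for each division?' AS prompt,
    TRUE AS flatten_json_output));
```

With `flatten_json_output` enabled, the model's response is returned as a plain text column rather than a JSON blob, which is usually easier to work with in downstream SQL.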
`ML.PROCESS_DOCUMENT`: Use the Document AI API to
perform specialized document processing for
different document types, such as invoices, tax forms, and financial
statements. You can also perform document chunking.
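A minimal sketch of the `ML.PROCESS_DOCUMENT` pattern is shown below. The model name `mydataset.invoice_parser` and object table `mydataset.invoice_docs` are hypothetical; this assumes a remote model that has been created over a Document AI invoice processor:

```sql
SELECT *
FROM ML.PROCESS_DOCUMENT(
  -- Hypothetical remote model backed by a Document AI invoice processor
  MODEL `mydataset.invoice_parser`,
  -- Hypothetical object table referencing invoice files in Cloud Storage
  TABLE `mydataset.invoice_docs`);
```

Unlike the prompt-driven `ML.GENERATE_TEXT` call, the output schema here is determined by the Document AI processor, so the parsed fields come back in a predefined, structured form.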
Billing

`ML.GENERATE_TEXT`: Incurs BigQuery ML charges for data processed. For more
information, see BigQuery ML pricing. Also incurs Vertex AI charges for calls
to the model. If you are using a Gemini 2.0 or greater model, the call is
billed at the batch API rate. For more information, see Cost of building and
deploying AI models in Vertex AI.

`ML.PROCESS_DOCUMENT`: Incurs BigQuery ML charges for data processed. For
more information, see BigQuery ML pricing. Also incurs charges for calls to
the Document AI API. For more information, see Document AI API pricing.
Requests per minute (RPM)

`ML.GENERATE_TEXT`: Not applicable for Gemini models. Between 25 and 60
for partner models. For more information, see Requests per minute limits.

`ML.PROCESS_DOCUMENT`: 120 RPM per processor type, with an overall limit of
600 RPM per project. For more information, see Quotas list.
Tokens per minute

`ML.GENERATE_TEXT`: Ranges from 8,192 to over 1 million, depending on the
model used.

`ML.PROCESS_DOCUMENT`: No token limit. However, this function does have
different page limits depending on the processor you use. For more
information, see Limits.

Supervised tuning

`ML.GENERATE_TEXT`: Supervised tuning is supported for some models.

`ML.PROCESS_DOCUMENT`: Not supported.

Supported languages

`ML.GENERATE_TEXT`: Support varies based on the LLM you choose.

`ML.PROCESS_DOCUMENT`: Language support depends on the document processor
type; most only support English. For more information, see Processor list.
Supported regions

`ML.GENERATE_TEXT`: Supported in all Generative AI for Vertex AI regions.

`ML.PROCESS_DOCUMENT`: Supported in the `EU` and `US` multi-regions
for all processors. Some processors are also available in certain single
regions. For more information, see Regional and multi-regional support.
Last updated 2025-09-10 UTC.