As of April 10, 2026, Dataplex Universal Catalog is now called Knowledge Catalog. The API, client library, CLI, and IAM names remain unchanged. For more information, see Introducing the Google Cloud Knowledge Catalog.
Integrate with OpenLineage
This document explains how to integrate OpenLineage with Knowledge Catalog (formerly Dataplex Universal Catalog) to
import and visualize data lineage from external systems. By acting as an OpenLineage consumer
through the ProcessOpenLineageRunEvent REST API, Knowledge Catalog lets you view custom pipeline
lineage alongside built-in lineage from Google Cloud services.
Overview
OpenLineage is an open platform for collecting and
analyzing data lineage information. Using an open standard for lineage data,
OpenLineage captures lineage events from data pipeline components that use an
OpenLineage API to report on runs, jobs, and datasets.
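To make the run, job, and dataset terms concrete, the following is a minimal sketch of a run event built as a plain Python dictionary. The field names follow the OpenLineage RunEvent schema as commonly published; the helper `build_run_event`, the producer URI, the schema URL version, and all namespaces and names are hypothetical and used only for illustration.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed schema URL for an OpenLineage 1.x RunEvent; check the spec version you target.
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Return a minimal OpenLineage run event as a plain dict.

    inputs and outputs are lists of (namespace, name) pairs identifying datasets.
    """
    def dataset(pair):
        namespace, name = pair
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # for example START, COMPLETE, or FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(p) for p in inputs],
        "outputs": [dataset(p) for p in outputs],
        # Hypothetical producer URI; replace it with one that identifies your tool.
        "producer": "https://example.com/my-pipeline",
        "schemaURL": SCHEMA_URL,
    }


if __name__ == "__main__":
    event = build_run_event(
        "COMPLETE",
        "my-namespace",
        "daily_load",
        inputs=[("bigquery", "my-project.my_dataset.source_table")],
        outputs=[("bigquery", "my-project.my_dataset.target_table")],
    )
    print(json.dumps(event, indent=2))
```

Each event describes one state change of one run of one job, together with the datasets that the run read and wrote.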
Through the Data Lineage API, you can import OpenLineage events to display
in the Knowledge Catalog web interface alongside lineage information from
Google Cloud services, such as BigQuery, Managed Service for Apache Airflow,
Cloud Data Fusion, and Managed Service for Apache Spark.
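As a rough illustration of importing an event over REST, the sketch below builds, but does not send, an HTTP request using only the Python standard library. It assumes that the endpoint lives at `datalineage.googleapis.com` under a `projects/.../locations/...` parent and that the request body is the OpenLineage event JSON itself; verify both against the Data Lineage API reference. The helper `build_request` and the project, location, and token values are hypothetical.

```python
import json
import urllib.request

# Assumed REST host for the Data Lineage API.
API_HOST = "https://datalineage.googleapis.com"


def build_request(project_id, location, event, access_token):
    """Build (but do not send) the HTTP request for one OpenLineage run event."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{API_HOST}/v1/{parent}:processOpenLineageRunEvent"
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; a real token could come from `gcloud auth print-access-token`.
    request = build_request("my-project", "us", {"eventType": "COMPLETE"}, "ACCESS_TOKEN")
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect and test without network access.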
Supported versions: The Data Lineage API supports OpenLineage major version 1.
API actions: The Data Lineage API endpoint ProcessOpenLineageRunEvent only acts as a consumer of OpenLineage messages, not a producer. The
API lets you send lineage information generated by any OpenLineage-compliant
tool or system into Knowledge Catalog. Some Google Cloud services, such as Managed Service for Apache Spark and Managed Airflow, include
built-in OpenLineage producers that can send events to this endpoint,
automating lineage capture from those services.
Unsupported features: The Data Lineage API doesn't support the following:
Any subsequent OpenLineage release with message format changes
DatasetEvent
JobEvent
Message size: The maximum size of a single message is 5 MB.
Name length: The length of each Fully Qualified Name in inputs and outputs is limited to 4000 characters.
Link limits: Links are grouped by events, with a maximum of 100 links per event. The maximum aggregate number of table-level links is 1000. If a message contains more than 1500 column-level links, the column-level information is skipped.
Graph scope: Knowledge Catalog displays a lineage graph for each job run, showing the
inputs and outputs of lineage events. It doesn't support lower-level
processes such as Spark stages.
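Some of the limits above can be checked on the client before an event is sent. The following is a minimal sketch, not an official validator. It makes three assumptions: "5 MB" is read conservatively as 5,000,000 bytes of serialized JSON, an event without a `run` object is treated as a DatasetEvent or JobEvent, and the combined length of a dataset's namespace and name is used as a rough proxy for its Fully Qualified Name, because the exact mapping is not described here. The function `check_event_limits` is hypothetical.

```python
import json

MAX_MESSAGE_BYTES = 5_000_000  # assumption: "5 MB" read as 5,000,000 bytes
MAX_NAME_CHARS = 4000  # documented Fully Qualified Name limit


def check_event_limits(event):
    """Return a list of problems found; an empty list means no check failed."""
    problems = []

    # Only run events are supported; assume they are the ones carrying a "run" object.
    if "run" not in event:
        problems.append(
            "no 'run' object: looks like a DatasetEvent or JobEvent, which are unsupported"
        )

    # Measure the serialized size of the whole message.
    size = len(json.dumps(event).encode("utf-8"))
    if size > MAX_MESSAGE_BYTES:
        problems.append(f"message is {size} bytes; the limit is {MAX_MESSAGE_BYTES}")

    # Approximate each dataset's Fully Qualified Name length by namespace plus name.
    for side in ("inputs", "outputs"):
        for dataset in event.get(side, []):
            length = len(dataset.get("namespace", "")) + len(dataset.get("name", ""))
            if length > MAX_NAME_CHARS:
                problems.append(
                    f"{side} dataset name is {length} characters; the limit is {MAX_NAME_CHARS}"
                )
    return problems


if __name__ == "__main__":
    sample = {
        "eventType": "COMPLETE",
        "run": {"runId": "example-run"},
        "job": {"namespace": "my-namespace", "name": "daily_load"},
        "inputs": [{"namespace": "bigquery", "name": "my-project.my_dataset.source_table"}],
        "outputs": [],
    }
    print(check_event_limits(sample))
```

The link limits are left out on purpose, because how links are counted per event is not specified in enough detail here to check reliably on the client.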
```go
//go:build examples

package main

import (
	"context"

	lineage "cloud.google.com/go/datacatalog/lineage/apiv1"
	lineagepb "cloud.google.com/go/datacatalog/lineage/apiv1/lineagepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := lineage.NewClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &lineagepb.ProcessOpenLineageRunEventRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/datacatalog/lineage/apiv1/lineagepb#ProcessOpenLineageRunEventRequest.
	}
	resp, err := c.ProcessOpenLineageRunEvent(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
```
```java
import com.google.cloud.datacatalog.lineage.v1.LineageClient;
import com.google.cloud.datacatalog.lineage.v1.ProcessOpenLineageRunEventRequest;
import com.google.cloud.datacatalog.lineage.v1.ProcessOpenLineageRunEventResponse;
import com.google.protobuf.Struct;

public class SyncProcessOpenLineageRunEvent {

  public static void main(String[] args) throws Exception {
    syncProcessOpenLineageRunEvent();
  }

  public static void syncProcessOpenLineageRunEvent() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (LineageClient lineageClient = LineageClient.create()) {
      ProcessOpenLineageRunEventRequest request =
          ProcessOpenLineageRunEventRequest.newBuilder()
              .setParent("parent-995424086")
              .setOpenLineage(Struct.newBuilder().build())
              .setRequestId("requestId693933066")
              .build();
      ProcessOpenLineageRunEventResponse response =
          lineageClient.processOpenLineageRunEvent(request);
    }
  }
}
```
```python
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import datacatalog_lineage_v1


def sample_process_open_lineage_run_event():
    # Create a client
    client = datacatalog_lineage_v1.LineageClient()

    # Initialize request argument(s)
    request = datacatalog_lineage_v1.ProcessOpenLineageRunEventRequest(
        parent="parent_value",
    )

    # Make the request
    response = client.process_open_lineage_run_event(request=request)

    # Handle the response
    print(response)
```
```ruby
require "google/cloud/data_catalog/lineage/v1"

##
# Snippet for the process_open_lineage_run_event call in the Lineage service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DataCatalog::Lineage::V1::Lineage::Client#process_open_lineage_run_event.
#
def process_open_lineage_run_event
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DataCatalog::Lineage::V1::Lineage::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DataCatalog::Lineage::V1::ProcessOpenLineageRunEventRequest.new

  # Call the process_open_lineage_run_event method.
  result = client.process_open_lineage_run_event request

  # The returned object is of type Google::Cloud::DataCatalog::Lineage::V1::ProcessOpenLineageRunEventResponse.
  p result
end
```
To simplify sending events to the Data Lineage API, you can use various
tools and libraries:
Google Cloud Java Producer Library: Google provides an open-source Java
library to help construct and send OpenLineage events to the
Data Lineage API. For more information, see the blog post Producer java library for Data Lineage is now open source.
The library is available on GitHub and Maven.
OpenLineage GCP Transport: For Java-based OpenLineage producers, a
dedicated GcpLineage Transport is available. It simplifies integration with the Data Lineage API by
minimizing the code needed to send events. The GcpLineageTransport can be configured as the event sink for any existing
OpenLineage producer, such as Airflow, Spark, or Flink. For more information
and examples, see GcpLineage.
Last updated 2026-06-18 UTC.