OpenLineage mapping

The Data Lineage API can ingest lineage information from systems that integrate with OpenLineage, an open standard for lineage collection. When you send OpenLineage-formatted events to the Data Lineage API using the ProcessOpenLineageRunEvent method, the Data Lineage API maps attributes from the OpenLineage message to corresponding attributes in the Data Lineage API.

This document provides reference tables for these mappings.
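
To make the shape of such a request concrete, the following minimal sketch builds an OpenLineage run event and wraps it in an HTTP request without sending it. The project, location, job, and dataset names are placeholders, the helper names are hypothetical, and the REST path is an assumption that should be confirmed against the API reference.

```python
import json
import urllib.request

# Placeholder values for illustration; replace with your own project and location.
PROJECT = "my-project"
LOCATION = "us"

# Assumed REST path for ProcessOpenLineageRunEvent; confirm it in the API reference.
URL = (
    "https://datalineage.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}:processOpenLineageRunEvent"
)


def build_run_event() -> dict:
    """Return a minimal OpenLineage run event with one input and one output."""
    return {
        "eventType": "COMPLETE",
        "eventTime": "2024-01-01T00:00:00Z",
        "producer": "https://example.com/my-producer",  # placeholder
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",  # placeholder
        "run": {"runId": "3b452093-782c-4ef2-9c0c-aafe2aa6f34d"},
        "job": {"namespace": "my-namespace", "name": "my-job"},
        "inputs": [{"namespace": "bigquery", "name": "my-project.my_dataset.source"}],
        "outputs": [{"namespace": "bigquery", "name": "my-project.my_dataset.target"}],
    }


def build_request(event: dict, access_token: str) -> urllib.request.Request:
    """Wrap the event in a POST request; the caller decides whether to send it."""
    return urllib.request.Request(
        URL,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it. Obtaining the access token is outside the scope of this sketch.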

Attribute mapping

The ProcessOpenLineageRunEvent REST API method maps OpenLineage attributes to Data Lineage API attributes as follows:

Data Lineage API attributes | OpenLineage attributes
Process.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME
Process.displayName | Job.namespace + ":" + Job.name
Process.attributes | Job.facets (see Stored data)
Run.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID
Run.displayName | Run.runId
Run.attributes | Run.facets (see Stored data)
Run.startTime | eventTime
Run.endTime | eventTime
Run.state | eventType
LineageEvent.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT (for example, projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333)
LineageEvent.EventLinks.source | inputs (fqn is the concatenation of namespace and name)
LineageEvent.EventLinks.target | outputs (fqn is the concatenation of namespace and name)
LineageEvent.startTime | eventTime
LineageEvent.endTime | eventTime
requestId | Defined by the method user
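
The attribute mapping can be illustrated with a small sketch that projects an OpenLineage run event onto the fields listed above. The function name map_run_event and the shape of the returned dictionary are hypothetical. The hashed resource names are omitted because the hash function is not described here, and eventType is passed through unchanged because the translation to Run.state values is not specified in this document.

```python
def map_run_event(event: dict) -> dict:
    """Project an OpenLineage run event onto the Data Lineage API fields."""
    job = event["job"]
    run = event["run"]
    event_time = event["eventTime"]

    def refs(datasets: list) -> list:
        # Keep namespace and name side by side; the FQN is derived from both.
        return [(d["namespace"], d["name"]) for d in datasets]

    return {
        "process": {
            # Process.displayName is Job.namespace + ":" + Job.name.
            "displayName": f"{job['namespace']}:{job['name']}",
            "attributes": job.get("facets", {}),
        },
        "run": {
            "displayName": run["runId"],
            "attributes": run.get("facets", {}),
            # Both time fields are populated from eventTime.
            "startTime": event_time,
            "endTime": event_time,
            # Passed through unchanged in this sketch.
            "state": event["eventType"],
        },
        "lineageEvent": {
            "sources": refs(event.get("inputs", [])),
            "targets": refs(event.get("outputs", [])),
            "startTime": event_time,
            "endTime": event_time,
        },
    }
```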

FQN mapping

The following table provides examples of OpenLineage namespace and name pairs for various systems, and their equivalent Dataplex Universal Catalog fully qualified names (FQN):

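System: Athena
OpenLineage namespace: awsathena://athena.{region_name}.amazonaws.com
OpenLineage name:
  • {catalog}
  • {catalog}.{database}
  • {catalog}.{database}.{table}
Dataplex Universal Catalog FQN:
  • athena:{catalogId}.{region}
  • athena:{catalogId}.{region}.{databaseId}
  • athena:{catalogId}.{region}.{databaseId}.{tableId}

System: AWS Glue
OpenLineage namespace: arn:aws:glue:{region}:{account id}
OpenLineage name: table/{database name}/{table name}
Dataplex Universal Catalog FQN: aws_glue:table:{region}.{account id}.{database name}.{table name}

System: Azure Cosmos DB
OpenLineage namespace: azurecosmos://{host}/dbs/{database}
OpenLineage name: colls/{table}
Dataplex Universal Catalog FQN:
  • cosmos-db:{host}.{database}
  • cosmos-db:{host}.{database}.{table}

System: Azure Data Explorer (Kusto)
OpenLineage namespace: azurekusto://{host}.kusto.windows.net
OpenLineage name: {database}/{table}
Dataplex Universal Catalog FQN:
  • kusto:{host}.{region}.{database}
  • kusto:{host}.{region}.{database}.{table}

System: Azure Synapse
OpenLineage namespace: sqlserver://{host}:{port}
OpenLineage name:
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
Dataplex Universal Catalog FQN: Not supported

System: BigQuery
OpenLineage namespace: bigquery
OpenLineage name:
  • {project id}.{dataset name}
  • {project id}.{dataset name}.{table name}
Dataplex Universal Catalog FQN:
  • bigquery:{projectId}.{datasetId}
  • bigquery:{projectId}.{datasetId}.{assetId}

System: Cassandra
OpenLineage namespace: cassandra://{host}:{port}
OpenLineage name:
  • {keyspace}
  • {keyspace}.{table}
Dataplex Universal Catalog FQN:
  • cassandra:{hostWithPort}.{keyspaceId}
  • cassandra:{hostWithPort}.{keyspaceId}.{tableId}

System: MySQL
OpenLineage namespace: mysql://{host}:{port}
OpenLineage name:
  • {database}
  • {database}.{table}
Dataplex Universal Catalog FQN:
  • mysql:{hostWithPort}.{databaseId}
  • mysql:{hostWithPort}.{databaseId}.{tableId}

System: CrateDB
OpenLineage namespace: crate://{host}:{port}
OpenLineage name: {database}.{schema}.{table}
Dataplex Universal Catalog FQN: Not supported

System: DB2
OpenLineage namespace: db2://{host}:{port}
OpenLineage name:
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
Dataplex Universal Catalog FQN:
  • db2:{dns}.{databaseId}
  • db2:{dns}.{databaseId}.{schemaId}
  • db2:{dns}.{databaseId}.{schemaId}.{tableId}

System: Hive
OpenLineage namespace: hive://{host}:{port}
OpenLineage name: {database}.{table}
Dataplex Universal Catalog FQN: Not supported

System: MSSQL
OpenLineage namespace: mssql://{host}:{port}
OpenLineage name: {database}.{schema}.{table}
Dataplex Universal Catalog FQN: Not supported

System: OceanBase
OpenLineage namespace: oceanbase://{host}:{port}
OpenLineage name: {database}.{table}
Dataplex Universal Catalog FQN: Not supported

System: Oracle
OpenLineage namespace: oracle://{host}:{port}
OpenLineage name: {serviceName}.{schema}.{table} or {sid}.{schema}.{table}
Dataplex Universal Catalog FQN:
  • oracle:{hostWithPort}.{databaseId}
  • oracle:{hostWithPort}.{databaseId}.{schemaId}
  • oracle:{hostWithPort}.{databaseId}.{schemaId}.{tableId}

System: PostgreSQL
OpenLineage namespace: postgres://{host}:{port}
OpenLineage name:
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
Dataplex Universal Catalog FQN:
  • postgresql:{hostWithPort}.{databaseId}
  • postgresql:{hostWithPort}.{databaseId}.{schemaId}
  • postgresql:{hostWithPort}.{databaseId}.{schemaId}.{tableId}

System: Teradata
OpenLineage namespace: teradata://{host}:{port}
OpenLineage name: {database}.{table}
Dataplex Universal Catalog FQN: Not supported

System: Redshift
OpenLineage namespace: redshift://{cluster_identifier}.{region_name}:{port}
OpenLineage name:
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
Dataplex Universal Catalog FQN:
  • redshift:{clusterId}.{region}.{port}.{databaseId}
  • redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}
  • redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}.{tableId}

System: Snowflake
OpenLineage namespace: snowflake://{organization name}-{account name} or snowflake://{account-locator}(.{compliance})(.{cloud_region_id})(.{cloud})
OpenLineage name:
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
Dataplex Universal Catalog FQN:
  • snowflake:{accountName}.{databaseId}
  • snowflake:{accountName}.{databaseId}.{schemaId}
  • snowflake:{accountName}.{databaseId}.{schemaId}.{tableId}

System: Spanner
OpenLineage namespace: spanner://{projectId}:{instanceId}
OpenLineage name: {database}.{schema}.{table}
Dataplex Universal Catalog FQN: Supported in Dataplex Universal Catalog, but not supported in Data lineage

System: Trino
OpenLineage namespace: trino://{host}:{port}
OpenLineage name:
  • {catalog}
  • {catalog}.{schema}
  • {catalog}.{schema}.{table}
Dataplex Universal Catalog FQN:
  • trino:{hostWithPort}.{catalogId}
  • trino:{hostWithPort}.{catalogId}.{schemaId}
  • trino:{hostWithPort}.{catalogId}.{schemaId}.{tableId}

System: ABFSS
OpenLineage namespace: abfss://{container name}@{service name}.dfs.core.windows.net
OpenLineage name: {path}
Dataplex Universal Catalog FQN:
  • abs:{serviceName}.{containerName}
  • abs:{serviceName}.{containerName}.{path}

System: DBFS
OpenLineage namespace: dbfs://{workspace name}
OpenLineage name: {path}
Dataplex Universal Catalog FQN:
  • dbfs:{workspace}
  • dbfs:{workspace}.{path}

System: Cloud Storage
OpenLineage namespace: gs://{bucket name}
OpenLineage name: {object key}
Dataplex Universal Catalog FQN:
  • gcs:{bucketName}
  • gcs:{bucketName}.{virtualPath}

System: HDFS
OpenLineage namespace: hdfs://{namenode host}:{namenode port}
OpenLineage name: {path}
Dataplex Universal Catalog FQN:
  • hdfs:{namenodeHostWithPort}
  • hdfs:{namenodeHostWithPort}.{path}

System: Kafka
OpenLineage namespace: kafka://{bootstrap server host}:{port}
OpenLineage name: {topic}
Dataplex Universal Catalog FQN: kafka:{serverHostWithPort}.{topicId}

System: Local file system
OpenLineage namespace: file
OpenLineage name: {path}
Dataplex Universal Catalog FQN: filesystem:localhost.{path}

System: Remote file system
OpenLineage namespace: file://{host}
OpenLineage name: {path}
Dataplex Universal Catalog FQN: filesystem:{hostWithPort}.{path}

System: S3
OpenLineage namespace: s3://{bucket name}
OpenLineage name: {object key}
Dataplex Universal Catalog FQN:
  • s3:{bucketName}
  • s3:{bucketName}.{objectKey}
Note: Namespace prefixes s3a and s3n are also accepted and converted to s3.

System: WASBS
OpenLineage namespace: wasbs://{container name}@{service name}.dfs.core.windows.net
OpenLineage name: {object key}
Dataplex Universal Catalog FQN:
  • abs:{serviceName}.{containerName}
  • abs:{serviceName}.{containerName}.{objectKey}

System: Pub/Sub (topic)
OpenLineage namespace: pubsub
OpenLineage name: topic:{projectId}:{topicId}
Dataplex Universal Catalog FQN: pubsub:topic:{projectId}.{topicId}

System: Pub/Sub (subscription)
OpenLineage namespace: pubsub
OpenLineage name: subscription:{projectId}:{subscriptionId}
Dataplex Universal Catalog FQN: pubsub:subscription:{projectId}.{subscriptionId}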
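
As a rough illustration of the pattern behind these mappings, the following sketch converts a namespace and name pair into an FQN for a handful of the systems listed above. The function name to_fqn is hypothetical, only a subset of systems is covered, and the sketch assumes that {hostWithPort} is the host:port string taken from the namespace and that names are appended unchanged. Any escaping or path normalization that the service applies is not handled.

```python
# URL scheme -> FQN prefix for systems whose FQN is "{prefix}:{hostWithPort}.{name}".
_HOST_PREFIXES = {
    "postgres": "postgresql",
    "mysql": "mysql",
    "cassandra": "cassandra",
    "trino": "trino",
    "kafka": "kafka",
}

# URL scheme -> FQN prefix for object stores; s3a and s3n are converted to s3.
_BUCKET_PREFIXES = {"gs": "gcs", "s3": "s3", "s3a": "s3", "s3n": "s3"}


def to_fqn(namespace: str, name: str) -> str:
    """Build an FQN from an OpenLineage namespace and name (subset of systems)."""
    if namespace == "bigquery":
        return f"bigquery:{name}"
    if namespace == "pubsub":
        # The name looks like "topic:{projectId}:{topicId}".
        kind, project, resource = name.split(":", 2)
        return f"pubsub:{kind}:{project}.{resource}"

    scheme, sep, authority = namespace.partition("://")
    if not sep:
        raise ValueError(f"unsupported namespace: {namespace}")
    if scheme in _BUCKET_PREFIXES:
        prefix = _BUCKET_PREFIXES[scheme]
        if name in ("", "/"):
            # No object key: the FQN refers to the bucket itself.
            return f"{prefix}:{authority}"
        return f"{prefix}:{authority}.{name}"
    if scheme in _HOST_PREFIXES:
        return f"{_HOST_PREFIXES[scheme]}:{authority}.{name}"
    raise ValueError(f"unsupported namespace: {namespace}")
```

For example, to_fqn("s3a://my-bucket", "data.csv") yields an s3: FQN, which reflects the s3a and s3n conversion described for S3.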

Additional accepted formats

While OpenLineage doesn't define standard namespace/name pairs for the following systems, the Data Lineage API accepts lineage events for them when formatted as described in the following table. Resources that are referenced in OpenLineage messages with the namespace custom are interpreted as custom fully qualified names.

System: Custom
OpenLineage namespace: custom
OpenLineage name: {some reference}
Dataplex Universal Catalog FQN: custom:{someReference}

System: Dataproc Metastore
OpenLineage namespace: dataproc_metastore
OpenLineage name:
  • dataproc_metastore:{projectId}.{location}.{instanceId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}
Dataplex Universal Catalog FQN:
  • dataproc_metastore:{projectId}.{location}.{instanceId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}
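
For the custom namespace, a short sketch of the inputs and outputs entries in an OpenLineage event, and the FQNs they are expected to produce, might look like the following. The helper names and the dataset references are hypothetical and serve only as illustration.

```python
def custom_dataset(reference: str) -> dict:
    """Return an OpenLineage dataset entry that uses the custom namespace."""
    return {"namespace": "custom", "name": reference}


def expected_custom_fqn(dataset: dict) -> str:
    """Return the custom FQN that the mapping table describes for this entry."""
    if dataset["namespace"] != "custom":
        raise ValueError("only the custom namespace is handled in this sketch")
    return f"custom:{dataset['name']}"


# Placeholder references; any string that identifies the resource can be used.
inputs = [custom_dataset("warehouse.orders_raw")]
outputs = [custom_dataset("warehouse.orders_clean")]
```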
