Data Catalog is deprecated and will be discontinued on January 30, 2026. For steps to transition your Data Catalog users, workloads, and content to Dataplex Universal Catalog, see Transition from Data Catalog to Dataplex Universal Catalog.
A fileset must have at least one file pattern and no more than five.
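The pattern limit can be checked locally before any API call is made. The following is a minimal sketch, not part of the Data Catalog client library: `validate_file_patterns` is a hypothetical helper, and the `gs://` prefix check assumes that every pattern points at Cloud Storage objects.

```python
MAX_PATTERNS = 5  # a fileset may have at most five file patterns


def validate_file_patterns(patterns):
    """Return a list of problems found; an empty list means the patterns look valid."""
    problems = []
    if not 1 <= len(patterns) <= MAX_PATTERNS:
        problems.append(f"expected 1 to {MAX_PATTERNS} patterns, got {len(patterns)}")
    for pattern in patterns:
        # Assumption: every pattern names a Cloud Storage location.
        if not pattern.startswith("gs://"):
            problems.append(f"pattern does not start with gs://: {pattern}")
    return problems


print(validate_file_patterns(["gs://my_bucket/*.csv"]))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.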
You can query Data Catalog filesets with Dataflow SQL but
only if they have a defined schema and contain only CSV files without header
rows.
Create entry groups and filesets
Filesets must be placed within a user-created entry group. If you have not
created an entry group, first create the entry group, then create the
fileset within the entry group. You can set IAM policies on the entry group to
define who has access to filesets and other entries within the entry group.
Complete the Create Entry Group form, then click CREATE.
The Entry group details page opens. With the ENTRIES tab selected,
click CREATE.
Complete the Create Fileset form.
To attach a schema, click Define Schema to open the Schema form.
Click + ADD FIELDS to add fields individually or toggle Edit as
text in the upper right of the form to specify the fields in JSON format.
Use the gcloud data-catalog entries create command to create a fileset within an entry group. This Google Cloud CLI command
example creates a fileset entry that includes the schema of the fileset data.
Java

import com.google.cloud.datacatalog.v1.ColumnSchema;
import com.google.cloud.datacatalog.v1.CreateEntryRequest;
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.Entry;
import com.google.cloud.datacatalog.v1.EntryGroupName;
import com.google.cloud.datacatalog.v1.EntryType;
import com.google.cloud.datacatalog.v1.GcsFilesetSpec;
import com.google.cloud.datacatalog.v1.Schema;
import java.io.IOException;

// Sample to create file set entry
public class CreateFilesetEntry {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "my-project-id";
    String entryGroupId = "fileset_entry_group";
    String entryId = "fileset_entry_id";
    createFilesetEntry(projectId, entryGroupId, entryId);
  }

  // Create Fileset Entry.
  public static void createFilesetEntry(String projectId, String entryGroupId, String entryId)
      throws IOException {
    // Currently, Data Catalog stores metadata in the us-central1 region.
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
      // Construct the Entry for the Entry request.
      Entry entry =
          Entry.newBuilder()
              .setDisplayName("My Fileset")
              .setDescription("This fileset consists of ....")
              .setGcsFilesetSpec(
                  GcsFilesetSpec.newBuilder().addFilePatterns("gs://cloud-samples-data/*").build())
              .setSchema(
                  Schema.newBuilder()
                      .addColumns(
                          ColumnSchema.newBuilder()
                              .setColumn("first_name")
                              .setDescription("First name")
                              .setMode("REQUIRED")
                              .setType("STRING")
                              .build())
                      .addColumns(
                          ColumnSchema.newBuilder()
                              .setColumn("last_name")
                              .setDescription("Last name")
                              .setMode("REQUIRED")
                              .setType("STRING")
                              .build())
                      .addColumns(
                          ColumnSchema.newBuilder()
                              .setColumn("addresses")
                              .setDescription("Addresses")
                              .setMode("REPEATED")
                              .setType("RECORD")
                              .addSubcolumns(
                                  ColumnSchema.newBuilder()
                                      .setColumn("city")
                                      .setDescription("City")
                                      .setMode("NULLABLE")
                                      .setType("STRING")
                                      .build())
                              .addSubcolumns(
                                  ColumnSchema.newBuilder()
                                      .setColumn("state")
                                      .setDescription("State")
                                      .setMode("NULLABLE")
                                      .setType("STRING")
                                      .build())
                              .build())
                      .build())
              .setType(EntryType.FILESET)
              .build();

      // Construct the Entry request to be sent by the client.
      CreateEntryRequest entryRequest =
          CreateEntryRequest.newBuilder()
              .setParent(EntryGroupName.of(projectId, location, entryGroupId).toString())
              .setEntryId(entryId)
              .setEntry(entry)
              .build();

      // Use the client to send the API request.
      Entry entryCreated = dataCatalogClient.createEntry(entryRequest);
      System.out.printf("Entry created with name: %s", entryCreated.getName());
    }
  }
}
Node.js

// Import the Google Cloud client library.
const {DataCatalogClient} = require('@google-cloud/datacatalog').v1;
const datacatalog = new DataCatalogClient();

async function createFileset() {
  // Create a fileset within an entry group.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = 'my_project';
  // const entryGroupId = 'my_entry_group';
  // const entryId = 'my_entry';

  // Currently, Data Catalog stores metadata in the us-central1 region.
  const location = 'us-central1';

  // Delete any pre-existing Entry with the same name that will be used
  // when creating the new Entry.
  try {
    const formattedName = datacatalog.entryPath(
      projectId,
      location,
      entryGroupId,
      entryId
    );
    await datacatalog.deleteEntry({name: formattedName});
  } catch (err) {
    console.log('Entry does not exist.');
  }

  // Delete any pre-existing Entry Group with the same name
  // that will be used to create the new Entry Group.
  try {
    const formattedName = datacatalog.entryGroupPath(
      projectId,
      location,
      entryGroupId
    );
    await datacatalog.deleteEntryGroup({name: formattedName});
  } catch (err) {
    console.log('Entry Group does not exist.');
  }

  // Construct the Entry Group for the Entry Group request.
  const entryGroup = {
    displayName: 'My Fileset Entry Group',
    description: 'This Entry Group consists of ....',
  };

  // Construct the Entry Group request to be sent by the client.
  const entryGroupRequest = {
    parent: datacatalog.locationPath(projectId, location),
    entryGroupId: entryGroupId,
    entryGroup: entryGroup,
  };

  // Use the client to send the API request.
  await datacatalog.createEntryGroup(entryGroupRequest);

  // Construct the Entry for the Entry request.
  const FILESET_TYPE = 4;

  const entry = {
    displayName: 'My Fileset',
    description: 'This fileset consists of ....',
    gcsFilesetSpec: {filePatterns: ['gs://my_bucket/*']},
    schema: {
      columns: [
        {
          column: 'city',
          description: 'City',
          mode: 'NULLABLE',
          type: 'STRING',
        },
        {
          column: 'state',
          description: 'State',
          mode: 'NULLABLE',
          type: 'STRING',
        },
        {
          column: 'addresses',
          description: 'Addresses',
          mode: 'REPEATED',
          subcolumns: [
            {
              column: 'city',
              description: 'City',
              mode: 'NULLABLE',
              type: 'STRING',
            },
            {
              column: 'state',
              description: 'State',
              mode: 'NULLABLE',
              type: 'STRING',
            },
          ],
          type: 'RECORD',
        },
      ],
    },
    type: FILESET_TYPE,
  };

  // Construct the Entry request to be sent by the client.
  const request = {
    parent: datacatalog.entryGroupPath(projectId, location, entryGroupId),
    entryId: entryId,
    entry: entry,
  };

  // Use the client to send the API request.
  const [response] = await datacatalog.createEntry(request);

  console.log(`Name: ${response.name}`);
  console.log(`Display name: ${response.displayName}`);
  console.log(`Type: ${response.type}`);
}
createFileset();
Python

# Import required modules.
from google.cloud import datacatalog_v1

# TODO: Set these values before running the sample.
project_id = "project_id"
fileset_entry_group_id = "entry_group_id"
fileset_entry_id = "entry_id"

# For all regions available, see:
# https://cloud.google.com/data-catalog/docs/concepts/regions
location = "us-central1"

datacatalog = datacatalog_v1.DataCatalogClient()

# Create an Entry Group.
entry_group_obj = datacatalog_v1.types.EntryGroup()
entry_group_obj.display_name = "My Fileset Entry Group"
entry_group_obj.description = "This Entry Group consists of ...."

entry_group = datacatalog.create_entry_group(
    parent=datacatalog_v1.DataCatalogClient.common_location_path(
        project_id, location
    ),
    entry_group_id=fileset_entry_group_id,
    entry_group=entry_group_obj,
)
print(f"Created entry group: {entry_group.name}")

# Create a Fileset Entry.
entry = datacatalog_v1.types.Entry()
entry.display_name = "My Fileset"
entry.description = "This fileset consists of ...."
entry.gcs_fileset_spec.file_patterns.append("gs://my_bucket/*.csv")
entry.type_ = datacatalog_v1.EntryType.FILESET

# Create the Schema, for example when you have a csv file.
entry.schema.columns.append(
    datacatalog_v1.types.ColumnSchema(
        column="first_name",
        description="First name",
        mode="REQUIRED",
        type_="STRING",
    )
)

entry.schema.columns.append(
    datacatalog_v1.types.ColumnSchema(
        column="last_name", description="Last name", mode="REQUIRED", type_="STRING"
    )
)

# Create the addresses parent column
addresses_column = datacatalog_v1.types.ColumnSchema(
    column="addresses", description="Addresses", mode="REPEATED", type_="RECORD"
)

# Create sub columns for the addresses parent column
addresses_column.subcolumns.append(
    datacatalog_v1.types.ColumnSchema(
        column="city", description="City", mode="NULLABLE", type_="STRING"
    )
)

addresses_column.subcolumns.append(
    datacatalog_v1.types.ColumnSchema(
        column="state", description="State", mode="NULLABLE", type_="STRING"
    )
)

entry.schema.columns.append(addresses_column)

entry = datacatalog.create_entry(
    parent=entry_group.name, entry_id=fileset_entry_id, entry=entry
)
print(f"Created fileset entry: {entry.name}")
REST and Command line
REST
If you don't have access to Cloud Client libraries for your language or
want to test the API using REST requests, see the following examples
and refer to the Data Catalog REST API entryGroups.create and entryGroups.entries.create documentation.
Create an entry group
Before using any of the request data,
make the following replacements:
project-id: Your Google Cloud project ID
entryGroupId: The ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters.
displayName: The textual name for the entry group.
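The ID rule above can be expressed as a single regular expression. This is a minimal sketch for checking an ID locally before sending a request; `is_valid_id` is a hypothetical helper name, and the service remains the authority on what it accepts.

```python
import re

# Starts with a letter or underscore, followed by letters, digits, or
# underscores, for a total length of at most 64 characters.
_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_id(candidate):
    """Return True if candidate satisfies the documented ID rule."""
    return _ID_PATTERN.fullmatch(candidate) is not None


for candidate in ("fileset_entry_group", "1group", "my-group"):
    print(candidate, is_valid_id(candidate))
```

Using `fullmatch` rather than `match` ensures that trailing invalid characters, such as a hyphen, cause the check to fail.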
HTTP method and URL:
POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/region/entryGroups?entryGroupId=entryGroupId
Request JSON body:
{
"displayName": "Entry Group display name"
}
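The URL and body shown above can also be assembled programmatically. The sketch below only builds the two strings and sends nothing; `build_create_entry_group_request` is a hypothetical helper, and authentication (for example an OAuth access token header) is assumed to be handled separately.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://datacatalog.googleapis.com/v1"


def build_create_entry_group_request(project_id, region, entry_group_id, display_name):
    """Return the (url, json_body) pair for an entryGroups.create POST request."""
    url = (
        f"{BASE_URL}/projects/{quote(project_id, safe='')}"
        f"/locations/{quote(region, safe='')}/entryGroups"
        f"?{urlencode({'entryGroupId': entry_group_id})}"
    )
    body = json.dumps({"displayName": display_name})
    return url, body


url, body = build_create_entry_group_request(
    "my-project", "us-central1", "fileset_entry_group", "Entry Group display name"
)
print(url)
print(body)
```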
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Save the request body in a file named request.json,
and execute the following command:
Create a fileset
Before using any of the request data,
make the following replacements:
project_id: Your Google Cloud project ID
entryGroupId: ID of an existing entry group. The fileset will be created in this entry group.
entryId: ID of the new fileset. The ID must begin with a letter or underscore, contain
only English letters, numbers, and underscores, and be at most 64 characters.
description: Fileset description.
displayName: The textual name for the fileset entry.
Data Catalog defines entry and entry group roles to
facilitate permission management of filesets and other Data Catalog
resources.
Entry roles

dataCatalog.entryOwner
Description: Owner of a particular entry or group of entries.
Permissions: datacatalog.entries.(*), datacatalog.entryGroups.get
Applicability: Organization, project, and entryGroup level

dataCatalog.entryViewer
Description: Can view details of entry and entryGroup.
Permissions: datacatalog.entries.get, datacatalog.entryGroups.get
Applicability: Organization, project, and entryGroup level

Entry group roles

dataCatalog.entryGroupOwner
Description: Owner of a particular entryGroup.
Permissions: datacatalog.entryGroups.(*), datacatalog.entries.(*)
Applicability: Organization, project, and entryGroup level

dataCatalog.entryGroupCreator
Description: Can create entryGroups within a project. The creator of an entryGroup is automatically granted the dataCatalog.entryGroupOwner role.
Permissions: datacatalog.entryGroups.(get | create)
Applicability: Organization and project level
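To reason about which of the roles listed above carries a given permission, the role descriptions can be written down as a small lookup table. This is only an illustrative sketch built from the permissions listed in this section; it is not how IAM evaluates access, and `role_grants` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Permissions per role, copied from the role descriptions in this section.
# "(*)" is written as the glob "*", and "(get | create)" is expanded.
ROLE_PERMISSIONS = {
    "dataCatalog.entryOwner": ["datacatalog.entries.*", "datacatalog.entryGroups.get"],
    "dataCatalog.entryViewer": ["datacatalog.entries.get", "datacatalog.entryGroups.get"],
    "dataCatalog.entryGroupOwner": ["datacatalog.entryGroups.*", "datacatalog.entries.*"],
    "dataCatalog.entryGroupCreator": [
        "datacatalog.entryGroups.get",
        "datacatalog.entryGroups.create",
    ],
}


def role_grants(role, permission):
    """Return True if the role's listed permissions cover the given permission."""
    patterns = ROLE_PERMISSIONS.get(role, [])
    return any(fnmatchcase(permission, pattern) for pattern in patterns)


print(role_grants("dataCatalog.entryViewer", "datacatalog.entries.get"))
print(role_grants("dataCatalog.entryViewer", "datacatalog.entries.create"))
```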
Set IAM policies
Users with datacatalog.<resource>.setIamPolicy permission
can set IAM policies on Data Catalog entry groups
and other Data Catalog resources (see Data Catalog roles).
Console
Navigate to the Entry group details page in the Data Catalog UI, then use the IAM panel located on the right side to grant or
revoke permissions.
Example 1
A company with different business contexts for its filesets
creates separate order-files and user-files entry groups:
Figure 1. An example of how to store order data and user data in different entry groups.
The company grants users the entry group viewer role for order-files, meaning
they can only search for entries contained in that entry group. Their search
results don't return entries in the user-files entry group.
Example 2
A company grants the entry group viewer role to a user only in the project_entry_group project. The user will only be able to view
entries within that project.
Search filesets
Users can restrict the scope of search in Data Catalog by using
the type facet. type=entry_group restricts the search query to
entry groups, while type=fileset searches only for filesets. type facets can be used in conjunction with other facets, such as projectid.
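Combining facets amounts to assembling a query string. The sketch below is a hypothetical helper, `build_search_query`, that joins the `type` and `projectid` facets with spaces; it assumes that space-separated terms are all applied to the search, and it does not call the search API itself.

```python
def build_search_query(entry_type=None, project_id=None, keywords=()):
    """Assemble a search query string from optional facets and free-text keywords."""
    terms = list(keywords)
    if entry_type:
        terms.append(f"type={entry_type}")
    if project_id:
        terms.append(f"projectid={project_id}")
    # Assumption: terms separated by spaces are combined in the search.
    return " ".join(terms)


print(build_search_query(entry_type="fileset", project_id="my-project"))
print(build_search_query(entry_type="entry_group"))
```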
Last updated 2025-09-04 UTC.