Label Google Drive files automatically using AI classification

This feature is included with Frontline Plus and Enterprise Plus. It's also included with the Gemini Enterprise, Gemini Education Premium, and AI Security add-ons.

AI classification can automatically label your organization’s sensitive content in Google Drive using custom AI models your organization trains, without the need for programming. As an administrator, you control which data your models train on, so each model is unique to and can be used only by your organization. You can create up to 5 unique AI classification models for your organization.

You can use your AI-classified files in security policies such as data protection rules, Vault, and more.

Note: To be labeled by AI classification, files must be in shared drives or owned by users with licenses that support classification labels.

How AI classification works

Here's an overview of the steps you'll follow to set up AI classification to automatically label new and existing files in Drive.

1. Create a model: First, you choose or create a classification label, which the AI model will automatically apply to files after it's trained. You also create the training label, which is used to train the model to identify your organization's sensitive content. Then you create an AI model to use these labels.

2. Train the model: After you create your labels, designated labelers classify Drive files with the training label to create your training dataset. Your model then uses the dataset to learn how to classify sensitive files.

3. Turn on AI classification: Once the model is trained, you can set up automatic file labeling, called auto-apply. During setup, you select which label options to enable and which users own the files on which you want AI classification to apply labels. Your model then starts to automatically label sensitive files.

4. Monitor your model: You can use the Drive events log to monitor how many files were classified, as well as how many users accepted or modified an auto-applied label (if they have permissions).

Before you begin

Understand how classification labels work and how to create them. For details, go to Get started as a classification labels admin.
Choose your designated labelers: a group of users at your organization who can correctly apply the training label manually to sensitive files.
Create a configuration group just for your designated labelers. For instructions, go to Customize service settings with configuration groups.

Create a model

To create a model, you first need to select an existing classification label or create a new one. Next, you need to create a matching training label—either automatically (recommended) or manually using label manager—which your designated labelers will use.

Choose or create a classification label

Your classification label must be enabled for Drive and Docs. After training, the AI model automatically applies your classification label to sensitive Drive files. The model is trained on only one field per label, which must be either a badge list or an options list.

We recommend a badged sensitivity label, since it shows prominently on documents.

When you use an options list or a badge list field for a classification label, it must:

Have at least 2 and no more than 7 options
Be published

If you have an existing label that meets these requirements, you can use it as a classification label. Otherwise, use label manager to create a label, either before or when setting up the model (later on this page). For details, go to Create classification labels for your organization .

Create a training label

Your training label is nearly identical to the classification label and is used only for training purposes by designated labelers. When creating your model (later on this page), you can automatically create the training label so you can be sure it matches the classification label.

You can also choose to create your own training label manually using label manager, either before or when setting up the model. For details, see How do I manually create a training label? later on this page.

Create the model

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click Create model.
In the Classification label list, select an existing classification label to train a model for, or click Create label to create one using label manager.
If you created a label in label manager, return to the Create model page. You might need to refresh the page to see your new label in the list.
For your classification label, select the eligible field you want to use in the Field name list.
Click Continue.
(Optional) Automatically create and publish a training label that matches your classification label:
1. Click Create training label.
2. Click Update label permissions in the message that appears. The label opens in Edit mode in label manager in a separate tab.
3. Click Permissions > Edit, then grant the Can apply labels and set values permission to the configuration group that contains your labelers.
4. Click Save and close the label manager tab.
  Note: You can also set label permissions later, but it’s important that only your labelers have access to the training label.
(Optional) If you already created a training label, select it in the Training label list.
(Optional) Create your own training label now by clicking Go to label manager.
Important: Make sure your label meets the training label criteria and you set label permissions so only your labelers can access it. For details, go to training label guidelines later on this page.

Return to the Create model page. You might need to refresh the page to see your new training label in the list.
On the Create model page, click Continue.
Enter a descriptive name for the model.
Click Create model.

After you create your model, the Model details page shows your selected training label and classification label.

Train the model

To train the AI model, you need to create a training dataset and then start its initial training run. During a training run, the model learns from the examples in the dataset.

Retraining is automatic: After the initial training run, your model retrains every 2 weeks to help improve or maintain its accuracy. You can also retrain your model manually at any time. After each training run, a new model is released, and the automatic 2-week retraining schedule is reset.
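As a rough illustration of the retraining schedule, the sketch below estimates the next automatic retraining date from the date of the most recent training run. The function name is hypothetical, and it assumes the 2-week interval is counted in whole days from the last run; the article doesn't state the exact timing.

```python
from datetime import date, timedelta

# Interval stated in this article; counting it in whole days is an assumption.
RETRAIN_INTERVAL = timedelta(weeks=2)


def next_automatic_retrain(last_training_run: date) -> date:
    """Estimate the next automatic retraining date.

    Any training run, manual or automatic, resets the schedule,
    so only the most recent run matters.
    """
    return last_training_run + RETRAIN_INTERVAL


# Initial run on March 3; a manual retrain on March 10 resets the schedule.
print(next_automatic_retrain(date(2025, 3, 3)))   # 2025-03-17
print(next_automatic_retrain(date(2025, 3, 10)))  # 2025-03-24
```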

Create a training dataset

To create a training dataset, your designated labelers need to apply the training label to at least 100 files per label option. For example, if your label has 3 options (say “Need to Know”, “Confidential”, and “Public”), you need at least 300 training files. However, it's best to have more than 100 files per label option, because it's likely that some files won't be eligible for the training dataset. Learn about labeling high-quality examples for training.

Note: Your training dataset can have a maximum of 1 million files.
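The dataset size limits above come down to simple arithmetic. The following sketch is a planning aid, not part of the product: the function names are hypothetical, and the per-option counts are made-up sample values.

```python
MIN_FILES_PER_OPTION = 100     # minimum per label option, from this article
MAX_DATASET_FILES = 1_000_000  # dataset cap, from this article


def minimum_training_files(options):
    """Smallest dataset size that satisfies the per-option minimum."""
    return MIN_FILES_PER_OPTION * len(options)


def shortfall_by_option(labeled_counts):
    """Map each under-labeled option to the number of files still needed."""
    return {
        option: MIN_FILES_PER_OPTION - count
        for option, count in labeled_counts.items()
        if count < MIN_FILES_PER_OPTION
    }


def within_dataset_cap(labeled_counts):
    """True if the total number of labeled files fits under the cap."""
    return sum(labeled_counts.values()) <= MAX_DATASET_FILES


options = ["Need to Know", "Confidential", "Public"]
counts = {"Need to Know": 140, "Confidential": 100, "Public": 62}  # sample data

print(minimum_training_files(options))  # 300
print(shortfall_by_option(counts))      # {'Public': 38}
print(within_dataset_cap(counts))       # True
```

Because some labeled files might not be eligible for training, treat the minimum as a floor rather than a target.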

About 24 hours after you create the model, it automatically checks how many files have been labeled for training. After that, it checks continuously throughout the day.

To check how many files have been labeled:

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
Under Actions for the model, select View details.
In the panel at the top of the page, under Training files for active model, view the number of labeled files.

Start a training run

A training run typically takes 4 to 6 hours, but can take longer for larger datasets. Your model will likely need multiple training runs to learn how to label your files accurately.

During a training run, the model compares the classification it selects for a file to the training label applied to the file to generate scores. For details, go to How are scores calculated.

After a training run, you can check the accuracy of the model.

To start a training run:

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
On the Model details page, under Actions for the model, select View details.
In the training panel at the top of the page, click Start a training run.
Note: This button is available only if your labelers have labeled the minimum number of training files.

After training: Check model scores

After a training run, your model is released with percentage scores for each label option. Each score, called a recall score, is the percentage of training examples the model classified correctly after testing itself:

Below 50%—Low accuracy. The model needs better data and isn’t ready yet.
From 50-80%—Medium accuracy. The model may be ready on a limited basis.
Above 80%—High accuracy. The model is ready to classify files for your organization.
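The score bands above can be expressed as a small helper for reviewing scores outside the Admin console. This is only an illustrative sketch: the function name is hypothetical, and placing a score of exactly 80% in the medium band is an interpretation of the "From 50-80%" wording.

```python
def accuracy_band(recall_percent: float) -> str:
    """Map a recall score (0-100) to the accuracy band described above."""
    if recall_percent < 50:
        return "low"     # needs better data; not ready yet
    if recall_percent <= 80:
        return "medium"  # may be ready on a limited basis (80 itself: assumption)
    return "high"        # ready to classify files


# Sample scores per label option (made-up values).
scores = {"Confidential": 86.0, "Internal": 64.5, "Public": 41.0}
for option, score in scores.items():
    print(option, accuracy_band(score))
```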

To check the accuracy of your model after a training run:

On the Model details page, you can view model scores:

In the training results panel at the top of the page, under Current files used and scores
In the Current training dataset panel

Turn on AI classification

After the AI model is trained to achieve a minimum level of accuracy (at least 50%), you can choose label options and turn on automatic file labeling, or auto-apply. However, for best results, it's recommended to wait for your model scores for all label options to reach at least 80%.
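To make the two thresholds concrete, the sketch below separates the label options that could be enabled for auto-apply (score of at least 50%) from the stricter recommendation that every option reach at least 80%. The function name and sample scores are hypothetical.

```python
MIN_SCORE = 50          # minimum score to enable auto-apply for an option
RECOMMENDED_SCORE = 80  # recommended score for all options


def auto_apply_readiness(scores):
    """Return (options eligible for auto-apply, whether all options meet the recommendation)."""
    eligible = [option for option, score in scores.items() if score >= MIN_SCORE]
    recommended = bool(scores) and all(
        score >= RECOMMENDED_SCORE for score in scores.values()
    )
    return eligible, recommended


print(auto_apply_readiness({"Confidential": 86, "Internal": 64, "Public": 41}))
# (['Confidential', 'Internal'], False)
```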

To turn on auto-apply:

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
On the Model details page, under Actions for the model, select View details.
Under AI-labeled files, click Edit auto-apply.
Note: This button is available only if at least 1 label option has reached 50% accuracy.
Check the boxes for the label options you want to allow the AI model to auto-apply.
Click Save and continue to select which organizational units or groups own the files on which the model should auto-apply labels. The default setting is your top-level parent organization.
Or click Save to select users later.
If you chose to select users, at the side, select an organizational unit or configuration group.
Group settings override organizational units.
Click On - Label is auto-applied with one of the options below.
Click Save.
On the Model details page, Current auto-apply status for the rule is On.

Note: You can monitor AI classification using the Drive events log. For details, see Monitor AI classification label events later on this page.

When AI classification scans files

After auto-apply is turned on for files owned by users and shared drives, AI classification scans their files (at rest) at least once within 1 to 2 weeks. AI classification also scans files whenever they're uploaded or modified, and might change the applied label if the file's content changes.

How auto-apply conflicts are handled

Data protection rules

Label values set by data protection rules take priority over AI classification, and both take priority over default classification.

Multiple rules

When 2 or more rules of the same kind try to apply different label options to the same file, the option that's higher in the label's options list is applied. For example, you might have a label with a field that has 3 options in the label manager:

Confidential
Internal
Public

If Rule 1 tries to set the label as Confidential, and Rule 2 tries to set the label as Public for the same file, Confidential is applied. Make sure that a label's field options are listed in your preferred order of priority before setting up rules.
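The tie-breaking behavior for rules of the same kind can be sketched as picking the proposed option with the lowest position in the field's options list. The function below is a hypothetical model of that behavior, not the product's implementation.

```python
def resolve_conflict(option_order, proposed_options):
    """Pick the proposed option that appears highest in the field's options list."""
    rank = {option: index for index, option in enumerate(option_order)}
    return min(proposed_options, key=lambda option: rank[option])


order = ["Confidential", "Internal", "Public"]  # order shown in label manager

# Rule 1 proposes Confidential, Rule 2 proposes Public.
print(resolve_conflict(order, ["Confidential", "Public"]))  # Confidential
```

Reordering the options in label manager changes which option wins, which is why the order should be settled before rules are set up.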

User-applied labels

Labels that users apply to files take priority over AI-applied labels—that is, AI classification doesn't modify a label that a user previously set.

Monitor your model

Get details on how AI classification is labeling files in the Drive events log. For each label option, the log shows how many files were classified using auto-apply and how many users accepted or modified the auto-applied label. Users need permissions to take actions on auto-applied labels.

Permissions users need to interact with auto-applied labels

Users need file and label permissions to interact with auto-applied labels. You can set permissions for your classification label in label manager. For details, see Create classification labels for your organization.

To view auto-applied labels, users need the Can view this label permission for your classification label.
To accept and modify auto-applied labels, users need the Can apply labels and set values permission for your classification label and must be an Editor or Owner on the file.

View AI classification events in the Drive events log

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
On the Model details page, under AI-labeled files, select View files for the label option you want to view events for.
The Security Investigation Tool opens in a new tab, showing search results for the Drive events log for two AI classification-related events: Label applied and Label field value changed.
Click the event Description to get additional details, such as:
- Name and type of the document that was labeled
- Label field value assigned to the document (for example, Confidential or Restricted)

Manage your model

Turn off auto-apply for a classification label

To turn off auto-apply for all or just specific label options:

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
On the Model details page, under Actions for the model, select View details.
Under AI-labeled files, click Edit auto-apply.
Uncheck the boxes for the label options for which you want to turn off auto-apply.
Or, to completely pause auto-apply, uncheck all options.

To turn off auto-apply completely for specific organizational units or groups:

You can turn off auto-apply completely for content owned by users in specific organizational units or groups.

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
On the Model details page, under Actions for the model, select View details.
In the More actions menu at the top of the page, click Manage auto-apply > Update enabled OUs/Groups.
Click an organizational unit or group on the left to select it.
Select Off - Label is not auto-applied.
Click Save.

Delete a model

You may need to delete a model, for example, because model accuracy is not improving. If you delete a model, all its AI classification settings are permanently removed. Note the following:

Labels used only in this model are removed from classification settings, and all versions of the model are deleted.
Training labels remain on the files. After deleting the model, you can choose to configure a new model to use the same training label (or a different one).
Any auto-apply labeling you turned on for this model stops immediately, but labels previously auto-applied remain on files.
Model results will be similar if you retrain on your existing training label and training files.
If you recreate the same classification label for a new model, the AI classification feature ignores and overwrites the classifications of previous models. In this way, you can "reprocess" your organization's Drive files. This can be useful if you made significant improvements to model quality since your initial deployment.

To delete a model:

Sign in with an administrator account to the Google Admin console.
If you aren’t using an administrator account, you can’t access the Admin console.
Go to Menu > Security > Access and data control > Data classification.
Requires the Manage Classification Labels administrator privilege.
In the AI classification section, click View nn models.
On the Model details page, under Actions for the model, select View details.
On the Model details page, under Actions at the right, click Delete model.
The Delete model dialog lists the effects of deleting the model.
To continue, click Delete model.

FAQ

Training and classification labels

What are the requirements for the training and classification labels?

Both the classification label and the training label must meet the following criteria:

Must contain a minimum of 2, and a maximum of 7 options.
Must have their options in the same order.
For example, if the classification label has options in this order:
1. Option 1
2. Option 2
3. Option 3
The training label options can’t be ordered as follows:
1. Option 2
2. Option 1
3. Option 3
Must be published.
Have labels with different access permissions. The training label should be available only to designated labelers who can train the model. The classification label can have broader access.
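The criteria above can be checked before you create a model. The sketch below uses a hypothetical `Label` record and assumes the training label uses the same option names as the classification label; it doesn't call any Google API, and it doesn't model the access-permission requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical, simplified view of a label and its single trained field."""
    name: str
    options: tuple  # option names, in label manager order
    published: bool


def check_label_pair(classification: Label, training: Label) -> list:
    """Return a list of problems; an empty list means the pair passes these checks."""
    problems = []
    for label in (classification, training):
        if not 2 <= len(label.options) <= 7:
            problems.append(f"{label.name}: must have 2 to 7 options")
        if not label.published:
            problems.append(f"{label.name}: must be published")
    # Assumption: matching option names are compared position by position.
    if classification.options != training.options:
        problems.append("options differ or are in a different order")
    if classification.name == training.name:
        problems.append("training label must be a separate label")
    return problems


sensitivity = Label("Sensitivity", ("Option 1", "Option 2", "Option 3"), True)
training = Label("Sensitivity (training)", ("Option 2", "Option 1", "Option 3"), True)
print(check_label_pair(sensitivity, training))
# ['options differ or are in a different order']
```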

How do I manually create a training label?

Although it's best practice to create the training label automatically when setting up your model, you can create one manually in label manager by following these guidelines:

Make sure the label meets the required label criteria.
Identify the training label with the word "train" or "training" to make it easier for your designated labelers to recognize the label and apply it when creating your training dataset.
Add a description field to the training label to further help designated labelers understand its purpose.
Be sure to set the label permissions to only your designated labelers (that is, those who will identify files for model training) using the configuration group you created for labelers. Labelers need the Can apply labels and set values permission. For details, go to Create classification labels for your organization.

Can I use the classification label as the training label?

No, the classification label and the training label must be different. The label you choose as your classification label is not available for the training label.

Training datasets

What are good files for the model to train on?

For best results in training the model, have your designated labelers follow these guidelines:

Ensure each file has a minimum of 500 characters.
Select files that represent content users create, share, and use in your organization.
Label roughly the same number of files per label option, with a minimum of 100 files for each option. This helps the model to gain a comprehensive understanding of your data and improve scores.
Include a representative variety of files for each option type. For example, don't label 100 resumes as your total set of example files for Top Secret if contracts are also a common Top Secret file type in your organization.
Apply the training label only to files owned by your organization, either owned directly by users or stored in shared drives. AI classification doesn’t process files that external users own or that are located in external shared drives.

Can the model be trained on previously labeled files?

Training on previously labeled files isn't currently possible. A model requires a training label to be a replica of the label that it will auto-apply to files, but they can't be the same label.

Can the model train on multiple languages?

The model does support multiple languages; however, a representative sample of files for each option type and language should be included in the training data. This increases the number of files required to successfully train the model. Only Latin character-based languages are supported.

How are scores calculated during training?

During training, the AI model uses 75% of the input data to train itself on how to label files and reserves 25% to periodically test its own performance. In other words, for 25% of the labeled files, the model analyzes those files as if it didn’t know what label has been applied. The AI model then makes its own label choice and compares that choice with the actual label applied by the designated labeler. The scores show what proportion of the reserved files it correctly assigned the right label to.
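The hold-out evaluation described above can be sketched as follows. This is a generic illustration of a 75/25 split with per-option recall, not Google's training code: the function names are hypothetical, `predict` stands in for the trained model, and the random shuffle is an assumption about how files are assigned to each portion.

```python
import random
from collections import Counter


def split_dataset(labeled_files, seed=0):
    """Shuffle labeled files and split them 75% for training, 25% for testing."""
    shuffled = list(labeled_files)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def recall_by_option(held_out, predict):
    """Percentage of held-out files per option that the model labeled correctly."""
    total = Counter()
    correct = Counter()
    for content, labeler_choice in held_out:
        total[labeler_choice] += 1
        if predict(content) == labeler_choice:
            correct[labeler_choice] += 1
    return {option: 100 * correct[option] / total[option] for option in total}


# Toy example: each file is (content, option chosen by the designated labeler).
held_out = [("salary table", "Confidential"), ("press release", "Public"),
            ("merger memo", "Confidential"), ("lunch menu", "Public")]


def toy_model(content):
    # Stand-in for the trained model.
    return "Confidential" if "salary" in content else "Public"


print(recall_by_option(held_out, toy_model))
# {'Confidential': 50.0, 'Public': 100.0}
```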

Once I train a model, can I “freeze” it to stop retraining automatically?

AI classification models train using files in Drive. When those files are deleted (often on retention schedules through Google Vault) the model also needs to be subsequently deleted to ensure the files' content doesn't persist in some fashion. For this reason, model retraining is done on a continuous loop and can't be suspended.

Can users change or fix labels and field values?

Users with permission can update a label or field value, but AI classification doesn’t revise the classification model based on that change. If you notice the model has applied labels and field values incorrectly, you can ask your designated labelers to assign the correct training label to the files. AI classification then incorporates this data into the next model self-training cycle.

Auto-apply

Can AI classification evaluate images, video, and audio files?

AI classification uses the same indexable text processing as Drive DLP. Any file from which Drive can extract indexable text can be evaluated for AI classification-applied labels. This includes Optical Character Recognition (OCR) to extract text from images. However, AI classification doesn't evaluate video or audio files.

Does AI classification work for labeling only sensitive content?

Sensitive content is the primary focus for AI classification, but any label with 2 to 7 options can be trained for automatic labeling. Classification labels are also used for auditing, findability, and retention management.

Does AI classification work when Client-side encryption (CSE) is turned on?

Because Google can't decrypt files encrypted with CSE (only your private encryption key can), AI classification can't train on files encrypted with CSE and can't auto-apply labels to these files.

How and when does AI classification revise the auto-applied labels?

After auto-apply is turned on, AI classification scans and classifies all files at rest for which sufficient text can be extracted. These files are scanned at least once.

AI classification reprocesses files periodically as content is modified. Content changes may result in a different prediction for a file. When AI classification has both an old and a new predicted option for a file, it prefers the option that is higher in the options list. For example, suppose a field has three options listed in the label manager:

Confidential
Internal
Public

Suppose AI classification classifies a file as Internal, and the content changes so that the AI classification model predicts Confidential. In this case, the classification on the file is changed to Confidential. However, if the AI classification model predicts Public, the classification on the file remains as Internal.

AI classification doesn't revise auto-applied labels and field values that have been reviewed or modified by users.
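The revision behavior described in this answer can be modeled as a small decision function. This is a sketch of the stated rules only; the function name and the `user_reviewed` flag are hypothetical.

```python
def revised_option(option_order, current, predicted, user_reviewed=False):
    """Return the option a file should carry after a new prediction.

    - Labels reviewed or modified by a user are never revised.
    - Otherwise the option higher in the options list wins.
    """
    if user_reviewed:
        return current
    if current is None:  # file not yet labeled by AI classification
        return predicted
    rank = {option: index for index, option in enumerate(option_order)}
    return predicted if rank[predicted] < rank[current] else current


order = ["Confidential", "Internal", "Public"]
print(revised_option(order, "Internal", "Confidential"))  # Confidential
print(revised_option(order, "Internal", "Public"))        # Internal
print(revised_option(order, "Internal", "Confidential", user_reviewed=True))  # Internal
```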

Does AI classification take priority over other classification methods when several are active?

AI classification can be overridden by higher-priority classification methods. When several methods are active, they take priority in the following order, highest first:

DLP rule without user overwrite
Manual classification
DLP rule with user overwrite
AI classification
Default classification

Removing a label or field allows a lower-tier classification mechanism to take effect. For example, a file with a label removed by a user can later have the same label auto-applied by AI classification.
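One way to picture this ordering is a lookup that walks the list from highest to lowest priority and returns the first method that has set a value. The sketch below is illustrative only; the function name and the dictionary shape are assumptions.

```python
# Priority order from this article, highest first.
PRIORITY = [
    "DLP rule without user overwrite",
    "Manual classification",
    "DLP rule with user overwrite",
    "AI classification",
    "Default classification",
]


def effective_value(values_by_method):
    """Return (method, value) for the highest-priority method that set a value."""
    for method in PRIORITY:
        value = values_by_method.get(method)
        if value is not None:
            return method, value
    return None  # no method has set a value


file_state = {"Manual classification": "Public", "AI classification": "Internal"}
print(effective_value(file_state))  # ('Manual classification', 'Public')

# The user removes the label, so the lower-tier AI classification takes effect.
file_state["Manual classification"] = None
print(effective_value(file_state))  # ('AI classification', 'Internal')
```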

What types of files can AI classification apply labels to?

AI classification uses the same indexable text processing as Drive DLP. For details, see the list of file types scanned by DLP. Audio and video files aren’t supported.
A file must have a minimum amount of text for AI classification to apply a label. As a result, some files, such as very short documents and images with small amounts of text, might not get classified.

What happens when an option is disabled for auto-apply?

During scanning, if a file is predicted to have an option for which auto-apply is disabled, AI classification applies no label or field value to the file.

Files that AI classification has previously labeled retain the applied label and option values even after the option is disabled.

Can I roll back auto-applied labels?

You can't undo the application of labels. We recommend that you refine and test your models with a small audience before broad deployment. For example, you can train your models with a temporary label. Then, once the model performance is satisfactory, you can "reset" the model by deleting it and creating a new model with the same training label (same training data set) but with your permanent label.

Licensing

How does the feature work for users without an eligible license?

If an admin in your organization has a license that supports AI classification, they can train a model. Designated labelers (the users who apply the training label) don't need to have a license with AI classification.

Files with the training label can be owned by any users with a license that supports Drive classification labels. However, AI classification only labels files that are in shared drives or owned by users with licenses that support AI classification. Files owned by users without a supported license aren't processed by AI classification.

If no users have a license that supports AI classification, auto-apply is turned off and the classification model is deleted. However, training labels and labels applied by AI classification persist on files.
