Create a conversation dataset

A conversation dataset contains conversation transcript data and is used to train either a Smart Reply or a Summarization custom model. Smart Reply uses the conversation transcripts to recommend text responses to human agents who are conversing with an end-user. Summarization custom models are trained on conversation datasets that contain both transcripts and annotation data. They use the annotations to generate conversation summaries for human agents after a conversation has completed.

There are two ways to create a dataset: using the Console tutorial workflows, or creating a dataset manually in the Console from the Data -> Datasets tab. We recommend that you use the Console tutorials as a first option. To use them, navigate to the Agent Assist Console and click the Get started button under the feature you'd like to test.

This page demonstrates how to create a dataset manually.

Before you begin

Follow the Dialogflow setup instructions to enable Dialogflow on a Google Cloud Platform project.
We recommend that you read the Agent Assist basics page before starting this tutorial.
If you are implementing Smart Reply using your own transcript data, make sure your transcripts are in JSON, follow the specified format, and are stored in a Google Cloud Storage bucket. A conversation dataset must contain at least 30,000 conversations; otherwise, model training will fail. As a general rule, the more conversations you have, the better your model quality will be. We suggest that you remove any conversation with fewer than 20 messages or fewer than 3 conversation turns (changes in which participant is making an utterance). We also suggest that you remove any bot messages or messages automatically generated by systems (for example, "Agent enters the chat room"). We recommend that you upload at least 3 months of conversations to cover as many use cases as possible. The maximum number of conversations in a conversation dataset is 1,000,000.
If you are implementing Summarization using your own transcript and annotation data, make sure your transcripts follow the specified format and are stored in a Google Cloud Storage bucket. The recommended minimum number of training annotations is 1,000; the enforced minimum is 100.
Navigate to the Agent Assist Console. Select your Google Cloud Platform project, then click the Data menu option in the far left margin of the page. The Data menu displays all of your data in two tabs: one for conversation datasets and one for knowledge bases.
Click the Conversation datasets tab, then click the +Create new button at the top right of the conversation datasets page.
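The transcript and annotation guidelines above can be checked locally before you upload your data. The Python sketch below is a minimal illustration and is not part of any Agent Assist API. It assumes each conversation has already been parsed into a dict with an `entries` list whose items carry `role` and `text` fields, and it treats any role other than the hypothetical `AGENT` and `CUSTOMER` labels as a bot or system message. Verify both assumptions against the specified transcript format. The numeric thresholds are the ones stated on this page.

```python
from typing import Dict, List

# Thresholds stated on this page.
MIN_CONVERSATIONS = 30_000           # Smart Reply training fails below this
MAX_CONVERSATIONS = 1_000_000        # maximum conversations per dataset
MIN_MESSAGES = 20                    # suggested per-conversation minimum
MIN_TURNS = 3                        # suggested minimum changes of speaker
MIN_ANNOTATIONS_ENFORCED = 100       # Summarization hard minimum
MIN_ANNOTATIONS_RECOMMENDED = 1_000  # Summarization recommended minimum

# Assumption: human participants use these role labels; any other role
# is treated as a bot or system-generated message and removed.
HUMAN_ROLES = {"AGENT", "CUSTOMER"}

Conversation = Dict[str, List[Dict[str, str]]]


def strip_system_messages(conv: Conversation) -> Conversation:
    """Keep only entries spoken by a human participant."""
    return {"entries": [e for e in conv["entries"] if e.get("role") in HUMAN_ROLES]}


def count_turns(conv: Conversation) -> int:
    """Count changes in which participant is making an utterance."""
    roles = [e["role"] for e in conv["entries"]]
    return sum(1 for prev, cur in zip(roles, roles[1:]) if prev != cur)


def filter_conversations(convs: List[Conversation]) -> List[Conversation]:
    """Remove bot messages, then drop conversations that are too short."""
    kept = []
    for conv in convs:
        cleaned = strip_system_messages(conv)
        if len(cleaned["entries"]) >= MIN_MESSAGES and count_turns(cleaned) >= MIN_TURNS:
            kept.append(cleaned)
    return kept


def check_dataset_size(n: int) -> None:
    """Raise if a Smart Reply dataset size is outside the allowed range."""
    if n < MIN_CONVERSATIONS:
        raise ValueError(f"{n} conversations; at least {MIN_CONVERSATIONS} are required")
    if n > MAX_CONVERSATIONS:
        raise ValueError(f"{n} conversations; at most {MAX_CONVERSATIONS} are allowed")


def check_annotation_count(n: int) -> str:
    """Raise below the enforced minimum; warn below the recommended one."""
    if n < MIN_ANNOTATIONS_ENFORCED:
        raise ValueError(f"{n} annotations; at least {MIN_ANNOTATIONS_ENFORCED} are required")
    if n < MIN_ANNOTATIONS_RECOMMENDED:
        return "below recommended minimum"
    return "ok"
```

The sketch filters before checking the dataset size because removing bot messages and short conversations can push a dataset below the 30,000-conversation minimum.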