Create a custom translation model

Train and use a custom translation model by using the Google Cloud console. The following example uses AutoML Translation to train an English-to-Spanish translation model by using a dataset that contains technology-oriented segment pairs from software localization.

Before you begin

Before you can start using AutoML Translation, your project must have the Cloud Translation API enabled, and you must have the permissions that are granted by the following roles:

  • Viewer role to view existing resources in your project
  • Cloud Translation API Editor role to create and manage datasets and models
  • Storage Admin role to upload training data to a Cloud Storage bucket

Create a translation dataset and import segment pairs

  1. Download the archive file that contains the sample data for training the model, and extract the files.

    For this tutorial, you'll use the English to Spanish TSV file.

  2. Go to the AutoML Translation console.

    Go to the Translation page

  3. From the navigation pane, click Datasets to go to the Datasets page.

  4. Click Create dataset.

  5. In the Create dataset dialog, specify details about the dataset:

    1. Enter tutorial_dataset as the name for the dataset.
    2. Select English (EN) as your source language from the drop-down list.
    3. Select Spanish (ES) as your target language.
    4. Click Create.
  6. After the dataset is created, click the dataset name to view its details.

  7. Go to the Import tab and upload the en-es.tsv dataset to Cloud Storage:

    1. Select Upload files from your computer.
    2. Click Select files, and choose the en-es.tsv file that you previously downloaded and extracted.
    3. Click Browse to select an existing Cloud Storage bucket, or create a new one, to store your TSV file. The bucket region must be us-central1.
  8. Click Continue.

    AutoML Translation automatically splits your data into training, validation, and testing sets. You can view these splits and the imported sentence pairs in the Sentences tab of your dataset.
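
Before you upload the file, it can help to confirm that every line really is a source/target pair. The following is a minimal sketch, assuming the TSV file holds one pair per line with the English segment and the Spanish segment separated by a single tab. The function name check_segment_pairs is hypothetical and is not part of any Google library.

```python
import csv
import io


def check_segment_pairs(tsv_text):
    """Split TSV text into valid (source, target) pairs and a list of problems.

    Assumes one segment pair per line, separated by a single tab.
    """
    pairs, problems = [], []
    # QUOTE_NONE keeps quotation marks inside segments as literal characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if len(row) != 2:
            problems.append((line_number, f"expected 2 columns, found {len(row)}"))
            continue
        source, target = (cell.strip() for cell in row)
        if not source or not target:
            problems.append((line_number, "empty source or target segment"))
            continue
        pairs.append((source, target))
    return pairs, problems


# Example with one good line and two bad ones (illustrative data only).
sample = "Save file\tGuardar archivo\nbroken line\n\tsolo destino\n"
pairs, problems = check_segment_pairs(sample)
print(len(pairs), "valid pairs;", len(problems), "problems")
```

To check the downloaded file, you could read it with open("en-es.tsv", encoding="utf-8", newline="") and pass its contents to the function.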

Train a model

  1. Go to the AutoML Translation console.

    Go to the Translation page

  2. From the navigation pane, go to the Datasets page.

  3. Click the tutorial_dataset dataset.

  4. Go to the Train tab.

  5. Click Start training, which opens the Train new model pane.

  6. Enter tutorial_model for the model name.

  7. Click Start training.

Training a model can take several hours to complete.

Evaluate the model

Compare your custom model with the default Google NMT model. The comparison is based on segment pairs from your test set.

  1. Go to the AutoML Translation console.

    Go to the Translation page

  2. From the navigation pane, go to the Models page.

  3. Click the tutorial_model model.

  4. Click the Evaluate tab.

In the Previous evaluations section, Cloud Translation shows your model's BLEU score compared to the Google NMT model. The BLEU (Bilingual Evaluation Understudy) score indicates how similar the candidate text is to the reference texts; values closer to 100 represent more similar texts.
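
To make the 0 to 100 scale more concrete, here is a simplified, sentence-level sketch of the BLEU idea: the geometric mean of modified n-gram precisions (n = 1 to 4), multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing. It is not the computation that Cloud Translation runs over your test set, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Return a simplified BLEU score in the range 0 to 100."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "haga clic en el botón Guardar"
print(simple_bleu("haga clic en el botón Guardar", reference))        # identical text scores 100
print(simple_bleu("haga clic en el botón Guardar ahora", reference))  # close, but below 100
```

A candidate that shares no words with the reference scores 0 in this sketch, and a candidate that matches the reference exactly scores 100.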

Use the translation model

From the Google Cloud console, you can use your custom model to translate some text.

  1. Go to the AutoML Translation console.

    Go to the Translation page

  2. From the navigation pane, go to the Models page.

  3. Click the tutorial_model model.

  4. Click the Predict tab.

  5. In the English text box, enter text to translate and then click Translate.

    You can compare the results from your custom model to the Google NMT model.
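
The same model can also be called from code rather than the console. The following is a hedged sketch, assuming the google-cloud-translate Python package is installed, application default credentials are configured, and the model lives in us-central1. The helper names are hypothetical, and model_id stands for the model's resource ID, which is assumed here to be different from the tutorial_model display name.

```python
def custom_model_path(project_id, model_id, location="us-central1"):
    """Build the resource name that identifies a custom model."""
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


def translate_with_custom_model(text, project_id, model_id, location="us-central1"):
    """Translate English text to Spanish with a custom model.

    Calling this function makes a network request and needs valid credentials.
    """
    # Imported here so the path helper above works without the package installed.
    from google.cloud import translate_v3

    client = translate_v3.TranslationServiceClient()
    response = client.translate_text(
        request={
            "parent": f"projects/{project_id}/locations/{location}",
            "contents": [text],
            "model": custom_model_path(project_id, model_id, location),
            "source_language_code": "en",
            "target_language_code": "es",
            "mime_type": "text/plain",
        }
    )
    return [translation.translated_text for translation in response.translations]


# The project and model IDs below are placeholders, not real resources.
print(custom_model_path("my-project", "my-model-id"))
```

Omitting the "model" field from the request would fall back to the default NMT model, which gives a programmatic way to make the same side-by-side comparison.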

Clean up

To avoid unnecessary Google Cloud charges, delete your model, dataset, and en-es.tsv file. You can also use the Google Cloud console to delete your project if you don't need it.
