Train an object detection model with AutoML Vision Edge

An object detection model is similar to an image labeling model, but rather than assign labels to entire images, it assigns labels to regions of images. You can use object detection models to recognize and locate objects in an image or to track an object's movements across a series of images.

To train an object detection model, you provide AutoML Vision Edge with a set of images and their corresponding object labels and object boundaries. AutoML Vision Edge uses this dataset to train a new model in the cloud, which you can use for on-device object detection.

Before you begin

  • If you don't already have a Firebase project, create one in the Firebase console.

  • Familiarize yourself with the guidelines presented in Inclusive ML guide - AutoML.

  • If you just want to try AutoML Vision Edge and don't have your own training data, download a publicly available sample dataset.

1. Assemble your training data

First, you need to put together a training dataset of labeled images. Keep the following guidelines in mind:

  • The images must be in one of the following formats: JPEG, PNG, GIF, BMP, ICO.

  • Each image must be 30MB or smaller. Note that AutoML Vision Edge downscales most images during preprocessing, so there's generally no accuracy benefit to providing very high resolution images.

  • Include at least 10, and preferably 100 or more, examples of each label.

  • Include multiple angles, resolutions, and backgrounds for each label.

  • The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry and low-resolution images (such as from a security camera), your training data should be composed of blurry, low-resolution images.

  • The models generated by AutoML Vision Edge are optimized for photographs of objects in the real world. They might not work well for X-rays, hand drawings, scanned documents, receipts, and so on.

    Also, the models can't generally predict labels that humans can't assign. So, if a human can't assign labels by looking at the image for 1-2 seconds, the model likely can't be trained to do it either.

When you have your training images ready, prepare them for import into Google Cloud. You have two options:

Option 1: Cloud Storage with CSV index

Upload your training images to Google Cloud Storage and prepare a CSV file listing the URL of each image and, optionally, the correct object labels and bounding regions for each image. This option is helpful for large datasets.

For example, upload your images to Cloud Storage, and prepare a CSV file like the following:

gs://your-training-data-bucket/001.jpg,accordion,0.2,0.4,,,0.3,0.5,,
gs://your-training-data-bucket/001.jpg,tuba,0.2,0.5,,,0.4,0.8,,
gs://your-training-data-bucket/002.jpg,accordion,0.2,0.2,,,0.9,0.8,,

Object bounding boxes are specified as relative coordinates in the image. See Formatting a training data CSV.
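
If you build the index file programmatically, a minimal Python sketch like the following (the bucket path, file names, and labels are hypothetical) produces rows in the format shown above:

    import csv

    # Each record: (image URI, label, x_min, y_min, x_max, y_max), with all
    # coordinates relative to the image dimensions (0.0 to 1.0).
    annotations = [
        ("gs://your-training-data-bucket/001.jpg", "accordion", 0.2, 0.4, 0.3, 0.5),
        ("gs://your-training-data-bucket/001.jpg", "tuba", 0.2, 0.5, 0.4, 0.8),
    ]

    with open("index.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for uri, label, x_min, y_min, x_max, y_max in annotations:
            # Two opposing vertices of the bounding box; the remaining
            # coordinate slots are left empty.
            writer.writerow([uri, label, x_min, y_min, "", "", x_max, y_max, "", ""])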

The images must be stored in a bucket that's in the us-central1 region and part of your Firebase project's corresponding Google Cloud project.

Option 2: Unlabeled images

Upload your training images unlabeled, and then label them and draw object boundaries in the Google Cloud console. This is recommended only for small datasets. See the next step.

2. Train your model

Next, train a model using your images:

  1. Open the Vision Datasets page in the Google Cloud console. Select your project when prompted.

  2. Click New dataset, provide a name for the dataset, select the type of model you want to train, and click Create dataset.

  3. On your dataset's Import tab, upload your training images: individual images, a zip archive of your training images, or a CSV file containing the Cloud Storage locations you uploaded them to. See Assemble your training data.

  4. After the import task completes, use the Images tab to verify the training data.

  5. If you didn't provide labels and bounding boxes in a CSV, draw bounding boxes around the objects you want to recognize in each image and label each object.

  6. On the Train tab, click Start training.

    1. Name the model and select the Edge model type.

    2. Configure the following training settings, which govern the performance of the generated model:

      Optimize model for...

      The model configuration to use. You can train faster, smaller models when low latency or a small package size is important, or slower, larger models when accuracy is most important.

      Node hour budget

      The maximum time, in compute hours, to spend training the model. More training time generally results in a more accurate model.

      Note that training can complete in less than the specified time if the system determines that the model is optimized and additional training would not improve accuracy. You are billed only for the hours actually used.

      Typical training times:

      Dataset size          Typical training time
      Very small sets       1 hour
      500 images            2 hours
      1,000 images          3 hours
      5,000 images          6 hours
      10,000 images         7 hours
      50,000 images         11 hours
      100,000 images        13 hours
      1,000,000 images      18 hours

3. Evaluate your model

When training completes, you can click the Evaluate tab to see performance metrics for the model.

One important use of this page is to determine the confidence threshold that works best for your model. The confidence threshold is the minimum confidence the model must have to assign a label to an image. By moving the Confidence threshold slider, you can see how different thresholds affect the model's performance. Model performance is measured using two metrics: precision and recall.

In this context, precision is the ratio of the number of images that were correctly labeled to the number of images the model labeled, given the selected threshold. When a model has high precision, it assigns labels incorrectly less often (fewer false positives).

Recall is the ratio of the number of images that were correctly labeled to the number of images that had content the model should have been able to label. When a model has high recall, it fails to assign any label less often (fewer false negatives).
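
To make these definitions concrete, here is a small illustrative Python sketch (not part of the AutoML tooling; the scores are made up) showing how raising the threshold trades recall for precision. Each prediction is a (confidence, was-the-label-correct) pair, and num_ground_truth is the number of objects that should have been labeled:

    def precision_recall(predictions, num_ground_truth, threshold):
        # Keep only predictions at or above the confidence threshold.
        accepted = [correct for confidence, correct in predictions
                    if confidence >= threshold]
        true_positives = sum(accepted)
        precision = true_positives / len(accepted) if accepted else 0.0
        recall = true_positives / num_ground_truth
        return precision, recall

    # Hypothetical scores for four ground-truth objects.
    preds = [(0.95, True), (0.80, True), (0.65, False), (0.40, True)]
    for t in (0.5, 0.7, 0.9):
        p, r = precision_recall(preds, num_ground_truth=4, threshold=t)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")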

Whether you optimize for precision or recall will depend on your use case. See the AutoML Vision beginners' guide and the Inclusive ML guide - AutoML for more information.

When you find a confidence threshold that produces metrics you're comfortable with, make note of it; you will use the confidence threshold to configure the model in your app. (You can use this tool any time to get an appropriate threshold value.)

4. Publish or download your model

If you are satisfied with the model's performance and want to use it in an app, you have three options, which you can use in any combination: deploy the model for online prediction, publish the model to Firebase, or download the model and bundle it with your app.

Deploy the model

On your dataset's Test & use tab, you can deploy your model for online prediction, which runs your model in the cloud. This option is covered in the Cloud AutoML docs. The docs on this site deal with the remaining two options.

Publish the model

By publishing the model to Firebase, you can update the model without releasing a new app version, and you can use Remote Config and A/B Testing to dynamically serve different models to different sets of users.

If you choose to only provide the model by hosting it with Firebase, and not bundle it with your app, you can reduce the initial download size of your app. Keep in mind, though, that if the model is not bundled with your app, any model-related functionality will not be available until your app downloads the model for the first time.

To publish your model, you can use either of two methods:

  • Download the TF Lite model from your dataset's Test & use page in the Google Cloud console, and then upload the model on the Custom model page of the Firebase console. This is usually the easiest way to publish a single model.
  • Publish the model directly from your Google Cloud project to Firebase using the Admin SDK. You can use this method to batch publish several models (see the sketch after the code samples below) or to help create automated publishing pipelines.

To publish the model with the Admin SDK model management API:

  1. Install and initialize the SDK.
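
    For example, a minimal Python initialization might look like the following sketch; the service account key path and storage bucket name are placeholders for your own values:

    import firebase_admin
    from firebase_admin import credentials

    # Initialize the Admin SDK with a service account credential and the
    # Cloud Storage bucket used to stage model files. Both values below
    # are placeholders.
    firebase_admin.initialize_app(
        credentials.Certificate('path/to/your-service-account-key.json'),
        options={'storageBucket': 'your-project-id.appspot.com'},
    )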

  2. Publish the model.

    You will need to specify the model's resource identifier, which is a string that looks like the following example:

    projects/PROJECT_NUMBER/locations/us-central1/models/MODEL_ID

    PROJECT_NUMBER: The project number of the Cloud Storage bucket that contains the model. This might be your Firebase project or another Google Cloud project. You can find this value on the Settings page of the Firebase console or the Google Cloud console dashboard.

    MODEL_ID: The model's ID, which you got from the AutoML Cloud API.

    Python

    # First, import and initialize the SDK.
    
    # Get a reference to the AutoML model
    source = ml.TFLiteAutoMlSource('projects/{}/locations/us-central1/models/{}'.format(
        # See above for information on these values.
        project_number,
        model_id
    ))
    
    # Create the model object
    tflite_format = ml.TFLiteFormat(model_source=source)
    model = ml.Model(
        display_name="example_model",  # This is the name you will use from your app to load the model.
        tags=["examples"],             # Optional tags for easier management.
        model_format=tflite_format)
    
    # Add the model to your Firebase project and publish it
    new_model = ml.create_model(model)
    new_model.wait_for_unlocked()
    ml.publish_model(new_model.model_id)
    

    Node.js

    // First, import and initialize the SDK.
    
    (async () => {
      // Get a reference to the AutoML model. See above for information on these
      // values.
      const automlModel = `projects/${projectNumber}/locations/us-central1/models/${modelId}`;
    
      // Create the model object and add the model to your Firebase project.
      const model = await ml.createModel({
        displayName: 'example_model',  // This is the name you use from your app to load the model.
        tags: ['examples'],  // Optional tags for easier management.
        tfliteModel: { automlModel: automlModel },
      });
    
      // Wait for the model to be ready.
      await model.waitForUnlocked();
    
      // Publish the model.
      await ml.publishModel(model.modelId);
    
      process.exit();
    })().catch(console.error);
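
As noted above, the Admin SDK also lends itself to batch publishing. A minimal Python sketch, reusing the same calls as the sample above with hypothetical model names and IDs:

    # Publish several AutoML models in one pass. Assumes the SDK has been
    # imported and initialized as in step 1; the names and IDs below are
    # placeholders.
    models_to_publish = [
        ('detector_v1', 'MODEL_ID_1'),
        ('detector_v2', 'MODEL_ID_2'),
    ]

    for display_name, model_id in models_to_publish:
        source = ml.TFLiteAutoMlSource(
            'projects/{}/locations/us-central1/models/{}'.format(
                project_number, model_id))
        model = ml.create_model(ml.Model(
            display_name=display_name,
            model_format=ml.TFLiteFormat(model_source=source)))
        model.wait_for_unlocked()
        ml.publish_model(model.model_id)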
    

Download & bundle the model with your app

By bundling your model with your app, you can ensure your app's ML features still work when the Firebase-hosted model isn't available.

If you both publish the model and bundle it with your app, the app will use the latest version available.

To download your model, click TF Lite on your dataset's Test & use page.

Next steps

Now that you have published or downloaded the model, learn how to use the model in your iOS+ and Android apps.