Start Training

This section assumes that you have already built and run the TLT-Trainer container.

Recall from the previous section that we passed the following arguments when starting the container:

  • volume mounts

    • -v /home/user/dataset:/dataset

    • -v /home/user/project:/project

  • port binding

    • -p 1004:5000

First, let's quickly check that the API is reachable.

  1. Send a curl request from the host machine, or simply open the address below in a browser.

    curl http://localhost:1004/api/v1/check

  2. The API should respond with {"success": "API running"}
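
The same check can be scripted, for example before starting a long training run. The following is a minimal sketch rather than part of the TLT-Trainer tooling: the helper name api_is_running is hypothetical, and it relies only on the response text shown above.

```shell
# Hypothetical helper: succeeds when the response body reports a running API.
# The text "API running" is taken from the expected response shown above.
api_is_running() {
    # $1 is the raw response body returned by /api/v1/check
    case "$1" in
        *'"success"'*'API running'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (requires the running container; 1004 is the host port bound earlier):
#   body=$(curl --silent --max-time 5 http://localhost:1004/api/v1/check)
#   if api_is_running "$body"; then echo "API is up"; else echo "API is not reachable"; fi
```

Matching on the response body rather than only on curl's exit status also catches the case where something else is listening on the port.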

Training

Training a model only requires sending a POST request to the API with the training configuration as a JSON string.

  • Make sure you have obtained an API key from NGC.

  • The images and annotations values are paths to the dataset inside the container, not paths on the host.
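
To illustrate how a host path relates to the path the API expects, here is a small sketch based on the volume mount -v /home/user/dataset:/dataset from the previous section. The function name to_container_path is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: translate a host path into the matching container path
# for a volume mount of the form -v <host_root>:<container_root>.
to_container_path() {
    host_root=$1
    container_root=$2
    host_path=$3
    case "$host_path" in
        "$host_root")
            printf '%s\n' "$container_root" ;;
        "$host_root"/*)
            # Drop the host prefix and keep the remainder under the container root.
            printf '%s\n' "$container_root${host_path#"$host_root"}" ;;
        *)
            echo "path is not under the mounted directory: $host_path" >&2
            return 1 ;;
    esac
}

# Example usage:
#   to_container_path /home/user/dataset /dataset /home/user/dataset/images
```

With the mount above, /home/user/dataset/images on the host corresponds to /dataset/images in the request, and a path outside the mounted directory has no container equivalent.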

Use the following command to start training, replacing <API-KEY> with your NGC API key.

curl --header "Content-Type: application/json" \
     --request POST \
     --data '{
       "api_key": "<API-KEY>",
       "images": "/dataset/images",
       "annotations": "/dataset/annotations",
       "resize": {"width": 300, "height": 300, "padding": false},
       "classes": ["face", "person"],
       "architecture": {
         "name": "SSD",
         "backbone": "ResNet10",
         "batch_size": 32,
         "epochs": 50
       }
     }' \
     http://localhost:1004/api/v1/run

Expected response: {"success": "flow started"}
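
If you would rather not type the API key and the whole configuration on the command line, one option is to generate the JSON into a file and pass it to curl with --data @file. The sketch below assumes the key is stored in an environment variable named NGC_API_KEY and that it contains no characters needing JSON escaping. That variable name, the file name train.json, and both helper names are illustrative choices, not something the API defines.

```shell
# Hypothetical helper: print the training config as JSON.
# NGC_API_KEY is an assumed environment variable name chosen for this sketch.
build_train_config() {
    # ${NGC_API_KEY:?...} makes the shell report an error if the variable is unset or empty.
    cat <<EOF
{
  "api_key": "${NGC_API_KEY:?set NGC_API_KEY first}",
  "images": "/dataset/images",
  "annotations": "/dataset/annotations",
  "resize": {"width": 300, "height": 300, "padding": false},
  "classes": ["face", "person"],
  "architecture": {
    "name": "SSD",
    "backbone": "ResNet10",
    "batch_size": 32,
    "epochs": 50
  }
}
EOF
}

# Hypothetical helper: succeeds only if the response matches the expected one.
run_started() {
    case "$1" in
        *'flow started'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (requires the running container):
#   build_train_config > train.json
#   response=$(curl --silent --header "Content-Type: application/json" \
#       --request POST --data @train.json http://localhost:1004/api/v1/run)
#   run_started "$response" || echo "unexpected response: $response" >&2
```

Keeping the configuration in a file also makes it easier to rerun or version a training setup.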

The next section describes other endpoints that can be used to monitor training progress or to collect training data for plotting graphs.