Run the TLT-Trainer Container

Learn how to build and run the TLT-Trainer container.

To start training a model you need two things:

  1. Dataset in KITTI format

  2. TLT-Trainer container

If you have not already prepared your dataset, please follow the instructions in the Dataset Preparation section.
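For reference, a KITTI detection label is a plain-text file with one object per line and 15 space-separated fields: class name, truncation, occlusion, observation angle, 2D bounding box (left, top, right, bottom), 3D dimensions, 3D location, and rotation. For 2D detection training only the class name and bounding box are used, so the remaining fields can be left as zeros. The class names and coordinates below are illustrative placeholders:

```
car 0.00 0 0.00 599.41 156.40 629.75 189.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 387.63 181.54 423.81 203.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```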

This section is focused on building and running the TLT-Trainer container.

Steps

  1. Clone the TLT-Trainer repository locally

    git clone https://www.smartcow.dev/SmartCow/TLT-Trainer.git
    cd TLT-Trainer
  2. Run the build command

    docker build -t tlt_trainer .
  3. Run the container

    Make sure you have already prepared the dataset. To run the container, you need to pass the following parameters:

    • volume mounts for the dataset path and the project path

    • a port mapping to expose the Trainer API

    nvidia-docker run --rm --gpus all -it \
        -v /home/user/dataset:/dataset \
        -v /home/user/project:/project \
        -p 1004:5000 \
        --name tlt-ssd-resnet18 --hostname tlt \
        tlt_trainer python run.py

    Optionally, you can pass a TRAINER_CONFIG environment variable containing the training configuration as JSON and replace run.py with entrypoint.py in the nvidia-docker run command, so that training starts automatically when the container launches. This is covered in a later section.
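As a sketch, the automatic-start variant might look like the following. The JSON keys (`epochs`, `batch_size`) are placeholders, not the Trainer's actual config schema; the command is built as a string and printed here rather than executed:

```shell
# Hypothetical training configuration; the JSON keys are placeholders.
# Consult the Trainer's documentation for its real config schema.
TRAINER_CONFIG='{"epochs": 80, "batch_size": 16}'

# Same run command as above, with the config injected via -e and
# entrypoint.py in place of run.py so training starts on launch.
RUN_CMD="nvidia-docker run --rm --gpus all -it \
    -e TRAINER_CONFIG='${TRAINER_CONFIG}' \
    -v /home/user/dataset:/dataset \
    -v /home/user/project:/project \
    -p 1004:5000 \
    --name tlt-ssd-resnet18 --hostname tlt \
    tlt_trainer python entrypoint.py"

echo "$RUN_CMD"
```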

The next section shows an example of running training and retrieving live training statistics through the API.