Model Architectures

Transfer Learning Toolkit (TLT) supports image classification and six object detection architectures.

TLT supports 13 backbones. The matrix below lists every supported combination of backbone and architecture.

Backbone                Image Classification    Object Detection
                                                DetectNet_V2  FasterRCNN  SSD  YOLOV3  RetinaNet  DSSD
ResNet10/18/34/50/101   Yes                     Yes           Yes         Yes  Yes     Yes        Yes
VGG 16/19               Yes                     Yes           Yes         Yes  Yes     Yes        Yes
GoogLeNet               Yes                     Yes           Yes         Yes  Yes     Yes        Yes
MobileNet V1/V2         Yes                     Yes           Yes         Yes  Yes     Yes        Yes
SqueezeNet              Yes                     Yes           No          Yes  Yes     Yes        Yes
DarkNet 19/53           Yes                     Yes           Yes         Yes  Yes     Yes        Yes
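The matrix can also be consulted programmatically before launching a training run. The following is a minimal sketch that transcribes the matrix into a lookup; the names TASKS, BACKBONE_FAMILIES, UNSUPPORTED, and is_supported are hypothetical helpers for illustration and are not part of TLT.

```python
# Hypothetical helper, not a TLT API: a lookup transcribed from the
# backbone support matrix in this section.
TASKS = (
    "classification",
    "detectnet_v2",
    "fasterrcnn",
    "ssd",
    "yolov3",
    "retinanet",
    "dssd",
)

BACKBONE_FAMILIES = (
    "resnet",      # ResNet10/18/34/50/101
    "vgg",         # VGG 16/19
    "googlenet",
    "mobilenet",   # MobileNet V1/V2
    "squeezenet",
    "darknet",     # DarkNet 19/53
)

# The only "No" cell in the matrix.
UNSUPPORTED = {("squeezenet", "fasterrcnn")}


def is_supported(backbone_family: str, task: str) -> bool:
    """Return True if the matrix marks this backbone/task pair as supported."""
    family = backbone_family.lower()
    task_key = task.lower()
    if family not in BACKBONE_FAMILIES:
        raise ValueError(f"unknown backbone family: {backbone_family!r}")
    if task_key not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return (family, task_key) not in UNSUPPORTED


if __name__ == "__main__":
    print(is_supported("SqueezeNet", "FasterRCNN"))
    print(is_supported("ResNet", "SSD"))
```

Storing only the unsupported pairs keeps the lookup short, on the assumption that the matrix stays mostly "Yes"; a full dictionary would be the safer choice if more exceptions appear.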

Model Requirements

Classification

  • Input size: 3 * H * W (W, H >= 16)

  • Input format: JPG, JPEG, PNG

Object Detection

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

    label 0.00 0 0.00 x1 y1 x2 y2 0.00 0.00 0.00 0.00 0.00 0.00 0.00

  • Input shapes

    Architecture    Height                     Width                      Channel
    DetectNet_v2    >= 272 & multiple of 16    >= 480 & multiple of 16    1 or 3
    FasterRCNN      >= 160                     >= 160                     1 or 3
    SSD             >= 128 & multiple of 32    >= 128 & multiple of 32    1 or 3
    YOLOV3          >= 128 & multiple of 32    >= 128 & multiple of 32    1 or 3
    RetinaNet       >= 128 & multiple of 32    >= 128 & multiple of 32    1 or 3
    DSSD            >= 128 & multiple of 32    >= 128 & multiple of 32    1 or 3
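The object detection requirements above can be checked before training starts. The sketch below is illustrative only: SHAPE_RULES, check_input_shape, and kitti_label_line are hypothetical names, not TLT functions, and formatting the box coordinates with two decimals is an assumption made to match the style of the other fields in the label line.

```python
from typing import List

# (min_height, min_width, required_multiple) per architecture, transcribed
# from the input shape requirements in this section. A multiple of 1 means
# the section states no divisibility constraint.
SHAPE_RULES = {
    "detectnet_v2": (272, 480, 16),
    "fasterrcnn": (160, 160, 1),
    "ssd": (128, 128, 32),
    "yolov3": (128, 128, 32),
    "retinanet": (128, 128, 32),
    "dssd": (128, 128, 32),
}
ALLOWED_CHANNELS = (1, 3)


def check_input_shape(arch: str, channels: int, height: int, width: int) -> List[str]:
    """Return a list of violated requirements (empty if the shape is valid)."""
    key = arch.lower()
    if key not in SHAPE_RULES:
        raise ValueError(f"unknown architecture: {arch!r}")
    min_h, min_w, multiple = SHAPE_RULES[key]
    problems = []
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"channels must be 1 or 3, got {channels}")
    for name, value, minimum in (("height", height, min_h), ("width", width, min_w)):
        if value < minimum:
            problems.append(f"{name} must be >= {minimum}, got {value}")
        elif value % multiple != 0:
            problems.append(f"{name} must be a multiple of {multiple}, got {value}")
    return problems


def kitti_label_line(label: str, x1: float, y1: float, x2: float, y2: float) -> str:
    """Build one label line in the 15-field layout shown in this section.

    Only the class label and the bounding box vary; every other field is
    written with the constant value from the example line.
    """
    box = f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f}"  # two decimals is an assumption
    return f"{label} 0.00 0 0.00 {box} 0.00 0.00 0.00 0.00 0.00 0.00 0.00"


if __name__ == "__main__":
    print(check_input_shape("ssd", 3, 300, 300))
    print(kitti_label_line("car", 10, 20, 110, 220))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated requirement at once.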

The next section explains how to prepare the dataset for training a model.