Vision - Object Detection & Real-Time Prediction in Browser.

In this project, I built an object detector using 'CenterNet HourGlass104 Keypoints 512x512', a model pre-trained on the COCO-2017 dataset, which contains 80 object categories. The model also predicts human pose keypoints, which I used when visualizing the predictions.

There are many other pretrained object detection models on TensorFlow Hub; I used CenterNet because of its speed and accuracy. For live predictions in the browser I used MobileNet / COCO-SSD, which is lighter, less complex, and faster at inference, but less accurate than CenterNet. (CenterNet Paper)

Other details, such as the 80 categories, the build process, and performance, are in the following sections.

  • Dataset and its Categories

    The CenterNet model was pretrained on the COCO-2017 dataset, although TensorFlow Hub also offers newer models trained on the COCO-2020 dataset. COCO provides annotations for several tasks, ranging from keypoint detection to object segmentation. The CenterNet model was pretrained on the following categories:

    1: person 2: bicycle 3: car 4: motorcycle 5: airplane 6: bus 7: train 8: truck 9: boat 10: traffic light 11: fire hydrant 12: stop sign 13: parking meter 14: bench 15: bird 16: cat 17: dog 18: horse 19: sheep 20: cow 21: elephant 22: bear 23: zebra 24: giraffe 25: backpack 26: umbrella 27: handbag 28: tie 29: suitcase 30: frisbee 31: skis 32: snowboard 33: sports ball 34: kite 35: baseball bat 36: baseball glove 37: skateboard 38: surfboard 39: tennis racket 40: bottle 41: wine glass 42: cup 43: fork 44: knife 45: spoon 46: bowl 47: banana 48: apple 49: sandwich 50: orange 51: broccoli 52: carrot 53: hot dog 54: pizza 55: donut 56: cake 57: chair 58: couch 59: potted plant 60: bed 61: dining table 62: toilet 63: tv 64: laptop 65: mouse 66: remote 67: keyboard 68: cell phone 69: microwave 70: oven 71: toaster 72: sink 73: refrigerator 74: book 75: clock 76: vase 77: scissors 78: teddy bear 79: hair drier 80: toothbrush.

  • Model and Images

    The model was loaded with its pretrained weights directly from TensorFlow Hub using the hub library. I used it for inference as-is, without converting it into a high-level Keras model and without fine-tuning: the model is very complex, and training even the output layer or the last few layers on a custom dataset would have taken a lot of time and resources. The test images were read as bytes and stored in an np.array so that they are compatible with tensors and can be passed to the model as input. I also did not resize the images, because the model has built-in layers for resizing and normalization.
    The CenterNet output contains:

    # Model returns
    1 detection_classes
    2 detection_keypoint_scores
    3 detection_boxes
    4 num_detections
    5 detection_keypoints
    6 detection_scores
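The loading and inference steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the TF Hub handle is my assumption for the 'CenterNet HourGlass104 Keypoints 512x512' model and should be checked against the model page, and the helper names (`load_detector`, `to_batch`, `keep_confident`) and the 0.5 score threshold are hypothetical.

```python
import numpy as np

# Assumed TF Hub handle for "CenterNet HourGlass104 Keypoints 512x512";
# verify it against the model page before relying on it.
MODEL_HANDLE = "https://tfhub.dev/tensorflow/centernet/hourglass_512x512_kpts/1"


def load_detector(handle=MODEL_HANDLE):
    """Load the pretrained detector (downloads the weights on first use)."""
    import tensorflow_hub as hub  # imported lazily; only this function needs it

    return hub.load(handle)


def to_batch(image):
    """Turn an H x W x 3 image array into a uint8 batch of shape (1, H, W, 3)."""
    array = np.asarray(image, dtype=np.uint8)
    if array.ndim != 3 or array.shape[-1] != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    # No resizing here: the model resizes and normalizes internally.
    return array[np.newaxis, ...]


def keep_confident(outputs, min_score=0.5):
    """Drop the batch axis and keep detections scoring at least min_score."""
    scores = np.asarray(outputs["detection_scores"])[0]
    mask = scores >= min_score
    return {
        # Boxes are normalized [ymin, xmin, ymax, xmax] rows.
        "boxes": np.asarray(outputs["detection_boxes"])[0][mask],
        "classes": np.asarray(outputs["detection_classes"])[0][mask].astype(int),
        "scores": scores[mask],
    }
```

With TensorFlow and tensorflow_hub installed, usage would look something like `outputs = load_detector()(to_batch(image))` followed by `keep_confident(outputs)`. The two helpers only need NumPy, so they can be tried on a hand-made output dictionary without downloading the model.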

Model Predictions

After loading and preprocessing the images, I used the TensorFlow Object Detection API to run predictions and to visualize the boxes and keypoints; otherwise I would have had to write custom functions to draw the bounding boxes and pose keypoints, which would have taken a lot of time.
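To give a sense of what such a custom function would involve, here is a small sketch of one step the visualization utilities take care of: converting a normalized `[ymin, xmin, ymax, xmax]` box into pixel corners that a drawing routine can use. The function name `box_to_pixels` and the clipping behavior are my own assumptions for illustration, not part of the Object Detection API.

```python
def box_to_pixels(box, image_height, image_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel corners.

    Returns (left, top, right, bottom) as integers, clipped to the image.
    """
    # Clamp each coordinate to [0, 1] so the box never leaves the image.
    ymin, xmin, ymax, xmax = (min(max(float(v), 0.0), 1.0) for v in box)
    left = round(xmin * image_width)
    top = round(ymin * image_height)
    right = round(xmax * image_width)
    bottom = round(ymax * image_height)
    return left, top, right, bottom


# A box covering the right half of the middle band of a 400 x 200 image.
print(box_to_pixels([0.25, 0.5, 0.75, 1.0], image_height=200, image_width=400))
# prints (200, 50, 400, 150)
```

Keypoints would need the same kind of scaling for each (y, x) pair, plus the drawing itself, which is why reusing the existing utilities saved time.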
Some example predictions are:

Future experiments could add object segmentation, or detect and read specific objects such as car license plates.

To view the source code, datasets, and models, please visit my GitHub page.

Information

  • Project Name:
    Object Detection
  • Tool:
    Tensorflow
    Python
    JS
    Colaboratory
  • Datasets:
    COCO
  • Architecture:
    CNN
  • Models:
    CenterNet
    COCO-SSD / MobileNet

Live Object Detection on Photos.

It uses the COCO-SSD model, pre-trained on 80 categories, which was developed by the TensorFlow.js team in 2018 and is currently maintained as part of TensorFlow.js. In this project I use the ml5 library to load the model and detect objects in images, and p5.js to draw the boxes and labels.


Live Object Detection using Webcam.

It uses the same COCO-SSD model as above; the difference is that this section takes webcam frames as input for detection. It was also developed with the ml5 and p5.js libraries.

Please wait: after you click the button, it will take a few seconds to load and open the webcam.