The video object detection pipeline is built with OpenCV and the TensorFlow Object Detection API. The input video is loaded with OpenCV, and a frame rate of 25 FPS is assumed. A tf.Session is started, the Object Detection API model is run on each frame, and the resulting bounding boxes are drawn onto the frames using OpenCV.
The annotated frames are then written out to a video at 1 FPS using OpenCV's VideoWriter. The generated video can then be served via a data URL or a blob URL, depending on the generated video.
The segmentation model is not working properly in most cases; the obvious fix would be to retrain the entire model. The segmentation model also still has to be integrated into the pipeline in the same way as object detection, using OpenCV.
At present the entire video generation pipeline takes less than 40 seconds, excluding upload and download time. The server it was tested on has an RTX 3060 GPU, and peak GPU memory utilisation reached 11 GB.
That's all for today!!!
Hope you had a great week