author: zhaoyue-zephyrus
score: 8 / 10
- Specialize a low-cost student model through online distillation to a specific target distribution in video streams.
- Architectural design: Just-In-Time Network (JITNet)
  - Separable filters
  - Small number of channels
  - Skip connections from each encoder block to the corresponding decoder block
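The separable filters and narrow channels are what make JITNet cheap to run and fast to update. A minimal sketch of the parameter savings from a depthwise-separable 3×3 convolution versus a standard one (the channel counts below are illustrative, not the paper's exact architecture):

```python
def conv_params(c_in, c_out, k=3):
    """Parameters in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k=3):
    """Depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixing channels
    return depthwise + pointwise

# Illustrative (hypothetical) channel counts for one encoder block.
std = conv_params(64, 64)            # 36864 parameters
sep = separable_conv_params(64, 64)  # 4672 parameters
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For 64-to-64 channels this is roughly an 8× reduction per layer, which compounds with the small channel counts to give the overall runtime speedup.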
- Training paradigm: Just-In-Time Model Distillation
  - (1) Pre-train JITNet on COCO; (2) re-train online on the live video stream as new frames arrive, using a Mask R-CNN teacher
  - Adaptive re-training:
    - Periodic distillation: run the teacher network every \(\delta\) frames, with \(\delta\) set adaptively from recent student accuracy (8–64 frames)
    - Rapid specialization: update all layers with a high learning rate and momentum; terminate when accuracy reaches a threshold (0.9 by default) or the number of update iterations hits an upper limit (8 by default)
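The adaptive re-training loop above can be sketched as follows. The model functions and the fixed stand-in accuracy are placeholders, not the paper's implementation, but the stride doubling/halving mirrors the periodic-distillation and rapid-specialization steps described:

```python
ACC_THRESHOLD = 0.9          # target accuracy (paper default)
MAX_UPDATES = 8              # max gradient steps per teacher frame
DELTA_MIN, DELTA_MAX = 8, 64 # teacher stride bounds, in frames

# Hypothetical stubs standing in for the real student (JITNet)
# and teacher (Mask R-CNN) models.
def teacher_predict(frame): return "teacher_mask"
def student_predict(frame): return "student_mask"
def student_update(frame, label): pass  # one high-lr SGD step
def student_accuracy(pred, label):
    return 0.95  # stand-in; real code compares masks (e.g., IoU)

def run_stream(frames):
    """Online distillation loop; returns the number of teacher calls."""
    delta, next_teacher, teacher_calls = DELTA_MIN, 0, 0
    for i, frame in enumerate(frames):
        pred = student_predict(frame)  # cheap student runs every frame
        if i < next_teacher:
            continue  # between teacher frames: use student output as-is
        label = teacher_predict(frame)  # run the expensive teacher
        teacher_calls += 1
        if student_accuracy(pred, label) >= ACC_THRESHOLD:
            delta = min(delta * 2, DELTA_MAX)  # doing well: widen stride
        else:
            # Rapid specialization: update until accurate or budget spent.
            for _ in range(MAX_UPDATES):
                student_update(frame, label)
                acc = student_accuracy(student_predict(frame), label)
                if acc >= ACC_THRESHOLD:
                    break
            delta = max(delta // 2, DELTA_MIN)  # check again sooner
        next_teacher = i + delta
    return teacher_calls
```

With the optimistic stub accuracy, `run_stream(list(range(64)))` invokes the teacher only 3 times as the stride grows from 8 to 64; a struggling student would instead trigger updates and a shrinking stride.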
- Experimental results:
  - Long Video Streams (LVS) dataset: 30 HD videos (720p or higher), each 30 min long
  - Pseudo-labels generated by a high-performance Detectron Mask R-CNN
  - Quantitative results:
    - JITNet 0.9 maintains 82.5 mean IoU with a 7.5× runtime speedup
    - JITNet is also more accurate than the offline oracle, flow-based interpolation methods, and OSVOS (one-shot video object segmentation)
  - Room for improvement remains, especially for small objects (e.g., traffic-camera or aerial views)
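Mean IoU, the accuracy metric quoted above, averages per-class intersection-over-union between the student's and teacher's label maps. A minimal NumPy sketch (skipping classes absent from both masks is an assumption about the exact evaluation protocol):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for integer label maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0, 1], [0, 1, 1]])
target = np.array([[0, 1, 1], [0, 1, 1]])
# class 0: inter 2, union 3 -> 2/3; class 1: inter 3, union 4 -> 3/4
print(mean_iou(pred, target, 2))  # ~0.708
```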
TL;DR
- An online distillation approach for fast adaptation and efficient inference.
- Maintains high accuracy with a significant speedup.