Light-Head R-CNN: In Defense of Two-Stage Object Detector

In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO [26, 27] and SSD [22]. We find that Faster R-CNN [28] and R-FCN [17] perform an intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces a large set of score maps. Thus, the speed of these networks is slow due to the heavy-head design in the architecture. Even if we significantly reduce the base model, the computation cost cannot be largely decreased accordingly. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming of current two-stage approaches. In our design, we make the head of the network as light as possible, by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101 based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while keeping time efficiency. More importantly, simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO [26, 27] and SSD [22] on both speed and accuracy.
Code will be made publicly available.
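To see concretely how "thin" the thin feature maps are, the sketch below compares the channel counts before RoI pooling. The numbers follow the paper's COCO setting as I understand it (81 classes including background, 7×7 pooling bins, and α = 10 thin maps per bin); treat them as an illustrative back-of-the-envelope calculation, not the official implementation.

```python
# Back-of-the-envelope comparison of head input sizes (hypothetical
# simplification: only channel counts, ignoring spatial size and RoI count).

def score_map_channels(num_classes, p):
    """R-FCN-style position-sensitive score maps: one map per class per bin."""
    return num_classes * p * p

def thin_map_channels(alpha, p):
    """Light-Head-style thin feature maps: alpha maps per bin."""
    return alpha * p * p

# COCO setting: 80 classes + background, 7x7 pooling bins, alpha = 10.
rfcn_channels = score_map_channels(81, 7)   # 81 * 49 = 3969 channels
light_channels = thin_map_channels(10, 7)   # 10 * 49 = 490 channels

print(rfcn_channels, light_channels, rfcn_channels / light_channels)
```

The roughly 8× reduction in channels is what makes the subsequent per-RoI computation (pooling plus one fully connected layer) cheap enough that even a tiny backbone is not dominated by the head.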

1 comment

The speed and accuracy gains the authors obtain from these few changes exceeded my expectations. By the usual intuition, using thin feature maps should lose some information and hurt accuracy (yet the paper reports the opposite result). Granted, they use a large separable convolution, but is it really that powerful?
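One way to get a feel for the large separable convolution is to count parameters. It replaces a single dense k×k convolution with two branches of stacked k×1 and 1×k convolutions through a narrow middle channel. The sizes below (k = 15, 2048 input channels from a ResNet-101 C5 stage, a 256-channel middle, 490 output channels) are my reading of the paper's large setting; the formulas are a generic sketch, not the authors' code.

```python
# Parameter-count sketch: dense k x k convolution vs. the two-branch
# large separable convolution (k x 1 then 1 x k, and 1 x k then k x 1).

def dense_conv_params(k, c_in, c_out):
    """Parameters of a plain k x k convolution, bias ignored."""
    return k * k * c_in * c_out

def large_separable_params(k, c_in, c_mid, c_out):
    """Two branches, each a k x 1 conv into c_mid followed by a 1 x k conv
    into c_out (the second branch swaps the orientation, same cost)."""
    one_branch = k * c_in * c_mid + k * c_mid * c_out
    return 2 * one_branch

# Hypothetical sizes in the spirit of the paper: k=15, 2048 -> 256 -> 490.
dense = dense_conv_params(15, 2048, 490)          # ~226M parameters
sep = large_separable_params(15, 2048, 256, 490)  # ~19.5M parameters

print(dense, sep, dense / sep)
```

So the separable form buys a very large (k = 15) effective receptive field at roughly a tenth of the parameters and FLOPs of the dense equivalent, which is why producing the thin maps stays affordable.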