【ICLR 2015】Multiple Object Recognition with Visual Attention

Paper posted an article • 0 comments • 778 views • 2017-08-09 14:37 • from related topics

Keywords: attention, reinforcement learning, multiple object recognition
 
Multiple Object Recognition with Visual Attention
 
Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu
 
DeepMind
 
paper: https://arxiv.org/abs/1412.7755
 
We present an attention-based model for recognizing multiple objects in images.
The proposed model is a deep recurrent neural network trained with reinforcement
learning to attend to the most relevant regions of the input image. We show that the
model learns to both localize and recognize multiple objects despite being given
only class labels during training. We evaluate the model on the challenging task of
transcribing house number sequences from Google Street View images and show
that it is both more accurate than the state-of-the-art convolutional networks and
uses fewer parameters and less computation.
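
As a rough illustration of what "attending to a region" can mean in practice, the sketch below crops a fixed-size glimpse around a chosen location. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `extract_glimpse`, the normalized (y, x) coordinate convention, the zero padding, and the patch size are all hypothetical choices made for illustration. The recurrent network and the reinforcement-learning training described in the abstract are not shown.

```python
import numpy as np

def extract_glimpse(image, center, size):
    """Crop a size x size patch around `center` from a 2-D image.

    `center` is (y, x) in normalized coordinates, where (-1, -1) is the
    top-left corner and (1, 1) is the bottom-right corner (an assumed
    convention). Regions that fall outside the image are zero-padded.
    """
    h, w = image.shape
    # Map normalized coordinates to pixel indices.
    cy = int(round((center[0] + 1.0) / 2.0 * (h - 1)))
    cx = int(round((center[1] + 1.0) / 2.0 * (w - 1)))
    half = size // 2
    # Zero-pad so that glimpses near the border keep a fixed shape.
    padded = np.pad(image, half, mode="constant")
    # After padding, pixel (cy, cx) sits at (cy + half, cx + half),
    # so the patch starts at (cy, cx) in padded coordinates.
    return padded[cy:cy + size, cx:cx + size]

# Toy 8x8 "image" and one glimpse taken near its center.
image = np.arange(64, dtype=np.float32).reshape(8, 8)
glimpse = extract_glimpse(image, center=(0.0, 0.0), size=4)
```

In a model of the kind the abstract describes, the location passed as `center` would be produced by the network itself at each step rather than fixed by hand.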
 
 

【CVPR 2017】RON: Reverse Connection with Objectness Prior Networks for Object Detection

Paper posted an article • 0 comments • 876 views • 2017-08-08 01:09 • from related topics

 
RON: Reverse Connection with Objectness Prior Networks for Object Detection
 
Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen
 
github: https://github.com/taokong/RON
 
We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the searching space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, thus RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and low resolution 384×384 input size, the network gets 81.3% mAP on PASCAL VOC 2007, 80.7% mAP on PASCAL VOC 2012 datasets. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3× faster than the Faster R-CNN counterpart.
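
One way to picture how an objectness prior could shrink the search space is to discard candidate locations with a low objectness score before class scores are considered. The snippet below is a minimal sketch under that assumption, not the RON code: the function name `select_candidates`, the array shapes, and the threshold value are hypothetical, and the abstract does not say how the prior is applied. The linked repository holds the actual implementation.

```python
import numpy as np

def select_candidates(objectness, class_scores, threshold=0.1):
    """Keep only candidate boxes whose objectness exceeds `threshold`.

    objectness:   (N,) array, assumed probability that each candidate
                  box contains any object.
    class_scores: (N, C) array of per-class scores for the same boxes.
    Returns the indices of the kept boxes and their class scores.
    """
    keep = np.flatnonzero(objectness > threshold)
    return keep, class_scores[keep]

# Four hypothetical candidate boxes with two classes each.
objectness = np.array([0.01, 0.90, 0.02, 0.40])
class_scores = np.array([[0.5, 0.5],
                         [0.1, 0.9],
                         [0.6, 0.4],
                         [0.7, 0.3]])
kept, scores = select_candidates(objectness, class_scores, threshold=0.1)
```

Only the boxes that pass the filter would then be handed to the classification stage, which is the sense in which most negative samples are skipped.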
 