Paper title
A Survey of Deep Learning-based Object Detection
Abstract
1. Purpose
To give a thorough and in-depth understanding of the main development status of object detection pipelines.
2. Contents
- We first analyze the methods of existing typical detection models and describe the benchmark datasets.
- Next, and most importantly, we provide a comprehensive and systematic overview of a variety of object detection methods, covering both one-stage and two-stage detectors.
- Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well.
- Finally, we discuss how these object detection methods can be exploited to build an effective and efficient system, and point out a set of development trends to better follow state-of-the-art algorithms and guide further research.
Ⅰ. Introduction
1. Categories of existing object detectors and their characteristics
1.1 One-stage: YOLO, SSD.
One-stage detectors achieve high inference speed.
1.2 Two-stage: Faster R-CNN.
Two-stage detectors have high localization and object recognition accuracy.
- In the first stage, a Region Proposal Network (RPN) proposes candidate object bounding boxes.
- In the second stage, features are extracted from each candidate box by the RoI Pooling (RoIPool) operation for the subsequent classification and bounding-box regression tasks.
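The RoI Pooling step can be sketched in NumPy as follows. This is a minimal single-RoI sketch under stated assumptions: the feature map is a (C, H, W) array, the candidate box is given as (x1, y1, x2, y2) in image coordinates and lies inside the image, and `spatial_scale` (e.g. 1/16 for a stride-16 backbone) projects it onto the feature map. The function name `roi_max_pool` and its parameters are hypothetical, not the API of any particular library.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2), spatial_scale=1.0):
    """Max-pool one candidate box into a fixed output_size grid (hypothetical helper).

    feature_map: array of shape (C, H, W)
    roi: (x1, y1, x2, y2) in input-image coordinates, assumed inside the image
    spatial_scale: maps image coordinates to feature-map cells (assumption)
    """
    c, h, w = feature_map.shape
    out_h, out_w = output_size
    # Project the box onto the feature map and round to integer cells (quantization).
    x1, y1, x2, y2 = (int(round(v * spatial_scale)) for v in roi)
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w - 1), min(y2, h - 1)
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)

    output = np.zeros((c, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # floor/ceil bin edges so every bin is non-empty and the bins cover the RoI.
        ys = y1 + int(np.floor(i * roi_h / out_h))
        ye = y1 + int(np.ceil((i + 1) * roi_h / out_h))
        for j in range(out_w):
            xs = x1 + int(np.floor(j * roi_w / out_w))
            xe = x1 + int(np.ceil((j + 1) * roi_w / out_w))
            # Max over the spatial bin, separately for each channel.
            output[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return output
```

For example, pooling the whole of a 1 x 4 x 4 map holding the values 0 to 15 into a 2 x 2 grid returns the maximum of each quadrant, `[[5, 7], [13, 15]]`. Because every candidate box is pooled to the same fixed grid, the second-stage classifier and box regressor can operate on fixed-size inputs regardless of the box size.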
1.3 Comparison of the two categories
- Fig(a) shows the basic architecture of two-stage detectors.
- Fig(b) exhibits the basic architecture of one-stage detectors.
In contrast to two-stage detectors, one-stage detectors predict boxes directly from the input images without a region proposal step, so they are time-efficient and can be used on real-time devices.
Backbone Networks
The backbone network acts as the basic feature extractor for the object detection task: it takes images as input and outputs the corresponding feature maps.
Most backbone networks for detection are classification networks with the last fully connected layers removed.
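To make the backbone idea concrete, here is a deliberately toy NumPy sketch, not a real CNN: each "stage" is 2x2 average pooling standing in for a stride-2 convolution block, and the classification network adds global average pooling plus one fully connected layer. The detection backbone is the same network with that head (typically the final pooling and FC layers) taken off, so it returns a spatial feature map instead of class logits. Five stages (total stride 32), the unchanged channel count, and the 10 classes are illustrative assumptions; all names are hypothetical.

```python
import numpy as np

NUM_STAGES = 5    # five stride-2 stages -> total stride 32 (illustrative assumption)
NUM_CHANNELS = 3  # toy: channels never change here; real backbones widen them
NUM_CLASSES = 10  # hypothetical number of classes for the classification head

rng = np.random.default_rng(0)
fc_weight = rng.standard_normal((NUM_CLASSES, NUM_CHANNELS))  # the "last FC layer"


def stride2_stage(x):
    """Stand-in for a conv stage: 2x2 average pooling halves height and width."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]  # drop an odd trailing row/column
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def backbone(image):
    """Feature extractor: the classifier's stages without its pooling + FC head."""
    x = image
    for _ in range(NUM_STAGES):
        x = stride2_stage(x)
    return x  # spatial feature map of shape (C, H/32, W/32)


def classifier(image):
    """Full classification network: backbone, global average pooling, FC layer."""
    pooled = backbone(image).mean(axis=(1, 2))  # (C,)
    return fc_weight @ pooled  # (NUM_CLASSES,) class logits
```

For a 3 x 224 x 320 input, `backbone` keeps a 7 x 10 spatial grid on which a detection head can still localize objects, whereas `classifier` collapses everything into 10 logits.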
Typical Baselines
- Two-stage Detectors
  - R-CNN
  - Fast R-CNN
  - Faster R-CNN
  - Mask R-CNN
- One-stage Detectors
  - YOLO series
  - SSD
  - DSSD
  - RetinaNet
  - M2Det
  - RefineDet
- Latest Detectors
  - Relation Networks for Object Detection
  - DCNv2
  - NAS-FPN
Datasets and Metrics
- PASCAL VOC dataset
- MS COCO benchmark
- ImageNet benchmark
- VisDrone2018 benchmark
- Open Images V5
- Pedestrian detection datasets
Analysis of general image object detection methods
In general, deep-neural-network-based object detection pipelines have four steps: image pre-processing, feature extraction, classification and localization, and post-processing.
- Firstly, raw images from the dataset cannot be fed into the network directly.
Therefore, they need to be resized to specific sizes and made clearer, for example by enhancing brightness, color, and contrast. Data augmentation, such as flipping, rotation, scaling, cropping, translation, and adding Gaussian noise, can also be applied to meet particular requirements. In addition, GANs (generative adversarial networks) can generate new images to enrich the diversity of the input as needed. For more details about data augmentation, refer to the works cited in the survey.
- Secondly, feature extraction is a key step for further detection.
The feature quality directly determines the upper bound of subsequent tasks such as classification and localization.
- Thirdly, the detector head is responsible for proposing and refining bounding boxes, outputting classification scores and bounding-box coordinates.
- Finally, the post-processing step removes weak detection results.
For example, NMS (non-maximum suppression) is a widely used method in which the highest-scoring box suppresses nearby boxes that overlap it heavily and have lower classification scores.
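For the pre-processing step, two of the augmentations listed (flipping and adding Gaussian noise) can be sketched in NumPy as below. In detection, geometric augmentations must transform the bounding boxes together with the pixels. The sketch assumes (H, W, C) images with values in [0, 255] and (x1, y1, x2, y2) boxes in continuous pixel coordinates, so a horizontal flip maps x to W - x; the function names and the default noise level are illustrative choices.

```python
import numpy as np


def horizontal_flip(image, boxes):
    """Mirror an (H, W, C) image left-right and update (x1, y1, x2, y2) boxes to match."""
    width = image.shape[1]
    flipped = image[:, ::-1, :]
    boxes = np.asarray(boxes, dtype=float)
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # new left edge comes from old right edge
    new_boxes[:, 2] = width - boxes[:, 0]  # new right edge comes from old left edge
    return flipped, new_boxes


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise (std sigma, in pixel units) and clip to [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(float) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255)
```

For instance, in a 100-pixel-wide image the box (10, 20, 30, 40) becomes (70, 20, 90, 40) after flipping; noise, by contrast, changes only pixel values and leaves the boxes untouched.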
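For the detector-head step, "refining" a box usually means predicting offsets relative to an anchor or proposal rather than raw coordinates. The sketch below decodes the (dx, dy, dw, dh) parameterization introduced with R-CNN and reused by Faster R-CNN's RPN and box-regression head: center shifts are scaled by the reference box size, and width/height changes are predicted in log space. It assumes (x1, y1, x2, y2) boxes in continuous coordinates and omits the delta normalization weights and clamping that real implementations typically add; `decode_boxes` is a hypothetical name.

```python
import numpy as np


def decode_boxes(anchors, deltas):
    """Apply (dx, dy, dw, dh) regression offsets to (x1, y1, x2, y2) anchors/proposals."""
    anchors = np.asarray(anchors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    widths = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights

    dx, dy, dw, dh = deltas.T
    # Center shifts are relative to the reference box size; size changes are in log space.
    pred_ctr_x = ctr_x + dx * widths
    pred_ctr_y = ctr_y + dy * heights
    pred_w = widths * np.exp(dw)
    pred_h = heights * np.exp(dh)

    # Convert back from (center, size) to corner format.
    return np.stack(
        [pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
         pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h],
        axis=1,
    )
```

For example, the anchor (0, 0, 10, 10) with deltas (0.5, 0, ln 2, 0) decodes to (0, 0, 20, 10): the center moves half a width to the right and the width doubles.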
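For the post-processing step, greedy NMS can be sketched in NumPy as follows. Overlap is measured with intersection over union (IoU), and a box is suppressed if its IoU with an already kept, higher-scoring box exceeds `iou_threshold`. The 0.5 threshold and the (x1, y1, x2, y2) box format are assumptions, and the helper names are hypothetical. In practice NMS is usually run per class, after boxes below a score threshold have been discarded.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the top-scoring box, drop boxes overlapping it, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Suppress the remaining boxes whose overlap with the kept box is too large.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
```

For example, `nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], [0.9, 0.8, 0.7])` returns `[0, 2]`: the second box overlaps the first with IoU 81/119 (about 0.68) and is suppressed, while the distant third box survives.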