Abstract
Detecting small objects is critical to many applications, such as autonomous driving and lung nodule detection. However, small object detection is challenging because of low-resolution features. Therefore, the linchpin of small object detection is an effective encoder that can extract subtle features. In this paper, we present a powerful encoder, the Ensemble Transformer with Attention Modules (ETAM) encoder, that extracts subtle small-object features without sacrificing the ability to detect larger objects. In ETAM, a Magnifying Glass (MG) module is proposed to focus on representative features of small objects. The Quadruple Attention (QA) module is then designed to enrich small-object features with attention over width and height in addition to channel and position. To accommodate both small and large objects, the ETAM encoder uses ensemble learning with two branches. Experimental results on PASCAL VOC, MS-COCO, VisDrone2019, and LIDC-IDRI show that ETAM significantly improves small object detection. With ETAM, the mAP for small objects is improved up to 91.7% on the four datasets.
• The potential of Transformers for small object detection is demonstrated.
• The MG predicts the approximate positions of small objects on shallow features.
• The QA extends attention to two extra dimensions, height and width.
• ETAM has two branches to accommodate both small and larger object detection.