Consistent and comprehensive scale aggregation network for drone-view small object detection.
Accurate object detection in UAV scenarios is challenging because the objects are small. Two factors contribute: (1) small objects lack sufficient visual cues and often suffer information loss during feature extraction; (2) the positional sensitivity of small objects complicates the optimization of predicted bounding boxes. To address these issues, we present a Consistent and Comprehensive Scale Aggregation Network (C²SANet). To obtain high-quality feature representations of small objects, C²SANet develops a novel Multi-Scale Interactive Feature Pyramid Network (MSI-FPN), which adds two new components to the top-down propagation path: the Deformable-Based Spatial Calibration (DSC) module and the Scale Feature Enhancement (SFE) module. Specifically, DSC uses pixel-level spatial information from adjacent scale features to adjust up-sampled deep features, which improves the consistency of semantic propagation. SFE first unifies the spatial size of all scale features, then performs a "Collect-and-Distribute" of full-scale information through a scale interaction block with an encoder-decoder structure, so that shallow features are complemented with comprehensive semantic information. Additionally, to improve localization of small objects, a Coarse-to-Fine Detection Head (CFDH) with geometry-aware adjustment is devised to iteratively refine the quality of predicted boxes. Extensive experimental results demonstrate the effectiveness and generalizability of C²SANet in improving small object detection performance.
(Copyright © 2025 Elsevier Ltd. All rights reserved.)
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
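The "Collect-and-Distribute" step that the abstract attributes to SFE can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `resize_nearest` and `collect_and_distribute`, the `ref_level` parameter, the nearest-neighbour resizing, and the residual addition are all assumptions made for illustration, and a plain average stands in for the learned encoder-decoder scale interaction block, which the abstract does not specify.

```python
import numpy as np


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map (illustrative helper)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def collect_and_distribute(feats, ref_level=1):
    """Hypothetical sketch: feats is a list of (C, H_i, W_i) arrays, shallow to deep."""
    _, ref_h, ref_w = feats[ref_level].shape

    # Collect: unify the spatial size of all scale features, then fuse them.
    unified = [resize_nearest(f, ref_h, ref_w) for f in feats]
    # Assumption: a simple mean replaces the learned encoder-decoder block.
    fused = np.mean(unified, axis=0)

    # Distribute: resize the fused map back to each level and add it residually.
    return [f + resize_nearest(fused, f.shape[1], f.shape[2]) for f in feats]


if __name__ == "__main__":
    pyramid = [np.random.rand(4, s, s) for s in (16, 8, 4)]
    enhanced = collect_and_distribute(pyramid)
    print([e.shape for e in enhanced])
```

Every level keeps its original shape, so a block of this form could sit in a feature pyramid without changing what the downstream head receives. The shallowest level, which matters most for small objects, gains information drawn from all scales rather than only from its immediate neighbour.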