Title:
基于 Roofline 理论的 YOLOv8 橙果识别模型轻量化改进.
Alternate Title:
Optimizing orange fruit recognition model using Roofline theory and lightweight improved YOLOv8.
Authors:
刘 洁1,2,3 liujie@mail.hzau.edu.cn, 王一帆1, 马京奥1, 戴梓健1, 杨宝钦1, 杨恬甜1
Source:
Transactions of the Chinese Society of Agricultural Engineering. Jul 2025, Vol. 41 Issue 14, p184-193. 10p.
Database:
Academic Search Index

*Further Information*

*Orange (Citrus sinensis) is one of the key fruit crops, owing to its outstanding economic value. Its large-scale planting has been an important carrier of rural revitalization in hilly areas. Manual harvesting cannot fully meet the demands of large-scale production, due to high labor costs, low efficiency, and a tendency to damage fruit. The shortage of agricultural labor has become increasingly prominent with urbanization and an aging population, so mechanized and intelligent harvesting is inevitable for the development of the industry. This study aims to enhance the efficiency and accuracy of real-time orange detection in unstructured orchard environments under the constraints of embedded edge computing platforms. A lightweight object detection model, named YOLOv8n-Light, was proposed in alignment with the Roofline performance model, in order to increase computational intensity relative to memory access; a systematic optimization was then made to balance resource consumption against detection accuracy. The baseline YOLOv8n network was modified to replace its backbone with the lightweight ShuffleNetV2 architecture, which uses channel splitting, pointwise convolution, and depthwise separable convolution. Parameter size and computational cost were minimized while fine-grained features were still extracted. Furthermore, a novel lightweight detection head was introduced on top of this backbone, sharing 3×3 convolutional kernels across feature pyramid levels. Redundant parameter storage and activation memory traffic were significantly reduced, yielding a streamlined, more efficient pipeline. The concatenation module was restructured to incorporate the SE (Squeeze-and-Excitation) attention mechanism, which recalibrates channel-wise responses according to feature importance.
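As an illustration only (not the authors' code), the squeeze-excite-scale recalibration described above can be sketched in NumPy; the random weights and the `reduction` ratio of 4 are assumptions standing in for the learned fully connected layers of a trained SE module:

```python
import numpy as np

def se_block(x, reduction=4, rng=None):
    """Squeeze-and-Excitation over a feature map x of shape (C, H, W).

    Illustrative sketch: the two fully connected layers are learned in a
    real network; here their weights are drawn randomly.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c = x.shape[0]
    # Squeeze: global average pooling per channel -> vector of length C
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid gate in (0, 1) per channel
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)            # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # sigmoid
    # Scale: recalibrate each channel by its gate
    return x * s[:, None, None]
```

With all-ones input, every output channel is simply its gate value, so the recalibration is easy to inspect: channels the gate rates as unimportant are suppressed toward zero.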
The SE module enhanced the network's sensitivity to relevant object features under complex conditions such as varying illumination, background clutter, and partial occlusion. The loss function was redesigned to integrate MPDIoU (Minimum Point Distance Intersection over Union) and Focaler-IoU (Focalized Intersection over Union) in order to improve localization. This hybrid loss imposed stronger penalties on inaccurate bounding boxes and dynamically balanced the precision-recall trade-off according to the quality of each prediction, yielding high regression accuracy and robust detection sensitivity. A series of experiments was conducted on a Raspberry Pi 4B platform with 8 GB of RAM. The YOLOv8n-Light model reached an inference speed of 2.8 FPS (frames per second), a 64.7% increase over the original YOLOv8n. The model attained a precision of 96.5%, 2.2 percentage points higher than the original YOLOv8n, a recall of 89.5%, and a mAP (mean Average Precision) of 97.0%. Field evaluations were carried out in an orchard using a six-degree-of-freedom robotic arm equipped with an Intel RealSense depth camera. The average positioning errors were 2.48 mm along the X-axis, 3.13 mm along the Y-axis, and 4.13 mm along the Z-axis. The robotic fruit-picking system achieved a recognition accuracy of 97.59%, a localization accuracy of 96.39%, and an overall picking success rate of 93.98%, improving the applicability of the system under real-world agricultural conditions. In conclusion, the YOLOv8n-Light model effectively balanced computational efficiency and detection accuracy on resource-constrained embedded platforms by integrating an optimized loss function with architectural improvements and attention mechanisms.
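As a hedged sketch of the loss components named above (not the paper's implementation): MPDIoU penalizes IoU by the squared distances between the two top-left and the two bottom-right corners, normalized by the squared image diagonal, and Focaler-IoU remaps IoU linearly over an interval to refocus regression on a chosen sample-quality band. The interval bounds `d` and `u` below are illustrative defaults, not values from the paper:

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Standard IoU
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Corner-distance penalties, normalized by the squared image diagonal
    diag2 = img_w ** 2 + img_h ** 2
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners
    return iou - d1 / diag2 - d2 / diag2

def focaler(iou, d=0.0, u=0.95):
    """Focaler-IoU remapping: 0 below d, 1 above u, linear in between."""
    return min(max((iou - d) / (u - d), 0.0), 1.0)
```

Identical boxes give an MPDIoU of exactly 1; as a predicted box drifts from the target, both the shrinking overlap and the growing corner distances pull the score down, so the gradient stays informative even for non-overlapping boxes.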
Reliable performance was achieved in both controlled and real-world orchard environments. This lightweight refinement of citrus fruit detection can serve as a strong reference for automated harvesting equipment. [ABSTRACT FROM AUTHOR]*
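For context, the Roofline model that guided the optimization bounds attainable throughput by the lesser of the compute peak and memory bandwidth times arithmetic intensity; raising intensity (fewer bytes moved per FLOP, e.g. via shared kernels and reduced activation traffic) lifts a memory-bound kernel toward the compute roof. The figures in the usage note are illustrative, not measurements of the Raspberry Pi 4B:

```python
def roofline_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Attainable FLOP/s under the Roofline model.

    arithmetic_intensity is FLOPs performed per byte of memory traffic;
    kernels with intensity below the ridge point peak_flops / mem_bandwidth
    are memory-bound and capped by bandwidth, not compute.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)
```

For example, with an assumed 10 GFLOP/s peak and 4 GB/s bandwidth, a kernel at 1 FLOP/byte is capped at 4 GFLOP/s (memory-bound), while one at 5 FLOP/byte reaches the full 10 GFLOP/s peak.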

*To balance real-time performance and detection accuracy on compute-constrained embedded systems, a YOLOv8n-Light orange recognition model is proposed based on Roofline theory, taking the reduction of memory access and computation as its starting point. The lightweight ShuffleNetV2 backbone replaces the complex, redundant convolutions of the original Backbone; a lightweight shared-convolution detection head, designed around the characteristics of orange targets, replaces the original head; the Concat module is restructured with the SEAttention mechanism; and the loss function is optimized by combining the ideas of MPDIoU and Focaler-IoU to redistribute the precision (P) / recall (R) trade-off. A dataset of 2,500 images was built to develop the improved model. Results show that the improved YOLOv8n-Light achieves a precision of 96.5%, 2.2 percentage points higher than the original YOLOv8n, a recall of 89.5%, and a mean average precision (mAP) of 97.0%. On a Raspberry Pi 4B (8 GB) platform, the inference speed is 2.8 frames per second, a 64.7% improvement over the original model. In orchard trials, the average positioning errors of the picking manipulator's end effector in the X, Y, and Z directions are 2.48, 3.13, and 4.13 mm respectively, with a recognition accuracy of 97.59%, a localization accuracy of 96.39%, and a picking success rate of 93.98%. The algorithm can provide a basis and reference for the lightweight improvement of citrus fruit recognition models and the development of picking machinery. [ABSTRACT FROM AUTHOR]*