Abstract:
Automatic detection and recognition of safety helmet wearing based on video analysis is important for ensuring production safety, as manual supervision of whether workers wear safety helmets is inefficient. With the advancement of deep learning, using computer vision to assist in detecting safety helmet wearing holds significant research and application value. However, complex environments and variable conditions make accurate detection and recognition of safety helmet usage challenging. Helmet-wearing detection methods are generally classified into traditional machine learning methods and deep learning methods. Traditional machine learning methods rely on manually selected or statistical features, resulting in poor model stability. Deep learning–based methods are further divided into "two-stage" and "one-stage" methods: two-stage methods achieve high detection accuracy but cannot run in real time, whereas one-stage methods run in real time at the cost of accuracy. Achieving both accuracy and real-time performance is therefore a central challenge in developing video-based helmet detection systems, since accurate and fast detection is essential for effective real-time monitoring of production sites. To address these challenges, this paper proposes DS-YOLOv5, a real-time helmet detection and recognition model based on YOLOv5. The proposed model addresses three main problems: first, the insufficient global information extraction of convolutional neural network (CNN) models; second, the lack of robustness of Deep SORT to multiple targets and occlusion in video scenes; and third, the inadequate feature extraction for multiscale targets. The DS-YOLOv5 model leverages an improved Deep SORT multitarget tracking algorithm to reduce the rate of missed detections under multitarget and occlusion conditions and to increase error tolerance in video detection.
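The occlusion tolerance described above comes from associating detections across frames: a target missed by the detector in one frame can be carried forward by its track. The paper's improved Deep SORT additionally uses appearance features and Kalman-filter motion prediction; the snippet below is only a minimal, dependency-free sketch of the IoU-based association step common to SORT-style trackers (function names are illustrative, not from the paper's code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match existing track boxes to new detection boxes by IoU.
    Returns (matches, unmatched_track_ids, unmatched_detection_ids)."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same target
        if ti in matched_t or di in matched_d:
            continue  # each track and each detection is used at most once
        matched_t.add(ti)
        matched_d.add(di)
        matches.append((ti, di))
    unmatched_t = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_d = [i for i in range(len(detections)) if i not in matched_d]
    return matches, unmatched_t, unmatched_d
```

Unmatched tracks correspond to targets the detector missed (e.g. under occlusion) and can be kept alive for a few frames; unmatched detections start new tracks. Production trackers replace the greedy loop with Hungarian matching and add appearance-embedding distances.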
Further, a simplified transformer module is integrated into the backbone network to enhance the capture of global information from images and thereby improve feature learning for small targets. Finally, a bidirectional feature pyramid network (BiFPN) is used to fuse multiscale features, better adapting to target scale changes caused by varying photographic distance. The DS-YOLOv5 model was validated on the GDUT-HWD dataset through ablation and comparison experiments, and the tracking capability of the improved Deep SORT was compared with that of the baseline YOLOv5 model on the public MOT pedestrian dataset. Comparisons against five one-stage methods and four helmet detection and recognition models show that the proposed model handles occlusion and target scale variation better. The model achieved a mean average precision (mAP) of 95.5%, which is superior to that of the other helmet detection and recognition models.
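The multiscale fusion mentioned above can be illustrated by BiFPN's fast normalized fusion, which combines feature maps from different scales with learned non-negative weights rather than simple addition. Below is a toy scalar-level sketch (the abstract does not give the paper's exact formulation, so this follows the standard BiFPN fusion rule; names and the epsilon value are illustrative):

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature vectors with non-negative learned weights,
    normalized so the output stays on the scale of the inputs (BiFPN-style).

    features: list of equal-length lists of floats (already resized to one scale)
    weights:  one learnable scalar per input feature
    """
    clipped = [max(0.0, w) for w in weights]  # ReLU keeps each weight non-negative
    total = sum(clipped) + eps                # eps avoids division by zero
    return [
        sum(w * f[i] for w, f in zip(clipped, features)) / total
        for i in range(len(features[0]))
    ]
```

In a real network the inputs are tensors resampled to a common resolution and the weights are trained parameters; the normalization lets the network learn how much each scale should contribute, which helps when target size varies with camera distance.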