Published: February 5, 2026  Author: aiycxz.cn
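(Academic Degree Thesis)

RESEARCH ON VIDEO OBJECT DETECTION ALGORITHM BASED ON DEEP LEARNING

Wang Hao (王昊)
Harbin Institute of Technology
June 2021

Classified Index: TP391.4
U.D.C.: 004.8
School Code: 10213
Security Classification: Public

Dissertation for the Master Degree in Engineering

RESEARCH ON VIDEO OBJECT DETECTION ALGORITHM BASED ON DEEP LEARNING

Candidate: Wang Hao
Supervisor: Prof. Liu Yang
Academic Degree Applied for: Master of Engineering
Speciality: Computer Science and Technology
Affiliation: School of Computer Science and Technology
Date of Defence: June, 2021
Degree-Conferring-Institution: Harbin Institute of Technology

Harbin Institute of Technology Master's Thesis in Engineering

Abstract (translated from the Chinese original)

Object detection is a fundamental task in computer vision, with wide application in autonomous driving, video surveillance, robot navigation, and other fields. In recent years, with the rapid development of deep learning, object detection algorithms based on convolutional neural networks have made great progress in both detection accuracy and speed. However, most of these algorithms are designed for static images and are difficult to apply directly to video sequences. Because image quality degradation such as motion blur, defocus, and occlusion is common in video sequences, the accuracy of static-image object detection algorithms drops significantly on video. How to exploit the temporal information in video sequences to improve detection performance has therefore become a research hotspot in video object detection.

This thesis first introduces the research background and significance of video object detection and analyzes the current state of research in the field. It then studies static-image object detection algorithms and describes in detail the detector used in this work. On this basis, it proposes a video object detection algorithm based on an attention mechanism. The main research contents are as follows:

(1) To address the image quality degradation that is common in video sequences, this thesis proposes an attention-based image feature enhancement method. The method uses an attention mechanism to locate the key frames in a video sequence and uses the features of the key frames to enhance the features of the non-key frames, thereby improving detection accuracy on non-key frames. In addition, this thesis proposes an attention-based feature fusion method, which fuses the features of adjacent frames through an attention mechanism to improve detection accuracy.

(2) To address the object motion that is common in video sequences, this thesis proposes an attention-based object motion modeling method. The method uses an attention mechanism to model the changes in object position across a video sequence, thereby improving detection accuracy. In addition, this thesis proposes an attention-based object motion prediction method, which uses an attention mechanism to predict object positions in the video sequence to improve detection accuracy.

(3) To address the object occlusion that is common in video sequences, this thesis proposes an attention-based object occlusion modeling method. The method uses an attention mechanism to model the occlusion of objects in a video sequence, thereby improving detection accuracy. In addition, this thesis proposes an attention-based object occlusion prediction method, which uses an attention mechanism to predict the occlusion of objects in the video sequence to improve detection accuracy.

The proposed algorithm is validated experimentally on the ImageNet VID dataset. The results show that it outperforms existing video object detection algorithms in detection accuracy while maintaining good real-time performance.

Keywords: video object detection; deep learning; attention mechanism; feature enhancement; feature fusion

Abstract

Object detection is a fundamental task in computer vision and has a wide range of applications in autonomous driving, video surveillance, robot navigation, and other fields.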
In recent years, with the rapid development of deep learning, object detection algorithms based on convolutional neural networks have made great progress in both detection accuracy and speed. However, most of these algorithms are designed for static images and are difficult to apply directly to video sequences. Because image quality degradation such as motion blur, defocus, and occlusion is common in video sequences, the detection accuracy of static-image object detection algorithms drops significantly on video. How to use the temporal information in video sequences to improve detection performance has therefore become a research hotspot in video object detection.

This thesis first introduces the research background and significance of video object detection and analyzes the current state of research in the field. It then studies static-image object detection algorithms and introduces in detail the detector used in this work. On this basis, it proposes a video object detection algorithm based on an attention mechanism. The main research contents are as follows:

(1) To address the image quality degradation that is common in video sequences, this thesis proposes an image feature enhancement method based on an attention mechanism. This method uses an attention mechanism to