
Multi-Sensor Fusion Technology for 3D Object Detection in Autonomous Driving: A Review

Xuan Wang, Member, IEEE, Kaiqiang Li, and Abdellah Chehri, Senior Member, IEEE

Abstract

With the development of society, technological progress, and new needs, autonomous driving has become a trendy topic in smart cities. Due to technological limitations, autonomous driving is used mainly in limited and low-speed scenarios such as logistics and distribution, shared transport, and unmanned retail. On the other hand, the natural driving environment is complicated and unpredictable. As a result, to achieve all-weather and robust autonomous driving, the vehicle must precisely understand its environment. Self-driving cars are therefore outfitted with a plethora of sensors to detect their surroundings. In order to provide researchers with a better understanding of the technical solutions for multi-sensor fusion, this paper provides a comprehensive review of multi-sensor fusion 3D object detection networks organized by fusion location, focusing on the most popular sensors currently in use, LiDAR and cameras. Furthermore, we describe the popular datasets and assessment metrics used for 3D object detection, as well as the problems and future prospects of 3D object detection in autonomous driving.

Index Terms-Autonomous driving, smart cities, multi-sensor fusion, 3D object detection, LiDAR.

I. Introduction

TRAFFIC congestion is a substantial hindrance to economic progress, with serious consequences for the social and economic sectors, as well as impediments to the advancement of society and sustainable cities. Significant breakthroughs in autonomous driving could bring about significant changes in human life, such as reducing the carbon emissions produced by transportation, reducing the amount of time spent commuting, improving transportation efficiency, and contributing to the development of smart cities [1].
As a result, auto manufacturers have continued to introduce vehicles with assisted driving features, which suggests that the field of autonomous driving is seeing rapid growth. Currently, most car companies can achieve L2-level autonomous driving, and a few can achieve L3-level autonomous driving.
The identification of objects in two dimensions has seen important advances recently. Nevertheless, because 2D object detection can only provide confidence scores for 2D bounding boxes and object categories, it cannot provide the distance information required for autonomous vehicles.
In a driving environment, self-driving cars must detect not only an object's category and distance, but also its rotation angle and even its speed [2]. Autonomous vehicles must therefore have 3D object detection systems. Within the autonomous driving stack, 3D object detection algorithms are responsible for processing sensor inputs so that the autonomous car can "see" its surroundings. Based on what it observes, the system then predicts the surrounding environment and establishes the subsequent driving trajectory, guiding the vehicle's control system to perform actions such as acceleration, braking, and steering.
The development of 3D object detection algorithms has produced numerous subfields. 3D object detection methods are commonly divided into two groups: single-sensor approaches and multi-sensor fusion techniques (e.g., LiDAR-camera, radar-camera, and LiDAR-radar-camera). The former, as its name suggests, uses only one sensor for 3D object recognition. The latter combines two or more sensors working together to improve the ability to detect three-dimensional objects. Common single-modal 3D object detection algorithms are presented in Section II-B.
Detection of 3D objects has its own set of challenges. Single-sensor approaches are frequently restricted by a lack of depth information or by excessive similarity between object characteristics; for instance, the point cloud signatures of utility poles and pedestrians can be remarkably similar, and methods employing only LiDAR as a sensing device are incapable of distinguishing between them.
Although numerous types of 3D object detection methods have been carefully summarized and compared in previous works [2], [3], [4], [5], relatively few review works compare the algorithms at the level of experimental results visualization. In this study, we focus on multi-sensor fusion for 3D object detection and reproduce some representative methods to help readers understand 3D object detection in a visual format and evaluate the performance of 3D object detection algorithms in real-world driving scenarios.

Fig. 1. Hierarchically-structured taxonomy of multi-sensor fusion 3D object detection for autonomous driving.
The contributions of this paper are as follows:
  • We overview and briefly describe the most common 3D object detection datasets currently used in autonomous driving scenarios.
  • We describe the most prominent 3D object detection techniques based on LiDAR-camera fusion and provide an in-depth discussion centered on the fusion position.
• We illustrate the performance of 3D object detection in a variety of settings by visualizing the results of multiple approaches.
• We examine the challenges and future trends of 3D object detection in autonomous driving, as well as the impact that autonomous driving will have on the future world, with the aim of stimulating future research.

The following is a brief description of the remainder of this paper. In Section II, we present the data representations of sensors commonly used for autonomous driving and their representative detection networks. In Section III, we summarize the current datasets used for autonomous driving scenarios and make a brief comparison. In Section IV, we detail 3D object detection based on multi-sensor fusion. We first start with sensor devices and analyze the existing popular sensor combinations. Then, multi-sensor fusion 3D object detection is divided into two categories based on the fusion position, followed by a detailed description of the two types of schemes. In Section V, we compare popular 3D object detection schemes of recent years and present the visualization results. Finally, we conclude with a summary of the current challenges and an outlook for the future. The structure diagram of the article is shown in Figure 1.

A. What Is 3D Object Detection?

The goal of 3D object detection is to generate accurate attribute predictions for real-world objects, such as their size, rotation angle, and other relevant characteristics. When used for autonomous driving, 3D object detection also frequently predicts the velocity of the objects being detected. Currently, the most common applications for 3D object detection are those associated with autonomous driving, along with methods intended for indoor scenarios [6], [7]. In comparison to indoor contexts, driving scenes are dynamic, complicated, and highly changeable; they are also quite demanding in terms of inference speed. As shown in Fig. 2a, 3D object detection methods typically enclose the object with a 3D bounding box, which is typically represented as follows.

$$B = \left[x_c, y_c, z_c, l, w, h, \theta, \phi, \varphi, \text{class}\right]$$
where $(x_c, y_c, z_c)$ denotes the center coordinates of the box, $(l, w, h)$ denotes its length, width, and height, and $(\theta, \phi, \varphi)$ denote the yaw, pitch, and roll angles, respectively. At the current stage of autonomous driving, all objects lie on the ground, so only the yaw angle $\theta$ needs to be considered, as shown in Figure 2b. class indicates the category of the 3D object. In addition, some methods also predict the object's speed [8].

Fig. 2. Example of 3D object detection results.
Commonly used sensors: Unlike 2D object detection, which typically employs only cameras as input, 3D object detection can use several sensors as input to the network. Cameras, LiDAR, and radar are currently the most prevalent sensors. Below, we provide a quick overview of each.
Cameras: Cameras are ubiquitous in our lives because they are inexpensive to manufacture, produce high-quality images, and are passive sensors. A camera produces an $(H \times W \times 3)$ image, where $(H, W)$ are the height and width of the image and 3 is the number of channels per pixel, generally referring to the RGB channels. The camera can acquire high-resolution images of the outside world and capture the shape of objects, but in 3D object detection it has limitations. First, cameras have poor nighttime imaging; although some algorithms [9] enable cameras to image at night at approximately daytime quality, this is achieved by increasing the exposure time, and the extra acquisition time is a fatal problem for autonomous driving. Second, the camera does not provide good depth information, and using 2D images to forecast depth with a trained network frequently results in substantial inaccuracies. Furthermore, the camera is weather-sensitive, and imaging is far less effective in wet and foggy conditions than in sunny conditions.
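The loss of depth can be made concrete with a minimal pinhole-camera sketch (the intrinsic values below are illustrative, not from any specific camera): projection divides a 3D point by its depth, so every point along the same viewing ray lands on the same pixel.

```python
import numpy as np

# Illustrative pinhole intrinsics: fx, fy are focal lengths in pixels,
# (cx, cy) is the principal point. Real values come from calibration.
K = np.array([[720.0,   0.0, 640.0],
              [  0.0, 720.0, 360.0],
              [  0.0,   0.0,   1.0]])

def project(point_cam):
    """Project a 3D point in camera coordinates onto the image plane."""
    u, v, depth = K @ point_cam
    return np.array([u / depth, v / depth])  # the division discards depth

# Two points on the same ray at different depths map to the same pixel,
# which is why a single image cannot recover absolute depth.
print(project(np.array([2.0, 1.0, 10.0])))   # -> [784. 432.]
print(project(np.array([4.0, 2.0, 20.0])))   # -> [784. 432.], same pixel
```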
LiDAR: LiDAR (Light Detection and Ranging) is a common type of active sensor. In order to determine an object's precise 3D structure, LiDAR actively emits laser beams and collects information about the reflected light, in contrast to cameras, which passively take in data. Although LiDAR can directly capture an object's 3D structure and accurate depth information, its high deployment cost currently constrains its use in autonomous driving. Further, because of its short wavelength, LiDAR is subject to interference from numerous types of airborne particles, so its effectiveness is marginally decreased in bad weather conditions.
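For reference, a LiDAR sweep is commonly stored as an $(N \times 4)$ array of $(x, y, z, \text{intensity})$ values. The sketch below (the file name and crop ranges are illustrative assumptions, following the KITTI binary layout) loads a sweep and keeps only a forward-facing region of interest, a typical preprocessing step before voxelization or BEV projection.

```python
import numpy as np

# KITTI-style binary sweep: a flat float32 stream of (x, y, z, intensity).
# "sweep.bin" is a placeholder path for illustration.
points = np.fromfile("sweep.bin", dtype=np.float32).reshape(-1, 4)

# Crop to a forward-facing region of interest (ranges are illustrative;
# x forward, y left, z up, in meters).
x, y, z = points[:, 0], points[:, 1], points[:, 2]
mask = (x > 0.0) & (x < 70.4) & (np.abs(y) < 40.0) & (z > -3.0) & (z < 1.0)
roi = points[mask]
print(f"kept {len(roi)} of {len(points)} points")
```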
Radar: Radar is an active sensor with the same basic principle as LiDAR, but unlike LiDAR, radar works by emitting radio waves. Because radio waves have a longer wavelength, radar works across a longer distance. However, radar has limited resolution and, unlike LiDAR, cannot directly acquire the contour of an object, making it ineffective for detecting small objects [10], [11].
Conclusion: 3D object detection algorithms gather sensor information and make decisions about surrounding targets, which is an important aspect of autonomous driving. The driving environment is complicated and varied, and 3D object detection algorithms for autonomous driving scenarios demand a high level of accuracy and reliability. Each sensor has strengths and drawbacks; thus it has become popular to combine multiple sensors for object detection.
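Geometric alignment between sensors underlies such combinations. As a minimal sketch (the extrinsic rotation and translation below are illustrative placeholders for calibrated values, and K reuses the pinhole intrinsics assumed in the camera sketch above), LiDAR points can be mapped into the image so that point features and pixel features refer to the same object:

```python
import numpy as np

# Illustrative extrinsics: an axis-swap rotation taking LiDAR axes
# (x forward, y left, z up) to camera axes (x right, y down, z forward),
# plus a small translation. Real values come from sensor calibration.
R = np.array([[0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0],
              [1.0,  0.0,  0.0]])
t = np.array([0.0, -0.08, -0.27])
K = np.array([[720.0,   0.0, 640.0],
              [  0.0, 720.0, 360.0],
              [  0.0,   0.0,   1.0]])

def lidar_to_pixels(points_xyz):
    """Map (N, 3) LiDAR points to (M, 2) pixel coordinates, M <= N."""
    cam = points_xyz @ R.T + t          # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
    uvw = cam @ K.T                     # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective division

print(lidar_to_pixels(np.array([[10.0, -2.0, -1.0]])))
```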

B. Single-Sensor-Based 3D Object Detection

As its name implies, single-sensor-based 3D object detection refers to predicting the 3D bounding box of a target using data from a single sensor. There are two primary popular classifications: 3D object detection via cameras and 3D object detection via LiDAR. Because our research focuses on multi-sensor fusion schemes, we only briefly introduce the techniques in these classes without providing a comprehensive review.
1) 3D Object Detection Through Cameras: Depending on the types of cameras available, camera-based 3D object detection can be further subdivided into more specific categories, including monocular camera-based, multi-vision camera-based, and stereo-based 3D object detection.

a) Monocular camera-based: Monocular cameras typically feature only one lens and cannot directly measure depth. They provide information in the form of pixel intensities that visually reflect an object's shape and texture. They are a favored choice for monocular 3D object detection due to their low cost and superior imaging [12]. Camera-based 3D object detection performs depth estimation directly on the image, so it can be seen as an evolution of 2D object detection. In recent years, most camera-based 3D object detection has been done using monocular cameras [13], [14], [15], [16], [17], [18], [19].

    b) Multi-vision camera-based: Currently, self-driving vehicles are typically outfitted with numerous cameras to

Manuscript received 13 February 2023; revised 7 June 2023 and 21 August 2023; accepted 13 September 2023. Date of publication 27 September 2023; date of current version 2 February 2024. This work was supported in part by the Natural Science Foundation of Shandong Province under Grant ZR2020QF108 and Grant ZR2022QF037 and in part by the Youth Innovation Science and Technology Support Program of Shandong Province under Grant 2021KJ080. The Associate Editor for this article was C. Wen. (Corresponding author: Abdellah Chehri.)
    Xuan Wang and Kaiqiang Li are with the School of Computer and Control Engineering, Yantai University, Yantai 264005, China (e-mail: xuanwang91@ytu.edu.cn; likaiqiang@s.ytu.edu.cn).
    Abdellah Chehri is with the Department of Mathematics and Computer Science, Royal Military College of Canada, Kingston, ON K7K 7B4, Canada (e-mail: chehri@rmc.ca).

    Digital Object Identifier 10.1109/TITS.2023.3317372