
Neural Path Planning With Multi-Scale Feature Fusion Networks

XIANG JIN, WEI LAN, AND XIN CHANG
School of Naval Architecture and Ocean Engineering, Dalian Maritime University, Dalian 116026, China
Corresponding author: Xin Chang (xin.chang@dlmu.edu.cn)

This work was supported in part by the National Key Research and Development Program of China under Grant 2016YFC0301500.

Abstract

Path planning is critical for planetary rovers that perform observation and exploration missions in unknown and dangerous environments. Because of communication delays, it is difficult for a planetary rover to receive instructions from Earth in time to guide its own movement. In this work, we present a novel neural network-based algorithm to solve the global path planning problem for planetary rovers. Inspired by the feature pyramid networks used for object detection, we construct a deep neural network model, termed the Pyramid Path Planning Network (P3N), which has a well-designed backbone that efficiently learns a global feature representation of the environment, and a feature pyramid branch that adaptively fuses multi-scale features from different levels to generate a local feature representation with rich semantic information. The P3N learns environmental dynamics from terrain images of the planetary surface taken by satellites, without using additional elevation information to construct an explicit environmental model in advance, and can produce a path planning policy after end-to-end training. We evaluate the effectiveness of the proposed method on synthetic grid maps and on a realistic data set constructed from lunar terrain images. Experimental results demonstrate that the P3N has higher prediction accuracy and faster computation speed than the baseline methods, and generalizes better in large-scale environments.

INDEX TERMS Deep learning, feature pyramid network, global path planning, multi-scale feature fusion, planetary rover.

I. INTRODUCTION

Path planning is a key technology for mobile robots and can be divided into global path planning and local path planning [1], [2]. Global path planning refers to finding an optimal collision-free path from the start state to the end state when the environmental information is known or predictable. Local path planning refers to the case in which the environmental information is unknown: the mobile robot obtains more information by actively exploring the environment and tries to find a feasible path to the target through repeated attempts.
As a special class of mobile robots, planetary rovers usually perform observation and exploration missions on other celestial bodies in the solar system, such as the Moon or Mars [3]. These bodies are so far from Earth that communication delays make it difficult for ground commanders to monitor and control the rovers' movements. Although a rover is usually equipped with cameras and other sensors to help it perceive its surroundings, when it needs to explore a target outside its field of view, blind exploration may put it in danger and substantially increase energy consumption, given the complex terrain and the rover's limited mobility.
With recent technological advances, high-resolution terrain images of other planets taken by satellites have become easy to obtain. For the rover path planning problem, an effective solution is to pre-plan a globally optimal path from the rover's location to the target area based on the terrain data obtained from satellite images [4], [5]; the rover then follows this path and fine-tunes it based on the local environmental information obtained during the journey. Although this path may not be truly optimal because of the limited resolution of satellite images, it is necessary for ensuring the safety of the rover and the success of the exploration mission.
Traditional path planning methods first integrate various kinds of environmental information to build an environmental model that planning algorithms can use, such as a configuration space (C-space), visibility graph, grid map, or Voronoi diagram, and then use this model to find the optimal path [6]. These planning algorithms can be broadly classified into the following categories. The first is search-based algorithms [7], such as Dijkstra and A*, whose advantage is that, if an optimal path exists, they are guaranteed to find it by exploring the whole environment step by step. The second is sampling-based algorithms [8], such as the Probabilistic Road Map (PRM) and Rapidly-exploring Random Tree (RRT) algorithms, which find feasible paths by randomly exploring the environment space. They are more efficient than search-based algorithms in high-dimensional and large-scale environments, but only guarantee that the solution found is asymptotically optimal. The third is heuristic algorithms [2], such as the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm, which are more efficient in partially known or unknown environments. They generate a set of locally optimal solutions at each iteration and then iteratively improve them according to different fitness functions and optimization policies.
With the rapid development of deep learning techniques in recent years, researchers have started to focus more on learning-based planning algorithms [9]. Learning-based algorithms build neural network architectures that take raw environmental data as input, such as data from satellites and radars or data collected by sensors carried by robots, without relying on environmental models, and then train these networks by supervised learning or reinforcement learning to output feasible paths that meet specified requirements. MPNet [10] takes 3D point cloud data of the environment space as input and implements collision-free path planning with motion constraints considered. MPNet can also be combined with traditional planning algorithms to speed up the training of neural networks and improve the quality of planned paths. In addition, many novel deep learning techniques have been applied to motion planning problems: for example, OracleNet uses an RNN [11], TDPP-NET adopts an imitation learning method [12], and PathGAN applies Generative Adversarial Networks (GANs) [13]. However, these methods are usually trained and evaluated on a particular environment instance, and it is difficult to transfer the learned policy effectively to other similar environments; they lack the ability to solve a class of problems through planning computations.
Reinforcement learning (RL) is a trial-and-error approach to training a policy that allows an agent to make decisions that maximize future cumulative rewards, rather than focusing only on the reward available in the present, which in essence requires an algorithm with some ability to plan [14]. If we describe the path planning task as a Markov decision process (MDP), we can use RL algorithms to solve the path planning problem [15]. Tamar et al. [16] approximated the value iteration (VI) algorithm as a convolutional architecture embedded in a neural network to obtain a value iteration network (VIN) with "planning capability", which is trained to generalize effectively to new environments similar to the training set and outperforms standard convolutional neural network (CNN) architectures on navigation and path planning tasks. Although the VIN does not explicitly model the environment, it combines the advantages of both model-free and model-based RL algorithms, enabling both explicit planning computation and end-to-end training and inference, while not requiring a reward function to be set. However, limited by the convolutional architecture it uses, the VIN can only accept structurally regular data (e.g., images) as input, and the task-related MDP must be fixed and known. Several recent works have further extended this value iteration-based planning algorithm, such as GVIN [17] and XLVIN [18], which apply the VIN to graph-structured data using graph operators instead of convolutions. The VPN [19] defines a maximum propagation algorithm and approximates it with convolution and max-pooling operations, achieving better results than the VIN in dynamic environments. The UVIN [20] introduces a clustering algorithm and successfully extends the VIN to MDP-variable environments.
While conventional CNNs, such as ResNet [21], progressively compress the spatial resolution of feature maps through multi-layer convolution operations to extract global features for planning tasks, the VIN achieves higher accuracy and better generalization capability by extracting local features using an explicit planning module derived from the value iteration algorithm. It is worth noting that by local features we mean here that the global information is aggregated to every position in the environmental space by the convolution operation. The Gated Path Planning Network (GPPN) [22] showed that the explicit value iteration process is not necessary and can be replaced by an implicit LSTM unit, and that what really matters is the extraction of a local feature representation of the environment. The Dual-branch Convolutional Neural Network (DB-CNN) [4] builds on the architecture of ResNet by extracting local features with a parallel convolution branch and fusing the global and local features for planning, achieving better performance. In general, the DB-CNN gives a more generic CNN-based approach to path planning tasks, with higher accuracy and significantly improved computational efficiency. However, since the spatial resolution of the feature maps in the local feature extraction branch remains the same throughout, it is difficult for the DB-CNN to adopt a deeper network architecture while balancing efficiency and accuracy, which limits the expressiveness of the model; moreover, a deeper network would further increase the number of model parameters and easily cause over-fitting.
In this work, we continue to follow the idea of the DB-CNN by constructing a neural network that extracts both global and local features of the input and then uses the fused features for path planning. Inspired by the feature pyramid network (FPN) [23] commonly used in the field of object detection, we connect the two branches of the DB-CNN and use the output of the global feature branch as the input of the local feature branch. This is equivalent to doubling the depth of the local feature extractor with almost no increase in parameters, which is conducive to obtaining a better local feature representation. We name this novel neural network architecture the Pyramid Path Planning Network, or P3N for short. In addition, unlike the DB-CNN, which only utilizes the output of the last layer of the local feature branch, the proposed P3N simultaneously extracts multi-scale features from each stage of the local feature branch and then adaptively fuses these features into a better representation by a learnable weighting operation. We conducted extensive experiments on both grid maps and satellite terrain image datasets. The results show that the P3N has faster computation speed and higher prediction accuracy than the VIN and DB-CNN, and generalizes better in large-scale environments thanks to the deeper local feature extraction network.
The main contributions of this work are summarized as follows:
  • We design a network architecture based on the feature pyramid network that can better extract global and local features from the input.
  • By introducing the novel architecture, we propose the Pyramid Path Planning Network which can adaptively fuse multi-scale features and effectively learn to plan from natural images.
  • Experimental results on grid-world maps and terrain images show that the P3N significantly outperforms the baseline methods with lower computational cost and has better generalization on large-scale domains.

The paper is organized as follows. Section II provides some preliminaries of this work. Section III describes the proposed P3N method for global path planning. Experimental results and discussion are presented in Section IV, and the conclusion is given in Section V.

II. PRELIMINARIES

A. VALUE ITERATION NETWORK

1) VALUE ITERATION ALGORITHM

The MDP corresponds to a tuple $[\mathcal{S}, \mathcal{A}, P, \mathcal{R}]$, where $\mathcal{S}$ is the set of all possible states of the agent, $\mathcal{A}$ is the set of all legal actions in state $s$, $P$ is the probability distribution of transitioning from the current state $s$ to the next state $s^{\prime}$, and $\mathcal{R} \subseteq \mathbb{R}$ is the set of rewards received from the environment during the state transition. In RL, the goal of the agent is to maximize the cumulative reward obtained from the current time $t$:

$$G(t)=R_{t+1}+\gamma R_{t+2}+\gamma^{2} R_{t+3}+\ldots=\sum_{k} \gamma^{k} R_{t+k+1}, \tag{1}$$

where $\gamma \in[0,1]$ is the discount factor that balances the importance of the rewards received at the current and later moments.
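As a small illustration of (1), the following Python sketch computes the discounted return of a finite reward sequence. The function name `discounted_return` and the example rewards are assumptions made for illustration and do not come from the paper.

```python
def discounted_return(rewards, gamma):
    """Compute G(t) = sum_k gamma**k * rewards[k] for a finite episode.

    rewards[k] plays the role of R_{t+k+1} in (1).
    """
    g = 0.0
    # Accumulate from the last reward backwards: G_k = R_k + gamma * G_{k+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode: three steps with reward -1, then a goal reward of +10.
episode = [-1.0, -1.0, -1.0, 10.0]
print(discounted_return(episode, gamma=0.9))  # discounted: later rewards count less
print(discounted_return(episode, gamma=1.0))  # undiscounted sum of rewards
```

With $\gamma$ close to 0 the agent effectively only sees the immediate reward, while $\gamma = 1$ weights all future rewards equally.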
In order to obtain the maximum cumulative reward, the agent needs to learn a policy $\pi$ through repeated attempts. The policy is a mapping from state $s$ to the probabilities $\pi(a \mid s)$ of all possible actions, i.e., if the agent executes the policy $\pi$ at moment $t$, then $\pi(a \mid s)$ is the probability that $A_{t}=a$ when $S_{t}=s$. The state-value function of the policy $\pi$ can be written as

$$V^{\pi}(s)=\mathrm{E}_{\pi}\left[G(t) \mid S_{t}=s\right]=\mathrm{E}_{\pi}\left[\sum_{k} \gamma^{k} R_{t+k+1} \,\middle|\, S_{t}=s\right], \tag{2}$$

which denotes the expected return obtained by always executing the policy $\pi$ starting from state $s$. Similarly, the action-value function for the policy $\pi$ can be written as

$$Q^{\pi}(s, a)=\mathrm{E}_{\pi}\left[\sum_{k} \gamma^{k} R_{t+k+1} \,\middle|\, S_{t}=s, A_{t}=a\right], \tag{3}$$

which denotes the expected return obtained after choosing action $a$ in state $s$, given that the policy $\pi$ is executed thereafter.
For the optimal policy $\pi^{*}$, we have $V^{\pi^{*}}(s)=V^{*}(s)$, where $V^{*}(s)$ is the optimal value function, and

$$\begin{aligned} V^{*}(s) & =\max _{a \in \mathcal{A}} Q^{\pi^{*}}(s, a) \\ & =\max _{a} \sum_{s^{\prime}, r} p\left(s^{\prime}, r \mid s, a\right)\left[r+\gamma V^{*}\left(s^{\prime}\right)\right], \end{aligned} \tag{4}$$

which is the Bellman optimality equation for the policy $\pi^{*}$. For a finite MDP, there exists a unique optimal solution to (4) that is independent of the policy. Starting from the value function $V^{\pi}(s)$ of any policy $\pi$ and repeatedly applying (4) as an update rule, the estimate converges to $V^{*}(s)$; this procedure is known as the value iteration algorithm.
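The iteration of (4) can be sketched in a few lines of Python for a toy problem. This is a minimal sketch under stated assumptions rather than the setup used in the paper: the grid is deterministic and 4-connected, the goal is a terminal cell, and the reward values (`step_reward`, `goal_reward`) and the function name `value_iteration` are hypothetical. Values are updated in place during each sweep, which is a common variant of the algorithm.

```python
def value_iteration(grid, goal, gamma=0.9, step_reward=-1.0,
                    goal_reward=10.0, tol=1e-6):
    """Tabular value iteration on a deterministic 4-connected grid.

    grid[i][j] == 1 marks an obstacle; goal is a terminal (i, j) cell.
    Because transitions are deterministic, the sum over s' in (4)
    collapses to a single successor state per action.
    """
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    V = [[0.0] * cols for _ in range(rows)]
    while True:
        delta = 0.0
        for i in range(rows):
            for j in range(cols):
                if grid[i][j] == 1 or (i, j) == goal:
                    continue  # obstacles and the terminal goal keep value 0
                best = None
                for di, dj in moves:
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < rows and 0 <= nj < cols
                    if not inside or grid[ni][nj] == 1:
                        continue  # moving off the map or into an obstacle is illegal
                    r = goal_reward if (ni, nj) == goal else step_reward
                    q = r + gamma * V[ni][nj]  # bracketed term of (4)
                    best = q if best is None else max(best, q)
                if best is None:
                    continue  # cell with no legal action
                delta = max(delta, abs(best - V[i][j]))
                V[i][j] = best  # max over actions, as in (4)
        if delta < tol:
            return V


# Hypothetical 3x3 map with a wall in the middle column; goal in the top-right corner.
demo_grid = [[0, 1, 0],
             [0, 1, 0],
             [0, 0, 0]]
for row in value_iteration(demo_grid, goal=(0, 2)):
    print([round(v, 3) for v in row])
```

A greedy policy that always moves to the neighboring cell with the largest $r + \gamma V(s')$ then follows a shortest obstacle-free path to the goal in this toy setting.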

2) APPROXIMATE VALUE ITERATION MODULE

We consider the path planning problem on a two-dimensional grid map and denote by $M$ the MDP of the domain with respect to the planning policy $\pi$. We also assume that there exists an unknown MDP $\bar{M}$ whose optimal policy $\bar{\pi}^{*}$ contains some useful information about the optimal policy $\pi^{*}$ in $M$. If it is possible to solve $\bar{M}$ and use the solution of $\bar{M}$ as part of $\pi$, then $\pi$ can automatically learn and use $\pi^{*}$ to solve $M$. To establish a connection between $\bar{M}$ and $M$, let $\bar{r}=f_{R}(\phi(s))$ and $\bar{p}=f_{P}(\phi(s))$, where $\phi(s)$ denotes the observation of state $s$.
In VIN, the approximate value iteration module can be written as
$$\bar{v}^{k}=\max _{\bar{a} \in \bar{A}} \bar{q}^{k}, \quad \bar{q}^{k}=h\left(\bar{r}, \bar{v}^{k-1}\right), \tag{5}$$

where $k \in[1, K]$ is the iteration index and $K$ is the total number of iterations, whose selection depends on the map size. According to (4), we have

$$h(\bar{r}, \bar{v})=\bar{r}+\gamma \sum \bar{p}(\bar{s}, \bar{a})\, \bar{v}. \tag{6}$$
In the 2D grid-world environment, a state transition is a one-step movement from one cell to one of the 8 surrounding cells, so the transition probability $\bar{p}$ has the property of local connectivity. Therefore, (6) can be approximated as a convolution operation, with
$$\begin{aligned} \bar{q}_{a, i, j}^{k} & =\sum_{\left(i^{\prime}, j^{\prime}\right)} W_{a, i^{\prime}-i, j^{\prime}-j}^{[r, v]}\left[\bar{r}_{i^{\prime}, j^{\prime}}, \bar{v}_{i^{\prime}, j^{\prime}}^{k-1}\right], \\ \bar{v}_{i, j}^{k} & =\max _{a} \bar{q}_{a, i, j}^{k}, \end{aligned} \tag{7}$$

where the tuple $\left(i^{\prime}, j^{\prime}\right) \in \mathcal{N}(i, j)$ is a neighbor of the agent's position $(i, j)$, and $W$ denotes the parameters of the convolution kernel. This is the approximate value iteration module in the VIN.
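To make the convolution-plus-max structure of (7) concrete, the sketch below runs $K$ iterations on plain nested lists. It is an illustrative sketch, not the VIN implementation: in the VIN the kernel $W$ is learned end-to-end, whereas here a hypothetical helper `move_kernels` hand-sets $W$ so that each action channel reads the reward and the discounted value of one of the 8 neighboring cells. The names `vi_module` and `move_kernels`, the zero padding at the map border, and the dictionary layout of the kernels are all assumptions.

```python
def move_kernels(gamma):
    """Hand-set 3x3 kernels, one pair (reward, value) per 8-connected move.

    For the move with offset (di, dj), the reward kernel has a 1 and the
    value kernel has gamma at that offset, so q_a = r[i'][j'] + gamma * v[i'][j'].
    """
    kernels = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) == (0, 0):
                continue  # no "stay" action in this sketch
            w_r = [[0.0] * 3 for _ in range(3)]
            w_v = [[0.0] * 3 for _ in range(3)]
            w_r[di + 1][dj + 1] = 1.0
            w_v[di + 1][dj + 1] = gamma
            kernels.append({"r": w_r, "v": w_v})
    return kernels


def vi_module(r_bar, W, K):
    """Apply (7) for K >= 1 iterations; returns (v_bar, q_bar)."""
    rows, cols = len(r_bar), len(r_bar[0])
    v_bar = [[0.0] * cols for _ in range(rows)]
    q_bar = []
    for _ in range(K):
        q_bar = []
        for w_a in W:  # one output channel per action a
            q_a = [[0.0] * cols for _ in range(rows)]
            for i in range(rows):
                for j in range(cols):
                    total = 0.0
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            ii, jj = i + di, j + dj
                            if 0 <= ii < rows and 0 <= jj < cols:  # zero padding
                                total += (w_a["r"][di + 1][dj + 1] * r_bar[ii][jj]
                                          + w_a["v"][di + 1][dj + 1] * v_bar[ii][jj])
                    q_a[i][j] = total
            q_bar.append(q_a)
        # Max over the action channels gives the next value map.
        v_bar = [[max(q[i][j] for q in q_bar) for j in range(cols)]
                 for i in range(rows)]
    return v_bar, q_bar


# Hypothetical 3x3 reward map with a single rewarding cell in the corner.
reward_map = [[0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]
v_bar, q_bar = vi_module(reward_map, move_kernels(gamma=0.9), K=2)
for row in v_bar:
    print(row)
```

Each iteration lets value information spread by one cell, which is one way to see why the required number of iterations $K$ grows with the map size.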

B. DB-CNN

Before the DB-CNN was proposed, most deep CNNs for path planning tasks had only a backbone network for extracting environmental features and estimating value functions. However, because of the local connectivity of the convolution operation, more convolution layers have to be stacked to obtain a better feature representation, which both increases the computational cost and makes the models difficult to train. The DB-CNN builds two parallel branches to extract both global and local features, which allows the network to be shallower and more computationally efficient while achieving better model performance.
The architecture of the DB-CNN consists of three parts: a pre-processing stage, branch one for global feature representation, and branch two for local feature representation. The pre-processing stage usually contains multiple convolution layers, which filter the noise in the input. Depending on the task properties, pooling layers can be added after the convolutions to compress the spatial resolution of the feature maps.
Branch one includes a convolution layer, several residual modules, and two fully connected layers. Both the convolution layer and the residual modules are followed by max-pooling layers to progressively reduce the spatial dimension of the feature maps. Each residual module includes two convolution layers with shared parameters, a skip connection, and two ReLU activation functions [24], which are located after the first convolution layer and before the final output, respectively. The stacked residual modules increase the depth of the model and improve the training accuracy, while the shared parameters not only reduce the computational effort but also help prevent the model from over-fitting. The output of the last residual module is flattened into a one-dimensional vector to feed the fully connected layers, and after two nonlinear transformations, the global features of the environment are obtained.
Branch two consists of two convolution layers and several residual modules. Following the VIN, branch two keeps the spatial resolution of the feature maps constant throughout, so its computational cost is higher than that of branch one. As a trade-off between model accuracy and computational efficiency, the DB-CNN chooses a channel dimension of 20, which preserves more input information without dramatically affecting the computational speed.
Finally, the DB-CNN concatenates the global and local features in the channel dimension and then feeds them into one or more fully connected layers, outputting $Q$ values for each executable action in the current state.

FIGURE 1. The three schemes for object detection that preceded the feature pyramid network. (a) Feature pyramids built upon image pyramids. (b) Single-scale features for fast detection. (c) Feature pyramids that reuse the pyramidal hierarchy of deep convolutional networks.
In general, the DB-CNN replaces the computationally tedious value iteration module in the VIN with a local feature extractor, and learns global features with another regular CNN branch, giving a more universal framework for solving path planning problems, which has a higher prediction accuracy while significantly reducing computational cost. On the one hand, the DB-CNN, compared to the VIN, uses a relatively shallow network architecture, which actually limits its performance in large-scale environments. On the other hand, if we blindly increase the depth of two feature extraction networks, it is easy to cause over-fitting of the model, and thus we are caught in a dilemma.
In the next section, we present the design of a novel path planning network that can effectively increase the model depth without significantly increasing the computational cost, while also improving model performance.

C. FEATURE PYRAMID NETWORK

With the rapid development of deep learning techniques, researchers have designed various deep neural network architectures for image recognition, detection, and segmentation. In particular, the object detection task aims to detect and localize multiple objects at different scales in a single image, which requires learning a multi-scale feature representation of the image.
FIGURE 1 shows several schemes for learning from the multi-scale representation of images for object detection.

FIGURE 2. (a) The architecture of feature pyramid network (FPN). (b) The up-sample module in the top-down pathway of FPN.
Among them, the first scheme down-samples the original image to construct an image pyramid at different scales and then detects objects independently on each image scale. These image pyramids are scale-independent and contain rich semantic information, and thus provide better performance, but at a high computational cost. The second scheme is inspired by the deep neural networks applied to image recognition: it uses well-designed CNNs to automatically learn features at different scales from the input image and then makes predictions with the small-scale features for faster detection. Although the high-level feature maps are semantically strong, they do not contain accurate location information for small objects, which harms the final prediction accuracy. The third scheme reuses the inherent pyramidal feature hierarchy computed by CNNs and extracts multi-scale features of different spatial resolutions at marginal computational cost, but introduces large semantic gaps between different layers. The high-resolution feature maps may harm the representation capacity for object detection due to the lack of adequate semantic information.
The FPN, shown in FIGURE 2, makes full use of the pyramidal hierarchy of a CNN by constructing a new top-down pathway in addition to the bottom-up backbone, and it fuses the low-resolution, high-level semantic features with the high-resolution, low-level positional features through lateral connections between the two pathways. The FPN is fully convolutional: it can receive images of arbitrary size as input and output feature maps of corresponding size. Moreover, the construction of the feature pyramid pathway is independent of the architecture of the backbone, so the two can be designed separately.
The bottom-up backbone, usually derived from modern CNNs for image recognition, consists of stacked multi-stage convolutional modules that gradually reduce the spatial dimensions of the feature maps by pooling or convolution operations with a stride of 2. The top-down pathway is built step by step via the fusion operation shown in FIGURE 2(b). The feature maps of size $m \times m \times c_{1}$ from the higher level are up-sampled to $2m \times 2m \times c_{1}$; the feature maps of size $2m \times 2m \times c_{2}$ from the backbone are convolved to adjust the channel dimension to $c_{1}$; finally, the two are summed and then convolved (to reduce the aliasing effect of up-sampling) to generate feature maps that contain both rich semantic and location information. The FPN takes full advantage of the multi-stage pyramidal structure of the backbone to gradually integrate semantic information into high-resolution feature maps from top to bottom, which is conducive to the detection of small objects.
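The fusion step of the top-down pathway can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: feature maps are plain nested lists indexed as `fmap[channel][row][col]`, nearest-neighbour up-sampling is an assumption (the text does not name the interpolation method), the function names are hypothetical, and the final smoothing convolution applied after the sum is omitted for brevity.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x up-sampling: m x m x c -> 2m x 2m x c."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat every column
            rows.append(wide)
            rows.append(list(wide))                    # repeat every row
        out.append(rows)
    return out


def conv1x1(fmap, weights):
    """1x1 convolution: weights[o][i] mixes c_in input channels into c_out outputs."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[i][r][c] for i, w in enumerate(w_out))
          for c in range(width)] for r in range(height)]
        for w_out in weights
    ]


def fuse(top, lateral, lateral_weights):
    """One top-down step: up-sample `top`, project `lateral` to c1 channels, add."""
    up = upsample_nearest_2x(top)
    lat = conv1x1(lateral, lateral_weights)
    return [
        [[a + b for a, b in zip(row_up, row_lat)]
         for row_up, row_lat in zip(ch_up, ch_lat)]
        for ch_up, ch_lat in zip(up, lat)
    ]


# Toy example: `top` is 1 x 1 with c1 = 1 channel, `lateral` is 2 x 2 with c2 = 2 channels.
top = [[[10.0]]]
lateral = [[[1.0, 2.0], [3.0, 4.0]],
           [[0.5, 0.5], [0.5, 0.5]]]
fused = fuse(top, lateral, lateral_weights=[[1.0, 2.0]])  # projects c2 = 2 -> c1 = 1
print(fused)  # [[[12.0, 13.0], [14.0, 15.0]]]
```

The fused map has the spatial size of the lateral input and the channel count of the top-level input, which is what allows the same step to be repeated down the pyramid.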

III. METHODS

A. THE IMPROVED DB-CNN

The VIN approximates the value iteration algorithm in RL with a convolution operator, and performs explicit planning computation to implement an end-to-end path planning algorithm using high-dimensional images as input. Most of the improved methods built upon VIN retain the explicit planning module, which differ in their implementation but are generally based on iterative computation. We refer to this class of methods as VI-based planning algorithms. Experiments in [16] show that although CNNs also have good single-step prediction accuracy in path planning tasks, they lack long-term planning capability, performing poorly on tasks that require multi-step decision-making.
The DB-CNN builds two parallel network branches. Branch one, consistent with the conventional CNN, learns a low-dimensional global feature representation of the high-dimensional input by compressing the feature map dimensions step by step. Branch two, following the VIN, extracts the local features related to the current position from the environmental information. Without explicit iterative computation, branch two of DB-CNN is still a conventional CNN architecture, except that the spatial dimension of the feature maps is kept constant throughout the forward computation so that the attention mechanism can extract local features. Although the local features obtained this way may not be as good as those of VIN, after they are aggregated with the global features of branch one, the DB-CNN achieves better performance on the path planning task at a lower computational cost. We refer to this class of methods as CNN-based planning algorithms.
In general, the VI-based methods accomplish path planning with only local features through explicit planning computation, while the CNN-based methods, relying on the conventional CNN architecture, achieve better performance with a simpler network architecture. However, the CNN-based methods are more prone to over-fitting and perform much worse on the test set. Although DB-CNN uses parameter sharing in the convolution module, it still has many more parameters than the VIN, making it more difficult to train.
In addition, when facing large-scale environments, VI-based methods are fully convolutional and can receive input of any size without modifying the model architecture; only the number of iterations of the VI module needs to be adjusted accordingly. However, this also makes the computational cost grow rapidly as the input size increases, and the time required to train from scratch a model that can plan paths on large maps is almost unacceptable. The CNN-based methods, when receiving large maps as input, require stacking more convolution layers to obtain a larger effective receptive field. Although the resulting increase in computation is much smaller, planning on large maps is still costly because the feature map resolution of the local feature extractor always remains the same.
In this section, we follow the idea of DB-CNN by building a network that extracts both global and local features and then uses the fused features for path planning. Since branch one of DB-CNN has already obtained a global representation of the input, we can reuse this global feature as the input of branch two. This is equivalent to directly doubling the network depth with almost no increase in computation, which helps to obtain a better representation of local features.
FIGURE 3 shows a simple improvement to the DB-CNN: we keep the structure of branch one unchanged and take the output of its last convolution module as the input of branch two, which significantly increases the network depth of branch two and yields a better local feature representation at marginal extra cost. The improved model no longer contains two parallel branches; instead, it resembles the FPN.

B. PYRAMID PATH PLANNING NETWORK
B. 金字塔路径规划网络

Branch one of the improved DB-CNN is similar to the backbone of the FPN, while branch two can be regarded as the feature pyramid. By establishing lateral connections between the two branches, we obtain a Pyramid-based Path Planning Network, abbreviated as P3N.
FIGURE 4 illustrates the overall architecture of P3N, where the bottom-up pathway extracts the global features and the top-down pyramid pathway fuses features across a large range of scales to construct a multi-scale feature representation. Unlike the DB-CNN, which obtains local features only at the high-resolution level, the P3N extracts features at each stage and then fuses them to obtain semantically strong local features.


FIGURE 3. (a) The architecture of DB-CNN with branch one for global feature representation and branch two for local feature representation. (b) The improved DB-CNN with an FPN-like local feature extractor.

FIGURE 4. The architecture of our pyramid path planning network (P3N).
We assume that features from different stages contribute differently to the final local feature, so we assign a separate weight to each of them and learn the weights by end-to-end training. Denoting the weights by $w_{i}$ and the feature vector from the $i$-th stage of the feature pyramid branch by $I_{p_{i}}$, the final local features can be expressed as
$$I_{\text{local}}=\sum_{i} w_{i} \cdot I_{p_{i}} \tag{8}$$
where $w_{i}$ is a learnable scalar parameter that can take any value, which may lead to training instability.

FIGURE 5. The backbone of P3N.
Therefore, we further constrain the sum of all weights to be 1, with
$$I_{\text{local}}=\sum_{i} \frac{e^{w_{i}}}{\sum_{j} e^{w_{j}}} \cdot I_{p_{i}} \tag{9}$$
where we apply the softmax operation to all weight parameters, which constrains each normalized weight to the range from 0 to 1.
However, the softmax function adds considerable computational cost and may reduce the running speed of the model. Tan et al. [25] proposed a fast normalization method that can replace the softmax operation, so we can rewrite (9) as
$$I_{\text{local}}=\sum_{i} \frac{\left|w_{i}\right|}{\epsilon+\sum_{j}\left|w_{j}\right|} \cdot I_{p_{i}} \tag{10}$$
where $\epsilon$ is a small positive number that avoids the numerical instability associated with a denominator close to 0. Each normalized weight is again constrained to lie between 0 and 1, but this method is much more efficient than the softmax.
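The two normalization schemes can be compared with a short sketch. This is an illustration rather than the authors' code: the function names, the value `eps=1e-4`, and the toy weights and feature vectors are assumptions made for the example.

```python
import math


def softmax_weights(w):
    """Softmax normalization: exp(w_i) / sum_j exp(w_j)."""
    m = max(w)  # subtracting the maximum does not change the result but avoids overflow
    exps = [math.exp(x - m) for x in w]
    total = sum(exps)
    return [e / total for e in exps]


def fast_weights(w, eps=1e-4):
    """Fast normalization: |w_i| / (eps + sum_j |w_j|). `eps` is an assumed value."""
    total = eps + sum(abs(x) for x in w)
    return [abs(x) / total for x in w]


def fuse_local(features, weights):
    """Weighted sum of the per-stage feature vectors."""
    dim = len(features[0])
    return [sum(wt * f[k] for wt, f in zip(weights, features)) for k in range(dim)]


raw_w = [1.0, -2.0, 0.5]                             # unconstrained learnable scalars
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # one toy vector per pyramid stage

w_soft = softmax_weights(raw_w)
w_fast = fast_weights(raw_w)
local_soft = fuse_local(features, w_soft)
local_fast = fuse_local(features, w_fast)
print(w_soft, w_fast)
```

Both schemes keep every normalized weight in the range from 0 to 1. The fast variant needs only absolute values and one division per weight, while the softmax needs an exponential per weight. Note that the fast variant treats `w` and `-w` identically, so the sign of a raw weight carries no information.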
Furthermore, to obtain a better global feature representation, we rebuild the backbone of P3N on the basis of DB-CNN, drawing on a variety of modern CNN design paradigms. FIGURE 5 illustrates the backbone of P3N for input images with a resolution of $128 \times 128$, which consists of three parts. The first part is the pre-processing stage, which includes a $7 \times 7$ convolution layer with a stride of 1 to keep the resolution of the input images unchanged. Since object detection is usually performed on relatively low-resolution feature maps, the FPN reduces the feature map resolution in the pre-processing stage to decrease the computational cost of the subsequent steps. In the path planning task, by contrast, we need to keep at least one full-resolution feature map so that the Q value of the agent can be computed accurately at any position on the map.
The second part is the core of the backbone, which consists of four stages; at the end of each stage, the feature map resolution is reduced by a $2 \times 2$ convolution with a stride of 2. We build the Convblock with reference to MobileNetV3 [26], as an inverted residual module with a linear bottleneck. The first $1 \times 1$ convolution layer expands the number of channels of the input features. The $5 \times 5$ depthwise separable convolution (DWConv) [27] layer has a larger receptive field than a conventional $3 \times 3$ convolution, with a better balance between computation and performance. The last $1 \times 1$ convolution reduces the dimension of the output features. It is worth noting that we apply pre-BN and ReLU only to the $5 \times 5$ convolution, because ConvNeXt [28] points out that fewer normalization layers and activation functions are sometimes beneficial. Following the design of most multi-stage networks, we set the ratio of layers in the four stages to $1:1:3:1$.
The last part is the post-processing stage, where we first transform the 2D feature maps into a 1D feature vector through an adaptive average-pooling operation, and then introduce more nonlinearity through two fully connected layers while adjusting the dimension of the output global features.
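The spatial resolutions along the backbone follow from the standard convolution output-size formula. The sketch below traces them for a $128 \times 128$ input. The padding values are assumptions: a padding of 3 is what lets a $7 \times 7$ stride-1 convolution preserve the resolution, and no padding is assumed for the $2 \times 2$ stride-2 convolutions. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_resolutions(input_size=128, num_stages=4):
    """Resolution after the pre-processing layer and after each of the stages."""
    # Pre-processing: 7x7 convolution, stride 1, assumed padding 3 (keeps the size).
    sizes = [conv_output_size(input_size, kernel=7, stride=1, padding=3)]
    # Each stage ends with a 2x2 convolution of stride 2 (assumed padding 0).
    for _ in range(num_stages):
        sizes.append(conv_output_size(sizes[-1], kernel=2, stride=2, padding=0))
    return sizes


resolutions = backbone_resolutions()
print(resolutions)  # [128, 64, 32, 16, 8]
```

Under these assumptions the full-resolution map survives the pre-processing stage, and each stage halves the side length, so the last stage works on $8 \times 8$ maps before the adaptive average pooling.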

IV. EXPERIMENTS AND DISCUSSION

In this section, we empirically evaluate the proposed P3N architecture as a policy representation for the global path planning task. We compare the P3N with two baseline methods (VIN and DB-CNN) on two datasets: grid maps and terrain images. For a fair comparison, our evaluation metrics are consistent with those in [16]: prediction loss, planning success rate, and trajectory difference. The prediction loss is the error rate of the algorithm's single-step predictions. The planning success rate is the probability that a method finds a collision-free path given the specified start and target states. The trajectory difference is the difference in length between a successfully planned trajectory and the optimal one.
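The three metrics can be written down as a small sketch. Several details are assumptions, because the text defers to [16] for the exact definitions: path length is measured as Euclidean length over grid cells, the trajectory difference is averaged over successful rollouts only, and a failed rollout is represented by `None`. All function names are hypothetical.

```python
import math


def prediction_loss(predicted_actions, expert_actions):
    """Fraction of single-step predictions that differ from the expert action."""
    wrong = sum(p != e for p, e in zip(predicted_actions, expert_actions))
    return wrong / len(expert_actions)


def path_length(path):
    """Euclidean length of a path given as a list of (row, col) cells."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def success_rate_and_traj_diff(rollouts):
    """`rollouts` is a list of (planned_path_or_None, optimal_path) pairs."""
    diffs = [path_length(p) - path_length(o) for p, o in rollouts if p is not None]
    rate = len(diffs) / len(rollouts)
    mean_diff = sum(diffs) / len(diffs) if diffs else float("nan")
    return rate, mean_diff


loss = prediction_loss([0, 1, 2, 3], [0, 1, 2, 4])
optimal = [(0, 0), (1, 1)]
rollouts = [
    ([(0, 0), (0, 1), (1, 1)], optimal),  # success, but longer than the optimal path
    (None, optimal),                       # failure: collision or target never reached
]
rate, diff = success_rate_and_traj_diff(rollouts)
print(loss, rate, diff)
```

Because the trajectory difference is computed only over successful rollouts in this sketch, it should be read together with the success rate rather than on its own.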
We also discuss the details of model implementation and the impact of training strategies on model performance. All models and experimental code are implemented in the PyTorch [29] framework and will be open-sourced once the work is accepted.

A. PATH PLANNING IN GRID-WORLD DOMAIN

Our first experimental scenario is a synthetic grid-map domain, shown in FIGURE 6, where the start and target states and the positions of obstacles are generated randomly. Each obstacle occupies one grid cell, and the proportion of obstacle cells is fixed to make the planning difficulty easier to control. In the RL framework, the agent moves one step at a time to one of the eight surrounding cells, and the goal is to find the shortest collision-free trajectory from the start position to the target.
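A grid-map instance of this kind can be generated with a few lines of standard-library code. The sketch below is one possible generator, not the authors' data pipeline: the function name, the map size, the obstacle ratio, and the seed are illustrative values, and the generator does not check that a collision-free path between start and target actually exists.

```python
import random

# The eight moves available to the agent, as (d_row, d_col) offsets.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def random_grid_map(size, obstacle_ratio, rng):
    """Return (grid, start, goal); grid[r][c] == 1 marks an obstacle cell."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    rng.shuffle(cells)
    n_obstacles = int(round(obstacle_ratio * size * size))
    grid = [[0] * size for _ in range(size)]
    for r, c in cells[:n_obstacles]:
        grid[r][c] = 1
    # The next two shuffled cells are free and distinct by construction.
    start, goal = cells[n_obstacles], cells[n_obstacles + 1]
    return grid, start, goal


grid, start, goal = random_grid_map(size=28, obstacle_ratio=0.5, rng=random.Random(0))
print(sum(map(sum, grid)), start, goal)
```

Shuffling the cell list once and slicing it keeps the obstacle count exact, which is what makes the obstacle proportion a fixed quantity rather than an expected one.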

FIGURE 6. Test examples on $64 \times 64$ grid maps.

1) PERFORMANCE IN SMALL-SCALE DOMAIN

We start with grid maps of size $28 \times 28$. The training set contains 10,000 randomly generated maps, in which obstacles always occupy $50\%$ of the environment. For each map instance, we first randomly specify the start and target positions, then generate the expert trajectory using the $\mathrm{A}^{*}$ algorithm, and finally discretize it into a series of single-step state-action pairs as training samples. All models receive the environmental map and the agent's position as input and output the Q values of the available actions in the current state. The agent repeatedly chooses the action with the highest Q value and moves to the next position until it reaches the target.
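The expert-data step can be sketched as an A* search on an 8-connected grid followed by a conversion of the path into state-action pairs. The details below are assumptions, since the text does not specify them: diagonal moves cost $\sqrt{2}$, cutting corners between obstacles is allowed, and the octile distance is used as the heuristic. Function names are hypothetical.

```python
import heapq
import math

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def octile(a, b):
    """Admissible heuristic for 8-connected grids with sqrt(2) diagonal cost."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dr, dc) + (math.sqrt(2) - 1) * min(dr, dc)


def astar(grid, start, goal):
    """Return the shortest path as a list of cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}   # cheapest known cost from start to each cell
    parent = {}
    frontier = [(octile(start, goal), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        for dr, dc in ACTIONS:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue  # outside the map or an obstacle
            cost = best[cell] + math.hypot(dr, dc)
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                parent[nxt] = cell
                heapq.heappush(frontier, (cost + octile(nxt, goal), nxt))
    return None


def to_state_action_pairs(path):
    """Discretize a path into (state, action_index) training samples."""
    return [(a, ACTIONS.index((b[0] - a[0], b[1] - a[1]))) for a, b in zip(path, path[1:])]


grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 2))
pairs = to_state_action_pairs(path)
print(path, pairs)
```

Each pair holds the agent position and the index of the expert move in `ACTIONS`, which is the form a classifier over eight actions can be trained on.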
We train each model for 30 epochs using the RMSprop optimizer, with an initial learning rate of $4\mathrm{e}{-3}$ and a mini-batch size of 256, and reduce the learning rate by $10 \times$ at the 6th-to-last and 2nd-to-last epochs, respectively [30]. At the end of each training epoch, we immediately test the prediction loss of all models on a test set containing another 5,000 maps. At the end of all 30 training epochs, we evaluate the planning success rate and trajectory difference of all models on the test set. We train each model five times with different random seeds and then average the results. For the VIN, we set the number of iterations of the planning module to $1.5 \times$ the map size, i.e., for a $28 \times 28$ grid map, the number of iterations is set to 42.
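The step schedule and the VIN iteration rule can be expressed as two small functions. The sketch rests on one interpretation that should be treated as an assumption: the learning rate is divided by 10 when 6 epochs remain and again when 2 epochs remain. Epochs are 0-indexed and the function names are hypothetical.

```python
def learning_rate(epoch, total_epochs=30, base_lr=4e-3):
    """Step schedule: divide by 10 when 6 epochs remain and again when 2 remain."""
    lr = base_lr
    for remaining in (6, 2):
        if epoch >= total_epochs - remaining:
            lr /= 10
    return lr


def vin_iterations(map_size, factor=1.5):
    """Number of value-iteration steps of the VIN planning module."""
    return int(factor * map_size)


schedule = [learning_rate(e) for e in range(30)]
print(schedule[0], schedule[24], schedule[28])  # base rate, base / 10, base / 100
print(vin_iterations(28))  # 42
```

With 30 epochs this gives 24 epochs at the base rate, 4 epochs at one tenth of it, and 2 epochs at one hundredth.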
FIGURE 7 shows the variation of training error and test error for all models during training. The VIN performs much worse than the other two methods on both the training and test sets, and its test error fluctuates significantly during training, which indicates that the training of VIN is unstable and sensitive to the choice of training strategies and hyper-parameters. The two CNN-based methods perform similarly and train more smoothly. The proposed P3N slightly under-performs the DB-CNN in the early stage but gradually outperforms it as training proceeds.

FIGURE 7. The training error (left) and test error (right) of all models on $28 \times 28$ grid maps.

TABLE 1. Performance on $28 \times 28$ grid maps.

| Methods | Pred. loss | Succ. rate | Traj. diff | Epoch time |
| :---: | :---: | :---: | :---: | :---: |
| VIN | 0.129 | $89.31 \%$ | 0.451 | 41 s |
| DB-CNN | 0.079 | $97.78 \%$ | 0.402 | 30 s |
| P3N (ours) | $\mathbf{0.075}$ | $\mathbf{99.04 \%}$ | $\mathbf{0.357}$ | $\mathbf{22 \, s}$ |

TABLE 1 shows more experimental results. Our P3N outperforms the baseline methods in all evaluation metrics: it raises the planning success rate to $99.04\%$, nearly 10 percentage points above the $89.31\%$ of the VIN, and its training time per epoch is about half that of the VIN. The P3N also computes $1.36 \times$ faster than the DB-CNN, although it is only slightly ahead of the latter in the other three metrics. Given the small size of the domain in this experiment, a shallow network is sufficient to extract useful environmental features, and thus our method does not gain a significant advantage.

2) GENERALIZATION IN LARGE-SCALE DOMAIN

We further increase the size of the grid maps to test the performance of these methods. As the map size grows, the planning module of VIN needs more iterations to ensure that the reward signal propagates across the whole planning space, which significantly increases the computational cost. More unconstrained iterations can also cause training instability, resulting in a dramatic degradation of model performance. Because the size of the feature maps in VI-based models is always consistent with the input data, Jin et al. [31] proposed a two-stage training strategy that effectively improves the performance of VI-based methods on large-scale domains while substantially reducing the computational effort. In this work, however, we aim to explore the performance differences caused by different architectures, not the impact of various training tricks. Thus, we use only the most basic training strategies in the following experiments.
We further construct two datasets of 10,000 grid maps each, with map sizes of $64 \times 64$ and $128 \times 128$. Since increasing the map size results in larger memory usage, we reduce the mini-batch size to 64 and 32, respectively, reduce the learning rate accordingly, and increase the number of training epochs to 60 and 120 so that the models can be fully trained.

TABLE 2. Performance on $64 \times 64$ and $128 \times 128$ grid maps.

| Methods | $64 \times 64$ | | | $128 \times 128$ | | |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | Pred. loss | Succ. rate | Traj. diff | Pred. loss | Succ. rate | Traj. diff |
| VIN | 0.196 | $54.52 \%$ | 2.495 | 0.257 | $30.09 \%$ | 4.966 |
| DB-CNN | 0.117 | $82.64 \%$ | 1.361 | 0.150 | $66.68 \%$ | 2.235 |
| P3N (ours) | $\mathbf{0.097}$ | $\mathbf{90.8 \%}$ | $\mathbf{0.647}$ | $\mathbf{0.127}$ | $\mathbf{81.8 \%}$ | $\mathbf{1.153}$ |

FIGURE 8. The Apollo 17 landing site. From left to right: the orthomosaic (overhead terrain image) created from images provided by the Lunar Reconnaissance Orbiter Camera (LROC) Narrow Angle Camera (NAC) with a resolution of 0.5 m per pixel, the corresponding digital elevation model (DEM) generated from four LROC NAC stereo image pairs, and the grid map in which regions with an elevation angle of 20 degrees or more are marked as obstacles.

TABLE 2 shows the performance of all models on larger-scale domains. Our P3N still obtains the best performance and further widens the gap with the other methods, with a planning success rate of about $1.7 \times$ that of the VIN on the $64 \times 64$ maps and $2.7 \times$ on the $128 \times 128$ maps. As the domain size increases, the DB-CNN gradually exposes the shortcoming of a local feature extractor that is not deep enough. Its planning success rate on the $64 \times 64$ maps is $8 \%$ lower than that of P3N, and the gap widens to $15 \%$ on the $128 \times 128$ maps. In conclusion, although path planning on grid maps is a relatively simple task, our proposed FPN-based path planning method is still more competitive than the baselines. The P3N has a higher planning success rate and a lower trajectory difference on large grid maps, and it also computes faster thanks to the novel architectural design.
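The ratios and gaps discussed for TABLE 2 can be recomputed directly from the reported success rates. The snippet below only copies the table entries into a dictionary and performs the arithmetic; the variable names are hypothetical.

```python
# Planning success rates (%) copied from TABLE 2.
success = {
    "64x64": {"VIN": 54.52, "DB-CNN": 82.64, "P3N": 90.8},
    "128x128": {"VIN": 30.09, "DB-CNN": 66.68, "P3N": 81.8},
}

ratio_to_vin = {}   # P3N success rate divided by the VIN success rate
gap_to_dbcnn = {}   # P3N minus DB-CNN, in percentage points
for size, row in success.items():
    ratio_to_vin[size] = round(row["P3N"] / row["VIN"], 2)
    gap_to_dbcnn[size] = round(row["P3N"] - row["DB-CNN"], 2)

print(ratio_to_vin)  # {'64x64': 1.67, '128x128': 2.72}
print(gap_to_dbcnn)  # {'64x64': 8.16, '128x128': 15.12}
```

The gaps to the DB-CNN are differences in percentage points, not relative changes.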

B. ROVERS NAVIGATION

Compared with traditional path planning algorithms, the benefit of NN-based algorithms lies in their ability to identify useful environmental information from natural images and then execute the planning policy end-to-end. We therefore verify whether the proposed P3N still outperforms the baseline methods on terrain images.
We construct the second experimental scenario from the orthomosaic (overhead terrain image) created from images provided by the Lunar Reconnaissance Orbiter Camera (LROC) Narrow Angle Camera (NAC) at a resolution of 0.5 meters per pixel. For the safety of planetary rovers, we treat areas with a slope greater than 20 degrees as obstacles. We want the rovers to be able to start from any position, actively avoid those obstacles, and safely reach the designated target area. We emphasize that the terrain image itself does not contain any elevation information; its corresponding digital elevation model (DEM), shown in FIGURE 8, is generated from the LROC NAC stereo images.

FIGURE 9. Test examples in the lunar terrain image data set with a resolution of $128 \times 128$. From left to right: overhead terrain image, DEM, and grid map. Note that the DEM is not available to the models; only the terrain image is used as the network input.

We randomly crop the orthomosaic into small, non-overlapping images to construct the test scenarios. For comparison, we construct three datasets containing 10,000 terrain images with resolutions of $32 \times 32$, $64 \times 64$, and $128 \times 128$, respectively. We determine the slope at each location by calculating the gradient between adjacent pixels of the DEM corresponding to the terrain image, and then mark the locations with a slope greater than 20 degrees as obstacles, thus converting the terrain images into grid maps. We specify random start and goal positions on the grid map and then use the $\mathrm{A}^{*}$ algorithm to generate the optimal path. It is worth mentioning that the elevation data is used only to generate demonstration trajectories; all models accept only terrain images as input and must infer decision-relevant environmental information from them by end-to-end learning.
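The conversion from a DEM to an obstacle grid can be sketched with finite differences. The exact gradient operator is not given in the text, so the choices below are assumptions: central differences in the interior, one-sided differences at the border, and a cell size of 0.5 m taken from the image resolution. The function names and the toy DEM are hypothetical.

```python
import math


def slope_degrees(dem, r, c, cell_size=0.5):
    """Slope at (r, c) from finite differences of neighbouring elevations (metres)."""
    rows, cols = len(dem), len(dem[0])
    r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
    c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
    dz_dy = (dem[r1][c] - dem[r0][c]) / ((r1 - r0) * cell_size)
    dz_dx = (dem[r][c1] - dem[r][c0]) / ((c1 - c0) * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def dem_to_grid(dem, threshold_deg=20.0, cell_size=0.5):
    """Mark cells whose slope exceeds the threshold as obstacles (1), others as free (0)."""
    return [[1 if slope_degrees(dem, r, c, cell_size) > threshold_deg else 0
             for c in range(len(dem[0]))] for r in range(len(dem))]


# Toy DEM (needs at least 2 rows and 2 columns): flat on the left, a steep rise on the right.
dem = [[0.0, 0.0, 0.1, 1.0] for _ in range(3)]
grid = dem_to_grid(dem)
print(grid)  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

In the toy DEM the elevation rises by 0.9 m over a single 0.5 m cell at the right edge, which corresponds to a slope of about 61 degrees, well above the 20-degree threshold.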
We randomly select $6/7$ of the samples from each data set for training and use the rest for testing. The training strategies and all other hyper-parameters remain the same as in the previous section. FIGURE 9 shows a representative set of test examples with the trajectories predicted by our method. Qualitatively, such a path planning problem is very difficult, because the obstacles are defined by the DEM while the models see only the terrain image, and the two differ considerably in appearance.
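As a rough illustration of the $6/7$ split, the following sketch shuffles sample indices and partitions them. The function name `split_indices`, the fixed seed, the rounding-down rule, and the assumption of 10,000 samples per data set are ours, not taken from the paper.

```python
import numpy as np

def split_indices(num_samples, seed=0):
    """Randomly assign 6/7 of the samples to training and the rest to testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    num_train = num_samples * 6 // 7      # rounding down is an assumption
    return order[:num_train], order[num_train:]

train_idx, test_idx = split_indices(10_000)
```

Under these assumptions, a data set of 10,000 images would yield 8,571 training samples and 1,429 test samples.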
TABLE 3 shows the performance of all models on the lunar terrain images. The VIN performs better on small-size terrain images than on grid maps, with a planning success rate close to $6\%$ higher. On the one hand, the VIN has enough network depth to extract valid information from small-size images; on the other hand, obstacles in the lunar environment occupy a much smaller proportion of the planning space than in grid maps, as can be seen in the rightmost column of FIGURE 9. Therefore, even if a single-step prediction given by the VIN is not optimal, the probability that the agent eventually reaches the target is still high, although the trajectory difference is correspondingly larger.

TABLE 3. Performance in the lunar domain.

| Methods | Succ. rate ($32 \times 32$) | Traj. diff ($32 \times 32$) | Succ. rate ($64 \times 64$) | Traj. diff ($64 \times 64$) | Succ. rate ($128 \times 128$) | Traj. diff ($128 \times 128$) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VIN | $95.17\%$ | 0.236 | $47.36\%$ | 2.498 | $29.79\%$ | 4.234 |
| DB-CNN | $98.44\%$ | 0.113 | $84.70\%$ | 1.334 | $58.23\%$ | 2.367 |
| P3N (ours) | $\mathbf{98.99\%}$ | $\mathbf{0.115}$ | $\mathbf{89.21\%}$ | $\mathbf{0.919}$ | $\mathbf{81.8\%}$ | $\mathbf{1.453}$ |
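The two metrics reported in TABLE 3 can be computed along the following lines. The exact definitions are given earlier in the paper and are not repeated here, so this sketch assumes the common convention from the VIN literature: success rate is the fraction of episodes in which the agent reaches the goal, and trajectory difference is the mean excess length of the predicted path over the optimal path, averaged over successful episodes. The function name `evaluate` and the episode tuple layout are hypothetical.

```python
def evaluate(episodes):
    """episodes: iterable of (reached_goal, predicted_length, optimal_length) tuples.

    Returns (success_rate, trajectory_difference) under the assumed definitions.
    """
    episodes = list(episodes)
    successes = [(pred, opt) for reached, pred, opt in episodes if reached]
    success_rate = len(successes) / len(episodes)
    if not successes:
        return success_rate, float("nan")
    traj_diff = sum(pred - opt for pred, opt in successes) / len(successes)
    return success_rate, traj_diff

# Three successful episodes (excess lengths 0, 2, 1) and one failure.
rate, diff = evaluate([(True, 10, 10), (True, 12, 10), (False, 0, 8), (True, 9, 8)])
```

Under this convention a model can combine a high success rate with a large trajectory difference, which matches the behaviour described for the VIN on small maps.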
The performance of the DB-CNN decreases significantly on large-size terrain images: its planning success rate in the $128 \times 128$ lunar domain is more than $8\%$ lower than on grid maps. This again illustrates that the DB-CNN struggles to learn an effective environmental representation in large-scale domains because its local feature extraction branch lacks sufficient network depth.
In contrast, the P3N has a well-designed backbone that better learns the global representation of environmental information. The FPN-based local feature extractor effectively fuses semantic and location information from different levels and then generates a context-rich local representation by means of adaptive feature aggregation. Thanks to this representation network, the P3N can infer the elevation information of the environment from terrain images and distinguish obstacles from non-obstacles, thus achieving similar performance on terrain images as on grid maps. The P3N also outperforms the baseline methods on nearly every metric in TABLE 3 (the one exception is the trajectory difference at $32 \times 32$, where the DB-CNN is marginally lower at 0.113 versus 0.115), and its lead grows as the image size increases.
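To make the idea of multi-scale fusion concrete, the sketch below shows the generic top-down merge used in feature pyramid networks [23]: each coarser map is upsampled and added to the next finer one. It is deliberately not the P3N's adaptive feature aggregation, whose details are given in the architecture section; the function names, the nearest-neighbour upsampling, the factor-of-two resolution steps, and the equal channel counts (standing in for the lateral $1 \times 1$ convolutions) are simplifying assumptions.

```python
import numpy as np

def upsample2x(feature):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by a factor of 2."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features):
    """Fuse backbone features ordered fine to coarse.

    Each map is assumed to have the same channel count and half the
    resolution of the previous one. Returns fused maps in the same order.
    """
    merged = [features[-1]]                       # start from the coarsest level
    for lateral in reversed(features[:-1]):
        merged.append(lateral + upsample2x(merged[-1]))
    return merged[::-1]

# Toy backbone outputs at three scales with 4 channels each.
c3 = np.full((4, 8, 8), 1.0)
c4 = np.full((4, 4, 4), 2.0)
c5 = np.full((4, 2, 2), 3.0)
p3, p4, p5 = fpn_top_down([c3, c4, c5])
```

The finest output `p3` then carries information from all three levels, which is the property the local feature extractor relies on.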

V. CONCLUSION

In this work, we propose an effective neural network-based computational framework to solve the global path planning problem for planetary rovers. We design a novel neural network architecture based on feature pyramid networks, named the Pyramid Path Planning Network (P3N), which takes the terrain image of the planetary surface and the position coordinates of the rover as input and outputs a safe and energy-efficient path to the specified target area through implicit planning computation. Our P3N has a well-designed backbone that efficiently learns the global feature representation of the environment, and a feature pyramid branch that adaptively fuses multi-scale features from different levels to generate a strong local feature representation. While previous studies generally used two independent network branches to extract global and local features separately, we use the multi-scale global features learned by the backbone as the input to the local feature extractor and obtain a fine-grained representation of the environment containing rich semantic and location information. We compare our P3N with two baseline methods, the VIN and the DB-CNN, on path planning tasks on grid maps and on a data set generated from lunar terrain images. The experimental results show that the P3N achieves the best overall performance, and its computation speed is $86\%$ and $36\%$ faster than the two baseline methods, respectively, on $28 \times 28$ grid maps. Our method also generalizes better to large-scale environments: when trained from scratch on the $128 \times 128$ lunar domain, it reaches a path planning success rate of $81.8\%$, outperforming the VIN by 52 percentage points and the DB-CNN by 23.6 percentage points at a lower computational cost.
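The margins quoted for the $128 \times 128$ lunar domain are absolute differences in success rate, which can be checked directly against TABLE 3. The short calculation below uses only the values from that table; the variable names are ours.

```python
# Success rates (%) on the 128 x 128 lunar domain, taken from TABLE 3.
success_rate_128 = {"VIN": 29.79, "DB-CNN": 58.23, "P3N": 81.8}

# Absolute gap in percentage points between the P3N and each baseline.
gap_points = {
    name: round(success_rate_128["P3N"] - rate, 2)
    for name, rate in success_rate_128.items()
    if name != "P3N"
}
```

The gaps come out at about 52.0 points over the VIN and about 23.6 points over the DB-CNN, so the figures in the text are percentage-point differences rather than relative improvements.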

ACKNOWLEDGMENT

Xiang Jin would like to thank J. Zhang for his enthusiastic help in preparing the data set.

REFERENCES  引用

[1] P. Raja and S. Pugazhenthi, “Optimal path planning of mobile robots: A review,” Int. J. Phys. Sci., vol. 7, no. 9, pp. 1314-1320, Feb. 2012.
[1] P. Raja 和 S. Pugazhenthi,“移动机器人的最佳路径规划:综述”,国际物理学杂志,第 7 卷,第 9 期,第 1314-1320 页,2012 年 2 月。

[2] M. N. A. Wahab, S. Nefti-Meziani, and A. Atyabi, “A comparative review on mobile robot path planning: Classical or meta-heuristic methods?” Аппи. Rev. Control, vol. 50, pp. 233-252, Sep. 2020.
[2] M. N. A. Wahab、S. Nefti-Meziani 和 A. Atyabi,“移动机器人路径规划的比较回顾:经典方法还是元启发式方法?Аппи.Rev. Control,第 50 卷,第 233-252 页,2020 年 9 月。

[3] M. Sutoh, M. Otsuki, S. Wakabayashi, T. Hoshino, and T. Hashimoto, “The right path: Comprehensive path planning for lunar exploration rovers,” IEEE Robot. Autom. Mag., vol. 22, no. 1, pp. 22-33, Mar. 2015.
[3] M. Sutoh、M. Otsuki、S. Wakabayashi、T. Hoshino 和 T. Hashimoto,“正确的路径:月球探测车的综合路径规划”,IEEE 机器人。自动。Mag.,第 22 卷,第 1 期,第 22-33 页,2015 年 3 月。

[4] J. Zhang, Y. Xia, and G. Shen, “A novel learning-based global path planning algorithm for planetary rovers,” Neurocomputing, vol. 361, pp. 69-76, Oct. 2019.
[4] J. Zhang、Y. Xia 和 G. Shen,“一种用于行星漫游者的新型基于学习的全局路径规划算法”,神经计算,第 361 卷,第 69-76 页,2019 年 10 月。

[5] M. Pflueger, A. Agha, and G. S. Sukhatme, “Rover-IRL: Inverse reinforcement learning with soft value iteration networks for planetary rover path planning,” IEEE Robot. Autom. Lett., vol. 4, no. 2, pp. 1387-1394, Apr. 2019.
[5] M. Pflueger、A. Agha 和 G. S. Sukhatme,“Rover-IRL:用于行星漫游者路径规划的软值迭代网络的逆强化学习”,IEEE 机器人。自动。Lett.,第 4 卷,第 2 期,第 1387-1394 页,2019 年 4 月。

[6] T. Lozano-Perez, “Spatial planning: A configuration space approach,” in Autonomous Robot Vehicles. New York, NY, USA: Springer, 1990, pp. 259-271.
[6] T. Lozano-Perez,“空间规划:一种配置空间方法”,载于《自主机器人车辆》。美国纽约州纽约市:施普林格出版社,1990 年,第 259-271 页。

[7] R. Yonetani, T. Taniai, M. Barekatain, M. Nishimura, and A. Kanezaki, “Path planning using neural A* search,” in Proc. ICML, 2021, pp. 12029-12039.
[7] R. Yonetani、T. Taniai、M. Barekatain、M. Nishimura 和 A. Kanezaki,“使用神经 A* 搜索进行路径规划”,ICML 论文集,2021 年,第 12029-12039 页。

[8] A.-I. Toma, H.-Y. Hsueh, H. A. Jaafar, R. Murai, P. H. J. Kelly, and S. Saeedi, “PathBench: A benchmarking platform for classical and learned path planning algorithms,” in Proc. 18th Conf. Robots Vis. (CRV), May 2021, pp. 79-86.
[8] 人工智能托马,HYHsueh、H. A. Jaafar、R. Murai、P. H. J. Kelly 和 S. Saeedi,“PathBench:经典和学习路径规划算法的基准测试平台”,第 18 届会议机器人对 (CRV) 论文集,2021 年 5 月,第 79-86 页。

[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[9] I. Goodfellow、Y. Bengio 和 A. Courville,深度学习。美国马萨诸塞州剑桥:麻省理工学院出版社,2016 年。

[10] A. H. Qureshi, Y. Miao, A. Simeonov, and M. C. Yip, “Motion planning networks: Bridging the gap between learning-based and classical motion planners,” IEEE Trans. Robot., vol. 37, no. 1, pp. 48-66, Aug. 2021.
[10] A. H. Qureshi、Y. Miao、A. Simeonov 和 M. C. Yip,“运动规划网络:弥合基于学习的运动规划师和经典运动规划师之间的差距”,IEEE Trans. Robot.,第 37 卷,第 1 期,第 48-66 页,2021 年 8 月。

[11] M. J. Bency, A. H. Qureshi, and M. C. Yip, “Neural path planning: Fixed time, near-optimal path generation via Oracle imitation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Macau, China, Nov. 2019, pp. 3965-3972.
[11] M. J. Bency, A. H. Qureshi, and M. C. Yip, “Neural path planning: Fixed time, near-optimal path generation via Oracle imitation,” in Proc. IEEE/RSJ Int. Conf. Intell.机器人系统 (IROS),中国澳门,2019 年 11 月,第 3965-3972 页。

[12] K. Wu, M. A. Esfahani, S. Yuan, and H. Wang, “TDPP-Net: Achieving three-dimensional path planning via a deep neural network architecture,” Neurocomputing, vol. 357, pp. 151-162, Sep. 2019.
[12] K. Wu、M. A. Esfahani、S. Yuan 和 H. Wang,“TDPP-Net:通过深度神经网络架构实现三维路径规划”,神经计算,第 357 卷,第 151-162 页,2019 年 9 月。

[13] M. Assens, X. G.-I. Nieto, K. McGuinness, and N. E. O’Connor, “PathGAN: Visual scanpath prediction with generative adversarial networks,” in Proc. ECCV, Munich, Germany, 2018, pp. 1-18.
[13] M. Assens, X. G.-I.Nieto、K. McGuinness 和 N. E. O'Connor,“PathGAN:使用生成对抗网络进行视觉扫描路径预测”,载于 Proc. ECCV,德国慕尼黑,2018 年,第 1-18 页。

[14] S. Richard and B. Andrew, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[14] S. Richard 和 B. Andrew,强化学习:简介。美国马萨诸塞州剑桥:麻省理工学院出版社,2018 年。

[15] Y. F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized noncommunicating multiagent collision avoidance with deep reinforcement learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Singapore, May 2017, pp. 285-292.
[15] Y. F. Chen, M. Liu, M. Everett, 和 J. P.如何,“通过深度强化学习避免分散的非通信多智能体碰撞避免”,在 IEEE Int. Conf. Robot 论文集。自动。(ICRA),新加坡,2017 年 5 月,第 285-292 页。

[16] A. Tamar, Y. Wu, G. Thomas, S. Levine, and P. Abbeel, “Value iteration networks,” in Proc. Adv. Neural Inf. Process. Syst., Barcelona, Spain, vol. 29, 2016, pp. 2154-2162.
[16] A. Tamar、Y. Wu、G. Thomas、S. Levine 和 P. Abbeel,“价值迭代网络”,Proc. Adv. Neural Inf. Process。Syst.,西班牙巴塞罗那,第 29 卷,2016 年,第 2154-2162 页。

[17] S. Niu, S. Chen, H. Guo, C. Targonski, M. C. Smith, and J. Kovaevi, “Generalized value iteration networks: Life beyond lattices,” in Proc. AAAI, New Orleans, LA, USA, 2018, pp. 6246-6253.
[17] S. Niu、S. Chen、H. Guo、C. Targonski、M. C. Smith 和 J. Kovaevi,“广义价值迭代网络:超越晶格的生活”,载于 AAAI 论文集,美国洛杉矶新奥尔良,2018 年,第 6246-6253 页。

[18] A. Deac, P. Velickovic, O. Milinkovic, P.-L. Bacon, J. Tang, and M. Nikolic, “XLVIN: Executed latent value iteration nets,” 2020, arXiv:2010.13146.
[18] A. Deac, P. Velickovic, O. Milinkovic, P.-L.Bacon, J. Tang, and M. Nikolic, “XLVIN: 执行的潜在值迭代网络”,2020 年,arXiv:2010.13146。

[19] N. Nardelli, P. Kohli, G. Synnaeve, P. Torr, Z. Lin, and N. Usunier, “Value propagation networks,” in Proc. ICLR, New Orleans, LA, USA, 2019, pp. 1-13.
[19] N. Nardelli、P. Kohli、G. Synnaeve、P. Torr、Z. Lin 和 N. Usunier,“价值传播网络”,ICLR 论文集,美国洛杉矶新奥尔良,2019 年,第 1-13 页。

[20] L. Zhang, X. Li, S. Chen, H. Zang, J. Huang, and M. Wang, “Universal value iteration networks: When spatially-invariant is not universal,” in Proc. AAAI, New York, NY, USA, 2020, pp. 6778-6785.
[20] L. Zhang, X. Li, S. Chen, H. Zang, J. Huang, and M. Wang, “Universal value iteration networks: When spacetially-invariant is not universal,” in Proc. AAAI, New York, NY, USA, 2020, pp. 6778-6785.

[21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NA, USA, Jun. 2016, pp. 770-778.
[21] K. He, X. Zhang, S. 任, and J. Sun, “深度残差学习用于图像识别”,载于 IEEE 会议论文集。Vis. pattern 识别。(CVPR),美国北美拉斯维加斯,2016 年 6 月,第 770-778 页。

[22] L. Lee, E. Parisotto, D. Chaplot, E. Xing, and R. Salakhutdinov, “Gated path planning networks,” in Proc. ICML, Stockholm, Sweden, 2018, pp. 4597-4608.
[22] L. Lee、E. Parisotto、D. Chaplot、E. Xing 和 R. Salakhutdinov,“门控路径规划网络”,ICML 论文集,瑞典斯德哥尔摩,2018 年,第 4597-4608 页。

[23] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. CVPR, Honolulu, HI, USA, Jul. 2017, pp. 936-944.
[23] T.-Y.Lin、P. Dollár、R. Girshick、K. He、B. Hariharan 和 S. Belongie,“用于对象检测的特征金字塔网络”,CVPR 论文集,美国夏威夷州檀香山,2017 年 7 月,第 936-944 页。

[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84-90, May 2017, doi: 10.1145/3065386.
[24] A. Krizhevsky、I. Sutskever 和 G. E. Hinton,“使用深度卷积神经网络进行 ImageNet 分类”,Commun.ACM,第 60 卷,第 6 期,第 84-90 页,2017 年 5 月,doi: 10.1145/3065386。

[25] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. 36th Int. Conf. Mach. Learn., Long Beach, CA, USA, 2019, pp. 6105-6114.
[25] M. Tan 和 Q. V. Le,“EfficientNet:重新思考卷积神经网络的模型缩放”,第 36 届国际会议论文集,美国加利福尼亚州长滩,2019 年,第 6105-6114 页。

[26] A. Howard, M. Sandler, B. Chen, W. Wang, L.-C. Chen, M. Tan, G. Chu, V. Vasudevan, Y. Zhu, R. Pang, H. Adam, and Q. Le, “Searching for MobileNetV3,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, Oct. 2019, pp. 1314-1324.
[26] A. Howard, M. Sandler, B. Chen, W. Wang, L.-C.Chen, M. Tan, G. Chu, V. Vasudevan, Y. Zhu, R. Pang, H. Adam, and Q. Le, “搜索 MobileNetV3”,IEEE/CVF 国际会议论文集。Vis. (ICCV),韩国首尔,2019 年 10 月,第 1314-1324 页。

[27] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 1800-1807.
[27] F. Chollet,“Xception:具有深度可分离卷积的深度学习”,IEEE 会议论文集。Vis. pattern 识别。(CVPR),美国夏威夷州檀香山,2017 年 7 月,第 1800-1807 页。

[28] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A ConvNet for the 2020s,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), New Orleans, LA, USA, Jun. 2022, pp. 11976-11986.
[28] Z. Liu, H. 毛, C.-Y.Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A ConvNet for the 2020s,” in Proc. IEEE/CVF Conf. Comput.Vis. pattern 识别。(CVPR),美国路易斯安那州新奥尔良,2022 年 6 月,第 11976-11986 页。

[29] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. Desmaison, “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Adv. NeurIPS, Vancouver, BC, Canada, vol. 32, 2019, pp. 8026-8037.
[29] A. Paszke、S. Gross、F. Massa、A. Lerer、J. Bradbury、G. Chanan、T. Killeen、Z. Lin、N. Gimelshein、L. Antiga 和 A. Desmaison,“PyTorch:一种命令式、高性能深度学习库”,载于 Proc. Adv. NeurIPS,加拿大不列颠哥伦比亚省温哥华,第 32 卷,2019 年,第 8026-8037 页。

[30] K. He, R. Girshick, and P. Dollár, “Rethinking ImageNet pre-training,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 4918-4927.
[30] K. He, R. Girshick, and P. Dollár, “Rethinking ImageNet pre-training,” in Proc. IEEE/CVF Int. Conf. Comput.Vis. (ICCV),2019 年 10 月,第 4918-4927 页。

[31] X. Jin, W. Lan, T. Wang, and P. Yu, “Value iteration networks with double estimator for planetary rover path planning,” Sensors, vol. 21, no. 24, p. 8418, Dec. 2021. [Online]. Available: https://www.mdpi.com/14248220/21/24/8418
[31] X. Jin、W. Lan、T. Wang 和 P. Yu,“用于行星漫游者路径规划的具有双重估计器的价值迭代网络”,传感器,第 21 卷,第 24 期,第 8418 页,2021 年 12 月。[在线]。可用: https://www.mdpi.com/14248220/21/24/8418

XIANG JIN received the B.S. degree in naval architecture and ocean engineering, in 2015, and the M.S. degree in design and construction of naval architecture and ocean structure from Dalian Maritime University, Dalian, China, in 2018, where he is currently pursuing the Ph.D. degree in marine electrical engineering.
His research interests include motion planning and formation control of autonomous underwater vehicles.

WEI LAN received the B.S. degree in marine engineering, in 2014, and the M.S. degree in naval architecture and ocean engineering from Dalian Maritime University, Dalian, China, in 2016, where he is currently pursuing the Ph.D. degree in marine engineering.
His research interests include small underwater vehicles and the formation motion and control of groups of underwater vehicles.

XIN CHANG received the B.S. degree in ship engineering and the M.S. and Ph.D. degrees in design and construction of naval architecture and ocean structure from Harbin Engineering University, Harbin, China, in 2000, 2003, and 2005, respectively, and the Ph.D. degree from the Harbin Institute of Technology, in 2009.
His research interests include ship propulsion performance and energy saving, ship overall performance evaluation technology, and ship life cycle health management.

  1. The associate editor coordinating the review of this manuscript and approving it for publication was Abderrahmane Lakas.