
Humanoid navigation using a visual memory with obstacle avoidance

Josafat Delfin a, Héctor M. Becerra b,*, Gustavo Arechavaleta a
a Robotics and Advanced Manufacturing Group, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Saltillo, Coah., Mexico
b Computer Science Department, Centro de Investigación en Matemáticas (CIMAT), C/ Jalisco S/N, Col. Valenciana, C.P. 36023, Guanajuato, Gto., Mexico

H I G H L I G H T S

  • A complete humanoid navigation scheme based on a visual memory is proposed.
  • It integrates localization, planning and path following with obstacle avoidance.
  • A sensor-based framework is used for visual path following and obstacle avoidance.
  • Transitions between tasks are performed smoothly to avoid instability of the humanoid.
  • An extensive experimental evaluation using the NAO platform is presented.

A R T I C L E I N F O

Article history:
Received 8 February 2018
Received in revised form 11 August 2018
Accepted 24 August 2018
Available online 7 September 2018

Keywords:
Visual navigation
Humanoid robots
Visual memory
Obstacle avoidance

Abstract

We present a complete humanoid navigation scheme based on a topological map known as visual memory (VM), which is composed of a set of key images acquired offline by means of a supervised teaching phase (human-guided). Our autonomous navigation scheme integrates the humanoid localization in the VM, a visual path planner and a path follower with obstacle avoidance. We propose a pure vision-based localization algorithm that takes advantage of the topological structure of the VM to find the key image that best fits the current image in terms of common visual information. In addition, the visual path planner favors obstacle-free paths. The VM is updated when a new obstacle is detected with an RGB-D camera mounted on the humanoid's head. The visual path following and obstacle avoidance problems are formulated in a unified sensor-based framework in which a hierarchy of tasks is defined, and the transitions of consecutive and hierarchical tasks are performed smoothly to avoid instability of the humanoid. An extensive experimental evaluation using the NAO platform shows the good performance of the navigation scheme.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

One of the main objectives to be reached by a humanoid is to mimic human navigation strategies. An important feature of the human brain is the ability to visually memorize key scenes of previously visited places for facilitating a subsequent navigation in the same place [1]. The idea of using a sequence of key images for robot navigation was introduced early in [2], where the visual memory (VM) was called view-sequenced route representation. A VM is a topological map that represents an environment by a set of key images [3,4], which are typically organized as a directed graph where each node is a key image and the edges provide information about the relation between key images. Once this representation of the environment is known, the problems of robot localization, path planning and autonomous navigation can be solved.
Visual servo control schemes are built upon the sensor-based control approach, taking advantage of the rich information of images acquired by a vision system to command the motion of a robot. However, the scope of visual-servo controllers is limited to positioning tasks where the visual scene is partially observed from the initial and target locations [5-7]. This is due to the field-of-view constraint of typical vision systems [8]. This local behavior appears in any of the two classical approaches of visual servoing: the position-based scheme (PBVS), where the estimation of some 3D pose parameters is needed; and the image-based scheme (IBVS), where direct feedback of image features is used [9]. The extension suggested in [2,10] takes advantage of a sequence of target images (VM), which share visual information between each pair of consecutive images, and the image acquired at the initial location shares information with one of the target images. In this manner, the robot enlarges its workspace as the images associated to the initial and final locations do not need to share information. This strategy has been mainly exploited for navigation of wheeled mobile robots [2,4,11,12].
In this paper, we propose a humanoid navigation strategy based on a VM, which encloses robot localization, visual path planning and path following with obstacle avoidance. From a minimum number of detected point features, the proposed strategy computes continuous velocities that can be applied in indoor environments with natural scenes, like corridors, uncluttered or cluttered environments.
本文提出了一种基于视觉记忆(VM)的类人导航策略,该策略涵盖机器人定位、视觉路径规划以及带有障碍物避让的路径跟随。通过最少数量的检测点特征,所提策略计算出可应用于具有自然场景的室内环境(如走廊、无杂物或杂乱环境)的连续速度。
This paper is an incremental work with respect to our previous results in [13,14]. In the first reference, we have proposed a visual path following scheme for humanoid robots that is generic in the sense that it can be used for different types of visual servo controllers. In the second reference, we have extended the scheme by solving the initial humanoid localization problem in the VM to plan a visual path when the VM has several forks and/or cycles. As extensions to those works, in this paper we propose an enhanced pure vision-based localization algorithm that first finds a neighborhood of key images around the image currently observed by the robot, to later perform a local search for the key image that best fits the current image in terms of common visual information. In addition, we incorporate a versatile obstacle avoidance strategy that is aided by depth information from an RGB-D camera mounted on the robot's head. The strategy deals with static obstacles that appear during the autonomous navigation, and that were not present during the teaching phase to generate the VM. Besides, the unexpected obstacles are localized in the VM, and they are taken into account for subsequent navigation tasks.
The main contribution of this work is a novel vision-based navigation scheme for humanoids formulated in a unifying framework of consecutive and hierarchical tasks (visual servoing and obstacle avoidance tasks), for which the transitions between tasks are performed using smooth functions to keep the balance of the humanoid platform. The generation of continuous velocities that are given to the walking pattern generator (WPG) is particularly important for visual control of humanoids [7]. Task transitions occur when a new target image is given, or when an obstacle is detected and must be stepped over or circumvented while maintaining the visual target in sight during the navigation. The mobility of the humanoid robot is exploited to walk laterally when it is needed as well as to step over obstacles. Thus, no motion constraints are imposed on the robot. An extensive experimental evaluation using the NAO platform shows the good performance of the navigation scheme.
Our proposal aims to use 2D information as much as possible, such that the only available information for navigation is a set of key images and the current view. The generation of the VM is done offline, i.e., the key images that form the VM are acquired and selected by means of a supervised teaching phase (human-guided). Depth information is just a complement to facilitate the obstacle avoidance task. We have left as future work the detection of obstacles relying only on monocular vision. The main advantage of using 2D information instead of 3D is the reduction of computational requirements, which are limited in commercial, small-sized humanoid platforms like NAO, since there is no need of generating and updating a complex spatial map. For instance, the proposed localization method needs no extra information, such as odometry or sensor fusion.
The rest of the paper is structured as follows: Section 2 presents a brief account of related work. Section 3 presents an overview of the whole navigation strategy. Section 4 details the structure of the graph that represents the VM. Section 5 describes the proposed localization algorithm and the path planning stage. Section 6 presents the proposed visual path following scheme with smooth transitions of visual tasks. Section 7 describes the obstacle avoidance strategies. Section 8 presents the experimental evaluation of the stages of the method and its integration. Finally, Section 9 gives some concluding remarks and ideas of future work.

2. Related work

The use of VM has made it possible to overcome the local behavior of visual servo control schemes. The navigation based on a VM has been mainly used for wheeled mobile robots. For instance, a local Euclidean reconstruction is used in [4] to avoid building a complete map. Geometric reconstruction is used in [11] to predict the feature positions, and it allows recovery from feature tracking failures. An image-based scheme reported in [12] makes use of direct feedback from a visual geometric constraint. In [15], the integration of a laser range finder in the visual navigation helps to avoid unexpected obstacles. Different from other robots, humanoids are able to perform several tasks at the same time [16], which represents one of their main advantages.
Vision-based control of humanoid robots is an interesting and challenging problem due to the inherent dynamic balance and the undesired sway motion introduced by bipedal locomotion [5, 6]. Solutions to the visual servoing problem for local positioning of humanoid robots have been reported in [5-7]. These control schemes exploit the reactive WPG proposed in [17], which provides automatic generation of foot placements from desired values of linear and angular velocities of the robot’s center of mass (CoM). In [5] and [7], a visual servoing scheme gives the velocity reference for the CoM as the main input to the WPG. Such schemes are considered decoupled approaches since the visual controller is independent of the WPG. In contrast, a coupled approach has been proposed in [6], where visual constraints are directly introduced in the WPG.
The extension of visual servo controllers to deal with humanoid navigation in indoor environments has important applications in service robotics for surveillance and assistance. In the literature, a camera onboard the robot has been used to achieve vision-based navigation, which requires a larger displacement than the local visual servoing task [10,18,19]. A landmark-based navigation approach that integrates motion planning through geometric primitives and visual servoing tasks is described in [20]. Although the motion planner returns obstacle-free paths, the locomotion model is not capable to step over obstacles. A trajectory tracking control in the Cartesian space, based on vision aided by odometry, is proposed in [18]. The estimation of the humanoid’s spatial location considers data fusion from different sensors, similarly than in [21].
Pure vision-based humanoid navigation strategies have also been proposed [10,19,22]. In particular, [22] suggests the use of landmarks like 2D bar codes while [19] proposes the use of vanishing points. Both schemes consider environments with corridors. The extension of the view-sequenced route representation for humanoid navigation is reported in [10]. The evaluation is also carried out in a corridor with no initial localization, and the controller is based on template correlation to decide basic motion actions. Thus, our proposed scheme can be used in less restrictive indoor environments with natural scenes, unlike the referred works [10,19,22]. Moreover, our method considers the holonomic nature of humanoid robots, in contrast to previous works [18-20], which constrain the robot motion to a unicycle model.
An important issue for humanoid navigation is the ability of avoiding obstacles. In structured indoor scenarios, the scheme reported in [23] uses line-based scene analysis and feature tracking to step over or walk around obstacles. Vision aided by range sensors has also been used for humanoid navigation with obstacle avoidance in [24]. In that work, obstacles are identified by means of laser data and that information is then used to train visual classifiers that are later applied to the images during autonomous navigation. Some works focus on the generation of stable dynamic movements in scenarios with obstacles by stepping over them [25,26]. These works were conceived for human-sized humanoid platforms, like the HRP-2 robot, and assume a priori knowledge of the obstacles' dimensions. Recently, an extended version of the WPG of [17] has been introduced in [27], which considers the feet orientation, and it is applied for obstacle avoidance using the HRP-2 robot. However, the underlying nonlinear model predictive control must solve a nonlinear program with a time horizon, which is computationally expensive. Our navigation strategy proposes a simpler method for obstacle avoidance than the ones used in previous works, focusing on an adequate management of hierarchical tasks. Monocular vision aided by data from encoders has been suggested in [28]. The scheme uses optical flow to build a representation of maze-like environments with obstacles to identify the free space. In [29], humanoid navigation in a cluttered environment has been addressed, considering an application where the robot transports a load. The navigation scheme relies on a 3D representation of the environment obtained using SLAM, from which collision-free trajectories are computed.
To the authors' knowledge, humanoid navigation based on a VM has not been previously extended to deal with obstacle avoidance in indoor environments regardless of the characteristics of the workspace, i.e., uncluttered, cluttered or corridor-like indoor environments.

3. Outline of the navigation strategy

The autonomous navigation framework presented in this work can be divided into four stages: 1) visual memory building, 2) robot localization and path planning, 3) path following and 4) obstacle localization and graph updating. Fig. 1 presents an outline of the complete navigation strategy. In the context of this work, a VM is a directed graph $\mathbf{G}=\{\mathbf{I}, \mathcal{E}\}$, where each node encodes an image of a set $\mathbf{I}=\{\mathcal{I}_{1}, \mathcal{I}_{2}, \ldots, \mathcal{I}_{m}\}$ of $m$ key images, and $\mathcal{E}=\{\mathcal{E}_{1}, \mathcal{E}_{2}, \ldots, \mathcal{E}_{p}\}$ is the set of $p$ edges, each of them with an associated weight that links a pair of nodes. Recall that we assume that the VM generation is done offline by means of a supervised teaching phase (human-guided). The automatic selection of key images for building such a VM represents a problem by itself, and it is not addressed in this work. However, we point out that some methods for keyframe selection have been proposed in the literature related to the field of visual SLAM [30] or video content analysis [31]. The aim of these methods is to ensure a minimum of common visual information between selected images. In this work, we assume that the VM is given as input to the navigation strategy, provided that the keyframes in the VM have enough overlap of visual features.
The robot localization, which is the second stage of the strategy, consists of finding the key image denoted by $\mathcal{I}_{1}^{*}$ in the VM that best fits the current image $\mathcal{I}$ in terms of common visual information. Once $\mathcal{I}$ and $\mathcal{I}_{1}^{*}$ are linked, a key image that belongs to the VM is given as the target $\mathcal{I}_{n}^{*}$. The visual path planning is carried out to find the shortest visual path $\mathbf{I}^{*}$ in $\mathbf{G}$ from $\mathcal{I}_{1}^{*}$ to $\mathcal{I}_{n}^{*}$ in terms of the weights of the edges.
In the path following phase, the robot performs an autonomous navigation to follow the resulting visual path. A visual control scheme guides the humanoid along the sequence of key images until the robot achieves the desired location. The last step of the navigation strategy is the VM updating, which consists of modifying the weights attached to the edges of $\mathbf{G}$ whenever the robot encounters new static obstacles during the visual path following. In particular, if an obstacle is found while the robot is following the piece of visual path from $\mathcal{I}_{i}$ to $\mathcal{I}_{i+1}$, the weight at the corresponding edge $\mathcal{E}_{i}$ is increased. Thus, the number $N_{o}$ of detected obstacles and their location in the VM are memorized. This allows the robot to avoid costly paths with obstacles in a future navigation.
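To make the update concrete, the sketch below assumes the VM is stored as a weighted directed graph (e.g., a networkx.DiGraph) and that a fixed penalty is added to the weight of every edge on which an obstacle was detected; the penalty value and function name are illustrative assumptions, not taken from the original implementation.

    OBSTACLE_PENALTY = 10.0  # assumed fixed penalty per detected obstacle (illustrative value)

    def graph_updating(obstacle_edges, G):
        """Increase the weight of every VM edge on which a new static obstacle was detected.
        G is assumed to be a networkx.DiGraph whose edges carry the weights of Eq. (1)."""
        for (i, j) in obstacle_edges:              # each edge links key images I_i and I_{i+1}
            G[i][j]["weight"] += OBSTACLE_PENALTY  # costlier edges are avoided by the planner
        return G

    # usage sketch: an obstacle was found on the edge between key images 3 and 4
    # G = graph_updating([(3, 4)], G)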
The main steps of the proposed humanoid navigation framework are presented in Algorithm 1, where each function is described in detail in the subsequent sections.
Algorithm 1: visualNavigation allows the autonomous navigation of the robot.
    Input: Graph of key images $\mathbf{G}=\{\mathbf{I}, \mathcal{E}\}$, target image $\mathcal{I}_{n}^{*}$
    Output: Autonomous visual navigation
    $\mathcal{I}$ = captureCurrentImage()
    $\mathcal{I}_{1}^{*}$ = localizeRobot($\mathcal{I}$, $\mathbf{I}$)
    $\mathbf{I}^{*}$ = getShortestPath($\mathcal{I}_{1}^{*}$, $\mathcal{I}_{n}^{*}$, $\mathbf{G}$)
    (navigation, $N_{o}$, $\mathcal{E}_{i}$) = pathFollower($\mathcal{I}$, $\mathbf{I}^{*}$, $\mathbf{G}$)
    if $N_{o} > 0$ then
        $\mathbf{G}$ = graphUpdating($N_{o}$, $\mathcal{E}_{i}$, $\mathbf{G}$)
    return

4. Structure of the visual memory

As aforementioned, a VM is a graph $\mathbf{G}=\{\mathbf{I}, \mathcal{E}\}$ where the nodes $\mathbf{I}$ encode key images obtained during a teaching phase. It is worth noting that the graph is not always linear; the VM can have several branches and loops. In this section, we complement the description of the graph's structure by defining the weights of the edges between nodes.
The weight attached to an edge represents the cost of traveling from the current node to one of its adjacent nodes, and it is denoted by $\mathcal{E}_{i}$. We define the weight of an edge in terms of the number of matched interest points and the amount of rotation between neighboring key images as follows:

$$\mathcal{E}_{i}=\alpha\left(1-\bar{s}_{i}^{*}\right)+\beta\,\overline{\theta u_{i}^{*}}, \tag{1}$$
where $\bar{s}_{i}^{*}$ and $\overline{\theta u_{i}^{*}}$ are, respectively, the normalized number of matches and the normalized rotation with respect to the vertical axis (perpendicular to the motion plane) between neighboring key images, and $\alpha$ and $\beta$ are weights to favor one of the terms if desired. The cost related to the number of matches, $\left(1-\bar{s}_{i}^{*}\right)$, means that the more matches there are between nodes, the lower the cost will be. The cost related to the rotation means that the more rotation there is between nodes, the higher the cost will be.
In previous works [3,4], the weights of the edges are unitary, i.e., the cost of a visual path is given by counting the number of key images in the route, and an edge exists if a minimum number of point correspondences is achieved to relate neighboring nodes. In this work, the term related to rotation $\overline{\theta u_{i}^{*}}$ is included to deal with environments where the richness of point features $\bar{s}_{i}^{*}$ is low but sufficient, and the rotation can become more important to determine the cost of a route.
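As a small illustration of Eq. (1), the sketch below computes the weight of a single edge; the normalization used here (dividing the number of matches and the rotation angle by assumed maxima) is only one possible choice and not necessarily the one used in the experiments.

    def edge_weight(n_matches, max_matches, theta_u_y, max_rotation, alpha=1.0, beta=1.0):
        """Weight of the edge between two neighboring key images, Eq. (1).
        n_matches / max_matches    -> normalized number of matches (s_bar),
        |theta_u_y| / max_rotation -> normalized rotation about the vertical axis."""
        s_bar = min(n_matches / float(max_matches), 1.0)
        rot_bar = min(abs(theta_u_y) / float(max_rotation), 1.0)
        return alpha * (1.0 - s_bar) + beta * rot_bar

    # e.g. edge_weight(120, 300, 0.35, 3.1416): many matches and little rotation -> low cost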
Thus, we estimate the relative rotation $\theta u_{i}^{*}$ between neighboring key images $\mathcal{I}_{i}$ and $\mathcal{I}_{i+1}$, which have associated camera reference frames $\mathcal{C}_{i}$ and $\mathcal{C}_{i+1}$, respectively. An option to recover the whole relative pose (with translation up to scale) between the frames $\mathcal{C}_{i}$ and $\mathcal{C}_{i+1}$ is by means of the decomposition of a geometric constraint, for instance, the homography matrix [32]. To estimate the homography matrix $\mathbf{H}_{i}^{*}$, only the key images $\mathcal{I}_{i}$ and $\mathcal{I}_{i+1}$ are needed. The relative transformation between $\mathcal{C}_{i}$ and $\mathcal{C}_{i+1}$ is encoded in the Euclidean homography as [33]:

$$\mathbf{H}_{i}^{*}=\mathbf{R}_{i}^{*}+\frac{\mathbf{t}_{i}^{*}}{d_{i}^{*}}\,\mathbf{n}_{i}^{*\top}, \tag{2}$$
where $\mathbf{R}_{i}^{*}$ and $\mathbf{t}_{i}^{*}$ are the rotation matrix and translation vector expressed in $\mathcal{C}_{i+1}$, $d_{i}^{*}$ is the distance from $\mathcal{C}_{i+1}$ to a plane $\pi$, and $\mathbf{n}_{i}^{*}$ is the unitary vector, expressed in $\mathcal{C}_{i+1}$, normal to $\pi$. According to (2), it is possible to decompose $\mathbf{H}_{i}^{*}$ to obtain $\mathbf{t}_{i}^{*}$ up to scale and $\mathbf{R}_{i}^{*}$. The rotation matrix $\mathbf{R}_{i}^{*}$ can then be parametrized by the axis/angle $\theta\mathbf{u}_{i}^{*}$. The second element of the vector $\theta\mathbf{u}_{i}^{*}$, related to the rotation around the $y$-axis of the camera (perpendicular to the motion plane), provides the required rotation between neighboring key images. In addition, such angular component is used to compute the angular velocity, which corresponds to one of the inputs of the WPG. The other two inputs are the longitudinal and lateral velocities, as will be explained in Section 6.

Fig. 1. Outline of the navigation strategy. Once the VM is available, the robot must localize itself in the VM by finding the image that best fits the current image. Then the visual path to the target image is found by a path planner. The robot navigates autonomously by following the visual path while avoiding static obstacles detected with a range sensor, which were absent during the teaching phase. The notation $\{\cdot\}$ stands for the reference frame of the robot $r$, camera $c$, obstacle $o$ and range sensor $s_{0}$.
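As an illustration of the decomposition step just described, the sketch below estimates the homography with OpenCV's standard RANSAC routine (rather than the virtual-plane formulation of [32]) and extracts the rotation about the camera's $y$-axis from one of the returned decompositions; disambiguation between the physically valid solutions is omitted for brevity.

    import cv2
    import numpy as np

    def rotation_between_key_images(pts_i, pts_ip1, K):
        """pts_i, pts_ip1: Nx2 arrays of matched pixel coordinates in I_i and I_{i+1};
        K: 3x3 camera intrinsic matrix. Returns the y-component of theta*u and one of
        the up-to-scale translation candidates."""
        pts_i = np.asarray(pts_i, dtype=np.float32)
        pts_ip1 = np.asarray(pts_ip1, dtype=np.float32)
        H, _ = cv2.findHomography(pts_i, pts_ip1, cv2.RANSAC, 3.0)
        _, rotations, translations, _ = cv2.decomposeHomographyMat(H, K)
        R = rotations[0]               # NOTE: selection among the candidate solutions omitted
        theta_u, _ = cv2.Rodrigues(R)  # axis/angle vector theta*u
        return float(theta_u[1]), translations[0]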
In this work, we rely on the homography model to assign the weights of the edges in the graph, to localize the robot and to formulate the visual control scheme. The homography, as a geometric constraint between images, makes the point matching process robust. This projective model is actually general, since it can be computed for three-dimensional non-planar scenes [32]. Besides, the homography model does not suffer from ill-conditioning for short baselines, unlike other geometric constraints such as the epipolar geometry [33]. In the scheme of [32], there is no need to verify the existence of a dominant plane in the scene; instead, a virtual plane is defined by selecting three non-collinear point features in the image. A practical suggestion is to automatically select the three points that maximize the area of the corresponding triangle in both images. Differently from the common homography matrix estimation, the scheme based on a virtual plane needs at least eight points (three reference points and five supplementary ones) to estimate the model.
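A brute-force sketch of this automatic selection is shown below for a single image (the same triplet would then be checked, e.g., against the corresponding triangle area in the second image; the helper name is illustrative).

    from itertools import combinations
    import numpy as np

    def select_virtual_plane_points(pts):
        """Return the indices of the three non-collinear matched points that maximize
        the area of the triangle they form in the image (O(n^3) exhaustive search)."""
        pts = np.asarray(pts, dtype=float)
        best_idx, best_area = None, 0.0
        for i, j, k in combinations(range(len(pts)), 3):
            a, b, c = pts[i], pts[j], pts[k]
            area = 0.5 * abs(np.cross(b - a, c - a))  # |2D cross product| is twice the triangle area
            if area > best_area:
                best_idx, best_area = (i, j, k), area
        return best_idx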
Although the homography model is a good option for extracting the required information directly from images in all stages of the proposed approach, the navigation scheme could handle other geometric constraints such as the fundamental matrix. However, the epipolar geometry is ill-conditioned for short baselines and planar scenes; a model selection between the homography and fundamental matrices could be formulated as in [34].

5. Visual localization and planning

Once the VM is available with the structure described in the previous section, and before starting the autonomous navigation, the robot must localize itself by comparing its current view with the set of memorized key images. Next, given a target image, a path planning stage is needed to find the sequence of key images connecting the image resulting from the localization and the target image. In both stages, we take advantage of the homography estimation for planar and non-planar scenes [32] based on the formulation of a virtual plane.
Algorithm 2: localizeRobot finds the most similar key image $\mathcal{I}_{1}^{*}$ in the graph with respect to the current image $\mathcal{I}$.
    Input: Graph of key images $\mathbf{G}=\{\mathbf{I}, \mathcal{E}\}$, current image $\mathcal{I}$, target image $\mathcal{I}_{n}^{*}$
    Output: Most similar key image $\mathcal{I}_{1}^{*}$
     1  $\widehat{\mathbf{I}}$ = findNodesNeighborhood($\mathcal{I}$, $\mathbf{G}$)  // size($\widehat{\mathbf{I}}$) = $\widehat{m}$
     2  for $j \leftarrow 1$ to $\widehat{m}$ do
     3      matches = match($\mathcal{I}$, $\widehat{\mathbf{I}}[j]$)
     4      if matches > $\mu$ then
     5          $\mathbf{H}_{j}$ = computeHomography(matches)
     6          ($\mathbf{R}_{j}$, $\mathbf{t}_{j}$) = decomposeHomography($\mathbf{H}_{j}$)
     7          $\mathbf{t}_{\pm j}$ = computeDirection($\mathbf{t}_{j}$)
     8          $\|\mathbf{t}_{j}\|$ = computeDistance($\mathbf{t}_{j}$)
     9          $\mathbf{D}[j]$ = saveImageData($\mathbf{t}_{\pm j}$, $\|\mathbf{t}_{j}\|$)
    10  $\mathbf{I}_{+}$ = selectForwardImages($\mathbf{D}$)
    11  $\mathbf{I}_{-}$ = selectBackwardImages($\mathbf{D}$)
    12  if size($\mathbf{I}_{+}$) > 0 then
    13      $\mathcal{I}_{1}^{*}$ = selectMostSimilar($\mathbf{I}_{+}$)
    14  else
    15      $\mathcal{I}_{1}^{*}$ = selectMostSimilar($\mathbf{I}_{-}$)
    16  return $\mathcal{I}_{1}^{*}$
    17  Function selectMostSimilar($\mathbf{I}_{\mu}$)
    18      ($\mathcal{I}_{\pm 1}^{*}$, $\mathcal{I}_{\pm 2}^{*}$) = selectTwoCandidates($\mathbf{I}_{\mu}$)
    19      if ($\mathcal{I}_{\pm 1}^{*}$, $\mathcal{I}_{\pm 2}^{*}$) belong to the same branch then
    20          matches = match($\mathcal{I}_{\pm 1}^{*}$, $\mathcal{I}_{\pm 2}^{*}$, $\mathcal{I}$)
    21          {$\mathbf{H}_{1}$, $\mathbf{H}_{2}$} = parametersFixedPlane(matches)
    22          {$\|\mathbf{t}_{1}\|$, $\|\mathbf{t}_{2}\|$} = computeDistance({$\mathbf{H}_{1}$, $\mathbf{H}_{2}$})
    23          $\mathcal{I}_{1}^{*}$ = selectTheClosest({$\|\mathbf{t}_{1}\|$, $\|\mathbf{t}_{2}\|$})
    24      else
    25          {$\mathbf{I}_{1}$, $\mathbf{I}_{2}$} = getShortestPath($\mathcal{I}_{\pm 1}^{*}$, $\mathcal{I}_{\pm 2}^{*}$, $\mathcal{I}_{n}^{*}$, $\mathbf{G}$)
    26          {$\mathbf{l}_{1}$, $\mathbf{l}_{2}$} = computePathLength({$\mathbf{I}_{1}$, $\mathbf{I}_{2}$})
    27          $\mathcal{I}_{1}^{*}$ = getImageWithShortestPath({$\mathbf{l}_{1}$, $\mathbf{l}_{2}$})
    28      return $\mathcal{I}_{1}^{*}$
We propose Algorithm 2 to perform the required appearance-based robot localization. To avoid an exhaustive comparison between the current image $\mathcal{I}$ and each key image in the VM, we propose a strategy to find a reduced neighborhood of key images surrounding $\mathcal{I}$. This process is performed in line 1, function findNodesNeighborhood. The underlying algorithmic strategy uses the divide-and-conquer paradigm [35]. First, the current image is compared with each of the "intersection" nodes, i.e., the nodes where the branches of the graph intersect. Then, each branch of the graph is divided in two to identify the node at the middle of the branch, which is compared with $\mathcal{I}$, and in the next iteration each half of the branch is divided again. Thus, the worst case happens when the algorithm iterates until the current image has eventually been compared with all existing nodes in the graph. The search stops when at least one of the comparisons gets a number of matches greater than $\mu$. This minimum number of matches ($\mu=8$) guarantees the computation of the homography [32]. Once the search terminates, a neighborhood of $\widehat{m}$ nodes surrounding this node is selected, and they are collected in a vector $\widehat{\mathbf{I}}$ of selected key images. The next step consists of finding the most similar key image to $\mathcal{I}$ among the reduced set $\widehat{\mathbf{I}}$. For this, match finds the matched features between the current image and all key images in $\widehat{\mathbf{I}}$.
The resulting matches allow the computation of the homography matrix and its decomposition in lines 5-6. Consequently, the estimation of the rotation and of the translation up to scale is obtained. Then, we classify the key images in two groups according to the resulting direction of the estimated translation vector $\mathbf{t}$, expressed in the current camera reference frame, which can be forward $\mathbf{I}_{+}$ or backward $\mathbf{I}_{-}$ with respect to the current image location. From the third component of $\mathbf{t}$, denoted by $t_{z}$, which points towards the motion direction (see the $\{c\}$ reference frame in Fig. 1), we assign a value in line 7 as $t_{\pm}=+1$ if $t_{z}>0$ or $t_{\pm}=-1$ if $t_{z}<0$.
Although the vector $\mathbf{t}$ is scaled, its norm $\|\mathbf{t}\|$ in line 8 gives a notion of distance between key images if the homography computation considers that the distance $d_{i}^{*}$ to the plane is constant (see Eq. (2)). The direction and distance parameters are saved in a vector $\mathbf{D}$ in line 9, from which the forward $\mathbf{I}_{+}$ and backward $\mathbf{I}_{-}$ groups of images are obtained (lines 10-11). Using the set $\mathbf{I}_{+}$, the function selectMostSimilar finds (if it exists) the most similar key image to the current one in terms of matched points. If there are no forward images, the algorithm selects the closest image among the backward set $\mathbf{I}_{-}$. Clearly, we prefer to localize the robot even if the most similar key image is behind the current location.
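Lines 7-11 of Algorithm 2 then amount to a simple classification and ranking. The sketch below ranks directly by the estimated distance $\|\mathbf{t}\|$; note that Algorithm 2 additionally pre-selects the two candidates with the most matches before the fixed-plane refinement, which is omitted here.

    import numpy as np

    def classify_and_rank(candidates):
        """candidates: list of (node_id, t) pairs, with t the up-to-scale translation from
        the homography decomposition expressed in the current camera frame.
        Returns forward images (t_z > 0) and backward images, each sorted by ||t||."""
        forward, backward = [], []
        for node_id, t in candidates:
            t = np.asarray(t, dtype=float).ravel()
            group = forward if t[2] > 0 else backward   # t_z points along the motion direction
            group.append((float(np.linalg.norm(t)), node_id))
        return sorted(forward), sorted(backward)

    # usage sketch: prefer the closest forward key image, fall back to the closest backward one
    # fwd, bwd = classify_and_rank(cands)
    # best_node = (fwd or bwd)[0][1]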
The function selectMostSimilar (lines 17-28) selects two candidate images ($\mathcal{I}_{\pm 1}^{*}$, $\mathcal{I}_{\pm 2}^{*}$), i.e., the two images with the highest number of point matches. Then, the algorithm verifies whether both candidates belong to the same branch of the graph $\mathbf{G}$. If that is the case, the algorithm selects the most similar key image using the relative distance $\|\mathbf{t}\|$ from a new homography decomposition. This process computes again $\mathbf{H}$ and $\mathbf{t}$ for the candidate images with respect to the current image, but now the virtual plane used for the homography estimation is fixed (lines 21-22). Thus, the estimated distances are consistent. This process is illustrated in Fig. 2. It consists in matching point features between the three images $\mathcal{I}$, $\mathcal{I}_{\pm 1}^{*}$ and $\mathcal{I}_{\pm 2}^{*}$. From the matched features, we select the three points which maximize the surface of the corresponding triangle in the three images [32]. Once the new distances $\{\|\mathbf{t}_{1}\|,\|\mathbf{t}_{2}\|\}$ are obtained, the candidate image $\mathcal{I}_{1}^{*}$ closest to the current one is selected (line 23), i.e., the image with smaller $\|\mathbf{t}\|$.
If the candidate images do not belong to the same branch of the graph, the algorithm selects the most similar key image aided by a path planner that finds the shortest path from each of the two candidates ($\mathcal{I}_{\pm 1}^{*}$, $\mathcal{I}_{\pm 2}^{*}$) to the target image $\mathcal{I}_{n}^{*}$ (lines 25-27). This consideration is needed to discard a localization that might result in a long path in the navigation stage. The paths associated to the candidate images are denoted $\mathbf{I}_{1}$, $\mathbf{I}_{2}$ and their corresponding lengths are $\mathbf{l}_{1}$, $\mathbf{l}_{2}$. The solution $\mathcal{I}_{1}^{*}$ of the localization is the candidate image associated to the path with the minimum length, which is obtained by the function getShortestPath. The length of a path is the sum of the values of its edges as defined in (1).
Once the robot is localized, i.e., $\mathcal{I}_{1}^{*}$ and $\mathcal{I}_{n}^{*}$ are known, the next stage is the visual path planning, which consists of finding the set of key images $\mathbf{I}^{*}=\{\mathcal{I}_{1}^{*}, \mathcal{I}_{2}^{*}, \ldots, \mathcal{I}_{n-1}^{*}, \mathcal{I}_{n}^{*}\}$ that links the starting and the desired key images of the VM. Thus, $\mathbf{I}^{*}$ is the path of minimum length in $\mathbf{G}$ that connects the nodes $\mathcal{I}_{1}^{*}$ and $\mathcal{I}_{n}^{*}$, and it is obtained by the function getShortestPath.
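With the VM stored as a weighted directed graph, getShortestPath reduces to a standard graph search; a minimal sketch with networkx, assuming the edge weights of Eq. (1):

    import networkx as nx

    def get_shortest_path(start_node, target_node, G):
        """Sequence of key-image nodes of minimum total edge weight (Dijkstra)."""
        return nx.dijkstra_path(G, start_node, target_node, weight="weight")

    # building the VM graph offline (sketch): one node per key image,
    # one weighted edge per pair of neighboring key images
    # G = nx.DiGraph()
    # G.add_edge(i, i + 1, weight=edge_weight(...))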
The localization problem addressed in this work can be seen as an image retrieval task. Several methods have been suggested in the literature for this purpose, for instance, methods based on bags of visual words [36-38], inverted multi-indexing [39], and nearest neighbor algorithms [40]. These sophisticated techniques can be useful to reduce the computational complexity of appearance-based localization for very large visual memories, i.e., when thousands of images have to be analyzed. For datasets of around one hundred images (as the examples in this paper), the proposed method represents a simple ad-hoc approach with very good performance and suitable computational time. Notice that the localization stage is carried out offline, and it is not necessary to obtain the result very fast.
We consider that all image retrieval methods, including our proposed method, are prone to false-positive detections, although none occurred in the experimental evaluation reported in Section 8. A way to reduce the occurrence of false positives could be a local voting scheme that considers only a set of neighbors of a candidate image, since more information improves the response at the cost of increased computation. Therefore, we kept the method as simple as possible while still achieving good performance in the localization task.

6. Visual path following

A visual path is composed of successive key images from the VM, i.e., $\mathbf{I}^{*}=\{\mathcal{I}_{1}^{*}, \ldots, \mathcal{I}_{k}^{*}, \ldots, \mathcal{I}_{n}^{*}\}$, where $k$ denotes the $k$th key image in $\mathbf{I}^{*}$. Thus, the path following problem consists of generating the CoM reference velocity given to the WPG, which regulates the error between the current and the corresponding key images along the visual path. The input of the problem is then the sequence of key images extracted from the VM, and the output is the humanoid walking that tracks the visual path.

6.1. The sequence of visual tasks

Let us define a visual error as:

$$\mathbf{e}_{k}=\mathbf{s}-\mathbf{s}_{k}^{*} \in \mathbb{R}^{6}, \tag{3}$$
where $\mathbf{s}$ and $\mathbf{s}_{k}^{*}$ are the current and desired visual features, respectively. For a given $k$, the regulation problem can be treated as a visual servoing task. Here, we propose to solve each visual task by means of a position-based visual servoing (PBVS); however, the problem can also be solved using an image-based visual servo (IBVS) controller, as demonstrated in [13]. The PBVS is chosen to maintain the same approach in the whole scheme, as the localization stage is based on the estimation of translation and rotation.
The visual features for the PBVS are expressed as:

$$\mathbf{s}=(\mathbf{t}, \theta \mathbf{u}) \in \mathbb{R}^{6},$$
where $\mathbf{t}$ is the translation up to scale between the reference frame $\mathcal{C}$ associated to the current camera pose and the $k$th key image frame $\mathcal{C}_{k}^{*}$. The rotation between those reference frames is represented by $\theta\mathbf{u}$, where the axis/angle parametrization is employed. We consider that the translation and rotation are expressed with respect to $\mathcal{C}_{k}^{*}$.

Fig. 2. Fixed virtual plane. The green circles (marked with numbers) are selected and maintained as features to form the same virtual plane. Left: the first candidate image $\mathcal{I}_{\pm 1}^{*}$. Center: the second candidate image $\mathcal{I}_{\pm 2}^{*}$. Right: the current image $\mathcal{I}$. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Therefore, $\mathbf{s}_{k}^{*}=0$ and $\mathbf{e}_{k}=\mathbf{s}$, which is obtained by homography decomposition. We impose an exponential behavior of the error dynamics as:

$$\dot{\mathbf{e}}_{k}=-\lambda \mathbf{e}_{k}, \quad \text{with } \lambda>0. \tag{4}$$
Since the translational and rotational velocities are decoupled, the camera twist is computed as in [9]:

$$\boldsymbol{v}_{c}=\begin{bmatrix}\mathbf{v}_{c}\\ \boldsymbol{\omega}_{c}\end{bmatrix}=\begin{bmatrix}-\lambda \mathbf{t}\\ -\lambda \theta \mathbf{u}\end{bmatrix} \in \mathbb{R}^{6}. \tag{5}$$
If the camera pose is accurately estimated, the control law (5) yields an exponentially stable error dynamics (4).
The resulting camera twist (5) could be used as the input to the WPG according to Fig. 3, with the aim of driving the humanoid robot to the location associated to $\mathcal{I}_{k}^{*}$ in $\mathbf{I}^{*}$. However, the abrupt change from the $k$th to the $(k+1)$th key image causes discontinuous velocity signals. In addition, it is assumed that the humanoid robot walks on a flat surface, which implies that three components of $\boldsymbol{v}_{c}$ are useless. Both issues are treated in the next sections.
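Given the homography decomposition, Eq. (5) is immediate to implement; a sketch follows (the gain value is illustrative), to which the transition gain of Section 6.2 is later added.

    import numpy as np

    def pbvs_twist(t, theta_u, lam=0.3):
        """Camera twist of Eq. (5): [-lambda * t, -lambda * theta_u] in R^6."""
        return np.concatenate((-lam * np.asarray(t, dtype=float).ravel(),
                               -lam * np.asarray(theta_u, dtype=float).ravel()))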

6.2. Smooth transitions between key images

The abrupt increment in the error function in (3), when the next key image is given as the new target, generates discontinuous camera twists [11,12]. Since the camera velocity signals are the input reference to guide the humanoid locomotion, it is important to cope with such an undesired behavior. To deal with that, we propose a controller that achieves smooth transitions when the next visual servoing task becomes active. The main ingredient is a time-dependent function $h(t)$ that varies from $h_{0}$ to 1 within a predefined transition interval. This function penalizes large errors at the beginning of each visual servoing task. The proposed control law written in terms of the PBVS becomes:

$$\boldsymbol{v}_{c}=\begin{bmatrix}-\lambda h(t)\, \mathbf{t}\\ -\lambda h(t)\, \theta \mathbf{u}\end{bmatrix} \in \mathbb{R}^{6}. \tag{6}$$
An adequate profile for the transition function $h(t)$ can be obtained by means of the following computation:

$$h(t)=\begin{cases}h_{0}+\frac{1}{2}\left(1-h_{0}\right)\left(1-\cos\left(\frac{\pi\left(t-t_{0}\right)}{t_{f}-t_{0}}\right)\right), & t_{0}\leq t\leq t_{f},\\ 1, & t>t_{f},\end{cases} \tag{7}$$
where t 0 t 0 t_(0)t_{0} and t f t f t_(f)t_{f} are the initial and final times of the transition function, respectively, and h 0 h 0 h_(0)h_{0} is a minimum value from which h ( t ) h ( t ) h(t)h(t) increases smoothly up to the unity. Therefore, after t f t f t_(f)t_{f}, the maximum value of the control gain λ λ lambda\lambda is applied. The minimum value of h ( t ) h ( t ) h(t)h(t) allows the robot to achieve a continuous motion, i.e., the robot does not stop at each target while the discontinuities in the velocities are significantly reduced. The duration of the transition function t f t 0 t f t 0 t_(f)-t_(0)t_{f}-t_{0} has to be defined adequately to ensure that the maximum control gain is applied at least during some time at each visual task.
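As a point of reference, the sketch below is a minimal Python version of the transition profile defined above; the default $h_{0}=0.2$ matches the value used in the experiments of Section 8, and returning $h_{0}$ before $t_{0}$ is an assumption made for completeness.

    import math

    def transition_gain(t, t0, tf, h0=0.2):
        """Smooth gain h(t): raised-cosine blend from h0 to 1 over [t0, tf]."""
        if t <= t0:
            return h0          # assumed value before the transition starts
        if t >= tf:
            return 1.0         # full control gain after the transition
        return h0 + 0.5 * (1.0 - h0) * (1.0 - math.cos(math.pi * (t - t0) / (tf - t0)))

The scaled PBVS twist is then obtained by multiplying both the translational and rotational error terms by this gain, as in the control law above.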
The underlying computation of (3) involves the point features $\mathbf{p}_{j}$ of the current image $\mathcal{I}$ that are matched with the points $\mathbf{p}_{j}^{*}$ of the corresponding key image $\mathcal{I}_{k}^{*}$ while the humanoid is walking. The matched points are used to compute the control law (6) from the homography decomposition. The switching to the next key image $k+1$ occurs when the mean squared error between the corresponding point features $\varepsilon$ remains below a threshold $T_{\varepsilon}$ over a finite number of iterations:

$$\varepsilon_{k}=\frac{1}{l} \sum_{j=1}^{l}\left\|\mathbf{p}_{j}-\mathbf{p}_{j}^{*}\right\|<T_{\varepsilon},$$
where $l$ is the number of corresponding point features for the $k$th visual task. The same steps are repeated for successive key images in $\mathbf{I}^{*}$ until the final target image $\mathcal{I}_{n}^{*}$ is reached.
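As an illustration of this switching test, a minimal NumPy sketch is shown below; p and p_star are hypothetical $l \times 2$ arrays of matched pixel coordinates, and in practice the condition should hold over several consecutive iterations rather than on a single frame.

    import numpy as np

    def key_image_reached(p, p_star, threshold=18.0):
        """Mean norm of the point-feature errors compared against T_eps (18 px in Section 8.2)."""
        eps_k = np.mean(np.linalg.norm(np.asarray(p) - np.asarray(p_star), axis=1))
        return eps_k < threshold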
Finally, to generate the locomotion from the visual control, we use a WPG that considers automatic footstep placements [17]. In particular, the WPG solves a linear model predictive control problem, and its input is the CoM reference velocity $\dot{\boldsymbol{x}}_{r}$. The output is used to generate the motion coordination and the gait by means of an inverse kinematics method [41]. To obtain the reference velocity $\dot{\boldsymbol{x}}_{r}$, the camera twist is expressed in the robot's CoM reference frame as described in [14].

7. Obstacle avoidance

The proposed strategy for avoiding unexpected static obstacles is based on the same task function approach that we have applied to define the sequence of visual tasks. Most of the time, the visual task is the only active task guiding the humanoid walking, unless an obstacle is detected. If this occurs, an obstacle avoidance task is smoothly activated as the primary task while the visual task decreases in hierarchy to become secondary, by projecting it onto the null-space of the primary task Jacobian. Depending on the size (width) of the obstacle measured by means of a range sensor mounted on the humanoid's head, the controller decides which obstacle avoidance task should be executed. The coarse approximation of the width and height of detected obstacles is the criterion for the humanoid to circumvent or to step over the obstacle. We assume that all the paths containing an obstacle remain feasible, i.e., obstacles can always be avoided by using one of these two options. The peripheral avoidance deforms the trajectory of the CoM without affecting the visual task. On the other hand, if the robot decides to step over the obstacle, then the controller smoothly drives the CoM velocity to zero to reconfigure the humanoid's posture for stepping over the obstacle. To perform such complex motion, several motion tasks become active. The set of active motion tasks is solved by the same hierarchical inverse kinematics method employed to generate the motion coordination for walking. It is important to note that during the obstacle avoidance the visual reference must remain in sight. Once the obstacle avoidance is accomplished, the visual task is smoothly switched to

Fig. 3. Visual control scheme for humanoids.
Algorithm 3: pathFollower allows the humanoid robot to autonomously follow a visual path.
    Input: Visual path \(\mathbf{I}^{*}=\left\{\mathcal{I}_{1}^{*}, \mathcal{I}_{2}^{*}, \ldots, \mathcal{I}_{n-1}^{*}, \mathcal{I}_{n}^{*}\right\}\), current image \(\mathcal{I}\)
    Output: Humanoid walking along the visual path
    \(k \leftarrow 1 \quad / /\) index of key images in the visual path
    \(N_{o} \leftarrow 0 \quad\) // count the number of obstacles
    while \(k \leq n\) do
        \(\mathcal{I}^{*} \leftarrow \mathbf{I}^{*}[k]\)
        matches \(=\operatorname{match}\left(\mathcal{I}^{*}, \mathcal{I}\right)\)
        H = computeHomography( matches )
        \(\mathbf{e}_{k}=\) visualTaskComputation \((\mathbf{H})\)
        if an obstacle is detected and \(e_{o} \leq 0\) then
            if the obstacle is new then
                \(N_{o}++\)
                \(\mathcal{E}_{o} \leftarrow \mathcal{E}_{o} \cup \operatorname{getEdgeFromVM}(\mathbf{G}, k)\)
            \(L_{o}=\) getObstacleSize \((\) )
            if \(L_{o}>A\) then
                inactivateWPG() // \(h(t)\) varies from 1 to 0
            else
                activateAvoidance ()\(/ / h_{t}(t)\) varies from 0 to 1
        else
            activateWPG()// \(h(t)\) varies from 0 to 1
            inactivateAvoidance ()\(/ / h_{t}(t)\) varies from 1 to 0
        \(\boldsymbol{v}_{c}=\) computeCameraTwist \(\left(\mathbf{e}_{k}\right) / /\) Eq. (15)
        if \(h(t)==0\) and \(e_{o} \leq 0\) then
            stepOverObstacle \(\left(x_{0}, \mathbf{e}_{k}\right.\), MotionTasks)
        ROBOTGAIT \(=\mathbf{W P G}\left(\boldsymbol{v}_{c}\right)\)
        \(\varepsilon_{k}=\) meanSquaredErrorComputation \(\left(\mathcal{I}^{*}, \mathcal{I}\right)\)
        \(\mathcal{I}=\) captureNewImage
        if \(\varepsilon_{k}<T_{\varepsilon}\) then
            \(k++\)
    stopHumanoid()
    return \(N_{o}, \mathcal{E}_{o}\)
be the primary task, and consequently it generates the necessary CoM reference velocity to reactivate the WPG. The summary of computations required to guide the humanoid robot through the visual path while avoiding obstacles is given in Algorithm 3.
Note that the output of this algorithm corresponds to the number of detected obstacles $N_{o}$ and the edges $\mathcal{E}_{o}$ where such obstacles were found. The obstacle width $L_{o}$ measured by the humanoid's range sensor is compared to a given value $A$ to decide which obstacle avoidance behavior should be executed. Thin obstacles are surrounded, while obstacles wider than $A$ are stepped over. In the first case, the distance from the robot to the obstacle $d_{o}$ and the difference $e_{o}$ of $d_{o}$ with respect to a security distance are used in the algorithm. More details of the variables and functions of the algorithm are explained in the next subsections.
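For illustration, the width test of Algorithm 3 reduces to a single comparison; a minimal sketch, with a hypothetical helper name and the threshold $A=0.1$ m used in the experiments, is:

    def select_avoidance_behavior(L_o, A=0.1):
        """Choose how to avoid a detected obstacle from its measured width L_o (meters)."""
        if L_o > A:
            return "step_over"    # wide obstacle: deactivate the WPG and reconfigure the posture
        return "circumvent"       # thin obstacle: peripheral avoidance deforms the CoM trajectory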
If an obstacle completely obstructs the path, an additional strategy would be needed. For instance, the humanoid could use its

visual memory (VM) for replanning a new path from reachable neighbor nodes with respect to its current location. Thus, the connectivity of the graph representing the VM and its number of branches are very important. In this work, however, we have assumed that obstacles can be avoided.

7.1. Obstacle localization and graph updating

The localization of an unexpected static obstacle in the VM is important for subsequent navigation tasks. The strategy consists of updating the cost of going from $\mathcal{I}_{i-1}$ to $\mathcal{I}_{i}$ when the obstacle is detected. Therefore, the cost encoded in the edge $\mathcal{E}_{i}$ that links both images increases as:

$$\mathcal{E}_{i}=W \mathcal{E}_{i},$$
where $W$ is a large weight assigned to the edge. In the case that the humanoid does not detect an obstacle that was previously labeled at $\mathcal{E}_{i}$, the initial cost is recovered. Clearly, the new cost affects the paths that contain $\mathcal{E}_{i}$. By means of this graph updating mechanism, subsequent queries in the VM will return less costly visual paths, i.e., as straight as possible and, if possible, without obstacles. This simple strategy is a way to take into account previous experiences of the robotic system and to exploit them in future navigation tasks [42].
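A minimal sketch of this updating rule is given below; the graph is represented as a dictionary mapping edges to costs, and the names (costs, base_costs, obstructed_edges) as well as the value of W are assumptions for illustration only.

    def update_edge_costs(costs, base_costs, obstructed_edges, W=100.0):
        """Penalize edges where an obstacle was observed; restore the nominal cost otherwise."""
        for edge, nominal in base_costs.items():
            costs[edge] = W * nominal if edge in obstructed_edges else nominal
        return costs

With the shortest-path queries used for planning (Dijkstra's algorithm, Section 8.1.3), the penalized edges are naturally avoided in subsequent navigations whenever an alternative branch exists.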

7.2. Peripheral avoidance

The task function to circumvent an obstacle is defined as:

$$e_{o}=d_{o}-d_{s} \in \mathbb{R},$$
where
$$d_{o}=\left\|\boldsymbol{x}_{r}-\boldsymbol{x}_{o}\right\|,$$
with $d_{s}$ being the security distance around the obstacle. The vectors $\boldsymbol{x}_{r}=\left[x_{r}\; y_{r}\right]^{T} \in \mathbb{R}^{2}$ and $\boldsymbol{x}_{o}=\left[x_{o}\; y_{o}\right]^{T} \in \mathbb{R}^{2}$ correspond to the relative robot and obstacle positions with respect to the current frame $\mathcal{C}$. By differentiating the task error (9) with respect to the vector $\boldsymbol{x}_{r}-\boldsymbol{x}_{o}$, we obtain its gradient:

$$\boldsymbol{g}_{o}=\left(\frac{\boldsymbol{x}_{r}-\boldsymbol{x}_{o}}{d_{o}}\right)^{T} \in \mathbb{R}^{1 \times 2}.$$
Then, by defining a common reference frame to express the obstacle avoidance and the visual tasks, e.g. the corresponding key image frame $\mathcal{C}_{k}^{*}$, we obtain:

$$\dot{e}_{o}=\boldsymbol{J}_{o} \boldsymbol{v}_{c},$$
where $\boldsymbol{J}_{o}$ represents the obstacle avoidance Jacobian:

$$\boldsymbol{J}_{o}=\left(\begin{array}{ll}\boldsymbol{g}_{o} & \mathbf{0}_{1 \times 4}\end{array}\right) \in \mathbb{R}^{1 \times 6}$$
and its corresponding null-space projector is given by:

$$\boldsymbol{Q}_{o}=\boldsymbol{I}-\boldsymbol{J}_{o}^{T} \boldsymbol{J}_{o} \in \mathbb{R}^{6 \times 6}.$$
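Assembling these quantities is straightforward; the NumPy sketch below is a minimal illustration under the definitions above, with x_r and x_o given as 2-vectors expressed in the common reference frame.

    import numpy as np

    def obstacle_task_matrices(x_r, x_o):
        """Distance, gradient, avoidance Jacobian and null-space projector."""
        diff = np.asarray(x_r, dtype=float) - np.asarray(x_o, dtype=float)
        d_o = np.linalg.norm(diff)                  # robot-obstacle distance
        g_o = (diff / d_o).reshape(1, 2)            # gradient of the task error
        J_o = np.hstack([g_o, np.zeros((1, 4))])    # obstacle avoidance Jacobian
        Q_o = np.eye(6) - J_o.T @ J_o               # projector onto null(J_o)
        return d_o, g_o, J_o, Q_o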
By activating the task (12) at the first hierarchical level, an abrupt change in the reference velocity of the robot’s CoM occurs (see [43] and [7]). This is a well-known problem in the visual control community where some solutions have been suggested, e.g. in [44] and [15]. In particular, we adopted the intermediate value strategy introduced in [45]. By handling smooth task transitions, the camera twist is computed as:
$$\begin{aligned} \boldsymbol{v}_{c} & =h(t)\left(\boldsymbol{v}_{o}^{\prime}+\boldsymbol{v}_{\mathrm{v} \mid 0}^{\prime}\right), \\ \boldsymbol{v}_{o}^{\prime} & =\boldsymbol{J}_{o}^{T} \dot{e}_{o}^{\prime}, \\ \boldsymbol{v}_{\mathrm{v} \mid 0}^{\prime} & =\boldsymbol{Q}_{o}\left(-\lambda_{\mathrm{v}} \mathbf{s}-\boldsymbol{v}_{o}^{\prime}\right), \end{aligned}$$
where
$$\dot{e}_{o}^{\prime}=-\lambda_{o} h_{\mathrm{t}}(t) e_{o}-\left(1-h_{\mathrm{t}}(t)\right) \boldsymbol{J}_{o} \lambda_{\mathrm{v}} \mathbf{s}$$
encodes the intermediate desired values. The twist to circumvent the obstacle is $\boldsymbol{v}_{o}^{\prime}$, and $\boldsymbol{v}_{\mathrm{v} \mid 0}^{\prime}$ is the resulting twist for achieving the visual task without perturbing the execution of the obstacle avoidance task. The gains $\lambda_{o}$ and $\lambda_{\mathrm{v}}$ are related to the obstacle and visual tasks, respectively. The smooth transition function $h_{\mathrm{t}}(t)$ varies within the interval $0 \leq h_{\mathrm{t}}(t) \leq 1$. Thus, $h_{\mathrm{t}}(t)=0$ implies that $e_{o}>0$, and the obstacle avoidance task is not active. Otherwise, $e_{o} \leq 0$, and the function $h_{\mathrm{t}}(t)$ smoothly increases up to its maximum value while the obstacle avoidance task becomes active. The stability proof of the control law (15) is reported in [7].
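A compact sketch of this blended control law is given below (NumPy). Here s stands for the stacked 6-dimensional visual error used by the PBVS law, h and h_t are the two transition gains, and the scalar default gains are a simplification of the per-component gains reported in Section 8.2; the flat-vector handling is an assumption of this illustration.

    import numpy as np

    def blended_camera_twist(h, h_t, e_o, s, J_o, Q_o, lam_o=0.6, lam_v=0.11):
        """Blend the obstacle avoidance (primary) and visual (secondary) tasks."""
        s = np.asarray(s, dtype=float).reshape(6)
        # Intermediate desired value for the avoidance task error (Eq. (16)).
        e_dot_o = -lam_o * h_t * e_o - (1.0 - h_t) * float(J_o @ (lam_v * s))
        v_o = J_o.flatten() * e_dot_o               # twist that circumvents the obstacle
        v_v = Q_o @ (-lam_v * s - v_o)              # visual task projected onto null(J_o)
        return h * (v_o + v_v)                      # blended camera twist (Eq. (15))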

7.3. Step over the obstacle

This obstacle avoidance behavior is achieved by solving several hierarchical quadratic programs (HQP) at the kinematic level. Each HQP is formulated as in [46]:
$$\begin{array}{ll} \min _{\dot{\boldsymbol{q}}_{i}, \boldsymbol{w}_{i}} & \frac{1}{2}\left\|\boldsymbol{w}_{i}\right\|^{2} \\ \text { s.t. } & \boldsymbol{b}_{i}^{l} \leq \boldsymbol{J}_{i} \dot{\boldsymbol{q}}_{i}-\boldsymbol{w}_{i} \leq \boldsymbol{b}_{i}^{u} \\ & \overline{\boldsymbol{b}}_{i}^{l} \leq \overline{\boldsymbol{J}}_{i-1} \dot{\boldsymbol{q}}_{i} \leq \overline{\boldsymbol{b}}_{i}^{u} \end{array}$$
where $\dot{\boldsymbol{q}} \in \mathbb{R}^{n}$ is the humanoid's joint velocity vector, $\boldsymbol{w} \in \mathbb{R}^{m}$ is a vector of slack variables used to relax the infeasible constraints at hierarchical level $i$, $\boldsymbol{J}_{i}$ is the task Jacobian, and $\boldsymbol{b}_{i}^{l}$ and $\boldsymbol{b}_{i}^{u}$ represent the corresponding lower and upper bounds, respectively. Note that the solution of higher hierarchical tasks is maintained by adding them in:

$$\overline{\boldsymbol{J}}_{i}=\binom{\overline{\boldsymbol{J}}_{i-1}}{\boldsymbol{J}_{i}}, \quad \overline{\boldsymbol{b}}_{i}(\boldsymbol{q})=\binom{\overline{\boldsymbol{b}}_{i-1}}{\boldsymbol{J}_{i} \dot{\boldsymbol{q}}_{i}^{*}+\boldsymbol{w}_{i}^{*}}$$
where $\dot{\boldsymbol{q}}_{i}^{*}$ is the optimal solution at level $i$. The active inequalities at level $i$ are also added to (18). It is important to mention that a dedicated HQP solver can be employed to dramatically reduce the computational cost [41].
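The bookkeeping of this cascade is illustrated below: after solving level $i$, its Jacobian and its achieved value are appended to the stacked constraints passed to the next level. The QP solve itself is abstracted away, so the sketch only shows the stacking step, under assumed array shapes.

    import numpy as np

    def stack_solved_level(J_bar, b_bar, J_i, qdot_star, w_star):
        """Append level i to the stacked bounds of the HQP cascade (Eq. (18))."""
        b_i = J_i @ qdot_star + w_star              # value achieved by level i
        J_bar = J_i if J_bar is None else np.vstack([J_bar, J_i])
        b_bar = b_i if b_bar is None else np.concatenate([b_bar, b_i])
        return J_bar, b_bar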
The step over behavior corresponds to a quasi-static motion since the projection of the humanoid's CoM on the floor is constrained to move inside the support polygon defined by the boundary of the contact area. In other words, the center of pressure criterion is not considered for this task. The whole-body motion is composed of a sequence of several ordered arrays of hierarchical tasks. Each array contains the visual task at the lowest hierarchy.
In Algorithm 3, MotionTasks is an input to the step over obstacle strategy. In particular, it refers to the ordered array of hierarchical tasks to maintain the contact of the humanoid feet on the floor, to generate the motion of the CoM inside the support polygon for both the single and double support stages, and to track the stepping trajectory. In addition, several inequality constraints are also imposed for joint position and velocity limits as well as for avoiding self-collisions (e.g., the humanoid hands are constrained to move within a safe region of the task space). Each motion task is expressed as a linear differential system to be solved by means of (17).

8. Experimental evaluation
8. 实验评估

We implemented the proposed navigation scheme on a NAO humanoid robot. The top camera of the NAO's head was used as the main sensor in the experiments, and an RGB-D camera was mounted on the robot to detect obstacles. We divided the experimental evaluation into two parts, one related to the proposed appearance-based localization and the other related to the integration of the whole humanoid navigation scheme, i.e., localization and planning, path following with obstacle avoidance and graph updating.
For all the experimental evaluations, the images captured by the humanoid were obtained with a resolution of $640 \times 480$ pixels. The image features were acquired as follows: first, a corner detector based on [47] was used, which is implemented in the function goodFeaturesToTrack of the OpenCV library. Then, we assigned a SURF descriptor [48] to each detected point. To estimate the homography model, a robust matcher based on RANSAC matched all the points between the current image and the corresponding key image. This matching procedure was used in the localization and path following stages to initialize the points to be tracked. The HLM function of the ViSP library [49] was used to compute the homography for planar and non-planar scenes [32]. All the poses of the robot during the experiments were captured with an Optitrack system to obtain ground truth for the navigation scheme.

8.1. Localization

In this section, we present an evaluation of the localization method detailed in Algorithm 2.

8.1.1. Boundary of the initialization around a key image

First, we evaluated the boundaries around a key image where the robot can be initialized, i.e., the region where the localization algorithm works effectively. For this evaluation, we proposed two experimental settings for near and far scenes, respectively. To obtain the working boundaries, we varied the robot placement (position and orientation) with respect to a key image, as shown in Fig. 4, to obtain what we call evaluation poses.
We defined two measurements for estimating the working boundary. One is the number of points matched between the key image and the images taken at the evaluation poses. The second measurement is the dispersion or distribution of the matched points in the image plane; the greater the dispersion, the greater the probability of good localization and navigation. To calculate a degree of dispersion, we build a matrix $\mathbf{U}$ that contains the set of matched points, previously centered around their mean values $(\bar{u}, \bar{v})$. The eigenvalues of the symmetric matrix $\mathbf{U}^{T} \mathbf{U}$ (calculated through its determinant) span a quadratic form whose area with respect to the total number of pixels can be related to the dispersion $d$ of the set as follows:

$$d=\frac{\pi \sqrt{\operatorname{det}\left(\mathbf{U}^{T} \mathbf{U}\right)}}{4 u_{c} v_{c}},$$
where $\left(u_{c}, v_{c}\right)$ are the coordinates of the image center.
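A minimal NumPy sketch of this dispersion measure is shown below; pts is an $l \times 2$ array of matched pixel coordinates, and the default image center corresponds to the $640 \times 480$ resolution used in the experiments.

    import numpy as np

    def dispersion(pts, u_c=320.0, v_c=240.0):
        """Dispersion of matched points: area of the spanned quadratic form, normalized."""
        U = np.asarray(pts, dtype=float)
        U = U - U.mean(axis=0)                  # center around (u_bar, v_bar)
        return np.pi * np.sqrt(np.linalg.det(U.T @ U)) / (4.0 * u_c * v_c)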

Due to the homography computation for non-planar scenes, we know that the localization algorithm and the control law need at least $\mu>8$ matches to work effectively. Additionally, we set a minimum dispersion $d=20$ (defined experimentally) to estimate the working regions shown in Fig. 4. The black arc represents the region where the robot can be correctly initialized with respect to the key image. As expected, when the scene is far, the region in which the robot can start to navigate is bigger than when the scene is near. Being conservative, we concluded that, in general, the boundary of the region in which the robot is adequately initialized corresponds to a semicircle of radius 0.75 m behind a key image, considering the forward motion direction.

Fig. 4. Boundary evaluation around a key image. The red triangles represent the poses of the robot where the key image was taken. Above: the boundary and the key image for a near scene. Below: the boundary and the key image for a far scene. The green triangles are some samples of the evaluation poses. The blue cones represent the range of orientations for these samples; they varied over $-30^{\circ}, -15^{\circ}, 0^{\circ}, 15^{\circ}$ and $30^{\circ}$. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 1
Evaluation of the localization for single paths.

| Path | Forward | Backward |
| :--- | :--- | :--- |
| Line | 18 | 1 |
| "S" | 16 | 3 |
| Ellipse | 19 | 6 |
| Total (%) | 53 (84) | 10 (16) |
In these results, and in general for the experimental evaluation, we request 1000 feature points from the detector goodFeaturesToTrack, with the additional requirement of a minimum distance of 31 pixels between points (the minDistance parameter of the function goodFeaturesToTrack) for images of $640 \times 480$. According to our results, we consider that this strategy for point detection has been enough to achieve a correct distribution of point features and, consequently, a good boundary for the initialization. However, a strategy like the one in [34] to force a uniform layout of points over the frames could also be used.
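For reference, a minimal OpenCV (Python) sketch of this detection and description step is given below. It assumes the contrib build of OpenCV (cv2.xfeatures2d) for SURF; the corner count and minimum distance follow the text, while the quality level and keypoint size are illustrative assumptions.

    import cv2

    def detect_and_describe(gray):
        """Shi-Tomasi corners (goodFeaturesToTrack) described with SURF."""
        corners = cv2.goodFeaturesToTrack(gray, maxCorners=1000,
                                          qualityLevel=0.01, minDistance=31)
        keypoints = [cv2.KeyPoint(float(x), float(y), 31) for [[x, y]] in corners]
        surf = cv2.xfeatures2d.SURF_create()    # requires opencv-contrib-python
        keypoints, descriptors = surf.compute(gray, keypoints)
        return keypoints, descriptors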

8.1.2. Localization in one branch

First, we present the evaluation of the localization Algorithm 2 for the case when there is only one branch or visual path. Thus, the localization process is performed by lines 19-23 of Algorithm 2, which select the closest key image $\mathcal{I}_{1}^{*}$ to the current image $\mathcal{I}$.
We applied Algorithm 2 to each path shown in Fig. 5, but performing exhaustive comparisons in the localization process (line 1 of Algorithm 2 is not used). The results are shown in Table 1. We classified the results into two categories: the case when the closest key image was found ahead of the robot, denoted as "Forward", and the case when the closest key image was found behind the robot, denoted as "Backward". We observed that the localization algorithm always found a solution. In particular, the resulting images were forward in more than $80\%$ of the trials. Clearly, it is preferred to perform a forward navigation from the beginning of the autonomous motion. However, in $16\%$ of the cases the key images were found behind the robot.

8.1.3. Localization aided by planning

In these experiments, the localization was carried out with the graph depicted in Fig. 6. In this case, the two candidate images ($\mathcal{I}_{\pm 1}^{*}, \mathcal{I}_{\pm 2}^{*}$) referred to in line 18 of Algorithm 2 might be in the same branch or in different branches of the graph. The VM contains 89 key images (see Fig. 6). Some branches of the VM are connected by "Intersection" nodes. For the planning algorithm, we decided to use Dijkstra's algorithm in getShortestPath to obtain the minimum length path.
The localization requires the homography matrix decomposition to obtain the scaled translation vector $\mathbf{t}$. An efficient algorithm to decompose $\mathbf{H}$ is suggested in [50]. Such a decomposition generates two geometrically valid solutions, where only one of them is physically admissible. The correct solution can be selected by taking the normal vector whose third component ($n_{z}$) is the largest. Although the vector $\mathbf{t}$ is scaled, it gives the direction and a notion of distance of the key images with respect to the current image if the virtual plane is fixed as described in Section 5.
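As an illustration of this selection rule, the OpenCV-based sketch below decomposes a homography with cv2.decomposeHomographyMat (which may return up to four candidate solutions) and keeps the one whose plane normal has the largest $n_{z}$. This is a hedged approximation of the procedure of [50], not the authors' exact implementation; K is the calibrated camera matrix.

    import cv2

    def scaled_translation_from_homography(H, K):
        """Pick the (R, t, n) candidate with the largest n_z from the decomposition of H."""
        _, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
        best = max(range(len(normals)), key=lambda i: float(normals[i][2]))
        return rotations[best], translations[best], normals[best]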
In Fig. 7 we present three cases of localization and planning. The first case is shown in Fig. 7 (left), where the robot starts near one branch (red triangle). The two candidate key images $\mathcal{I}_{\pm 1}^{*}$ and $\mathcal{I}_{\pm 2}^{*}$ (orange circle and gray square) are in the same branch, and the localization algorithm selects the closest key image $\mathcal{I}_{1}^{*}$ (orange circle). Once the robot is localized, getShortestPath finds the minimum length path $\mathbf{I}^{*}$ to the target image $\mathcal{I}_{n}^{*}$ (magenta triangle at top). The second and third cases are shown in Fig. 7 (center and right). The robot starts between two branches of the graph (red triangle) such that the two candidate key images belong to different branches (orange circle and gray square). In these cases, the localization algorithm is aided by the path planner to select the most similar key image while taking into account the shortest path to the target image. As can be seen in Fig. 7 (center), the shortest path to the target image (magenta triangle) corresponds to the key image on the right (orange circle), but if the target image changes, as observed in Fig. 7 (right), the algorithm selects the key image on the left (gray square). Hence, the localization avoids long paths for initializing the robot navigation.

Fig. 5. Results of localization in one branch. We used three visual paths: a straight line, an S-like path and an elliptic path. The green triangles represent the robot's poses where key images were taken. The blue triangles represent the evaluation poses where the localization found forward images. The red pentagons represent results of localization for backward images. All this information was captured with the Optitrack system. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Navigation graph and examples of key images. The green triangles represent the pose of the robot where each key image (node) was taken. The black pentagons are intersection nodes that connect the branches of the graph. Some images that form the VM are shown at the right.
We evaluated the CPU time dedicated to solving the localization by comparing the divide-and-conquer strategy using line 1 of Algorithm 2 against the exhaustive search for the same graph of Fig. 6. This evaluation was done offline using a laptop with an Intel Core i7 CPU at 2.20 GHz and 8.00 GB of RAM. The results are shown in Table 2 for 18 trials with different initialization images. Since the termination condition of the function findNodesNeighborhood depends on $\mu$, we decided to evaluate the localization with 8 and
16 matches. By setting a higher value of $\mu$, the localization retrieved better key images in terms of common visual information. Also, the computation time does not increase significantly compared to the exhaustive procedure.

8.2. Visual navigation with obstacle avoidance

We evaluated the performance of the visual path following scheme described in Algorithm 3. Notice that the input of this stage

Fig. 7. Evaluation of the localization and planning. Left: Case when the two candidate images are in the same branch of the graph. Center and right: Cases when the two candidate images are in different branches.
Table 2

| Method | Compared nodes | Time (s) |
| :--- | :--- | :--- |
| Division $\mu=8$ | 22.6 (Average) | 1.94 (Average) |
| Division $\mu=16$ | 26.6 (Average) | 2.27 (Average) |
| Exhaustive | 89 | 7.3 |
is the sequence of key images of the planned visual path. For this experiment, we used the initialization procedure for image features described at the beginning of this section. During the locomotion, we used a tracking algorithm based on a sparse iterative version of the Lucas-Kanade optical flow in pyramids, implemented in the function calcOpticalFlowPyrLK of OpenCV. We evaluated experimentally the performance of the visual tracker with the NAO robot subject to the effect of the sway motion generated by the robot's locomotion. This issue has been addressed in the literature by canceling the oscillations in the feedback error [5], or by filtering the measurements [18]. In our case, we do not focus on canceling the sway motion; we just tried to mitigate its effect by ensuring a good tracking of point features in spite of this unavoidable robot motion. This was achieved by tuning the point tracker with 3 pyramid levels and an appropriate window size of 31 pixels.
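A minimal OpenCV (Python) call matching these tracker settings might look as follows; prev_pts is the $N \times 1 \times 2$ float32 array of points initialized by the detection step, and interpreting "3 pyramids" as maxLevel=3 is an assumption of this sketch.

    import cv2

    def track_points(prev_gray, curr_gray, prev_pts):
        """Pyramidal Lucas-Kanade tracking with a 31-pixel window."""
        curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts, None,
            winSize=(31, 31), maxLevel=3)
        good = status.ravel() == 1
        return prev_pts[good], curr_pts[good]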
Since the robot's translation and rotation are decoupled, different control gains were used in the control law (15): $\lambda_{\mathrm{v}}=0.11$ for frontal translation, $\lambda_{\mathrm{v}}=0.055$ for lateral translation, $\lambda_{\mathrm{v}}=0.6$ for rotation, and $\lambda_{o}=0.6$. We set the transition time interval for $h_{\mathrm{t}}(t)$ in (16) as $t_{t,f}-t_{t,0}=10\,\mathrm{s}$ for the obstacle peripheral avoidance. The duration of the transition function $h(t)$ was set experimentally to $t_{f}-t_{0}=3.5\,\mathrm{s}$ to mitigate the discontinuities of the walking velocities when the robot switches to the next key image for computing the visual error. Also, we defined experimentally $h_{0}=0.2$ to keep a minimal walking velocity along the robot navigation. It is worth emphasizing that a small threshold $T_{\varepsilon}$ for switching to the next key image was not always the best option to obtain a natural robot motion during navigation. Indeed, we have achieved a good behavior by fixing $T_{\varepsilon}=18$ pixels.
We show the behavior of the robot for the whole navigation scheme according to Algorithm 1. Fig. 8 shows the performance of the whole navigation scheme using the humanoid robot NAO. The

red pentagon depicts the initial pose of the robot, and the light blue pentagon is the most similar key image that Algorithm 2 returned. The target key image is the gray triangle, and the orange triangle is the final pose of the robot during the experiment. The planned visual path contained 18 key images over a total distance of 4.7 m.

8.2.1. Closed-loop path

The behavior of the robot for a navigation in a loop is shown in Fig. 9, i.e., the target key image $\mathcal{I}_{n}^{*}$ is the most similar key image $\mathcal{I}_{1}^{*}$ (both are shown as the light blue pentagon) given by the localization algorithm. Therefore, the robot starts (red pentagon) and ends (orange triangle) near the location of the target image. The blue triangles refer to the path of 38 images followed by the robot over a distance of approximately 9 m. It can be seen that the proposed scheme effectively guided the robot to follow a path in closed loop.

8.2.2. Obstacle avoidance and graph updating

To evaluate the obstacle avoidance and graph updating mechanisms, we proposed four experiments with four different paths for which static obstacles appeared during the execution of the visual path following stage.
Fig. 10 shows the detection of two types of obstacles with distance filters on images obtained from the RGB-D sensor. To distinguish between thin (peripheral avoidance) and wide (stepping over) obstacles, we defined $A=0.1\,\mathrm{m}$ in line 13 of Algorithm 3. Since our aim was to show the performance of the robot navigation, the shape of the obstacles as well as the image processing method for object segmentation are rather simple.
Fig. 11 shows the first experiment of navigation with obstacle avoidance, where the blue triangles refer to the planned path toward the target key image (gray triangle). In this case, the obstacle length was $L_{o}<A$, therefore the robot surrounded it. The black dotted circle represents the security distance $d_{s}=0.13\,\mathrm{m}$; if it is violated, then the smooth transition control of Eq. (15) is activated using $h_{\mathrm{t}}(t) \rightarrow 1$. The brown triangles indicate the nodes associated to the edge $\mathcal{E}_{i}$ (represented by the red arrow) where the robot located the obstacle during the experiment, i.e., the edge that will be updated with a large weight for the next navigation.
Fig. 12 illustrates the second experiment performed with the updated graph. Although the target key image (gray triangle) was

Fig. 8. Performance of the autonomous robot navigation. The magenta line shows the motion of the robot during the navigation. As can be seen, the robot gets close to the target key image (gray triangle) with an error $<10$ cm. The images on the right from top to bottom are: the current image $\mathcal{I}$ at the initial pose (red pentagon), the closest key image $\mathcal{I}_{1}^{*}$ (light blue pentagon), the current image at the final pose (orange triangle) and the target key image $\mathcal{I}^{*}$ (gray triangle). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. Closed-loop path. The magenta line shows the motion of the robot during the navigation. In this case, the target key image $\mathcal{I}_{n}^{*}$ and the most similar key image $\mathcal{I}_{1}^{*}$ are depicted by the same light blue pentagon. The images on the right from top to bottom are: the current image $\mathcal{I}$ at the initial pose (red pentagon), the target key image $\mathcal{I}_{n}^{*}$ (light blue pentagon) and the current image at the final pose (orange triangle). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. Detection of the two types of obstacles (short and long) with the ASUS Xtion RGB-D sensor. Top: small thin obstacle (the robot is able to surround it). Bottom: wide obstacle (the robot is not able to surround it, but it is able to step over it, for example a fence).

Fig. 11. First navigation with obstacles: peripheral avoidance. The magenta line shows the motion of the robot during navigation with peripheral avoidance. The dotted black circle represents the security distance $d_{s}$ that activates the avoidance. The red arrow represents the edge $\mathcal{E}_{i}$ where the robot located the obstacle in memory. The images on the right show some snapshots of the robot during the peripheral avoidance. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

the same as in the experiment of Fig. 11, the resulting path (blue triangles) was different. This is due to the higher cost encoded in the previously updated edge where the obstacle avoidance occurred (see Fig. 11). Concerning the second experiment, the obstacle avoidance condition $L_{o}>A$ was true. Hence, the robot

stepped over the obstacle. The dotted black line in Fig. 12 represents $d_{s}=0.13\,\mathrm{m}$, which was violated according to Algorithm 3, and the stepOverObstacle function handled the robot motion execution. The red arrow and the brown triangles show the localization of the obstacle during the experiment. In this case, the


Fig. 12. Second navigation with obstacles: stepping over avoidance. The magenta line shows the motion of the robot during navigation with stepping over avoidance. The solid and dotted black lines represent the position of the obstacle and the security distance $d_{s}$, respectively. The red arrow shows the edge $\mathcal{E}_{i}$ where the robot located the large obstacle during the experiment. The black arrow represents the weighted edge $W\mathcal{E}_{i}$ where the robot located the short obstacle in Fig. 11. The images at the bottom show some snapshots of the robot during the stepping over avoidance. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

corresponding edge was also updated for future navigation. As it can be observed in Fig. 12, the edge detection did not reflect the proper location of the obstacle because the localization process uses the topology of the graph, which is not spatial. However, according to our experiments, the localization of the obstacles has an error of no more than one neighboring edge, which is sufficient to detect and to avoid paths with obstacles. Thus, the information of the localized obstacles is used to update the graph and to avoid planning paths containing obstacles as much as possible in future navigations.
We have included a video as a multimedia extension of the paper, in which most of the experiments reported herein are shown during their execution. The video also shows experiments in cluttered and corridor-like environments, where the robot performs lateral motion, since in those cases, it is demanded by the VM. These experiments demonstrate that the proposed scheme allows to consider the holonomic nature of humanoid robots, providing a versatile navigation solution.

9. Conclusions

In this paper, we have proposed a vision-based navigation scheme for humanoid robots that relies on a topological representation of the environment known as visual memory. Such environment model is constructed in a human-guided teaching phase, in which a set of key images are captured and organized as a directed graph. We have shown that this graph is enough to qualitatively solve the robot localization given the current image from the robot’s camera. The proposed appearance-based localization method finds the most similar key image with respect to the current image in terms of common visual information with sufficient accuracy. Once the robot is localized and given a desired target key image associated to the desired robot location, the visual path planner returns the sequence of key images to reach the desired location. The experiments have shown that the humanoid is able to follow the planned visual path. The visual-servo controller,

that only relies on 2D information, drives the robot to a vicinity of the location associated to a target key image, which is sufficient for a navigation task, rather than reaching the target location or following the path with high accuracy.
It is clear that in static scenarios, the navigation based on a visual memory naturally generates obstacle-free paths, since obstacles are taken into account in the initial supervised navigation stage. However, it is important to deal with changes in the visual memory due to obstacles that were not initially present. Thus, we have introduced an obstacle avoidance task for stepping over and circumventing obstacles. We have shown that the humanoid robot is able to avoid obstacles while keeping the performance of the visual path following. This has been handled by means of the task-based control framework, in which the transitions of successive and hierarchical tasks are performed smoothly to keep the balance of the humanoid robot during navigation. In addition, we have shown that the humanoid localization and visual path following can be completely based on 2D information by taking advantage of the homography for planar and non-planar scenes. Moreover, the proposed navigation scheme can be extended to rely on other geometric constraints like the epipolar geometry or the trifocal tensor, since they can provide the estimated translation and rotation required by the controller. In that case, it might be useful to combine models. Although in our results the obstacle avoidance strategy relies on depth measurements of an RGB-D sensor, this could be tackled in the future by using only the monocular camera.
An experimental evaluation of the different stages of the navigation, as well as of their integration, has been reported in this paper using the NAO platform. We have shown experimentally that our navigation scheme works effectively in different indoor environments such as corridors and uncluttered or cluttered environments. We have taken advantage of the motion capabilities of humanoid robots, since no motion constraints are imposed by the navigation scheme, allowing the robot to walk in any direction as required by the visual path and the obstacle avoidance strategy.
As future work, we plan to extend the obstacle avoidance strategy to deal with moving obstacles and to handle very large visual

memories. Additionally, since there exist several visual SLAM techniques based on keyframes, one could try to combine the proposed navigation scheme with those methods in order to take advantage of both the 2D and 3D worlds.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.robot.2018.08.010.

References

[1] A.H. Javadi, B. Emo, L. Howard, F. Zisch, Y. Yu, R. Knight, J.P. Silva, H.J. Spiers, Hippocampal and prefrontal processing of network topology to simulate the future, Nature Commun. (2017).

[2] Y. Matsumoto, M. Inaba, H. Inoue, Visual navigation using view-sequenced route representation, in: IEEE Int. Conf. on Robotics and Automation, 1996, pp. 83-88.

[3] J. Courbon, Y. Mezouar, P. Martinet, Indoor navigation of a non-holonomic mobile robot using a visual memory, Auton. Robots 25 (2008) 253-266.

[4] J. Courbon, Y. Mezouar, P. Martinet, Autonomous navigation of vehicles from a visual memory using a generic camera model, IEEE Trans. Intell. Transp. Syst. 10 (3) (2009) 392-402.

[5] C. Dune, A. Herdt, O. Stasse, P.B. Wieber, K. Yokoi, E. Yoshida, Cancelling the sway motion of dynamic walking in visual servoing, in: IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2010, pp. 3175-3180.

[6] M. García, O. Stasse, J.B. Hayet, C. Dune, C. Esteves, J.P. Laumond, Vision-guided motion primitives for humanoid reactive walking: decoupled versus coupled approaches, Int. J. Robot. Res. 34 (4-5) (2014) 402-419.
[6] M. García, O. Stasse, J.B. Hayet, C. Dune, C. Esteves, J.P. Laumond,基于视觉引导的类人反应性行走运动原语:解耦与耦合方法,国际机器人研究杂志,34 卷(4-5 期)(2014)402-419。

[7] J. Delfin, H.M. Becerra, G. Arechavaleta, Visual servo walking control for humanoids with finite-time convergence and smooth robot velocities, Internat. J. Control 89 (7) (2016) 1342-1358.
[7] J. Delfin, H.M. Becerra, G. Arechavaleta,具有有限时间收敛和平滑机器人速度的类人视觉伺服行走控制,国际控制杂志,89 卷(7 期)(2016)1342-1358。

[8] G. Lopez-Nicolas, N.R. Gans, S. Bhattacharya, C. Sagues, J.J. Guerrero, S. Hutchinson, Homography-based control scheme for mobile robots with nonholonomic and field-of-view constraints, IEEE Trans. Syst. Man Cybern. B 40 (4) (2010) 1115-1127.
[8] G. Lopez-Nicolas, N.R. Gans, S. Bhattacharya, C. Sagues, J.J. Guerrero, S. Hutchinson,基于单应性的移动机器人控制方案,适用于非完整约束和视场限制,IEEE 系统、人类与控制论学报 B 卷,40 卷(4 期)(2010)1115-1127。

[9] F. Chaumette, S. Hutchinson, Visual servo control. Part I: Basic approaches, IEEE Robot. Autom. Mag. 13 (4) (2006) 82-90.
[9] F. Chaumette, S. Hutchinson,视觉伺服控制。第一部分:基本方法,IEEE 机器人与自动化杂志,13 卷(4 期)(2006)82-90。

[10] J. Ido, Y. Shimizu, Y. Matsumoto, T. Ogasawara, Indoor navigation for a humanoid robot using a view sequence, Int. J. Robot. Res. 28 (2) (2009) 315-325.
[10] J. Ido, Y. Shimizu, Y. Matsumoto, T. Ogasawara,利用视图序列进行类人机器人室内导航,国际机器人研究杂志,28(2) (2009) 315-325。

[11] A. Diosi, S. Segvic, A. Remazeilles, F. Chaumette, Experimental evaluation of autonomous driving based on visual memory and image-based visual servoing, IEEE Trans. Intell. Transp. Syst. 12 (3) (2011) 870-833.
[11] A. Diosi, S. Segvic, A. Remazeilles, F. Chaumette,基于视觉记忆和基于图像的视觉伺服的自动驾驶实验评估,IEEE 智能交通系统汇刊,12(3) (2011) 870-833。

[12] H.M. Becerra, C. Sagues, Y. Mezouar, J.B. Hayet, Visual navigation of wheeled mobile robots using direct feedback of a geometric constraint, Auton. Robots 37 (2) (2014) 137-156.
[12] H.M. Becerra, C. Sagues, Y. Mezouar, J.B. Hayet,利用几何约束的直接反馈进行轮式移动机器人视觉导航,自主机器人,37(2) (2014) 137-156。

[13] J. Delfin, H.M. Becerra, G. Arechavaleta, Visual path following using a sequence of target images and smooth robot velocities for humanoid navigation, in: IEEE-RAS Int. Conf. on Humanoid Robots, 2014, pp. 354-359.
[13] J. Delfin, H.M. Becerra, G. Arechavaleta,利用目标图像序列和平滑机器人速度进行视觉路径跟踪以实现类人导航,载于:IEEE-RAS 国际类人机器人会议,2014,页 354-359。

[14] J. Delfin, H.M. Becerra, G. Arechavaleta, Humanoid localization and navigation using a visual memory, in: IEEE-RAS Int. Conf. on Humanoid Robots, 2016, pp. 75-80.
[14] J. Delfin, H.M. Becerra, G. Arechavaleta, 使用视觉记忆的人形机器人定位与导航,载于:IEEE-RAS 国际人形机器人会议,2016 年,第 75-80 页。

[15] A. Cherubini, F. Chaumette, Visual navigation of a mobile robot with laserbased collision avoidance, Int. J. Robot. Res. 32 (2) (2013) 189-205.
[15] A. Cherubini, F. Chaumette, 基于激光的碰撞避免移动机器人视觉导航,国际机器人研究杂志,32(2) (2013) 189-205。

[16] N. Mansard, O. Stasse, F. Chaumette, K. Yokoi, Visually-guided grasping while walking on a humanoid robot, in: IEEE Int. Conf. on Robotics and Automation, 2007, pp. 3041-3047.
[16] N. Mansard, O. Stasse, F. Chaumette, K. Yokoi, 人形机器人行走时的视觉引导抓取,载于:IEEE 机器人与自动化国际会议,2007 年,第 3041-3047 页。

[17] A. Herdt, H. Diedam, P.B. Wieber, D. Dimitrov, K. Mombaur, M. Diehl, Online walking motion generation with automatic footstep placement, Adv. Robot. 24 (5-6) (2010) 719-737.
[17] A. Herdt, H. Diedam, P.B. Wieber, D. Dimitrov, K. Mombaur, M. Diehl, 具有自动步态规划的在线行走运动生成,机器人进展,24(5-6) (2010) 719-737。

[18] G. Oriolo, A. Paolillo, L. Rosa, M. Vendittelli, Vision-based trajectory control for humanoid navigation, in: IEEE-RAS Int. Conf. on Humanoid Robots, 2013, pp. 118-123.
[18] G. Oriolo, A. Paolillo, L. Rosa, M. Vendittelli,基于视觉的类人导航轨迹控制,载于:IEEE-RAS 国际类人机器人会议,2013 年,第 118-123 页。

[19] A. Paolillo, A. Faragasso, G. Oriolo, M. Vendittelli, Vision-based maze navigation for humanoid robots, Auton. Robots 41 (2) (2017) 293-309.
[19] A. Paolillo, A. Faragasso, G. Oriolo, M. Vendittelli,基于视觉的类人机器人迷宫导航,《自主机器人》41(2) (2017) 293-309。

[20] J.H. Hayet, C. Esteves, G. Arechavaleta, O. Stasse, E. Yoshida, Humanoid locomotion planning for visually-guided tasks, Int. J. Humanoid Robots 9 (2) (2012) 26.
[20] J.H. Hayet, C. Esteves, G. Arechavaleta, O. Stasse, E. Yoshida,类人机器人视觉引导任务的运动规划,《国际类人机器人杂志》9(2) (2012) 26。

[21] G. Oriolo, A. Paolillo, L. Rosa, M. Vendittelli, Humanoid odometric localization integrating kinematics, inertial and visual information, Auton. Robots 40 (5) (2016) 867-879.
[21] G. Oriolo, A. Paolillo, L. Rosa, M. Vendittelli,融合运动学、惯性和视觉信息的类人机器人里程计定位,《自主机器人》40(5) (2016) 867-879。

[22] L. George, A. Mazel, Humanoid robot indoor navigation based on 2D bar codes: Application to the NAO robot, in: IEEE-RAS Int. Conf. on Humanoid Robots, 2013, pp. 329-335.
[22] L. George, A. Mazel,基于二维条形码的人形机器人室内导航:应用于 NAO 机器人,载于:IEEE-RAS 国际人形机器人会议,2013,页 329-335。

[23] R. Cupec, G. Schmidt, O. Lorch, Vision-guided walking in a structured indoor scenario, Automatika 46 (1-2) (2005) 49-57.
[23] R. Cupec, G. Schmidt, O. Lorch,结构化室内场景中的视觉引导行走,Automatika 46 (1-2) (2005) 49-57。

[24] D. Maier, C. Stachniss, M. Bennewitz, Vision-based humanoid navigation using self-supervised obstacle detection, Int. J. Hum. Robot. 10 (2) (2013) 1-28.
[24] D. Maier, C. Stachniss, M. Bennewitz,基于视觉的人形机器人导航,采用自监督障碍物检测,国际人形机器人杂志 10 (2) (2013) 1-28。

[25] O. Stasse, B. Verrelst, B. Vanderborght, K. Yokoi, Strategies for humanoid robots to dynamically walk over large obstacles, IEEE Trans. Robot. 25 (4) (2009) 960-967.
[25] O. Stasse, B. Verrelst, B. Vanderborght, K. Yokoi,人形机器人动态跨越大型障碍物的策略,IEEE 机器人学报 25 (4) (2009) 960-967。

[26] N. Perrin, O. Stasse, L. Baudouin, F. Lamiraux, E. Yoshida, Fast humanoid robot collision-free footstep planning using swept volume approximations, IEEE Trans. Robot. 28 (2) (2012) 427-439.
[26] N. Perrin, O. Stasse, L. Baudouin, F. Lamiraux, E. Yoshida, 基于扫掠体积近似的快速类人机器人无碰撞步态规划,IEEE 机器人学报,28(2) (2012) 427-439。

[27] M. Naveau, M. Kudruss, O. Stasse, C. Kirches, K. Mombaur, P. Souères, A reactive walking pattern generator based on nonlinear model predictive control, IEEE Robot. Autom. Lett. 2 (1) (2017) 10-17.
[27] M. Naveau, M. Kudruss, O. Stasse, C. Kirches, K. Mombaur, P. Souères, 基于非线性模型预测控制的反应式步态生成器,IEEE 机器人与自动化快报,2(1) (2017) 10-17。

[28] M. Ferro, A. Paolillo, A. Cherubini, M. Vendittelli, Omnidirectional humanoid navigation in cluttered environments based on optical flow information, in: IEEE-RAS Int. Conf. on Humanoid Robots, 2016, pp. 75-80.
[28] M. Ferro, A. Paolillo, A. Cherubini, M. Vendittelli, 基于光流信息的杂乱环境中全向类人机器人导航,载于:IEEE-RAS 国际类人机器人会议,2016,页 75-80。

[29] A. Rioux, W. Suleiman, Autonomous SLAM based humanoid navigation in a cluttered environment while transporting a heavy load, Robot. Auton. Syst. 99 (2018) 50-62.
[29] A. Rioux, W. Suleiman, 在搬运重载时基于自主 SLAM 的杂乱环境类人机器人导航,机器人自主系统,99 (2018) 50-62。

[30] J. Stalbaum, J.-B. Song, Keyframe and inlier selection for visual SLAM, in: Int. Conf. on Ubiquitous Robots and Ambient Intelligence, 2013, pp. 391-396.
[30] J. Stalbaum, J.-B. Song,视觉 SLAM 中的关键帧和内点选择,载于:第十届普适机器人与环境智能国际会议,2013 年,第 391-396 页。

[31] G. Guan, Z. Wang, S. Lu, J. Da-Deng, D. Dagan-Feng, Keypoint-based keyframe selection, IEEE Trans. Circuits Syst. Video Technol. 23 (4) (2013) 729-734.
[31] G. Guan, Z. Wang, S. Lu, J. Da-Deng, D. Dagan-Feng,基于关键点的关键帧选择,IEEE 电路系统视频技术汇刊,23(4) (2013) 729-734。

[32] E. Malis, F. Chaumette, S. Boudet, 21 / 2 21 / 2 21//221 / 2 Visual servoing with respect to unknown objects through a new estimation scheme of camera displacement, Int. J. Comput. Vis. 37 (1) (2000) 79-97.
[32] E. Malis, F. Chaumette, S. Boudet,针对未知物体的视觉伺服控制及一种新的相机位移估计方案,国际计算机视觉杂志,37(1) (2000) 79-97。

[33] R.I. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, second ed., Cambridge University Press, 2004.
[33] R.I. Hartley, A. Zisserman,计算机视觉中的多视图几何,第二版,剑桥大学出版社,2004 年。

[34] R. Mur-Artal, J.M.M. Montiel, J.D. Tardos, ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE Trans. Robot. 31 (5) (2015) 1147-1163.
[34] R. Mur-Artal, J.M.M. Montiel, J.D. Tardos, ORB-SLAM:一种多功能且精确的单目 SLAM 系统,IEEE 机器人学报,31(5) (2015) 1147-1163。

[35] T.H. Cormen, Introduction to Algorithms, MIT press, 2009.
[35] T.H. Cormen,《算法导论》,麻省理工学院出版社,2009 年。

[36] I. Kostavelis, A. Gasteratos, Learning spatially semantic representations for cognitive robot navigation, Robot. Auton. Syst. 61 (12) (2013) 1460-1475.
[36] I. Kostavelis, A. Gasteratos,认知机器人导航的空间语义表示学习,机器人与自主系统,61(12) (2013) 1460-1475。

[37] D. Gálvez-López, J.D. Tardos, Bags of binary words for fast place recognition in image sequences, IEEE Trans. Robot. 28 (5) (2012) 1188-1197.
[37] D. Gálvez-López, J.D. Tardos,基于二进制词袋的图像序列快速地点识别,IEEE 机器人学报,28(5) (2012) 1188-1197。

[38] N.G. Aldana-Murillo, J.B. Hayet, H.M. Becerra, Comparison of local descriptors for humanoid robots localization using a visual bag of words approach, Intell. Autom. Soft Comput. (2017) 1-11.
[38] N.G. Aldana-Murillo, J.B. Hayet, H.M. Becerra,基于视觉词袋方法的人形机器人定位局部描述符比较,智能自动化与软计算,2017,1-11。

[39] A. Babenko, V. Lempitsky, The inverted multi-index, IEEE Trans. Pattern Anal. Mach. Intell. 37 (6) (2015) 1247-1260.
[39] A. Babenko, V. Lempitsky,倒置多重索引,IEEE 模式分析与机器智能汇刊,37(6),2015,1247-1260。

[40] M. Muja, D.G. Lowe, Scalable nearest neighbor algorithms for high dimensional data, IEEE Trans. Pattern Anal. Mach. Intell. 36 (11) (2014) 2227-2240.
[40] M. Muja, D.G. Lowe,高维数据可扩展最近邻算法,IEEE 模式分析与机器智能汇刊,36(11),2014,2227-2240。

[41] A. Escande, N. Mansard, P.B. Wieber, Hierarchical quadratic programming: Fast online humanoid-robot motion generation, Int. J. Robot. Res. 33 (7) (2014) 1006-1028.
[41] A. Escande, N. Mansard, P.B. Wieber,分层二次规划:快速在线人形机器人运动生成,国际机器人研究杂志,33(7),2014,1006-1028。

[42] L. Nardi, C. Stachniss, User preferred behaviors for robot navigation exploiting previous experiences, Robot. Auton. Syst. 97 (2017) 204-216.
[42] L. Nardi, C. Stachniss, 利用先前经验的用户偏好行为进行机器人导航, 机器人与自主系统, 97 (2017) 204-216.

[43] F. Keith, P.B. Wieber, N. Mansard, A. Kheddar, Analysis of the discontinuities in prioritized task-space control under discrete task scheduling operations, in: IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2011, pp. 3887-3892.
[43] F. Keith, P.B. Wieber, N. Mansard, A. Kheddar, 离散任务调度操作下优先级任务空间控制中的不连续性分析, 载于:IEEE/RSJ 智能机器人与系统国际会议, 2011, 页 3887-3892.

[44] N. Mansard, A. Remazeilles, F. Chaumette, Continuity of varying-feature-set control laws, IEEE Trans. Automat. Control 54 (11) (2009) 2493-2505.
[44] N. Mansard, A. Remazeilles, F. Chaumette, 变化特征集控制律的连续性, IEEE 自动控制汇刊, 54 (11) (2009) 2493-2505.

[45] J. Lee, N. Mansard, J. Park, Intermediate desired value approach for task transition of robots in kinematic control, IEEE Trans. Robot. 28 (6) (2012) 1260-1277.
[45] J. Lee, N. Mansard, J. Park, 机器人运动学控制中任务过渡的中间期望值方法, IEEE 机器人汇刊, 28 (6) (2012) 1260-1277.

[46] O. Kanoun, F. Lamiraux, P.-B. Wieber, Kinematic control of redundant manipulators: Generalizing the task-priority framework to inequality task, IEEE Trans. Robot. 27 (4) (2011) 785-792.
[46] O. Kanoun, F. Lamiraux, P.-B. Wieber, 冗余机械臂的运动学控制:将任务优先框架推广到不等式任务,IEEE 机器人学报,27(4) (2011) 785-792。

[47] J. Shi, C. Tomasi, Good features to track, in: IEEE Conf. on Computer Vision and Pattern Recognition, 1994, pp. 593-600.
[47] J. Shi, C. Tomasi, 优良的跟踪特征,载于:IEEE 计算机视觉与模式识别会议,1994,页 593-600。

[48] H. Bay, T. Tuytelaars, L.V. Gool, SURF: Speeded up robust features, in: European Conf. on Computer Vision, 2006, pp. 404-417.
[48] H. Bay, T. Tuytelaars, L.V. Gool, SURF:加速稳健特征,载于:欧洲计算机视觉会议,2006,页 404-417。

[49] E. Marchand, F. Spindler, F. Chaumette, VISP for visual servoing: a generic software platform with a wide class of robot control skills, IEEE Robot. Autom. Mag. 12 (4) (2005) 40-52.
[49] E. Marchand, F. Spindler, F. Chaumette, VISP 用于视觉伺服:一个具有广泛机器人控制技能的通用软件平台,IEEE 机器人与自动化杂志,12(4) (2005) 40-52。

[50] B. Triggs, Autocalibration from planar scenes, in: H. Burkhardt, B. Neumann (Eds.), Computer Vision - ECCV’98, in: LNCS, vol. 1406, Springer Berlin Heidelberg, 1998, pp. 89-105.
[50] B. Triggs,基于平面场景的自动校准,载于:H. Burkhardt, B. Neumann(编),《计算机视觉 - ECCV’98》,LNCS,第 1406 卷,施普林格柏林海德堡出版社,1998 年,第 89-105 页。

Josafat Delfin obtained his B.Sc. in Mechatronics Engineering in 2010 from the University of Guadalajara, Mexico. He obtained his M.Sc. in 2012 and his Ph.D. in 2017 in Robotics and Advanced Manufacturing, both from the Research Center for Advanced Studies of the National Polytechnic Institute, Mexico (CINVESTAV-Saltillo). His research interests are related to the areas of computer vision, visual navigation, humanoid robots, visual servoing and 3D reconstruction.

Héctor M. Becerra received a degree in Electronics Engineering from the Tecnológico Nacional de México, campus Ciudad Guzmán, an M.Sc. degree in Automatic Control from CINVESTAV-Guadalajara, Mexico, and a Ph.D. degree in Systems Engineering and Computer Science from the University of Zaragoza, Spain, in 2003, 2005 and 2011, respectively. He is currently a full researcher at the Centro de Investigación en Matemáticas, CIMAT-Guanajuato, Mexico. His research interests include applications of automatic control to robotics, particularly the use of computer vision as the main sensory modality.

Gustavo Arechavaleta received his M.S. degree from Tecnológico de Monterrey (ITESM), Campus Estado de Mexico, in 2003, and his Ph.D. degree from the Institut National des Sciences Appliquées de Toulouse, France, in 2007, for his work on the computational principles of human walking via optimal control within the GEPETTO Group, Laboratoire d'analyse et d'architecture des systèmes, CNRS, Toulouse, France. In Fall 2008 he joined the Robotics and Advanced Manufacturing Group at CINVESTAV, Saltillo, Mexico, where he is a Researcher. In Fall 2015 he spent a year in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign as a visiting researcher. His research interests are mainly focused on robot trajectory optimization and vision-based robot navigation.

The first two authors were supported in part by CONACYT, Mexico [grant 220796].
* Corresponding author.
E-mail addresses: josafat.delfin@cinvestav.edu.mx (J. Delfin), hector.becerra@cimat.mx (H.M. Becerra), garechav@cinvestav.edu.mx (G. Arechavaleta).