
Deep Lagrangian Networks:
Using Physics as Model Prior for Deep Learning

Michael Lutter, Christian Ritter & Jan Peters

Department of Computer Science
Technische Universität Darmstadt
Hochschulstr. 10, 64289 Darmstadt, Germany
{Lutter, Peters}@ias.tu-darmstadt.de

Max Planck Institute for Intelligent Systems, Spemannstr. 41, 72076 Tübingen, Germany
Abstract

Deep learning has achieved astonishing results on many tasks with large amounts of data and generalization within the proximity of the training data. For many important real-world applications, these requirements are infeasible and additional prior knowledge of the task domain is required to overcome the resulting problems. In particular, learning physics models for model-based control requires robust extrapolation from fewer samples – often collected online in real-time – and model errors may lead to drastic damage to the system.

Directly incorporating physical insight has enabled us to obtain a novel deep model learning approach that extrapolates well while requiring fewer samples. As a first example, we propose Deep Lagrangian Networks (DeLaN) as a deep network structure upon which Lagrangian Mechanics have been imposed. DeLaN can learn the equations of motion of a mechanical system (i.e., system dynamics) with a deep network efficiently while ensuring physical plausibility.

The resulting DeLaN network performs very well at robot tracking control. The proposed method not only outperforms previous model learning approaches in learning speed but also exhibits substantially improved and more robust extrapolation to novel trajectories, and learns online in real-time.

1 Introduction

In the last five years, deep learning has propelled most areas of learning forward at an impressive pace (Krizhevsky et al., 2012; Mnih et al., 2015; Silver et al., 2017) – with the exception of physically embodied systems. This lag in comparison to other application areas is somewhat surprising as learning physical models is critical for applications that control embodied systems, reason about prior actions or plan future actions (e.g., service robotics, industrial automation). Instead, most engineers prefer classical off-the-shelf modeling as it ensures physical plausibility – at a high cost of precise measurements (highly precise models usually require taking the physical system apart and measuring the separated pieces; Albu-Schäffer, 2002) and engineering effort. These plausible representations are preferred as these models guarantee to extrapolate to new samples, while learned models only achieve good performance in the vicinity of the training data.

To learn a model that obtains physically plausible representations, we propose to use the insights from physics as a model prior for deep learning. In particular, the combination of deep learning and physics seems natural as the compositional structure of deep networks enables the efficient computation of the derivatives at machine precision (Raissi & Karniadakis, 2018) and, thus, can encode a differential equation describing physical processes. Therefore, we suggest encoding the physics prior in the form of a differential equation in the network topology. This adapted topology amplifies the information content of the training samples, regularizes the end-to-end training, and emphasizes robust models capable of extrapolating to new samples while simultaneously ensuring physical plausibility. Here, we concentrate on learning models of mechanical systems using the Euler-Lagrange equation, a second order ordinary differential equation (ODE) originating from Lagrangian Mechanics, as physics prior. We focus on learning models of mechanical systems as this problem is one of the fundamental challenges of robotics (de Wit et al., 2012; Schaal et al., 2002).

Contribution

The contribution of this work is twofold. First, we derive a network topology called Deep Lagrangian Networks (DeLaN) encoding the Euler-Lagrange equation originating from Lagrangian Mechanics. This topology can be trained using standard end-to-end optimization techniques while maintaining physical plausibility; therefore, the obtained model must comply with physics. Unlike previous approaches to learning physics (Atkeson et al., 1986; Ledezma & Haddadin, 2017), which engineered fixed features from physical assumptions requiring knowledge of the specific physical embodiment, we are ‘only’ enforcing physics upon a generic deep network. For DeLaN only the system state and the control signal are specific to the physical system, but neither the proposed network structure nor the training procedure. Second, we extensively evaluate the proposed approach by using the model to control a simulated 2 degrees of freedom (dof) robot and the physical 7-dof robot Barrett WAM in real time. We demonstrate DeLaN’s control performance where DeLaN learns the dynamics model online, starting from random initialization. In comparison to analytic and other learned models, DeLaN yields a better control performance while at the same time extrapolating to new desired trajectories.

In the following, we provide an overview of related work (Section 2) and briefly summarize Lagrangian Mechanics (Section 3). Subsequently, we derive our proposed approach DeLaN and show the necessary characteristics for end-to-end training (Section 4). Finally, the experiments in Section 5 evaluate the model learning performance for both simulated and physical robots. Here, DeLaN outperforms existing approaches.

2 Related Work

Models describing system dynamics, i.e., the coupling of control input $\bm{\tau}$ and system state $\mathbf{q}$, are essential for model-based control approaches (Ioannou & Sun, 1996). Depending on the control approach, the control law relies either on the forward model $f$, mapping from control input to the change of system state, or on the inverse model $f^{-1}$, mapping from system change to control input, i.e.,

\[ f(\mathbf{q},\,\dot{\mathbf{q}},\,\bm{\tau}) = \ddot{\mathbf{q}}, \qquad f^{-1}(\mathbf{q},\,\dot{\mathbf{q}},\,\ddot{\mathbf{q}}) = \bm{\tau}. \tag{1} \]
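As a toy illustration of Equation 1 (the system and its mass value are assumed here, not taken from the paper), consider a frictionless point mass on a line: the inverse model maps a desired acceleration to the required force, the forward model maps a force to the resulting acceleration, and the two maps invert each other.

```python
import numpy as np

# Hypothetical 1-dof example: a point mass m sliding on a line, so the
# equations of motion reduce to m * qdd = tau (no gravity, no Coriolis terms).
m = 2.0

def inverse_model(q, qd, qdd):
    """f^-1: (q, qd, qdd) -> tau, the force producing acceleration qdd."""
    return m * qdd

def forward_model(q, qd, tau):
    """f: (q, qd, tau) -> qdd, the acceleration produced by force tau."""
    return tau / m

# The two maps invert each other at any trajectory point.
q, qd, qdd = 0.3, -1.2, 0.5
tau = inverse_model(q, qd, qdd)
assert np.isclose(forward_model(q, qd, tau), qdd)
```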

Examples for the application of these models are inverse dynamics control (de Wit et al., 2012), which uses the inverse model to compensate system dynamics, while model-predictive control (Camacho & Alba, 2013) and optimal control (Zhou et al., 1996) use the forward model to plan the control input. These models can either be derived from physics or learned from data. The physics models must be derived for the individual system embodiment and require precise knowledge of the physical properties (Albu-Schäffer, 2002). When learning the model (further information can be found in the model learning survey by Nguyen-Tuong & Peters, 2011), mostly standard machine learning techniques are applied to fit either the forward or inverse model to the training data. E.g., authors used Linear Regression (Schaal et al., 2002; Haruno et al., 2001), Gaussian Mixture Regression (Calinon et al., 2010; Khansari-Zadeh & Billard, 2011), Gaussian Process Regression (Kocijan et al., 2004; Nguyen-Tuong et al., 2009; Nguyen-Tuong & Peters, 2010), Support Vector Regression (Choi et al., 2007; Ferreira et al., 2007), feedforward (Jansen, 1994; Lenz et al., 2015; Ledezma & Haddadin, 2017; Sanchez-Gonzalez et al., 2018) or recurrent neural networks (Rueckert et al., 2017) to fit the model to the observed measurements.

Only few approaches incorporate prior knowledge into the learning problem. Sanchez-Gonzalez et al. (2018) use the graph representation of the kinematic structure as input. The work of Atkeson et al. (1986), commonly referenced as the standard system identification technique for robot manipulators (Siciliano & Khatib, 2016), uses the Newton-Euler formalism to derive physics features from the kinematic structure and the joint measurements such that learning the dynamics model simplifies to linear regression. Similarly, Ledezma & Haddadin (2017) hard-code these physics features within a neural network and learn the dynamics parameters using gradient descent rather than linear regression. Even though these physics features are derived from physics, the learned parameters for mass, center of gravity and inertia do not necessarily comply with physics, as the learned parameters may violate the positive definiteness of the inertia matrix or the parallel axis theorem (Ting et al., 2006). Furthermore, the linear regression is commonly underdetermined, only allows inferring linear combinations of the dynamics parameters and cannot be applied to closed-loop kinematics (Siciliano & Khatib, 2016).

DeLaN follows this line of structured learning problems but, in contrast to previous approaches, guarantees physical plausibility and provides a more general formulation. This general formulation enables DeLaN to learn the dynamics for any kinematic structure, including kinematic trees and closed-loop kinematics, and in addition does not require any knowledge about the kinematic structure. Therefore, DeLaN is identical for all mechanical systems, which is in strong contrast to the Newton-Euler approaches, where the features are specific to the kinematic structure. Only the system state and input are specific to the system, but neither the network topology nor the optimization procedure.

The combination of differential equations and neural networks has previously been investigated in the literature. Early on, Lagaris et al. (1998; 2000) proposed to learn the solution of partial differential equations (PDE) using neural networks, and currently this topic is being rediscovered by Raissi & Karniadakis (2018); Sirignano & Spiliopoulos (2017); Long et al. (2017). Most research focuses on using machine learning to overcome the limitations of PDE solvers. E.g., Sirignano & Spiliopoulos (2017) proposed the Deep Galerkin method to solve a high-dimensional PDE from scattered data. Only the work of Raissi et al. (2017) took the opposite standpoint of using the knowledge of the specific differential equation to structure the learning problem and achieve lower sample complexity. In this paper, we follow the same motivation as Raissi et al. (2017) but take a different approach. Rather than explicitly solving the differential equation, DeLaN only uses the structure of the differential equation to guide the learning problem of inferring the equations of motion. Thereby the differential equation is only implicitly solved. In addition, the proposed approach uses a different encoding of the partial derivatives, which achieves efficient computation within a single feed-forward pass, enabling the application within control loops.

3 Preliminaries: Lagrangian Mechanics

Describing the equations of motion of mechanical systems has been studied extensively and various formalisms to derive these equations exist. The most prominent are Newtonian, Hamiltonian and Lagrangian Mechanics. Within this work Lagrangian Mechanics is used, more specifically the Euler-Lagrange formulation with non-conservative forces and generalized coordinates (more information can be found in the textbooks of Greenwood, 2006; de Wit et al., 2012; Featherstone, 2007). Generalized coordinates are coordinates that uniquely define the system configuration. This formalism defines the Lagrangian $L$ as a function of the generalized coordinates $\mathbf{q}$ describing the complete dynamics of a given system. The Lagrangian is not unique and every $L$ which yields the correct equations of motion is valid. The Lagrangian is generally chosen to be

\[ L = T - V \tag{2} \]

where $T$ is the kinetic energy and $V$ is the potential energy. The kinetic energy $T$ can be computed for all choices of generalized coordinates using $T = \frac{1}{2}\dot{\mathbf{q}}^{T}\mathbf{H}(\mathbf{q})\dot{\mathbf{q}}$, where $\mathbf{H}(\mathbf{q})$ is the symmetric and positive definite inertia matrix (de Wit et al., 2012). The positive definiteness ensures that all non-zero velocities lead to positive kinetic energy. Applying the calculus of variations yields the Euler-Lagrange equation with non-conservative forces described by

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{\mathbf{q}}_{i}} - \frac{\partial L}{\partial \mathbf{q}_{i}} = \bm{\tau}_{i} \tag{3} \]

where $\bm{\tau}$ are the generalized forces. Substituting $L$ and $dV/d\mathbf{q} = \mathbf{g}(\mathbf{q})$ into Equation 3 yields the second order ordinary differential equation (ODE) described by

\[ \mathbf{H}(\mathbf{q})\ddot{\mathbf{q}} + \underbrace{\dot{\mathbf{H}}(\mathbf{q})\dot{\mathbf{q}} - \frac{1}{2}\left(\frac{\partial}{\partial\mathbf{q}}\left(\dot{\mathbf{q}}^{T}\mathbf{H}(\mathbf{q})\dot{\mathbf{q}}\right)\right)^{T}}_{\coloneqq\,\mathbf{c}(\mathbf{q},\dot{\mathbf{q}})} + \mathbf{g}(\mathbf{q}) = \bm{\tau} \tag{4} \]

where $\mathbf{c}$ describes the centripetal and Coriolis forces (Featherstone, 2007). Using this ODE any multi-particle mechanical system with holonomic constraints can be described. For example, various authors used this ODE to manually derive the equations of motion for coupled pendulums (Greenwood, 2006), robotic manipulators with flexible joints (Book, 1984; Spong, 1987), parallel robots (Miller, 1992; Geng et al., 1992; Liu et al., 1993) or legged robots (Hemami & Wyman, 1979; Golliday & Hemami, 1977).

4 Incorporating Lagrangian Mechanics into Deep Learning

Starting from the Euler-Lagrange equation (Equation 4), traditional engineering approaches would estimate $\mathbf{H}(\mathbf{q})$ and $\mathbf{g}(\mathbf{q})$ from the approximated or measured masses, lengths and moments of inertia. On the contrary, most traditional model learning approaches would ignore the structure and learn the inverse dynamics model directly from data. DeLaN bridges this gap by incorporating the structure introduced by the ODE into the learning problem and learns the parameters in an end-to-end fashion. More concretely, DeLaN approximates the inverse model by representing the unknown functions $\mathbf{g}(\mathbf{q})$ and $\mathbf{H}(\mathbf{q})$ as feed-forward networks. Rather than representing $\mathbf{H}(\mathbf{q})$ directly, the lower-triangular matrix $\mathbf{L}(\mathbf{q})$ is represented as a deep network. Therefore, $\mathbf{g}(\mathbf{q})$ and $\mathbf{H}(\mathbf{q})$ are described by

\[ \hat{\mathbf{H}}(\mathbf{q}) = \hat{\mathbf{L}}(\mathbf{q}\,;\,\theta)\,\hat{\mathbf{L}}(\mathbf{q}\,;\,\theta)^{T} \qquad \hat{\mathbf{g}}(\mathbf{q}) = \hat{\mathbf{g}}(\mathbf{q}\,;\,\psi) \]

where $\hat{\cdot}$ refers to an approximation and $\theta$ and $\psi$ are the respective network parameters. The parameters $\theta$ and $\psi$ can be obtained by minimizing the violation of the physical law described by Lagrangian Mechanics. Therefore, the optimization problem is described by

\begin{align}
\left(\theta^{*},\,\psi^{*}\right) &= \operatorname*{arg\,min}_{\theta,\psi}\; \ell\left(\hat{f}^{-1}(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}}\,;\,\theta,\psi),\; \bm{\tau}\right) \tag{5} \\
\text{with}\quad \hat{f}^{-1}(\mathbf{q},\,\dot{\mathbf{q}},\,\ddot{\mathbf{q}}\,;\,\theta,\,\psi) &= \hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\ddot{\mathbf{q}} + \frac{d}{dt}\left(\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\right)\dot{\mathbf{q}} - \frac{1}{2}\left(\frac{\partial}{\partial\mathbf{q}}\left(\dot{\mathbf{q}}^{T}\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\dot{\mathbf{q}}\right)\right)^{T} + \hat{\mathbf{g}} \tag{6} \\
\text{s.t.}\quad 0 &< \mathbf{x}^{T}\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\mathbf{x} \quad \forall\; \mathbf{x} \in \mathbb{R}_{0}^{n} \tag{7}
\end{align}

where $\hat{f}^{-1}$ is the inverse model and $\ell$ can be any differentiable loss function. The computational graph of $\hat{f}^{-1}$ is shown in Figure 1.

Using this formulation one can conclude further properties of the learned model. Neither $\hat{\mathbf{L}}$ nor $\hat{\mathbf{g}}$ are functions of $\dot{\mathbf{q}}$ or $\ddot{\mathbf{q}}$ and, hence, the obtained parameters should, within limits, generalize to arbitrary velocities and accelerations. In addition, the obtained model can be reformulated and used as a forward model. Solving Equation 6 for $\ddot{\mathbf{q}}$ yields the forward model described by

\[ \hat{f}(\mathbf{q},\,\dot{\mathbf{q}},\,\bm{\tau}\,;\,\theta,\,\psi) = \left(\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\right)^{-1}\left(\bm{\tau} - \frac{d}{dt}\left(\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\right)\dot{\mathbf{q}} + \frac{1}{2}\left(\frac{\partial}{\partial\mathbf{q}}\left(\dot{\mathbf{q}}^{T}\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}\dot{\mathbf{q}}\right)\right)^{T} - \hat{\mathbf{g}}\right) \tag{8} \]

where $\hat{\mathbf{L}}\hat{\mathbf{L}}^{T}$ is guaranteed to be invertible due to the positive definite constraint (Equation 7). However, solving the optimization problem of Equation 5 directly is not possible because of the ill-posedness caused by the Lagrangian $L$ not being unique. The Euler-Lagrange equation is invariant to linear transformations and, hence, the Lagrangian $L' = \alpha L + \beta$ solves the Euler-Lagrange equation if $\alpha$ is non-zero and $L$ is a valid Lagrangian. This problem can be mitigated by adding an additional penalty term to Equation 5 described by

\[ \left(\theta^{*},\,\psi^{*}\right) = \operatorname*{arg\,min}_{\theta,\psi}\; \ell\left(\hat{f}^{-1}(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}}\,;\,\theta,\psi),\; \bm{\tau}\right) + \lambda\,\Omega(\theta,\,\psi) \tag{9} \]

where $\Omega$ is the $L_{2}$-norm of the network weights.
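As a side note on the forward model of Equation 8: since $\hat{\mathbf{H}} = \hat{\mathbf{L}}\hat{\mathbf{L}}^{T}$ is positive definite, recovering $\ddot{\mathbf{q}}$ in practice is a plain linear solve rather than an explicit matrix inversion. A minimal numeric sketch with random stand-in quantities (toy 2-dof values, not the learned network):

```python
import numpy as np

# Hedged sketch: given stand-ins for the learned pieces H = L L^T, c(q, qd)
# and g(q), the forward model of Equation 8 solves H qdd = tau - c - g.
# A positive diagonal of L makes H positive definite, so the solve is
# well defined.
rng = np.random.default_rng(0)
L = np.tril(rng.standard_normal((2, 2)))
L[np.diag_indices(2)] = np.abs(L[np.diag_indices(2)]) + 0.1  # positive diagonal
H = L @ L.T
c = rng.standard_normal(2)   # stand-in for the Coriolis/centripetal term
g = rng.standard_normal(2)   # stand-in for the gravity term

def inverse_model(qdd):
    return H @ qdd + c + g

def forward_model(tau):
    return np.linalg.solve(H, tau - c - g)

qdd = np.array([0.5, -1.0])
assert np.allclose(forward_model(inverse_model(qdd)), qdd)
```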

Solving the optimization problem of Equation 9 with a gradient based end-to-end learning approach is non-trivial due to the positive definite constraint (Equation 7) and the derivatives contained in $\hat{f}^{-1}$. In particular, $d(\mathbf{L}\mathbf{L}^{T})/dt$ and $\partial(\dot{\mathbf{q}}^{T}\mathbf{L}\mathbf{L}^{T}\dot{\mathbf{q}})/\partial\mathbf{q}_{i}$ cannot be computed using automatic differentiation, as $t$ is not an input of the network and most implementations of automatic differentiation do not allow backpropagating the gradient through the computed derivatives. Therefore, the derivatives contained in $\hat{f}^{-1}$ must be computed analytically to exploit the full gradient information for training the parameters. In the following, we introduce a network structure that fulfills the positive definite constraint for all parameters (Section 4.1), prove that the derivatives $d(\mathbf{L}\mathbf{L}^{T})/dt$ and $\partial(\dot{\mathbf{q}}^{T}\mathbf{L}\mathbf{L}^{T}\dot{\mathbf{q}})/\partial\mathbf{q}_{i}$ can be computed analytically (Section 4.2) and show an efficient implementation for computing the derivatives using a single feed-forward pass (Section 4.3). Using these three properties, the resulting network architecture can be used within a real-time control loop and trained using standard end-to-end optimization techniques.

Figure 1: The computational graph of the Deep Lagrangian Network (DeLaN). Shown in blue and green is the neural network with the three separate heads computing $\mathbf{g}(\mathbf{q})$, $\mathbf{l}_{d}(\mathbf{q})$ and $\mathbf{l}_{o}(\mathbf{q})$. The orange boxes correspond to the reshaping operations and the derivatives contained in the Euler-Lagrange equation. For training the gradients are backpropagated through all vertices highlighted in orange.

4.1 Symmetry and Positive Definiteness of $\mathbf{H}$

Ensuring the symmetry and positive definiteness of $\mathbf{H}$ is essential as this constraint enforces positive kinetic energy for all non-zero velocities. In addition, the positive definiteness ensures that $\mathbf{H}$ is invertible and the obtained model can be used as a forward model. Representing the matrix $\mathbf{H}$ as the product of a lower-triangular matrix $\mathbf{L}$ with its transpose ensures symmetry and positive semi-definiteness while simultaneously reducing the number of parameters. Positive definiteness is obtained if the diagonal of $\mathbf{L}$ is positive, which also guarantees that $\mathbf{L}$ is invertible. Using a deep network with different heads and altering the activation of the output layer one can obtain a positive diagonal: the off-diagonal elements $\mathbf{L}_{o}$ use a linear activation while the diagonal elements $\mathbf{L}_{d}$ use a non-negative activation, e.g., ReLU or Softplus. In addition, a positive scalar $b$ is added to the diagonal elements, ensuring a positive diagonal of $\mathbf{L}$ and positive eigenvalues of $\mathbf{H}$. Furthermore, we chose to share parameters between $\mathbf{L}$ and $\mathbf{g}$ as both rely on the same physical embodiment. The network architecture, with three heads representing the diagonal entries $\mathbf{l}_{d}$ and off-diagonal entries $\mathbf{l}_{o}$ of $\mathbf{L}$ as well as $\mathbf{g}$, is shown in Figure 1.
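A minimal sketch of this construction (the head outputs below are random stand-ins for network activations, and the offset $b$ is an assumed example value): the Softplus-plus-offset diagonal makes $\mathbf{H} = \mathbf{L}\mathbf{L}^{T}$ symmetric positive definite by construction.

```python
import numpy as np

# Hedged sketch of Section 4.1: assemble L from a diagonal head (passed
# through Softplus plus a positive offset b) and an off-diagonal head
# (linear activation), then check that H = L L^T is symmetric positive
# definite. Head values here are random stand-ins, not network outputs.
n, b = 3, 1e-2
rng = np.random.default_rng(1)
l_d = rng.standard_normal(n)                  # diagonal head (pre-activation)
l_o = rng.standard_normal(n * (n - 1) // 2)   # off-diagonal head (linear)

softplus = lambda x: np.log1p(np.exp(x))      # non-negative activation

L = np.zeros((n, n))
L[np.tril_indices(n, k=-1)] = l_o             # off-diagonal entries
L[np.diag_indices(n)] = softplus(l_d) + b     # strictly positive diagonal
H = L @ L.T

assert np.allclose(H, H.T)                    # symmetric
assert np.all(np.linalg.eigvalsh(H) > 0)      # positive definite
```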

4.2 Deriving the Derivatives

The derivatives $d\left(\mathbf{L}\mathbf{L}^{T}\right)/dt$ and $\partial\left(\dot{\mathbf{q}}^{T}\mathbf{L}\mathbf{L}^{T}\dot{\mathbf{q}}\right)/\partial\mathbf{q}_{i}$ are required for computing the control signal $\bm{\tau}$ using the inverse model and, hence, must be available within the forward pass. In addition, the second order derivatives, used within the backpropagation of the gradients, must exist to train the network using end-to-end training. To enable the computation of the second order derivatives using automatic differentiation, the forward computation must be performed analytically. Both derivatives have closed form solutions and can be derived by first computing the respective derivative of $\mathbf{L}$ and second substituting the reshaped derivative of the vectorized form $\mathbf{l}$. For the temporal derivative $d\left(\mathbf{L}\mathbf{L}^{T}\right)/dt$ this yields

\[ \frac{d}{dt}\mathbf{H}(\mathbf{q}) = \frac{d}{dt}\left(\mathbf{L}\mathbf{L}^{T}\right) = \mathbf{L}\frac{d\mathbf{L}}{dt}^{T} + \frac{d\mathbf{L}}{dt}\mathbf{L}^{T} \tag{10} \]

where $d\mathbf{L}/dt$ can be substituted with the reshaped form of

\[ \frac{d}{dt}\mathbf{l} = \frac{\partial\mathbf{l}}{\partial\mathbf{q}}\frac{\partial\mathbf{q}}{\partial t} + \sum_{i=1}^{N}\frac{\partial\mathbf{l}}{\partial\mathbf{W}_{i}}\frac{\partial\mathbf{W}_{i}}{\partial t} + \sum_{i=1}^{N}\frac{\partial\mathbf{l}}{\partial\mathbf{b}_{i}}\frac{\partial\mathbf{b}_{i}}{\partial t} \tag{11} \]

where $i$ refers to the $i$-th network layer consisting of an affine transformation and the non-linearity $g$, i.e., $\mathbf{h}_{i} = g_{i}\left(\mathbf{W}_{i}^{T}\mathbf{h}_{i-1} + \mathbf{b}_{i}\right)$. Equation 11 can be simplified as the network weights $\mathbf{W}_{i}$ and biases $\mathbf{b}_{i}$ are time-invariant, i.e., $d\mathbf{W}_{i}/dt = 0$ and $d\mathbf{b}_{i}/dt = 0$. Therefore, $d\mathbf{l}/dt$ is described by

\[ \frac{d}{dt}\mathbf{l} = \frac{\partial\mathbf{l}}{\partial\mathbf{q}}\dot{\mathbf{q}}. \tag{12} \]
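Equations 10 and 12 can be verified numerically. In the following sketch a hand-crafted toy map $\mathbf{L}(\mathbf{q})$ (not a learned network) stands in for the deep network, and the assembled $d\mathbf{H}/dt$ is compared against a finite difference of $\mathbf{H}(\mathbf{q}(t))$ along the trajectory $\mathbf{q}(t) = \mathbf{q}_{0} + \dot{\mathbf{q}}\,t$:

```python
import numpy as np

# Hedged numeric check of Equations 10 and 12 with a toy 2-dof L(q):
# dH/dt = L (dL/dt)^T + (dL/dt) L^T, with dL/dt = sum_i dL/dq_i * qd_i
# since the network weights are time-invariant.
def L_of_q(q):
    # toy lower-triangular map with positive diagonal
    return np.array([[1.0 + q[0]**2, 0.0],
                     [np.sin(q[1]), 2.0 + np.cos(q[0])]])

def dL_dq(q):
    # analytic partials, shape (2, 2, 2): entry [i] is dL/dq_i
    d0 = np.array([[2 * q[0], 0.0], [0.0, -np.sin(q[0])]])
    d1 = np.array([[0.0, 0.0], [np.cos(q[1]), 0.0]])
    return np.stack([d0, d1])

q, qd = np.array([0.3, -0.7]), np.array([1.1, 0.4])
L = L_of_q(q)
dLdt = np.tensordot(qd, dL_dq(q), axes=1)   # chain rule, Equation 12
dHdt = L @ dLdt.T + dLdt @ L.T              # Equation 10

eps = 1e-6
H = lambda q_: L_of_q(q_) @ L_of_q(q_).T
dHdt_fd = (H(q + eps * qd) - H(q - eps * qd)) / (2 * eps)
assert np.allclose(dHdt, dHdt_fd, atol=1e-5)
```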

Due to the compositional structure of the network and the differentiability of the non-linearity, the derivative with respect to the network input $\partial\mathbf{l}/\partial\mathbf{q}$ can be computed by recursively applying the chain rule, i.e.,

\[ \frac{\partial\mathbf{l}}{\partial\mathbf{q}} = \frac{\partial\mathbf{l}}{\partial\mathbf{h}_{N-1}}\frac{\partial\mathbf{h}_{N-1}}{\partial\mathbf{h}_{N-2}}\cdots\frac{\partial\mathbf{h}_{1}}{\partial\mathbf{q}} \qquad \text{with} \qquad \frac{\partial\mathbf{h}_{i}}{\partial\mathbf{h}_{i-1}} = \text{diag}\left(g'(\mathbf{W}_{i}^{T}\mathbf{h}_{i-1} + \mathbf{b}_{i})\right)\mathbf{W}_{i} \tag{13} \]

where $g'$ is the derivative of the non-linearity. Similarly to the previous derivation, the partial derivative of the quadratic term can be computed using the chain rule, which yields

\[ \frac{\partial}{\partial\mathbf{q}_{i}}\left[\dot{\mathbf{q}}^{T}\mathbf{H}\dot{\mathbf{q}}\right] = \text{tr}\left[\left(\dot{\mathbf{q}}\dot{\mathbf{q}}^{T}\right)^{T}\frac{\partial\mathbf{H}}{\partial\mathbf{q}_{i}}\right] = \dot{\mathbf{q}}^{T}\left(\frac{\partial\mathbf{L}}{\partial\mathbf{q}_{i}}\mathbf{L}^{T} + \mathbf{L}\frac{\partial\mathbf{L}}{\partial\mathbf{q}_{i}}^{T}\right)\dot{\mathbf{q}} \tag{14} \]

where $\partial\mathbf{L}/\partial\mathbf{q}_{i}$ can be constructed using the columns of the previously derived $\partial\mathbf{l}/\partial\mathbf{q}$. Therefore, all derivatives included within $\hat{f}$ can be computed in closed form.
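Equation 14 can be checked the same way. The sketch below uses a toy $\mathbf{L}(\mathbf{q})$ (again a hand-crafted stand-in, not a learned network) and compares the right-hand side of Equation 14 against a finite difference of the quadratic form $\dot{\mathbf{q}}^{T}\mathbf{H}\dot{\mathbf{q}}$:

```python
import numpy as np

# Hedged numeric check of Equation 14 with a toy 2-dof L(q):
# d/dq_i [qd^T H qd] = qd^T (dL/dq_i L^T + L dL/dq_i^T) qd.
def L_of_q(q):
    return np.array([[1.0 + q[0]**2, 0.0],
                     [np.sin(q[1]), 2.0 + np.cos(q[0])]])

def dL_dqi(q, i, eps=1e-6):
    # partial of L with respect to q_i by central finite differences
    dq = np.zeros_like(q)
    dq[i] = eps
    return (L_of_q(q + dq) - L_of_q(q - dq)) / (2 * eps)

q, qd = np.array([0.2, 0.9]), np.array([-0.5, 1.3])
L = L_of_q(q)
quad = lambda q_: qd @ (L_of_q(q_) @ L_of_q(q_).T) @ qd

eps = 1e-6
for i in range(2):
    dL = dL_dqi(q, i)
    lhs_fd = (quad(q + np.eye(2)[i] * eps) - quad(q - np.eye(2)[i] * eps)) / (2 * eps)
    rhs = qd @ (dL @ L.T + L @ dL.T) @ qd    # Equation 14, right-hand side
    assert np.isclose(lhs_fd, rhs, atol=1e-5)
```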

4.3 Computing the Derivatives

The derivatives of Section 4.2 must be computed within a real-time control loop and may only add minimal computational complexity in order not to break the real-time constraint. $\mathbf{l}$ and $\partial\mathbf{l}/\partial\mathbf{q}$, required within Equation 10 and Equation 14, can be computed simultaneously using an extended standard layer. Extending the affine transformation and non-linearity of the standard layer with an additional sub-graph computing $\partial\mathbf{h}_{i}/\partial\mathbf{h}_{i-1}$ yields the Lagrangian layer described by

$$\mathbf{a}_{i}=\mathbf{W}_{i}\mathbf{h}_{i-1}+\mathbf{b}_{i} \qquad \mathbf{h}_{i}=g_{i}\left(\mathbf{a}_{i}\right) \qquad \frac{\partial\mathbf{h}_{i}}{\partial\mathbf{h}_{i-1}}=\operatorname{diag}\left(g^{\prime}_{i}(\mathbf{a}_{i})\right)\mathbf{W}_{i}.$$

The computational graph of the Lagrangian layer is shown in Figure 2a. Chaining Lagrangian layers reproduces the compositional structure of $\partial\mathbf{l}/\partial\mathbf{q}$ (Equation 13) and enables its efficient computation. Additional reshaping operations compute $d\mathbf{L}/dt$ and $\partial\mathbf{L}/\partial\mathbf{q}_{i}$.
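A minimal sketch of this extended layer (an illustration, not the released implementation; the non-negative-diagonal constraint on $\mathbf{L}$ is omitted for brevity): each layer emits its activation and $\partial\mathbf{h}_{i}/\partial\mathbf{h}_{i-1}$ in the same pass, so chaining the layers yields $\mathbf{l}$, $\partial\mathbf{l}/\partial\mathbf{q}$ and thus $d\mathbf{l}/dt$ in a single feed-forward pass:

```python
import numpy as np

class LagrangianLayer:
    """Standard affine layer extended to also emit dh_i/dh_{i-1}."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.5, size=(n_in, n_out))
        self.b = rng.normal(scale=0.1, size=n_out)

    def forward(self, h):
        a = self.W.T @ h + self.b                        # a_i = W_i^T h_{i-1} + b_i
        dh = np.diag(1.0 - np.tanh(a) ** 2) @ self.W.T   # diag(g'(a_i)) W_i^T
        return np.tanh(a), dh

rng = np.random.default_rng(5)
layers = [LagrangianLayer(2, 16, rng), LagrangianLayer(16, 3, rng)]

def network(q):
    h, J = q, np.eye(q.size)
    for layer in layers:
        h, dh = layer.forward(h)
        J = dh @ J                                       # accumulate the chain rule
    return h, J                                          # l and dl/dq in one pass

q, qd = rng.normal(size=2), rng.normal(size=2)
l, dl_dq = network(q)
dl_dt = dl_dq @ qd                                       # Equation 12: dl/dt = (dl/dq) qd

# reshaping operations: assemble the lower-triangular L and dL/dt entrywise
idx = np.tril_indices(2)
L = np.zeros((2, 2)); L[idx] = l
dL_dt = np.zeros((2, 2)); dL_dt[idx] = dl_dt
```

The same accumulated Jacobian also provides the columns needed for $\partial\mathbf{L}/\partial\mathbf{q}_{i}$, so no separate backward pass is required at control time.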

Figure 2: (a) Computational graph of the Lagrangian layer. The orange boxes highlight the learnable parameters. The upper computational sub-graph corresponds to the standard network layer while the lower sub-graph is the extension of the Lagrangian layer to simultaneously compute $\partial\mathbf{h}_{i}/\partial\mathbf{h}_{i-1}$. (b) Computational graph of the chained Lagrangian layers to compute $\mathbf{L}$, $d\mathbf{L}/dt$ and $\partial\mathbf{L}/\partial\mathbf{q}_{i}$ using a single feed-forward pass.

5 Experimental Evaluation: Learning an Inverse Dynamics Model for Robot Control

To demonstrate the applicability and extrapolation of DeLaN, the proposed network topology is applied to model-based control of a simulated 2-dof robot (Figure 3b) and the physical 7-dof robot Barrett WAM (Figure 3d). The performance of DeLaN is evaluated using the tracking error on training and test trajectories and compared to a learned and an analytic model. This evaluation scheme follows existing work (Nguyen-Tuong et al., 2009; Sanchez-Gonzalez et al., 2018), as the tracking error is the relevant performance indicator, while the mean squared error (MSE) obtained using sample-based optimization exaggerates model performance (Hobbs & Hepenstal, 1989). An offline comparison evaluating the MSE on datasets can be found in Appendix A. Going beyond most previous work, we strictly limit all model predictions to real-time and perform the learning online, i.e., the models are randomly initialized and must learn the dynamics during the experiment.

Experimental Setup

Within the experiments the robot executes multiple desired trajectories with specified joint positions, velocities and accelerations. The control signal, consisting of motor torques, is generated using a non-linear feed-forward controller, i.e., a low-gain PD-controller augmented with a feed-forward torque $\bm{\tau}_{ff}$ to compensate the system dynamics. The control law is described by

$$\bm{\tau}=\mathbf{K}_{p}\left(\mathbf{q}_{d}-\mathbf{q}\right)+\mathbf{K}_{d}\left(\dot{\mathbf{q}}_{d}-\dot{\mathbf{q}}\right)+\bm{\tau}_{ff} \qquad \text{with} \quad \bm{\tau}_{ff}=\hat{f}^{-1}(\mathbf{q}_{d},\dot{\mathbf{q}}_{d},\ddot{\mathbf{q}}_{d})$$

where $\mathbf{K}_{p}$, $\mathbf{K}_{d}$ are the controller gains and $\mathbf{q}_{d}$, $\dot{\mathbf{q}}_{d}$, $\ddot{\mathbf{q}}_{d}$ the desired joint positions, velocities and accelerations. The control loop is shown in Figure 3a. For all experiments the control frequency is set to 500 Hz while the desired joint state and, respectively, $\bm{\tau}_{ff}$ are updated with a frequency of $f_{d}=200$ Hz. All feed-forward torques are computed online and, hence, the computation time is strictly limited to $T\leq 1/200$ s. The tracking performance is defined as the sum of the MSE evaluated at the sampling points of the reference trajectory.
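Assuming illustrative gains and a placeholder inverse model (neither taken from the paper), the control law can be sketched as:

```python
import numpy as np

def control(q, dq, q_d, dq_d, ddq_d, inv_model, Kp, Kd):
    """Low-gain PD controller augmented with a learned feed-forward torque."""
    tau_ff = inv_model(q_d, dq_d, ddq_d)     # compensates the system dynamics
    return Kp @ (q_d - q) + Kd @ (dq_d - dq) + tau_ff

# toy 2-dof example; the gains and the zero inverse model are placeholders
Kp, Kd = np.diag([50.0, 50.0]), np.diag([5.0, 5.0])
inv_model = lambda q_d, dq_d, ddq_d: np.zeros(2)   # before any learning
tau = control(q=np.zeros(2), dq=np.zeros(2),
              q_d=np.array([0.1, -0.1]), dq_d=np.zeros(2),
              ddq_d=np.zeros(2), inv_model=inv_model, Kp=Kp, Kd=Kd)
```

As the learned model improves, $\bm{\tau}_{ff}$ takes over the dynamics compensation and the low-gain PD term only corrects small residual errors.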

For the desired trajectories two different data sets are used. The first data set contains all single-stroke characters (the data set was created by Williams et al. (2008) and is available at Dheeru & Karra Taniskidou (2017)), while the second data set uses cosine curves in joint space (Figure 3c). The 20 characters are spatially and temporally re-scaled to comply with the robot kinematics. The joint references are computed using the inverse kinematics. Due to the different characters, the desired trajectories contain smooth and sharp turns and cover a wide variety of shapes, but are limited to a small task-space region. In contrast, the cosine trajectories are smooth but cover a large task-space region.
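A cosine joint-space reference of this kind can be sketched as follows; the amplitude, frequency and offset are illustrative values, not the paper's, and differentiating analytically keeps the desired positions, velocities and accelerations consistent:

```python
import numpy as np

def cosine_trajectory(t, amplitude, omega, offset, velocity_scale=1.0):
    """Joint-space cosine reference with analytically consistent derivatives."""
    w = omega * velocity_scale               # time re-scaling scales the velocities
    q_d = offset + amplitude * np.cos(w * t)
    dq_d = -amplitude * w * np.sin(w * t)
    ddq_d = -amplitude * w ** 2 * np.cos(w * t)
    return q_d, dq_d, ddq_d

t = np.linspace(0.0, 2.0, 401)               # 200 Hz reference updates
q_d, dq_d, ddq_d = cosine_trajectory(t, amplitude=0.5, omega=2 * np.pi, offset=0.0)
```

Increasing `velocity_scale` reproduces the velocity-scaled test trajectories used later: velocities grow linearly and accelerations quadratically with the scale.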

Figure 3: (a) Real-time control loop using a PD-controller with a feed-forward torque $\bm{\tau}_{FF}$, compensating the system dynamics, to control the joint torques $\bm{\tau}$. The training process reads the joint states and applied torques to learn the system dynamics online. Once a new model becomes available, the inverse model $\hat{f}^{-1}$ in the control loop is updated. (b) The simulated 2-dof robot drawing the cosine trajectories. (c) The simulated Barrett WAM drawing the 3d cosine 0 trajectory. (d) The physical Barrett WAM.

Baselines

The performance of DeLaN is compared to an analytic inverse dynamics model, a standard feed-forward neural network (FF-NN) and a PD-controller. For the analytic model the torque is computed using the Recursive Newton-Euler algorithm (RNE) (Luh et al., 1980), which computes the feed-forward torque using estimated physical properties of the system, i.e., the link dimensions, masses and moments of inertia. For the implementation, the open-source library PyBullet (Coumans & Bai, 2016–2018) is used.

Both deep networks use the same dimensionality and ReLU non-linearities and must learn the system dynamics online starting from random initialization. The training samples containing the joint states and applied torques $\left(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}},\bm{\tau}\right)_{0,\dots,T}$ are directly read from the control loop as shown in Figure 3a. The training runs in a separate process on the same machine and solves the optimization problem online. Once the training process has computed a new model, the inverse model $\hat{f}^{-1}$ of the control loop is updated.
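The producer-consumer structure of this online scheme can be sketched with a toy linear inverse model (the actual models are deep networks; the names and sizes here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n_dof, lr = 2, 1e-2
W_true = rng.normal(size=(3 * n_dof, n_dof))        # stands in for the real dynamics
W = rng.normal(scale=0.1, size=(3 * n_dof, n_dof))  # randomly initialized model

def inv_model(q, dq, ddq):
    """Inverse model used by the controller; always reads the latest W."""
    return np.concatenate([q, dq, ddq]) @ W

# training process: consume (q, dq, ddq, tau) samples from the control loop
for _ in range(3000):
    x = rng.normal(size=3 * n_dof)                  # one (q, dq, ddq) sample
    tau = x @ W_true                                # measured joint torques
    W -= lr * np.outer(x, x @ W - tau)              # one SGD step on the squared error
```

Because `inv_model` closes over `W`, every completed optimization step corresponds to "updating the inverse model in the control loop".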

5.1 Simulated Robot Experiments

The 2-dof robot shown in Figure 3b is simulated using PyBullet and executes the character and cosine trajectories. Figure 4 shows the ground-truth torques of the characters 'a', 'd', 'e', their ground-truth components and the learned decomposition using DeLaN (Figure 4a-d). Even though DeLaN is trained on the super-imposed torques, DeLaN learns to disambiguate the inertial force $\mathbf{H}\ddot{\mathbf{q}}$, the Coriolis and centrifugal forces $\mathbf{c}(\mathbf{q},\dot{\mathbf{q}})$ and the gravitational force $\mathbf{g}(\mathbf{q})$, as the respective curves overlap closely. Hence, DeLaN is capable of learning the underlying physical model using the proposed network topology trained with standard end-to-end optimization.

Figure 4: (a) The torque $\bm{\tau}$ required to generate the characters 'a', 'd' and 'e' in black. Using these samples DeLaN was trained offline and learns the red trajectory. DeLaN can not only learn the desired torques but also disambiguate the individual torque components even though it was trained on the super-imposed torques. Using Equation 6 DeLaN can represent the inertial force $\mathbf{H}\ddot{\mathbf{q}}$ (b), the Coriolis and centrifugal forces $\mathbf{c}(\mathbf{q},\dot{\mathbf{q}})$ (c) and the gravitational force $\mathbf{g}(\mathbf{q})$ (d). All components closely match the ground-truth data. (e) shows the offline MSE of the feed-forward neural network and DeLaN for each joint.

Figure 4e shows the offline MSE on the test set averaged over multiple seeds for the FF-NN and DeLaN w.r.t. different training set sizes. The different training set sizes correspond to combinations of $n$ random characters, i.e., a training set size of 1 corresponds to training the model on a single character and evaluating the performance on the remaining 19 characters. DeLaN clearly obtains a lower test MSE than the FF-NN, and the difference in performance increases as the training set is reduced. This increasing difference on the test MSE highlights the reduced sample complexity and the good extrapolation to unseen samples. The difference in performance is amplified on the real-time control task where the models are learned online starting from random initialization. Figure 5a and b show the accumulated tracking error per testing character and the testing error averaged over all test characters, while Figure 5c shows the qualitative comparison of the control performance (the full results containing all characters are provided in Appendix B). It is important to point out that all shown results are averaged over multiple seeds and only incorporate characters not used for training, hence focusing the evaluation on the extrapolation to new trajectories. The qualitative comparison shows that DeLaN is able to execute all 20 characters when trained on 8 random characters. The obtained tracking error is comparable to the analytic model, which in this case contains the simulation parameters and is optimal. In contrast, the FF-NN shows significant deviations from the desired trajectories when trained on 8 random characters. The quantitative comparison of the accumulated tracking error over seeds (Figure 5b) shows that DeLaN obtains a lower tracking error than the FF-NN for all training set sizes.
This good performance using only a few training characters shows that DeLaN has a lower sample complexity and better extrapolation to unseen trajectories than the FF-NN.

Figure 6a and b show the performance on the cosine trajectories. For this experiment the models are only trained online on two trajectories with a velocity scale of 1x. To assess the extrapolation w.r.t. velocities and accelerations, the learned models are tested on the same trajectories with scaled velocities (gray area of Figure 6). On the training trajectories DeLaN and the FF-NN perform comparably. When the velocities are increased, the performance of the FF-NN deteriorates because the new trajectories do not lie within the vicinity of the training distribution, as the domain of the FF-NN is $\left(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}}\right)$. Therefore, the FF-NN cannot extrapolate to the testing data. In contrast, the domain of the networks $\hat{\mathbf{L}}$ and $\hat{\mathbf{g}}$ composing DeLaN consists only of $\mathbf{q}$, rather than $\left(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}}\right)$. This reduced domain enables DeLaN, within limits, to extrapolate to the test trajectories. The remaining increase in tracking error is caused by the structure of $\hat{f}^{-1}$, where model errors scale quadratically with the velocities. However, the obtained tracking error on the testing trajectories is significantly lower compared to the FF-NN.
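This quadratic scaling can be seen directly: rescaling a trajectory in time by a velocity scale $s$ multiplies the velocities by $s$ and the accelerations by $s^{2}$, so a fixed error $\Delta\mathbf{H}$ in the learned inertia matrix induces a torque error growing with $s^{2}$ (toy numbers below, not from the paper):

```python
import numpy as np

H_err = np.array([[0.05, 0.0], [0.0, 0.02]])   # assumed (made-up) inertia error
ddq = np.array([1.0, -0.5])                    # acceleration at velocity scale 1

# torque error induced by H_err on the time-rescaled trajectory
errors = {s: np.linalg.norm(H_err @ (s ** 2 * ddq)) for s in (1.0, 1.5, 2.0)}
```

Doubling the velocity scale quadruples this torque-error contribution, matching the observed degradation pattern on the scaled test trajectories.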

Figure 5: (a) The average performance of DeLaN and the feed-forward neural network for each character. The 4 columns of the boxplots correspond to different numbers of training characters, i.e., $n=1$, $6$, $8$, $10$. (b) The median performance of DeLaN, the feed-forward neural network and the analytic baselines averaged over multiple seeds. The shaded areas highlight the 5th and the 95th percentile. (c) The qualitative performance of the analytic baselines, the feed-forward neural network and DeLaN. The desired trajectories are shown in red.

5.2 Physical Robot Experiments

For the physical experiments the desired trajectories are executed on the Barrett WAM, a robot with direct cable drives. The direct cable drives produce high torques generating fast and dexterous movements but yield complex dynamics, which cannot be modelled using rigid-body dynamics due to the variable stiffness and lengths of the cables (simplistically, the cable drives could be modelled as two joints connected by a massless spring). Therefore, the Barrett WAM is ideal for testing the applicability of model learning and analytic models on complex dynamics (the analytic model of the Barrett WAM is obtained using a publicly available URDF (JHU LCSR, 2018)). For the physical experiments we focus on the cosine trajectories, as these produce dynamic movements, while the character trajectories are mainly dominated by the gravitational forces. In addition, only the dynamics of the four lower joints are learned, because these joints dominate the dynamics and the upper joints cannot be sufficiently excited to retrieve the dynamics parameters.

Figure 6c and d show the tracking error on the cosine trajectories using the simulated Barrett WAM, while Figure 6e and f show the tracking error of the physical Barrett WAM. It is important to note that the simulation only covers the rigid-body dynamics, not the direct cable drives, and that the simulation parameters are inconsistent with the parameters of the analytic model. Therefore, the analytic model is not optimal. On the training trajectories executed on the physical system, the FF-NN performs better than DeLaN and the analytic model. DeLaN achieves a slightly better tracking error than the analytic model, which uses the same rigid-body assumptions as DeLaN. This shows that DeLaN can learn a dynamics model of the WAM but is limited by the model assumptions of Lagrangian Mechanics, which cannot represent the dynamics of the cable drives. In simulation, DeLaN and the FF-NN perform comparably but significantly better than the analytic model. These simulation results show that DeLaN can learn an accurate model of the WAM when the underlying assumptions of the physics prior hold. The tracking performance on the physical system and in simulation indicates that DeLaN can learn a model within the model class of the physics prior but also inherits the limitations of that prior. For this specific experiment the FF-NN can locally learn correlations of the torques w.r.t. $\mathbf{q}$, $\dot{\mathbf{q}}$ and $\ddot{\mathbf{q}}$, while such correlations cannot be represented by the network topology of DeLaN because, by definition of the physics prior, they should not exist.

When extrapolating to the identical trajectories with higher velocities (gray area of Figure 6), the tracking error of the FF-NN deteriorates much faster than that of DeLaN, because the FF-NN overfits to the training data. The tracking error of the analytic model remains constant and demonstrates the guaranteed extrapolation of analytic models. In simulation, the FF-NN likewise cannot extrapolate to the new velocities and the tracking error deteriorates similarly to the performance on the physical robot. In contrast, DeLaN can extrapolate to the higher velocities and maintains a good tracking error. Furthermore, DeLaN obtains a better tracking error than the analytic model on all velocity scales. This low tracking error on all test trajectories highlights the improved extrapolation of DeLaN compared to other model learning approaches.

Figure 6: The tracking error on the cosine trajectories for the simulated 2-dof robot (a & b), the simulated (c & d) and the physical Barrett WAM (e & f). The feed-forward neural network and DeLaN are trained only on the trajectories at a velocity scale of 1x. Afterwards the models are tested on the same trajectories with increased velocities to evaluate the extrapolation to new velocities.

6 Conclusion

We introduced the concept of incorporating a physics prior within the deep learning framework to achieve lower sample complexity and better extrapolation. In particular, we proposed Deep Lagrangian Networks (DeLaN), a deep network on which Lagrangian Mechanics is imposed. This specific network topology enabled us to learn the system dynamics using end-to-end training while maintaining physical plausibility. We showed that DeLaN is able to learn the underlying physics from a super-imposed signal, as DeLaN can recover the contributions of the inertial, gravitational and centripetal forces from sensor data. The quantitative evaluation within a real-time control loop assessing the tracking error showed that DeLaN can learn the system dynamics online and obtains lower sample complexity and better generalization than a feed-forward neural network. DeLaN can extrapolate to new trajectories as well as to increased velocities, where the performance of the feed-forward network deteriorates due to overfitting to the training data. When applied to a physical system with complex dynamics, the bounded representational power of the physics prior can be limiting. However, this limited representational power enforces physical plausibility and yields the lower sample complexity and substantially better generalization. In future work, the physics prior should be extended to represent a wider system class by introducing additional non-conservative forces within the Lagrangian.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No #640554 (SKILLS4ROBOTS). Furthermore, this research was also supported by grants from ABB, NVIDIA and the NVIDIA DGX Station.

References

  • Albu-Schäffer (2002) Alin Albu-Schäffer. Regelung von Robotern mit elastischen Gelenken am Beispiel der DLR-Leichtbauarme. PhD thesis, Technische Universität München, 2002.
  • Atkeson et al. (1986) Christopher G Atkeson, Chae H An, and John M Hollerbach. Estimation of inertial parameters of manipulator loads and links. The International Journal of Robotics Research, 5(3):101–119, 1986.
  • Book (1984) Wayne J Book. Recursive lagrangian dynamics of flexible manipulator arms. The International Journal of Robotics Research, 3(3):87–101, 1984.
  • Calinon et al. (2010) Sylvain Calinon, Florent D’halluin, Eric L Sauser, Darwin G Caldwell, and Aude G Billard. Learning and reproduction of gestures by imitation. IEEE Robotics & Automation Magazine, 17(2):44–54, 2010.
  • Camacho & Alba (2013) Eduardo F Camacho and Carlos Bordons Alba. Model predictive control. Springer Science & Business Media, Berlin, Heidelberg, 2013.
  • Choi et al. (2007) Younggeun Choi, Shin-Young Cheong, and Nicolas Schweighofer. Local online support vector regression for learning control. In International Symposium on Computational Intelligence in Robotics and Automation, pp.  13–18. IEEE, 2007.
  • Coumans & Bai (2016–2018) Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016–2018.
  • de Wit et al. (2012) Carlos Canudas de Wit, Bruno Siciliano, and Georges Bastin. Theory of robot control. Springer Science & Business Media, 2012.
  • Dheeru & Karra Taniskidou (2017) Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.
  • Featherstone (2007) Roy Featherstone. Rigid Body Dynamics Algorithms. Springer-Verlag, Berlin, Heidelberg, 2007. ISBN 0387743146.
  • Ferreira et al. (2007) Joao P Ferreira, Manuel Crisostomo, A Paulo Coimbra, and Bernardete Ribeiro. Simulation control of a biped robot with support vector regression. In IEEE International Symposium on Intelligent Signal Processing, pp.  1–6. IEEE, 2007.
  • Geng et al. (1992) Zheng Geng, Leonard S Haynes, James D Lee, and Robert L Carroll. On the dynamic model and kinematic analysis of a class of stewart platforms. Robotics and autonomous systems, 9(4):237–254, 1992.
  • Golliday & Hemami (1977) C. Leslie Golliday and Hooshang Hemami. An approach to analyzing biped locomotion dynamics and designing robot locomotion controls. IEEE Transactions on Automatic Control, 22(6):963–972, December 1977. ISSN 0018-9286. doi: 10.1109/TAC.1977.1101650.
  • Greenwood (2006) Donald T Greenwood. Advanced dynamics. Cambridge University Press, 2006.
  • Haruno et al. (2001) Masahiko Haruno, Daniel M Wolpert, and Mitsuo Kawato. Mosaic model for sensorimotor learning and control. Neural computation, 13(10):2201–2220, 2001.
  • Hemami & Wyman (1979) Hooshang Hemami and Bostwick Wyman. Modeling and control of constrained dynamic systems with application to biped locomotion in the frontal plane. IEEE Transactions on Automatic Control, 24(4):526–535, August 1979. ISSN 0018-9286. doi: 10.1109/TAC.1979.1102105.
  • Hobbs & Hepenstal (1989) Benjamin F Hobbs and Ann Hepenstal. Is optimization optimistically biased? Water Resources Research, 25(2):152–160, 1989.
  • Ioannou & Sun (1996) Petros A Ioannou and Jing Sun. Robust adaptive control, volume 1. Prentice-Hall, 1996.
  • Jansen (1994) M Jansen. Learning an accurate neural model of the dynamics of a typical industrial robot. In International Conference on Artificial Neural Networks, pp.  1257–1260, 1994.
  • JHU LCSR (2018) JHU LCSR. Barrett model containing the 7-dof URDF, 2018. URL https://github.com/jhu-lcsr/barrett_model.
  • Khansari-Zadeh & Billard (2011) S Mohammad Khansari-Zadeh and Aude Billard. Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Transactions on Robotics, 27(5):943–957, 2011.
  • Kocijan et al. (2004) Juš Kocijan, Roderick Murray-Smith, Carl Edward Rasmussen, and Agathe Girard. Gaussian process model based predictive control. In American Control Conference, volume 3, pp.  2214–2219. IEEE, 2004.
  • Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  • Lagaris et al. (1998) Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
  • Lagaris et al. (2000) Isaac E Lagaris, Aristidis C Likas, and Dimitris G Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049, 2000.
  • Ledezma & Haddadin (2017) Fernando Díaz Ledezma and Sami Haddadin. First-order-principles-based constructive network topologies: An application to robot inverse dynamics. In IEEE-RAS International Conference on Humanoid Robotics, 2017, pp.  438–445. IEEE, 2017.
  • Lenz et al. (2015) Ian Lenz, Ross A Knepper, and Ashutosh Saxena. Deepmpc: Learning deep latent features for model predictive control. In Robotics: Science and Systems, 2015.
  • Liu et al. (1993) Kai Liu, Frank Lewis, Guy Lebret, and David Taylor. The singularities and dynamics of a stewart platform manipulator. Journal of Intelligent and Robotic Systems, 8(3):287–308, 1993.
  • Long et al. (2017) Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. Pde-net: Learning pdes from data. arXiv preprint arXiv:1710.09668, 2017.
  • Luh et al. (1980) John YS Luh, Michael W Walker, and Richard PC Paul. On-line computational scheme for mechanical manipulators. Journal of Dynamic Systems, Measurement, and Control, 102(2):69–76, 1980.
  • Miller (1992) K Miller. The lagrange-based model of delta-4 robot dynamics. Robotersysteme, 8:49–54, 1992.
  • Mnih et al. (2015) Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
  • Nguyen-Tuong & Peters (2010) Duy Nguyen-Tuong and Jan Peters. Using model knowledge for learning inverse dynamics. In International Conference on Robotics and Automation, pp. 2677–2682, 2010.
  • Nguyen-Tuong & Peters (2011) Duy Nguyen-Tuong and Jan Peters. Model learning for robot control: a survey. Cognitive Processing, 12(4):319–340, 2011.
  • Nguyen-Tuong et al. (2009) Duy Nguyen-Tuong, Matthias Seeger, and Jan Peters. Model learning with local gaussian process regression. Advanced Robotics, 23(15):2015–2034, 2009.
  • Raissi & Karniadakis (2018) Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
  • Raissi et al. (2017) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.
  • Rueckert et al. (2017) Elmar Rueckert, Moritz Nakatenus, Samuele Tosatto, and Jan Peters. Learning inverse dynamics models in o (n) time with lstm networks. In IEEE-RAS International Conference on Humanoid Robotics, pp.  811–816. IEEE, 2017.
  • Sanchez-Gonzalez et al. (2018) Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. Graph networks as learnable physics engines for inference and control. arXiv preprint arXiv:1806.01242, 2018.
  • Schaal et al. (2002) Stefan Schaal, Christopher G Atkeson, and Sethu Vijayakumar. Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence, 17(1):49–60, 2002.
  • Siciliano & Khatib (2016) Bruno Siciliano and Oussama Khatib. Springer handbook of robotics. Springer, 2016.
  • Silver et al. (2017) David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
  • Sirignano & Spiliopoulos (2017) Justin Sirignano and Konstantinos Spiliopoulos. Dgm: A deep learning algorithm for solving partial differential equations. arXiv preprint arXiv:1708.07469, 2017.
  • Spong (1987) Mark W Spong. Modeling and control of elastic joint robots. Journal of dynamic systems, measurement, and control, 109(4):310–318, 1987.
  • Ting et al. (2006) Jo-Anne Ting, Michael Mistry, Jan Peters, Stefan Schaal, and Jun Nakanishi. A bayesian approach to nonlinear parameter identification for rigid body dynamics. In Robotics: Science and Systems, pp.  32–39, 2006.
  • Williams et al. (2008) Ben Williams, Marc Toussaint, and Amos J Storkey. Modelling motion primitives and their timing in biologically executed movements. In Advances in Neural Information Processing Systems 20, pp. 1609–1616. 2008.
  • Zhou et al. (1996) Kemin Zhou, John Comstock Doyle, Keith Glover, et al. Robust and optimal control, volume 40. Prentice Hall, New Jersey, 1996.

Appendix A: Offline Benchmarks

Figure 7: The mean squared error averaged over 20 seeds on the training (a) and test set (b) of the character trajectories for the two-joint robot. The models are trained offline using $n$ characters and tested using the remaining $20-n$ characters. The training samples are corrupted with white noise, while the performance is tested on noise-free trajectories.

To evaluate the performance of DeLaN without the control task, DeLaN was trained offline on previously collected data and evaluated using the mean squared error (MSE) on the test and training set. DeLaN is compared to the system identification approach (SI) described by Atkeson et al. (1986), a feed-forward neural network (FF-NN) and the Recursive Newton-Euler algorithm (RNE) using an analytic model. For this comparison, one must point out that the system identification approach relies on the availability of the kinematics, as the Jacobians and transformations w.r.t. every link must be known to compute the necessary features. In contrast, neither DeLaN nor the FF-NN require this knowledge and must implicitly also learn the kinematics.
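For context, the SI approach exploits the fact that rigid-body dynamics are linear in the inertial parameters, $\bm{\tau}=\bm{\Phi}(\mathbf{q},\dot{\mathbf{q}},\ddot{\mathbf{q}})\bm{\theta}$, so $\bm{\theta}$ follows from linear least squares. The sketch below uses a random stand-in regressor rather than the true rigid-body features, which would require the kinematics:

```python
import numpy as np

rng = np.random.default_rng(4)
n_rows, n_params = 1000, 10
Phi = rng.normal(size=(n_rows, n_params))                 # stand-in regressor matrix
theta_true = rng.normal(size=n_params)                    # "true" inertial parameters
tau = Phi @ theta_true + 0.01 * rng.normal(size=n_rows)   # noisy measured torques

# least-squares estimate of the inertial parameters
theta_hat, *_ = np.linalg.lstsq(Phi, tau, rcond=None)
```

This closed-form fit is fast and extrapolates by construction, but, as discussed below, it is sensitive to noise and outliers in the measured torques.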

Figure 7 shows the MSE averaged over 20 seeds on the character data set executed on the two-joint robot. For this data set, the models are trained using noisy samples and evaluated on the noise-free and previously unseen characters. The FF-NN performs best on the training set, but overfits to the training data and therefore does not generalize to unseen characters. In contrast, the SI approach does not overfit to the noise and extrapolates to previously unseen characters. In comparison, the structure of DeLaN regularizes the training and prevents overfitting to the corrupted training data. Therefore, DeLaN extrapolates better than the FF-NN but not as well as the SI approach. Similar results can be observed on the cosine data set using the Barrett WAM simulated in SL (Figure 8a, b). The FF-NN performs best on the training trajectory, but its performance deteriorates when extrapolating to higher velocities. SI performs worse on the training trajectory but extrapolates to higher velocities. In comparison, DeLaN performs comparably to the SI approach on the training trajectory and extrapolates significantly better than the FF-NN, although not as well as the SI approach. For the physical system (Figure 8c, d), the results differ from the results in simulation. On the physical system the SI approach only achieves the same performance as RNE, which is significantly worse than the performance of DeLaN and the FF-NN. When evaluating the extrapolation to higher velocities, the analytic model and the SI approach extrapolate well, while the MSE of the FF-NN increases significantly. In comparison, DeLaN extrapolates better than the FF-NN but not as well as the analytic model or the SI approach.

Figure 8: The mean squared error on the cosine trajectories for the simulated (a, b) and the physical Barrett WAM (c, d). The system identification approach, the feed-forward neural network and DeLaN are trained offline using only the trajectories at a velocity scale of 1x. Afterwards the models are tested on the same trajectories with increased velocities to evaluate the extrapolation to new velocities.

This performance difference between the simulation and the physical system can be explained by the underlying model assumptions and the robustness to noise. While DeLaN only assumes rigid-body dynamics, the SI approach additionally assumes exact knowledge of the kinematic structure. In simulation both assumptions are valid. However, for the physical system the exact kinematics are unknown due to production imperfections, and the direct cable drives applying torques to flexible joints violate the rigid-body assumption. Therefore, the SI approach performs significantly worse on the physical system. Furthermore, noise robustness becomes more important on the physical system due to the inherent sensor noise. While the linear regression of the SI approach is easily corrupted by noise or outliers, the gradient-based optimization of the networks is more robust to noise. This robustness can be observed in Figure 9, which shows the correlation between the variance of the Gaussian noise corrupting the training data and the MSE on the simulated and noise-free cosine trajectories. With increasing noise levels, the MSE of the SI approach increases significantly faster than that of the models learned using gradient descent.

Concluding, the extrapolation of DeLaN to unseen trajectories and higher velocities is not as good as that of the SI approach but significantly better than that of the generic FF-NN. This improved extrapolation compared to the generic network is achieved by the Lagrangian Mechanics prior of DeLaN. Even though this prior promotes extrapolation, it also hinders the performance on the physical robot, because it cannot represent the dynamics of the direct cable drives. Therefore, DeLaN performs worse than the FF-NN, which does not assume any model structure. However, DeLaN outperforms the SI approach on the physical system, which also assumes rigid-body dynamics and additionally requires exact knowledge of the kinematics.

Figure 9: The mean squared error on the simulated and noise-free cosine trajectories with a velocity scale of 1x. For offline training the samples are corrupted using i.i.d. noise sampled from a multivariate normal distribution with variance $\sigma^{2}\mathbf{I}$.

Appendix B: Complete Online Results

Figure 10: The qualitative performance of the analytic baselines, the feed-forward neural network and DeLaN for different numbers of random training characters. The desired trajectories are shown in red.
Figure 11: The average performance of DeLaN and the feed-forward neural network for each character. The columns of the boxplots correspond to different numbers of training characters, i.e., $n=1$, $2$, $4$, $6$, $8$, $10$, $12$.