
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

  Rundi Wu   Chang Xiao   Changxi Zheng

Columbia University
{rundi, chang, cxz}@cs.columbia.edu
Abstract

Deep generative models of 3D shapes have received a great deal of research interest. Yet, almost all of them generate discrete shape representations, such as voxels, point clouds, and polygon meshes. We present the first 3D generative model for a drastically different shape representation—describing a shape as a sequence of computer-aided design (CAD) operations. Unlike meshes and point clouds, CAD models encode the user creation process of 3D shapes, widely used in numerous industrial and engineering design tasks. However, the sequential and irregular structure of CAD operations poses significant challenges for existing 3D generative models. Drawing an analogy between CAD operations and natural language, we propose a CAD generative network based on the Transformer. We demonstrate the performance of our model for both shape autoencoding and random shape generation. To train our network, we create a new CAD dataset consisting of 178,238 models and their CAD construction sequences. We have made this dataset publicly available to promote future research on this topic.

1 Introduction

It is our human nature to imagine and invent, and to express our invention in 3D shapes. This is what the paper and pencil were used for when Leonardo da Vinci sketched his mechanisms; this is why such drawing tools as the parallel bar, the French curve, and the divider were devised; and this is wherefore, in today’s digital era, computer-aided design (CAD) software has been used for 3D shape creation in a myriad of industrial sectors, ranging from automotive and aerospace to manufacturing and architectural design.

Figure 1: A gallery of generated CAD designs. Our generative network is able to produce a diverse range of CAD designs. Each CAD model consists of a sequence of CAD operations with specific parameters. The resulting 3D shapes are clean, have sharp geometric features, and can be readily user-edited.

Can the machine also invent 3D shapes? Leveraging the striking advance in generative models of deep learning, lots of recent research efforts have been directed to the generation of 3D models. However, existing 3D generative models merely create computer discretization of 3D shapes: 3D point clouds [6, 52, 53, 8, 30], polygon meshes [17, 42, 31], and levelset fields [12, 33, 29, 50, 11]. Still missing is the ability to generate the very nature of 3D shape design—the drawing process.

We propose a deep generative network that outputs a sequence of operations used in CAD tools (such as SolidWorks and AutoCAD) to construct a 3D shape. Generally referred to as a CAD model, such an operational sequence represents the “drawing” process of shape creation. Today, almost all industrial 3D designs start with CAD models. Only later in the production pipeline, if needed, are they discretized into polygon meshes or point clouds.

To our knowledge, this is the first work toward a generative model of CAD designs. The challenge lies in the CAD design’s sequential and parametric nature. A CAD model consists of a series of geometric operations (e.g., curve sketch, extrusion, fillet, boolean, chamfer), each controlled by certain parameters. Some of the parameters are discrete options; others have continuous values (more discussion in Sec. 3.1). These irregularities emerge from the user creation process of 3D shapes, and thus contrast starkly to the discrete 3D representations (i.e., voxels, point clouds, and meshes) used in existing generative models. In consequence, previously developed 3D generative models are unsuited for CAD model generation.

Technical contributions.

To overcome these challenges, we seek a representation that reconciles the irregularities in CAD models. We consider the most frequently used CAD operations (or commands), and unify them in a common structure that encodes their command types, parameters, and sequential orders. Next, drawing an analogy between CAD command sequences and natural languages, we propose an autoencoder based on the Transformer network [40]. It embeds CAD models into a latent space, and later decodes a latent vector into a CAD command sequence. To train our autoencoder, we further create a new dataset of CAD command sequences, one that is orders of magnitude larger than the existing dataset of the same type. We have also made this dataset publicly available (code and data are available here) to promote future research on learning-based CAD designs.

Our method is able to generate plausible and diverse CAD designs (see Fig. 1). We carefully evaluate its generation quality through a series of ablation studies. Lastly, we end our presentation with an outlook on useful applications enabled by our CAD autoencoder.

2 Related work

Parametric shape inference.

Advances in deep learning have enabled neural network models that analyze geometric data and infer parametric shapes. ParSeNet [38] decomposes a 3D point cloud into a set of parametric surface patches. PIE-NET [43] extracts parametric boundary curves from 3D point clouds. UV-Net [19] and BrepNet [24] focus on encoding a parametric model’s boundary curves and surfaces. Li et al. [25] trained a neural network on synthetic data to convert 2D user sketches into CAD operations. Recently, Xu et al. [51] applied neural-guided search to infer CAD modeling sequences from parametric solid shapes.

Generative models of 3D shapes.

Recent years have also witnessed increasing research interest in deep generative models for 3D shapes. Most existing methods generate 3D shapes in discrete forms, such as voxelized shapes [49, 16, 27, 26], point clouds [6, 52, 53, 8, 30], polygon meshes [17, 42, 31], and implicit signed distance fields [12, 33, 29, 50, 11]. The resulting shapes may still suffer from noise, lack sharp geometric features, and are not directly editable by users.

Therefore, more recent works have sought neural network models that generate 3D shapes as a series of geometric operations. CSGNet [37] infers a sequence of Constructive Solid Geometry (CSG) operations based on voxelized shape input; and UCSG-Net [21] further advances the inference with no supervision from ground truth CSG trees. Other than using CSG operations, several works synthesize 3D shapes using their proposed domain specific languages (DSLs) [39, 41, 30, 20]. For example, Jones et al. [20] proposed ShapeAssembly, a DSL that constructs 3D shapes by structuring cuboid proxies in a hierarchical and symmetrical fashion, and this structure can be generated through a variational autoencoder.

In contrast to all these works, our autoencoder network outputs CAD models specified as a sequence of CAD operations. CAD models have become the standard shape representation in almost every sector of industrial production. Thus, the output from our network can be readily imported into any CAD tool [1, 2, 3] for user editing. It can also be directly converted into other shape formats such as point clouds and polygon meshes. To our knowledge, this is the first generative model directly producing CAD designs.

Transformer-based models.

Technically, our work is related to the Transformer network [40], which was introduced as an attention-based building block for many natural language processing tasks [13]. The success of the Transformer network has also inspired its use in image processing tasks [34, 9, 14] and for other types of data [31, 10, 44]. Concurrent works [47, 32, 15] on constrained CAD sketch generation also rely on the Transformer network.

Also related to our work is DeepSVG [10], a Transformer-based network for the generation of Scalable Vector Graphic (SVG) images. SVG images are described by a collection of parametric primitives (such as lines and curves). Apart from being limited to 2D, those primitives are grouped with no specific order or dependence. In contrast, CAD commands are described in 3D; they can be interdependent (e.g., through CSG boolean operations) and must follow a specific order. We therefore seek a new way to encode CAD commands and their sequential order in a Transformer-based autoencoder.

3 Method

We now present our DeepCAD model, which revolves around a new representation of CAD command sequences (Sec. 3.1.2). Our CAD representation is specifically tailored for feeding into neural networks such as the proposed Transformer-based autoencoder (Sec. 3.2). It also leads to a natural objective function for training (Sec. 3.4). To train our network, we create a new dataset, one that is significantly larger than existing datasets of the same type (Sec. 3.3), and one that itself can serve beyond this work for future research.

3.1 CAD Representation for Neural Networks

Figure 2: A CAD model example specified by the commands in Table 1. (Top) the CAD model’s construction sequence, annotated with the command types. (Bottom) the command sequence description of the model. Parameter normalization and quantization are not shown in this case. In “Sketch 1”, $\mathtt{L_2}$-$\mathtt{A_3}$-$\mathtt{L_4}$-$\mathtt{L_5}$ forms a loop (in blue) and $\mathtt{C_7}$ forms another loop (in green), and the two loops bound a sketch profile (in gray).

The CAD model offers two levels of representation. At the user-interaction level, a CAD model is described as a sequence of operations that the user performs (in CAD software) to create a solid shape—for example, a user may $\mathtt{sketch}$ a closed curve profile on a 2D plane, and then $\mathtt{extrude}$ it into a 3D solid shape, which is further processed by other operations such as a boolean $\mathtt{union}$ with another already created solid shape (see Fig. 2). We refer to such a specification as a CAD command sequence.

Behind the command sequence is the CAD model’s kernel representation, widely known as the boundary representation (or B-rep) [45, 46]. Provided a command sequence, its B-rep is automatically computed (often through the industry standard library Parasolid). It consists of topological components (i.e., vertices, parametric edges and faces) and the connections between them to form a solid shape.

In this work, we aim for a generative model of CAD command sequences, not B-reps. This is because the B-rep is an abstraction from the command sequence: a command sequence can be easily converted into a B-rep, but the converse is hard, as different command sequences may result in the same B-rep. Moreover, a command sequence is human-interpretable; it can be readily edited (e.g., by importing them into CAD tools such as AutoCAD and Onshape), allowing them to be used in various downstream applications.

3.1.1 Specification of CAD Commands

Full-fledged CAD tools support a rich set of commands, although in practice only a small fraction of them are commonly used. Here, we consider a subset of the commands that are of frequent use (see Table 1). These commands fall into two categories, namely sketch and extrusion. While conceptually simple, they are sufficiently expressive to generate a wide variety of shapes, as has been demonstrated in [48].

Commands        Parameters
⟨SOL⟩           ∅
L (Line)        x, y : line end-point
A (Arc)         x, y : arc end-point;  α : sweep angle;  f : counter-clockwise flag
R (Circle)      x, y : center;  r : radius
E (Extrude)     θ, ϕ, γ : sketch plane orientation;  p_x, p_y, p_z : sketch plane origin;  s : scale of associated sketch profile;  e_1, e_2 : extrude distances toward both sides;  b : boolean type;  u : extrude type
⟨EOS⟩           ∅
Table 1: CAD commands and their parameters. ⟨SOL⟩ indicates the start of a loop; ⟨EOS⟩ indicates the end of the whole sequence.
Sketch.

Sketch commands are used to specify closed curves on a 2D plane in 3D space. In CAD terminology, each closed curve is referred to as a loop, and one or more loops form a closed region called a profile (see “Sketch 1” in Fig. 2). In our representation, a profile is described by a list of loops on its boundary; a loop always starts with an indicator command $\langle\mathtt{SOL}\rangle$ followed by a series of curve commands $C_i$. We list all the curves on the loop in counter-clockwise order, beginning with the curve whose starting point is the most bottom-left; and the loops in a profile are sorted according to the bottom-left corners of their bounding boxes. Figure 2 illustrates two sketch profiles.
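The loop-ordering convention above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: loops are plain lists of 2D points, and the exact tie-breaking rule between the two bounding-box coordinates is our own assumption (the paper only states "bottom-left corners").

```python
# Sort a profile's loops by the bottom-left corners of their bounding
# boxes, as described above. Loops here are lists of (x, y) tuples.

def bbox_bottom_left(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys))

def sort_loops(loops):
    # Assumed tie-breaking: bottom-most first, then left-most.
    return sorted(loops, key=lambda lp: (bbox_bottom_left(lp)[1],
                                         bbox_bottom_left(lp)[0]))

outer = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
inner = [(0.8, 0.8), (1.2, 0.8), (1.2, 1.2), (0.8, 1.2)]
ordered = sort_loops([inner, outer])
# the outer loop's bounding-box corner (0, 0) is lower, so it comes first
```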

In practice, we consider three kinds of curve commands that are the most widely used: draw a $\mathtt{line}$, an $\mathtt{arc}$, and a $\mathtt{circle}$. While other curve commands can be easily added (see Sec. 5), statistics from our large-scale real-world dataset (described in Sec. 3.3) show that these three types of commands constitute 92% of the cases.

Each curve command $C_i$ is described by its curve type $t_i\in\{\langle\mathtt{SOL}\rangle,\mathtt{L},\mathtt{A},\mathtt{R}\}$ and its parameters listed in Table 1. Curve parameters specify the curve’s 2D location in the sketch plane’s local frame of reference, whose own position and orientation in 3D will be described shortly in the associated extrusion command. Since the curves in each loop are concatenated one after another, for the sake of compactness we exclude the curve’s starting position from its parameter list; each curve always starts from the ending point of its predecessor in the loop. The first curve always starts from the origin of the sketch plane, and the world-space coordinate of the origin is specified in the extrusion command.

In short, a sketch profile $S$ is described by a list of loops $S=[Q_1,\dots,Q_N]$, where each loop $Q_i$ consists of a series of curves starting from the indicator command $\langle\mathtt{SOL}\rangle$ (i.e., $Q_i=[\langle\mathtt{SOL}\rangle,C_1,\dots,C_{n_i}]$), and each curve command $C_j=(t_j,\bm{p}_j)$ specifies the curve type $t_j$ and its shape parameters $\bm{p}_j$ (see Fig. 2).
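As a concrete illustration of this nesting, the profile structure $S=[Q_1,\dots,Q_N]$ can be flattened into one command list. The container types and parameter values below are purely illustrative (the paper does not prescribe an implementation); the example mimics the two loops of "Sketch 1" in Fig. 2 with made-up parameters.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CurveCommand:
    # t: command type, one of "SOL", "L", "A", "R" (Table 1)
    t: str
    # p: shape parameters, e.g. (x, y) line end-point for "L"
    p: Tuple[float, ...] = ()

@dataclass
class Loop:
    # A loop Q_i starts with the <SOL> indicator, then its curves.
    curves: List[CurveCommand] = field(default_factory=list)

    def as_commands(self) -> List[CurveCommand]:
        return [CurveCommand("SOL")] + self.curves

def profile_to_commands(loops: List[Loop]) -> List[CurveCommand]:
    # Flatten S = [Q_1, ..., Q_N] into a single command sequence.
    out: List[CurveCommand] = []
    for q in loops:
        out.extend(q.as_commands())
    return out

# Hypothetical parameters standing in for Fig. 2's "Sketch 1".
square = Loop([CurveCommand("L", (1.0, 0.0)),
               CurveCommand("A", (1.0, 1.0, 90.0, 1)),
               CurveCommand("L", (0.0, 1.0)),
               CurveCommand("L", (0.0, 0.0))])
hole = Loop([CurveCommand("R", (0.5, 0.5, 0.2))])
seq = profile_to_commands([square, hole])
```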

Extrusion.

The extrusion command serves two purposes. 1) It extrudes a sketch profile from a 2D plane into a 3D body, and the extrusion type can be either one-sided, symmetric, or two-sided with respect to the profile’s sketch plane. 2) The command also specifies (through the parameter $b$ in Table 1) how to merge the newly extruded 3D body with the previously created shape by one of the boolean operations: either creating a new body, or joining, cutting or intersecting with the existing body.

The extruded profile—which consists of one or more curve commands—always refers to the one described immediately before the extrusion command. The extrusion command therefore needs to define the 3D orientation of that profile’s sketch plane and its 2D local frame of reference. This is defined by a rotational matrix, determined by the $(\theta,\gamma,\phi)$ parameters in Table 1. This matrix aligns the world frame of reference to the plane’s local frame of reference, and aligns the $z$-axis to the plane’s normal direction. In addition, the command parameters include a scale factor $s$ of the extruded profile; the rationale behind this scale factor will be discussed in Sec. 3.1.2.
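The idea of the three orientation angles can be sketched as composing elementary rotations into one matrix. The paper does not spell out its Euler-angle convention, so the Z-Y-X composition below is purely an illustrative assumption, not the paper's specification.

```python
import math

# Elementary rotations about the coordinate axes (3x3, row-major lists).
def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def plane_rotation(theta, phi, gamma):
    # Assumed convention (hypothetical): R = Rz(theta) @ Ry(phi) @ Rx(gamma);
    # R maps the world frame to the sketch plane's local frame.
    return matmul(rot_z(theta), matmul(rot_y(phi), rot_x(gamma)))

R = plane_rotation(0.0, 0.0, 0.0)  # all angles zero -> identity matrix
```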

With these commands, we describe a CAD model $M$ as a sequence of curve commands interleaved with extrusion commands (see Fig. 2). In other words, $M$ is a command sequence $M=[C_1,\dots,C_{N_c}]$, where each $C_i$ has the form $(t_i,\bm{p}_i)$ specifying the command type $t_i$ and parameters $\bm{p}_i$.

3.1.2 Network-friendly Representation

Our specification of a CAD model $M$ is akin to natural language. The vocabulary consists of individual CAD commands expressed sequentially to form sentences. The subject of a sentence is the sketch profile; the predicate is the extrusion. This analogy suggests that we may leverage network structures, such as the Transformer network [40], that have succeeded in natural language processing to fulfill our goal.

However, the CAD commands also differ from natural language in several aspects. Each command has a different number of parameters. In some commands (e.g., the extrusion), the parameters are a mixture of both continuous and discrete values, and the parameter values span over different ranges (recall Table 1). These traits render the command sequences ill-posed for direct use in neural networks.

To overcome this challenge, we regularize the dimensions of command sequences. First, for each command, its parameters are stacked into a $16\times 1$ vector, whose elements correspond to the collective parameters of all commands in Table 1 (i.e., $\bm{p}_i=[x,y,\alpha,f,r,\theta,\phi,\gamma,p_x,p_y,p_z,s,e_1,e_2,b,u]$). Unused parameters for each command are simply set to $-1$. Next, we fix the total number $N_c$ of commands in every CAD model $M$. This is done by padding the CAD model’s command sequence with the empty command $\langle\mathtt{EOS}\rangle$ until the sequence length reaches $N_c$. In practice, we choose $N_c=60$, the maximal command sequence length appearing in our training dataset.

Furthermore, we unify continuous and discrete parameters by quantizing the continuous parameters. To this end, we normalize every CAD model within a $2\times 2\times 2$ cube; we also normalize every sketch profile within its bounding box, and include a scale factor $s$ (in the extrusion command) to restore the normalized profile to its original size. The normalization restricts the ranges of continuous parameters, allowing us to quantize their values into 256 levels and express them using 8-bit integers. As a result, all the command parameters possess only discrete sets of values.
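A minimal sketch of this quantization: after normalization, continuous values lie in a bounded range (here assumed to be [−1, 1]) and are mapped to 256 integer levels. The exact rounding and range conventions are assumptions for illustration.

```python
# Quantize a normalized continuous parameter into one of 256 levels
# (an 8-bit integer), and map it back for decoding.

def quantize(v, n_levels=256, lo=-1.0, hi=1.0):
    t = (v - lo) / (hi - lo)          # map [lo, hi] -> [0, 1]
    q = round(t * (n_levels - 1))     # -> integer level in {0, ..., 255}
    return max(0, min(n_levels - 1, q))

def dequantize(q, n_levels=256, lo=-1.0, hi=1.0):
    return lo + (q / (n_levels - 1)) * (hi - lo)

q = quantize(0.25)
v = dequantize(q)  # recovers 0.25 up to half a quantization step
```

The quantization error is at most half a level, i.e. about 2/255 ≈ 0.008 over the [−1, 1] range, which is what makes 8-bit levels sufficient in practice.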

Parameter quantization is not simply a follow-up of the common practice for training Transformer-based networks [36, 31, 44]. Particularly for CAD models, it is crucial for improving the generation quality (as we empirically confirm in Sec. 4.1). In CAD designs, certain geometric relations—such as parallel and perpendicular sketch lines—must be respected. However, if a generative model directly generates continuous parameters, their values, obtained through parameter regression, are prone to errors that will break these strict relations. Instead, parameter quantization allows the network to “classify” parameters into specific levels, and thereby better respect learned geometric relations.

In Sec. 4.1, we will present ablation studies that empirically justify our choices of CAD command representation.

Figure 3: Our network architecture. The input CAD model, represented as a command sequence $M=\{C_i\}_{i=1}^{N_c}$, is first projected to an embedding space and then fed to the encoder $E$, resulting in a latent vector $\bm{z}$. The decoder $D$ takes learned constant embeddings as input, and also attends to the latent vector $\bm{z}$. It then outputs the predicted command sequence $\hat{M}=\{\hat{C}_i\}_{i=1}^{N_c}$.

3.2 Autoencoder for CAD Models

We now introduce an autoencoder network that leverages our representation of CAD commands. Figure 3 illustrates its structure, and more details are provided in Sec. C of the supplementary document. Once trained, the decoder part of the network naturally serves as a CAD generative model.

Our autoencoder is based on the Transformer network, inspired by its success in processing sequential data [40, 13, 28]. Our autoencoder takes as input a CAD command sequence $M=[C_1,\cdots,C_{N_c}]$, where $N_c$ is a fixed number (recall Sec. 3.1.2). First, each command $C_i$ is projected separately onto a continuous embedding space of dimension $d_E=256$. Then, all the embeddings are put together to feed into an encoder $E$, which in turn outputs a latent vector $\bm{z}\in\mathbb{R}^{256}$. The decoder takes the latent vector $\bm{z}$ as input, and outputs a generated CAD command sequence $\hat{M}$.

Embedding.

Similar in spirit to the approach in natural language processing [40], we first project every command $C_i$ onto a common embedding space. Yet, different from words in natural languages, a CAD command $C_i=(t_i,\bm{p}_i)$ has two distinct parts: its command type $t_i$ and parameters $\bm{p}_i$. We therefore formulate a different way of computing the embedding of $C_i$: take it as a sum of three embeddings, that is, $\bm{e}(C_i)=\bm{e}_i^{\text{cmd}}+\bm{e}_i^{\text{param}}+\bm{e}_i^{\text{pos}}\in\mathbb{R}^{d_E}$.

The first embedding $\bm{e}_i^{\text{cmd}}$ accounts for the command type $t_i$, given by $\bm{e}_i^{\text{cmd}}=\mathsf{W}_{\text{cmd}}\bm{\delta}_i^{\text{c}}$. Here $\mathsf{W}_{\text{cmd}}\in\mathbb{R}^{d_E\times 6}$ is a learnable matrix and $\bm{\delta}_i^{\text{c}}\in\mathbb{R}^6$ is a one-hot vector indicating the command type $t_i$ among the six command types.

The second embedding $\bm{e}_i^{\text{param}}$ considers the command parameters. As introduced in Sec. 3.1.2, every command has 16 parameters, each of which is quantized into an 8-bit integer. We convert each of these integers into a one-hot vector $\bm{\delta}_{i,j}^{\text{p}}$ ($j=1..16$) of dimension $2^8+1=257$; the additional dimension is to indicate that the parameter is unused in that command. Stacking all the one-hot vectors into a matrix $\bm{\delta}_i^{\text{p}}\in\mathbb{R}^{257\times 16}$, we embed each parameter separately using another learnable matrix $\mathsf{W}_{\text{param}}^{b}\in\mathbb{R}^{d_E\times 257}$, and then combine the individual embeddings through a linear layer $\mathsf{W}_{\text{param}}^{a}\in\mathbb{R}^{d_E\times 16d_E}$, namely,

\bm{e}_i^{\text{param}}=\mathsf{W}_{\text{param}}^{a}\,\text{flat}(\mathsf{W}_{\text{param}}^{b}\,\bm{\delta}_i^{\text{p}}),   (1)

where $\text{flat}(\cdot)$ flattens the input matrix to a vector.

Lastly, similar to [40], the positional embedding $\bm{e}_i^{\text{pos}}$ indicates the index of the command $C_i$ in the whole command sequence, defined as $\bm{e}_i^{\text{pos}}=\mathsf{W}_{\text{pos}}\bm{\delta}_i$, where $\mathsf{W}_{\text{pos}}\in\mathbb{R}^{d_{\text{E}}\times N_c}$ is a learnable matrix and $\bm{\delta}_i\in\mathbb{R}^{N_c}$ is the one-hot vector filled with $1$ at index $i$ and $0$ otherwise.
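As a concrete illustration, the three embeddings can be sketched in NumPy with random placeholder weights. The maximum sequence length $N_c=60$ and the column-major flattening order are assumptions made for this sketch, not details specified above:

```python
import numpy as np

d_E, n_types, n_params, n_levels, N_c = 256, 6, 16, 257, 60  # N_c is an assumed max length

rng = np.random.default_rng(0)
# Learnable matrices (random placeholders standing in for trained weights)
W_cmd = rng.normal(size=(d_E, n_types))
W_param_b = rng.normal(size=(d_E, n_levels))
W_param_a = rng.normal(size=(d_E, n_params * d_E))
W_pos = rng.normal(size=(d_E, N_c))

def one_hot(idx, dim):
    v = np.zeros(dim)
    v[idx] = 1.0
    return v

def embed_command(cmd_type, params, i):
    """Embedding e_i = e_cmd + e_param + e_pos for the i-th command."""
    e_cmd = W_cmd @ one_hot(cmd_type, n_types)
    # Stack one-hot parameter vectors into a (257, 16) matrix; level 0 marks an
    # unused parameter (-1), levels 1..256 hold the quantized 8-bit values.
    delta_p = np.stack([one_hot(p + 1, n_levels) for p in params], axis=1)
    e_param = W_param_a @ (W_param_b @ delta_p).flatten(order="F")
    e_pos = W_pos @ one_hot(i, N_c)
    return e_cmd + e_param + e_pos

e = embed_command(cmd_type=1, params=[-1] * 16, i=0)
```

Each command thus maps to a single $d_{\text{E}}$-dimensional vector regardless of how many of its parameters are used.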

Encoder.

Our encoder $E$ is composed of four layers of Transformer blocks, each with eight attention heads and a feed-forward dimension of $512$. The encoder takes the embedding sequence $[\bm{e}_1,..,\bm{e}_{N_c}]$ as input, and outputs vectors $[\bm{e}'_1,..,\bm{e}'_{N_c}]$, each with the same dimension $d_{\text{E}}=256$. The output vectors are finally averaged to produce a single $d_{\text{E}}$-dimensional latent vector $\bm{z}$.
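To make the pooling step concrete, the sketch below uses a single-head scaled dot-product self-attention layer as a stand-in for the full four-layer, eight-head Transformer (residual connections, layer normalization, and the feed-forward sublayers are omitted), then averages the per-command outputs into the latent vector:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a command sequence.
    X: (N_c, d) embeddings; returns (N_c, d) contextualized vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # (N_c, N_c) weights
    return A @ V

rng = np.random.default_rng(0)
N_c, d = 60, 256
X = rng.normal(size=(N_c, d))                      # command embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d), scale=d**-0.5) for _ in range(3))

H = self_attention(X, Wq, Wk, Wv)   # stand-in for the Transformer blocks
z = H.mean(axis=0)                  # average-pool into the latent vector z
```

The mean over sequence positions is what collapses a variable-length command sequence into one fixed-size code.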

Decoder.

Also built on Transformer blocks, our decoder $D$ has the same hyper-parameter settings as the encoder. It takes as input learned constant embeddings while also attending to the latent vector $\bm{z}$; a similar input structure has been used in [9, 10]. Output from the last Transformer block is fed into a linear layer to predict a CAD command sequence $\hat{M}=[\hat{C}_1,..,\hat{C}_{N_c}]$, including both the command type $\hat{t}_i$ and parameters $\hat{\bm{p}}_i$ for each command. As opposed to the autoregressive strategy commonly used in natural language processing [40], we adopt the feed-forward strategy [9, 10], and the prediction of our model can be factorized as

p(\hat{M}|\bm{z},\Theta)=\prod_{i=1}^{N_c}p(\hat{t}_i,\hat{\bm{p}}_i|\bm{z},\Theta),   (2)

where $\Theta$ denotes the network parameters of the decoder.
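The feed-forward decoding of Eq. (2) predicts every command in one pass, rather than one token at a time. A minimal sketch with two linear output heads (the heads and their sizes are illustrative assumptions, not the paper's exact layer layout):

```python
import numpy as np

rng = np.random.default_rng(0)
N_c, d_E, n_types, n_params, n_levels = 60, 256, 6, 16, 257

H = rng.normal(size=(N_c, d_E))                    # last Transformer block's outputs
W_t = rng.normal(size=(d_E, n_types))              # command-type head
W_p = rng.normal(size=(d_E, n_params * n_levels))  # parameter head

# One feed-forward pass predicts every command position independently (Eq. 2):
type_logits = H @ W_t                              # (N_c, 6)
param_logits = (H @ W_p).reshape(N_c, n_params, n_levels)

t_hat = type_logits.argmax(axis=-1)                # predicted command types
p_hat = param_logits.argmax(axis=-1) - 1           # shift back to [-1, 255]
```

Level index 0 is mapped back to $-1$ to mark unused parameters, matching the quantization scheme above.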

3.3 Creation of CAD Dataset

Several datasets of CAD designs exist, but none of them suffice for our training. In particular, the ABC dataset [23] collects about 1 million CAD designs from Onshape, a web-based CAD tool and repository [3]. Although this is a large-scale dataset, its CAD designs are provided in B-rep format, without sufficient information to recover how the designs are constructed by CAD operations. The recent Fusion 360 Gallery dataset [48] offers CAD designs constructed by profile sketches and extrusions, and it provides the CAD command sequence for each design. However, this dataset has only $\sim$8000 CAD designs, not enough to train a well-generalized generative model.

We therefore create a new dataset that is large-scale and provides CAD command sequences. Beyond training our autoencoder network, this dataset may also serve future research. We have made it publicly available.

To create the dataset, we leverage Onshape's CAD repository and its developer API [4] to parse the CAD designs. We start from the ABC dataset. For each CAD model, the dataset provides a link to Onshape's original CAD design. We then use Onshape's domain-specific language (called FeatureScript [5]) to parse the CAD operations and parameters used in that design. We simply discard CAD models that use operations beyond sketch and extrusion. For the rest of the models, we use a FeatureScript program to extract the sketch profiles and extrusions, and express them using the commands listed in Table 1.

In the end, we collect a dataset of 178,238 CAD designs, all described as CAD command sequences. This is orders of magnitude larger than the existing dataset of the same type [48]. The dataset is further split randomly into training, validation, and test sets by 90%-5%-5%, ready to use in training and testing. Figure 9 in the supplementary document shows samples of CAD models from our dataset.
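The random 90%-5%-5% split can be sketched with the standard library; the fixed seed is an assumption added here for reproducibility:

```python
import random

def split_dataset(ids, seed=0, ratios=(0.90, 0.05, 0.05)):
    """Randomly split model IDs into train/validation/test by 90%-5%-5%."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_dataset(range(178238))
```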

3.4 Training and Runtime Generation

Training.

Leveraging the dataset, we train our autoencoder network using the standard Cross-Entropy loss. Formally, we define the loss between the predicted CAD model $\hat{M}$ and the ground-truth model $M$ as

\mathcal{L}=\sum_{i=1}^{N_c}\ell(\hat{t}_i,t_i)+\beta\sum_{i=1}^{N_c}\sum_{j=1}^{N_p}\ell(\hat{\bm{p}}_{i,j},\bm{p}_{i,j}),   (3)

where $\ell(\cdot,\cdot)$ denotes the standard Cross-Entropy, $N_p$ is the number of parameters ($N_p=16$ in our examples), and $\beta$ is a weight balancing the two terms ($\beta=2$ in our examples). Note that in the ground-truth command sequence, some commands are empty (i.e., the padding command $\langle\mathtt{EOS}\rangle$) and some command parameters are unused (i.e., labeled as $-1$). In those cases, their corresponding contributions to the summation terms in (3) are simply ignored.
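The masked loss of Eq. (3) can be sketched as follows; the explicit loops and the 256-way parameter classification (excluding the unused flag) are simplifications for illustration:

```python
import numpy as np

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably for a 1-D logit vector."""
    logits = logits - logits.max()
    return float(-(logits[target] - np.log(np.exp(logits).sum())))

def cad_loss(type_logits, param_logits, t_gt, p_gt, pad_mask, beta=2.0):
    """Eq. (3): command-type CE plus beta-weighted parameter CE.
    pad_mask[i] is True for padding <EOS> commands; p_gt == -1 marks unused
    parameters. Both are excluded from the sums, as stated in the paper."""
    loss = 0.0
    for i in range(len(t_gt)):
        if pad_mask[i]:
            continue
        loss += cross_entropy(type_logits[i], t_gt[i])
        for j in range(p_gt.shape[1]):
            if p_gt[i, j] == -1:
                continue
            loss += beta * cross_entropy(param_logits[i, j], p_gt[i, j])
    return loss

rng = np.random.default_rng(0)
N_c, n_types, n_params, n_levels = 4, 6, 16, 256
type_logits = rng.normal(size=(N_c, n_types))
param_logits = rng.normal(size=(N_c, n_params, n_levels))
t_gt = np.array([0, 1, 2, 3])
p_gt = -np.ones((N_c, n_params), dtype=int)
p_gt[0, 0] = 7                         # a single used parameter
pad_mask = np.array([False, False, False, True])
L = cad_loss(type_logits, param_logits, t_gt, p_gt, pad_mask)
```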

The training process uses the Adam optimizer [22] with a learning rate of $0.001$ and a linear warm-up period of $2000$ initial steps. We set a dropout rate of $0.1$ for all Transformer blocks and apply gradient clipping of $1.0$ in back-propagation. We train the network for $1000$ epochs with a batch size of $512$.
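The linear warm-up can be expressed as a simple step-dependent learning-rate function (holding the base rate constant after warm-up is an assumption; any subsequent decay schedule is not specified above):

```python
def lr_schedule(step, base_lr=1e-3, warmup_steps=2000):
    """Linear warm-up to the base learning rate over the first 2000 steps."""
    return base_lr * min(1.0, step / warmup_steps)
```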

CAD generation.

Once the autoencoder is well trained, we can represent a CAD model using a $256$-dimensional latent vector $\bm{z}$. For automatic generation of CAD models, we employ the latent-GAN technique [6, 12, 50] on our learned latent space. The generator and discriminator are both simple multilayer perceptron (MLP) networks with four hidden layers, trained using the Wasserstein-GAN strategy with gradient penalty [7, 18]. In the end, to generate a CAD model, we sample a random vector from a multivariate Gaussian distribution and feed it into the GAN's generator. The output of the GAN is a latent vector $\bm{z}$, which is input to our Transformer-based decoder.
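The sampling path (Gaussian noise, then a four-hidden-layer MLP generator, then a latent code for the decoder) can be sketched as below. The noise and hidden dimensions, the ReLU activations, and the untrained random weights are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
noise_dim, hidden, d_E = 64, 512, 256   # noise/hidden sizes are assumptions

# A four-hidden-layer MLP generator with random (untrained) weights.
dims = [noise_dim, hidden, hidden, hidden, hidden, d_E]
weights = [rng.normal(scale=din**-0.5, size=(din, dout))
           for din, dout in zip(dims[:-1], dims[1:])]

def generator(n):
    """Map n Gaussian noise samples to latent vectors z for the decoder."""
    x = rng.standard_normal((n, noise_dim))
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)      # ReLU hidden layers
    return x @ weights[-1]              # linear output layer

z = generator(8)
```

In the trained system, each row of `z` would be fed to the Transformer-based decoder to produce a command sequence.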

4 Experiments

In this section, we evaluate our autoencoder network from two perspectives: the autoencoding of CAD models (Sec. 4.1) and latent-space shape generation (Sec. 4.2). We also discuss possible applications that can benefit from our CAD generative model (Sec. 4.3).

There exist no previous generative models for CAD designs, and thus no methods for our model to directly compare with. Our goal here is to understand the performance of our model under different metrics, and to justify the algorithmic choices in our model through a series of ablation studies.

Method       ACC_cmd ↑   ACC_param ↑   median CD ↓   Invalid Ratio ↓
Ours+Aug     99.50       97.98         0.752         2.72
Ours         99.36       97.47         0.787         3.30
Alt-ArcMid   99.34       97.31         0.790         3.26
Alt-Trans    99.33       97.56         0.792         3.30
Alt-Rel      99.33       97.66         0.863         3.51
Alt-Regr     -           -             2.142         4.32
Table 2: Quantitative evaluation of autoencoding. $\text{ACC}_{\text{cmd}}$ and $\text{ACC}_{\text{param}}$ are both multiplied by $100\%$, and CD is multiplied by $10^3$. ↑: a higher metric value indicates better autoencoding quality. ↓: a lower metric value is better. ACC values for Alt-Regr are not available since Alt-Regr does not use quantized parameters.

4.1 Autoencoding of CAD Models

The autoencoding performance has often been used to indicate the extent to which a generative model can express the target data distribution [6, 12, 17]. Here we use our autoencoder network to encode a CAD model $M$ absent from the training dataset; we then decode the resulting latent vector into a CAD model $\hat{M}$. The autoencoder is evaluated by the difference between $M$ and $\hat{M}$.

Figure 4: Comparison of autoencoding results. Hidden edges are also rendered visible (white). Ground truth (GT) is shown in the bottom row. Our best results are highlighted in the dash-line box.
Metrics.

To thoroughly understand our autoencoder's performance, we measure the difference between $M$ and $\hat{M}$ in terms of both the CAD commands and the resulting 3D geometry. We propose to evaluate command accuracy using two metrics, namely Command Accuracy ($\text{ACC}_{\text{cmd}}$) and Parameter Accuracy ($\text{ACC}_{\text{param}}$). The former measures the correctness of the predicted CAD command type, defined as

\text{ACC}_{\text{cmd}}=\frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[t_i=\hat{t}_i].   (4)

Here the notation follows that in Sec. 3. $N_c$ denotes the total number of CAD commands, and $t_i$ and $\hat{t}_i$ are the ground-truth and recovered command types, respectively. $\mathbb{I}[\cdot]$ is the indicator function (0 or 1).
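Eq. (4) amounts to a per-position match rate over the command types, for example:

```python
def acc_cmd(t_gt, t_pred):
    """Eq. (4): fraction of commands whose type is recovered correctly."""
    assert len(t_gt) == len(t_pred)
    return sum(t == t_hat for t, t_hat in zip(t_gt, t_pred)) / len(t_gt)
```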

Once the command type is correctly recovered, we also evaluate the correctness of the command parameters. This is what Parameter Accuracy ($\text{ACC}_{\text{param}}$) is meant to measure:

\text{ACC}_{\text{param}}=\frac{1}{K}\sum_{i=1}^{N_c}\sum_{j=1}^{|\hat{\bm{p}}_i|}\mathbb{I}[|\bm{p}_{i,j}-\hat{\bm{p}}_{i,j}|<\eta]\,\mathbb{I}[t_i=\hat{t}_i],   (5)

where $K=\sum_{i=1}^{N_c}\mathbb{I}[t_i=\hat{t}_i]|\bm{p}_i|$ is the total number of parameters in all correctly recovered commands. Note that $\bm{p}_{i,j}$ and $\hat{\bm{p}}_{i,j}$ are both quantized into 8-bit integers. $\eta$ is chosen as a tolerance threshold accounting for the parameter quantization. In practice, we use $\eta=3$ (out of 256 levels).
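Eq. (5) can be computed directly over the quantized parameter arrays; unused parameters (labeled $-1$) are excluded from the count:

```python
def acc_param(t_gt, t_pred, p_gt, p_pred, eta=3):
    """Eq. (5): among correctly recovered commands, the fraction of used
    parameters within a quantization tolerance eta (3 of 256 levels here)."""
    correct, total = 0, 0
    for t, t_hat, p, p_hat in zip(t_gt, t_pred, p_gt, p_pred):
        if t != t_hat:
            continue                 # only correctly recovered commands count
        for v, v_hat in zip(p, p_hat):
            if v == -1:              # unused parameter, excluded from K
                continue
            total += 1
            correct += abs(v - v_hat) < eta
    return correct / total

a = acc_param([0, 1], [0, 2],
              [[10, 200, -1], [5, 5, 5]],
              [[12, 100, -1], [5, 5, 5]])
```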

To measure the quality of recovered 3D geometry, we use Chamfer Distance (CD), the metric used in many previous generative models of discretized shapes (such as point clouds) [6, 17, 12]. Here, we evaluate CD by uniformly sampling $2000$ points on the surfaces of the reference and recovered shapes, respectively, and measuring CD between the two point sets. Moreover, it is not guaranteed that the output CAD command sequence always produces a valid 3D shape. In rare cases, the output commands may lead to an invalid topology, and thus no point cloud can be extracted from that CAD model. We therefore also report the Invalid Ratio, the percentage of output CAD models that fail to be converted to point clouds.
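A minimal Chamfer distance over sampled point sets can be sketched as below; the exact convention (squared vs. Euclidean nearest-neighbor distances, sum vs. mean over directions) is an assumption here, as it is not specified above:

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n, 3) and Q (m, 3),
    using squared nearest-neighbor distances averaged in both directions."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)   # (n, m) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
P = rng.random((2000, 3))   # 2000 surface samples, as in the paper
```

The brute-force pairwise matrix is fine at 2000 points; larger clouds would call for a KD-tree.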

Comparison methods.

Due to the lack of existing CAD generative models, we compare our model with several variants in order to justify our data representation and training strategy. In particular, we consider the following variants.

$\mathtt{Alt}\text{-}\mathtt{Rel}$ represents curve positions relative to the position of the predecessor curve in the loop. It contrasts with our model, which uses absolute positions in curve specification.

$\mathtt{Alt}\text{-}\mathtt{Trans}$ includes in the extrusion command the starting point position of the loop (in addition to the origin of the sketch plane). Here, the starting point position and the plane's origin are both expressed in the world frame of reference of the CAD model. In contrast, our proposed method includes only the sketch plane's origin, with the origin translated to the loop's starting position; it is therefore more compact.

$\mathtt{Alt}\text{-}\mathtt{ArcMid}$ specifies an arc using its ending and middle point positions, rather than the sweeping angle and the counter-clockwise flag used in Table 1.

$\mathtt{Alt}\text{-}\mathtt{Regr}$ regresses all parameters of the CAD commands using the standard mean-squared error in the loss function. Unlike the model we propose, this approach does not quantize continuous parameters.

$\mathtt{Ours}\text{+}\mathtt{Aug}$ uses the same data representation and training objective as our proposed solution, but it augments the training dataset with randomly composed CAD command sequences (although the augmentation may yield an invalid CAD sequence in a few cases).

More details about these variants are described in Sec. D of the supplementary document.

Figure 5: Randomly generated 3D shapes from our model (top) and l-GAN (bottom).
Method   COV ↑   MMD ↓   JSD ↓
Ours     78.13   1.45    3.76
l-GAN    77.73   1.27    5.02
Table 3: Shape generation measured under point-cloud metrics. We use the metrics in l-GAN [6]. Both MMD and JSD are multiplied by $10^2$. ↑: the higher the better; ↓: the lower the better.
Discussion of results.

The quantitative results are reported in Table 2, and more detailed CD scores are given in Table 4 of the supplementary document. In general, $\mathtt{Ours}\text{+}\mathtt{Aug}$ (i.e., training with synthetic data augmentation) achieves the best performance, suggesting that randomly composed data can improve the network's generalization ability. The performance of $\mathtt{Alt}\text{-}\mathtt{ArcMid}$ is similar to that of $\mathtt{Ours}$, which means the middle-point representation is a viable alternative for representing arcs. Moreover, $\mathtt{Alt}\text{-}\mathtt{Trans}$ performs slightly worse in terms of CD than $\mathtt{Ours}$ (e.g., see the green model in Fig. 4).

Perhaps more interestingly, while $\mathtt{Alt}\text{-}\mathtt{Rel}$ has high parameter accuracy ($\text{ACC}_{\text{param}}$), even higher than $\mathtt{Ours}$, it has a relatively large CD score and sometimes produces invalid topology: for example, the yellow model in the second row of Fig. 4 has two triangle loops intersecting each other, resulting in invalid topology. This is caused by errors in the predicted curve positions. In $\mathtt{Alt}\text{-}\mathtt{Rel}$, each curve position is specified with respect to its predecessor curve, and thus the error accumulates along the loop.

Lastly, $\mathtt{Alt}\text{-}\mathtt{Regr}$, which does not quantize continuous parameters, suffers from larger errors that may break crucial geometric relations such as parallel and perpendicular edges (e.g., see the orange model in Fig. 4).

Cross-dataset generalization.

We also verify the generalization of our autoencoder: we take the autoencoder trained on our dataset and evaluate it on the smaller dataset provided in [48]. These datasets are constructed from different sources: ours is based on models from the Onshape repository, while theirs is produced from designs in Autodesk Fusion 360. Nonetheless, our network generalizes well to their dataset, achieving comparable quantitative performance (see Sec. E in the supplementary document).

4.2 Shape Generation

Next, we evaluate CAD model generation from latent vectors (described in Sec. 3.4). Some examples of our generated CAD models are shown in Fig. 1, and more results are presented in Fig. 14 of the supplementary document.

Since there are no existing generative models for CAD designs, we choose to compare our model with l-GAN [6], a widely studied point-cloud 3D shape generative model. We note that our goal is not to show the superiority of one over the other, as the two generative models have different application areas. Rather, we demonstrate our model's ability to generate comparable shape quality even under the metrics for point-cloud generative models. Further, shapes from our model, as shown in Fig. 5, have much sharper geometric details, and they can be easily edited by the user (Fig. 7).

Figure 6: CAD model reconstruction from point clouds. (Top) input point clouds. (Bottom) reconstructed CAD models.
Metrics.

For quantitative comparison with point-cloud generative models, we follow the metrics used in l-GAN [6]. These metrics measure the discrepancy between two sets of 3D point-cloud shapes: the set $\mathcal{S}$ of ground-truth shapes and the set $\mathcal{G}$ of generated shapes. In particular, Coverage (COV) measures what percentage of shapes in $\mathcal{S}$ can be well approximated by shapes in $\mathcal{G}$. Minimum Matching Distance (MMD) measures the fidelity of $\mathcal{G}$ through the minimum matching distance between two point clouds from $\mathcal{S}$ and $\mathcal{G}$. Jensen-Shannon Divergence (JSD) is the standard statistical distance, measuring the similarity between the point-cloud distributions of $\mathcal{S}$ and $\mathcal{G}$. Details of computing these metrics are presented in the supplement (Sec. G).
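As an illustrative sketch, COV and MMD can be computed from a pairwise distance matrix between the two shape sets. The Chamfer distance used here, and the exact matching conventions, are assumptions; l-GAN's definitions (e.g., EMD variants) may differ in detail:

```python
import numpy as np

def chamfer(P, Q):
    """Squared-distance Chamfer between two point clouds (an assumed variant)."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def coverage_and_mmd(S, G):
    """COV: fraction of ground-truth shapes in S that are the nearest neighbor
    of at least one generated shape in G. MMD: average over S of the distance
    to its closest shape in G. Both use Chamfer distance between point clouds."""
    D = np.array([[chamfer(s, g) for g in G] for s in S])   # (|S|, |G|)
    cov = len(set(D.argmin(axis=0))) / len(S)
    mmd = D.min(axis=1).mean()
    return cov, mmd

rng = np.random.default_rng(0)
S = [rng.random((100, 3)) for _ in range(4)]
cov, mmd = coverage_and_mmd(S, S)       # a set trivially covers itself
```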

Discussion of results.

Figure 5 illustrates some output examples from our CAD generative model and l-GAN. We then convert ground-truth and generated CAD models into point clouds, and evaluate the metrics. The results are reported in Table 3, indicating that our method has comparable performance as l-GAN in terms of the point-cloud metrics. Nevertheless, CAD models, thanks to their parametric representation, have much smoother surfaces and sharper geometric features than point clouds.

4.3 Future Applications

The CAD generative model can serve as a fundamental algorithmic block in many applications. While our work focuses on the generative model itself, not the downstream applications, here we discuss its use in two scenarios.

With the CAD generative model, one can take a point cloud (e.g., acquired through 3D scanning) and reconstruct a CAD model. As a preliminary demonstration, we use our autoencoder to encode a CAD model $M$ into a latent vector $\bm{c}$. We then leverage the PointNet++ encoder [35], training it to encode the point-cloud representation of $M$ into the same latent vector $\bm{c}$. At inference time, given a point cloud, we use the PointNet++ encoder to map it into a latent vector, followed by our autoencoder's decoder to decode it into a CAD model. We show some visual examples in Fig. 6 and quantitative results in the supplementary document (Table 6).

Furthermore, the generated CAD model can be directly imported into CAD tools for user editing (see Fig. 7). This is a unique feature enabled by the CAD generative model, as user editing of point clouds or polygon meshes would be much more troublesome.

Figure 7: User Editing. Our reconstructed CAD model can be easily edited in any CAD tools. Here, the regions that undergo CAD operations are highlighted in orange color.

5 Discussion and Conclusion

Toward the CAD generative model, there are several limitations in our approach. At this point, we have considered the three most widely used types of curve commands (line, arc, circle), but other curve commands can be easily added as well. For example, a cubic Bézier curve can be specified by three control points, with its starting point taken from the ending position of its predecessor. These parameters can be structured in the same way as described in Sec. 3.1. Other operations, such as revolving a sketch, can be encoded in a way similar to the extrusion command. However, certain CAD operations, such as fillet, operate on parts of the shape boundary, and thus require a reference to the model's B-rep, not just to other commands. Incorporating those commands into the generative model is left for future research.

Not every CAD command sequence produces a topologically valid shape, and our generative network cannot guarantee topological soundness of its output CAD sequences. In practice, the generated CAD command sequence rarely fails, though failure becomes more likely as the command sequence grows long. We present and analyze some failure cases in Sec. F of the supplementary document, providing some fodder for future research.

In summary, we have presented DeepCAD, a deep generative model for CAD designs. Almost all previous 3D generative models produce discrete 3D shapes such as voxels, point clouds, and meshes. This work, to our knowledge, is the first generative model for CAD designs. To this end, we also introduce a large dataset of CAD models, each represented as a CAD command sequence.

Acknowledgements.

We thank the anonymous reviewers for their constructive feedback. This work was partially supported by the National Science Foundation (1910839 and 1816041).

References

  • [1] Autocad. https://www.autodesk.com/products/autocad.
  • [2] Fusion 360. https://www.autodesk.com/products/fusion-360.
  • [3] Onshape. http://onshape.com.
  • [4] Onshape developer documentation. https://onshape-public.github.io/docs/.
  • [5] Onshape featurescript. https://cad.onshape.com/FsDoc/.
  • [6] Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3D point clouds. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 40–49, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
  • [7] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 214–223, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
  • [8] Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
  • [9] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
  • [10] Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. Deepsvg: A hierarchical generative network for vector graphics animation. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16351–16361. Curran Associates, Inc., 2020.
  • [11] Zhiqin Chen, Andrea Tagliasacchi, and Hao Zhang. Bsp-net: Generating compact meshes via binary space partitioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 45–54, 2020.
  • [12] Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5939–5948, 2019.
  • [13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • [14] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • [15] Yaroslav Ganin, Sergey Bartunov, Yujia Li, Ethan Keller, and Stefano Saliceti. Computer-aided design as language. arXiv preprint arXiv:2105.02769, 2021.
  • [16] Rohit Girdhar, David F Fouhey, Mikel Rodriguez, and Abhinav Gupta. Learning a predictable and generative vector representation for objects. In European Conference on Computer Vision, pages 484–499. Springer, 2016.
  • [17] Thibault Groueix, Matthew Fisher, Vladimir G Kim, Bryan C Russell, and Mathieu Aubry. A papier-mâché approach to learning 3d surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 216–224, 2018.
  • [18] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of wasserstein gans. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 5769–5779, USA, 2017. Curran Associates Inc.
  • [19] Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph Lambourne, Thomas Davies, Hooman Shayani, and Nigel Morris. Uv-net: Learning from curve-networks and solids. arXiv preprint arXiv:2006.10211, 2020.
  • [20] R. Kenny Jones, Theresa Barton, Xianghao Xu, Kai Wang, Ellen Jiang, Paul Guerrero, Niloy J. Mitra, and Daniel Ritchie. Shapeassembly: Learning to generate programs for 3d shape structure synthesis. ACM Transactions on Graphics (TOG), Siggraph Asia 2020, 39(6):Article 234, 2020.
  • [21] Kacper Kania, Maciej Zięba, and Tomasz Kajdanowicz. Ucsg-net–unsupervised discovering of constructive solid geometry tree. arXiv preprint arXiv:2006.09102, 2020.
  • [22] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
  • [23] Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. Abc: A big cad model dataset for geometric deep learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • [24] Joseph G Lambourne, Karl DD Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12773–12782, 2021.
  • [25] Changjian Li, Hao Pan, Adrien Bousseau, and Niloy J. Mitra. Sketch2cad: Sequential cad modeling by sketching in context. ACM Trans. Graph. (Proceedings of SIGGRAPH Asia 2020), 39(6):164:1–164:14, 2020.
  • [26] Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, and Leonidas Guibas. Grass: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics (Proc. of SIGGRAPH 2017), 36(4), 2017.
  • [27] Yiyi Liao, Simon Donne, and Andreas Geiger. Deep marching cubes: Learning explicit surface representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2916–2925, 2018.
  • [28] Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, and Hermann Ney. Rwth asr systems for librispeech: Hybrid vs attention. Interspeech 2019, Sep 2019.
  • [29] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.
  • [30] Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, and Leonidas J Guibas. Structurenet: Hierarchical graph networks for 3d shape generation. ACM Transactions on Graphics (TOG), 38(6), 2019.
  • [31] Charlie Nash, Yaroslav Ganin, S. M. Ali Eslami, and Peter Battaglia. PolyGen: An autoregressive generative model of 3D meshes. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 7220–7229. PMLR, 13–18 Jul 2020.
  • [32] Wamiq Reyaz Para, Shariq Farooq Bhat, Paul Guerrero, Tom Kelly, Niloy Mitra, Leonidas Guibas, and Peter Wonka. Sketchgen: Generating constrained cad sketches. arXiv preprint arXiv:2106.02711, 2021.
  • [33] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
  • [34] Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. Image transformer. In International Conference on Machine Learning, pages 4055–4064. PMLR, 2018.
  • [35] Charles R Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.
  • [36] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P. Kingma. Pixelcnn++: A pixelcnn implementation with discretized logistic mixture likelihood and other modifications. In ICLR, 2017.
  • [37] Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. Csgnet: Neural shape parser for constructive solid geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5515–5523, 2018.
  • [38] Gopal Sharma, Difan Liu, Subhransu Maji, Evangelos Kalogerakis, Siddhartha Chaudhuri, and Radomír Měch. Parsenet: A parametric surface fitting network for 3d point clouds. In European Conference on Computer Vision, pages 261–276. Springer, 2020.
  • [39] Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T Freeman, Joshua B Tenenbaum, and Jiajun Wu. Learning to infer and execute 3d shape programs. arXiv preprint arXiv:1901.02875, 2019.
  • [40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
  • [41] Homer Walke, R Kenny Jones, and Daniel Ritchie. Learning to infer shape programs using latent execution self training. arXiv preprint arXiv:2011.13045, 2020.
  • [42] Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–67, 2018.
  • [43] Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, and Hao Zhang. Pie-net: Parametric inference of point cloud edges. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 20167–20178. Curran Associates, Inc., 2020.
  • [44] Xinpeng Wang, Chandan Yeshwanth, and Matthias Nießner. Sceneformer: Indoor scene generation with transformers. arXiv preprint arXiv:2012.09793, 2020.
  • [45] K. Weiler. Edge-based data structures for solid modeling in curved-surface environments. IEEE Computer Graphics and Applications, 5(1):21–40, 1985.
  • [46] Kevin Weiler. Topological structures for geometric modeling. PhD thesis, Rensselaer Polytechnic Institute, 1986.
  • [47] Karl DD Willis, Pradeep Kumar Jayaraman, Joseph G Lambourne, Hang Chu, and Yewen Pu. Engineering sketch generation for computer-aided design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2105–2114, 2021.
  • [48] Karl D. D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad reconstruction. arXiv preprint arXiv:2010.02392, 2020.
  • [49] Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.
  • [50] Rundi Wu, Yixin Zhuang, Kai Xu, Hao Zhang, and Baoquan Chen. Pq-net: A generative part seq2seq network for 3d shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 829–838, 2020.
  • [51] Xianghao Xu, Wenzhe Peng, Chin-Yi Cheng, Karl DD Willis, and Daniel Ritchie. Inferring cad modeling sequences using zone graphs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6062–6070, 2021.
  • [52] Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, and Bharath Hariharan. Pointflow: 3d point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4541–4550, 2019.
  • [53] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018.

Supplementary Document
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

Appendix A CAD Dataset

We create our dataset by parsing the command sequences of CAD models in Onshape’s online repository. Some example models from our dataset are shown in Fig. 9. Unlike other 3D shape datasets, which have specific categories (such as chairs and cars), this dataset consists mostly of user-created mechanical parts with diverse shapes.

Our dataset is derived from the ABC dataset [23], which contains several duplicate shapes. Therefore, for each shape in the test set, we find its nearest neighbor in the training set based on chamfer distance; we discard the test shape if the nearest distance is below a threshold.
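The deduplication step can be sketched as a brute-force nearest-neighbor search. This is a NumPy illustration only: `chamfer_distance` and `filter_duplicates` are our names, and the actual threshold value is not specified in the paper.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric chamfer distance between two point sets p, q of shape (n, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def filter_duplicates(test_shapes, train_shapes, threshold):
    """Drop test shapes whose nearest training shape is closer than `threshold`."""
    kept = []
    for t in test_shapes:
        nearest = min(chamfer_distance(t, s) for s in train_shapes)
        if nearest >= threshold:
            kept.append(t)
    return kept
```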

We further examine the data distribution in terms of CAD command sequence length and the number of extrusions (see Fig. 8). Most CAD command sequences are no longer than 40 commands or use fewer than 8 extrusions, as these CAD models are all manually created by users. A similar command length distribution is also reported in Fusion 360 Gallery [48].

Figure 8: Statistics of CAD training data in terms of command sequence length (left) and number of extrusions (right).
Figure 9: Our CAD dataset. Top two rows: examples of CAD models in our dataset. Bottom two rows: examples of CAD construction sequences.

Appendix B Command Parameter Representation

Recall that our list of the full command parameters in Table 1 is $\bm{p}_i=[x,y,\alpha,f,r,\theta,\phi,\gamma,p_x,p_y,p_z,s,e_1,e_2,b,u]$. As described in Sec. 3.1.2, we normalize and quantize these parameters.

First, we scale every CAD model to fit within a $2\times 2\times 2$ cube (without translation) such that all parameters stay bounded: the sketch plane origin $(p_x,p_y,p_z)$ and the two-side extrusion distances $(e_1,e_2)$ range in $[-1,1]$; the scale $s$ of the associated sketch profile is within $[0,2]$; and the sketch orientation $(\theta,\phi,\gamma)$ has the range $[-\pi,\pi]$.

Next, we normalize every sketch profile into a unit square such that its starting point (i.e., the bottom-left point) lies at the center $(0.5,0.5)$. As a result, the curve’s ending position $(x,y)$ and the radius $r$ of a circle stay within $[0,1]$. The arc’s sweeping angle $\alpha$ is by definition in $[0,2\pi]$.

Afterwards, we quantize all continuous parameters into 256 levels and express them using 8-bit integers.

For discrete parameters, we directly use their values. The arc’s counter-clockwise flag $f$ is a binary sign: $0$ indicates a clockwise arc and $1$ indicates a counter-clockwise arc. The CSG operation type $b\in\{0,1,2,3\}$ indicates new body, join, cut, and intersect, respectively. Lastly, the extrusion type $u\in\{0,1,2\}$ indicates one-sided, symmetric, and two-sided, respectively.
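The normalization-and-quantization scheme above amounts to uniform 8-bit quantization of each bounded continuous parameter. A minimal sketch (function names are ours, not from the released code):

```python
N_LEVELS = 256  # 8-bit quantization, as in the paper

def quantize(value, lo, hi, n_levels=N_LEVELS):
    """Map a continuous parameter in [lo, hi] to an integer in [0, n_levels - 1]."""
    t = (value - lo) / (hi - lo)             # normalize to [0, 1]
    return int(round(t * (n_levels - 1)))    # snap to the nearest level

def dequantize(level, lo, hi, n_levels=N_LEVELS):
    """Recover the (approximate) continuous value from its quantization level."""
    return lo + level / (n_levels - 1) * (hi - lo)

# e.g. a sketch-plane origin coordinate p_x, bounded in [-1, 1]
level = quantize(0.25, -1.0, 1.0)
approx = dequantize(level, -1.0, 1.0)
```

The round trip introduces at most half a quantization step of error, which is why the bounds established by the normalization above matter.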

Appendix C Network Architecture and Training Details

Autoencoder.  自编码器。

Our Transformer-based encoder and decoder are both composed of four layers of Transformer blocks, each with eight attention heads and a feed-forward dimension of 512. We adopt standard layer normalization and a dropout rate of 0.1 for each Transformer block.

The last Transformer block in the decoder is followed by two separate linear layers, one for predicting the command type (with weights $W_1\in\mathbb{R}^{256\times 6}$) and another for predicting the command parameters (with weights $W_2\in\mathbb{R}^{256\times 4096}$). The output of the second linear layer, a 4096-dimensional vector, is further reshaped into a $16\times 256$ matrix, whose rows correspond to the 16 parameters.
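The two prediction heads and the reshape can be sketched in NumPy; random weights stand in for the trained $W_1$ and $W_2$, and the batch of 60 commands is an arbitrary example length.

```python
import numpy as np

D_MODEL, N_CMD, N_PARAMS, N_LEVELS = 256, 6, 16, 256
rng = np.random.default_rng(0)

# Two linear heads on top of the decoder's 256-d per-command outputs.
W1 = rng.normal(size=(D_MODEL, N_CMD))                # command-type logits
W2 = rng.normal(size=(D_MODEL, N_PARAMS * N_LEVELS))  # parameter logits

h = rng.normal(size=(60, D_MODEL))  # decoder output: 60 commands x 256 dims

cmd_logits = h @ W1                                       # shape (60, 6)
param_logits = (h @ W2).reshape(-1, N_PARAMS, N_LEVELS)   # shape (60, 16, 256)
```

Each of the 16 parameter rows holds 256 logits, one per quantization level, so parameter prediction becomes a classification problem.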

Latent-GAN.

Section 3.4 describes the use of the latent-GAN technique on our learned latent space for CAD generation. In our GAN model, the generator and discriminator are both MLP networks, each with four hidden layers of dimension 512. The input (noise) dimension is 64, and the output dimension is 256. We use the WGAN-gp strategy [7, 18] to train the network: the number of critic iterations is set to 5, and the weight factor for the gradient penalty is set to 10. Training lasts for 200,000 iterations with a batch size of 256, using the Adam optimizer with a learning rate of $2\times 10^{-4}$ and $\beta_1=0.5$.
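The generator and critic shapes can be sketched as follows. This is a NumPy illustration with random stand-in weights; the actual model is trained with the WGAN-gp objective, which this sketch omits.

```python
import numpy as np

NOISE_DIM, HIDDEN, LATENT_DIM = 64, 512, 256
rng = np.random.default_rng(1)

def mlp(dims):
    """Random-initialized weight list for an MLP (stand-ins for trained weights)."""
    return [rng.normal(scale=0.02, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]

def forward(ws, x):
    for W in ws[:-1]:
        x = np.maximum(x @ W, 0.0)  # ReLU hidden layers
    return x @ ws[-1]               # linear output layer

# Generator: 64-d noise -> four 512-d hidden layers -> 256-d latent code.
G = mlp([NOISE_DIM, HIDDEN, HIDDEN, HIDDEN, HIDDEN, LATENT_DIM])
# Critic: 256-d latent code -> four 512-d hidden layers -> scalar score.
D = mlp([LATENT_DIM, HIDDEN, HIDDEN, HIDDEN, HIDDEN, 1])

z = rng.normal(size=(8, NOISE_DIM))   # a batch of noise vectors
fake_codes = forward(G, z)            # (8, 256), to be fed to the trained decoder
scores = forward(D, fake_codes)       # (8, 1) critic scores
```

At generation time, only the generator is kept: sampled noise is mapped to a latent code and decoded by the trained autoencoder into a CAD command sequence.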

Appendix D Autoencoding CAD Models

Comparison methods.

Here we describe in detail the variants of our method used in Sec. 4.1 for comparison.

$\mathtt{Alt}\text{-}\mathtt{Rel}$ represents curve positions relative to the position of the previous curve in the loop. As a result, the ending positions of a line and an arc and the center of a circle differ from those in our method, but the representation of the other curve parameters (i.e., $\alpha,f,r$ in Table 1) stays the same.

$\mathtt{Alt}\text{-}\mathtt{Trans}$ includes in the extrusion command the starting position $(s_x,s_y)$ of the loop, in addition to the origin of the sketch plane. The origin $(p_x,p_y,p_z)$ is in the world frame of reference; the loop’s starting position $(s_x,s_y)$ is described in the local frame of the sketch plane. In our proposed approach, however, we translate the sketch plane’s origin to the loop’s starting position, so there is no need to specify the parameters $(s_x,s_y)$ explicitly.

$\mathtt{Alt}\text{-}\mathtt{ArcMid}$ specifies an arc using its ending and middle positions. As a result, the representation of an arc becomes $(x,y,m_x,m_y)$, where $(x,y)$ indicates the ending position (as in our method) and $(m_x,m_y)$ indicates the arc’s middle point.

$\mathtt{Alt}\text{-}\mathtt{Regr}$ regresses all continuous parameters of the CAD commands using a standard mean-squared error in the loss function. The cross-entropy loss for discrete parameters (such as command types) stays the same as in our proposed approach. In this variant, continuous parameters are not quantized, although they are still normalized into the range $[-1,1]$ to balance the mean-squared errors introduced by different parameters.

$\mathtt{Ours}\text{+}\mathtt{Aug}$ includes randomly composed CAD command sequences in its training process as a form of data augmentation. When we randomly choose a CAD model from the dataset during training, there is a $50\%$ chance that the sampled CAD sequence will be mixed with another randomly sampled CAD sequence. The two CAD command sequences are mixed by randomly swapping one or more pairs of sketch and extrusion (in their commands). CAD sequences that contain only one sketch-extrusion pair are not involved in this process.

Method        mean CD   trimmed mean CD   median CD
Ours+Aug      6.14      0.974             0.752
Ours          7.16      1.08              0.787
Alt-ArcMid    6.90      1.09              0.790
Alt-Trans     7.14      1.09              0.792
Alt-Rel       9.24      1.38              0.863
Alt-Regr      12.61     3.87              2.14
Table 4: Mean, trimmed mean, and median chamfer distances for shape autoencoding. Numerical values are multiplied by $10^{3}$.
Full statistics for CD scores.

In Table 4, we report the mean, trimmed mean, and median chamfer distance (CD) scores for our CAD autoencoding study. The “trimmed mean” CD is computed after removing the $10\%$ largest and $10\%$ smallest scores. The mean CD scores are significantly higher than the trimmed mean and median CD scores because the prediction of a CAD sequence can be sensitive to small perturbations: a small change in the command sequence may lead to a large change in shape topology and may even invalidate the topology (e.g., the gray shape in Fig. 11). Such cases happen rarely, but when they do, the CD scores become very large. It is these outliers that drive up the mean CD scores.
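The trimmed-mean computation is straightforward; the small sketch below also illustrates how a single outlier inflates the mean but barely moves the trimmed mean.

```python
import numpy as np

def trimmed_mean(scores, trim=0.10):
    """Mean after discarding the top and bottom `trim` fraction of scores,
    mirroring the "trimmed mean" CD reported in Table 4."""
    s = np.sort(np.asarray(scores, dtype=float))
    k = int(len(s) * trim)
    return s[k:len(s) - k].mean()

# One heavy outlier among 20 scores: the mean jumps, the trimmed mean does not.
scores = [0.8] * 18 + [0.7, 100.0]
```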

Figure 10: Shape autoencoding results on Fusion 360 Gallery [48] test data. The autoencoder is trained using our own dataset.
$\text{ACC}_{\text{cmd}}\uparrow$   $\text{ACC}_{\text{param}}\uparrow$   median CD $\downarrow$   Invalid Ratio $\downarrow$
97.90   96.45   0.796   1.62
Table 5: Quantitative evaluation for shape autoencoding on Fusion 360 Gallery [48] test data. The model is trained only on our proposed dataset. $\uparrow$: the higher the better; $\downarrow$: the lower the better.
Accuracies of individual parameter types.

We also examine the accuracies of individual types of parameters. The accuracy is defined in Sec. 4.1, and the results are shown in Fig. 13. While all parameters are treated equally in the loss function, their accuracies differ somewhat. Most notably, the recovery of the arc’s sweeping angle $\alpha$ has lower accuracy than the other parameters. Examining the dataset, we find that the values of the sweeping angle $\alpha$ span its value range (i.e., $[0,2\pi]$) more evenly than other parameters, while the arc command is used much less frequently than other commands. Thus, in comparison to other parameters, the arc sweeping angle is harder to recover.

Figure 11: Failure examples in shape autoencoding. Top: our reconstructed CAD outputs. Bottom: ground-truth CAD models.
Figure 12: Quantitative metrics for shape autoencoding w.r.t. CAD sequence length. Left: median chamfer distance (the lower the better). Right: parameter accuracy (the higher the better).
Figure 13: Accuracies for individual parameter types.
Method       $\text{ACC}_{\text{cmd}}\uparrow$   $\text{ACC}_{\text{param}}\uparrow$   median CD $\downarrow$   Invalid Ratio $\downarrow$
Ours         85.95   74.22   10.30   12.08
Ours-noise   84.65   74.23   10.44   13.82
Table 6: Quantitative results for CAD reconstruction from point clouds. $\mathtt{Ours\text{-}noise}$ corresponds to noisy inputs (uniform noise in $[-0.02,0.02]$ along the normal direction). We use the same metrics as in the autoencoding task; $\text{ACC}_{\text{cmd}}$ and $\text{ACC}_{\text{param}}$ are multiplied by $100\%$, and CD is multiplied by $10^{3}$.
Figure 14: A gallery of our generated CAD models.

Appendix E Generalization on Fusion 360 Gallery [48]

To validate the generalization ability of our autoencoder, we perform a cross-dataset test. In particular, for the shape autoencoding task, we take the model trained on our proposed dataset and evaluate it on a different dataset provided by Fusion 360 Gallery [48]. The two datasets are constructed from different sources: ours is based on models from the Onshape repository, whereas theirs is created from designs in Autodesk Fusion 360. The qualitative and quantitative results, shown in Fig. 10 and Table 5 respectively, demonstrate that our trained model performs well on shape distributions different from the training dataset.

Appendix F Failure Cases

Not every CAD command sequence is valid. Our method is more likely to produce invalid CAD commands as the command sequence grows long. Figure 11 shows a few failed results: the gray shape has invalid topology, and the yellow shape suffers from misplaced small sketches. Figure 12 plots the median CD scores and the parameter accuracies with respect to CAD command sequence length. The difficulty of generating long-sequence CAD models is twofold. As the CAD sequence becomes longer, it is harder to ensure valid topology. Meanwhile, as shown in Fig. 8, the data distribution over sequence length has a long tail; the dataset provides many more short sequences than long ones. This data imbalance may bias the network model toward short sequences.

Appendix G Metrics for Shape Generation

We follow the three metrics used in [6] to evaluate the quality of our shape generation. In [6], these metrics are motivated by point-cloud generation; therefore, to compute them for CAD models, we first convert the models into point clouds. The metrics are then defined by comparing a set of reference shapes $\mathcal{S}$ with a set of generated shapes $\mathcal{G}$.

Coverage (COV) measures the diversity of generated shapes by computing the fraction of shapes in the reference set $\mathcal{S}$ that are matched by at least one shape in the generated set $\mathcal{G}$. Formally, COV is defined as

\text{COV}(\mathcal{S},\mathcal{G})=\frac{|\{\arg\min_{Y\in\mathcal{S}}d^{\text{CD}}(X,Y)\,|\,X\in\mathcal{G}\}|}{|\mathcal{S}|}, \qquad (6)

where $d^{\text{CD}}(X,Y)$ denotes the chamfer distance between two point clouds $X$ and $Y$.

Minimum matching distance (MMD) measures the fidelity of generated shapes. For each shape in the reference set $\mathcal{S}$, the chamfer distance to its nearest neighbor in the generated set $\mathcal{G}$ is computed. MMD is defined as the average over all these nearest distances:

\text{MMD}(\mathcal{S},\mathcal{G})=\frac{1}{|\mathcal{S}|}\sum_{Y\in\mathcal{S}}\min_{X\in\mathcal{G}}d^{\text{CD}}(X,Y). \qquad (7)

Jensen-Shannon Divergence (JSD) is a statistical distance metric between two data distributions. Here, it measures the similarity between the reference set $\mathcal{S}$ and the generated set $\mathcal{G}$ by comparing their marginal point distributions:

\text{JSD}(P_{\mathcal{S}},P_{\mathcal{G}})=\frac{1}{2}D_{\text{KL}}(P_{\mathcal{S}}\,\|\,M)+\frac{1}{2}D_{\text{KL}}(P_{\mathcal{G}}\,\|\,M), \qquad (8)

where $M=\frac{1}{2}(P_{\mathcal{S}}+P_{\mathcal{G}})$ and $D_{\text{KL}}$ is the standard KL-divergence. $P_{\mathcal{S}}$ and $P_{\mathcal{G}}$ are the marginal distributions of points in the reference and generated sets, approximated by discretizing the space into a $28^{3}$ voxel grid and assigning each point from the point cloud to one of the voxels.
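The three metrics can be sketched with brute-force NumPy implementations. This is an illustration only; practical details such as point-sampling density follow [6] and are not reproduced here, and `jsd` assumes the two voxel-occupancy histograms are already normalized.

```python
import numpy as np

def chamfer(p, q):
    """Symmetric chamfer distance between point sets p, q of shape (n, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def coverage(S, G):
    """COV (Eq. 6): fraction of reference shapes matched by some generated shape."""
    matched = {min(range(len(S)), key=lambda i: chamfer(x, S[i])) for x in G}
    return len(matched) / len(S)

def mmd(S, G):
    """MMD (Eq. 7): average nearest-neighbor chamfer distance to the reference set."""
    return float(np.mean([min(chamfer(x, y) for x in G) for y in S]))

def jsd(P_S, P_G, eps=1e-12):
    """JSD (Eq. 8) between two normalized voxel-occupancy histograms."""
    M = 0.5 * (P_S + P_G)
    kl = lambda p, q: np.sum(p * np.log((p + eps) / (q + eps)))
    return 0.5 * kl(P_S, M) + 0.5 * kl(P_G, M)
```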

Since our full test set is relatively large, we randomly sample a reference set of 1000 shapes and generate 3000 shapes with our method to compute the metric scores. To reduce sampling bias, we repeat this evaluation three times and report the average scores.