NIR (pronounced “ner”) is a new IR (intermediate representation) for the
Mesa shader compiler that will sit between the old IR (GLSL IR) and the
back-end compilers. The primary purpose of NIR is to make optimizations
more efficient and to generate better code for the back-ends. We have a
lot of optimizations implemented in GLSL IR right now. However, they
still generate fairly bad code, primarily because GLSL IR’s tree-based
structure makes writing good optimizations difficult. For this reason,
we have implemented a lot of optimizations in the i965 back-end
compilers just to fix up the code we get from GLSL IR. The “proper fix”
is to implement a better high-level IR; enter NIR.
Most of the initial work on NIR, including setting up common data
structures, helper methods, and a few basic passes, was done over the
summer by our intern, Connor Abbott. Connor did a fantastic job, but
there is still a lot left to be done. I’ve spent the last two months
trying to fill in the pieces that we need in order to get NIR off the
ground. At this point, we’re at zero piglit regressions and the
shader-db numbers aren’t terrible.
A few key points about NIR:
It’s worth spending a minute or two talking about SSA.
Non-SSA form:

    int a;
    if (foo) {
        a = 1;
    } else {
        a = bar();
    }
    return a;

SSA form:

    if (foo) {
        a_1 = 1;
    } else {
        a_2 = bar();
    }
    a_3 = phi(a_1, a_2);
    return a_3;

Non-SSA form:

    int a;
    while (foo) {
        a = bar();
    }
    return a;

SSA form:

    a_1 = ssa_undef;
    while (foo) {
        a_2 = bar();
    }
    a_3 = phi(a_1, a_2);
    return a_3;

When writing optimizations you frequently want to ask the question,
“At point P in the program, what value is stored in variable X?” This
question is hard to answer because it involves asking a lot of
subquestions:
We also want to ask the question, “When is the result of instruction I
used?” If the result of I is assigned to X, there are, again, a lot of
subquestions:
SSA form makes all of these questions much simpler because each value
is only assigned once and you can go directly from a value to the
instruction that assigned it.
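To make that concrete, here is a minimal sketch of asking “can this value
be undefined?” by walking SSA definitions. The `ssa_def` struct and
`may_be_undef()` are hypothetical simplifications for illustration and
are not NIR’s real data structures; phi nodes are limited to two sources
to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified SSA definition; not NIR's real nir_ssa_def. */
enum def_kind { DEF_CONST, DEF_CALL, DEF_UNDEF, DEF_PHI };

struct ssa_def {
    enum def_kind kind;
    struct ssa_def *src[2]; /* phi sources; NULL for other kinds */
    bool visiting;          /* set while a phi is on the walk stack */
};

/* Returns true if following phi sources from def can reach an undef. */
static bool may_be_undef(struct ssa_def *def)
{
    if (def->kind == DEF_UNDEF)
        return true;

    /* Non-phi values are real definitions. A phi that is already being
     * visited is part of a loop and adds nothing new on this path. */
    if (def->kind != DEF_PHI || def->visiting)
        return false;

    def->visiting = true;
    bool result = false;
    for (int i = 0; i < 2; i++) {
        if (def->src[i] != NULL && may_be_undef(def->src[i]))
            result = true;
    }
    def->visiting = false;
    return result;
}
```

With the if/else example above, the phi merges a constant and a call, so
the walk reports that the value is always defined. With the while-loop
example, one phi source is the ssa_undef, so the walk reports that the
value may be undefined.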
The variable a is always defined in the example below.
Think about how you would figure that out with the program in the form
given.

    int a;
    do {
        if (foo) {
            a = 5;
            break;
        }
        if (bar)
            continue;
        a = 6;
    } while (!done);
    return a;

In SSA form, this is trivial because we can trace the SSA definitions
and see that we never hit an undef:

    a_1 = ssa_undef;
    loop {
        if (foo) {
            a_2 = 5;
            break;
        }
        if (bar)
            continue; /* Goes to the if at the end */
        a_3 = 6;
        if (done)
            break;
    }
    a_4 = phi(a_2, a_3, a_1);
    return a_4;

By putting a program into SSA form, we effectively move all of the
control-flow analysis of variable definitions into the into-SSA pass and
we can stop thinking about it in every optimization. We also have a pass
to put it back into a regular non-SSA form before we hand it to the
back-end compiler.
All of the basic data structures in NIR are typedef’d C structs:

    typedef struct {
        /* stuff */
    } nir_foo;

Inheritance is done in the usual C way with structure inclusion. If a
structure has subtypes, it has a type enum and each child type includes
its parent type as a field. For each subtype, a casting function is
created using the NIR_DEFINE_CAST macro for casting from parent to
child.

    typedef enum {
        nir_foo_type_bar,
        /* Other types */
    } nir_foo_type;

    typedef struct {
        nir_foo_type foo_type;
        /* stuff */
    } nir_foo;

    typedef struct {
        nir_foo foo;
        /* stuff */
    } nir_foo_bar;

    /* NIR_DEFINE_CAST(nir_foo_as_bar, nir_foo, nir_foo_bar, foo) */
    static inline nir_foo_bar *
    nir_foo_as_bar(nir_foo *parent)
    {
        return exec_node_data(nir_foo_bar, parent, foo);
    }

Note that the casting function does not first check if
parent is, in fact, a nir_foo_bar before
casting. This could be changed easily enough if we want “dynamic cast”
behavior. Up to this point, it hasn’t been too much of a problem.
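For illustration, a checked “dynamic cast” variant might look roughly
like the sketch below. This is not something NIR provides: the names
`nir_foo_as_bar_checked`, `nir_foo_type_baz`, and `payload` are made up
for the example, and the pointer arithmetic is spelled out with
`offsetof` instead of the `exec_node_data` macro so that the block
stands on its own.

```c
#include <stddef.h>

typedef enum {
    nir_foo_type_bar,
    nir_foo_type_baz, /* hypothetical second subtype */
} nir_foo_type;

typedef struct {
    nir_foo_type foo_type;
} nir_foo;

typedef struct {
    nir_foo foo;   /* parent embedded as a field */
    int payload;   /* hypothetical child-only data */
} nir_foo_bar;

/* Hypothetical checked cast: returns NULL if parent is not a nir_foo_bar. */
static inline nir_foo_bar *
nir_foo_as_bar_checked(nir_foo *parent)
{
    if (parent == NULL || parent->foo_type != nir_foo_type_bar)
        return NULL;

    /* Step back from the embedded parent field to the enclosing child. */
    return (nir_foo_bar *)((char *)parent - offsetof(nir_foo_bar, foo));
}
```

The only difference from the unchecked version is the type-enum test up
front, which is why adding it later would be cheap.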
What follows is a (mostly complete) list of the core NIR
data-structures with subtypes listed in sub-bullets.

- nir_variable: Represents a GLSL variable. Mostly a copy-and-paste of
  ir_variable.
- nir_deref: Represents a dereference chain for a variable. The
  nir_deref objects form a singly linked list starting at the variable
  itself and working its way down the chain of structure and array
  dereferences.
  - nir_deref_var: A variable dereference.
  - nir_deref_array: A dereference of an array. An array dereference
    can be one of: “direct”, “indirect”, or “wildcard”. A “wildcard”
    dereference refers to every element of the array at the same time
    and is only allowed in nir_intrinsic_copy_var instructions.
  - nir_deref_struct: A dereference of an element of a structure. The
    element to be dereferenced is denoted by its index in the structure.
- nir_register: Non-SSA temporary storage. Registers can be written to
  multiple times and with write-masks.
- nir_ssa_def: An SSA definition. This contains a pointer to the
  nir_instr that defines the value for easy crawling of expression
  trees.
- nir_src: A source value for an instruction. Can be a register (with a
  potentially indirect offset) or an SSA definition.
  - nir_alu_src: A source for an ALU instruction. ALU sources have
    source modifiers and swizzles in addition to the regular nir_src.
- nir_dest: An instruction destination. Can be a register (with a
  potentially indirect offset) or an SSA definition. In the case of an
  SSA definition, the nir_ssa_def is actually embedded in the nir_dest.
  - nir_alu_dest: The destination of an ALU instruction. ALU
    destinations can have the “saturate” destination modifier as well as
    a write-mask if the destination is a register.
- nir_instr: An instruction.
  - nir_alu_instr: An ALU operation such as fmul, iadd, or fsin.
  - nir_call_instr: A function call. Not currently used.
  - nir_jump_instr: A jump instruction such as return, break, or
    continue.
  - nir_tex_instr: A texturing operation. This structure contains all of
    the various bits of data required to figure out sampling, types (int
    or float), etc.
  - nir_intrinsic_instr: An intrinsic. This is anything that isn’t
    really “special” but isn’t just an ALU operation. Intrinsics can’t,
    in general, be reordered or eliminated (there are flags to let you
    know when they can). Also, anything that has anything to do with a
    nir_variable must be an intrinsic.
  - nir_load_const_instr: An instruction that just assigns some piece of
    constant data to its destination.
  - nir_ssa_undef_instr: An instruction for creating “fake” definitions
    of SSA values that aren’t actually defined in the code.
  - nir_phi_instr: A phi node.
  - nir_parallel_copy_instr: A parallel copy instruction. This is
    crucial for going out of SSA form but you should never see one of
    these in the wild.
- nir_cf_node: A node in the control flow graph. The CFG is explicitly
  maintained in NIR and each node has an exec_node embedded in it so
  that it can be placed in a list of control flow nodes.
  - nir_block: A basic block. Contains a list of instructions, none of
    which are allowed to be a jump instruction except possibly the last
    one.
  - nir_if: An if statement. Each nir_if must be immediately preceded
    and succeeded by a nir_block. (This ensures that there are no
    critical edges in the CFG.) A nir_if contains two lists of CF nodes:
    one for the then case and one for the else case. The nir_if also
    contains a nir_src that is the condition for the if statement.
    Currently, there is no if instruction; it is built into the CFG.
  - nir_loop: An endless loop. Each nir_loop must be immediately
    preceded and succeeded by a nir_block. (This ensures that there are
    no critical edges in the CFG.) A nir_loop contains a list of CF
    nodes that is repeated until you hit a break instruction.
  - nir_function_impl: A function implementation. Not to be confused
    with a nir_function, which may have multiple overloads, not all of
    which are implemented. The reason for all of the indirection is to
    support shader subroutines and linking when the time comes.
    Fortunately, there are helpers in place that make it not too bad to
    work with. A nir_function_impl also contains function-local stuff
    such as local variables and registers and other metadata.
- nir_shader: A shader. This may contain one or more functions as well
  as global registers and variables and other whole-shader type
  information. Right now, a nir_shader usually only contains one
  function called “main”, but that may change.

There are a few other data structures but they are mostly for
handling function calls and linking which we don’t currently do in
NIR.
For both ALU operations and intrinsics, NIR uses a system of macros
and C includes for declaring opcodes and the metadata that goes along
with them. The nir_opcodes.h and
nir_intrinsics.h header files each have a comment at the
top of them describing a macro (not defined in the file) for declaring
opcodes. Each opcode is then declared in the file using the given
macro.
Then, in nir.h, the opcode declaration macro is defined and
nir_opcodes.h (or nir_intrinsics.h) is included to generate the enum of
opcodes. The nir_opcodes.c and nir_intrinsics.c files each contain a
different definition of the macro and include the header again to
declare the constant run-time metadata structures.
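As a rough illustration of the technique, here is a self-contained
sketch. It is a single-file variant: instead of re-including a header,
the opcode list lives in a list macro that is expanded twice with
different definitions of `OP`. The names `EXAMPLE_OPCODES`, `OP`,
`example_op`, and `example_op_info`, and the particular metadata fields,
are assumptions made for the example and do not match the real NIR
macros.

```c
/* Hypothetical opcode list: OP(name, number of inputs). */
#define EXAMPLE_OPCODES(OP) \
    OP(fmul, 2)             \
    OP(iadd, 2)             \
    OP(fsin, 1)

/* First expansion: generate the enum of opcodes. */
#define OP(name, num_inputs) example_op_##name,
typedef enum {
    EXAMPLE_OPCODES(OP)
    example_num_opcodes
} example_op;
#undef OP

typedef struct {
    const char *name;
    unsigned num_inputs;
} example_op_info;

/* Second expansion: generate the constant run-time metadata table. */
#define OP(name, num_inputs) { #name, num_inputs },
static const example_op_info example_op_infos[example_num_opcodes] = {
    EXAMPLE_OPCODES(OP)
};
#undef OP
```

The benefit is that the enum and the metadata table cannot drift apart:
adding an opcode means adding one line to the list, and both expansions
pick it up.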
Before we go into the different transformations in detail, it is good
to look at how the code gets from GLSL IR, through NIR, to the back-end
code. This example will be taken from the i965 NIR back-end, but it
should look fairly similar in other back-ends.

1. lower_output_reads(): In NIR, output variables are write-only. We
   could handle this specially, or there’s a pass that GLSL-to-TGSI also
   uses that lowers outputs to temporary variables with a final write at
   the end into the output variable. We could have our own pass, but
   it’s easier to just use the one that already exists.
2. glsl_to_nir(): Converts the GLSL IR code into NIR code. At this
   point, all of the information in the program is now in NIR data
   structures and the GLSL IR could, in theory, be thrown away. The code
   generated by the glsl_to_nir pass is technically in SSA form.
   However, it contains a lot of variable load/store intrinsics which we
   will eventually want to eliminate.
3. nir_lower_global_vars_to_local: For each global variable, it looks at
   all of the uses of that variable in every function implementation in
   the shader. If it is only ever used in one function implementation,
   it is lowered to a local variable.
4. nir_split_var_copies: Implements “copy splitting”, which is similar
   to structure splitting except that it works on copy operations rather
   than the datatypes themselves. GLSL allows you to copy an entire
   structure (which may contain arrays or other structures) from one
   variable to another at a time. Normally, in a language such as C,
   this would be handled by a “structure splitting” pass that breaks up
   the structures. Unfortunately for us, structures used in inputs or
   outputs can’t be split. Therefore, regardless of what we do, we have
   to be able to copy to/from structures.
5. Optimization loop: Performs a bunch of different optimizations in a
   loop. Each optimization pass returns a boolean value to indicate if
   it made any “progress”. The loop continues to repeat until it goes
   through a full pass of the loop without making any “progress”. I
   won’t go over every optimization, but some of the key ones are as
   follows:
   - nir_lower_variables(): This pass (which probably needs a better
     name) lowers variable load/store intrinsics to SSA values whenever
     possible. In particular, it analyzes whether or not a particular
     value is ever aliased by an indirect and, if not, lowers it to an
     SSA value, inserting phi nodes where necessary.
   - nir_opt_algebraic(): This pass is capable of doing a variety of
     algebraic simplifications of expressions. One trivial example is
     a + 0 -> a. A more complicated example is
     (-|a| >= 0) -> (a == 0), which shows up all the time in GLSL code
     that has been generated by a DirectX-to-OpenGL translation layer.
   - nir_opt_constant_folding(): Looks for ALU operations whose
     arguments are constants, computes the resulting constant value, and
     replaces the expression with a single load_const instruction. The
     constant folding pass is also capable of folding constants into
     variable dereference indirects. This way, after the constant
     folding pass, things that were once indirect variable dereferences
     are now direct and we may be able to lower them to SSA.
6. Lowering passes: These lower NIR concepts to things that are easier
   for the backend compiler to handle. The ones used by the i965 backend
   are:
   - nir_lower_locals_to_regs(): This pass lowers local variables to
     nir_register values. Again, this keeps the backend from having to
     worry about deref chains.
   - Dereference lowering. These passes all do basically the same thing:
     lower an instruction that uses a variable dereference chain to one
     that doesn’t. This way the backends don’t have to worry about
     crawling dereferences and can just work with indices into buffers
     and the occasional indirect offset. These passes include:
     - nir_lower_io()
     - nir_lower_samplers()
     - nir_lower_system_values()
     - nir_lower_atomics()
   - nir_lower_to_source_mods(): Because it makes algebraic
     optimizations easier, we don’t actually emit source or destination
     modifiers in glsl_to_nir. Instead, we emit [fi]neg, [fi]abs, and
     fsat ALU instructions and work with those during optimization.
     After all the optimizations are done, we then lower these to
     source/destination modifiers.
   - nir_convert_from_ssa(): This pass takes the shader out of SSA form
     and into a more conventional form using nir_register. In general,
     going out of SSA form is very hard to do right. If other backends
     don’t want to be SSA, we’d rather have the out-of-SSA
     implementation in one place so they don’t get it wrong.
   - nir_lower_vec_to_movs(): Lowers nir vecN() operations to a series
     of moves with writemask.
7. nir_validate(): This very useful pass does nothing but check a pile
   of different invariants of the IR to ensure that nothing is broken.
   It should get run after every optimization pass when built in debug
   mode to ensure that we aren’t breaking any invariants as we go.
8. NIR to FS: Finally, we convert to the low-level FS IR that we use to
   then create programs to run on the i965 hardware.

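The “repeat until no progress” structure of the optimization loop can be
sketched as follows. Everything here is a toy stand-in: `toy_shader`,
the two `toy_opt_*` passes, and the dead-code elimination step are
invented for the example so that it is self-contained. Only the loop
shape reflects what is described above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shader: counts of work left to do. */
struct toy_shader {
    int foldable; /* expressions with constant arguments */
    int dead;     /* instructions whose result is unused */
};

/* Each toy pass returns true if it changed the shader. */
static bool toy_opt_constant_folding(struct toy_shader *s)
{
    if (s->foldable == 0)
        return false;
    s->foldable--;
    s->dead++; /* folding leaves a now-unused instruction behind */
    return true;
}

static bool toy_opt_dce(struct toy_shader *s)
{
    if (s->dead == 0)
        return false;
    s->dead = 0;
    return true;
}

/* Runs all passes until a full trip makes no progress.
 * Returns the number of trips through the loop. */
static int toy_optimize(struct toy_shader *s)
{
    int trips = 0;
    bool progress;

    do {
        progress = false;
        progress |= toy_opt_constant_folding(s);
        progress |= toy_opt_dce(s);
        trips++;
    } while (progress);

    return trips;
}
```

Note that the last trip through the loop always does no work. That is
the price of letting each pass create new opportunities for the others
without any pass having to know about the rest.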
One of the most important parts of being able to use SSA form is going
in and out of it efficiently. Going in and out of SSA naively isn’t hard
but it generates terrible code. Going in and out competently is much
harder but necessary if we want decent programs. For example, just going
to and from SSA badly took one shader from 300 generated instructions to
800 with a register pressure of a couple hundred in the middle.
Therefore, how we go to/from SSA is important.
In NIR, there are two ways to go into SSA form. One is to first convert
to using registers and call nir_to_ssa(). The other is what we do in
i965, where we generate SSA+load/store and then call
nir_lower_variables(). I’ll walk through the nir_lower_variables() pass
as it is the more complicated of the two, but they both follow the same
algorithm for placing phi nodes etc.

1. Collect information about all of the variable dereferences. This
   builds a data structure which I have called a “deref forest” which
   contains information about every possible variable dereference and
   where it is used. This is similar to the use/def information for
   scalar registers but is done on a per-deref basis so that you can
   differentiate between different elements of a structure or different
   offsets in an array.
2. Determine what values can be lowered to SSA values. At the moment,
   this is a simple heuristic which only lowers a value if it can never
   be referenced indirectly. The deref forest makes it fairly easy to
   find these.
3. Place phi nodes at the iterated dominance frontier of the
   definitions. This follows the algorithm presented by Cytron et al. in
   “Efficiently Computing Static Single Assignment Form and the Control
   Dependence Graph.” It places phi nodes at the minimal set of
   locations where they are needed in order to resolve merge points in
   the CFG.
4. Variable renumbering. This is a pass that walks the list of
   instructions and replaces each store operation with an SSA definition
   and each load operation with a read from the closest SSA definition
   that reaches that instruction. Since we have already added phi nodes,
   this can be done with a simple stack-based depth-first search
   algorithm.

In the nir_to_ssa() pass, we can insert phi nodes where
all of the sources and destinations are the register we are trying to
take into SSA form. Because we can’t use variables as sources or
destinations of phi nodes, we fake it in the
nir_lower_variables() pass with a hash table, but the
principle is the same.
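The phi placement step can be sketched with a small worklist loop. This
is only a sketch under simplifying assumptions: blocks are numbered
0..6, sets of blocks are bitmasks, and the dominance frontiers are given
as input (here worked out by hand for a small example CFG) instead of
being computed. `place_phis`, `example_df`, and `NUM_BLOCKS` are names
made up for the example and are not NIR functions.

```c
#include <stdint.h>

#define NUM_BLOCKS 7

/*
 * Example CFG (edges): 0->1, 0->4, 1->2, 1->3, 2->5, 3->5, 5->6, 4->6.
 * example_df[b] is the dominance frontier of block b as a bitmask,
 * worked out by hand for this CFG.
 */
static const uint32_t example_df[NUM_BLOCKS] = {
    0,       /* block 0 */
    1u << 6, /* block 1 */
    1u << 5, /* block 2 */
    1u << 5, /* block 3 */
    1u << 6, /* block 4 */
    1u << 6, /* block 5 */
    0,       /* block 6 */
};

/*
 * Returns the set of blocks that need a phi node for a variable that is
 * defined in def_blocks. Only bits below NUM_BLOCKS may be set.
 */
static uint32_t
place_phis(const uint32_t df[NUM_BLOCKS], uint32_t def_blocks)
{
    uint32_t phis = 0;
    uint32_t worklist = def_blocks;
    uint32_t seen = def_blocks;

    while (worklist != 0) {
        /* Pop the lowest-numbered block off the worklist. */
        unsigned b = 0;
        while (!(worklist & (UINT32_C(1) << b)))
            b++;
        worklist &= ~(UINT32_C(1) << b);

        /* Every block in the frontier of a definition needs a phi. */
        uint32_t new_phis = df[b] & ~phis;
        phis |= new_phis;

        /* A phi is itself a definition, so iterate on its block too. */
        uint32_t new_defs = new_phis & ~seen;
        seen |= new_defs;
        worklist |= new_defs;
    }
    return phis;
}
```

In the example CFG, a variable stored only in block 2 needs a phi in
block 5, and because that phi is a new definition it in turn forces a
phi in block 6. That second step is the “iterated” part of the iterated
dominance frontier.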
Going out of SSA is even trickier than going into SSA. The pass we
have right now would be more-or-less state-of-the-art in a scalar world.
Unfortunately, vectors complicate things so we could probably be doing a
lot better.
The pass we have now is based on “Revisiting Out-of-SSA Translation
for Correctness, Code Quality, and Efficiency” by Boissinot et al. That
is the only out-of-SSA paper I recommend you ever read as the others
I’ve found are completely bonkers and rely on impossible data
structures. The basic process is as follows:

1. Isolate phi nodes. This process inserts a bunch of “parallel copy”
   instructions at the beginnings and ends of basic blocks that ensure
   SSA values used as sources and destinations of phi nodes are only
   ever used once. (They are only defined once since they are SSA
   values.) This prevents many of the technical difficulties of going
   out of SSA form.
2. Aggressive register coalescing. This process starts by assigning all
   of the sources and destinations of a given phi node to the same
   register. It then tries to, one at a time, eliminate the parallel
   copies by assigning the source and destination of the copy to the
   same register. The entire time, it keeps track of the interferences
   and makes sure that nothing gets clobbered. This is done using a
   concept called a “dominance forest” which was first put forth by
   Budimlic et al. in “Fast Copy Coalescing and Live-Range
   Identification”. Go read the papers for more info.
3. Assign registers. Now that we have figured out what registers to use
   for the phi nodes, we can assign registers to all of the other SSA
   values as well.
4. Resolve the parallel copies. A parallel copy operation takes a bunch
   of values and copies them to a bunch of other locations all at the
   same time. In this way, a single parallel copy can be used to shuffle
   arbitrarily many values around in an atomic fashion. Obviously, no
   one implements a parallel copy instruction in hardware, so this has
   to be lowered to a sequence of move operations.

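To show why the last step needs some care, here is a minimal sketch of
lowering one parallel copy to sequential moves. It is not the NIR
implementation. It assumes a tiny register file of `NUM_REGS` registers
plus one scratch register, `TMP_REG`, that is free to use for breaking
cycles such as a swap. All names are invented for the example.

```c
#include <stdbool.h>

#define NUM_REGS 8
#define TMP_REG NUM_REGS /* hypothetical scratch register */

struct move { int dst, src; };

/* Returns true if some pending copy still reads register r. */
static bool is_read(const int src_of[NUM_REGS], int r)
{
    for (int d = 0; d < NUM_REGS; d++) {
        if (src_of[d] == r)
            return true;
    }
    return false;
}

/*
 * src_of[d] is the register copied into d, or -1 if d is not written.
 * Emits an equivalent sequence of moves into out (room for 2 * NUM_REGS
 * entries is plenty) and returns its length. Clobbers src_of.
 */
static int resolve_parallel_copy(int src_of[NUM_REGS], struct move *out)
{
    int count = 0, pending = 0;

    for (int d = 0; d < NUM_REGS; d++) {
        if (src_of[d] == d)
            src_of[d] = -1; /* a self-copy is a no-op */
        if (src_of[d] >= 0)
            pending++;
    }

    while (pending > 0) {
        bool progress = false;

        /* A copy is safe to emit once nothing else needs its old dst. */
        for (int d = 0; d < NUM_REGS; d++) {
            if (src_of[d] < 0 || is_read(src_of, d))
                continue;
            out[count++] = (struct move){ d, src_of[d] };
            src_of[d] = -1;
            pending--;
            progress = true;
        }
        if (progress)
            continue;

        /* Only cycles are left: save one destination in the scratch
         * register and make its readers use the saved copy instead. */
        for (int d = 0; d < NUM_REGS; d++) {
            if (src_of[d] < 0)
                continue;
            out[count++] = (struct move){ TMP_REG, d };
            for (int e = 0; e < NUM_REGS; e++) {
                if (src_of[e] == d)
                    src_of[e] = TMP_REG;
            }
            break;
        }
    }
    return count;
}
```

For the parallel copy “r0 = r1, r1 = r0, r2 = r0”, this emits r2 = r0
first, since nothing reads r2, and then breaks the swap with the scratch
register: tmp = r0, r0 = r1, r1 = tmp. Emitting the three copies naively
in order would clobber r0 before r1 and r2 could read it.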
For doing algebraic optimizations and lowering passes, we have an
infrastructure that allows you to easily perform search/replace
operations on expressions. This consists of two data-structures:
nir_search_expression and nir_search_value and
a function nir_replace_instr() which checks if the given
instruction matches the search expression and, if it does, replaces it
with a new instruction (or chain of instructions) according to the given
replacement value. The framework automatically handles searching
expression trees and dealing with swizzles for you so you don’t have to
think about it.
Because creating these data structures is cumbersome, we have a bit
of python/mako magic that auto-generates the structures and a function
that walks the shader doing the search-and-replace as it goes. The
generated function has a first-order switch statement so it only calls
nir_replace_instr() if it knows that at least the instruction’s opcode
matches. The search and replace expressions are written in a little
language using python tuples. From nir_opt_algebraic.py:

    # Convenience variables
    a = 'a'
    b = 'b'
    c = 'c'
    d = 'd'

    # Written in the form (<search>, <replace>) where <search> is an expression
    # and <replace> is either an expression or a value. An expression is
    # defined as a tuple of the form (<op>, <src0>, <src1>, <src2>, <src3>)
    # where each source is either an expression or a value. A value can be
    # either a numeric constant or a string representing a variable name. For
    # constants, you have to be careful to make sure that it is the right type
    # because python is unaware of the source and destination types of the
    # opcodes.

    optimizations = [
        (('fadd', a, 0.0), a),
        (('iadd', a, 0), a),
        (('fmul', a, 0.0), 0.0),
        (('imul', a, 0), 0),
        (('fmul', a, 1.0), a),
        (('imul', a, 1), a),
        (('ffma', 0.0, a, b), b),
        (('ffma', a, 0.0, b), b),
        (('ffma', a, b, 0.0), ('fmul', a, b)),
        (('flrp', a, b, 0.0), a),
        (('flrp', a, b, 1.0), b),
        (('flrp', a, a, b), a),
        (('flrp', 0.0, a, b), ('fmul', a, b)),
        (('fadd', ('fmul', a, b), c), ('ffma', a, b, c)),
        (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
        # Saturate is a clamp to [0, 1]: max with 0.0, then min with 1.0
        (('fmin', ('fmax', a, 0.0), 1.0), ('fsat', a)),
        # This one may not be exact
        (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
    ]

As you can see, that makes adding new algebraic optimizations stupid
easy. The framework can also be used for writing lowering passes for
things like lowering sat to min/max.
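To give a feel for what the matching half of such a framework has to do,
here is a small sketch of matching a search pattern against an
expression tree while binding pattern variables. It is deliberately much
simpler than the real thing: sources are limited to two, there are no
swizzles or types, and nothing is replaced. The `node` struct and
`match()` are hypothetical and are not the `nir_search_expression` and
`nir_search_value` structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum node_kind { NODE_EXPR, NODE_VALUE, NODE_CONST };

/* Used both for patterns and for the (toy) IR being searched. */
struct node {
    enum node_kind kind;
    const char *op;            /* NODE_EXPR: opcode name */
    const struct node *src[2]; /* NODE_EXPR: sources */
    int var;                   /* NODE_VALUE in a pattern: variable index */
    double value;              /* NODE_CONST: constant value */
};

#define MAX_VARS 4

/*
 * Returns true if ir matches pat. bound[] must start out all NULL and
 * records which IR node each pattern variable matched; on failure it
 * may hold partial bindings and should be reset by the caller.
 */
static bool
match(const struct node *pat, const struct node *ir,
      const struct node *bound[MAX_VARS])
{
    switch (pat->kind) {
    case NODE_VALUE:
        /* A variable matches anything, but consistently: if 'a' shows
         * up twice in the pattern, both uses must be the same node. */
        if (bound[pat->var] == NULL)
            bound[pat->var] = ir;
        return bound[pat->var] == ir;
    case NODE_CONST:
        return ir->kind == NODE_CONST && ir->value == pat->value;
    case NODE_EXPR:
        return ir->kind == NODE_EXPR &&
               strcmp(pat->op, ir->op) == 0 &&
               match(pat->src[0], ir->src[0], bound) &&
               match(pat->src[1], ir->src[1], bound);
    }
    return false;
}
```

With the pattern ('fadd', a, 0.0), an IR expression that adds the
constant 0.0 to some value x matches and binds a to x, which is exactly
the information a replacement step needs in order to substitute x for
the whole expression.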
Many of the optimization/lowering passes require different bits of
metadata that are provided by different analysis passes. Right now, we
don’t have much of this, but we do have some. As time goes on and we add
things like value numbering, the amount of metadata we have will
increase. In order to manage this, we have a simple metadata system
consisting of an enum and two functions:
nir_metadata_require(): Declares that the given
metadata (an OR of enum values) is required. The function automatically
calls all of the required analysis passes for you and, upon its return,
the requested metadata is available and current.
nir_metadata_preserve(): Called to declare what
metadata (if any) was preserved by the given pass. If the pass didn’t
touch anything, it doesn’t need to call this function. However, if it
adds/removes instructions or modifies the CFG in any way, it needs to
call nir_metadata_preserve(). The
nir_metadata_preserve() function takes an OR of all of the
bits of metadata that are preserved. That way as new metadata
gets added, we don’t have to update every optimization pass to dirty
it.
Unfortunately, we have no way to automatically dirty everything if
you don’t call nir_metadata_preserve(). So shame on you if
you forget it.
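The bookkeeping behind such a require/preserve pair can be sketched with
a bitmask of “currently valid” metadata. This is a toy model and not the
NIR code: the `toy_` names, the two example metadata bits, and the run
counters that stand in for real analysis passes are all assumptions made
for the example.

```c
typedef enum {
    toy_metadata_none        = 0x0,
    toy_metadata_block_index = 0x1,
    toy_metadata_dominance   = 0x2,
} toy_metadata;

struct toy_impl {
    toy_metadata valid; /* analyses that are currently up to date */
    int index_runs;     /* how many times each analysis has run */
    int dominance_runs;
};

/* Runs whichever of the required analyses are not already valid. */
static void
toy_metadata_require(struct toy_impl *impl, toy_metadata required)
{
    toy_metadata missing = (toy_metadata)(required & ~impl->valid);

    if (missing & toy_metadata_block_index)
        impl->index_runs++; /* stands in for the real analysis */
    if (missing & toy_metadata_dominance)
        impl->dominance_runs++;

    impl->valid = (toy_metadata)(impl->valid | required);
}

/* Everything that is not explicitly preserved becomes dirty. */
static void
toy_metadata_preserve(struct toy_impl *impl, toy_metadata preserved)
{
    impl->valid = (toy_metadata)(impl->valid & preserved);
}
```

Because preserve works by masking, a pass that lists only the bits it
keeps intact automatically dirties any metadata type added later, which
is the property described above. It also shows the failure mode: a pass
that changes the shader and never calls preserve leaves stale bits
marked as valid.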