
NIR: A new compiler IR for Mesa

Introduction

NIR (pronounced “ner”) is a new IR (internal representation) for the Mesa shader compiler that will sit between the old IR (GLSL IR) and back-end compilers. The primary purpose of NIR is to be more efficient for doing optimizations and to generate better code for the back-ends. We have a lot of optimizations implemented in GLSL IR right now. However, they still generate fairly bad code, primarily because GLSL IR’s tree-based structure makes writing good optimizations difficult. For this reason, we have implemented a lot of optimizations in the i965 back-end compilers just to fix up the code we get from GLSL IR. The “proper fix” to this is to implement a better high-level IR; enter NIR.

Most of the initial work on NIR, including setting up common data structures, helper methods, and a few basic passes, was done over the summer by our intern, Connor Abbott. Connor did a fantastic job, but there is still a lot left to be done. I’ve spent the last two months trying to fill in the pieces that we need in order to get NIR off the ground. At this point, we’re at zero piglit regressions and the shader-db numbers aren’t terrible.

A few key points about NIR:

  1. It is primarily an SSA-based IR.
  2. It supports source/destination-modifiers and swizzles.
  3. Standard GPU operations such as sin() and fmad() are first-class ALU operations, not intrinsics.
  4. GLSL concepts like inputs, outputs, uniforms, etc. are built into the IR so we can do proper analysis on them.
  5. Even though it’s SSA, it still has a concept of registers and write-masks in the core IR data structures. This means we can generate code that is much closer to what backends want.

Static Single Assignment (SSA) form

It’s worth spending a minute or two talking about SSA.

  1. Every SSA value is assigned exactly once.
  2. Undefined values have a fake assignment.
  3. Different assignments from divergent control flow are resolved using special instructions called “phi nodes”.

Example 1

Non-SSA form:

int a;
if (foo) {
   a = 1;
} else {
   a = bar();
}
return a;

SSA form:

if (foo) {
   a_1 = 1;
} else {
   a_2 = bar();
}
a_3 = phi(a_1, a_2);
return a_3;

Example 2

Non-SSA form:

int a;
while (foo) {
   a = bar();
}
return a;

SSA form:

a_1 = ssa_undef;
while (foo) {
   a_2 = bar();
}
a_3 = phi(a_1, a_2);
return a_3;

Why SSA?

When writing optimizations you frequently want to ask the question, “At point P in the program, what value is stored in variable X?” This question is hard to answer because it involves asking a lot of subquestions:

  1. When was X last assigned before P?
  2. Is it possible that X was never assigned at all by the time we get to P?
  3. Does the last assignment of X always happen or could it still have the old value?

We also want to ask the question “When is the result of instruction I used?” If the result of I is assigned to X, there are, again, a lot of subquestions:

  1. What instructions use X after I?
  2. Is there another instruction that may assign X after I but before P?
  3. Is there another instruction that always assigns X after I but before P?
  4. Is X ever used before it is assigned to again?

SSA form makes all of these questions much simpler because each value is only assigned once and you can go directly from any use of a value to the instruction that assigned it.

Example 3

The variable a is always defined in the example below. Think about how you would figure that out with the program in the form given.

int a;
do {
   if (foo) {
      a = 5;
      break;
   }

   if (bar)
      continue;

   a = 6;
} while (!done);
return a;

In SSA form, this is trivial because we can trace the SSA definitions and see that we never hit an undef:

a_1 = ssa_undef;
loop {
   if (foo) {
      a_2 = 5;
      break;
   }

   if (bar)
      continue; /* Goes to the if at the end */

   a_3 = 6;
   if (done)
      break;
}
a_4 = phi(a_2, a_3, a_1);
return a_4;

By putting a program into SSA form, we effectively move all of the control-flow analysis of variable definitions into the into-SSA pass and we can stop thinking about it in every optimization. We also have a pass to put it back into a regular non-SSA form before we hand it to the back-end compiler.

Basic Data Structures

All of the basic data structures in NIR are typedef’d C structs:

typedef struct {
   /* stuff */
} nir_foo;

Inheritance is done in the usual C way with structure inclusion. If a structure has subtypes, it has a type enum and each child type includes its parent type as a field. For each subtype, a casting function is created using the NIR_DEFINE_CAST macro for casting from parent to child.

typedef enum {
   nir_foo_type_bar,
   /* Other types */
} nir_foo_type;

typedef struct {
   nir_foo_type foo_type;
   /* stuff */
} nir_foo;

typedef struct {
   nir_foo foo;
   /* stuff */
} nir_foo_bar;

/* NIR_DEFINE_CAST(nir_foo_as_bar, nir_foo, nir_foo_bar, foo) */
static inline nir_foo_bar *
nir_foo_as_bar(nir_foo *parent)
{
   return exec_node_data(nir_foo_bar, parent, foo);
}

Note that the casting function does not first check if parent is, in fact, a nir_foo_bar before casting. This could be changed easily enough if we want “dynamic cast” behavior. Up to this point, it hasn’t been too much of a problem.
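
If we ever do want that, a checked variant is easy to imagine. The helper below is a hypothetical sketch (not an existing NIR function) that assumes the parent type carries its type enum in a foo_type field, as in the example above:

static inline nir_foo_bar *
nir_foo_as_bar_checked(nir_foo *parent)
{
   /* Hypothetical "dynamic cast": refuse to cast if the parent is not
    * actually a nir_foo_bar. */
   if (parent == NULL || parent->foo_type != nir_foo_type_bar)
      return NULL;

   return exec_node_data(nir_foo_bar, parent, foo);
}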

NIR datatypes and their children:

What follows is a (mostly complete) list of the core NIR data-structures with subtypes listed in sub-bullets.

There are a few other data structures but they are mostly for handling function calls and linking which we don’t currently do in NIR.

Declaring NIR opcodes:

For both ALU operations and intrinsics, NIR uses a system of macros and C includes for declaring opcodes and the metadata that goes along with them. The nir_opcodes.h and nir_intrinsics.h header files each have a comment at the top of them describing a macro (not defined in the file) for declaring opcodes. Each opcode is then declared in the file using the given macro.

Then, in nir.h the opcode declaration macro is defined and nir_opcodes.h (or nir_intrinsics.h) is included to generate the enum of opcodes. The nir_opcodes.c and nir_intrinsics.c files contain a different definition of the macro and include the header again to declare the constant run-time metadata structures.
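
As a rough illustration of the pattern (the file and macro names below are invented for this sketch and do not match the real NIR macros), the scheme looks something like this:

/* my_opcodes.h (hypothetical; deliberately has no include guard).  The
 * OPCODE() macro is *not* defined here -- every file that includes this
 * header defines OPCODE() to extract what it needs, then undefines it. */
OPCODE(fadd, 2)   /* name, number of sources */
OPCODE(fmul, 2)
OPCODE(fsin, 1)

/* In the main header: include the list once to build the enum of opcodes. */
#define OPCODE(name, num_srcs) my_op_##name,
typedef enum {
#include "my_opcodes.h"
   my_op_count
} my_op;
#undef OPCODE

/* In the .c file: include the list again with a different definition of
 * OPCODE() to build the constant run-time metadata table. */
typedef struct {
   const char *name;
   unsigned num_srcs;
} my_op_info;

#define OPCODE(name, num_srcs) [my_op_##name] = { #name, num_srcs },
const my_op_info my_op_infos[my_op_count] = {
#include "my_opcodes.h"
};
#undef OPCODE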

Basic flow of transformations

Before we go into the different transformations in detail, it is good to look at how the code gets from GLSL IR, through NIR, to the back-end code. This example will be taken from the i965 NIR back-end, but it should look fairly similar in other back-ends.

  1. lower_output_reads(): In NIR, output variables are write-only. We could handle this specially, or there’s a pass that GLSL-to-TGSI also uses that lowers outputs to temporary variables with a final write at the end into the output variable. We could have our own pass, but it’s easier to just use the one that already exists.

  2. glsl_to_nir(): Converts the GLSL IR code into NIR code. At this point, all of the information in the program is now in NIR data structures and the GLSL IR could, in theory, be thrown away. The code generated by the glsl_to_nir pass is technically in SSA form. However, it contains a lot of variable load/store intrinsics which we will eventually want to eliminate.

  3. nir_lower_global_vars_to_local: For each global variable, it looks at all of the uses of that variable in every function implementation in the shader. If it is only ever used in one function implementation, it is lowered to a local variable.

  4. nir_split_var_copies: Implements “copy splitting”, which is similar to structure splitting except that it works on copy operations rather than the datatypes themselves. The GLSL language allows you to copy an entire structure (which may contain arrays or other structures) from one variable to another in a single assignment. Normally, in a language such as C this would be handled by a “structure splitting” pass that breaks up the structures. Unfortunately for us, structures used in inputs or outputs can’t be split. Therefore, regardless of what we do, we have to be able to copy to/from entire structures.

  5. Optimization loop: Performs a bunch of different optimizations in a loop. Each optimization pass returns a boolean value to indicate if it made any “progress”. The loop continues to repeat until it goes through a full pass of the loop without making any “progress” (a sketch of this loop appears after this list). I won’t go over every optimization, but some of the key ones are as follows:

    1. nir_lower_variables(): This pass (which probably needs a better name) lowers variable load/store intrinsics to SSA values whenever possible. In particular, it analyzes whether or not a particular value is ever aliased by an indirect and, if not, lowers it to an SSA value, inserting phi nodes where necessary.

    2. nir_opt_algebraic(): This pass is capable of doing a variety of algebraic simplifications of expressions. One trivial example is a + 0 -> a. A more complicated example is (-|a| >= 0) -> (a == 0), which shows up all the time in GLSL code that has been generated by a DirectX-to-OpenGL translation layer.

    3. nir_opt_constant_folding(): Looks for ALU operations whose arguments are constants, computes the resulting constant value, and replaces the expression with a single load_const instruction. The constant folding pass is also capable of folding constants into variable dereference indirects. This way, after the constant folding pass, things that were once indirect variable dereferences are now direct and we may be able to lower them to SSA.

  6. Lowering passes: These lower NIR concepts to things that are easier for the backend compiler to handle. The ones used by the i965 backend are:

    1. nir_lower_locals_to_regs(): This pass lowers local variables to nir_register values. Again, this keeps the backend from having to worry about deref chains.

    2. Dereference lowering. These passes all do basically the same thing: lower an instruction that uses a variable dereference chain to one that doesn’t. This way the backends don’t have to worry about crawling dereferences and can just work with indices into buffers and the occasional indirect offset. These passes include:

      • nir_lower_io()
      • nir_lower_samplers()
      • nir_lower_system_values()
      • nir_lower_atomics()
    3. nir_lower_to_source_mods(): Because it makes algebraic optimizations easier, we don’t actually emit source or destination modifiers in glsl_to_nir. Instead, we emit [fi]neg, [fi]abs, and fsat ALU instructions and work with those during optimization. After all the optimizations are done, we then lower these to source/destination modifiers.

    4. nir_convert_from_ssa(): This pass takes the shader out of SSA form and into a more conventional form using nir_register. In general, going out of SSA form is very hard to do right. If other backends don’t want to be SSA, we’d rather have the out-of-SSA implementation in one place so they don’t get it wrong.

    5. nir_lower_vec_to_movs(): Lowers nir vecN() operations to a series of moves with writemask.
  7. nir_validate(): This very useful pass does nothing but check a pile of different invariants of the IR to ensure that nothing is broken. It should get run after every optimization pass when built in debug mode to ensure that we aren’t breaking any invariants as we go.

  8. NIR to FS: Finally, we convert to the low-level FS IR that we then use to create programs to run on the i965 hardware.
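
As promised in step 5, here is a minimal sketch of the progress-driven optimization loop. The driver function below is invented for illustration; the pass names are the ones mentioned above, but their exact signatures are assumed:

static void
optimize_shader(nir_shader *shader)
{
   bool progress;

   do {
      progress = false;

      /* Each pass returns true if it made progress (signatures assumed). */
      progress |= nir_lower_variables(shader);
      progress |= nir_opt_algebraic(shader);
      progress |= nir_opt_constant_folding(shader);
      /* ... and so on for the other passes ... */
   } while (progress);
}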

In and out of SSA

One of the most important parts of being able to use SSA form is going into and out of it efficiently. Going in and out of SSA naively isn’t hard, but it generates terrible code. Going in and out competently is much harder but necessary if we want decent programs. For example, just going to and from SSA badly took one shader from 300 generated instructions to 800, with register pressure of a couple hundred in the middle. Therefore, how we go to/from SSA is important.

Going into SSA form

In NIR, there are two ways to go into SSA form. One is to first convert to using registers and call nir_to_ssa(). The other is what we do in i965, where we generate SSA + load/store and then call nir_lower_variables(). I’ll walk through the nir_lower_variables() pass as it is the more complicated of the two, but they both follow the same algorithm for placing phi nodes etc.

  1. Collect information about all of the variable dereferences. This builds a data structure, which I have called a “deref forest”, that contains information about every possible variable dereference and where it is used. This is similar to the use/def information for scalar registers but is done on a per-deref basis so that you can differentiate between different elements of a structure or different offsets in an array.

  2. Determine what values can be lowered to SSA values. At the moment, this is a simple heuristic which only lowers a value if it can never be referenced indirectly. The deref forest makes it fairly easy to find these.

  3. Place phi nodes at the iterated dominance frontier of the definitions. This follows the algorithm presented by Cytron et al. in “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph.” It places phi nodes at the minimal set of locations where they are needed in order to resolve merge points in the CFG.

  4. Variable renumbering. This is a pass that walks the list of instructions and replaces each store operation with an SSA definition and each load operation with a read from the closest SSA definition that reaches that instruction. Since we have already added phi nodes, this can be done with a simple stack-based depth-first search algorithm.

In the nir_to_ssa() pass, we can insert phi nodes where all of the sources and destinations are the register we are trying to take into SSA form. Because we can’t use variables as sources or destinations of phi nodes, we fake it in the nir_lower_variables() pass with a hash table, but the principle is the same.

Going out of SSA form

Going out of SSA is even trickier than going into SSA. The pass we have right now would be more-or-less state-of-the-art in a scalar world. Unfortunately, vectors complicate things so we could probably be doing a lot better.

The pass we have now is based on “Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency” by Boissinot et al. That is the only out-of-SSA paper I recommend you ever read as the others I’ve found are completely bonkers and rely on impossible data structures. The basic process is as follows:

  1. Isolate phi nodes. This process inserts a bunch of “parallel copy” instructions at the beginnings and ends of basic blocks that ensure SSA values used as sources and destinations of phi nodes are only ever used once. (They are only defined once since they are SSA values.) This prevents many of the technical difficulties of going out of SSA form.

  2. Aggressive register coalescing. This process starts by assigning all of the sources and destinations of a given phi node to the same register. It then tries, one at a time, to eliminate the parallel copies by assigning the source and destination of the copy to the same register. The entire time, it keeps track of the interferences and makes sure that nothing gets clobbered. This is done using a concept called a “dominance forest”, which was first put forth by Budimlic et al. in “Fast Copy Coalescing and Live-Range Identification”. Go read the papers for more info.

  3. Assign registers. Now that we have figured out what registers to use for the phi nodes, we can assign registers to all of the other SSA values as well.

  4. Resolve the parallel copies. A parallel copy operation takes a bunch of values and copies them to a bunch of other locations all at the same time. In this way, a single parallel copy can be used to shuffle arbitrarily many values around in an atomic fashion. Obviously, no one implements a parallel copy instruction in hardware, so this has to be lowered to a sequence of move operations; a sketch of that lowering follows this list.
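
To make the last step concrete, here is a minimal, self-contained sketch of lowering a parallel copy to moves. It is not the actual NIR pass: registers are plain integers, the emitted moves are just printed, and a single temporary is used to break cycles.

#include <stdbool.h>
#include <stdio.h>

typedef struct { int dst, src; } copy;

/* Is loc still needed as the source of a copy we have not emitted yet? */
static bool
is_pending_src(const copy *copies, int n, const bool *done, int loc)
{
   for (int i = 0; i < n; i++)
      if (!done[i] && copies[i].src == loc)
         return true;
   return false;
}

static void
sequentialize(copy *copies, int n, int tmp)
{
   bool done[16] = { false };   /* the sketch assumes n <= 16 */
   int remaining = 0;

   for (int i = 0; i < n; i++) {
      done[i] = (copies[i].dst == copies[i].src);   /* self-copies are no-ops */
      if (!done[i])
         remaining++;
   }

   while (remaining > 0) {
      bool emitted = false;

      /* Emit every copy whose destination is no longer needed as a source. */
      for (int i = 0; i < n; i++) {
         if (done[i] || is_pending_src(copies, n, done, copies[i].dst))
            continue;
         printf("mov r%d, r%d\n", copies[i].dst, copies[i].src);
         done[i] = true;
         remaining--;
         emitted = true;
      }

      /* Everything left forms cycles: save one source in the temporary and
       * redirect that copy to read from the temporary instead. */
      if (!emitted) {
         for (int i = 0; i < n; i++) {
            if (!done[i]) {
               printf("mov r%d, r%d\n", tmp, copies[i].src);
               copies[i].src = tmp;
               break;
            }
         }
      }
   }
}

int
main(void)
{
   copy swap[] = { { 0, 1 }, { 1, 0 } };   /* r0 := r1 and r1 := r0 in parallel */
   sequentialize(swap, 2, 7);              /* use r7 as the temporary */
   return 0;
}

For the swap above this prints mov r7, r1, then mov r1, r0, then mov r0, r7, which is exactly the save-and-swap sequence you would write by hand.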

Algebraic optimizations

For doing algebraic optimizations and lowering passes, we have an infrastructure that allows you to easily perform search/replace operations on expressions. This consists of two data structures, nir_search_expression and nir_search_value, and a function, nir_replace_instr(), which checks if the given instruction matches the search expression and, if it does, replaces it with a new instruction (or chain of instructions) according to the given replacement value. The framework automatically handles searching expression trees and dealing with swizzles for you so you don’t have to think about it.

Because creating these data structures is cumbersome, we have a bit of python/mako magic that auto-generates the structures and a function that walks the shader doing the search-and-replace as it goes. The generated function has a first-order switch statement so it only calls nir_replace_instr() if it knows that at least the instructions match. The search and replace expressions are written in a little language using python tuples. From nir_opt_algebraic.py:

# Convenience variables
a = 'a'
b = 'b'
c = 'c'
d = 'd'

# Written in the form (<search>, <replace>) where <search> is an expression
# and <replace> is either an expression or a value.  An expression is
# defined as a tuple of the form (<op>, <src0>, <src1>, <src2>, <src3>)
# where each source is either an expression or a value.  A value can be
# either a numeric constant or a string representing a variable name.  For
# constants, you have to be careful to make sure that it is the right type
# because python is unaware of the source and destination types of the
# opcodes.

optimizations = [
   (('fadd', a, 0.0), a),
   (('iadd', a, 0), a),
   (('fmul', a, 0.0), 0.0),
   (('imul', a, 0), 0),
   (('fmul', a, 1.0), a),
   (('imul', a, 1), a),
   (('ffma', 0.0, a, b), b),
   (('ffma', a, 0.0, b), b),
   (('ffma', a, b, 0.0), ('fmul', a, b)),
   (('flrp', a, b, 0.0), a),
   (('flrp', a, b, 1.0), b),
   (('flrp', a, a, b), a),
   (('flrp', 0.0, a, b), ('fmul', a, b)),
   (('fadd', ('fmul', a, b), c), ('ffma', a, b, c)),
   (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
   (('fmin', ('fmax', a, 0.0), 1.0), ('fsat', a)),
# This one may not be exact
   (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
]

As you can see, that makes adding new algebraic optimizations stupid easy. The framework can also be used for writing lowering passes for things like lowering sat to min/max.

Metadata and analysis

Many of the optimization/lowering passes require different bits of metadata that are provided by different analysis passes. Right now, we don’t have much of this, but we do have some. As time goes on and we add things like value numbering, the amount of metadata we have will increase. In order to manage this, we have a simple metadata system consisting of an enum and two functions:
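
Roughly, the interface looks like the following. This sketch is reconstructed from memory rather than quoted from nir.h, so treat the enum values and nir_metadata_require() as assumptions; nir_metadata_preserve() is the function referred to below.

typedef enum {
   nir_metadata_none        = 0x0,
   nir_metadata_block_index = 0x1,
   nir_metadata_dominance   = 0x2,
} nir_metadata;

/* Called at the top of a pass: (re)computes any requested analyses that are
 * currently dirty so the metadata is valid while the pass runs. */
void nir_metadata_require(nir_function_impl *impl, nir_metadata required);

/* Called at the end of a pass: everything *not* listed as preserved is
 * marked dirty. */
void nir_metadata_preserve(nir_function_impl *impl, nir_metadata preserved);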

Unfortunately, we have no way to automatically dirty everything if you don’t call nir_metadata_preserve(). So shame on you if you forget it.