This is the second part in a series of articles on how debuggers work. Make sure you read the first part before this one.
In this part
I'm going to demonstrate how breakpoints are implemented in a debugger. Breakpoints are one of the two main pillars of debugging - the other being the ability to inspect values in the debugged process's memory. We've already seen a preview of the other pillar in part 1 of the series, but breakpoints still remain mysterious. By the end of this article, they won't be.
Software interrupts
To implement breakpoints on the x86 architecture, software interrupts (also known as "traps") are used. Before we get deep into the details, I want to explain the concept of interrupts and traps in general.
A CPU has a single stream of execution, working through instructions one by one [1]. To handle asynchronous events like IO and hardware timers, CPUs use interrupts. A hardware interrupt is usually a dedicated electrical signal to which a special "response circuitry" is attached. This circuitry notices an activation of the interrupt and makes the CPU stop its current execution, save its state, and jump to a predefined address where a handler routine for the interrupt is located. When the handler finishes its work, the CPU resumes execution from where it stopped.
Software interrupts are similar in principle but a bit different in practice. CPUs support special instructions that allow the software to simulate an interrupt. When such an instruction is executed, the CPU treats it like an interrupt - stops its normal flow of execution, saves its state and jumps to a handler routine. Such "traps" allow many of the wonders of modern OSes (task scheduling, virtual memory, memory protection, debugging) to be implemented efficiently.
Some programming errors (such as division by 0) are also treated by the CPU as traps, and are frequently referred to as "exceptions". Here the line between hardware and software blurs, since it's hard to say whether such exceptions are really hardware interrupts or software interrupts. But I've digressed too far away from the main topic, so it's time to get back to breakpoints.
int 3 in theory
Having written the previous section, I can now simply say that breakpoints are implemented on the CPU by a special trap called int 3. int is x86 jargon for "trap instruction" - a call to a predefined interrupt handler. x86 supports the int instruction with an 8-bit operand specifying the number of the interrupt that occurred, so in theory 256 traps are supported. The first 32 are reserved by the CPU for itself, and number 3 is the one we're interested in here - it's called "trap to debugger".
Without further ado, I'll quote from the bible itself [2]:
The INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the debug exception handler. (This one byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one byte instructions, without over-writing other code).
The part in parens is important, but it's still too early to explain it. We'll come back to it later in this article.
int 3 in practice
Yes, knowing the theory behind things is great, OK, but what does this really mean? How do we use int 3 to implement breakpoints? Or to paraphrase common programming Q&A jargon - Plz show me the codes!
In practice, this is really very simple. Once your process executes the int 3 instruction, the OS stops it [3]. On Linux (which is what we're concerned with in this article) it then sends the process a signal - SIGTRAP.
That's all there is to it - honest! Now recall from the first part of the series that a tracing (debugger) process gets notified of all the signals its child (or the process it attaches to for debugging) gets, and you can start getting a feel for where we're going.
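To watch this happen without a debugger, a process can execute the trap itself and catch the resulting signal. The sketch below is not taken from the sources of this series; count_traps_from_one_int3 and on_sigtrap are made-up names, and it assumes GCC or Clang inline assembly on x86 Linux. On other architectures it falls back to raise(SIGTRAP), which produces the same signal without an actual int 3.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Counts SIGTRAP deliveries; sig_atomic_t is safe to modify in a handler. */
static volatile sig_atomic_t trap_count = 0;

static void on_sigtrap(int signo)
{
    (void)signo;
    trap_count++;
}

/* Install a SIGTRAP handler, execute one breakpoint trap, restore the old
** handler and return how many SIGTRAPs were delivered.
** Hypothetical helper, for illustration only. */
int count_traps_from_one_int3(void)
{
    struct sigaction sa, old;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGTRAP, &sa, &old);
    assert(rc == 0);
    (void)rc;

    trap_count = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("int3");  /* the single-byte 0xCC trap instruction */
#else
    raise(SIGTRAP);            /* non-x86 fallback: same signal, no int 3 */
#endif
    sigaction(SIGTRAP, &old, NULL);
    return (int)trap_count;
}
```

Because a handler is installed, the process is not killed by the signal; it simply resumes at the instruction after the trap. When the process is being traced, the tracer gets to see the SIGTRAP first, which is exactly the hook a debugger relies on.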
That's it, no more computer architecture 101 jabber. It's time for examples and code.
Setting breakpoints manually
I'm now going to show code that sets a breakpoint in a program. The target program I'm going to use for this demonstration is the following:
section .text
; The _start symbol must be declared for the linker (ld)
global _start
_start:
; Prepare arguments for the sys_write system call:
; - eax: system call number (sys_write)
; - ebx: file descriptor (stdout)
; - ecx: pointer to string
; - edx: string length
mov edx, len1
mov ecx, msg1
mov ebx, 1
mov eax, 4
; Execute the sys_write system call
int 0x80
; Now print the other message
mov edx, len2
mov ecx, msg2
mov ebx, 1
mov eax, 4
int 0x80
; Execute sys_exit
mov eax, 1
int 0x80
section .data
msg1 db 'Hello,', 0xa
len1 equ $ - msg1
msg2 db 'world!', 0xa
len2 equ $ - msg2
I'm using assembly language for now, in order to keep us clear of compilation issues and symbols that come up when we get into C code. What the program listed above does is simply print "Hello," on one line and then "world!" on the next line. It's very similar to the program demonstrated in the previous article.
I want to set a breakpoint after the first printout, but before the second one. Let's say right after the first int 0x80 [4], on the mov edx, len2 instruction. First, we need to know what address this instruction maps to. Running objdump -d:
traced_printer2: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000033 08048080 08048080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 0000000e 080490b4 080490b4 000000b4 2**2
CONTENTS, ALLOC, LOAD, DATA
Disassembly of section .text:
08048080 <.text>:
8048080: ba 07 00 00 00 mov $0x7,%edx
8048085: b9 b4 90 04 08 mov $0x80490b4,%ecx
804808a: bb 01 00 00 00 mov $0x1,%ebx
804808f: b8 04 00 00 00 mov $0x4,%eax
8048094: cd 80 int $0x80
8048096: ba 07 00 00 00 mov $0x7,%edx
804809b: b9 bb 90 04 08 mov $0x80490bb,%ecx
80480a0: bb 01 00 00 00 mov $0x1,%ebx
80480a5: b8 04 00 00 00 mov $0x4,%eax
80480aa: cd 80 int $0x80
80480ac: b8 01 00 00 00 mov $0x1,%eax
80480b1: cd 80 int $0x80
So, the address we're going to set the breakpoint on is 0x8048096. Wait, this is not how real debuggers work, right? Real debuggers set breakpoints on lines of code and on functions, not on some bare memory addresses? Exactly right. But we're still far from there - to set breakpoints like real debuggers we still have to cover symbols and debugging information first, and it will take another part or two in the series to reach these topics. For now, we'll have to make do with bare memory addresses.
At this point I really want to digress again, so you have two choices. If you're really interested in knowing why the address is 0x8048096 and what it means, read the next section. If not, and you just want to get on with the breakpoints, you can safely skip it.
Digression - process addresses and entry point
Frankly, 0x8048096 itself doesn't mean much, it's just a few bytes away from the beginning of the text section of the executable. If you look carefully at the dump listing above, you'll see that the text section starts at 0x08048080. This tells the OS to map the text section starting at this address in the virtual address space given to the process. On Linux these addresses can be absolute (i.e. the executable isn't being relocated when it's loaded into memory), because with the virtual memory system each process gets its own chunk of memory and sees the whole 32-bit address space as its own (called "linear" address).
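The arithmetic behind "a few bytes away" can be spelled out. This is only a sketch: the constants are copied from the objdump section table above, and text_vaddr_to_file_offset is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the .text row of the section table above. */
#define TEXT_VMA      0x08048080u  /* where .text is mapped in memory */
#define TEXT_FILE_OFF 0x00000080u  /* where .text starts in the file  */
#define TEXT_SIZE     0x00000033u  /* size of .text in bytes          */

/* Translate a virtual address inside .text into an offset in the ELF
** file. Hypothetical helper; only valid for addresses within .text. */
uint32_t text_vaddr_to_file_offset(uint32_t vaddr)
{
    assert(vaddr >= TEXT_VMA && vaddr < TEXT_VMA + TEXT_SIZE);
    return vaddr - TEXT_VMA + TEXT_FILE_OFF;
}
```

For 0x8048096 this gives 0x96: the breakpoint address sits 0x16 (22) bytes past the start of the text section, which is the combined length of the four 5-byte mov instructions and the 2-byte int 0x80 that precede it.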
If we examine the ELF [5] header with readelf, we get:
$ readelf -h traced_printer2
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048080
Start of program headers: 52 (bytes into file)
Start of section headers: 220 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 4
Section header string table index: 3
Note the "Entry point address" field of the header, which also points to 0x8048080. So if we interpret the directions encoded in the ELF file for the OS, it says:
- Map the text section (with given contents) to address 0x8048080
- Start executing at the entry point - address 0x8048080
But still, why 0x8048080? For historic reasons, it turns out. Some googling led me to a few sources that claim that the first 128MB of each process's address space were reserved for the stack. 128MB happens to be 0x8000000, which is where other sections of the executable may start. 0x8048080, in particular, is the default entry point used by the Linux ld linker. This entry point can be modified by passing the -Ttext argument to ld.
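As a quick sanity check of the numbers in this paragraph, here is the same arithmetic in code. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 128 MB expressed in bytes: 128 * 2^20. */
uint32_t reserved_low_region_bytes(void)
{
    return 128u * 1024u * 1024u;
}

/* Distance of an address from the 128 MB boundary (hypothetical helper). */
uint32_t offset_above_128mb(uint32_t addr)
{
    assert(addr >= reserved_low_region_bytes());
    return addr - reserved_low_region_bytes();
}
```

reserved_low_region_bytes() evaluates to 0x8000000, and the entry point 0x8048080 lies 0x48080 bytes above that boundary.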
To conclude, there's nothing really special in this address and we can freely change it. As long as the ELF executable is properly structured and the entry point address in the header matches the real beginning of the program's code (text section), we're OK.
Setting breakpoints in the debugger with int 3
To set a breakpoint at some target address in the traced process, the debugger does the following:
1. Remember the data stored at the target address
2. Replace the first byte at the target address with the int 3 instruction
Then, when the debugger asks the OS to run the process (with PTRACE_CONT as we saw in the previous article), the process will run and eventually hit upon the int 3, where it will stop and the OS will send it a signal. This is where the debugger comes in again, receiving a signal that its child (or traced process) was stopped. It can then:
1. Replace the int 3 instruction at the target address with the original instruction
2. Roll the instruction pointer of the traced process back by one. This is needed because the instruction pointer now points after the int 3, having already executed it.
3. Allow the user to interact with the process in some way, since the process is still halted at the desired target address. This is the part where your debugger lets you peek at variable values, the call stack and so on.
4. When the user wants to keep running, the debugger will take care of placing the breakpoint back (since it was removed in step 1) at the target address, unless the user asked to cancel the breakpoint.
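Before touching a real process, the bookkeeping in the two lists above can be sketched against a plain byte array standing in for the traced process's memory. Everything here (sim_breakpoint, sim_bp_enable, sim_bp_hit) is a hypothetical name invented for this sketch; a real debugger would do the reads and writes through ptrace instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* Bookkeeping for one breakpoint. All names here are hypothetical. */
typedef struct {
    size_t  addr;        /* index into the simulated memory */
    uint8_t saved_byte;  /* original first byte of the instruction */
    int     enabled;
} sim_breakpoint;

/* Setting a breakpoint: remember the original byte, then write the
** trap opcode over it. */
void sim_bp_enable(uint8_t *mem, sim_breakpoint *bp)
{
    assert(mem != NULL && bp != NULL);
    bp->saved_byte = mem[bp->addr];
    mem[bp->addr] = INT3_OPCODE;
    bp->enabled = 1;
}

/* After the trap fired: put the original byte back and return the
** rolled-back instruction pointer. ip_after_trap is where execution
** stopped, one byte past the int 3. */
size_t sim_bp_hit(uint8_t *mem, sim_breakpoint *bp, size_t ip_after_trap)
{
    assert(mem != NULL && bp != NULL);
    assert(bp->enabled && ip_after_trap == bp->addr + 1);
    mem[bp->addr] = bp->saved_byte;
    bp->enabled = 0;
    return ip_after_trap - 1;
}
```

Placing the breakpoint back later is just another call to sim_bp_enable, made after the original instruction has been executed.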
Let's see how some of these steps are translated into real code. We'll use the debugger "template" presented in part 1 (forking a child process and tracing it). In any case, there's a link to the full source code of this example at the end of the article.
/* Obtain and show child's instruction pointer */
ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
procmsg("Child started. EIP = 0x%08x\n", regs.eip);
/* Look at the word at the address we're interested in */
unsigned addr = 0x8048096;
unsigned data = ptrace(PTRACE_PEEKTEXT, child_pid, (void*)addr, 0);
procmsg("Original data at 0x%08x: 0x%08x\n", addr, data);
Here the debugger fetches the instruction pointer from the traced process, as well as examines the word currently present at 0x8048096. When run tracing the assembly program listed in the beginning of the article, this prints:
[13028] Child started. EIP = 0x08048080
[13028] Original data at 0x08048096: 0x000007ba
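The value 0x000007ba may look odd next to the disassembly, which shows the bytes ba 07 00 00 00 at this address. x86 is little-endian, so the byte at the lowest address ends up as the least significant byte of the word. A small sketch that makes the mapping explicit (word_to_memory_bytes is a hypothetical helper, not part of debuglib):

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit word into four bytes in the order they occupy
** little-endian memory: out[0] is the byte at the lowest address.
** Hypothetical helper, for illustration only. */
void word_to_memory_bytes(uint32_t word, uint8_t out[4])
{
    int i;
    assert(out != 0);
    for (i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}
```

For 0x000007ba this yields ba 07 00 00, the first four bytes of the mov instruction, so the low byte of the word is the first byte of the instruction at 0x8048096.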
So far, so good. Next:
/* Write the trap instruction 'int 3' into the address */
unsigned data_with_trap = (data & 0xFFFFFF00) | 0xCC;
ptrace(PTRACE_POKETEXT, child_pid, (void*)addr, (void*)data_with_trap);
/* See what's there again... */
unsigned readback_data = ptrace(PTRACE_PEEKTEXT, child_pid, (void*)addr, 0);
procmsg("After trap, data at 0x%08x: 0x%08x\n", addr, readback_data);
Note how int 3 is inserted at the target address. This prints:
[13028] After trap, data at 0x08048096: 0x000007cc
Again, as expected - 0xba was replaced with 0xcc. The debugger now runs the child and waits for it to halt on the breakpoint:
/* Let the child run to the breakpoint and wait for it to
** reach it
*/
ptrace(PTRACE_CONT, child_pid, 0, 0);
wait(&wait_status);
if (WIFSTOPPED(wait_status)) {
    procmsg("Child got a signal: %s\n", strsignal(WSTOPSIG(wait_status)));
}
else {
    perror("wait");
    return;
}
/* See where the child is now */
ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
procmsg("Child stopped at EIP = 0x%08x\n", regs.eip);
This prints:
Hello,
[13028] Child got a signal: Trace/breakpoint trap
[13028] Child stopped at EIP = 0x08048097
Note the "Hello," that was printed before the breakpoint - exactly as we planned. Also note where the child stopped - just after the single-byte trap instruction.
Finally, as was explained earlier, to keep the child running we must do some work. We replace the trap with the original instruction and let the process continue running from it.
/* Remove the breakpoint by restoring the previous data
** at the target address, and unwind the EIP back by 1 to
** let the CPU execute the original instruction that was
** there.
*/
ptrace(PTRACE_POKETEXT, child_pid, (void*)addr, (void*)data);
regs.eip -= 1;
ptrace(PTRACE_SETREGS, child_pid, 0, &regs);
/* The child can continue running now */
ptrace(PTRACE_CONT, child_pid, 0, 0);
This makes the child print "world!" and exit, just as planned.
Note that we don't restore the breakpoint here. That can be done by executing the original instruction in single-step mode, then placing the trap back, and only then doing PTRACE_CONT. The debug library demonstrated later in the article implements this.
More on int 3
Now is a good time to come back and examine int 3 and that curious note from Intel's manual. Here it is again:
This one byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one byte instructions, without over-writing other code
int instructions on x86 occupy two bytes - 0xcd followed by the interrupt number [6]. int 3 could've been encoded as cd 03, but there's a special single-byte instruction reserved for it - 0xcc.
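The two encodings can be written down as a tiny encoder. This is an illustration only: encode_int is a made-up helper that always picks the one-byte form for interrupt 3, which is the choice a debugger makes when planting a breakpoint; it is not a statement about what any particular assembler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the machine encoding of "int n" into out and return its length
** in bytes. Interrupt 3 gets the special one-byte form 0xCC; every
** other number gets the generic two-byte form 0xCD n.
** Hypothetical helper, for illustration only. */
size_t encode_int(uint8_t n, uint8_t out[2])
{
    assert(out != NULL);
    if (n == 3) {
        out[0] = 0xCC;
        return 1;
    }
    out[0] = 0xCD;
    out[1] = n;
    return 2;
}
```

encode_int(0x80, buf) produces cd 80, the same bytes that appear for int $0x80 in the disassembly earlier, while encode_int(3, buf) produces the single byte cc.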
Why so? Because this allows us to insert a breakpoint without ever overwriting more than one instruction. And this is important. Consider this sample code:
.. some code ..
jz foo
dec eax
foo:
call bar
.. some code ..
Suppose we want to place a breakpoint on dec eax. This happens to be a single-byte instruction (with the opcode 0x48). Had the replacement breakpoint instruction been longer than 1 byte, we'd be forced to overwrite part of the next instruction (call), which would garble it and probably produce something completely invalid. But what if the branch jz foo was taken? Then, without stopping on dec eax, the CPU would go straight to execute the invalid instruction after it.
Having a special 1-byte encoding for int 3 solves this problem. Since 1 byte is the shortest an instruction can get on x86, we guarantee that only the instruction we want to break on gets changed.
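The difference is easy to see on raw bytes. In the sketch below, dec eax is the single byte 0x48 and the call that follows starts with opcode 0xe8; the call's 4-byte displacement in the usage note is a made-up value, and patch_and_count_spill is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite patch_len bytes at code[at] and report how many bytes past
** the end of the target instruction (insn_len bytes long) were modified.
** The caller must make sure the patch fits inside the buffer.
** Hypothetical helper, for illustration only. */
size_t patch_and_count_spill(uint8_t *code, size_t at, size_t insn_len,
                             const uint8_t *patch, size_t patch_len)
{
    assert(code != NULL && patch != NULL);
    memcpy(code + at, patch, patch_len);
    return patch_len > insn_len ? patch_len - insn_len : 0;
}
```

With the bytes 48 e8 10 00 00 00, patching offset 0 with the one-byte cc spills zero bytes and leaves the call's e8 opcode in place. Patching with the two-byte cd 03 spills one byte and turns e8 into 03, so a jump straight to foo would land on garbage.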
Encapsulating some gory details
Many of the low-level details shown in code samples of the previous section can be easily encapsulated behind a convenient API. I've done some encapsulation into a small utility library called debuglib - its code is available for download at the end of the article. Here I just want to demonstrate an example of its usage, but with a twist. We're going to trace a program written in C.
Tracing a C program
So far, for the sake of simplicity, I focused on assembly language targets. It's time to go one level up and see how we can trace a program written in C.
It turns out things aren't very different - it's just a bit harder to find where to place the breakpoints. Consider this simple program:
#include <stdio.h>

void do_stuff()
{
    printf("Hello, ");
}

int main()
{
    for (int i = 0; i < 4; ++i)
        do_stuff();
    printf("world!\n");
    return 0;
}
Suppose I want to place a breakpoint at the entrance to do_stuff. I'll use the old friend objdump to disassemble the executable, but there's a lot in it. In particular, looking at the text section is a bit useless since it contains a lot of C runtime initialization code I'm currently not interested in. So let's just look for do_stuff in the dump:
080483e4 <do_stuff>:
80483e4: 55 push %ebp
80483e5: 89 e5 mov %esp,%ebp
80483e7: 83 ec 18 sub $0x18,%esp
80483ea: c7 04 24 f0 84 04 08 movl $0x80484f0,(%esp)
80483f1: e8 22 ff ff ff call 8048318 <puts@plt>
80483f6: c9 leave
80483f7: c3 ret
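Picking the address out of the dump by eye works fine here. A tool could do the same by matching the symbol header line; the sketch below assumes the "address <name>:" header format shown in the dump above, and parse_symbol_line is a hypothetical helper, not part of debuglib or objdump.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse an objdump symbol header line such as "080483e4 <do_stuff>:".
** Returns 1 and stores the address in *addr if the line names `symbol`,
** otherwise returns 0 and leaves *addr untouched.
** Hypothetical helper, for illustration only. */
int parse_symbol_line(const char *line, const char *symbol,
                      unsigned long *addr)
{
    char name[64];
    unsigned long value;

    assert(line != NULL && symbol != NULL && addr != NULL);
    /* Hex address, whitespace, then the name between angle brackets. */
    if (sscanf(line, "%lx <%63[^>]>:", &value, name) != 2)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Fed the header line of the dump and the name do_stuff, it stores 0x080483e4; ordinary instruction lines don't match the pattern and are skipped.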
Alright, so we'll place the breakpoint at 0x080483e4, which is the first instruction of do_stuff. Moreover, since this function is called in a loop, we want to keep stopping at the breakpoint until the loop ends. We're going to use the debuglib library to make this simple. Here's the complete debugger function:
void run_debugger(pid_t child_pid)
{
    procmsg("debugger started\n");

    /* Wait for child to stop on its first instruction */
    wait(0);
    procmsg("child now at EIP = 0x%08x\n", get_child_eip(child_pid));

    /* Create breakpoint and run to it */
    debug_breakpoint* bp = create_breakpoint(child_pid, (void*)0x080483e4);
    procmsg("breakpoint created\n");
    ptrace(PTRACE_CONT, child_pid, 0, 0);
    wait(0);

    /* Loop as long as the child didn't exit */
    while (1) {
        /* The child is stopped at a breakpoint here. Resume its
        ** execution until it either exits or hits the
        ** breakpoint again.
        */
        procmsg("child stopped at breakpoint. EIP = 0x%08X\n", get_child_eip(child_pid));
        procmsg("resuming\n");
        int rc = resume_from_breakpoint(child_pid, bp);

        if (rc == 0) {
            procmsg("child exited\n");
            break;
        }
        else if (rc == 1) {
            continue;
        }
        else {
            procmsg("unexpected: %d\n", rc);
            break;
        }
    }

    cleanup_breakpoint(bp);
}
Instead of getting our hands dirty modifying EIP and the target process's memory space, we just use create_breakpoint, resume_from_breakpoint and cleanup_breakpoint. Let's see what this prints when tracing the simple C code displayed above:
$ bp_use_lib traced_c_loop
[13363] debugger started
[13364] target started. will run 'traced_c_loop'
[13363] child now at EIP = 0x00a37850
[13363] breakpoint created
[13363] child stopped at breakpoint. EIP = 0x080483E5
[13363] resuming
Hello,
[13363] child stopped at breakpoint. EIP = 0x080483E5
[13363] resuming
Hello,
[13363] child stopped at breakpoint. EIP = 0x080483E5
[13363] resuming
Hello,
[13363] child stopped at breakpoint. EIP = 0x080483E5
[13363] resuming
Hello,
world!
[13363] child exited
Just as expected!
The code
Here are the complete source code files for this part. In the archive you'll find:
- debuglib.h and debuglib.c - the simple library for encapsulating some of the inner workings of a debugger
- bp_manual.c - the "manual" way of setting breakpoints presented first in this article. Uses the debuglib library for some boilerplate code.
- bp_use_lib.c - uses debuglib for most of its code, as demonstrated in the second code sample for tracing the loop in a C program.
Conclusion and next steps
We've covered how breakpoints are implemented in debuggers. While implementation details vary between OSes, when you're on x86 it's all basically variations on the same theme - substituting int 3 for the instruction where we want the process to stop.
That said, I'm sure some readers, just like me, will be less than excited about specifying raw memory addresses to break on. We'd like to say "break on do_stuff", or even "break on this line in do_stuff" and have the debugger do it. In the next article I'm going to show how it's done.
References
I've found the following resources and articles useful in the preparation of this article:
- How debugger works
- Understanding ELF using readelf and objdump
- Implementing breakpoints on x86 Linux
- NASM manual
- SO discussion of the ELF entry point
- This Hacker News discussion of the first part of the series
- GDB Internals

| [1] | On a high-level view this is true. Down in the gory details, many CPUs today execute multiple instructions in parallel, some of them not in their original order. |
| [2] | The bible in this case being, of course, Intel's Architecture software developer's manual, volume 2A. |
| [3] | How can the OS stop a process just like that? The OS registered its own handler for int 3 with the CPU, that's how! |
| [4] | Wait, int again? Yes! Linux uses int 0x80 to implement system calls from user processes into the OS kernel. The user places the number of the system call and its arguments into registers and executes int 0x80. The CPU then jumps to the appropriate interrupt handler, where the OS registered a procedure that looks at the registers and decides which system call to execute. |
| [5] | ELF (Executable and Linkable Format) is the file format used by Linux for object files, shared libraries and executables. |
| [6] | An observant reader can spot the translation of int 0x80 into cd 80 in the dumps listed above. |