Exercise 1.3

At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?

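The key code starts at boot/boot.S:48. It first loads a bootstrap GDT whose flat segments map addresses one-to-one (lgdt gdtdesc), then sets the PE bit in CR0, and then a single ljmp $PROT_MODE_CSEG, $protcseg reloads CS with a 32-bit code segment and lands in 32-bit mode. So the first 32-bit instruction is the first one under .code32, namely movw $PROT_MODE_DSEG, %ax. GDB marks this boundary nicely too.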

The target architecture is assumed to be i8086
[f000:fff0]    0xffff0: ljmp   $0xf000,$0xe05b
0x0000fff0 in ?? ()
+ symbol-file obj/kern/kernel
(gdb) b *0x7c2a
Breakpoint 1 at 0x7c2a
(gdb) c
Continuing.
[   0:7c2a] => 0x7c2a:  mov    %eax,%cr0

Breakpoint 1, 0x00007c2a in ?? ()
(gdb) si
[   0:7c2d] => 0x7c2d:  ljmp   $0x8,$0x7c32
0x00007c2d in ?? ()
(gdb)
The target architecture is assumed to be i386
=> 0x7c32:  mov    $0x10,%ax
0x00007c32 in ?? ()
(gdb)

What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?

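boot/boot.S:69: call bootmain is the last instruction of the assembly part of the boot loader; after it, control moves to bootmain in boot/main.c. bootmain talks to the disk directly, reads the kernel into memory, and then jumps into the kernel. So the last statement of the boot loader is ((void (*)(void)) (ELFHDR->e_entry))();, which shows up in GDB as call *0x10018. Once it executes, control lands at 0x0010000c, on movw $0x1234,0x472.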

(gdb) b *0x7d6b
Breakpoint 2 at 0x7d6b
(gdb) c
Continuing.
=> 0x7d6b:  call   *0x10018

Breakpoint 2, 0x00007d6b in ?? ()
(gdb) x/x 0x10018
0x10018:  0x0010000c
(gdb) si
=> 0x10000c:  movw   $0x1234,0x472
0x0010000c in ?? ()
(gdb)

Where is the first instruction of the kernel?

Searching obj/kern/kernel.asm shows that this instruction comes from kern/entry.S:44; it is the first instruction the kernel executes.

How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?

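At boot/main.c:56, readseg(ph->p_pa, ph->p_memsz, ph->p_offset); reads one segment into memory per call; internally it calls readsect once per sector. The call sits inside a for (; ph < eph; ph++) loop, and both ph and eph are computed from the ELF header found in the first 8 sectors of the kernel image on disk. Those 8 sectors are the very first thing read, by readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);.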

Exercise 1.5

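In boot/Makefrag, change -Ttext 0x7c00 to -Ttext 0x7d00, then run make clean and make. obj/boot/boot.asm shows that all addresses now start from 0x7d00.

Restart qemu and gdb and still set the breakpoint at 0x7c00 because, as noted earlier, the BIOS always starts the boot loader at 0x7c00. Stepping a few instructions shows that, even though we changed the link address, the code starting at 0x7c00 really is our boot.S. That is because, as the lab text says, when the BIOS finds a bootable floppy or hard disk it copies the boot sector, i.e. the first sector (512 bytes), to 0x7c00 ~ 0x7dff: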

If the disk is bootable, the first sector is called the boot sector, since this is where the boot loader code resides. When the BIOS finds a bootable floppy or hard disk, it loads the 512-byte boot sector into memory at physical addresses 0x7c00 through 0x7dff, and then uses a jmp instruction to set the CS:IP to 0000:7c00, passing control to the boot loader. Like the BIOS load address, these addresses are fairly arbitrary - but they are fixed and standardized for PCs.

Everything runs normally until 0x7c2d: ljmp $0x8,$0x7d32; once that instruction executes, qemu reports an error. Its context is:

  # Switch from real to protected mode, using a bootstrap GDT
  # and segment translation that makes virtual addresses 
  # identical to their physical addresses, so that the 
  # effective memory map does not change during the switch.
  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0
  
  # Jump to next instruction, but in 32-bit code segment.
  # Switches processor into 32-bit mode.
  ljmp    $PROT_MODE_CSEG, $protcseg

  .code32                     # Assemble for 32-bit mode
protcseg:
  # Set up the protected-mode data segment registers
  movw    $PROT_MODE_DSEG, %ax    # Our data segment selector

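In principle this instruction should just jump to protcseg right below it, yet it fails (my first guess was that garbage data was being decoded as CPU instructions). In fact, by the time this ljmp runs the GDT is supposed to be loaded already, and it is the lgdt above that did not load it correctly.

lgdt gdtdesc loads the 6 bytes starting at gdtdesc into GDTR. The linker computed the address of gdtdesc as 0x7d64, but the data is actually loaded at 0x7c64, so GDTR ends up with a wrong value.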

00007d64 <gdtdesc>:
    7d64:   17                      pop    %ss
    7d65:   00 4c 7d 00             add    %cl,0x0(%ebp,%edi,2)
    ...
(gdb) x/6xb 0x7d64
0x7d64: 0xff    0xff    0x83    0xc4    0x0c    0xeb
(gdb) x/6xb 0x7c64
0x7c64: 0x17    0x00    0x4c    0x7d    0x00    0x00

Exercise 1.6

(gdb) b *0x7c00
Breakpoint 1 at 0x7c00
(gdb) c
Continuing.
[   0:7c00] => 0x7c00:  cli

Breakpoint 1, 0x00007c00 in ?? ()
(gdb) x/8x 0x100000
0x100000:   0x00000000  0x00000000  0x00000000  0x00000000
0x100010:   0x00000000  0x00000000  0x00000000  0x00000000
(gdb) b *0x10000c
Breakpoint 2 at 0x10000c
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x10000c:    movw   $0x1234,0x472

Breakpoint 2, 0x0010000c in ?? ()
(gdb) x/8x 0x00100000
0x100000:   0x1badb002  0x00000000  0xe4524ffe  0x7205c766
0x100010:   0x34000004  0x0000b812  0x220f0011  0xc0200fd8

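On entry to the boot loader everything at 0x100000 is 0; once we are in the kernel, the contents match what obj/kern/kernel.asm shows. The reason is easy to see: when the boot loader starts, nothing has been loaded above 1MB yet, so that memory reads as 0. Before entering the kernel, the boot loader enables A20, enters protected mode, loads a flat (identity-mapping) GDT, and copies the kernel into its place in memory, so we then see the data we expect.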

Exercise 1.7

The target architecture is assumed to be i8086
[f000:fff0]    0xffff0: ljmp   $0xf000,$0xe05b
0x0000fff0 in ?? ()
+ symbol-file obj/kern/kernel
(gdb) b *0x100025
Breakpoint 1 at 0x100025
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x100025:    mov    %eax,%cr0

Breakpoint 1, 0x00100025 in ?? ()
(gdb) x/4x 0x00100000
0x100000:   0x1badb002  0x00000000  0xe4524ffe  0x7205c766
(gdb) x/4x 0xf0100000
0xf0100000 <_start+4026531828>: 0x00000000  0x00000000  0x00000000  0x00000000
(gdb) si
=> 0x100028:    mov    $0xf010002f,%eax
0x00100028 in ?? ()
(gdb) x/4x 0x00100000
0x100000:   0x1badb002  0x00000000  0xe4524ffe  0x7205c766
(gdb) x/4x 0xf0100000
0xf0100000 <_start+4026531828>: 0x1badb002  0x00000000  0xe4524ffe  0x7205c766
(gdb)

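After movl %eax, %cr0 paging is on, and the page table maps 0xf0000000 ~ 0xf0400000 to 0x00000000 ~ 0x00400000. So before paging is enabled we see only zeros at 0xf0100000, and afterwards we see exactly what is at the low address.

If the instruction that enables paging is removed, things obviously go wrong right after jmp *%eax: we jump to a very high address without paging enabled, so whatever is fetched there is not the instructions we expect.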

    # Load the physical address of entry_pgdir into cr3.  entry_pgdir
    # is defined in entrypgdir.c.
    movl    $(RELOC(entry_pgdir)), %eax
    movl    %eax, %cr3
    # Turn on paging.
    movl    %cr0, %eax
    orl $(CR0_PE|CR0_PG|CR0_WP), %eax
    movl    %eax, %cr0

    # Now paging is enabled, but we're still running at a low EIP
    # (why is this okay?).  Jump up above KERNBASE before entering
    # C code.
    mov $relocated, %eax
    jmp *%eax
relocated:

Exercise 1.8

Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?

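  • lib/printfmt.c lives under lib, so it is a library: it does the formatting work.
  • kern/console.c is where the dirty work happens: it wraps the lowest-level I/O (keyboard, serial port, VGA, etc.) and offers some "higher-level" APIs on top.
  • kern/printf.c is a thin wrapper that provides the functions for printing to the console. It calls the kern/console.c API for output and lib/printfmt.c for formatting.

Now let's see which functions kern/console.c exposes (i.e. those not declared static):

  • Initialization and interrupt handling
    • void serial_intr(void)
    • void cons_init(void)
    • void kbd_intr(void)
  • Low-level API
    • int cons_getc(void)
  • "High-level" API
    • void cputchar(int c)
    • int getchar(void)
    • int iscons(int fdnum)

Of these, only void cputchar(int c) is called by kern/printf.c, since printf only does output, never input.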

Explain the following from console.c:

if (crt_pos >= CRT_SIZE) {
        int i;
        memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
        for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
                crt_buf[i] = 0x0700 | ' ';
        crt_pos -= CRT_COLS;
}

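crt_pos is the cursor position and CRT_SIZE is the size of the whole screen in characters. When the cursor runs past the end, this code scrolls the whole screen up by one line, blanks the freed bottom line, and moves the cursor back by one line.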

Trace the execution of the following code step-by-step:

int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);

In the call to cprintf(), to what does fmt point? To what does ap point?

I added this snippet to kern/init.c. Looking at the assembly in obj/kern/kernel.asm is enough to answer both questions.

    int x = 1, y = 3, z = 4;
    cprintf("x %d, y %x, z %d\n", x, y, z);
f01000c8:   6a 04                   push   $0x4
f01000ca:   6a 03                   push   $0x3
f01000cc:   6a 01                   push   $0x1
f01000ce:   68 d2 18 10 f0          push   $0xf01018d2
f01000d3:   e8 20 08 00 00          call   f01008f8 <cprintf>

When calling cprintf, the arguments are pushed onto the stack in reverse order, as the calling convention requires.

f01008f8 <cprintf>:

int
cprintf(const char *fmt, ...)
{
f01008f8:   55                      push   %ebp
f01008f9:   89 e5                   mov    %esp,%ebp
f01008fb:   83 ec 10                sub    $0x10,%esp
    va_list ap;
    int cnt;

    va_start(ap, fmt);
f01008fe:   8d 45 0c                lea    0xc(%ebp),%eax
    cnt = vcprintf(fmt, ap);
f0100901:   50                      push   %eax
f0100902:   ff 75 08                pushl  0x8(%ebp)
f0100905:   e8 c8 ff ff ff          call   f01008d2 <vcprintf>
    va_end(ap);

    return cnt;
}
f010090a:   c9                      leave  
f010090b:   c3                      ret    

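From this code we can see that the stack holds, from low to high addresses: the saved ebp, the return address, fmt, x, y, z. The value of fmt is MEM[0x8(%ebp)], i.e. 0xf01018d2, the address of the format string. ap points to the first of the variable arguments, which here is the address of x on the stack, i.e. 0xc(%ebp).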

List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.

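For cons_putc and vcprintf we can simply set breakpoints in GDB. va_arg is a macro, so it cannot be given a breakpoint directly. The places that use va_arg are vprintfmt and getint, and as far as I can tell the only one that can actually be hit for this call is return va_arg(*ap, int); at lib/printfmt.c:75, so a breakpoint on that line does the job. (If %x is fetched through a separate unsigned helper in your copy of printfmt.c, that line needs a breakpoint too.)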

(gdb) b cons_putc
(gdb) b vcprintf
(gdb) b lib/printfmt.c:75

vcprintf (fmt=0xf01018d2 "x %d, y %x, z %d\n", ap=0xf010ffd4 "\001")
(gdb) x/3x 0xf010ffd4
0xf010ffd4: 0x00000001  0x00000003  0x00000004

cons_putc (c=120)  'x'
cons_putc (c=32)   ' '
va_arg: before: 0xf010ffd4, after: 0xf010ffd8
cons_putc (c=49)   '1'
cons_putc (c=44)   ','
cons_putc (c=32)   ' '
cons_putc (c=121)  'y'
cons_putc (c=32)   ' '
va_arg: before: 0xf010ffd8, after: 0xf010ffdc
cons_putc (c=51)   '3'
cons_putc (c=44)   ','
cons_putc (c=32)   ' '
cons_putc (c=122)  'z'
cons_putc (c=32)   ' '
va_arg: before: 0xf010ffdc, after: 0xf010ffe0
cons_putc (c=52)   '4'
cons_putc (c=10)   '\n'

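The result is as expected. As an aside, va_arg(va, type) can be explained with the pseudocode below (if you don't want to write it as a macro, and ignoring alignment and the like):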

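// Pseudocode: "type" stands for a real C type; va is the argument pointer.
type va_arg(char **va, type) {
    type val = *(type *)*va;   // read the value va currently points at
    *va += sizeof(type);       // advance to the next argument
    return val;
}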

Run the following code.

unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);

What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise.

The output is He110 World. The first part is easy: 57616 = 0xe110. Because x86 is little-endian, i.e. the low-order byte is stored first, 0x00646c72 read in memory order is 0x72, 0x6c, 0x64, 0x00 = "rld".

The output depends on the fact that the x86 is little-endian. If the x86 were instead big-endian what would you set i to in order to yield the same output? Would you need to change 57616 to a different value?

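On a big-endian machine i would have to be reversed: i = 0x726c6400 (which happens to be the natural left-to-right reading order). 57616 does not need to change, since it is used as a value. i has to be reversed because it is read as a sequence of bytes in memory.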

In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen? cprintf("x=%d y=%d", 3);

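This one is fairly obvious: the second %d gets a garbage value. cprintf simply reads the next 4 bytes on the stack above the 3, whatever happens to be there.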

Let’s say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change cprintf or its interface so that it would still be possible to pass it a variable number of arguments?

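Really only va_start and va_arg need to change: reverse the direction in which ap advances. One caveat: with that push order fmt is no longer at a fixed offset from %ebp, so cprintf would also need some way to find it, for example by taking fmt as the last parameter or by having the caller push an argument count last.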

Exercise 1.9

Note the end of kern/entry.S:

relocated:

    # Clear the frame pointer register (EBP)
    # so that once we get into debugging C code,
    # stack backtraces will be terminated properly.
    movl    $0x0,%ebp           # nuke frame pointer

    # Set the stack pointer
    movl    $(bootstacktop),%esp

    # now to C code
    call    i386_init

    # Should never get here, but in case we do, just spin.
spin:   jmp spin


.data
###################################################################
# boot stack
###################################################################
    .p2align    PGSHIFT     # force page alignment
    .globl      bootstack
bootstack:
    .space      KSTKSIZE
    .globl      bootstacktop   
bootstacktop:

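The two instructions above that set ebp and esp are what set up the stack. In the .data section below, there is first a page alignment, then KSTKSIZE bytes of reserved space, and the end of that space is labeled as the stack top, bootstacktop. Because the stack grows from high to low addresses, the stack top is at the highest address. Looking at obj/kern/kernel.asm: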

    # Set the stack pointer
    movl    $(bootstacktop),%esp
f0100034:   bc 00 00 11 f0          mov    $0xf0110000,%esp

So the stack top is at 0xf0110000. From inc/memlayout.h, KSTKSIZE = 8 * PGSIZE = 8 * 4096 = 32KB, so the stack is 32KB and its lowest address is 0xF0108000. In other words, the kernel reserves a 32KB stack for itself in the .data section.

As an aside, why does the .data section start slightly before 0xF0108000 (the difference being the alignment padding)? kern/kernel.ld explains it:

/* Link the kernel at this address: "." means the current address */
. = 0xF0100000;

/* AT(...) gives the load address of this section, which tells
   the boot loader where to load the kernel in physical memory */
.text : AT(0x100000) {
    *(.text .stub .text.* .gnu.linkonce.t.*)
}

/* The data segment */
.data : {
    *(.data)
}

Exercise 1.10

f0100040 <test_backtrace>:
#include <kern/console.h>

// Test the stack backtrace function (lab 1 only)
void
test_backtrace(int x)
{
f0100040:   55                      push   %ebp
f0100041:   89 e5                   mov    %esp,%ebp
f0100043:   53                      push   %ebx
f0100044:   83 ec 0c                sub    $0xc,%esp
f0100047:   8b 5d 08                mov    0x8(%ebp),%ebx
    cprintf("entering test_backtrace %d\n", x);
f010004a:   53                      push   %ebx
f010004b:   68 e0 18 10 f0          push   $0xf01018e0
f0100050:   e8 04 09 00 00          call   f0100959 <cprintf>

This prologue does the following:

  1. Save the caller's base pointer
  2. Make the old stack top the new base pointer
  3. Save the callee-saved register ebx
  4. Grow the stack
  5. Load the value of x into ebx

Exercise 1.11

From the snippet above we know the calling convention, and therefore what the stack looks like from high to low addresses, so we just follow %ebp back up the chain.

high  |            |
  ^   |____________|
  |   |    argN    |
  |   |    arg1    |
  |   |    arg0    |
  |   |  old %eip  |
  |   |  old %ebp  |  <--  %ebp  
  |   | local vars |
  |   | saved regs |  <--- %esp
  v   |------------|
 low  |            |

Exercise 1.12

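To get each function's name we clearly have to go through the linker. I did not want to dig into the details; the short version is that some clever people found a neat way to get the function information out. I also noticed that Eipdebuginfo has an eip_fn_narg field, so I added a check so that the number of arguments printed equals the number of parameters in the function's parameter list.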

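The earlier exercise description asks why the backtracer cannot know the number of arguments. The reason is that, besides args, %eip and %ebp, the stack also holds local variables and caller-saved registers, so pointer arithmetic alone cannot tell you the argument count. To know it, either change the calling convention so that the argument count is pushed too, or, as here, get the function information from the linker.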

To demonstrate this better, I added one more parameter to test_backtrace. It runs quite nicely.

entering test_backtrace x=00000005 sum=00000000
entering test_backtrace x=00000004 sum=00000005
entering test_backtrace x=00000003 sum=00000009
entering test_backtrace x=00000002 sum=0000000c
entering test_backtrace x=00000001 sum=0000000e
entering test_backtrace x=00000000 sum=0000000f
Stack backtrace:
  ebp f010ff08  eip f0100084 ARGS5 00000000 00000000 00000000 00000000 00000001
         kern/init.c:18: test_backtrace+68:
  ebp f010ff28  eip f0100071  args 00000000 0000000f
         kern/init.c:16: test_backtrace+49:
  ebp f010ff48  eip f0100071  args 00000001 0000000e
         kern/init.c:16: test_backtrace+49:
  ebp f010ff68  eip f0100071  args 00000002 0000000c
         kern/init.c:16: test_backtrace+49:
  ebp f010ff88  eip f0100071  args 00000003 00000009
         kern/init.c:16: test_backtrace+49:
  ebp f010ffa8  eip f0100071  args 00000004 00000005
         kern/init.c:16: test_backtrace+49:
  ebp f010ffc8  eip f010010d  args 00000005 00000000
         kern/init.c:44: i386_init+109:
  ebp f010fff8  eip f010003e  args
         kern/entry.S:83: <unknown>+0:
leaving test_backtrace x=00000000 sum=0000000f
leaving test_backtrace x=00000001 sum=0000000e
leaving test_backtrace x=00000002 sum=0000000c
leaving test_backtrace x=00000003 sum=00000009
leaving test_backtrace x=00000004 sum=00000005
leaving test_backtrace x=00000005 sum=00000000
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
K> help
help - Display this list of commands
kerninfo - Display information about the kernel
backtrace - Display stack backtrace
K> backtrace
Stack backtrace:
  ebp f010ff58  eip f01009bc ARGS5 00000001 f010ff70 00000000 f010ffbc f0112540
         kern/monitor.c:150: monitor+256:
  ebp f010ffc8  eip f010011a  args 00000000
         kern/init.c:48: i386_init+122:
  ebp f010fff8  eip f010003e  args
         kern/entry.S:83: <unknown>+0:
K>