6.828 Lab4: Preemptive Multitasking

Exercise 4.1

As the hint says, just call boot_map_region on kern_pgdir.
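For reference, here is a minimal sketch of the bookkeeping around that call, not the actual kernel code. The constants are assumed to mirror inc/memlayout.h, mmio_reserve is a hypothetical name, and the real function panics on overflow rather than returning 0. The boot_map_region call itself is left as a comment so the sketch stays standalone.

```c
#include <stdint.h>
#include <stddef.h>

#define PGSIZE   4096u
#define PTSIZE   (PGSIZE * 1024u)   /* bytes mapped by one page directory entry */
#define MMIOLIM  0xefc00000u        /* assumed: KSTACKTOP - PTSIZE */
#define MMIOBASE (MMIOLIM - PTSIZE)

/* Next free virtual address in the MMIO window. */
static uint32_t mmio_next = MMIOBASE;

/*
 * Reserve `size` bytes (rounded up to whole pages) of the MMIO window.
 * Returns the start of the reserved range, or 0 if the window would overflow.
 */
uint32_t mmio_reserve(size_t size)
{
    uint32_t rounded = (uint32_t)((size + PGSIZE - 1) / PGSIZE) * PGSIZE;
    if (rounded > MMIOLIM - mmio_next)
        return 0;
    uint32_t start = mmio_next;
    /* In the kernel, this is where boot_map_region(kern_pgdir, start,
     * rounded, pa, ...) would map the range with cache-disabling bits. */
    mmio_next += rounded;
    return start;
}
```

Each call hands out the next page-aligned chunk, so successive device mappings never overlap.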

Exercise 4.2

Only a small change is needed: page_init() has to keep the page at MPENTRY_PADDR off the free list.

Compare kern/mpentry.S side by side with boot/boot.S. Bearing in mind that kern/mpentry.S is compiled and linked to run above KERNBASE just like everything else in the kernel, what is the purpose of macro MPBOOTPHYS? Why is it necessary in kern/mpentry.S but not in boot/boot.S? In other words, what could go wrong if it were omitted in kern/mpentry.S?

Hint: recall the differences between the link address and the load address that we have discussed in Lab 1.

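Compared with boot/boot.S, kern/mpentry.S differs as follows:

It has no Enable A20 step
Every GDT-related address is wrapped in the MPBOOTPHYS macro
The stack is set to mpentry_kstack
It jumps to the entry point mp_main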

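kern/mpentry.S is linked at high virtual addresses but is actually loaded at a low physical address, so MPBOOTPHYS has to translate each high link address into the low address where the code really sits. Without it, an AP that is still in real mode with paging off would reference addresses above KERNBASE that it cannot reach. boot/boot.S is both loaded and linked at a low address, so it needs no such macro.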

abcdabcd987@vm-ubuntu:~/6.828$ objdump -h obj/boot/boot.out
obj/boot/boot.out:     file format elf32-i386
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000186  00007c00  00007c00  00000074  2**2
                  CONTENTS, ALLOC, LOAD, CODE
  1 .eh_frame     000000a8  00007d88  00007d88  000001fc  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab         00000720  00000000  00000000  000002a4  2**2
                  CONTENTS, READONLY, DEBUGGING
  3 .stabstr      0000088f  00000000  00000000  000009c4  2**0
                  CONTENTS, READONLY, DEBUGGING
  4 .comment      00000034  00000000  00000000  00001253  2**0
                  CONTENTS, READONLY


abcdabcd987@vm-ubuntu:~/6.828$ objdump -h obj/kern/kernel
obj/kern/kernel:     file format elf32-i386
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000059a1  f0100000  00100000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00001c17  f01059c0  001059c0  000069c0  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab         00009889  f01075d8  001075d8  000085d8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr      0000360f  f0110e61  00110e61  00011e61  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data         001142b0  f0115000  00115000  00016000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  5 .bss          00042008  f022a000  0022a000  0012a2b0  2**12
                  ALLOC
  6 .comment      00000034  00000000  00000000  0012a2b0  2**0
                  CONTENTS, READONLY
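The translation MPBOOTPHYS performs is plain arithmetic. Below is a small sketch of it as a C function; mpbootphys is a hypothetical name, MPENTRY_PADDR is assumed to be 0x7000 as in the lab, and any concrete symbol addresses passed in are made up for illustration.

```c
#include <stdint.h>

#define MPENTRY_PADDR 0x7000u   /* assumed: where the AP boot code is copied */

/*
 * Translate the link-time (high) address of a symbol inside mpentry.S
 * into the physical address it has after being copied to MPENTRY_PADDR.
 * mpentry_start_va is the link-time address of the first byte of the code.
 */
uint32_t mpbootphys(uint32_t sym_va, uint32_t mpentry_start_va)
{
    return sym_va - mpentry_start_va + MPENTRY_PADDR;
}
```

A symbol 0x40 bytes into the AP boot code ends up at 0x7040, whatever its link address is.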

Exercise 4.3

Just follow the diagram in inc/memlayout.h: work out each stack's start address, then use boot_map_region together with PADDR.
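The address calculation can be sketched as follows. The constants are assumed to match inc/memlayout.h, and the two helper names are hypothetical. Only the range [base, top) of each stack gets mapped; the KSTKGAP below it stays unmapped as a guard region.

```c
#include <stdint.h>

#define PGSIZE    4096u
#define KSTACKTOP 0xf0000000u      /* assumed: equal to KERNBASE */
#define KSTKSIZE  (8 * PGSIZE)     /* assumed size of each kernel stack */
#define KSTKGAP   (8 * PGSIZE)     /* assumed size of each guard gap */

/* Top (highest address, exclusive) of CPU i's kernel stack. */
uint32_t percpu_kstack_top(unsigned i)
{
    return KSTACKTOP - i * (KSTKSIZE + KSTKGAP);
}

/* Lowest mapped address of CPU i's kernel stack. */
uint32_t percpu_kstack_base(unsigned i)
{
    return percpu_kstack_top(i) - KSTKSIZE;
}
```

In the kernel, each base would then be passed to boot_map_region along with PADDR(percpu_kstacks[i]) and a size of KSTKSIZE.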

Exercise 4.4

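I mostly followed the hints, but it kept failing afterwards. After peeking at someone else's code, I found it has to be written as ltr(GD_TSS0 + i * sizeof(struct Segdesc)); to pass. The GD_* definitions in inc/memlayout.h suggest that selectors are offset in steps of sizeof(struct Segdesc). I still don't fully understand why; this needs more study.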
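A likely explanation: an x86 segment selector keeps the GDT index in bits 3 to 15, while the low three bits hold the RPL and the table indicator. Moving to the next descriptor therefore adds 8 to the selector, which happens to equal sizeof(struct Segdesc). The sketch below assumes GD_TSS0 is 0x28 as in inc/memlayout.h; the helper names are hypothetical.

```c
#include <stdint.h>

#define GD_TSS0      0x28u   /* assumed: selector of CPU 0's TSS descriptor */
#define SEGDESC_SIZE 8u      /* a GDT descriptor is 8 bytes on x86 */

/* Selector to load with ltr for the given CPU. */
uint16_t tss_selector(unsigned cpu)
{
    return (uint16_t)(GD_TSS0 + cpu * SEGDESC_SIZE);
}

/* GDT slot that a selector refers to (drops the RPL and TI bits). */
unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}
```

So ltr(GD_TSS0 + i) would set the RPL/TI bits of the same slot instead of selecting slot (GD_TSS0 >> 3) + i.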

Exercise 4.5

It seems that using the big kernel lock guarantees that only one CPU can run the kernel code at a time. Why do we still need separate kernel stacks for each CPU? Describe a scenario in which using a shared kernel stack will go wrong, even with the protection of the big kernel lock.

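When an interrupt fires, the CPU pushes the trap information onto the kernel stack. This happens before lock_kernel(), so the big kernel lock cannot protect it. For example, if CPU 0 is in the kernel holding the lock and CPU 1 takes an interrupt, the frame that CPU 1's hardware pushes would overwrite CPU 0's data on a shared stack.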

Exercise 4.6

Everything works, but one thing is odd:

Setting CPUS larger than 1 at this time may result in a general protection or kernel page fault once there are no more runnable environments due to unhandled timer interrupts (which we will fix below!).

Yet I never ran into this problem. However many CPUs I use, the monitor comes up normally.

In your implementation of env_run() you should have called lcr3(). Before and after the call to lcr3(), your code makes references (at least it should) to the variable e, the argument to env_run. Upon loading the %cr3 register, the addressing context used by the MMU is instantly changed. But a virtual address (namely e) has meaning relative to a given address context–the address context specifies the physical address to which the virtual address maps. Why can the pointer e be dereferenced both before and after the addressing switch?

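The kernel part of the address space is mapped identically in every environment's page directory, and e points into kernel memory, so it resolves to the same physical address before and after the switch.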

Whenever the kernel switches from one environment to another, it must ensure the old environment’s registers are saved so they can be restored properly later. Why? Where does this happen?

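Because an interrupt can happen at any time, the entire context has to be saved. See Lab 3: part of the context is pushed onto the stack by the CPU, and the rest by kern/trapentry.S:_alltraps. Likewise, on restore, part is done by kern/env.c:env_pop_tf and part by the CPU.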

Exercise 4.7

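Not much to say here; the hints are very detailed. Earlier I thought the kernel code was bold for skipping so many checks. Now that I am writing user-facing interfaces, it is checks all the way down. However bold you are, untrusted input forces you to be careful.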
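To give a flavor of those checks, here is a sketch of the argument validation that a call like sys_page_alloc needs before touching anything. The function name is hypothetical, and the constants are assumed to match inc/memlayout.h, inc/mmu.h and inc/error.h.

```c
#include <stdint.h>

#define PGSIZE      4096u
#define UTOP        0xeec00000u   /* assumed top of user-accessible memory */
#define PTE_P       0x001u
#define PTE_W       0x002u
#define PTE_U       0x004u
#define PTE_AVAIL   0xE00u
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)
#define E_INVAL     3             /* assumed error code */

/*
 * Validate a user-supplied virtual address and permission mask.
 * Returns 0 if acceptable, -E_INVAL otherwise.
 */
int check_user_page_args(uint32_t va, uint32_t perm)
{
    if (va >= UTOP || va % PGSIZE != 0)
        return -E_INVAL;              /* outside user space or unaligned */
    if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P))
        return -E_INVAL;              /* PTE_U and PTE_P are mandatory */
    if (perm & ~PTE_SYSCALL)
        return -E_INVAL;              /* no bits beyond the allowed set */
    return 0;
}
```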

Exercise 4.8

A month and a half later, I finally started on Part B.

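As before, first look up the env, then set the member in its struct. At first I wondered why func needs no validation here. After reading the introduction to the next exercise it made sense: the check is impossible and also unnecessary. Even if func is malicious, by the time it runs the page table has been switched and the CPU is back at user privilege level, so the handler runs entirely in user mode and poses no danger to the kernel. If it does any harm, the user program is only hurting itself.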

Exercise 4.9

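First we do some checks to make sure the user program can handle a page fault and that the exception stack still has room. Then, depending on whether this is the first page fault or a recursive one, we determine where the new frame starts on the exception stack. Next we copy the trap frame contents, in the required layout, to that position on the exception stack. Finally we switch the environment's stack to the exception stack and jump to the user's page fault handler.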
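The "where does the frame go" step can be sketched like this. The struct layout and UXSTACKTOP are assumed to match inc/trap.h and inc/memlayout.h, utf_address is a hypothetical helper, and the permission and overflow checks (user_mem_assert in the kernel) are left out.

```c
#include <stdint.h>

#define PGSIZE     4096u
#define UXSTACKTOP 0xeec00000u   /* assumed top of the user exception stack */

struct PushRegs {
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

struct UTrapframe {
    uint32_t utf_fault_va;
    uint32_t utf_err;
    struct PushRegs utf_regs;
    uint32_t utf_eip;
    uint32_t utf_eflags;
    uint32_t utf_esp;
};

/*
 * Given the esp at the time of the fault, return the address at which
 * the new UTrapframe should be written.
 */
uint32_t utf_address(uint32_t trap_esp)
{
    if (trap_esp >= UXSTACKTOP - PGSIZE && trap_esp < UXSTACKTOP) {
        /* Recursive fault: leave one blank word, then push below it. */
        return trap_esp - 4 - (uint32_t)sizeof(struct UTrapframe);
    }
    /* First fault: start at the top of the exception stack. */
    return UXSTACKTOP - (uint32_t)sizeof(struct UTrapframe);
}
```

The returned address also becomes the environment's new esp before control is handed to the upcall.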

Exercise 4.10

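While doing the previous exercise I wondered why a blank word has to be left on the exception stack, and guessed it was needed for some special kind of jump. Reading the comments in pfentry.S confirmed the guess. We want to restore all registers and also do an absolute jump, so we push the jump target onto the stack and use ret to jump.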

This piece of assembly has to do two things:

Call the given _pgfault_handler
Jump back to the instruction in the user program that triggered the page fault

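We first have to push the jump target onto the trap-time stack so that a later ret can jump back to it; this step is rather fiddly. The remaining steps are comparatively simple, especially given the detailed hints. While writing the assembly, keep the exception stack diagram from the lab handout in front of you, because the offsets of all the fields have to be computed from it.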
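The fiddly first step is easier to see in C than in assembly. The sketch below simulates it on a fake stack; TrapTime and SimStack are hypothetical stand-ins (the real code works on the UTrapframe in place, in assembly).

```c
#include <stdint.h>

/* Hypothetical subset of UTrapframe: just the two fields involved. */
struct TrapTime {
    uint32_t eip;   /* instruction to resume at */
    uint32_t esp;   /* stack pointer at the time of the fault */
};

/* Simulated memory covering the addresses [base, base + 64). */
struct SimStack {
    uint32_t base;
    uint32_t words[16];
};

/*
 * Make room for one word on the trap-time stack and store the return
 * eip there, updating the saved esp to point at it.
 */
void push_return_eip(struct TrapTime *t, struct SimStack *s)
{
    t->esp -= 4;
    s->words[(t->esp - s->base) / 4] = t->eip;
}
```

After this, the assembly restores the general registers and eflags, loads the adjusted esp, and a final ret pops the stored eip. In the recursive case, the word written here is exactly the blank slot left on the exception stack.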

Exercise 4.11

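Because this code runs in user mode, allocating memory has to go through a syscall. We use sys_page_alloc to get a page for the exception stack, then sys_env_set_pgfault_upcall to register the _pgfault_upcall we just wrote in assembly as the current environment's page fault handler. Also, to refer to the current environment from user code, use thisenv rather than curenv; I typed curenv out of habit and got a compile error.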

Back in Exercise 4.8 I had forgotten to add sys_env_set_pgfault_upcall to the switch in syscall(), so this kept failing, and it took a long time to track down.

Exercise 4.12

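This one is rather tedious: few hints, and a large chunk of code to write.

fork first follows dumbfork: use sys_exofork() to create the child, then the parent concentrates on setting up the child's page table. That has two parts: mark everything below UTOP-PGSIZE copy-on-write, and allocate a fresh exception stack.
duppage
- For pages that have PTE_COW or PTE_W, set both our own and the child's page table entries to PTE_COW without PTE_W. Neither we nor the child should be able to write to that address directly; a write must first trigger a page fault, which then copies the page.
- Pages that have neither, i.e. read-only pages, simply keep their original permissions.
pgfault
- Where is the error type defined? After some searching I found FEC_WR in mmu.h.
- The three syscalls are sys_page_{alloc,map,unmap}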
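The two permission decisions above can be sketched as pure functions. The bit values are assumed to match inc/mmu.h and inc/lib.h, the function names are hypothetical, and a real duppage would also mask the permissions down to what sys_page_map accepts.

```c
#include <stdint.h>

#define PTE_P   0x001u
#define PTE_W   0x002u
#define PTE_U   0x004u
#define PTE_COW 0x800u   /* assumed: software-defined copy-on-write bit */
#define FEC_WR  0x2u     /* assumed: page fault error bit for a write */

/* Permissions to install in both parent and child for a page. */
uint32_t duppage_perm(uint32_t perm)
{
    if (perm & (PTE_W | PTE_COW))
        return (perm & ~PTE_W) | PTE_COW;   /* writable -> copy-on-write */
    return perm;                            /* read-only stays as it is */
}

/* Nonzero if a fault is a write to a copy-on-write page. */
int is_cow_write_fault(uint32_t err, uint32_t pte)
{
    return (err & FEC_WR) != 0 && (pte & PTE_COW) != 0;
}
```

pgfault would panic when is_cow_write_fault is false, and otherwise allocate a page at a temporary address, copy the old contents, and remap it writable.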

Exercise 4.13

How to handle the macros is still a big problem; I did not find an elegant way.

Exercise 4.14

Just remember to call lapic_eoi().

Exercise 4.15

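It is a large chunk of code, but most of it checks for abnormal cases, and the instructions are detailed. Still, I spent a long time debugging, because the actual bug can be in earlier code. With multiple CPUs my primes was sometimes right and sometimes wrong. I checked my Part C implementation, the locking, and the multiprocessor support, and finally found that my sched_yield was wrong.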
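Since sched_yield was the culprit, here is a sketch of the round-robin selection the lab asks for, written as a pure function over an array of statuses so it can be tested in isolation. This is not the code that had the bug; pick_next_env is a hypothetical name, and the enum order is assumed to match inc/env.h.

```c
enum EnvStatus {
    ENV_FREE,
    ENV_DYING,
    ENV_RUNNABLE,
    ENV_RUNNING,
    ENV_NOT_RUNNABLE
};

/*
 * Choose the next environment to run.
 * cur is the index of the environment running on this CPU, or -1 if none.
 * Returns the chosen index, or -1 if nothing can run (the CPU should halt).
 */
int pick_next_env(const enum EnvStatus *status, int nenv, int cur)
{
    int start = (cur < 0) ? 0 : cur + 1;

    /* Search circularly, beginning just after the current environment. */
    for (int k = 0; k < nenv; k++) {
        int idx = (start + k) % nenv;
        if (status[idx] == ENV_RUNNABLE)
            return idx;
    }
    /* Nothing else is runnable: keep running the current one if possible. */
    if (cur >= 0 && status[cur] == ENV_RUNNING)
        return cur;
    return -1;
}
```

Note that an environment marked ENV_RUNNING on another CPU is never picked, which is the kind of mistake that only shows up with more than one CPU.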

One thing to watch out for: the comment on sys_ipc_try_send says:

The target environment is marked runnable again, returning 0 from the paused sys_ipc_recv system call. (Hint: does the sys_ipc_recv function ever actually return?)

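This is a reminder that sys_ipc_recv does not return; it calls sched_yield() to give up the CPU instead. Before yielding, it has to set the saved %eax, i.e. the return value, to 0 (look back at how trap_dispatch handles T_SYSCALL to see why).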
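The handshake can be sketched with a pared-down stand-in for struct Env. SimEnv and the sim_* functions are hypothetical, saved_eax stands in for env_tf.tf_regs.reg_eax, the status and error values are assumed to match inc/env.h and inc/error.h, and page transfer and all argument checks are omitted.

```c
#include <stdint.h>

enum { ENV_RUNNABLE = 2, ENV_NOT_RUNNABLE = 4 };   /* assumed values */
#define E_IPC_NOT_RECV 7                           /* assumed error code */

struct SimEnv {
    int      env_status;
    int      env_ipc_recving;   /* nonzero while blocked in receive */
    uint32_t env_ipc_value;     /* value delivered by the sender */
    int32_t  saved_eax;         /* stands in for env_tf.tf_regs.reg_eax */
};

/* Receiver side: block, and pre-load the eventual return value. */
void sim_ipc_recv(struct SimEnv *e)
{
    e->env_ipc_recving = 1;
    e->env_status = ENV_NOT_RUNNABLE;
    e->saved_eax = 0;   /* what the receiver sees in %eax when resumed */
    /* The kernel would call sched_yield() here and never come back. */
}

/* Sender side: deliver the value and wake the receiver. */
int sim_ipc_try_send(struct SimEnv *target, uint32_t value)
{
    if (!target->env_ipc_recving)
        return -E_IPC_NOT_RECV;
    target->env_ipc_recving = 0;
    target->env_ipc_value = value;
    target->env_status = ENV_RUNNABLE;
    return 0;
}
```

Because the receiver is resumed through env_pop_tf rather than by returning through syscall(), the only way to give it a return value is to write it into the saved register beforehand.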