MIT-6.S081-2021 Lab6: Multithreading
https://pdos.csail.mit.edu/6.S081/2021/labs/thread.html This lab is mainly about getting familiar with basic multithreading operations.
1. Uthread: switching between threads
1.1 Requirements
In this exercise you will design the context switch mechanism for a user-level threading system, and then implement it. To get you started, your xv6 has two files user/uthread.c and user/uthread_switch.S, and a rule in the Makefile to build a uthread program. uthread.c contains most of a user-level threading package, and code for three simple test threads. The threading package is missing some of the code to create a thread and to switch between threads.
Your job is to come up with a plan to create threads and save/restore registers to switch between threads, and implement that plan. When you're done, make grade should say that your solution passes the uthread test.
Implement user-level thread switching. The main work is implementing the thread_switch and thread_create interfaces.
1.2 Analysis
1.2.1 thread_create
Creating a thread comes down to two things:
- Setting up the thread stack: thread.stack serves as the thread's runtime stack, and the corresponding register is sp. Note that the stack grows downward, so the initial value must be sp = thread.stack + STACK_SIZE (see the layout sketch below).
- Setting the function to run: a function pointer is passed in at creation and is executed when the thread runs. Since the ret instruction jumps to the address held in the ra register, storing the function's address in ra makes the first switch into the thread land in that function.
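A quick sketch of the stack layout (STACK_SIZE and the stack array are the ones defined in user/uthread.c):

/*
 * t->stack                              t->stack + STACK_SIZE
 * |                                                         |
 * v                                                         v
 * +---------------------------------------------------------+
 * | <--- the stack grows this way (toward lower addresses)  |
 * +---------------------------------------------------------+
 *                                              initial sp --^
 */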
1.2.2 thread_switch
Thread switching has to do two things:
- Save the current thread's context: all the callee-saved registers (s0-s11), plus the sp and ra registers. Caller-saved registers need no saving because thread_switch is called as an ordinary C function, so the compiler already preserves any live caller-saved values around the call.
- Load the target thread's saved values into those same registers.
1.3 Implementation
First, add a thread_context struct to hold the context (ra, sp, and the callee-saved registers):
struct thread_context {
  uint64 ra;
  uint64 sp;

  // callee-saved registers
  uint64 s0;
  uint64 s1;
  uint64 s2;
  uint64 s3;
  uint64 s4;
  uint64 s5;
  uint64 s6;
  uint64 s7;
  uint64 s8;
  uint64 s9;
  uint64 s10;
  uint64 s11;
};

struct thread {
  struct thread_context context;
  char stack[STACK_SIZE]; /* the thread's stack */
  int state;              /* FREE, RUNNING, RUNNABLE */
};
Initializing the threads:
void thread_init(void)
{
  // main() is thread 0, which will make the first invocation to
  // thread_schedule(). it needs a stack so that the first thread_switch() can
  // save thread 0's state. thread_schedule() won't run the main thread ever
  // again, because its state is set to RUNNING, and thread_schedule() selects
  // a RUNNABLE thread.
  current_thread = &all_thread[0];
  current_thread->state = RUNNING;
}
void thread_create(void (*func)())
{
  struct thread *t;

  for (t = all_thread; t < all_thread + MAX_THREAD; t++) {
    if (t->state == FREE) break;
  }
  t->state = RUNNABLE;
  // YOUR CODE HERE
  // the stack grows downward, so sp starts at the high end of the array
  t->context.sp = (uint64)t->stack + STACK_SIZE;
  // thread_switch's ret will jump to ra, i.e. into func
  t->context.ra = (uint64)func;
}
Now for switching threads. One thing to note: before switching to the next thread, the current thread's state must be set back to RUNNABLE, with two exceptions. If the current thread is FREE (its task has already finished), or it is the main thread (once main's thread is switched out, it never needs to run again), its state should not be reset to RUNNABLE.
void thread_schedule(void)
{
  struct thread *t, *next_thread;

  /* Find another runnable thread. */
  next_thread = 0;
  t = current_thread + 1;
  for (int i = 0; i < MAX_THREAD; i++) {
    if (t >= all_thread + MAX_THREAD)
      t = all_thread;
    if (t->state == RUNNABLE) {
      next_thread = t;
      break;
    }
    t = t + 1;
  }

  if (next_thread == 0) {
    printf("thread_schedule: no runnable threads\n");
    exit(-1);
  }

  //printf("found thread %x\n", next_thread);
  if (current_thread != next_thread) { /* switch threads? */
    next_thread->state = RUNNING;
    t = current_thread;
    current_thread = next_thread;
    /* YOUR CODE HERE
     * Invoke thread_switch to switch from t to next_thread:
     * thread_switch(??, ??);
     */
    // don't requeue main (thread 0) or threads whose task has finished
    if (t != &all_thread[0] && t->state != FREE) {
      t->state = RUNNABLE;
    }
    thread_switch(&(t->context), &(current_thread->context));
  } else
    next_thread = 0;
}
The context switch itself (user/uthread_switch.S):
.text

/*
 * save the old thread's registers,
 * restore the new thread's registers.
 */

.globl thread_switch
thread_switch:
    /* a0 holds &old->context, a1 holds &new->context */
    sd ra, 0(a0)
    sd sp, 8(a0)
    sd s0, 16(a0)
    sd s1, 24(a0)
    sd s2, 32(a0)
    sd s3, 40(a0)
    sd s4, 48(a0)
    sd s5, 56(a0)
    sd s6, 64(a0)
    sd s7, 72(a0)
    sd s8, 80(a0)
    sd s9, 88(a0)
    sd s10, 96(a0)
    sd s11, 104(a0)

    ld ra, 0(a1)
    ld sp, 8(a1)
    ld s0, 16(a1)
    ld s1, 24(a1)
    ld s2, 32(a1)
    ld s3, 40(a1)
    ld s4, 48(a1)
    ld s5, 56(a1)
    ld s6, 64(a1)
    ld s7, 72(a1)
    ld s8, 80(a1)
    ld s9, 88(a1)
    ld s10, 96(a1)
    ld s11, 104(a1)

    ret /* return to ra */
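Putting the pieces together: the first thread_switch into a newly created thread loads the sp and ra that thread_create stored, so the final ret jumps straight into func on the thread's own stack. A minimal sketch of a thread body (modeled on thread_a in user/uthread.c; worker and worker_n are illustrative names):

volatile int worker_n = 0;   // illustrative counter, like a_n in uthread.c

void worker(void)
{
  for (int i = 0; i < 100; i++) {
    worker_n += 1;
    thread_yield();          // mark this thread RUNNABLE and reschedule
  }
  // a thread exits by freeing its slot and scheduling the next thread
  current_thread->state = FREE;
  thread_schedule();
}

// in main(): thread_create(worker); ... thread_schedule();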
2. Using threads
2.1 Requirements
In this assignment you will explore parallel programming with threads and locks using a hash table. You should do this assignment on a real Linux or MacOS computer (not xv6, not qemu) that has multiple cores. Most recent laptops have multicore processors.
This assignment uses the UNIX pthread threading library. You can find information about it from the manual page, with man pthreads, and you can look on the web, for example here, here, and here.
The file notxv6/ph.c contains a simple hash table that is correct if used from a single thread, but incorrect when used from multiple threads. In your main xv6 directory (perhaps ~/xv6-labs-2021), type this:
$ make ph
$ ./ph 1
Note that to build ph the Makefile uses your OS's gcc, not the 6.S081 tools. The argument to ph specifies the number of threads that execute put and get operations on the hash table. After running for a little while, ph 1 will produce output similar to this:
100000 puts, 3.991 seconds, 25056 puts/second
0: 0 keys missing
100000 gets, 3.981 seconds, 25118 gets/second
The numbers you see may differ from this sample output by a factor of two or more, depending on how fast your computer is, whether it has multiple cores, and whether it's busy doing other things.
ph runs two benchmarks. First it adds lots of keys to the hash table by calling put(), and prints the achieved rate in puts per second. Then it fetches keys from the hash table with get(). It prints the number of keys that should have been in the hash table as a result of the puts but are missing (zero in this case), and it prints the number of gets per second it achieved.
You can tell ph to use its hash table from multiple threads at the same time by giving it an argument greater than one. Try ph 2:
$ ./ph 2
100000 puts, 1.885 seconds, 53044 puts/second
1: 16579 keys missing
0: 16579 keys missing
200000 gets, 4.322 seconds, 46274 gets/second
The first line of this ph 2 output indicates that when two threads concurrently add entries to the hash table, they achieve a total rate of 53,044 inserts per second. That's about twice the rate of the single thread from running ph 1. That's an excellent "parallel speedup" of about 2x, as much as one could possibly hope for (i.e. twice as many cores yielding twice as much work per unit time).
However, the two lines saying 16579 keys missing indicate that a large number of keys that should have been in the hash table are not there. That is, the puts were supposed to add those keys to the hash table, but something went wrong. Have a look at notxv6/ph.c, particularly at put() and insert().
In short, ph.c implements a hash table; the benchmark first performs put operations on it and then get operations. When the program is run with 2 threads, some gets miss keys that should be present; with 1 thread, no keys are ever missing.
2.2 Analysis
2.2.1 Why do keys go missing with multiple threads?
Start from the put code:
static void insert(int key, int value, struct entry **p, struct entry *n)
{
  struct entry *e = malloc(sizeof(struct entry));
  e->key = key;
  e->value = value;
  e->next = n;   // the new entry points at the old list head
  *p = e;        // ... and becomes the new head
}

static void put(int key, int value)
{
  int i = key % NBUCKET;

  // is the key already present?
  struct entry *e = 0;
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if(e){
    // update the existing key.
    e->value = value;
  } else {
    // the key is new.
    insert(key, value, &table[i], table[i]);
  }
}
With a single thread only one insert runs at a time. With multiple threads, two threads can call insert on the same bucket concurrently, both passing the same old table[i] as the list head. Each thread's new entry points at that old head, and whichever thread writes *p last overwrites table[i], so the other thread's entry is dropped from the list instead of staying reachable through next. A concrete interleaving is sketched below.
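One interleaving that loses an entry (a sketch; T1 and T2 are two threads putting different keys k1 and k2 that hash to the same bucket i, and H is the old list head table[i]):

// T1: put() finds no match, calls insert(k1, v1, &table[i], H)
// T2: put() finds no match, calls insert(k2, v2, &table[i], H)
// T1: e1->next = H; *p = e1;    -> table[i] == e1
// T2: e2->next = H; *p = e2;    -> table[i] == e2
// e1 is no longer reachable from table[i]: k1 is "missing".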
2.2.2 Solution
The fix is straightforward: protect put with a lock. One more consideration: with a single global lock, concurrent puts can only run serially and the parallel speedup is lost. Instead, give each bucket of the hash table its own lock; puts then serialize only when multiple threads operate on the same bucket.
2.3 Implementation
pthread_mutex_t lock_table[NBUCKET];   // one lock per bucket

static void put(int key, int value)
{
  int i = key % NBUCKET;
  pthread_mutex_t *lock = &lock_table[i];

  pthread_mutex_lock(lock);
  // is the key already present?
  struct entry *e = 0;
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if(e){
    // update the existing key.
    e->value = value;
  } else {
    // the key is new.
    insert(key, value, &table[i], table[i]);
  }
  pthread_mutex_unlock(lock);
}

int main(int argc, char *argv[])
{
  // some code ...
  for (int i = 0; i < NBUCKET; i++) {
    pthread_mutex_init(&lock_table[i], NULL);
  }
  // some code ...
}
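With the per-bucket locks in place, rerunning the two-thread benchmark should report no missing keys while keeping most of the parallel speedup:
$ make ph
$ ./ph 2
Both threads should now report 0 keys missing.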
3. Barrier
3.1 Requirements
In this assignment you'll implement a barrier: a point in an application at which all participating threads must wait until all other participating threads reach that point too. You'll use pthread condition variables, which are a sequence coordination technique similar to xv6's sleep and wakeup.
Implement a barrier. The test driver looks like this:
static void *
thread(void *xa)
{
  long n = (long) xa;
  long delay;
  int i;

  for (i = 0; i < 20000; i++) {
    int t = bstate.round;
    assert (i == t);
    barrier();
    usleep(random() % 100);
  }

  return 0;
}
Multiple threads run the thread function, but whenever a thread reaches barrier() it must wait there until every thread has reached the same point; only then may they all continue.
3.2 Analysis
The main tools here are the condition variable pthread_cond_t and the mutex pthread_mutex_t, which are designed to be used together. They address the following deadlock problem:
A consumer thread A enters the critical section to access a counter n. A may only proceed once n > 0; if n == 0, A has to wait.
A producer thread B enters the critical section and updates n so that n > 0. Once n > 0, B notifies the consumer A waiting on that condition, and A can continue.
The problem: while A holds the critical-section lock, B can never enter the critical section to make n > 0. A condition variable breaks the deadlock: while the condition is unsatisfied, the waiting thread releases the lock; once the condition becomes true, the other thread notifies A to continue, as sketched below.
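A minimal sketch of this producer/consumer pattern (the names n, lock, and cond are illustrative, not from the lab code):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int n = 0;

void consume(void)                      // thread A
{
  pthread_mutex_lock(&lock);
  while (n == 0)                        // condition not met yet
    pthread_cond_wait(&cond, &lock);    // releases lock while blocked,
                                        // re-acquires it before returning
  n -= 1;                               // consume one item
  pthread_mutex_unlock(&lock);
}

void produce(void)                      // thread B
{
  pthread_mutex_lock(&lock);
  n += 1;                               // make the condition true
  pthread_cond_signal(&cond);           // wake one waiting consumer
  pthread_mutex_unlock(&lock);
}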
3.3 Implementation
The key point is that pthread_cond_wait releases the mutex and blocks, waiting until pthread_cond_broadcast or pthread_cond_signal wakes it; on wakeup it re-acquires the mutex before returning. The difference between the two wakeups: several threads may be waiting on the same condition variable, and signal wakes only one of them, while broadcast wakes them all.
static void barrier()
{
  // YOUR CODE HERE
  //
  // Block until all threads have called barrier() and
  // then increment bstate.round.
  //
  static int arrive_cnt = 0;

  pthread_mutex_lock(&bstate.barrier_mutex);
  arrive_cnt += 1;
  if (arrive_cnt != nthread) {
    // pthread_cond_wait releases the mutex while blocked, so the remaining
    // threads can enter the critical section and bump arrive_cnt.
    // Loop on the round number to guard against spurious wakeups.
    int round = bstate.round;
    while (round == bstate.round)
      pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
  } else {
    bstate.round += 1;
    arrive_cnt = 0;
    pthread_cond_broadcast(&bstate.barrier_cond);
  }
  pthread_mutex_unlock(&bstate.barrier_mutex);
}
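To check the result, build and run the barrier test on the host (like ph, it is compiled with your OS's gcc):
$ make barrier
$ ./barrier 2
It should print OK; passed, and make grade should pass the barrier test.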