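Distributed Systems Study Notes: MIT 6.824 Lab 1
Plenty of people have already written up this lab, so I will only jot down brief notes on Part 3 and Part 4.
Part 3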
func schedule(jobName string, mapFiles []string, nReduce int, phase jobPhase, registerChan chan string) {
	var ntasks int
	var n_other int // number of inputs (for reduce) or outputs (for map)
	switch phase {
	case mapPhase:
		ntasks = len(mapFiles)
		n_other = nReduce
	case reducePhase:
		ntasks = nReduce
		n_other = len(mapFiles)
	}
	fmt.Printf("Schedule: %v %v tasks (%d I/Os)\n", ntasks, phase, n_other)
	// All ntasks tasks have to be scheduled on workers. Once all tasks
	// have completed successfully, schedule() should return.
	//
	// Your code here (Part III, Part IV).
	//
	var done sync.WaitGroup
	for i := 0; i < ntasks; i++ {
		done.Add(1) // one Add per task goroutine
		go func(i int, registerChan chan string) {
			// The input file only matters in the map phase. Indexing
			// mapFiles[i] in the reduce phase can go out of range when
			// nReduce > len(mapFiles).
			var u string
			if phase == mapPhase {
				u = mapFiles[i]
			}
			req := <-registerChan // what if this blocks? It waits for an idle worker.
			state := call(req, "Worker.DoTask", DoTaskArgs{jobName, u, phase, i, n_other}, nil)
			if state == false {
				// Part IV: retry once on a second worker (the result is not checked).
				req2 := <-registerChan
				call(req2, "Worker.DoTask", DoTaskArgs{jobName, u, phase, i, n_other}, nil)
				go func(req string) {
					registerChan <- req
				}(req2)
			}
			// Return the worker from a separate goroutine so the send
			// cannot block this task goroutine.
			go func(req string) {
				registerChan <- req
			}(req)
			done.Done()
		}(i, registerChan)
	}
	// schedule() may return only after every task has finished, hence the WaitGroup.
	done.Wait()
	fmt.Printf("Schedule: %v done\n", phase)
}
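Part 3 is mainly about distributing many tasks across several workers. There are fewer workers than tasks, and a worker can handle only one task at a time, so the usual pattern of having several workers repeat the same job does not apply: each worker has to be reused for another task as soon as it becomes idle. The gotcha is that the channel used in this function is unbuffered, so a send blocks until someone receives. Putting a worker back on the channel (which means it has finished its task and is now idle) with a plain registerChan <- req would therefore block, so, following the approach of another Zhihu user, I start a separate goroutine to do the send.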
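To make the blocking issue concrete, here is a minimal standalone sketch of the same pattern outside the lab framework. The names runTasks, idle, and doTask are hypothetical stand-ins I made up for illustration (doTask takes the place of the call(..., "Worker.DoTask", ...) RPC), and I am assuming the channel behaves like registerChan, that is, unbuffered.

```go
package main

import (
	"fmt"
	"sync"
)

// runTasks hands ntasks tasks to the workers that arrive on the unbuffered
// channel idle, and returns how many tasks each worker ran.
// doTask is a hypothetical stand-in for the lab's Worker.DoTask RPC.
func runTasks(ntasks int, idle chan string, doTask func(worker string, task int)) map[string]int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	counts := make(map[string]int)

	for i := 0; i < ntasks; i++ {
		wg.Add(1)
		go func(task int) {
			defer wg.Done()
			w := <-idle // block until some worker is idle
			doTask(w, task)
			mu.Lock()
			counts[w]++
			mu.Unlock()
			// Hand the worker back from a separate goroutine. A direct
			// `idle <- w` here would block the last tasks forever,
			// because nobody is left to receive from the channel.
			go func() { idle <- w }()
		}(i)
	}
	wg.Wait()
	return counts
}

func main() {
	idle := make(chan string) // unbuffered, like registerChan
	// Two workers register; each send runs in its own goroutine because
	// an unbuffered send blocks until a task goroutine receives.
	for _, w := range []string{"worker-a", "worker-b"} {
		go func(w string) { idle <- w }(w)
	}
	counts := runTasks(6, idle, func(worker string, task int) {})
	fmt.Println("tasks run:", counts["worker-a"]+counts["worker-b"]) // tasks run: 6
}
```

The goroutines that hand back the last workers stay blocked on the send after runTasks returns. In this sketch that is harmless because the program exits, and in the lab it seems acceptable for the same reason the original code gets away with it, but it is a trade-off of this approach rather than a clean shutdown.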
Part 4
state := call(req, "Worker.DoTask", DoTaskArgs{jobName, u, phase, i, n_other}, nil)
if state == false {
	req2 := <-registerChan
	call(req2, "Worker.DoTask", DoTaskArgs{jobName, u, phase, i, n_other}, nil)
	go func(req string) {
		registerChan <- req
	}(req2)
}
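Part 4 is about handling worker failures. Overall it is enough to follow the lab text: hand the failed task to another worker. The one thing to watch is the order: receive another worker from the channel first, and only then send the original failing worker back to the channel. (But I keep wondering: if the second worker fails too, won't that be a problem?)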
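On the open question about a second failure: one possible way around it is to retry in a loop instead of exactly once. The sketch below is my own assumption of how that could look, not the lab's reference solution. runWithRetry and try are hypothetical names, with try standing in for call(worker, "Worker.DoTask", args, nil). It also makes a design choice that differs from the code above: a worker whose call failed is not put back on the channel.

```go
package main

import "fmt"

// runWithRetry keeps assigning one task to workers taken from idle until a
// call succeeds, and returns the number of attempts it took.
// try is a hypothetical stand-in for the lab's Worker.DoTask RPC.
func runWithRetry(idle chan string, try func(worker string) bool) int {
	attempts := 0
	for {
		w := <-idle // wait for the next available worker
		attempts++
		if try(w) {
			// Success: return the worker from a separate goroutine so
			// the unbuffered send cannot block us.
			go func() { idle <- w }()
			return attempts
		}
		// Failure: drop this worker (assumption: it is presumed dead)
		// and loop to try the same task on another one.
	}
}

func main() {
	idle := make(chan string) // unbuffered, like registerChan
	go func() {
		for _, w := range []string{"bad-1", "bad-2", "good"} {
			idle <- w
		}
	}()
	attempts := runWithRetry(idle, func(w string) bool { return w == "good" })
	fmt.Println("attempts:", attempts) // attempts: 3
}
```

With a loop like this the task is retried as many times as needed, so a second (or third) failing worker only costs another attempt. The price is that the loop waits forever if no healthy worker ever shows up on the channel.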