Implementing MapReduce from 6.824
To get familiar with Go, and to learn a bit about MapReduce along the way, I spent a morning setting up Go debugging and an afternoon implementing the MapReduce lab from MIT's 6.824 course. All the test cases passed on the first run, which was unexpectedly smooth.
- Preparation
The first step was configuring Go debugging with dlv. I started with the default mode `auto` (which in practice runs in `debug` mode), and it failed no matter what I tried.
launch.json:

```json
{
    "name": "SingleMr",
    "type": "go",
    "request": "launch",
    "mode": "auto",
    "cwd": "${workspaceFolder}/src/main",
    "program": "${workspaceFolder}/src/main/mrsequential.go",
    "args": ["wc.so", "pg-being_ernest.txt", "pg-dorian_gray.txt", "pg-frankenstein.txt", "pg-grimm.txt", "pg-huckleberry_finn.txt", "pg-metamorphosis.txt", "pg-sherlock_holmes.txt", "pg-tom_sawyer.txt"],
    "preLaunchTask": "wc.so"
},
```
tasks.json:

```json
{
    "label": "wc.so",
    "type": "go",
    "command": "build",
    "args": [
        "-race",
        "-gcflags=all=-N -l",
        "-buildmode=plugin",
        "../mrapps/wc.go"
    ],
    "options": {"cwd": "${workspaceFolder}/src/main"},
    "problemMatcher": ["$go"],
    "presentation": {
        "echo": true,
        "reveal": "silent",
        "focus": false,
        "panel": "shared",
        "showReuseMessage": false,
        "clear": true
    },
    "group": "build"
},
```
This failed with the following error:

```
cannot load plugin wc.so, err detail: plugin.Open("wc"): plugin was built with a different version of package runtime/internal/sys
```
This error appears because the build task in tasks.json adds the `-gcflags=all=-N -l` option. Supposedly dlv adds this flag automatically when compiling for debugging, so the plugin .so has to be built with the same flag for the runtimes to match; in my testing, however, that did not help (go 1.23.0, dlv built with go1.23.0).
Removing the option instead produces this error:

```
cannot load plugin wc.so, err detail: plugin.Open("wc"): plugin was built with a different version of package internal/abi
```
The workaround was to switch dlv's mode to `exec`: build wc.so and mrsequential.go in the tasks, then debug the compiled binary directly. This works. The complete configuration (covering both the single-process run and the multi-worker-plus-coordinator run for this lab) follows.
launch.json:

```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "SingleMr",
            "type": "go",
            "request": "launch",
            "mode": "exec",
            "cwd": "${workspaceFolder}/src/main",
            "program": "${workspaceFolder}/src/main/mrsequential",
            "args": ["wc.so", "pg-being_ernest.txt", "pg-dorian_gray.txt", "pg-frankenstein.txt", "pg-grimm.txt", "pg-huckleberry_finn.txt", "pg-metamorphosis.txt", "pg-sherlock_holmes.txt", "pg-tom_sawyer.txt"],
            "preLaunchTask": "mrsequential"
        },
        {
            "name": "MultiWorker",
            "type": "go",
            "request": "launch",
            "mode": "exec",
            "cwd": "${workspaceFolder}/src/main",
            "program": "${workspaceFolder}/src/main/mrworker",
            "args": ["wc.so"],
            "preLaunchTask": "mrworker",
            "console": "integratedTerminal"
        },
        {
            "name": "MultiCoordinator",
            "type": "go",
            "request": "launch",
            "mode": "exec",
            "cwd": "${workspaceFolder}/src/main",
            "program": "${workspaceFolder}/src/main/mrcoordinator",
            "args": ["pg-being_ernest.txt", "pg-dorian_gray.txt", "pg-frankenstein.txt", "pg-grimm.txt", "pg-huckleberry_finn.txt", "pg-metamorphosis.txt", "pg-sherlock_holmes.txt", "pg-tom_sawyer.txt"],
            "preLaunchTask": "mrcoordinator"
        }
    ]
}
```
tasks.json:

```json
{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "wc.so",
            "type": "go",
            "command": "build",
            "args": [
                "-race",
                "-gcflags=all=-N -l",
                "-buildmode=plugin",
                "../mrapps/wc.go"
            ],
            "options": {"cwd": "${workspaceFolder}/src/main"},
            "problemMatcher": ["$go"],
            "presentation": {
                "echo": true,
                "reveal": "silent",
                "focus": false,
                "panel": "shared",
                "showReuseMessage": false,
                "clear": true
            },
            "group": "build"
        },
        {
            "label": "mrsequential",
            "type": "go",
            "command": "build",
            "args": [
                "-race",
                "-gcflags=all=-N -l",
                "./mrsequential.go"
            ],
            "options": {"cwd": "${workspaceFolder}/src/main"},
            "problemMatcher": ["$go"],
            "presentation": {
                "echo": true,
                "reveal": "silent",
                "focus": false,
                "panel": "shared",
                "showReuseMessage": false,
                "clear": true
            },
            "group": "build",
            "dependsOn": ["wc.so"]
        },
        {
            "label": "mrworker",
            "type": "go",
            "command": "build",
            "args": [
                "-race",
                "-gcflags=all=-N -l",
                "./mrworker.go"
            ],
            "options": {"cwd": "${workspaceFolder}/src/main"},
            "problemMatcher": ["$go"],
            "presentation": {
                "echo": true,
                "reveal": "silent",
                "focus": false,
                "panel": "shared",
                "showReuseMessage": false,
                "clear": true
            },
            "group": "build",
            "dependsOn": ["wc.so"]
        },
        {
            "label": "mrcoordinator",
            "type": "go",
            "command": "build",
            "args": [
                "-race",
                "-gcflags=all=-N -l",
                "./mrcoordinator.go"
            ],
            "options": {"cwd": "${workspaceFolder}/src/main"},
            "problemMatcher": ["$go"],
            "presentation": {
                "echo": true,
                "reveal": "silent",
                "focus": false,
                "panel": "shared",
                "showReuseMessage": false,
                "clear": true
            },
            "group": "build"
            // "dependsOn": ["wc.so"]
        }
    ]
}
```
With this setup, debugging is pleasant. For the multi-worker case, start the coordinator first, then launch several workers and debug them.
- Notes from the implementation
The implementation itself went quickly; what took time at the start was understanding the intended design. I initially assumed the coordinator acted as an intermediary: every worker would send its Map output back to the coordinator, which would then hand it out to workers for Reduce. That would avoid intermediate files on disk, but the network traffic would be heavy and the memory pressure on the coordinator enormous. When the lab page described intermediate files named mr-X-Y, I could not see at first why such files were needed. Then I noticed the ihash(key) function and realized that the worker itself balances the Reduce load using ihash(key) % nReduce. With that, the design falls into place.

First, the input txt files are distributed as work: workers actively request assignments and, for simplicity, receive one file per request. A worker runs Map on its file to get a []KeyValue slice, then scatters those pairs across the files mr-X-Y, where X is the index of the assigned input file and Y = ihash(key) % nReduce is each key's reduce bucket. When done, it reports completion to the coordinator.

Once the coordinator sees that every input file has been mapped, it switches to the Reduce phase. A worker that observes the switch requests a reduce index y in [0, nReduce-1], i.e. the Y in mr-X-Y. It gathers all files mr-X-y for that y, merges them into an intermediate aggregate, runs the Reduce function to produce the final result, writes it to mr-out-y, and reports completion to the coordinator. When the coordinator sees all Reduce tasks finished, the whole job is done.
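The map-side scattering described above can be sketched as follows. `KeyValue` and `ihash` are taken from the lab handout; `partition` is my own illustrative helper, not part of the lab's required API:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// KeyValue mirrors the type defined in the lab's mr package.
type KeyValue struct {
	Key   string
	Value string
}

// ihash is the hash function suggested by the lab handout.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// partition splits one map task's output into nReduce buckets;
// bucket Y of map task X is what gets written to the file mr-X-Y.
func partition(kva []KeyValue, nReduce int) [][]KeyValue {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kva {
		y := ihash(kv.Key) % nReduce
		buckets[y] = append(buckets[y], kv)
	}
	return buckets
}

func main() {
	// Pretend this is the Map output of input file 0.
	kva := []KeyValue{{"apple", "1"}, {"pear", "1"}, {"apple", "1"}}
	for y, bucket := range partition(kva, 4) {
		for _, kv := range bucket {
			fmt.Printf("mr-0-%d gets %s=%s\n", y, kv.Key, kv.Value)
		}
	}
}
```

Because the bucket depends only on the key, every occurrence of a given key, regardless of which map task produced it, lands in the same reduce index y, which is exactly what makes the per-y merge in the Reduce phase correct.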
One thing to account for is that a worker may crash and exit. A task therefore only counts as finished once the worker has reported completion and the coordinator has confirmed it. Otherwise, if a worker crashes after finishing its work but before reporting, the coordinator still considers the task incomplete; it detects this through a timeout (say, 10 seconds) and lets another worker claim and re-run the task. The rerun simply overwrites the original mr-X-Y files, so tasks are idempotent and safe to repeat. This yields a complete, fault-tolerant MapReduce implementation.

Workers detect the completion of the Map and Reduce phases by polling, sleeping 1 second between checks, which adds a small amount of extra RPC traffic and latency. The interval could be shrunk to reduce the delay, or replaced with proper multi-threaded signaling to eliminate it entirely, but that would add considerable complexity. For the enormous data sizes MapReduce targets, a sub-second delay is negligible and the gain would not be worth it, so I skipped that optimization. It could still be a good exercise in concurrent programming.