Syncthing源码解析 - 启动过程
我相信很多朋友会认为启动就是双击一下Syncthing程序图标,随后就启动完毕了!如果这样认为,对,也不对!对,是因为的确是这样操作,启动了Syncthing;不对是因为在调试Syncthing启动过程时发现很奇怪!有哪些奇怪的地方呢?
一,Syncthing启动的奇怪问题。
1,启动时候总是运行"monitorMain(options)"函数,另外一个函数"syncthingMain(options)"从来没有运行过?
2,启动log前缀好几种,第三种log前缀不知道什么意思?也不知道包含这些前缀的log信息从哪里来的?
3,每次Syncthing启动完毕,通过任务管理器查看,都有两个正在运行的Syncthing,怎么同时有两个Syncthing进程?
下面是上述问题的各个截图:
问题1截图:
问题2截图:
问题3截图:
二,分析和解决Syncthing启动的奇怪问题。
我用Gogland调试时候发现,Syncthing启动总是进入"monitorMain(options)"函数,另外一个函数"syncthingMain(options)"从来没有进入过,难道另外一个函数彻底没用?没用还保留干什么?先不管这些了,我先找那些log前缀在哪里吧?
(1),[monitor]前缀,最终在monitor.go文件中的"monitorMain(options)"函数中发现,位于39行处。具体代码:
l.SetPrefix("[monitor] ")
(2),[start]前缀,没能在"monitorMain(options)"函数中找到!
(3),[NTUJ4]前缀,也没能在"monitorMain(options)"函数中找到!
难道这两个前缀的log信息,不是"monitorMain(options)"函数输出的?如果不是那会是哪个函数呢?经过调试发现,在"monitorMain(options)"函数中输出的最后一条log是:
[monitor] 17:34:20 INFO: Starting syncthing
继续调试,最终也没有发现剩余前缀log在哪里产生?但是确实千真万确输出这些前缀的log了呀!!!
先不管上面问题了,我们还是去看看"syncthingMain(options)"函数吧,它真的就是一个多余的函数,彻底没用?为了测试这个函数是否有用,我在这个函数最开始代码处写入:
l.Infoln("这是我的log信息,是否会出现呢?")
Syncthing默认log全部是英文,我写的log信息是中文,如果这个函数被调用,那么我一眼就能看出来我写的log信息!!立即运行Syncthing,奇迹出现了,居然log信息中有我写的测试log信息!!!
"syncthingMain(options)"函数并不是没有用啊?我写的中文测试log居然在运行时输出的log中了!!其它log信息都在我写的中文log之后,难道剩余log信息都是它输出的?在这个函数中查找代码:
l.SetPrefix
最终发现两处:
(1),
l.SetPrefix("[start] ")
这个前缀不就是上面我没能找到的一个吗?
(2),
l.SetPrefix(fmt.Sprintf("[%s] ", myID.String()[:5]))
难道这是剩余的那个log前缀?从这段代码理解,应该是取当前设备ID的前5个字符作为log前缀!
为了验证我的判断,我把这两个log前缀都修改了
l.SetPrefix("[syncthingMain][start] ") l.SetPrefix("[syncthingMain] ")
修改完毕后,运行一下Syncthing,log前缀确实全变成我修改的了,我的判断完全正确!!!
经过测试发现,包含"[start]"前缀的log,只有在Syncthing第一次运行时候出现,系统当中从来没有运行过Syncthing才出现,如果运行过,删除先前的配置文件后也会出现,因为那样再次运行Syncthing,等于从来没有在此电脑中运行过Syncthing!
到现在为止,我也不知道"syncthingMain(options)"函数是怎么运行起来的,但是可以肯定它一定是运行了,从它输出的log来判断,它应该是负责启动之后的Syncthing各项服务!!!偶然通过任务管理器发现,Syncthing运行后,居然有两个正着运行的Syncthing,怎么会有两个进程?一个使用内存小;一个使用内存稍大(和前面的进程使用内存比较)!先不管了,我先终止一下这两个进程吧,继续调试Syncthing!奇怪事情又发生了,如果终止内存稍大的Syncthing,立即会再次运行和它一样的进程,就是杀不死!!!如果我终止使用内存小的Syncthing进程,另外一个内存稍大的进程依然存在,而且Syncthing的各项功能均运行正常!!!难道这两个进程一个是监控进程;一个是服务进程?
(1),监控进程:使用内存小,负责监控服务进程运行状况。
(2),服务进程:使用内存稍大,为用户提供Syncthing各项服务。
我们还是认真来看看源代码吧!
首先看monitor.go文件中的"monitorMain(options)"函数,经过调试我最终确定,是这个函数启动和监控Syncthing的服务进程!!!
(1),启动Syncthing的服务进程:在"monitorMain(options)"函数中,有一个for的死循环,在这个for代码前面部分,有一段是启动Syncthing的服务进程:
cmd := exec.Command(args[0], args[1:]...) stderr, err := cmd.StderrPipe() if err != nil { l.Fatalln("stderr:", err) } stdout, err := cmd.StdoutPipe() if err != nil { l.Fatalln("stdout:", err) } l.Infoln("Starting syncthing") err = cmd.Start() if err != nil { l.Fatalln(err) }
(2),监控Syncthing的服务进程:上面那个for死循环,如果毫无节制,那系统肯定受不了啊?其实它是隔一段时间才执行一次的!看这个for循环最后一行的代码!
time.Sleep(1 * time.Second)
这行代码是休眠指定的时间,目前是一秒钟!!难怪Syncthing的服务进程被杀死,马上就会有另外一个Syncthing的服务进程运行起来,这样的设计,我一辈子也杀不死这个Syncthing的服务进程啊!!!!通过调整这个休眠时间测试,最终确定,这个地方就是控制监控Syncthing的服务进程的间隔时间代码!!!!
到此,我可以肯定了,Syncthing启动时候,会先后启动两个进程:
(1),监控进程:
- 最早运行。
- 调用"monitorMain(options)"函数。
- 使用内存小。
- 负责监控服务进程运行状况。
(2),服务进程:使用内存稍大,为用户提供Syncthing各项服务。
- 被监控进程启动。
- 调用"syncthingMain(options)"函数。
- 使用内存稍大。
- 为用户提供Syncthing的各项服务。
继续调试"syncthingMain(options)"函数,证实我的判断是正确的,到此Syncthing的启动就算搞明白了!!
以后再调试Syncthing,不需要按照现在逻辑运行了,注释掉main.go中main函数末尾代码:
if innerProcess || options.noRestart { syncthingMain(options) } else { monitorMain(options) }
把上面代码替换成:
syncthingMain(options)
这样我们就可以去掉监控进程,直接调试Syncthing服务进程了!!
最后还有一个问题,为什么Syncthing的监控进程里显示Syncthing服务进程输出的log?这是用GO语言的channel来完成的!!!
三,我调试时候,加入的注释的代码。
注释方式:采用TODO代码方式,这样便于在Gogland的TODO里面立即看到我自己加入的注释或修改代码!
注释格式:
TODO:[sunylat][begin或end]注释或修改描述信息
[begin或end]为可选项,表示需要相互协作才能工作代码的开始和结束处。
这个TODO可以用"//"和"/**/"两种注释方式包含。
(1),monitor.go文件中的"monitorMain(options)"函数:
/* TODO:[sunylat] 运行和监控提供真正服务syncthing的入口函数 */ func monitorMain(runtimeOptions RuntimeOptions) { os.Setenv("STMONITORED", "yes") l.SetPrefix("[monitor] ") var dst io.Writer = os.Stdout logFile := runtimeOptions.logFile if logFile != "-" { var fileDst io.Writer = newAutoclosedFile(logFile, logFileAutoCloseDelay, logFileMaxOpenTime) if runtime.GOOS == "windows" { // Translate line breaks to Windows standard fileDst = osutil.ReplacingWriter{ Writer: fileDst, From: '\n', To: []byte{'\r', '\n'}, } } // Log to both stdout and file. dst = io.MultiWriter(dst, fileDst) l.Infof(`Log output saved to file "%s"`, logFile) } args := os.Args var restarts [countRestarts]time.Time stopSign := make(chan os.Signal, 1) sigTerm := syscall.Signal(15) signal.Notify(stopSign, os.Interrupt, sigTerm) restartSign := make(chan os.Signal, 1) sigHup := syscall.Signal(1) signal.Notify(restartSign, sigHup) /* TODO:[sunylat] 这是个无限循环,负责监控提供服务syncthing运行状况,同时负责处理程序的退出等事件处理, 监控间隔时间由这个for循环最后面的休眠时间决定 */ for { if t := time.Since(restarts[0]); t < loopThreshold { l.Warnf("%d restarts in %v; not retrying further", countRestarts, t) os.Exit(exitError) } copy(restarts[0:], restarts[1:]) restarts[len(restarts)-1] = time.Now() //TODO:[sunylat] [begin] 启动提供服务的syncthing ---------------------- cmd := exec.Command(args[0], args[1:]...) stderr, err := cmd.StderrPipe() if err != nil { l.Fatalln("stderr:", err) } stdout, err := cmd.StdoutPipe() if err != nil { l.Fatalln("stdout:", err) } l.Infoln("Starting syncthing") //TODO:[sunylat] 这条语句真正执行启动提供服务的syncthing err = cmd.Start() if err != nil { l.Fatalln(err) } //TODO:[sunylat] [end] 启动提供服务的syncthing -------------------------- // Let the next child process know that this is not the first time // it's starting up. os.Setenv("STRESTART", "yes") stdoutMut.Lock() stdoutFirstLines = make([]string, 0, 10) stdoutLastLines = make([]string, 0, 50) stdoutMut.Unlock() wg := sync.NewWaitGroup() wg.Add(1) go func() { copyStderr(stderr, dst) wg.Done() }() wg.Add(1) go func() { copyStdout(stdout, dst) wg.Done() }() exit := make(chan error) go func() { wg.Wait() exit <- cmd.Wait() }() select { case s := <-stopSign: l.Infof("Signal %d received; exiting", s) cmd.Process.Kill() <-exit return case s := <-restartSign: l.Infof("Signal %d received; restarting", s) cmd.Process.Signal(sigHup) err = <-exit case err = <-exit: if err == nil { // Successful exit indicates an intentional shutdown return } else if exiterr, ok := err.(*exec.ExitError); ok { if status, ok := exiterr.Sys().(syscall.WaitStatus); ok { switch status.ExitStatus() { case exitUpgrading: // Restart the monitor process to release the .old // binary as part of the upgrade process. l.Infoln("Restarting monitor...") os.Setenv("STNORESTART", "") if err = restartMonitor(args); err != nil { l.Warnln("Restart:", err) } return } } } } l.Infoln("Syncthing exited:", err) //TODO:[sunylat] 这个休眠时间决定了监控的间隔时间! time.Sleep(1 * time.Second) } }
(2),main.go中的"syncthingMain(options)"函数:
/* TODO:[sunylat] 提供真正服务的syncthing入口函数 */ func syncthingMain(runtimeOptions RuntimeOptions) { //l.Infoln("[syncthingMain] 这是我的测试信息!") setupSignalHandling() // Create a main service manager. We'll add things to this as we go along. // We want any logging it does to go through our log system. //TODO:[sunylat] [begin] supervisor创建 ---------------------- mainService := suture.New("main", suture.Spec{ Log: func(line string) { l.Debugln(line) }, }) mainService.ServeBackground() // Set a log prefix similar to the ID we will have later on, or early log // lines look ugly. //l.SetPrefix("[start] ") //TODO:[sunylat] 我修改的启动时候显示的log头,加入了[syncthingMain] l.SetPrefix("[syncthingMain][start] ") if runtimeOptions.auditEnabled { startAuditing(mainService, runtimeOptions.auditFile) } if runtimeOptions.verbose { mainService.Add(newVerboseService()) } errors := logger.NewRecorder(l, logger.LevelWarn, maxSystemErrors, 0) systemLog := logger.NewRecorder(l, logger.LevelDebug, maxSystemLog, initialSystemLog) // Event subscription for the API; must start early to catch the early // events. The LocalChangeDetected event might overwhelm the event // receiver in some situations so we will not subscribe to it here. apiSub := events.NewBufferedSubscription(events.Default.Subscribe(events.AllEvents&^events.LocalChangeDetected&^events.RemoteChangeDetected), 1000) diskSub := events.NewBufferedSubscription(events.Default.Subscribe(events.LocalChangeDetected|events.RemoteChangeDetected), 1000) if len(os.Getenv("GOMAXPROCS")) == 0 { runtime.GOMAXPROCS(runtime.NumCPU()) } // Attempt to increase the limit on number of open files to the maximum // allowed, in case we have many peers. We don't really care enough to // report the error if there is one. osutil.MaximizeOpenFileLimit() // Ensure that that we have a certificate and key. cert, err := tls.LoadX509KeyPair(locations[locCertFile], locations[locKeyFile]) if err != nil { l.Infof("Generating ECDSA key and certificate for %s...", tlsDefaultCommonName) cert, err = tlsutil.NewCertificate(locations[locCertFile], locations[locKeyFile], tlsDefaultCommonName, bepRSABits) if err != nil { l.Fatalln(err) } } myID = protocol.NewDeviceID(cert.Certificate[0]) //这里把调试信息头设置为当前设备ID的一部分 //l.SetPrefix(fmt.Sprintf("[%s] ", myID.String()[:5])) //TODO:[sunylat] 修改log头为[syncthingMain] l.SetPrefix("[syncthingMain] ") //打印syncthing长版本 l.Infoln(LongVersion) l.Infoln("My ID:", myID) sha256.SelectAlgo() sha256.Report() perfWithWeakHash := cpuBench(3, 150*time.Millisecond, true) l.Infof("Hashing performance with weak hash is %.02f MB/s", perfWithWeakHash) perfWithoutWeakHash := cpuBench(3, 150*time.Millisecond, false) l.Infof("Hashing performance without weak hash is %.02f MB/s", perfWithoutWeakHash) // Emit the Starting event, now that we know who we are. events.Default.Log(events.Starting, map[string]string{ "home": baseDirs["config"], "myID": myID.String(), }) cfg := loadOrCreateConfig() if err := checkShortIDs(cfg); err != nil { l.Fatalln("Short device IDs are in conflict. Unlucky!\n Regenerate the device ID of one of the following:\n ", err) } if len(runtimeOptions.profiler) > 0 { go func() { l.Debugln("Starting profiler on", runtimeOptions.profiler) runtime.SetBlockProfileRate(1) err := http.ListenAndServe(runtimeOptions.profiler, nil) if err != nil { l.Fatalln(err) } }() } // The TLS configuration is used for both the listening socket and outgoing // connections. tlsCfg := &tls.Config{ Certificates: []tls.Certificate{cert}, NextProtos: []string{bepProtocolName}, ClientAuth: tls.RequestClientCert, SessionTicketsDisabled: true, InsecureSkipVerify: true, MinVersion: tls.VersionTLS12, CipherSuites: []uint16{ 0xCCA8, // TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305, Go 1.8 0xCCA9, // TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305, Go 1.8 tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, tls.TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, }, } opts := cfg.Options() if !opts.SymlinksEnabled { symlinks.Supported = false } if opts.WeakHashSelectionMethod == config.WeakHashAuto { if perfWithoutWeakHash*0.8 > perfWithWeakHash { l.Infof("Weak hash disabled, as it has an unacceptable performance impact.") weakhash.Enabled = false } else { l.Infof("Weak hash enabled, as it has an acceptable performance impact.") weakhash.Enabled = true } } else if opts.WeakHashSelectionMethod == config.WeakHashNever { l.Infof("Disabling weak hash") weakhash.Enabled = false } else if opts.WeakHashSelectionMethod == config.WeakHashAlways { l.Infof("Enabling weak hash") weakhash.Enabled = true } if (opts.MaxRecvKbps > 0 || opts.MaxSendKbps > 0) && !opts.LimitBandwidthInLan { lans, _ = osutil.GetLans() for _, lan := range opts.AlwaysLocalNets { _, ipnet, err := net.ParseCIDR(lan) if err != nil { l.Infoln("Network", lan, "is malformed:", err) continue } lans = append(lans, ipnet) } networks := make([]string, len(lans)) for i, lan := range lans { networks[i] = lan.String() } l.Infoln("Local networks:", strings.Join(networks, ", ")) } /* TODO:[sunylat] 打开本地数据库操作 */ //TODO:[sunylat] 获取本地数据库全路径名称 dbFile := locations[locDatabase] //TODO:[sunylat] 打开本地数据库 ldb, err := db.Open(dbFile) if err != nil { l.Fatalln("Cannot open database:", err, "- Is another copy of Syncthing already running?") } if runtimeOptions.resetDeltaIdxs { l.Infoln("Reinitializing delta index IDs") ldb.DropDeltaIndexIDs() } protectedFiles := []string{ locations[locDatabase], locations[locConfigFile], locations[locCertFile], locations[locKeyFile], } // Remove database entries for folders that no longer exist in the config folders := cfg.Folders() for _, folder := range ldb.ListFolders() { if _, ok := folders[folder]; !ok { l.Infof("Cleaning data for dropped folder %q", folder) db.DropFolder(ldb, folder) } } if cfg.RawCopy().OriginalVersion == 15 { // The config version 15->16 migration is about handling ignores and // delta indexes and requires that we drop existing indexes that // have been incorrectly ignore filtered. ldb.DropDeltaIndexIDs() } //TODO:[sunylat] 创建模型 m := model.NewModel(cfg, myID, myDeviceName(cfg), "syncthing", Version, ldb, protectedFiles) if t := os.Getenv("STDEADLOCKTIMEOUT"); len(t) > 0 { it, err := strconv.Atoi(t) if err == nil { m.StartDeadlockDetector(time.Duration(it) * time.Second) } } else if !IsRelease || IsBeta { m.StartDeadlockDetector(20 * time.Minute) } if runtimeOptions.unpaused { setPauseState(cfg, false) } else if runtimeOptions.paused { setPauseState(cfg, true) } // Add and start folders for _, folderCfg := range cfg.Folders() { if folderCfg.Paused { continue } m.AddFolder(folderCfg) m.StartFolder(folderCfg.ID) } mainService.Add(m) // Start discovery cachedDiscovery := discover.NewCachingMux() mainService.Add(cachedDiscovery) // Start connection management connectionsService := connections.NewService(cfg, myID, m, tlsCfg, cachedDiscovery, bepProtocolName, tlsDefaultCommonName, lans) mainService.Add(connectionsService) if cfg.Options().GlobalAnnEnabled { for _, srv := range cfg.GlobalDiscoveryServers() { l.Infoln("Using discovery server", srv) gd, err := discover.NewGlobal(srv, cert, connectionsService) if err != nil { l.Warnln("Global discovery:", err) continue } // Each global discovery server gets its results cached for five // minutes, and is not asked again for a minute when it's returned // unsuccessfully. cachedDiscovery.Add(gd, 5*time.Minute, time.Minute, globalDiscoveryPriority) } } if cfg.Options().LocalAnnEnabled { // v4 broadcasts bcd, err := discover.NewLocal(myID, fmt.Sprintf(":%d", cfg.Options().LocalAnnPort), connectionsService) if err != nil { l.Warnln("IPv4 local discovery:", err) } else { cachedDiscovery.Add(bcd, 0, 0, ipv4LocalDiscoveryPriority) } // v6 multicasts mcd, err := discover.NewLocal(myID, cfg.Options().LocalAnnMCAddr, connectionsService) if err != nil { l.Warnln("IPv6 local discovery:", err) } else { cachedDiscovery.Add(mcd, 0, 0, ipv6LocalDiscoveryPriority) } } // GUI setupGUI(mainService, cfg, m, apiSub, diskSub, cachedDiscovery, connectionsService, errors, systemLog, runtimeOptions) if runtimeOptions.cpuProfile { f, err := os.Create(fmt.Sprintf("cpu-%d.pprof", os.Getpid())) if err != nil { log.Fatal(err) } pprof.StartCPUProfile(f) } for _, device := range cfg.Devices() { if len(device.Name) > 0 { l.Infof("Device %s is %q at %v", device.DeviceID, device.Name, device.Addresses) } } // Candidate builds always run with usage reporting. if IsCandidate { l.Infoln("Anonymous usage reporting is always enabled for candidate releases.") opts.URAccepted = usageReportVersion // Unique ID will be set and config saved below if necessary. } if opts.URAccepted > 0 && opts.URAccepted < usageReportVersion { l.Infoln("Anonymous usage report has changed; revoking acceptance") opts.URAccepted = 0 opts.URUniqueID = "" cfg.SetOptions(opts) } if opts.URAccepted >= usageReportVersion && opts.URUniqueID == "" { // Generate and save a new unique ID if it is missing. opts.URUniqueID = rand.String(8) cfg.SetOptions(opts) cfg.Save() } // The usageReportingManager registers itself to listen to configuration // changes, and there's nothing more we need to tell it from the outside. // Hence we don't keep the returned pointer. newUsageReportingManager(cfg, m) if opts.RestartOnWakeup { go standbyMonitor() } // Candidate builds should auto upgrade. Make sure the option is set, // unless we are in a build where it's disabled or the STNOUPGRADE // environment variable is set. if IsCandidate && !upgrade.DisabledByCompilation && !noUpgradeFromEnv { l.Infoln("Automatic upgrade is always enabled for candidate releases.") if opts.AutoUpgradeIntervalH == 0 || opts.AutoUpgradeIntervalH > 24 { opts.AutoUpgradeIntervalH = 12 } // We don't tweak the user's choice of upgrading to pre-releases or // not, as otherwise they cannot step off the candidate channel. } if opts.AutoUpgradeIntervalH > 0 { if noUpgradeFromEnv { l.Infof("No automatic upgrades; STNOUPGRADE environment variable defined.") } else { go autoUpgrade(cfg) } } events.Default.Log(events.StartupComplete, map[string]string{ "myID": myID.String(), }) cleanConfigDirectory() code := <-stop //TODO:[sunylat] [end] supervisor结束 ---------------------- mainService.Stop() l.Infoln("Exiting") if runtimeOptions.cpuProfile { pprof.StopCPUProfile() } os.Exit(code) }
提示:在"syncthingMain(options)"函数使用了第三方的库,使用的代码:
mainService := suture.New("main", suture.Spec{ Log: func(line string) { l.Debugln(line) }, }) mainService.ServeBackground()
第三方库的官方地址:
https://github.com/thejerf/suture
这是用GO语言写的Erlang语言中的supervisor trees功能,应该是很有用,值得我们关注!
Erlang supervisor trees 中文参考:
http://diaocow.iteye.com/blog/1762895