Pitfall Log 2: the num_workers problem in DataLoader

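When I tried to speed up image loading by letting DataLoader start several worker subprocesses (num_workers > 0), I got a process-related error: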

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html

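Cause: I was on Windows, which creates child processes differently from Linux at the low level. Linux has traditionally used fork, which copies the already-running parent process. Windows uses spawn instead: each child starts a fresh Python interpreter and re-executes the main script from the top. In the child, the script runs with __name__ set to '__mp_main__' rather than '__main__'. That is why the code that creates child processes (for a DataLoader, the code that iterates over it) must be placed inside if __name__ == '__main__'. Otherwise each child would try to start processes of its own before it has finished starting up.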
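A minimal sketch of the idiom follows. It uses only the standard library so it runs without PyTorch. DataLoader starts its num_workers workers through Python's multiprocessing machinery, so the same rules apply. The sketch forces the "spawn" start method, which is what Windows uses, so the behaviour can be reproduced on Linux or macOS as well. The names IMPORTED_AS, where_am_i and main are hypothetical, chosen only for illustration. Each worker reports the value __name__ had when that worker re-executed the script:

```python
import multiprocessing as mp

# Top-level statement: executed in the parent AND again in every spawned
# worker, because each worker re-runs this script when it starts.
IMPORTED_AS = __name__


def where_am_i(_):
    # Runs inside a worker: return the __name__ that this worker's copy of
    # the script saw while it was being re-executed.
    return IMPORTED_AS


def main():
    # "spawn" is the Windows start method; forcing it here reproduces the
    # Windows behaviour on any operating system.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(where_am_i, range(2))


if __name__ == "__main__":
    # Only the parent process gets here. Workers re-run the script with
    # __name__ == "__mp_main__", so they skip this block and do not try
    # to start workers of their own.
    print(main())  # when run as a script: ['__mp_main__', '__mp_main__']
```

Save it as a .py file and run it directly. If main() were called at the top level instead of under the guard, each worker would fail with the RuntimeError quoted above, because it would try to start a pool of its own while still bootstrapping.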

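Moreover, putting only part of the code under if __name__ == '__main__' is not enough. I found that any other top-level code in the main script (anything besides class and function definitions) is executed once more for every worker, i.e. num_workers extra times. The reason is again that Windows creates a process by re-running the main script. By default a DataLoader also starts fresh workers every time it is iterated (every epoch), so the repetition recurs unless persistent_workers=True is set.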
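The repeated execution can be observed with a similar standard-library sketch, again forcing "spawn". The top-level line SETUP_PID = os.getpid() stands in for unguarded work such as scanning an image folder. SETUP_PID, report and main are hypothetical names. Each worker sends back the PID recorded by its own execution of that line:

```python
import multiprocessing as mp
import os

# Unguarded top-level code (a stand-in for e.g. building a list of image
# paths). Under "spawn" it runs once in the parent and once in every worker.
SETUP_PID = os.getpid()


def report(queue):
    # Runs inside a worker: send back the PID stored by this worker's own
    # execution of the top-level line above.
    queue.put(SETUP_PID)


def main(num_workers=3):
    ctx = mp.get_context("spawn")  # the Windows start method
    queue = ctx.Queue()
    workers = [ctx.Process(target=report, args=(queue,)) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # Read the results before join() so no worker blocks on the queue.
    pids = [queue.get() for _ in workers]
    for w in workers:
        w.join()
    return pids


if __name__ == "__main__":
    pids = main()
    # One entry per worker, none equal to the parent's PID: the top-level
    # line was executed num_workers more times, each time inside a worker.
    print(len(pids), os.getpid() in pids)  # 3 False
```

Moving such work into main() or under the guard makes it run only in the parent.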

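Summary: to use multiprocessing (including DataLoader with num_workers > 0) on Windows, the code that creates the processes must be protected by if __name__ == '__main__'. In addition, to keep main-script code from running repeatedly, everything in the main script other than imports and class/function definitions should go inside that block as well.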

 

posted @ 2024-09-15 12:12  Dr.Joker(月月安康)