Notes on using PyTorch multiprocessing to read image data on Linux: caveats for DataLoader's multiprocessing
Source:
PEP 703 – Making the Global Interpreter Lock Optional in CPython
Relevant excerpt:
The GIL Affects Python Library Usability
The GIL is a CPython implementation detail that limits multithreaded parallelism, so it might seem unintuitive to think of it as a usability issue. However, library authors frequently care a great deal about performance and will design APIs that support working around the GIL. These workarounds frequently lead to APIs that are more difficult to use. Consequently, users of these APIs may experience the GIL as a usability issue and not just a performance issue.
For example, PyTorch exposes a multiprocessing-based API called DataLoader for building data input pipelines. It uses fork() on Linux because it is generally faster and uses less memory than spawn(), but this leads to additional challenges for users: creating a DataLoader after accessing a GPU can lead to confusing CUDA errors. Accessing GPUs within a DataLoader worker quickly leads to out-of-memory errors because processes do not share CUDA contexts (unlike threads within a process).
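The thread-versus-process distinction the PEP points to can be seen without a GPU. The following is a minimal sketch using only Python's standard library, not PyTorch: the dict named state and the function create_context are hypothetical stand-ins for a per-process resource such as a CUDA context. A thread updates the single copy shared by the whole process, while a forked child updates only its own copy, which mirrors why every DataLoader worker that uses the GPU needs a context of its own. The "fork" start method is only available on POSIX systems such as Linux.

```python
import multiprocessing as mp
import threading

# Hypothetical stand-in for per-process state such as a CUDA context.
# It is a plain dict, not a real GPU object.
state = {"contexts_created": 0}


def create_context():
    # Simulates setting up one more context in the *current* process.
    state["contexts_created"] += 1


def run_in_thread():
    # Threads live inside the parent process and share its memory.
    t = threading.Thread(target=create_context)
    t.start()
    t.join()


def run_in_forked_process():
    # A forked child gets a copy of the parent's memory; its changes stay in the child.
    ctx = mp.get_context("fork")  # POSIX only
    p = ctx.Process(target=create_context)
    p.start()
    p.join()
    return p.exitcode  # 0 means create_context ran in the child without error


if __name__ == "__main__":
    run_in_thread()
    print(state["contexts_created"])  # 1: the thread updated the parent's dict
    run_in_forked_process()
    print(state["contexts_created"])  # still 1: the child updated only its own copy
```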
===========================================
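In PyTorch, the API for reading images with multiple processes is DataLoader. Under the hood it is built on Python's multiprocessing, and on Linux it uses fork() rather than spawn(), because fork() is faster and uses less memory. However, because fork() copies the parent process, the DataLoader should be set up as early as possible in the code. If its worker processes are forked after CUDA has been initialized, they inherit CUDA state they cannot safely use, which can produce confusing CUDA errors; and if the workers themselves use the GPU, each one creates its own CUDA context, which takes up a large amount of memory and can even make the program fail with out-of-memory errors.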
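To make the ordering concrete, here is a minimal sketch using only Python's standard library rather than PyTorch. The parent_state dict and initialize_gpu_in_parent function are hypothetical stand-ins for CUDA initialization, and multiprocessing.Pool stands in for the DataLoader's worker processes. Workers forked before the parent's initialization do not carry that state; workers forked after it inherit a copy.

```python
import multiprocessing as mp

# Hypothetical stand-in for the state a parent process acquires when it
# first touches the GPU. It is a plain dict, not a real CUDA context.
parent_state = {"gpu_initialized": False}


def initialize_gpu_in_parent():
    # Simulates the first CUDA call in the main process.
    parent_state["gpu_initialized"] = True


def worker_view(_):
    # Runs in a worker and reports what that worker inherited via fork().
    return parent_state["gpu_initialized"]


def start_workers(n=2):
    # multiprocessing.Pool forks its worker processes when it is constructed.
    # The "fork" start method is only available on POSIX systems such as Linux.
    return mp.get_context("fork").Pool(processes=n)


if __name__ == "__main__":
    early = start_workers()  # workers forked before the "GPU" is touched
    initialize_gpu_in_parent()
    with early:  # leaving the block terminates the pool and joins its threads
        print(early.map(worker_view, range(2)))  # [False, False]
    with start_workers() as late:  # workers forked after the "GPU" is touched
        print(late.map(worker_view, range(2)))  # [True, True]
```

One detail to keep in mind for the real DataLoader: as far as we know, in current PyTorch versions its worker processes are started when the loader is iterated (each iter(loader), unless persistent_workers=True keeps them alive), not when the DataLoader object is constructed. What matters is therefore whether CUDA is already initialized in the parent at the moment the workers are forked. If that cannot be avoided, DataLoader's multiprocessing_context argument (for example "spawn") starts each worker in a fresh interpreter, at the cost of the speed and memory advantages of fork().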
Key point:
In PyTorch, set up DataLoader's multiprocessing as early as possible in the code, before any CUDA calls.
===========================================
posted on 2023-08-04 07:37 Angry_Panda