Reading image data with multiple processes in PyTorch on Linux: caveats for multi-process use of DataLoader

Source:

PEP 703 – Making the Global Interpreter Lock Optional in CPython

 

 

Relevant excerpt:

The GIL Affects Python Library Usability

The GIL is a CPython implementation detail that limits multithreaded parallelism, so it might seem unintuitive to think of it as a usability issue. However, library authors frequently care a great deal about performance and will design APIs that support working around the GIL. These workarounds frequently lead to APIs that are more difficult to use. Consequently, users of these APIs may experience the GIL as a usability issue and not just a performance issue.

For example, PyTorch exposes a multiprocessing-based API called DataLoader for building data input pipelines. It uses fork() on Linux because it is generally faster and uses less memory than spawn(), but this leads to additional challenges for users: creating a DataLoader after accessing a GPU can lead to confusing CUDA errors. Accessing GPUs within a DataLoader worker quickly leads to out-of-memory errors because processes do not share CUDA contexts (unlike threads within a process).

 

 

===========================================

 

 

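In PyTorch, the API for reading image data with multiple processes is DataLoader, which is built on Python's multiprocessing. On Linux the worker processes are created with fork() rather than spawn(), because fork() is faster and uses less memory. The drawback is that a forked child inherits a copy of the parent's state, so the DataLoader should be set up as early as possible in the code. If it is created after CUDA has already been used, the worker processes can hit confusing CUDA errors, and any worker that accesses the GPU ends up with its own CUDA context, because contexts are not shared between processes. This is not a memory leak in the strict sense, but the per-process contexts take up a large amount of memory and can make the program fail with an out-of-memory error.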
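The inheritance behaviour of fork() can be seen without PyTorch at all. The following is a minimal sketch using only the standard library; the `STATE` dictionary and the helper names are hypothetical stand-ins for process-wide state such as an initialized CUDA context, not anything PyTorch itself defines.

```python
import multiprocessing as mp

# Hypothetical stand-in for process-wide state (for example, "CUDA is initialized").
STATE = {"initialized": False}


def _report(conn):
    # Runs in the child process: send back the value the child sees.
    conn.send(STATE["initialized"])
    conn.close()


def child_sees_parent_state(start_method="fork"):
    """Return the value of STATE["initialized"] as observed by a child process."""
    ctx = mp.get_context(start_method)
    # The parent changes its state *before* the child is started.
    STATE["initialized"] = True
    # duplex=False gives a (receive-only, send-only) pair of connections.
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_report, args=(child_conn,))
    proc.start()
    seen = parent_conn.recv()
    proc.join()
    return seen


if __name__ == "__main__":
    # With fork, the child starts from a copy of the parent's memory,
    # so it observes the state the parent set before forking.
    print(child_sees_parent_state("fork"))
```

With "fork" the child reports the parent's modified state, which is the same mechanism by which a worker inherits whatever the parent had set up before the fork. A spawned child would instead start from a fresh interpreter and re-import the module.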

 

 

 

Key point:

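In PyTorch, set up the DataLoader's multi-process loading as early as possible in the code, before any CUDA call, and keep GPU access out of the worker processes.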
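One way to follow this rule is sketched below, under stated assumptions: `RandomImages`, `make_loader` and `count_samples` are hypothetical names invented for illustration, and the dataset produces synthetic tensors instead of reading real image files. The dataset code, which is what the workers execute, stays on the CPU; the worker processes are started when the iterator is created, and only afterwards does the main process touch CUDA.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Hypothetical stand-in for an image dataset; returns CPU tensors only."""

    def __init__(self, length=8):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Worker processes run this method, so it must not touch CUDA.
        image = torch.full((3, 4, 4), float(index))
        label = index % 2
        return image, label


def make_loader(num_workers=2, start_method="fork"):
    # multiprocessing_context may only be given when num_workers > 0.
    context = start_method if num_workers > 0 else None
    return DataLoader(RandomImages(), batch_size=4, num_workers=num_workers,
                      multiprocessing_context=context)


def count_samples(loader):
    # Creating the iterator starts the worker processes (if any),
    # so this happens before the main process touches CUDA.
    batches = iter(loader)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    seen = 0
    for images, labels in batches:
        # Only the main process moves data to the GPU.
        images = images.to(device)
        seen += images.shape[0]
    return seen
```

Calling `count_samples(make_loader(num_workers=2))` exercises the forked workers. If a dataset really does have to use the GPU inside the workers, passing `start_method="spawn"` is an alternative worth testing, at the cost of slower worker start-up.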

 

 

 

 

===========================================

posted on Angry_Panda
