Akshay ๐ (@akshay_pachaar)
2025-02-02 | โค๏ธ 327 | ๐ 49
Make data loading in PyTorch 5x faster!
Typically, while training on GPUs:
- .to(device) transfers the data to the GPU.
- And everything executes on the GPU afterward.
This means when the GPU is busy, the CPU is idle, and vice versa. However, you can optimize this with memory pinning.
Hereโs how it works:
โ As the model trains on the first mini-batch, the CPU can transfer the second mini-batch to the GPU.
โ This ensures that the GPU doesnโt have to wait for the next mini-batch once it completes processing the current one.
Enabling this is super simple:
- Set pin_memory=True in the DataLoader object.
- During data transfer, use: .to(device, non_blocking=True)
Additionally, specify num_workers in the DataLoader object for further optimization.
The speedup is clear from the image below.
Let me know in the comments what training optimisations have you done and are lesser known.
Find me โ @akshay_pachaar โ๏ธ For more insights and tutorials on AI and Machine Learning!
๋ฏธ๋์ด

๐ Related
Auto-generated - needs manual review