- When num_workers>0, only these workers retrieve data; the main process does not. So with num_workers=2 you have at most 2 workers simultaneously putting data into RAM, not 3.
- A CPU can usually run on the order of 100 processes without trouble, and these worker processes aren't special in any way, so having more workers than CPU cores is OK. But is it efficient? That depends on how busy your CPU cores are with other tasks, the speed of the CPU, the speed of your hard disk, and so on. In short, it's complicated, so setting the number of workers to the number of cores is a good rule of thumb, nothing more.
- Nope. Remember that DataLoader doesn't just randomly return whatever is available in RAM right now; it uses batch_sampler to decide which batch to return next. Each batch is assigned to a worker, and the main process waits until the desired batch has been retrieved by the assigned worker.
Lastly, to clarify: it isn't DataLoader's job to send anything directly to the GPU; you explicitly call cuda() for that.
EDIT: Don't call cuda() inside the Dataset's __getitem__() method; see @psarka's comment for the reasoning.
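The ordering point in the answer above can be illustrated without PyTorch. The following is a simplified sketch using only the standard library, not DataLoader's actual implementation: threads stand in for worker processes, and the names load_batch and iterate_in_order are hypothetical. Batches are handed to workers in sampler order, and the consumer always waits for the next batch in that order, even when a later batch finishes first.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_batch(batch_indices):
    """Pretend to load one batch; earlier batches are deliberately slower."""
    # Hypothetical delay: the batch starting at index 0 takes longest.
    time.sleep(0.05 / (1 + batch_indices[0]))
    return [i * 10 for i in batch_indices]


def iterate_in_order(batches, num_workers):
    """Yield loaded batches in sampler order, however the workers finish."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each batch is assigned to a worker in sampler order.
        futures = [pool.submit(load_batch, b) for b in batches]
        for future in futures:
            # Block until *this* batch is ready, even if later ones are done.
            yield future.result()


# Stand-in for the batch sampler's decision: which indices form each batch,
# and in what order the batches are returned.
batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
loaded = list(iterate_in_order(batches, num_workers=2))
print(loaded)
```

This sketch submits every batch at once for brevity; a real loader would normally bound how many batches are in flight. The output order follows the batch list, not the order in which the workers happen to finish.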
PyTorch Forums
discuss.pytorch.org › t › a-clear-explanation-of-what-num-workers-0-means-for-a-dataloader › 177614
A clear explanation of what num_workers=0 means for a DataLoader - PyTorch Forums
April 15, 2023 - Hello, the PyTorch documentation says that setting num_workers=0 for a DataLoader causes it to be handled by the “main process”. From the PyTorch doc: "0 means that the data will be loaded in the main process." ma…
AWS
docs.aws.amazon.com › codeguru › detector-library › python › pytorch-data-loader-with-multiple-workers
Pytorch data loader with multiple workers | Amazon Q, Detector Library
Using DataLoader with num_workers greater than 0 can cause increased memory consumption over time when iterating over native Python objects such as list or dict. Pytorch uses multiprocessing in this scenario placing the data in shared memory. However, reference counting triggers copy-on-writes ...
PyTorch Forums
discuss.pytorch.org › data
DataLoader persistent_workers Usage - data - PyTorch Forums
October 3, 2023 - Hello, I’m trying to better understand the operation of the persistent_workers option for DataLoader. My understanding is that the dataloader will not stop the worker processes that have been consuming the dataset after you stop consuming from it. To me this implies that it will save the state of the Dataloader instance and when you come back to consume more batches it will pick up where it left off.
Lightning AI
lightning.ai › docs › pytorch › stable › advanced › speed.html
Speed Up Model Training — PyTorch Lightning 2.6.1 documentation
In this case, setting persistent_workers=True in your dataloader will significantly speed up the worker startup time across epochs. GPUs of the generation Ampere or later (A100, H100, etc.) support low-precision matrix multiplication to trade-off precision for performance: # Default used by PyTorch ...
PyTorch Forums
discuss.pytorch.org › t › communicating-with-dataloader-workers › 11473
Communicating with Dataloader workers - PyTorch Forums
December 22, 2017 - Hey, I am having some issues with how the dataloader works when multiple workers are used. In my dataset, I resize the images to the input dimensions of the network. I am training a fully convolutional network and I can thus change the input dimension of my network in order to make it more ...
PyTorch
docs.pytorch.org › docs › stable › data.html
Kaggle
kaggle.com › questions-and-answers › 175432
How does the “number of workers” parameter in PyTorch dataloader actually work? | Kaggle
The value of num_workers decides the number of cores of cpu to be used for data processing. If you assign num_workers=0, it uses one core of the cpu. If you assign num_workers greater than the number of cores you have available, it will simply ...
PyTorch Lightning
pytorch-lightning.readthedocs.io › en › 0.10.0 › performance.html
Fast Performance — PyTorch-Lightning 0.10.0 documentation
DataLoader(dataset, num_workers=8, pin_memory=True)
Reddit
reddit.com › r/pytorch › number of workers of data loader for reading data from hdd
r/pytorch on Reddit: number of workers of data loader for reading data from HDD
August 28, 2024 -
Hello, will there be an advantage to using num_workers > 0 when reading data from an HDD during training? And is there a downside to my model's accuracy when using fewer workers? Thank you for your response.
Top answer 1 of 2
4
The number of workers are the processes used to "get the minibatches ready" for your training loop. If you have multiple workers, minibatches can be loaded in parallel. So this has nothing to do with your model's accuracy/performance, but more with the time your model needs to train. Since the workers have to be coordinated, too many workers will actually slow you down. This is probably dependent on your individual setup. In my experience, 4-7 workers are fine - but you can just test this by timing your training for a few epochs.
2 of 2
3
Most likely yes, but benchmarking it would give you a more definitive answer. No, it'll just be slower.
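Both answers above recommend measuring rather than guessing. A minimal sketch of such a measurement follows, assuming a hypothetical make_loader factory that builds a re-iterable loader for a given worker count; the function names and the stand-in fake_loader are illustrative only and use nothing beyond the standard library.

```python
import time


def time_loader(make_loader, num_workers, epochs=2):
    """Return average seconds per epoch spent just iterating the loader."""
    loader = make_loader(num_workers)
    start = time.perf_counter()
    for _ in range(epochs):
        for _batch in loader:
            pass  # a real run would do the training step here
    return (time.perf_counter() - start) / epochs


def compare_worker_counts(make_loader, candidates, epochs=2):
    """Time each candidate worker count; return (fastest, all timings)."""
    timings = {n: time_loader(make_loader, n, epochs) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings


# Stand-in loader for demonstration only: it ignores num_workers entirely,
# so the timings here are meaningless beyond showing the mechanics.
def fake_loader(num_workers):
    return [list(range(4)) for _ in range(100)]


best, timings = compare_worker_counts(fake_loader, candidates=[0, 2, 4])
print(best, timings)
```

With PyTorch, make_loader could be something like lambda n: DataLoader(dataset, batch_size=32, num_workers=n), where dataset is your own Dataset. Timing the loader in isolation only shows loading speed; timing full training epochs, as the first answer suggests, also captures contention with the training step.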
PyTorch Forums
discuss.pytorch.org › vision
How to choose the value of the num_workers of Dataloader - vision - PyTorch Forums
August 21, 2019 - I run models on a machine with an 8-core CPU and an NVIDIA V100. How should I choose num_workers so that the data is loaded efficiently?
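One possible starting point for the question above is the "number of cores" rule of thumb from the answer at the top of this page. The sketch below is an assumption-laden helper, not an official recommendation: suggest_num_workers, reserve, and cap are hypothetical names, and the result is only an initial guess to refine by benchmarking.

```python
import os


def suggest_num_workers(reserve=1, cap=None):
    """Starting guess for num_workers: usable CPU cores minus a reserve."""
    if hasattr(os, "sched_getaffinity"):
        # Cores this process is allowed to run on (available on Linux).
        cores = len(os.sched_getaffinity(0))
    else:
        # Fallback: total logical cores; cpu_count() may return None.
        cores = os.cpu_count() or 1
    # Leave `reserve` cores for the main process; never go below zero.
    workers = max(cores - reserve, 0)
    if cap is not None:
        workers = min(workers, cap)
    return workers


print(suggest_num_workers())
```

On an 8-core machine with the default reserve of one core for the main process, this would suggest 7 workers. Whether that is actually fastest depends on disk speed and on what else the cores are doing.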
PyTorch Forums
discuss.pytorch.org › t › in-what-order-do-dataloader-workers-do-their-job › 88288
In what order do dataloader workers do their job? - PyTorch Forums
July 7, 2020 - Hello, I was wondering how the DataLoader queue works with num_workers > 0. I imagine N workers are created. I see 2 options: the program goes through all workers in sequence? This would mean that if one worker is delayed for some reason, the other workers have to wait until this specific ...
DigitalOcean
digitalocean.com › community › tutorials › dataloaders-abstractions-pytorch
A Guide to the DataLoader Class and Abstractions in PyTorch | DigitalOcean
February 3, 2026 - The PyTorch DataLoader improves model training performance through mini-batch loading, multiprocessing with num_workers, and configurable memory optimizations.
Reddit
reddit.com › r/machinelearning › pytorch dataloader optimizations [d]
r/MachineLearning on Reddit: PyTorch Dataloader Optimizations [D]
March 27, 2024 -
What are some optimizations that one could use for the data loader in PyTorch? The data type could be anything. But I primarily work with images and text. We know you can define your own. But does anyone have any clever tricks to share? Thank you in advance!
Top answer 1 of 5
94
Doubling num_workers is my favourite "optimization".
2 of 5
35
My current side project (which should work if properly implemented): rewrite it all in C++. Multiprocessing + pin_memory overhead is pretty high for some of our cases (ideally we need to sustain ~1GB/s/GPU, maybe 100-400 unique features). Decreasing the overhead from 4 copies after reading to 1 should hopefully help. Currently we have:
- Read data from s3 into pyarrow table
- combine_chunks for each batch because it's hard to work with chunked arrays directly (copy 1)
- Fill nulls (copy 2, sometimes two copies)
- Add to multiprocessing queue (copy 3, iiuc this calls share_memory_() which copies)
- Read from multiprocessing queue (zero copy, but it can be quite slow if you have a lot of tensors)
- Pin memory (copy 4, in thread, but still is slow if you have a lot of tensors)
And the most fun way to optimize seems to be just rewriting it all