1. When num_workers>0, only the worker processes retrieve data; the main process does not. So with num_workers=2 you have at most 2 workers simultaneously putting data into RAM, not 3.
2. A CPU can usually run on the order of 100 processes without trouble, and these worker processes aren't special in any way, so having more workers than CPU cores is OK. But is it efficient? That depends on how busy your CPU cores are with other tasks, the speed of the CPU, the speed of your hard disk, and so on. In short, it's complicated, so setting the number of workers to the number of cores is a good rule of thumb, nothing more.
3. Nope. Remember that DataLoader doesn't just randomly return whatever is available in RAM right now; it uses batch_sampler to decide which batch to return next. Each batch is assigned to a worker, and the main process waits until the desired batch has been retrieved by its assigned worker.
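
To make point 3 concrete, here is a minimal sketch that tags every sample with the id of the worker that loaded it. TaggedDataset and collect are hypothetical names invented for this illustration; get_worker_info() is the real helper from torch.utils.data. Under these assumptions, each batch should come from a single worker, and batches should arrive in sampler order no matter which worker finishes first.

```python
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class TaggedDataset(Dataset):
    """Toy dataset (hypothetical): returns (index, id of the worker that loaded it)."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        info = get_worker_info()  # None when loading happens in the main process
        worker_id = info.id if info is not None else -1
        return idx, worker_id


def collect(num_workers):
    """Return one (indices, worker_ids) pair per batch, in the order yielded."""
    loader = DataLoader(TaggedDataset(), batch_size=2, shuffle=False,
                        num_workers=num_workers)
    return [(idx.tolist(), wid.tolist()) for idx, wid in loader]


if __name__ == "__main__":
    # Each printed line is one batch: its sample indices, then the worker ids.
    for indices, worker_ids in collect(num_workers=2):
        print(indices, worker_ids)
```

With num_workers=0 every worker id is -1, because the main process does the loading itself. Which worker ends up with which batch is an implementation detail and is best not relied on.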

Lastly, to clarify: it isn't DataLoader's job to send anything directly to the GPU. You explicitly call cuda() for that.

EDIT: Don't call cuda() inside the Dataset's __getitem__() method; see @psarka's comment for the reasoning.
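
As a rough sketch of that advice, the Dataset below hands back CPU tensors and the transfer happens in the training loop. The toy data, the Linear model and the variable names are assumptions made up for this example. It uses .to(device) rather than cuda() so that it also runs on a machine without a GPU, and num_workers=0 only to keep the snippet short.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data: 16 samples with 4 features each, plus binary labels.
dataset = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))

# The Dataset yields CPU tensors; pinned memory is only useful when a GPU exists.
loader = DataLoader(dataset, batch_size=4, num_workers=0,
                    pin_memory=torch.cuda.is_available())

model = torch.nn.Linear(4, 2).to(device)

for features, labels in loader:
    # The transfer happens here, in the main process, not inside __getitem__.
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    logits = model(features)
```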

Answer from Shihab Shahriar Khan on Stack Overflow
PyTorch Forums
discuss.pytorch.org › t › guidelines-for-assigning-num-workers-to-dataloader › 813
Guidelines for assigning num_workers to DataLoader - PyTorch Forums
March 1, 2017 - I realize that to some extent this comes down to experimentation, but are there any general guidelines on how to choose the num_workers for a DataLoader object? Should num_workers be equal to the batch size? Or the nu…
Discussions

number of workers of data loader for reading data from HDD
The workers are the processes used to "get the minibatches ready" for your training loop. If you have multiple workers, minibatches can be loaded in parallel. So this has nothing to do with your model's accuracy, only with the time your model needs to train. Since the workers have to be coordinated, too many workers will actually slow you down. This probably depends on your individual setup. In my experience, 4-7 workers are fine, but you can just test this by timing your training for a few epochs. More on reddit.com (r/pytorch)
August 28, 2024
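
The suggestion to time a few epochs can be sketched roughly as follows. SlowDataset and seconds_per_epoch are hypothetical names, and the time.sleep call is only a stand-in for real disk reads and decoding, so the numbers it prints say nothing about any real pipeline. To use it, replace the toy dataset with your own.

```python
import os
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset whose __getitem__ sleeps to imitate slow disk reads."""

    def __len__(self):
        return 32

    def __getitem__(self, idx):
        time.sleep(0.005)  # stand-in for reading and decoding one sample
        return torch.tensor(idx)


def seconds_per_epoch(num_workers, epochs=2):
    """Average wall-clock time of one full pass over the loader."""
    loader = DataLoader(SlowDataset(), batch_size=4, num_workers=num_workers)
    start = time.perf_counter()
    for _ in range(epochs):
        for _batch in loader:
            pass  # a real run would do the training step here
    return (time.perf_counter() - start) / epochs


if __name__ == "__main__":
    # Try a few settings, capped at 4 workers to keep the experiment small.
    candidates = sorted({0, 1, 2, min(4, os.cpu_count() or 1)})
    for n in candidates:
        print(f"num_workers={n}: {seconds_per_epoch(n):.3f} s/epoch")
```

On a dataset this small, worker startup cost can dominate, which is part of why measuring on your own data and hardware matters.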
GeeksforGeeks
geeksforgeeks.org › deep learning › how-the-number-of-workers-parameter-in-pytorch-dataloader-actually-works
How the "Number of Workers" Parameter in PyTorch DataLoader Actually Works - GeeksforGeeks
July 23, 2025 - Set num_workers=0 to load data in the main process. b. Set num_workers>0 to load data in worker subprocesses. 4. Initialize model and optimizer. 5. Start training loop: a. For each epoch: i. Iterate over DataLoader to fetch batches of data. ii. Pass data to the model for training.
PyTorch Forums
discuss.pytorch.org › t › a-clear-explanation-of-what-num-workers-0-means-for-a-dataloader › 177614
A clear explanation of what num_workers=0 means for a DataLoader - PyTorch Forums
April 15, 2023 - Hello, the pytorch documentation it says that setting num_workers=0 for a DataLoader causes it to be handled by the “main process” from the pytorch doc: " 0 means that the data will be loaded in the main process." ma…
Medium
chtalhaanwar.medium.com › pytorch-num-workers-a-tip-for-speedy-training-ed127d825db7
PyTorch num_workers, a tip for speedy training | by Talha Anwar | Medium
September 23, 2021 - There is a huge debate about what the optimal num_workers for your dataloader should be. num_workers tells the data loader instance how many sub-processes to use for data loading. If num_workers is zero (the default), the GPU has to wait for the CPU to ...
AWS
docs.aws.amazon.com › codeguru › detector-library › python › pytorch-data-loader-with-multiple-workers
Pytorch data loader with multiple workers | Amazon Q, Detector Library
Using DataLoader with num_workers greater than 0 can cause increased memory consumption over time when iterating over native Python objects such as list or dict. Pytorch uses multiprocessing in this scenario placing the data in shared memory. However, reference counting triggers copy-on-writes ...
PyTorch Forums
discuss.pytorch.org › data
DataLoader persistent_workers Usage - data - PyTorch Forums
October 3, 2023 - Hello, I’m trying to better understand the operation of the persistent_workers option for DataLoader. My understanding is that the dataloader will not stop the worker processes that have been consuming the dataset after you stop consuming from it. To me this implies that it will save the state of the Dataloader instance and when you come back to consume more batches it will pick up where it left off.
Lightning AI
lightning.ai › docs › pytorch › stable › advanced › speed.html
Speed Up Model Training — PyTorch Lightning 2.6.1 documentation
In this case, setting persistent_workers=True in your dataloader will significantly speed up the worker startup time across epochs. GPUs of the generation Ampere or later (A100, H100, etc.) support low-precision matrix multiplication to trade-off precision for performance: # Default used by PyTorch ...
PyTorch Forums
discuss.pytorch.org › t › communicating-with-dataloader-workers › 11473
Communicating with Dataloader workers - PyTorch Forums
December 22, 2017 - Hey, I am having some issues with how the dataloader works when multiple workers are used. In my dataset, I resize the images to the input dimensions of the network. I am training a fully convolutional network and I can thus change the input dimension of my network in order to make it more ...
Kaggle
kaggle.com › questions-and-answers › 175432
How does the “number of workers” parameter in PyTorch dataloader actually work? | Kaggle
The value of num_workers sets the number of worker subprocesses used for data loading. If you assign num_workers=0, the data is loaded in the main process. If you assign num_workers greater than the number of cores you have available, it will simply ...
Eventual
eventual.ai › blog › pytorch-data-loader
Using PyTorch DataLoaders to Streamline Multimodal Data
October 22, 2025 - PyTorch's DataLoader is a utility that plays a critical role in deep learning pipelines. It takes a dataset and wraps it with an iterable that can efficiently load data in batches, shuffle data each epoch, and utilize parallel workers for speed.
PyTorch Forums
discuss.pytorch.org › vision
How to choose the value of the num_workers of Dataloader - vision - PyTorch Forums
August 21, 2019 - I run models on a machine with 8 core CPU and NVIDIA v100, how should I choose the num_workers to make the data be loaded efficiently.
PyTorch Forums
discuss.pytorch.org › t › in-what-order-do-dataloader-workers-do-their-job › 88288
In what order do dataloader workers do their job? - PyTorch Forums
July 7, 2020 - Hello, I was wondering how the DataLoader queue works with num_workers > 0. I imagine N workers are created. I see 2 options: the program goes through all workers in sequence? This would mean that if one worker is delayed for some reason, the other workers have to wait until this specific ...
DigitalOcean
digitalocean.com › community › tutorials › dataloaders-abstractions-pytorch
A Guide to the DataLoader Class and Abstractions in PyTorch | DigitalOcean
February 3, 2026 - The PyTorch DataLoader improves model training performance through mini-batch loading, multiprocessing with num_workers, and configurable memory optimizations.
CodeGenes
codegenes.net › blog › pytorch-dataloader-persistent-work
PyTorch DataLoader Persistent Workers: A Comprehensive Guide — codegenes.net
One of the useful features of the DataLoader is the persistent_workers option, which can significantly improve data loading performance, especially in scenarios where data loading is time-consuming.
Medium
medium.com › @Modexa › 8-pytorch-dataloader-tactics-to-max-out-your-gpu-22270f6f3fa8
8 PyTorch DataLoader Tactics to Max Out Your GPU | by Modexa | Medium
October 6, 2025 - Eight proven PyTorch DataLoader tactics — workers, pin memory, prefetching, GPU streams, bucketing, and more — to keep GPUs saturated and training fast.