If you are lazily loading the data (the common case when dealing with large datasets), the memory overhead from the copies might be small compared to the overall memory usage of the script. That being said, you could try to use shared arrays as described here instead.
Answer from ptrblck on discuss.pytorch.org
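A minimal sketch of one way to get a shared array using only the standard library, assuming Python 3.8 or newer. The forum answer may have had a different mechanism in mind; `create_shared_floats` and `read_shared_floats` are hypothetical helper names. The reader attaches to the block by name, which is what a worker process would do.

```python
import struct
from multiprocessing import shared_memory


def create_shared_floats(values):
    """Allocate a shared-memory block and pack the floats into it."""
    fmt = f"{len(values)}d"
    shm = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    struct.pack_into(fmt, shm.buf, 0, *values)
    return shm


def read_shared_floats(name, count):
    """Attach to an existing block by name and read `count` floats from it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()  # detach this handle; the block itself stays alive


if __name__ == "__main__":
    block = create_shared_floats([1.0, 2.5, 4.0])
    try:
        print(read_shared_floats(block.name, 3))
    finally:
        block.close()
        block.unlink()  # free the block once no process needs it
```

Only the block name and the element count need to be sent to a worker, so the data itself is never pickled or copied.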
Discussions

python multiprocessing - Pytorch dataset and shared memory? - Stack Overflow
Can we share memory between workers in a Pytorch DataLoader? More on stackoverflow.com
🌐 stackoverflow.com
How to share memory for Dataloader when using multiprocess?
I wrap my data in a Dataset and then iterate over it with a DataLoader. But because of the copy-on-write mechanism, memory usage grows far beyond what I expected. My problem can be simplified as follows: class DataIter(Dataset): def __init__(self): self.data = range(90317731) def __len__(self): return ... More on discuss.pytorch.org
🌐 discuss.pytorch.org
July 9, 2018
Shared memory with torch.multiprocessing
On top of that, I use multiple num_workers in my dataloader, so having a simple Python list as a cache would mean multiple caches, which eats up a lot of memory. The natural solution is to use shared memory. And this is how I use it: in the launch process, do if __name__ == '__main__... More on discuss.pytorch.org
🌐 discuss.pytorch.org
July 4, 2020
Pytorch Dataloader Memory Leak
Hi, I noticed that while training a PyTorch model the subprocesses that are started by the dataloader workers are accumulating memory over time while loading new batches and it seems this memory is never released, ultimately resulting in a “dataloader worker does not have sufficient shared memory” ... More on discuss.pytorch.org
🌐 discuss.pytorch.org
August 30, 2023
🌐
Yuxin's Blog
ppwwyyxx.com › blog › 2022 › Demystify-RAM-Usage-in-Multiprocess-DataLoader
Demystify RAM Usage in Multi-Process Data Loaders - Yuxin's Blog
December 24, 2022 - The essence of the solution is to let all processes share memory through a single torch.Tensor object, which needs to be moved to Linux shared memory by PyTorch's custom pickling routine.
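A rough sketch of the underlying idea, under stated assumptions: many small Python objects are serialized into one flat buffer plus an offset table, so the processes share a handful of large objects instead of millions of small ones. This stand-in uses `bytes` and `array` from the standard library rather than a `torch.Tensor`, and `PackedList` is a hypothetical name, not a class from the blog or from PyTorch.

```python
import pickle
from array import array


class PackedList:
    """Read-only list that keeps all items in one contiguous bytes buffer."""

    def __init__(self, items):
        blobs = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL)
                 for item in items]
        # offsets[i] .. offsets[i + 1] delimits the pickled bytes of item i.
        self._offsets = array("Q", [0])
        for blob in blobs:
            self._offsets.append(self._offsets[-1] + len(blob))
        self._buffer = b"".join(blobs)

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start, end = self._offsets[index], self._offsets[index + 1]
        # Deserialize only the requested item; the buffer is never modified.
        return pickle.loads(memoryview(self._buffer)[start:end])


if __name__ == "__main__":
    records = [{"path": "a.jpg", "label": 0}, {"path": "b.jpg", "label": 1}]
    packed = PackedList(records)
    print(len(packed), packed[1])
```

Moving the buffer into Linux shared memory, as the snippet describes, would additionally require wrapping it in a tensor; that step is not shown here.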
Top answer

The answer depends on your OS and settings. If you are using Linux with the default process start method, you don't have to worry about duplicates or process communication, because worker processes share memory! This is efficiently implemented as Inter Process Communication (IPC) through shared memory (some more details here). For Windows, things are more complicated. From the documentation:

Since workers rely on Python multiprocessing, worker launch behavior is different on Windows compared to Unix.

On Unix, fork() is the default multiprocessing start method. Using fork(), child workers typically can access the dataset and Python argument functions directly through the cloned address space.

On Windows, spawn() is the default multiprocessing start method. Using spawn(), another interpreter is launched which runs your main script, followed by the internal worker function that receives the dataset, collate_fn and other arguments through pickle serialization.

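This means that the Dataset members that exist when the workers are started are automatically available to all processes on Linux. That's great! Note, however, that fork() shares memory copy-on-write: anything a worker caches after it has started stays private to that worker unless it is stored in shared memory, and reference-count updates on plain Python objects can gradually turn shared pages into per-worker copies. On Windows, each worker only receives a pickled copy of the Dataset upon spawning and sees no later changes, so you should use a process communication scheme, e.g. through multiprocessing Pipe, Queue or Manager (preferred for broadcasting to multiple processes, but you would have to convert tensors to lists). This is not very efficient and is rather bothersome to implement.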
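Since the behaviour above depends on the start method, it can help to check which one is active. A small sketch using the standard multiprocessing module; `workers_inherit_parent_memory` is a hypothetical helper name.

```python
import multiprocessing as mp


def workers_inherit_parent_memory(method=None):
    """Return True if child processes start from a clone of the parent.

    Only "fork" clones the parent's address space. With "spawn" and
    "forkserver", objects reach the child through pickling instead.
    """
    if method is None:
        method = mp.get_start_method()
    return method == "fork"


if __name__ == "__main__":
    print("available:", mp.get_all_start_methods())
    print("active:", mp.get_start_method())
    print("inherits parent memory:", workers_inherit_parent_memory())
```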

Nevertheless, there is another method: memory mapping (memmapping). This means that your objects are written to a file-backed region of virtual memory that all processes can access, while a corresponding "shadow copy" of these objects is at some point flushed to your hard drive and exists there (it can be placed in a /tmp directory). You can use memmapping with the mmap module, in which case your objects have to be serialized to a binary file, or you can use numpy.memmap. You can find more details here.
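A minimal sketch of the mmap variant, assuming the samples are plain float64 values stored back to back in a binary file. `write_samples` and `MemmapSamples` are hypothetical names chosen for illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<d")  # one little-endian float64 per sample


def write_samples(path, values):
    """Serialize the values into a flat binary file."""
    with open(path, "wb") as f:
        for value in values:
            f.write(ITEM.pack(value))


class MemmapSamples:
    """Read-only, index-based view over the binary file via mmap."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // ITEM.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Only the pages that hold this item need to be resident in memory.
        return ITEM.unpack_from(self._map, index * ITEM.size)[0]

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.bin")
        write_samples(path, [0.5, 1.5, 2.5])
        samples = MemmapSamples(path)
        print(len(samples), samples[2])
        samples.close()
```

mmap objects cannot be pickled, so with the spawn start method the map would have to be opened inside each worker rather than in the parent.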

🌐
PyTorch Forums
discuss.pytorch.org › t › how-to-share-memory-for-dataloader-when-using-multiprocess › 20790
How to share memory for Dataloader when using multiprocess? - PyTorch Forums
July 9, 2018 - I wrap my data in a Dataset and then iterate over it with a DataLoader. But because of the copy-on-write mechanism, memory usage grows far beyond what I expected. My problem can be simplified as follows: class DataIter(Dataset): def __init__(self): self.data = range(90317731) def __len__(self): return ...
🌐
Latentwalk
latentwalk.io › 2023 › 08 › 19 › torch-shmem
PyTorch DataLoaders and Shared Memory · Walking in the Latent Space
August 19, 2023 - Unlike pipes, once a shared memory region is mapped, the kernel is not involved with data transfers which means bytes can be copied more efficiently. So when a tensor is put inside the data queue by a worker process, PyTorch creates a new shared memory region and places tensor data in it.
🌐
AWS
docs.aws.amazon.com › codeguru › detector-library › python › pytorch-data-loader-with-multiple-workers
Pytorch data loader with multiple workers | Amazon Q, Detector Library
Using DataLoader with num_workers greater than 0 can cause increased memory consumption over time when iterating over native Python objects such as list or dict. Pytorch uses multiprocessing in this scenario placing the data in shared memory. However, reference counting triggers copy-on-writes ...
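The reason reads alone can cause copies is that every new reference to a Python object updates a counter stored inside that object, which is a write to its memory page. A small sketch that makes the counter change visible; `refcount_growth` is a hypothetical helper name.

```python
import sys


def refcount_growth(obj, extra_references):
    """Return how much obj's reference count grows while extra refs exist."""
    before = sys.getrefcount(obj)
    holders = [obj for _ in range(extra_references)]  # take new references
    after = sys.getrefcount(obj)
    del holders
    return after - before


if __name__ == "__main__":
    sample = {"path": "a.jpg", "label": 0}
    # Nothing in `sample` was modified, yet its header was written to.
    print(refcount_growth(sample, 3))
```

In a forked worker, such a write forces the kernel to copy the page that holds the object, which is why large lists or dicts of small objects tend to be duplicated over time.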
🌐
PyTorch Forums
discuss.pytorch.org › distributed
Shared memory with torch.multiprocessing - distributed - PyTorch Forums
July 4, 2020 - On top of that, I use multiple num_workers in my dataloader, so having a simple Python list as a cache would mean multiple caches, which eats up a lot of memory. The natural solution is to use shared memory. And this is how I use it: in the launch process, do if __name__ == '__main__...
🌐
PyTorch Forums
discuss.pytorch.org › data
Dataset size and limited shared memory - data - PyTorch Forums
January 26, 2023 - The training cannot start because I obtain the following message: RuntimeError: DataLoader worker (pid 12945) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit. I’m new to PyTorch and Colab and I’m ...
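When diagnosing this kind of error on Linux, it can be useful to check how much space is left on the shared-memory mount. A minimal sketch, assuming the mount lives at /dev/shm, which is the usual location but not guaranteed; `shared_memory_free_bytes` is a hypothetical helper name.

```python
import os
import shutil


def shared_memory_free_bytes(path="/dev/shm"):
    """Return the free bytes on the shared-memory mount, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    free = shared_memory_free_bytes()
    if free is None:
        print("no /dev/shm on this system")
    else:
        print(f"free shared memory: {free / 2**20:.0f} MiB")
```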
🌐
Kaggle
kaggle.com › product-feedback › 72606
increase pytorch shared memory | Kaggle
I'm trying out pytorch 1.0 and fastai 1.0 https://www.kaggle.com/dromosys/human-protein-fastai-v3 I get RuntimeError: DataLoader worker (pid 173) is killed...
🌐
GitHub
github.com › jotaf98 › shareddataset
GitHub - jotaf98/shareddataset: A PyTorch Dataset that caches samples in shared memory, accessible globally to all processes · GitHub
# the worker processes of a DataLoader all share the same memory. # use persistent workers to ensure the SharedDataset is not deallocated # between epochs.
Starred by 25 users
Forked by 2 users
Languages   Python
🌐
Stack Overflow
stackoverflow.com › questions › 73613484 › python-multi-processing-with-shared-memory-and-pytorch-data-loader-runtimeerro
python multi processing with shared memory and pytorch data loader - RuntimeError:use CUDA with multiprocessing you must use the 'spawn' start method - Stack Overflow
I am trying to implement a program with a producer and a consumer classes. The producer class reads the numpy array(an image) and puts it in a shared memory and the consumer class will read the numpy array data from the shared memory and apply a pytorch inference model on that.
🌐
GitHub
github.com › pytorch › pytorch › issues › 5040
Give a better error when we run out of shared memory, instead of "RuntimeError: DataLoader worker (pid 13) is killed by signal: Bus error." · Issue #5040 · pytorch/pytorch
February 5, 2018 - When I set num_workers=1 or other value greater than 0 in torch.utils.data.DataLoader, I get this error. The detail of the error: Traceback (most recent call last): File "/opt/project/train.py", line 150, in <module> dataset_sizes=dataset_sizes) File "/opt/project/train.py", line 51, in train_model outputs = model(inputs) File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__ result = self.forward(*input, **kwargs) File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 64, in forward
Author   pytorch
🌐
Reddit
reddit.com › r/pytorch › dataloader and multiprocessing
r/pytorch on Reddit: Dataloader and multiprocessing
July 30, 2019

Hi everyone! I am working on image classification and I have a project where we wrote the data loading part ourselves. The code can load and preprocess the images for the next batch on a different thread (using an output Tensor in shared memory for efficiency) while the current batch is being processed by the GPU.

But I want to implement a more complex data sampling scheme so I need something like the pytorch dataloader.

Is there a way to keep the efficiency of the old design (load next batch during inference and backprop, as few Tensors as possible) while using DataLoader?

I tried implementing something using DataLoader, but it was very inefficient, especially the execution of collate_fn.

Any advice on efficient dataloading that could be interesting?

🌐
GitHub
github.com › PyTorchLightning › pytorch-lightning › issues › 2352
Shared memory leak with large dataset and num_workers > 0 · Issue #2352 · Lightning-AI/pytorch-lightning
June 25, 2020 - When I use num_workers > 0 in DataLoader I obviously use shared memory through Pytorch multiprocessing.
Author   Lightning-AI
🌐
Eventual
eventual.ai › blog › pytorch-data-loader
Using PyTorch DataLoaders to Streamline Multimodal Data
October 22, 2025 - If you have a very large in-memory Dataset and spawn multiple workers, you would run out of RAM because each worker replicates that data. There are ways to work around this: Use shared memory constructs or memory-mapped files
🌐
GitHub
gist.github.com › pzelasko › cda0d8d7f4de880e2f59e4ed5e3b346a
Disable shared memory in PyTorch dataloader · GitHub
Disable shared memory in PyTorch dataloader. GitHub Gist: instantly share code, notes, and snippets.
🌐
Grokipedia
grokipedia.com › shared memory leak in pytorch dataloader
Shared memory leak in PyTorch DataLoader — Grokipedia
March 19, 2026 - The shared memory leak in PyTorch's DataLoader refers to behaviors that can lead to excessive memory consumption during long-running training loops when using multiple worker processes (num_workers > 0) with multiprocessing enabled.