I don't think the other answer is correct. Allocation and deallocation definitely happen at runtime. The thing to note is that CPU code runs asynchronously from GPU code, so you need to wait for a deallocation to complete if you want to reserve more memory after it. Take a look at this:
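import torch

# 100 * 100 * 100 float32 values = 4,000,000 bytes, rounded up by the allocator
a = torch.zeros(100, 100, 100).cuda()
print(torch.cuda.memory_allocated())

# Drop the only reference, then wait for pending GPU work to finish
del a
torch.cuda.synchronize()
print(torch.cuda.memory_allocated())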
Output:
4000256
0
So you should del the tensors you don't need and call torch.cuda.synchronize() to make sure that the deallocation goes through before your CPU code continues to run.
In your specific case, after your function trn_l returns, any variables local to that function that are not referenced elsewhere will be deallocated, along with the corresponding GPU tensors. All you need to do is wait for this to happen by calling torch.cuda.synchronize() after the function call.
So PyTorch does not allocate and deallocate GPU memory at training time.
From https://pytorch.org/docs/stable/notes/faq.html#my-gpu-memory-isn-t-freed-properly:
PyTorch uses a caching memory allocator to speed up memory allocations. As a result, the values shown in nvidia-smi usually don’t reflect the true memory usage. See Memory management for more details about GPU memory management.
If your GPU memory isn’t freed even after Python quits, it is very likely that some Python subprocesses are still alive. You may find them via ps -elf | grep python and manually kill them with kill -9 [pid].
You can call torch.cuda.empty_cache() to release all unused cached memory (however, this is not really good practice, as re-allocating memory is time-consuming). Docs for empty_cache(): https://pytorch.org/docs/stable/cuda.html#torch.cuda.empty_cache
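A toy model may help show why nvidia-smi and the allocated count can disagree. The class below is a deliberately simplified sketch, not PyTorch's actual allocator: ToyCachingAllocator and its methods are hypothetical names. Its two counters play roughly the roles of torch.cuda.memory_allocated() (bytes held by live tensors) and torch.cuda.memory_reserved() (bytes the allocator holds from the driver).

```python
class ToyCachingAllocator:
    """Simplified model: freed blocks go to a cache, not back to the driver."""

    def __init__(self):
        self.allocated = 0  # bytes owned by live tensors
        self.reserved = 0   # bytes obtained from the driver
        self.cache = []     # sizes of free blocks kept for reuse

    def malloc(self, nbytes):
        if nbytes in self.cache:
            self.cache.remove(nbytes)  # reuse a cached block, no driver call
        else:
            self.reserved += nbytes    # slow path: request new memory
        self.allocated += nbytes
        return nbytes

    def free(self, nbytes):
        self.allocated -= nbytes
        self.cache.append(nbytes)      # keep the block for later reuse

    def empty_cache(self):
        self.reserved -= sum(self.cache)  # hand unused blocks back
        self.cache.clear()


alloc = ToyCachingAllocator()

block = alloc.malloc(4_000_256)
alloc.free(block)
after_free = (alloc.allocated, alloc.reserved)

block = alloc.malloc(4_000_256)        # served from the cache
after_reuse = (alloc.allocated, alloc.reserved)

alloc.free(block)
alloc.empty_cache()
after_empty = (alloc.allocated, alloc.reserved)

print(after_free, after_reuse, after_empty)
```

After the first free, the allocated count is zero but the reserved count is unchanged, which is what an external tool would keep reporting. Emptying the cache brings the reserved count down too, at the price of a slow path on the next allocation.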