I was running similar tests today and got the same results on my Jetson NX in nvpmodel mode 2 (15 W, 6 cores).
Resizing an image 10,000 times on the CPU was faster than doing the same 10,000 resizes on the GPU.
This was my code for the CPU:
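```cpp
#include <opencv2/opencv.hpp>

// number_of_times_to_iterate (10,000 below) and desired_image_size (a cv::Size) are defined elsewhere
cv::Mat cpu_original_image = cv::imread("test.png"); // 1400x690 RGB image
for (size_t count = 0; count < number_of_times_to_iterate; count++)
{
    cv::Mat cpu_resized_image; // a new output Mat on every iteration
    cv::resize(cpu_original_image, cpu_resized_image, desired_image_size);
}
```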
This was my code for the GPU:
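```cpp
#include <opencv2/core/cuda.hpp>   // cv::cuda::GpuMat
#include <opencv2/cudawarping.hpp> // cv::cuda::resize

cv::cuda::GpuMat gpu_original_image;
gpu_original_image.upload(cpu_original_image); // host-to-device copy, done once
for (size_t count = 0; count < number_of_times_to_iterate; count++)
{
    cv::cuda::GpuMat gpu_resized_image; // a new GPU output buffer on every iteration
    cv::cuda::resize(gpu_original_image, gpu_resized_image, desired_image_size);
}
```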
My timing code (not shown above) covered only the for() loops; it included neither imread() nor upload().
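The answer doesn't show how the loops were timed. Below is a minimal sketch of one way to time only the loop, assuming `std::chrono::steady_clock`; this is not necessarily what the author used, and `time_loop_ms` is a hypothetical helper name.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls `body` `iterations` times and returns the elapsed
// wall-clock time in milliseconds. Only the loop is timed, so any setup
// (imread(), upload()) must happen before this is called.
template <typename Body>
double time_loop_ms(std::size_t iterations, Body&& body) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t count = 0; count < iterations; ++count) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With the snippets above it would be called as `time_loop_ms(10000, [&] { cv::resize(cpu_original_image, cpu_resized_image, desired_image_size); })`. If GPU work is enqueued on a non-default `cv::cuda::Stream`, call that stream's `waitForCompletion()` inside the timed region. Otherwise the clock can stop before the kernels finish.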
When called in a loop 10K times, my results were:
- CPU: 5786.930 milliseconds
- GPU: 9678.054 milliseconds (plus an additional 170.587 milliseconds for the upload())
Then I made one change to each loop: I moved the "resized" mat outside the loop so it isn't created and destroyed on every iteration. My code then looked like this:
```cpp
cv::Mat cpu_original_image = cv::imread("test.png"); // 1400x690 RGB image
cv::Mat cpu_resized_image; // allocated by the first resize(), then reused
for (size_t count = 0; count < number_of_times_to_iterate; count++)
{
    cv::resize(cpu_original_image, cpu_resized_image, desired_image_size);
}
```
...and for the GPU:
```cpp
cv::cuda::GpuMat gpu_original_image;
gpu_original_image.upload(cpu_original_image);
cv::cuda::GpuMat gpu_resized_image; // allocated by the first resize(), then reused
for (size_t count = 0; count < number_of_times_to_iterate; count++)
{
    cv::cuda::resize(gpu_original_image, gpu_resized_image, desired_image_size);
}
```
The for() loop timing results are now:
- CPU: 5768.181 milliseconds (basically unchanged)
- GPU: 2827.898 milliseconds (from 9.7 seconds to 2.8 seconds)
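Dividing each loop total by the 10,000 iterations gives per-call figures. Attributing the whole GPU difference to allocation is an assumption: it presumes that moving the mat out of the loop was the only change between the two runs.

```latex
\begin{aligned}
\text{GPU per call, mat inside loop} &= 9678.054\ \text{ms} / 10\,000 \approx 0.968\ \text{ms}\\
\text{GPU per call, mat hoisted}     &= 2827.898\ \text{ms} / 10\,000 \approx 0.283\ \text{ms}\\
\text{CPU per call, mat hoisted}     &= 5768.181\ \text{ms} / 10\,000 \approx 0.577\ \text{ms}\\
\text{GPU allocation overhead per call} &\approx 0.968 - 0.283 = 0.685\ \text{ms}\\
\text{GPU speedup over CPU}          &= 5768.181 / 2827.898 \approx 2.04
\end{aligned}
```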
This looks much better! GPU resize is now faster than CPU resize, as long as you're doing lots of work on the GPU rather than a single resize, and as long as you don't keep re-allocating temporary GPU mats, which seems to be quite expensive.
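The hoisting trick works because `cv::Mat::create()` and `cv::cuda::GpuMat::create()` return without reallocating when the requested size and type already match. After the first resize, every later call writes into the same buffer. The sketch below is a stdlib-only analogue, not OpenCV code: `ReusableBuffer` and the two counting functions are hypothetical names. It counts how many allocations each loop shape performs.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for an OpenCV output mat: like create() in OpenCV,
// it only allocates when the requested size differs from the current one.
class ReusableBuffer {
public:
    void create(std::size_t bytes) {
        if (bytes == size_) {
            return;  // same size as before: keep the existing allocation
        }
        data_ = std::make_unique<unsigned char[]>(bytes);
        size_ = bytes;
        ++allocations_;
    }
    std::size_t allocations() const { return allocations_; }

private:
    std::unique_ptr<unsigned char[]> data_;
    std::size_t size_ = 0;
    std::size_t allocations_ = 0;
};

// Output buffer declared inside the loop: constructed and destroyed each time.
std::size_t allocations_inside_loop(std::size_t iterations, std::size_t bytes) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        ReusableBuffer resized;
        resized.create(bytes);
        total += resized.allocations();
    }
    return total;
}

// Output buffer hoisted out of the loop: allocated once, then reused.
std::size_t allocations_hoisted(std::size_t iterations, std::size_t bytes) {
    ReusableBuffer resized;
    for (std::size_t i = 0; i < iterations; ++i) {
        resized.create(bytes);
    }
    return resized.allocations();
}
```

For 10,000 iterations the first shape allocates 10,000 times and the second allocates once. On the GPU, each of those allocations (and the matching free) is a device-memory operation, which is consistent with the large difference measured above.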
But after all this, to go back to your original question: if all you are doing is resizing a single image once, or resizing many images once each, the GPU resize won't help you, because uploading each image to a GPU mat takes longer than the CPU resize itself! Here are my results when trying that on a Jetson NX:
- single image resize on CPU: 3.565 milliseconds
- upload mat to GPU: 186.966 milliseconds
- allocation of a second GPU mat plus the GPU resize: 225.925 milliseconds
So the NX's CPU does the resize in under 4 milliseconds, while the GPU path takes over 400 milliseconds (186.966 + 225.925 = 412.891 ms).
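The same trade-off can be written as a rough cost model. This is a hypothetical sketch, not something derived from the measurements above. `ResizeCosts` and `gpu_pays_off` are illustrative names, and two terms are assumptions to be filled in with your own numbers: the one-time `gpu_setup_ms`, and `download_ms` (copying results back to the host, which wasn't measured here).

```cpp
#include <cstddef>

// Hypothetical per-image costs, all in milliseconds.
struct ResizeCosts {
    double cpu_resize_ms;  // one CPU resize
    double gpu_setup_ms;   // one-time GPU cost per batch (assumption)
    double upload_ms;      // host-to-device copy of one image
    double gpu_resize_ms;  // one GPU resize
    double download_ms;    // device-to-host copy of one result (assumption)
};

double cpu_total_ms(const ResizeCosts& c, std::size_t images) {
    return c.cpu_resize_ms * static_cast<double>(images);
}

double gpu_total_ms(const ResizeCosts& c, std::size_t images) {
    const double per_image = c.upload_ms + c.gpu_resize_ms + c.download_ms;
    return c.gpu_setup_ms + per_image * static_cast<double>(images);
}

// True when the GPU path is cheaper overall for this batch size.
bool gpu_pays_off(const ResizeCosts& c, std::size_t images) {
    return gpu_total_ms(c, images) < cpu_total_ms(c, images);
}
```

Suppose you plug in the single-image numbers above (`cpu_resize_ms = 3.565`, `upload_ms = 186.966`, `gpu_resize_ms = 225.925`, everything else 0) and assume that whole cost recurs for every image. Then the GPU path costs about 413 ms per image and `gpu_pays_off` is false for any batch size. The GPU can only win when its per-image cost (upload + resize + download) drops below the CPU's, and even then only after enough images to amortize `gpu_setup_ms`.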
Answer from Stéphane on Stack Overflow