To illustrate, I've opened up this same color JPEG image:

once using the conversion
img = cv2.imread(path)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
and once by loading it in grayscale mode
img_gray_mode = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
As you've documented, the diff between the two images is not exactly zero; I can see differing pixels towards the left and the bottom.

I've summed up the diff too, to quantify it:
import numpy as np
diff = cv2.absdiff(img_gray, img_gray_mode)
np.sum(diff)
# I got 6143, on a 494 x 750 image
I tried all cv2.imread() modes
Among all the IMREAD_ modes for cv2.imread(), only IMREAD_COLOR and IMREAD_ANYCOLOR can be converted using COLOR_BGR2GRAY, and both of them gave me the same diff against the image opened in IMREAD_GRAYSCALE.
The difference doesn't seem that big. My guess is it comes from the differences in the numeric calculations in the two methods (loading grayscale vs. converting to grayscale).
Naturally, what you want to avoid is fine-tuning your code on a particular version of the image, just to find out it was suboptimal for images coming from a different source.
In brief, let's not mix the versions and types in the processing pipeline.
So I'd keep the image sources homogeneous, e.g. if you're capturing images from a video camera in BGR, I'd use BGR as the source and do the BGR-to-grayscale conversion myself: cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Conversely, if my ultimate source is grayscale, I'd open both the files and the video capture in grayscale: cv2.imread(path, cv2.IMREAD_GRAYSCALE)
cv2.imread(path, 0) (or rather cv2.imread(path, cv2.IMREAD_GRAYSCALE)) asks the image file reading code to load the image as grayscale. For most file types, some 3rd party library is used to read them. If this library supports grayscale conversion, we’ll be using that library’s conversion routine.
Otherwise, we’re using OpenCV’s implementation of the conversion to grayscale.
There’s no reason to assume one implementation is better than the other; the differences observed are likely due to a different computation order, or to a different assumption about the whitepoint. But note that if the file has a color profile embedded, the 3rd party library might be able to use it to do the conversion, and so will have whitepoint information available. This color profile does not get loaded into OpenCV, so OpenCV will always make an assumption about the whitepoint.
Reading the image directly as grayscale is possibly a bit more efficient. If the 3rd party library converts each pixel as it’s read, we won’t need a temporary memory space to store the full color image (which takes up 3x as much memory as the grayscale image). For a format such as JPEG, which stores intensity and color information separately, reading as grayscale also avoids a lot of computation (we’re directly outputting the intensity value, rather than computing the RGB values and then converting those back to intensity).
Reading directly as grayscale can also give different results if the image file gets converted to a different format, as a different grayscale conversion will then be used. Say you convert your JPEG file to PNG: IMREAD_GRAYSCALE will then use a different library to do the grayscale conversion, whereas using OpenCV’s conversion code will ensure both files read identically.