I think I understand what you are trying to do. I have no idea of the relative performance of our two machines so maybe you can benchmark it yourself.
from PIL import Image
import numpy as np
# Load images, convert to RGB, then to numpy arrays and ravel into long, flat things
a=np.array(Image.open('a.png').convert('RGB')).ravel()
b=np.array(Image.open('b.png').convert('RGB')).ravel()
# Calculate the sum of the absolute differences divided by number of elements
MAE = np.sum(np.abs(np.subtract(a, b, dtype=np.float64))) / a.shape[0]
The only "tricky" thing in there is forcing the result type of np.subtract() to a float, which ensures negative differences can be stored. (Note that the bare np.float alias was deprecated in NumPy 1.20 and removed in 1.24; use np.float64 or plain float instead.) It may be worth trying dtype=np.int16 on your hardware to see if that is faster.
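To see why the dtype matters, here is a small self-contained check (using made-up 8-bit arrays rather than real image data): subtracting uint8 arrays wraps around modulo 256, while forcing a wider signed dtype keeps the true differences.

```python
import numpy as np

# Two tiny fake "image" arrays (uint8, as PIL would produce)
a = np.array([10, 200, 30], dtype=np.uint8)
b = np.array([20, 100, 25], dtype=np.uint8)

# Plain uint8 subtraction wraps around: 10 - 20 becomes 246
wrapped = a - b
# Forcing a signed/wider dtype keeps true signed differences
correct = np.subtract(a, b, dtype=np.int16)

mae = np.sum(np.abs(correct)) / a.shape[0]
print(wrapped)   # [246 100   5]
print(correct)   # [-10 100   5]
print(mae)       # 38.333...
```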
A fast way to benchmark it is as follows. Start ipython and then type in the following:
from PIL import Image
import numpy as np
a=np.array(Image.open('a.png').convert('RGB')).ravel()
b=np.array(Image.open('b.png').convert('RGB')).ravel()
Now you can time my code with:
%timeit np.sum(np.abs(np.subtract(a,b,dtype=np.float64))) / a.shape[0]
6.72 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Or, you can try an int16 version like this:
%timeit np.sum(np.abs(np.subtract(a,b,dtype=np.int16))) / a.shape[0]
6.43 µs ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If you want to time your code, paste in your function then use:
%timeit compare_images_pil(img1, img2)
Answer from Mark Setchell on Stack Overflow.
Digging a bit, I found this repository, which takes a different approach based more on Pillow itself and seems to give similar results.
from PIL import Image, ImageChops, ImageStat
def compare_images_pil(img1, img2):
    '''Calculate the difference between two images of the same size
    by comparing channel values at the pixel level.
    Adapted from Nicolas Hahn:
    https://github.com/nicolashahn/diffimg/blob/master/diffimg/__init__.py
    '''
    # Don't compare if images have different modes, sizes or bands.
    if (img1.mode != img2.mode) \
            or (img1.size != img2.size) \
            or (img1.getbands() != img2.getbands()):
        return None
    # Generate diff image in memory.
    diff_img = ImageChops.difference(img1, img2)
    # Calculate difference as a ratio.
    stat = ImageStat.Stat(diff_img)
    # stat.mean can be [r,g,b] or [r,g,b,a].
    sum_channel_values = sum(stat.mean)
    max_all_channels = len(stat.mean) * 255
    diff_ratio = sum_channel_values / max_all_channels
    return diff_ratio * 100
For my sample of test images, the results come out the same (apart from minor float rounding differences), and it runs considerably faster than the first version above.
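The ratio that function computes is just the mean absolute channel difference divided by 255, averaged over channels. Here is a Pillow-free sanity check of the same formula on made-up pixel data (two RGB pixels, values chosen arbitrarily):

```python
# Two "images" of two RGB pixels each, as plain tuples
img1 = [(255, 0, 0), (0, 255, 0)]
img2 = [(250, 0, 10), (0, 255, 0)]

n_pixels = len(img1)
n_channels = 3
# Per-channel mean absolute difference, mirroring ImageStat.Stat(diff).mean
channel_means = [
    sum(abs(p1[c] - p2[c]) for p1, p2 in zip(img1, img2)) / n_pixels
    for c in range(n_channels)
]
diff_ratio = sum(channel_means) / (n_channels * 255) * 100
print(diff_ratio)  # roughly 0.98 percent
```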
- python - Sum of absolute differences of a number in an array - Stack Overflow
- performance - Python pypy: Efficient sum of absolute array/vector difference - Stack Overflow
- python - Sum of absolute difference of values and corresponding indices of an array - Code Review Stack Exchange
- python - Absolute difference of two NumPy arrays - Stack Overflow
You can simply use NumPy functions to get the desired result:
np.sum(np.abs(np.diff(time)))
This works according to your desired formula even though np.diff calculates the difference time[i+1] - time[i] (instead of time[i] - time[i+1]) because you're using the absolute value.
Because this uses NumPy functions on a NumPy array it's probably much faster than any Python comprehension and/or functions.
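As a quick sanity check on a small made-up series:

```python
import numpy as np

time = np.array([0.0, 1.5, 1.0, 4.0])
# Consecutive differences are [1.5, -0.5, 3.0]; their absolute values sum to 5.0
total = np.sum(np.abs(np.diff(time)))
print(total)  # 5.0
```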
You want a list comprehension:
li = [abs(time[i-1] - t) for i, t in enumerate(time) if i > 0]
sum(li)
This builds a list of the differences between each element and the previous one, starting from the second element (index 1), then sums them.
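An equivalent and arguably more readable way to pair each element with its successor is to zip the list against a shifted copy; both forms give the same total on a sample list:

```python
time = [0.0, 1.5, 1.0, 4.0]

# Index-based version from the answer above
li = [abs(time[i - 1] - t) for i, t in enumerate(time) if i > 0]
# Pairwise version: zip each element with its successor
via_zip = sum(abs(prev - cur) for prev, cur in zip(time, time[1:]))

print(sum(li), via_zip)  # 5.0 5.0
```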
I can offer an O(n log n) solution for a start. Let fi be the i-th number of the result. We have:

fi = sum_{j < i, aj <= ai} (ai - aj) + sum_{j < i, aj > ai} (aj - ai)

Walking through the array from left to right while maintaining a binary search tree of the elements a0 to ai-1, we can evaluate all parts of the formula in O(log n):
- Keep subtree sizes to count the elements larger than/smaller than a given one
- Keep cumulative subtree sums to answer the sum queries for elements larger than/smaller than a given one
We can replace the augmented search tree with some simpler data structures if we want to avoid the implementation cost:
- Sort the array beforehand. Assign every number its rank in the sorted order
- Keep a binary indexed tree of 0/1 values to calculate the number of elements smaller than a given value
- Keep another binary indexed tree of the array values to calculate the sums of elements smaller than a given value
TBH I don't think this can be solved in O(n) in the general case. At the very least you would need to sort the numbers at some point. But maybe the numbers are bounded or you have some other restriction, so you might be able to implement the sum and count operations in O(1).
An implementation:
# binary-indexed tree, allows point updates and prefix sum queries
class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)
        self.n = n

    def update_point(self, i, val):  # O(log n)
        i += 1
        while i <= self.n:
            self.tree[i] += val
            i += i & -i

    def read_prefix(self, i):  # O(log n)
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def solve(a):
    rank = {v: i for i, v in enumerate(sorted(a))}
    res = []
    counts, sums = Fenwick(len(a)), Fenwick(len(a))
    total_sum = 0
    for i, x in enumerate(a):
        r = rank[x]
        num_smaller = counts.read_prefix(r)
        sum_smaller = sums.read_prefix(r)
        res.append(total_sum - 2 * sum_smaller + x * (2 * num_smaller - i))
        counts.update_point(r, 1)
        sums.update_point(r, x)
        total_sum += x
    return res

print(solve([3, 5, 6, 7, 1]))  # [0, 2, 4, 7, 17]
print(solve([2, 0, 1]))        # [0, 2, 2]
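A quick way to gain confidence in the Fenwick version is to check it against the obvious O(n^2) brute force (a testing sketch of mine, not part of the original answer):

```python
def solve_naive(a):
    # f_i = sum of |a_i - a_j| over all j < i, computed directly in O(n^2)
    return [sum(abs(x - y) for y in a[:i]) for i, x in enumerate(a)]

print(solve_naive([3, 5, 6, 7, 1]))  # [0, 2, 4, 7, 17]
print(solve_naive([2, 0, 1]))        # [0, 2, 2]
```

Pasting this next to solve() lets you assert solve(a) == solve_naive(a) on random inputs.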
Here's an Omega(n log n)-comparison lower bound in the linear decision tree model. This rules out the possibility of a "nice" o(n log n)-time algorithm (two now-deleted answers were both in this class).
There is a trivial reduction to this problem from the problem of computing
f(x1, ..., xn) = sum_i sum_j |xi - xj|.
The function f is totally differentiable at x1, ..., xn if and only if x1, ..., xn are pairwise distinct. The set where f is totally differentiable thus has n! connected components (one per ordering of the inputs), and each leaf of the decision tree can handle at most one of them, so the tree needs at least n! leaves and hence depth Omega(log n!) = Omega(n log n).
If you want the absolute element-wise difference between both matrices, you can easily subtract them with NumPy and use numpy.absolute on the resulting matrix.
import numpy as np
X = [[12,7,3],
[4 ,5,6],
[7 ,8,9]]
Y = [[5,8,1],
[6,7,3],
[4,5,9]]
result = np.absolute(np.array(X) - np.array(Y))
Outputs:
[[7 1 2]
[2 2 3]
[3 3 0]]
Alternatively (although unnecessary), if you were required to do so in native Python, you could zip the rows together in a nested list comprehension.
result = [[abs(a-b) for a, b in zip(xrow, yrow)]
for xrow, yrow in zip(X,Y)]
Outputs:
[[7, 1, 2], [2, 2, 3], [3, 3, 0]]
Doing this becomes trivial if you cast your 2D arrays to numpy arrays:
import numpy as np
X = [[12, 7, 3],
[4, 5, 6],
[7, 8, 9]]
Y = [[5, 8, 1],
[6, 7, 3],
[4, 5, 9]]
X, Y = map(np.array, (X, Y))
result = X - Y
Numpy is designed to work easily and efficiently with matrices.
Also, you spoke about subtracting matrices, but you also seemed to want to square the individual elements and then take the square root on the result. This is also easy with numpy:
result = np.sqrt((X ** 2) - (Y ** 2))
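One caveat with that last expression (my note, not part of the original answer): the difference of squares can be negative, and np.sqrt of a negative entry yields nan. If the goal was actually a root of squared differences, np.sqrt((X - Y) ** 2) is safe and equals np.abs(X - Y):

```python
import numpy as np

X = np.array([[12, 7, 3], [4, 5, 6], [7, 8, 9]])
Y = np.array([[5, 8, 1], [6, 7, 3], [4, 5, 9]])

# Difference of squares: nan wherever X**2 < Y**2 (emits a RuntimeWarning)
unsafe = np.sqrt((X ** 2) - (Y ** 2))
# Root of squared differences: always defined, same as np.abs(X - Y)
safe = np.sqrt((X - Y) ** 2)

print(np.isnan(unsafe).any())  # True
print(safe)
```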