This is known as the subset-sum problem, and it is a well-known NP-complete problem, so there is no known efficient (polynomial-time) algorithm for it in general. See for example https://en.wikipedia.org/wiki/Subset_sum_problem
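By contrast, the obvious exact method tries every subset, which is exponential: 2**len(A) subsets. The following is a minimal brute-force sketch. It is not part of the original answer, and the function name `subset_sum_brute_force` is hypothetical:

```python
from itertools import combinations

def subset_sum_brute_force(A, N):
    """Return the first sub-multiset of A (as a list) summing to N, or None.

    Tries subsets by increasing size; in the worst case all 2**len(A) are checked.
    """
    for r in range(len(A) + 1):
        for combo in combinations(A, r):
            if sum(combo) == N:
                return list(combo)
    return None

A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
print(subset_sum_brute_force(A, 183))
```

With 15 numbers this is only 32768 subsets, but every extra number doubles the work.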
However, if your number N is not too large, there is a pseudo-polynomial algorithm based on dynamic programming. Its cost grows with len(A) * N rather than with 2**len(A). Read the list A from left to right and keep the list of sums that are reachable and smaller than N. If you know which sums are reachable using a given A, you can easily get those reachable using A + [a]: they are the old sums plus each old sum increased by a. Hence the dynamic programming. It will typically be fast enough for a problem of the size you gave.
Here is a quick Python solution:

```python
def subsetsum(A, N):
    # res maps each reachable sum smaller than N to one list of elements giving it
    res = {0: []}
    for i in A:
        newres = dict(res)
        for v, l in res.items():
            if v + i < N:
                newres[v + i] = l + [i]
            elif v + i == N:
                return l + [i]
        res = newres
    return None
```

Then:

```
>>> A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
>>> subsetsum(A, 183)
[15, 15, 33, 36, 39, 45]
```
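The "keep the set of reachable sums" idea can also be sketched with a Python integer used as a bitset. Bit s is set when some subset sums to s, and processing a number a means OR-ing the bitset with itself shifted left by a. This is an alternative illustration of the same dynamic programming, not the answer's code, and the name `reachable_sums` is hypothetical. It only tells you which totals are reachable, not which subset reaches them:

```python
def reachable_sums(A, N):
    """Return the set of subset sums of A that are <= N, using an int as a bitset."""
    mask = (1 << (N + 1)) - 1   # keep only bits 0..N
    reach = 1                   # initially only the empty sum 0 is reachable
    for a in A:
        # adding a to every reachable sum is a left shift by a
        reach = (reach | (reach << a)) & mask
    return {s for s in range(N + 1) if reach >> s & 1}

A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
print(183 in reachable_sums(A, 183))  # True (15 + 15 + 33 + 36 + 39 + 45 == 183)
```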
After OP edit:

Now that I understand your problem correctly, I still think it can be solved efficiently, provided you have an efficient subset-sum solver. I'd use a divide-and-conquer approach on B:

- cut B into two approximately equal pieces B1 and B2;
- use your subset-sum solver to find, among A, all subsets S whose sum is equal to sum(B1);
- for each such S:
  - recursively call solve(S, B1) and solve(A - S, B2);
  - if both succeed, you have a solution.
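The steps above can be sketched as follows. This assumes positive integers and sum(A) == sum(B). `subsets_with_sum` and `solve_dc` are hypothetical names, and the subset enumerator is plain backtracking standing in for "your subset-sum solver". This sketch returns the first split found rather than all of them:

```python
def subsets_with_sum(items, target):
    """Yield index tuples of every subset of `items` whose values sum to `target`.

    Plain backtracking (assumes positive integers); a stand-in for a real solver.
    """
    chosen = []

    def rec(start, remaining):
        if remaining == 0:
            yield tuple(chosen)
            return
        for j in range(start, len(items)):
            if items[j] <= remaining:
                chosen.append(j)
                yield from rec(j + 1, remaining - items[j])
                chosen.pop()

    yield from rec(0, target)


def solve_dc(A, B):
    """Return one split of A into groups with sums B (in order), or None."""
    if not B:
        return [] if not A else None
    if len(B) == 1:
        return [list(A)] if sum(A) == B[0] else None
    B1, B2 = B[:len(B) // 2], B[len(B) // 2:]          # cut B in two halves
    for idx in subsets_with_sum(A, sum(B1)):            # every S with sum(S) == sum(B1)
        picked = set(idx)
        S = [A[j] for j in idx]
        rest = [a for j, a in enumerate(A) if j not in picked]   # A - S
        left = solve_dc(S, B1)
        if left is None:
            continue
        right = solve_dc(rest, B2)
        if right is not None:                           # both halves succeeded
            return left + right
    return None


A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
B = [183, 36, 231, 128, 137]
print(solve_dc(A, B))
```

To collect every solution, you would combine each left result with each right result instead of returning the first pair.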
However, your (71, 10) problem below is out of reach for the dynamic programming solution I suggested.
By the way, here is a quick solution to your problem. It does not use divide and conquer, but it does contain the correct adaptation of my dynamic-programming solver to get all solutions. A must be sorted, because list_difference works like a merge:
```python
from collections import defaultdict

class NotFound(Exception):
    pass

def subset_all_sums(A, N):
    # res maps each reachable sum v <= N to the set of all tuples of
    # elements of A (taken in the order of A) that add up to v
    res = defaultdict(set, {0: {()}})
    for i in A:
        # copy res, duplicating each set so that res itself is not modified
        newres = defaultdict(set)
        for v, l in res.items():
            newres[v] |= set(l)
        # extend every known subset with the new element i
        for v, l in res.items():
            if v + i <= N:
                for s in l:
                    newres[v + i].add(s + (i,))
        res = newres
    return res[N]

def list_difference(l1, l2):
    # Multiset difference l1 - l2 of two sorted lists, similar to a merge.
    # Raises NotFound if l2 is not contained in l1.
    res = []
    i1 = i2 = 0
    while i1 < len(l1) and i2 < len(l2):
        if l1[i1] == l2[i2]:
            i1 += 1
            i2 += 1
        elif l1[i1] < l2[i2]:
            res.append(l1[i1])
            i1 += 1
        else:
            raise NotFound
    if i2 < len(l2):
        raise NotFound
    res.extend(l1[i1:])
    return res

def solve(A, B):
    # A must be sorted so that list_difference works
    assert sum(A) == sum(B)
    if not B:
        return [[]]
    res = []
    for s in subset_all_sums(A, B[0]):
        rem = list_difference(A, s)
        for sol in solve(rem, B[1:]):
            res.append([s] + sol)
    return res
```
Then:

```
>>> B = [183, 36, 231, 128, 137]
>>> solve(A, B)
[[(15, 33, 39, 96), (36,), (8, 15, 60, 68, 80), (9, 46, 73), (45, 92)],
 [(15, 33, 39, 96), (36,), (8, 9, 15, 46, 73, 80), (60, 68), (45, 92)],
 [(8, 15, 15, 33, 39, 73), (36,), (9, 46, 80, 96), (60, 68), (45, 92)],
 [(15, 15, 73, 80), (36,), (8, 9, 33, 39, 46, 96), (60, 68), (45, 92)],
 [(15, 15, 73, 80), (36,), (9, 39, 45, 46, 92), (60, 68), (8, 33, 96)],
 [(8, 33, 46, 96), (36,), (9, 15, 15, 39, 73, 80), (60, 68), (45, 92)],
 [(8, 33, 46, 96), (36,), (15, 15, 60, 68, 73), (9, 39, 80), (45, 92)],
 [(9, 15, 33, 46, 80), (36,), (8, 15, 39, 73, 96), (60, 68), (45, 92)],
 [(45, 46, 92), (36,), (8, 15, 39, 73, 96), (60, 68), (9, 15, 33, 80)],
 [(45, 46, 92), (36,), (8, 15, 39, 73, 96), (15, 33, 80), (9, 60, 68)],
 [(45, 46, 92), (36,), (15, 15, 60, 68, 73), (9, 39, 80), (8, 33, 96)],
 [(45, 46, 92), (36,), (9, 15, 15, 39, 73, 80), (60, 68), (8, 33, 96)],
 [(9, 46, 60, 68), (36,), (8, 15, 39, 73, 96), (15, 33, 80), (45, 92)]]
>>> %timeit solve(A, B)
100 loops, best of 3: 10.5 ms per loop
```
So it is quite fast for this size of problem, though nothing is optimized here.
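To double-check any line of that output, the following hypothetical helper can be used. It is not part of the answer, and it verifies that the groups use exactly the elements of A (as a multiset) and that group k sums to B[k]:

```python
from collections import Counter

def is_valid_split(A, B, groups):
    """True when `groups` uses exactly the multiset A and group k sums to B[k]."""
    if [sum(g) for g in groups] != list(B):
        return False
    return Counter(x for g in groups for x in g) == Counter(A)

A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
B = [183, 36, 231, 128, 137]
first = [(15, 33, 39, 96), (36,), (8, 15, 60, 68, 80), (9, 46, 73), (45, 92)]
print(is_valid_split(A, B, first))  # True
```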
Answer from hivert on Stack Overflow
A complete solution, which computes all the ways to reach each total.
I use ints as characteristic sets (bit masks) for speed and memory usage: 19 = 0b10011 represents [A[0], A[1], A[4]] = [8, 9, 33] here.
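Here is a small sketch of this encoding. `encode` and `decode_mask` are hypothetical helper names, not taken from the answer. Bit i of the int is set when A[i] is in the subset, so two subsets are disjoint exactly when their masks AND to 0, and their union is the OR of the masks:

```python
A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]

def encode(indices):
    """Build the characteristic int: bit i is set when A[i] is in the subset."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask

def decode_mask(mask):
    """Return the values of A whose bits are set in `mask`, in index order."""
    return [a for i, a in enumerate(A) if mask >> i & 1]

print(bin(encode([0, 1, 4])))       # 0b10011
print(decode_mask(19))              # [8, 9, 33]
print(encode([0, 1]) & encode([4])) # 0, so the two subsets are disjoint
```

Because the two 15s sit at different indices, they get different bits, which is why this representation distinguishes them.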
```python
A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
B = [183, 36, 231, 128, 137]

def subsetsum(A, N):
    # res[n] is the list of bit masks of the subsets of A summing to n
    res = [[0]] + [[] for _ in range(N)]
    for i, a in enumerate(A):
        k = 1 << i
        # lengths before A[i] is processed, so that A[i] is used at most once
        stop = [len(l) for l in res]
        # max(0, ...) avoids a negative slice bound when a > N + 1
        for shift, l in enumerate(res[:max(0, N + 1 - a)]):
            ln = res[a + shift]
            for s in l[:stop[shift]]:
                ln.append(s + k)
    return res

res = subsetsum(A, max(B))
solB = [res[b] for b in B]
exactsol = (1 << len(A)) - 1   # mask with every element of A used

def decode(answer):
    return [[A[i] for i, b in enumerate(bin(sol)[::-1]) if b == '1'] for sol in answer]

def solve(i, currentsol, answer):
    if currentsol == exactsol:
        print(decode(answer))
    if i == len(B):
        return
    for sol in solB[i]:
        if not currentsol & sol:   # sol shares no element with those already used
            answer.append(sol)
            solve(i + 1, currentsol | sol, answer)
            answer.pop()
```
Then:

```
>>> solve(0, 0, [])
[[9, 46, 60, 68], [36], [8, 15, 39, 73, 96], [15, 33, 80], [45, 92]]
[[9, 46, 60, 68], [36], [8, 15, 39, 73, 96], [15, 33, 80], [45, 92]]
[[8, 15, 15, 33, 39, 73], [36], [9, 46, 80, 96], [60, 68], [45, 92]]
[[9, 15, 33, 46, 80], [36], [8, 15, 39, 73, 96], [60, 68], [45, 92]]
[[9, 15, 33, 46, 80], [36], [8, 15, 39, 73, 96], [60, 68], [45, 92]]
[[15, 15, 73, 80], [36], [9, 39, 45, 46, 92], [60, 68], [8, 33, 96]]
[[15, 15, 73, 80], [36], [8, 9, 33, 39, 46, 96], [60, 68], [45, 92]]
[[45, 46, 92], [36], [15, 15, 60, 68, 73], [9, 39, 80], [8, 33, 96]]
[[45, 46, 92], [36], [9, 15, 15, 39, 73, 80], [60, 68], [8, 33, 96]]
[[45, 46, 92], [36], [8, 15, 39, 73, 96], [60, 68], [9, 15, 33, 80]]
[[45, 46, 92], [36], [8, 15, 39, 73, 96], [15, 33, 80], [9, 60, 68]]
[[45, 46, 92], [36], [8, 15, 39, 73, 96], [60, 68], [9, 15, 33, 80]]
[[45, 46, 92], [36], [8, 15, 39, 73, 96], [15, 33, 80], [9, 60, 68]]
[[15, 33, 39, 96], [36], [8, 15, 60, 68, 80], [9, 46, 73], [45, 92]]
[[15, 33, 39, 96], [36], [8, 9, 15, 46, 73, 80], [60, 68], [45, 92]]
[[15, 33, 39, 96], [36], [8, 15, 60, 68, 80], [9, 46, 73], [45, 92]]
[[15, 33, 39, 96], [36], [8, 9, 15, 46, 73, 80], [60, 68], [45, 92]]
[[8, 33, 46, 96], [36], [15, 15, 60, 68, 73], [9, 39, 80], [45, 92]]
[[8, 33, 46, 96], [36], [9, 15, 15, 39, 73, 80], [60, 68], [45, 92]]
```
Notice that when the two 15s are not in the same subset, the solution is printed twice, once for each way of assigning the two copies.
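If you collect the decoded answers in a list instead of printing them, you can filter out the doubled solutions by comparing each group as a sorted tuple of values. That change to `solve` is assumed here and not shown. `dedupe` is a hypothetical helper, and the sample below is the first three printed lines. Applied to all 19 printed lines, it should leave 13 solutions, the same ones the first answer lists:

```python
def dedupe(solutions):
    """Drop solutions that differ only in which copy of a repeated value was used.

    Each solution is a list of groups; groups keep their order because
    group k corresponds to B[k].
    """
    seen = set()
    unique = []
    for sol in solutions:
        key = tuple(tuple(sorted(group)) for group in sol)
        if key not in seen:
            seen.add(key)
            unique.append(sol)
    return unique

found = [
    [[9, 46, 60, 68], [36], [8, 15, 39, 73, 96], [15, 33, 80], [45, 92]],
    [[9, 46, 60, 68], [36], [8, 15, 39, 73, 96], [15, 33, 80], [45, 92]],
    [[8, 15, 15, 33, 39, 73], [36], [9, 46, 80, 96], [60, 68], [45, 92]],
]
print(len(dedupe(found)))  # 2
```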
It solves the unique-solution problem below in one second:

```python
A = [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011,
     1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
     1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035,
     1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047,
     1048, 1049]
B = [5010, 5035, 5060, 5085, 5110, 5135, 5160, 5185, 5210, 5235]
```

Unfortunately, it is not yet optimized enough for a (71, 10) problem.
Yet another one, in the pure dynamic-programming spirit, using memoized recursion:
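```python
import functools

# A and B are the lists defined in the previous snippet.

@functools.lru_cache(maxsize=None)   # keep every result, no eviction
def solutions(n):
    # set of frozensets of indices into A whose values sum to n
    if n == 0:
        return {frozenset()}  # {{}}: only the empty set sums to 0
    if n < 0:
        return set()
    sols = set()
    for i, a in enumerate(A):
        for s in solutions(n - a):
            if i not in s:
                sols.add(s | {i})
    return sols

def decode(answer):
    return [[A[i] for i in sorted(sol)] for sol in answer]

def solve(B=B, currentsol=frozenset(), answer=()):
    # currentsol: indices already used; answer: one index set per entry of B so far
    if len(currentsol) == len(A):
        sols.append(decode(answer))
    if B:
        for sol in solutions(B[0]):
            if currentsol.isdisjoint(sol):
                solve(B[1:], currentsol | sol, answer + (sol,))

sols = []
solve()
```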
A common question in this subreddit is "I have a list of numbers and I want to see which of them add up to a specific total". There was one such post today.
This is something most people think should be fairly trivial to achieve in Excel. In reality, however, it ain't all that easy. The question is a variation on a well-known NP-complete problem in computer science called the subset sum problem.
It can be done with Solver, but Solver has a limit on the number of variables and will only return one possible solution.
As it is something that crops up so often, I thought I'd share a workbook I have that can calculate this. Click here to download it (xlsm file). This file uses VBA to do the calculation. It uses dynamic programming to reduce time complexity at the cost of space complexity, but given a big list of numbers it may still take too long to be feasible (or cause you to run out of stack space...).
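The workbook's VBA is not reproduced here. As an illustration of the same general idea (a table of reachable sums that uses memory to save time), here is a Python sketch. It is not the workbook's code, `find_subset` is a hypothetical name, and it assumes non-negative integers. This version is iterative, so recursion depth is not a concern in the sketch:

```python
def find_subset(values, total):
    """Return one subset of `values` summing to `total`, or None.

    can[i][s] is True when some subset of values[:i] sums to s. The table
    takes O(len(values) * total) memory: that is the space traded for time.
    """
    n = len(values)
    can = [[False] * (total + 1) for _ in range(n + 1)]
    can[0][0] = True
    for i, v in enumerate(values, start=1):
        for s in range(total + 1):
            can[i][s] = can[i - 1][s] or (s >= v and can[i - 1][s - v])
    if not can[n][total]:
        return None
    # Walk back through the table to recover which values were used.
    subset, s = [], total
    for i in range(n, 0, -1):
        if not can[i - 1][s]:  # s is unreachable without values[i-1], so it was used
            subset.append(values[i - 1])
            s -= values[i - 1]
    return subset[::-1]

print(find_subset([3, 34, 4, 12, 5, 2], 9))  # [4, 5]
```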
Hopefully this might help someone in the future.
There are doubtless other ways to do it in Excel, so if you have any I'd be interested to see them (especially interested to see if anyone can come up with a Power Query approach).
Also, happy 35th birthday to Excel!
Edit: changed Dropbox link to GitHub.