A list of lists named xss can be flattened using a nested list comprehension:
flat_list = [
    x
    for xs in xss
    for x in xs
]
The above is equivalent to:
flat_list = []
for xs in xss:
    for x in xs:
        flat_list.append(x)
Here is the corresponding function:
def flatten(xss):
    return [x for xs in xss for x in xs]
This is much faster than the alternatives based on +, such as sum or reduce.
As evidence, using the timeit module in the standard library, we see:
$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop
Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists. As the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; the total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * L * (L-1) / 2, which is roughly M * (L**2)/2.
The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
Answer from Alex Martelli on Stack Overflow
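The quadratic growth described above can be checked with a small counting sketch. The helper names below are hypothetical and not part of the original answer, and the count includes every element write, including the first copy of each sublist, so it comes to M * L * (L+1) / 2: the same quadratic order as the estimate above.

```python
def copies_with_repeated_concat(xss):
    """Count element writes when lists are joined as result = result + xs."""
    copies = 0
    result_len = 0
    for xs in xss:
        # result + xs allocates a new list and writes every element
        # of both operands into it.
        result_len += len(xs)
        copies += result_len
    return copies


def copies_with_comprehension(xss):
    """The comprehension writes each element into the result exactly once."""
    return sum(len(xs) for xs in xss)


L, M = 4, 3
xss = [[0] * M for _ in range(L)]
print(copies_with_repeated_concat(xss))  # 30  (3 + 6 + 9 + 12)
print(copies_with_comprehension(xss))    # 12  (L * M)
```

Doubling L roughly quadruples the first number but only doubles the second.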
You can use itertools.chain():
>>> import itertools
>>> list2d = [[1,2,3], [4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain(*list2d))
Or you can use itertools.chain.from_iterable() which doesn't require unpacking the list with the * operator:
>>> import itertools
>>> list2d = [[1,2,3], [4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain.from_iterable(list2d))
This approach is arguably more readable than [item for sublist in l for item in sublist] and appears to be faster too:
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools' 'list(itertools.chain.from_iterable(l))'
20000 loops, best of 5: 10.8 usec per loop
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 5: 21.7 usec per loop
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 5: 258 usec per loop
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;from functools import reduce' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 5: 292 usec per loop
$ python3 --version
Python 3.7.5rc1
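Timings like these depend on the Python version and the machine, so it may be worth re-running them locally. Below is a minimal sketch using the timeit module from a script instead of the command line. The candidates dictionary and the loop counts are assumptions chosen for illustration, and the lambda wrappers add a small constant call overhead to every measurement.

```python
import itertools
import timeit
from functools import reduce

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

# Each candidate is wrapped in a zero-argument callable for timeit.
candidates = {
    "chain.from_iterable": lambda: list(itertools.chain.from_iterable(l)),
    "list comprehension": lambda: [item for sublist in l for item in sublist],
    "sum": lambda: sum(l, []),
    "reduce": lambda: reduce(lambda x, y: x + y, l),
}

expected = candidates["list comprehension"]()
timings = {}
for name, func in candidates.items():
    # All methods must agree before their speed is compared.
    assert func() == expected
    # Best of 3 runs of 200 loops each, reported per loop.
    timings[name] = min(timeit.repeat(func, number=200, repeat=3)) / 200

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {seconds * 1e6:8.1f} usec per loop")
```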
Flatten an array in Python - Code Review Stack Exchange
algorithm - What is the fastest way to flatten arbitrarily nested lists in Python? - Stack Overflow
python - Flattening and unflattening a nested list of numpy arrays - Stack Overflow
Flatten multi dimensional array in python 3 - Stack Overflow
This is a list of numbers:
Input: L = [1, [2], [3, 4, [5]]]
Output: [1, 2, 3, 4, 5]
What would be the most optimal and Pythonic way to do this?
Your code looks fine; however, to improve it, you should use a proper test framework such as pytest or unittest. To demonstrate, here is your code using pytest, with the tests written properly (you don't need to test every specific item):
def flatten(input_array):
    result_array = []
    for element in input_array:
        if isinstance(element, int):
            result_array.append(element)
        elif isinstance(element, list):
            result_array += flatten(element)
    return result_array


def test01():
    results = flatten([1, [2, 3, [4]], 5, [[6]]])
    assert results == [1, 2, 3, 4, 5, 6]


def test02():
    results = flatten([1, [2, 3, [4], []], [], 5, [[], [6]]])
    assert results == [1, 2, 3, 4, 5, 6]
And here are the results:
C:\PycharmProjects\codereview\tests>pytest scratch_14.py
======================== test session starts ========================
platform win32 -- Python 3.7.0, pytest-3.6.2, py-1.5.4, pluggy-0.6.0
rootdir: C:\PycharmProjects\codereview\tests, inifile:
plugins: cov-2.5.1, celery-4.2.0
collected 2 items
scratch_14.py .. [100%]
===================== 2 passed in 0.09 seconds ======================
This is much easier to set up, and less code to write to validate if the solution is correct.
You asked: Is usage of TypeError exception justified?
I don't actually see any code referencing a type error. Did you forget to put it in? Or are you referring to the use of isinstance? If so, that code is fine.
Hope this helps!
Your function only deals with ints and lists. While that may be fine in the context of the question, it doesn't feel Pythonic at all, as it disregards any other kind of iterable and any other type of data:
>>> flatten([1, (2, 3), [4.5], 6])
[1, 6]
Instead, you could make use of the iterator protocol to have a generic flatten function:
def flatten(iterable):
    try:
        iterator = iter(iterable)
    except TypeError:
        yield iterable
    else:
        for element in iterator:
            yield from flatten(element)
Usage being:
>>> list(flatten([1, (2, 3), [4.5], 6]))
[1, 2, 3, 4.5, 6]
However, there are two potential issues with this approach:

First, you may not like that flatten is now a generator. If so, change it to a helper function and wrap it with a call to list:

def _flatten_generator(iterable):
    # previous code

def flatten(iterable):
    return list(_flatten_generator(iterable))

Second, you won't be able to handle strings at all: iterating over a string yields individual characters, which are themselves strings, so the recursion never terminates and you will run into a:

RecursionError: maximum recursion depth exceeded while calling a Python object

So you may want to add an explicit check for str at the beginning of the function.
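A minimal sketch of that explicit check follows. The name flatten_with_str_guard is hypothetical, and treating bytes the same way as str is an added assumption, not something the answer states.

```python
def flatten_with_str_guard(iterable):
    # Assumption: str and bytes are treated as atoms, because iterating
    # over a one-character string yields that same string again.
    if isinstance(iterable, (str, bytes)):
        yield iterable
        return
    try:
        iterator = iter(iterable)
    except TypeError:
        # Not iterable at all: yield the value itself.
        yield iterable
    else:
        for element in iterator:
            yield from flatten_with_str_guard(element)


print(list(flatten_with_str_guard([1, "ab", (2, [3.5, "c"])])))
# [1, 'ab', 2, 3.5, 'c']
```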
Here's a recursive approach that is string friendly:
nests = [1, 2, [3, 4, [5], ['hi']], [6, [[[7, 'hello']]]]]

def flatten(container):
    for i in container:
        if isinstance(i, (list, tuple)):
            # Recurse into nested lists and tuples; strings are left intact.
            for j in flatten(i):
                yield j
        else:
            yield i

print(list(flatten(nests)))
returns:
[1, 2, 3, 4, 5, 'hi', 6, 7, 'hello']
Note, this doesn't make any guarantees for speed or overhead use, but illustrates a recursive solution that hopefully will be helpful.
It doesn't have to be recursive. In fact, an iterative solution is often faster because of the overhead involved in function calls. Here's an iterative version I wrote a while back:
def flatten(items, seqtypes=(list, tuple)):
    for i, x in enumerate(items):
        while i < len(items) and isinstance(items[i], seqtypes):
            items[i:i+1] = items[i]
    return items
Haven't tested the performance of this specific implementation, but it is probably not so great because of all the slice assignments, which could end up moving a lot of memory around. Still, don't assume it has to be recursive, or that it's simpler to write it that way.
This implementation does have the advantage of flattening the list "in place" rather than returning a copy, as recursive solutions invariably do. This could be useful when memory is tight. If you want a flattened copy, just pass in a shallow copy of the list you want to flatten:
flatten(mylist) # flattens existing list
newlist = flatten(mylist[:]) # makes a flattened copy
Also, this algorithm is not limited by the Python recursion limit because it's not recursive. I'm certain this will virtually never come into play, however.
2021 edit: it occurs to me that the check for the end of the list might be better handled with try/except because it will happen only once, and getting the test out of the main loop could provide a performance benefit. That would look like:
def flatten(items, seqtypes=(list, tuple)):
    try:
        for i, x in enumerate(items):
            while isinstance(items[i], seqtypes):
                items[i:i+1] = items[i]
    except IndexError:
        pass
    return items
With some further tweaking to use the x returned by enumerate instead of accessing items[i] so much, you get this, which is either mildly or significantly faster than the original version up top, depending on the size and structure of your lists.
def flatten(items, seqtypes=(list, tuple)):
    try:
        for i, x in enumerate(items):
            while isinstance(x, seqtypes):
                items[i:i+1] = x
                x = items[i]
    except IndexError:
        pass
    return items
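The in-place behavior, the shallow-copy idiom, and the independence from the recursion limit can be illustrated with a short sketch. The function body repeats the last version above under the hypothetical name flatten_in_place, and the sample data is made up for the example.

```python
def flatten_in_place(items, seqtypes=(list, tuple)):
    # Same logic as the last version above, renamed for this example.
    try:
        for i, x in enumerate(items):
            while isinstance(x, seqtypes):
                items[i:i+1] = x
                x = items[i]
    except IndexError:
        pass
    return items


mylist = [1, [2, [3, []]], (4,), []]

# Passing a shallow copy leaves the original untouched.
copy_result = flatten_in_place(mylist[:])
print(copy_result)  # [1, 2, 3, 4]
print(mylist)       # [1, [2, [3, []]], (4,), []]

# Passing the list itself flattens it in place and returns the same object.
same = flatten_in_place(mylist)
print(same is mylist)  # True
print(mylist)          # [1, 2, 3, 4]

# Nesting far deeper than the default recursion limit is not a problem.
deep = [0]
for _ in range(5000):
    deep = [deep]
deep_result = flatten_in_place(deep)
print(deep_result)  # [0]
```

Note that the top-level argument has to be a mutable list, since the function relies on slice assignment.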
I was looking for a solution to flatten and unflatten nested lists of numpy arrays, but only found this unanswered question, so I came up with this:
import numpy as np


def _flatten(values):
    if isinstance(values, np.ndarray):
        yield values.flatten()
    else:
        for value in values:
            yield from _flatten(value)


def flatten(values):
    # flatten nested lists of np.ndarray to np.ndarray
    return np.concatenate(list(_flatten(values)))


def _unflatten(flat_values, prototype, offset):
    if isinstance(prototype, np.ndarray):
        shape = prototype.shape
        # np.prod replaces np.product, which was removed in NumPy 2.0;
        # int() keeps the offset usable as a slice index
        new_offset = offset + int(np.prod(shape))
        value = flat_values[offset:new_offset].reshape(shape)
        return value, new_offset
    else:
        result = []
        for value in prototype:
            value, offset = _unflatten(flat_values, value, offset)
            result.append(value)
        return result, offset


def unflatten(flat_values, prototype):
    # unflatten np.ndarray to nested lists with structure of prototype
    result, offset = _unflatten(flat_values, prototype, 0)
    assert offset == len(flat_values)
    return result
Example:
a = [
    np.random.rand(1),
    [
        np.random.rand(2, 1),
        np.random.rand(1, 2, 1),
    ],
    [[]],
]

b = flatten(a)

# 'c' will have values of 'b' and structure of 'a'
c = unflatten(b, a)
Output:
a:
[array([ 0.26453544]), [array([[ 0.88273824],
[ 0.63458643]]), array([[[ 0.84252894],
[ 0.91414218]]])], [[]]]
b:
[ 0.26453544 0.88273824 0.63458643 0.84252894 0.91414218]
c:
[array([ 0.26453544]), [array([[ 0.88273824],
[ 0.63458643]]), array([[[ 0.84252894],
[ 0.91414218]]])], [[]]]
License: WTFPL
Here is what I came up with, which turned out to be ~30x faster than iterating over the nested list and loading individually.

import itertools
import numpy as np


def flatten(nl):
    # lengths of the innermost lists and of the first-level sublists
    l1 = [len(s) for s in itertools.chain.from_iterable(nl)]
    l2 = [len(s) for s in nl]

    nl = list(itertools.chain.from_iterable(
        itertools.chain.from_iterable(nl)))

    return nl, l1, l2


def reconstruct(nl, l1, l2):
    return np.split(np.split(nl, np.cumsum(l1)), np.cumsum(l2))[:-1]


L_flat, l1, l2 = flatten(L)
L_reconstructed = reconstruct(L_flat, l1, l2)

A better solution would work iteratively for an arbitrary number of nested levels.
Since you're using Python 3, you can take advantage of yield from with a recursive function. It was introduced in Python 3.3.
As a bonus, you can flatten arbitrary nested lists, tuples, sets or ranges:
test_list = [1, [1], [12, 'test', set([3, 4, 5])], 2, 3, ('hello', 'world'), [range(3)]]

def flatten(something):
    if isinstance(something, (list, tuple, set, range)):
        for sub in something:
            yield from flatten(sub)
    else:
        yield something

print(list(flatten(test_list)))
# [1, 1, 12, 'test', 3, 4, 5, 2, 3, 'hello', 'world', 0, 1, 2]
print(list(flatten('Not a list')))
# ['Not a list']
print(list(flatten(range(10))))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Here's another example with a debug line:
def flatten(something, level=0):
    print("%sCalling flatten with %r" % (' ' * level, something))
    if isinstance(something, (list, tuple, set, range)):
        for sub in something:
            yield from flatten(sub, level+1)
    else:
        yield something

list(flatten([1, [2, 3], 4]))
#Calling flatten with [1, [2, 3], 4]
# Calling flatten with 1
# Calling flatten with [2, 3]
#  Calling flatten with 2
#  Calling flatten with 3
# Calling flatten with 4
If the sublists always contain exactly one item, then:
flatList = [item[0] if isinstance(item, list) else item for item in testList]
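As a quick illustration of the one-liner above, here is a sketch with made-up sample data. The names test_list and flat_list are hypothetical snake_case stand-ins for testList and flatList.

```python
test_list = [1, [2], [3], 4]

# Take the single element out of each one-item sublist; pass other items through.
flat_list = [item[0] if isinstance(item, list) else item for item in test_list]
print(flat_list)  # [1, 2, 3, 4]
```

This only works under the stated assumption: an empty sublist raises IndexError, and any items after the first in a longer sublist are silently dropped.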