In order to shuffle the sequence uniformly, random.shuffle() needs to know how long the input is. A generator cannot provide this; you have to materialize it into a list:
lst = list(yielding(x))
random.shuffle(lst)
for i in lst:
    print(i)
You could, instead, use sorted() with random.random() as the key:
for i in sorted(yielding(x), key=lambda k: random.random()):
    print(i)
but since this also produces a list, there is little point in going this route.
Demo:
>>> import random
>>> x = [1,2,3,4,5,6,7,8,9]
>>> sorted(iter(x), key=lambda k: random.random())
[9, 7, 3, 2, 5, 4, 6, 1, 8]
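A minimal sketch of the point above. The helper name shuffled() and the stand-in generator count_up() are hypothetical, not part of the original answer. Passing a generator straight to random.shuffle() fails because a generator has no len(), while materializing it into a list first works:

```python
import random

def count_up(n):
    """Stand-in generator (hypothetical): yields 0..n-1."""
    for i in range(n):
        yield i

def shuffled(iterable):
    """Return a new shuffled list built from any iterable."""
    items = list(iterable)   # materialize, so the length is known
    random.shuffle(items)    # shuffles in place
    return items

try:
    random.shuffle(count_up(5))   # a generator has no len()
except TypeError as exc:
    print("cannot shuffle a generator directly:", exc)

print(shuffled(count_up(5)))
```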
Answer from Martijn Pieters on Stack Overflow
Depending on the case, if you know how much data you have ahead of time, you can index the data and compute/read from it based on a shuffled index. This amounts to: 'don't use a generator for this problem', and without specific use-cases it's hard to come up with a general method.
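One way to read the "shuffled index" idea, as a sketch that assumes the data supports len() and integer indexing (the name iter_shuffled is made up for illustration). Only the index permutation is built up front; the items themselves are read one at a time:

```python
import random

def iter_shuffled(seq, rng=random):
    """Yield the items of an indexable sequence in random order."""
    n = len(seq)
    # random.sample over range(n) with k == n gives a random permutation
    # of the indices without copying the data itself.
    for i in rng.sample(range(n), n):
        yield seq[i]

data = ["a", "b", "c", "d", "e"]
print(list(iter_shuffled(data)))
```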
Alternatively, if you need to use the generator, it depends on how shuffled you want the data. As others have pointed out, generators don't have a length, so at some point you need to evaluate the generator, which could be expensive. If you don't need perfect randomness, you can introduce a shuffle buffer:
from itertools import islice
import numpy as np
def shuffle(generator, buffer_size):
    while True:
        buffer = list(islice(generator, buffer_size))
        if len(buffer) == 0:
            break
        np.random.shuffle(buffer)
        for item in buffer:
            yield item
shuffled_generator = shuffle(my_generator, 256)
This will shuffle data in chunks of buffer_size, so you can avoid memory issues if that is your limiting factor. Of course, this is not a truly random shuffle, so it shouldn't be used on something that's sorted, but if you just need to add some randomness to your data this may be a good solution.
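A variation on the chunked buffer above, sketched with only the standard library. This is a different technique from the answer's code: a sliding buffer, where each incoming item replaces a randomly chosen buffered item that is emitted in its place. The name sliding_shuffle and the rng parameter are assumptions for illustration. Like the chunked version, it is not a uniform shuffle; an item can only appear early by at most buffer_size positions.

```python
import random

def sliding_shuffle(iterable, buffer_size, rng=random):
    """Approximately shuffle a stream using a fixed-size sliding buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer = []
    for item in iterable:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered item
        buffer[idx] = item               # and put the new item in its slot
    rng.shuffle(buffer)                  # flush what is left in random order
    yield from buffer

print(list(sliding_shuffle(range(20), 5)))
```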
While I do agree with others that Solution 2 is more readable with some improvements, there are also a few improvements that can be done on Solution 1.
- It is unnecessary to construct lists from iterables (e.g., generator expressions) when all that is needed is an iterable. For example,
    _args = [arg if type(arg)!=dict else arg.items() for arg in args]
    args_split = [arg for arg in zip(*_args)]
  Here, the unpacking operator * works on arbitrary iterables, so one can just write
    _args = (arg if type(arg)!=dict else arg.items() for arg in args)
    args_split = [arg for arg in zip(*_args)]
  The parentheses keep the generator expression lazy instead of materializing it into a list.
- It is better to use isinstance(arg, cls) rather than type(arg) == cls.
- Unpacking an iterable into a list can be done using list(iterable), which is more efficient than a list comprehension [arg for arg in iterable] that uses an explicit for-loop.
- This expression
    [args_types[i](arg) for i, arg in enumerate(args_shuffled)]
  can be rewritten using zip to avoid the need for indices:
    [cls(arg) for cls, arg in zip(args_types, args_shuffled)]
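To illustrate why isinstance() is the safer check, here is a small sketch (the two helper names are made up for this example). A dict subclass such as collections.OrderedDict fails the exact type comparison, so it would be iterated by key instead of by item:

```python
from collections import OrderedDict

def elements_by_type(arg):
    # Exact type comparison: dict subclasses are not recognized.
    return arg.items() if type(arg) == dict else arg

def elements_by_isinstance(arg):
    # isinstance also accepts subclasses of dict.
    return arg.items() if isinstance(arg, dict) else arg

od = OrderedDict(a=1, b=2)
print(list(elements_by_type(od)))        # ['a', 'b']
print(list(elements_by_isinstance(od)))  # [('a', 1), ('b', 2)]
```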
Following is an improved version of Solution 1:
import random

def ordered_shuffle(*args):
    arg_types = map(type, args)
    arg_elements = (arg.items() if isinstance(arg, dict) else arg for arg in args)
    zipped_args = list(zip(*arg_elements))
    random.shuffle(zipped_args)
    return [cls(elements) for cls, elements in zip(arg_types, zip(*zipped_args))]
functools.singledispatch
The functools library includes the singledispatch() decorator. It lets you write a generic function and register special cases that are selected by the type of the first argument.
import functools
import random
@functools.singledispatch
def shuffle(arg, order):
    """this is the generic shuffle function"""
    lst = list(arg)
    return type(arg)(lst[i] for i in order)

@shuffle.register(dict)
def _(arg, order):
    """this is shuffle() specialized to handle dicts"""
    item = list(arg.items())
    return dict(item[i] for i in order)

def ordered_shuffle(*args):
    min_length = min(map(len, args))
    indices = random.sample(range(min_length), min_length)
    return [shuffle(arg, indices) for arg in args]
Usage:
a = (1, 2, {3: 4}, 5)
b = [(5,6), [7,8], [9,0], [1,2]]
c = {'arrow': 5, 'knee': 'guard', 0: ('x',2)}
ordered_shuffle(a, b, c)
Output:
[({3: 4}, 1, 2),
[[9, 0], (5, 6), [7, 8]],
{0: ('x', 2), 'arrow': 5, 'knee': 'guard'}]
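One caveat with the generic version: type(arg)(generator) does not rebuild every type. For a str it would produce the repr of the generator object rather than the reordered characters. A possible extension, not part of the original answer, is to register another special case:

```python
import functools
import random

@functools.singledispatch
def shuffle(arg, order):
    """Generic case: rebuild the container from the reordered elements."""
    lst = list(arg)
    return type(arg)(lst[i] for i in order)

@shuffle.register(str)
def _(arg, order):
    """Strings must be joined; str(generator) would give the generator's repr."""
    return "".join(arg[i] for i in order)

order = random.sample(range(5), 5)
print(shuffle("hello", order))
print(shuffle([10, 20, 30, 40, 50], order))
```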
Generator endlessly shuffling and yielding:
import random

def endless_shuffling(iterable):
    values = list(iterable)
    while True:
        random.shuffle(values)
        yield from values
Instead of your iter(all_angles), use endless_shuffling(all_angles) (and remove your own other shuffling).
One way to then get your list:
from itertools import islice

random_angles = endless_shuffling(range(-180, 180))
n_list = list(islice(random_angles, 1000))
If you give it an empty iterable and ask it for a value, it'll "hang", so either don't do that or guard against that case (e.g., with an extra if values: or with while values:).
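A sketch of the guarded variant mentioned above, using while values: so that an empty input ends the generator immediately instead of looping forever:

```python
import random
from itertools import islice

def endless_shuffling(iterable):
    values = list(iterable)
    while values:                 # an empty list ends the generator at once
        random.shuffle(values)
        yield from values

print(list(islice(endless_shuffling([]), 5)))        # []
print(list(islice(endless_shuffling("abc"), 7)))
```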
I also tried a faster way to iterate than sending every value through a generator, but the shuffling dominates so it doesn't make a big difference:
with shuffling:
448.3 ms endless_shuffling1
426.7 ms endless_shuffling2
without shuffling:
26.4 ms endless_shuffling1
5.1 ms endless_shuffling2
Full code:
from random import shuffle
from itertools import chain, islice
from timeit import default_timer as time

def endless_shuffling1(iterable):
    values = list(iterable)
    while True:
        shuffle(values)
        yield from values

def endless_shuffling2(iterable):
    values = list(iterable)
    return chain.from_iterable(iter(
        lambda: shuffle(values) or values,
        []
    ))

funcs = endless_shuffling1, endless_shuffling2

for f in funcs:
    print(*islice(f('abc'), 21))

for i in range(6):
    for f in funcs:
        t0 = time()
        next(islice(f(range(-180,180)), 999999, 1000000))
        print('%5.1f ms ' % ((time() - t0) * 1e3), f.__name__)
    print()
    if i == 2:
        print('without shuffling:\n')
        def shuffle(x):
            pass
Try this. It re-shuffles the list when next raises StopIteration:
import random
all_angles = list(range(-180,180))
random.shuffle(all_angles)
next_angle = iter(all_angles)
n_list = []
for i in range(1000):
    try:
        n_list.append(next(next_angle))
    except StopIteration:
        random.shuffle(all_angles)
        next_angle = iter(all_angles)
        n_list.append(next(next_angle))
print(len(n_list)) # 1000
I need to get some random distinct elements inside a range. I don't need to shuffle all the items, so collecting the iterator into a `Vec` and calling the `shuffle` method on it is not desirable. Is there a way to iterate over the range in random order and only get the first `n` items?
EDIT with some context:
I have arbitrarily large collections of `BigInt` numbers and I need to retrieve `n` values in a range from `0` up to numbers that can be hundreds or thousands of bits long. I cannot collect 1 googol items in a `Vec` only to retrieve some tens or hundreds of values. Of course, I could assume that the probability of getting the same number twice is negligible in this specific case but, since the range is arbitrarily large, I could also have a range of 10 numbers from which I must take 9. As a general solution I have started drawing random numbers in the target range and manually checking whether each one was already taken in a previous iteration, but I was looking for a lazy solution that could avoid unnecessary checks or allocations.
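One possible approach, sketched here in Python rather than in the question's own language, is a lazy Fisher-Yates shuffle over a "virtual" array. Only the positions that have been swapped are stored in a dict, so memory grows with the number of items taken, not with the size of the range. The function name random_order is an assumption for illustration.

```python
import random
from itertools import islice

def random_order(n, rng=random):
    """Lazily yield the integers 0..n-1 in random order, without repeats."""
    swapped = {}                      # position -> value, only where it differs from position
    i = 0
    while i < n:
        j = rng.randrange(i, n)       # pick from the not-yet-emitted tail
        value_i = swapped.get(i, i)
        value_j = swapped.get(j, j)
        yield value_j                 # value_j moves to position i and is emitted
        swapped[j] = value_i          # the old value at i moves to position j
        swapped.pop(i, None)          # position i is never read again
        i += 1

# Take 5 distinct values from a range far too large to materialize.
print(list(islice(random_order(10**100), 5)))
# Take 9 of 10: still no duplicates and no retry loop.
print(list(islice(random_order(10), 9)))
```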
random.shuffle should work. Here's an example, where the objects are lists:
from random import shuffle
x = [[i] for i in range(10)]
shuffle(x)
print(x)
# print(x) gives [[9], [2], [7], [0], [4], [5], [3], [1], [8], [6]]
Note that shuffle works in place, and returns None.
More generally in Python, mutable objects can be passed into functions, and when a function mutates those objects, the standard is to return None (rather than, say, the mutated object).
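A short sketch of the mistake this usually causes: assigning the return value of shuffle() back to a variable leaves you with None, while the original list has been reordered in place.

```python
import random

x = [1, 2, 3, 4, 5]
result = random.shuffle(x)   # shuffles x in place

print(result)  # None
print(x)       # the same five items, in a new order
```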
As you learned, the in-place shuffling was the problem. I run into this problem frequently too, and often forget how to copy a list as well. Using sample(a, len(a)) is the solution, with len(a) as the sample size. See https://docs.python.org/3.6/library/random.html#random.sample for the Python documentation.
Here's a simple version using random.sample() that returns the shuffled result as a new list.
import random
a = list(range(5))
b = random.sample(a, len(a))
print(a, b, "two lists same:", a == b)
# prints e.g.: [0, 1, 2, 3, 4] [2, 1, 3, 4, 0] two lists same: False
# The function sample allows no duplicates.
# Result can be smaller but not larger than the input.
a = list(range(555))
b = random.sample(a, len(a))
print("no duplicates:", a == sorted(set(b)))
try:
    random.sample(a, len(a) + 1)
except ValueError as e:
    print("Nope!", e)
# prints: no duplicates: True
# prints: Nope! followed by the ValueError message (sample larger than population)
As the documentation explains:
The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state.
So, you can just create your own random.Random instance, with its own seed, which will not affect the global functions at all:
>>> import random
>>> x = [1, 2, 3, 4, 5, 6]
>>> random.Random(4).shuffle(x)
>>> x
[4, 6, 5, 1, 3, 2]
>>> x = [1, 2, 3, 4, 5, 6]
>>> random.Random(4).shuffle(x)
>>> x
[4, 6, 5, 1, 3, 2]
(You can also keep around the Random instance and re-seed it instead of creating new ones over and over; there's not too much difference.)
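A sketch of the "keep one instance and re-seed it" variant (the names rng and seeded_shuffle are made up for illustration). Because it uses its own random.Random instance, the module-level functions are unaffected:

```python
import random

rng = random.Random()   # private generator, separate from the global one

def seeded_shuffle(items, seed):
    """Return a shuffled copy of items that is reproducible for a given seed."""
    rng.seed(seed)       # re-seed the private instance only
    out = list(items)
    rng.shuffle(out)
    return out

first = seeded_shuffle([1, 2, 3, 4, 5, 6], 4)
second = seeded_shuffle([1, 2, 3, 4, 5, 6], 4)
print(first == second)  # True
```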
You can seed the random generator (random.seed() accepts the seed as its parameter), which makes the shuffle deterministic:
import random
x = [1, 2, 3, 4, 5, 6]
random.seed(4)
random.shuffle(x)
print(x)
and the result should always be the same for that seed (the exact order can differ between Python versions), for example
[2, 3, 6, 4, 5, 1]
In order to "rerandomize" the rest of the code you can simply reseed your random number generator with system time by running
random.seed()
after your "deterministic" part of code
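If re-seeding the global generator afterwards is not what you want, another option is to save the generator state with random.getstate() and restore it with random.setstate(). A minimal sketch, assuming a hypothetical context manager named temporary_seed:

```python
import random
from contextlib import contextmanager

@contextmanager
def temporary_seed(seed):
    """Seed the global generator inside the block, then restore its old state."""
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)   # the surrounding random sequence continues untouched

x = [1, 2, 3, 4, 5, 6]
with temporary_seed(4):
    random.shuffle(x)            # deterministic for this seed
print(x)
```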