When I don't want an upper bound, I'll often use sys.maxint (sys.maxsize in Python 3, where sys.maxint no longer exists) as an approximation.
You can't avoid an upper bound. How would the code work without one? This is how the code generates a random number between x and y:
0______________________________________________r__________________________________________1
r is a random decimal between 0 and 1. This is generated with a fixed algorithm.
Then it multiplies r by the upper bound minus the lower bound and adds the lower bound. This pretty much means that 0 becomes x, and 1 becomes y. If rand is the random number, r : (1 - 0) :: (rand - x) : (y - x)
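The scaling step described above can be sketched in a few lines. This is only an illustration: scale_to_range and rand_between are made-up names, not library functions. The standard library's random.uniform(x, y) does the same job.

```python
import random

def scale_to_range(r, x, y):
    """Map r from the interval [0, 1) linearly onto the interval from x to y."""
    return x + r * (y - x)

def rand_between(x, y):
    """Draw a random float between x and y (hypothetical helper)."""
    # random.random() returns a float in [0.0, 1.0)
    return scale_to_range(random.random(), x, y)
```

With x = 3 and y = 7, r = 0 gives 3 and r = 0.5 gives 5, which matches the proportion r : 1 :: (rand - x) : (y - x).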
There actually is a way to generate a random number without an upper bound, but the result is distributed geometrically (each value is half as likely as the one before it), not uniformly. Take a look at this Python algorithm:
import random
def randint():
    i = 0
    while True:
        if random.random() < 0.5: # Or whatever other probability you want
            return i
        else:
            i += 1
Pretty much, what this is doing is starting from zero, and then every time it has a 0.5 probability of returning that number; otherwise it continues.
This means that there is a 50% probability of it being 0, 25% for 1, 12.5% for 2, 6.25% for 3, etc. This is a geometric distribution "without an upper bound".
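To double-check those percentages: the loop returns i only after failing i times and then succeeding once, so the probability is (1 - p)^i * p. A small sketch, with stop_probability as a made-up helper name:

```python
def stop_probability(i, p=0.5):
    """Probability that the loop returns exactly i: i failures, then one success."""
    return (1 - p) ** i * p

# Probabilities of getting 0, 1, 2 and 3 when p = 0.5
probs = [stop_probability(i) for i in range(4)]
```

Any non-negative integer can come out, but large values get very unlikely very quickly.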
Can Python generate a random number that excludes a set of numbers, without using recursion? - Stack Overflow
Whats your preferred way of generating a random number, e.g. between 1 and 100 ?
Generate random number without range for a given argument for the same length in python be it in numeric or alphanumeric - Stack Overflow
randomness - pick K random integers without repetition - Computer Science Stack Exchange
Generate one random number and map it onto your desired ranges of numbers.
If you wanted to generate an integer between 1-4 or 7-10, excluding 5 and 6, you might:
- Generate a random integer in the range 1-8
- If the random number is greater than 4, add 2 to the result.
The mapping becomes:
Random number: 1 2 3 4 5 6 7 8
Result: 1 2 3 4 7 8 9 10
Doing it this way, you never need to "re-roll". The above example is for integers, but it can also be applied to floats.
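A minimal sketch of that mapping, slightly generalized. The function name randint_skipping is made up for illustration, and it assumes the skipped block lies entirely inside [low, high]:

```python
import random

def randint_skipping(low, high, skip_low, skip_high):
    """Uniform integer in [low, high] that never lands in [skip_low, skip_high]."""
    gap = skip_high - skip_low + 1            # how many values are excluded
    n = random.randint(low, high - gap)       # randint includes both endpoints
    # Values at or past the start of the gap are shifted up over it
    return n + gap if n >= skip_low else n
```

Calling randint_skipping(1, 10, 5, 6) draws from 1-8 and adds 2 to anything above 4, exactly like the table above.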
Use random.choice(). In this example, a is your lower bound, the range between b and c is skipped and d is your upper bound.
import random
numbers = list(range(a, b)) + list(range(c, d))  # in Python 3, range objects can't be joined with + directly
r = random.choice(numbers)
Looking to create a random number between 1 and 100. Actually 0 (zero) is valid too, so between 0-100.
What's your preferred way of doing that in Python?
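For reference, two common ways to do this with only the standard library (a sketch, not the only options):

```python
import random

value = random.randint(0, 100)       # both endpoints are included: 0..100
same_range = random.randrange(101)   # stop value is excluded, so also 0..100
```

The one thing to watch is that randint includes the upper bound while randrange does not.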
- For strings: randomly shuffles the letters A-Z and digits 0-9 and returns as many characters as the input seed has
- For numbers: shuffles the digits 0-9 and converts the result to a number with as many digits as the seed
- Shuffling is seeded by the input (either string or number)
Code
import random
import string
def rand_gen(seed):
    " Generate random strings and numbers based upon seed "
    # Seed random number generator for shuffling
    random.seed(seed)
    # Alphabet based upon type of input (string or integer)
    if isinstance(seed, int):
        # Result based upon digits 0-9
        alphabet = string.digits
        seed = str(seed)
        is_int = True
    else:
        # Result uses letters A-Z and digits 0-9
        alphabet = string.ascii_uppercase + string.digits
        is_int = False
    # Output based upon random shuffling of alphabet
    x = list(alphabet)
    while True:
        random.shuffle(x)
        if x[0] != '0' or not is_int: # Avoid left most digit being 0 when working with numbers
            break
    output = ''.join(x[:len(seed)])
    if is_int:
        return int(output)
    else:
        return output
Test
for i in range(1, 15):
    print(f'Numeric Seed: {i:<12} \tString: {rand_gen(i)}')
for i in range(1, 10):
    seed = 'a'*i
    print(f'Charact Seed: {seed:<12} String: {rand_gen(seed)}')
Output
Output width same as input seed
Numeric Seed: 1 String: 6
Numeric Seed: 2 String: 5
Numeric Seed: 3 String: 1
Numeric Seed: 4 String: 8
Numeric Seed: 5 String: 2
Numeric Seed: 6 String: 5
Numeric Seed: 7 String: 8
Numeric Seed: 8 String: 8
Numeric Seed: 9 String: 9
Numeric Seed: 10 String: 52
Numeric Seed: 11 String: 26
Numeric Seed: 12 String: 89
Numeric Seed: 13 String: 30
Numeric Seed: 14 String: 90
Charact Seed: a String: Q
Charact Seed: aa String: ZX
Charact Seed: aaa String: 5DE
Charact Seed: aaaa String: 3AV7
Charact Seed: aaaaa String: I2J76
Charact Seed: aaaaaa String: ZRHENX
Charact Seed: aaaaaaa String: R17ZXV4
Charact Seed: aaaaaaaa String: QEGZYTNA
Charact Seed: aaaaaaaaa String: VDIFQO7SH
After some research I came up with this ugly solution:
First, generate a list of all alphanumeric ASCII characters:
value = '1230LE'
t = list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
Then remove from this list the characters present in your value (t.remove deletes the first match; the check avoids an error when a character appears twice in value):
for c in value:
    if c in t:
        t.remove(c)
# Output (without '1', '2', '3', '0', 'L', 'E')
# ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','F','G','H','I','J','K','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','4','5','6','7','8','9']
Then you only need to do a basic generator based on your new string:
from random import choice
n = len(value)
str_characters = ''.join(t) # 'abcdefghijklmnopqrstuvwxyzABCDFGHIJKMNOPQRSTUVWXYZ456789'
generated_value = [choice(str_characters) for i in range(n)] # ['7', 'l', 'N', 'j', 'c', 'i']
''.join(generated_value) #'7lNjci'
There you go: it's a hacky solution, but it works.
If you only want digits, use this value for t:
t = list('0123456789')
Full code :
value = '1230LE'
t = list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
for c in value:
    if c in t:  # skip characters already removed (repeated in value)
        t.remove(c)
from random import choice
n = len(value)
str_characters = ''.join(t) # 'abcdefghijklmnopqrstuvwxyzABCDFGHIJKMNOPQRSTUVWXYZ456789'
generated_value = [choice(str_characters) for i in range(n)] # ['7', 'l', 'N', 'j', 'c', 'i']
''.join(generated_value) #'7lNjci'
Hope it will help, have fun !
Ṃųỻịgǻňạcểơửṩ mentioned that this is the reservoir problem. The reservoir sampling problem, though, is explicitly a very strict requirement for a streaming algorithm, i.e. it works in just one pass over a stream of values whose eventual size we don't know.
In this case, we want to do it in one pass, but we do know both N and K, so the problem is a little different and it permits a simpler solution.
Floyd's algorithm is designed for this task. It has some resemblance to both the Fisher-Yates shuffle and to reservoir sampling.
A couple of minor differences from the framing in your question:
- K < N
- we use randint(i, j) which produces a random integer between i and j inclusive
- the samples will be generated in the range 1 to N inclusive (but of course it is easy to adjust this at the end)
Here's the Python code:
import random
n = 200
k = 5
s = set()
for j in range(n-k+1, n+1):
    t = random.randint(1, j)
    if t not in s:
        s.add(t)
    else:
        s.add(j)
print(s)
A good source for the origin of and mathematical justification for Floyd's algorithm is this article by Jon Bentley from 1987. It's worth noting that the algorithm only calls randint() K times.
A very brief description of how it works is that, on iteration i (for i = 1..k), it draws a number randomly from [1..n-k+i]. If the number has not been drawn before, it is added to the sample. Otherwise, it adds the number n-k+i to the sample. This works to give the proper probabilities of inclusion in the final set.
In Bentley's article there is a description of the recursive formulation of the algorithm:
We can appreciate the correctness of Floyd’s algorithm anecdotally. When M = 5 and N = 10, the algorithm first recursively computes in S a 4-element random sample from 1..9. Next it assigns to T a random integer in the range 1..10. Of the 10 values that T can assume, exactly 5 result in inserting 10 into S: the four values already in S, and the value 10 itself. Thus element 10 is inserted into the set with the correct probability of 5/10.
I can't do better than that explanation, right now!
The algorithm has been mentioned on StackOverflow and proofs are here on Math SE.
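For comparison with the quoted passage, here is a sketch of the recursive formulation in Python. The function name floyd_sample and the argument names m and n (following Bentley's M and N) are my own choices, and it assumes 0 <= m <= n:

```python
import random

def floyd_sample(m, n):
    """Return a set of m distinct integers drawn from 1..n (assumes 0 <= m <= n)."""
    if m == 0:
        return set()
    # First build an (m-1)-element sample from 1..n-1
    s = floyd_sample(m - 1, n - 1)
    t = random.randint(1, n)   # inclusive on both ends
    if t not in s:
        s.add(t)
    else:
        s.add(n)               # n cannot be in s yet, so the size always grows by one
    return s
```

With m = 5 and n = 10 this follows the quote step by step: a 4-element sample from 1..9, then one draw from 1..10. The recursion depth equals m, so for large samples the iterative loop is the better choice.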
What quality of randomness do you need? You can construct a Linear Congruential Generator as a PRNG with period of N (or the next prime above N) and discard out-of-range samples.
Use rand() only to seed your LCG PRNG.
Of course you get the same sequence every time, just from different start points, unless you also try to randomize your LCG's multiplier and adder parameters. Or as John Bollinger puts it, for any given K it can produce only about N distinct K-element samples.
Re-randomizing the multiplier (a) and adder (c) constants while still satisfying the conditions for period m (the modulus) can make multiple sets of sequences possible. And/or raise m so there are more numbers between N and m you'd have to discard if generated. But there's limited freedom so this only goes so far. (And depending on LCG parameters there can be significant correlation between numbers.)
This requires O(1) storage and negligible time per result, just a 64-bit multiply, add, and modulo.
If you need bigints for this, I guess O(log2(N)) storage and time per query with a small constant factor. But 10^18 < 2^64. Initial setup time requires finding a prime above N.
Quality of randomness is low: even the best LCGs are not great pseudo-random generators by modern standards, although some of their downsides come from choices like a power-of-2 modulus which makes the low-order bits highly correlated. (Other downsides include a short period, which we're taking advantage of here.) But automated selection of LCG parameters to fit a given period can result in generators much worse than a "good" LCG. For some small N, pairs of consecutive numbers are common in the output sequence. Maybe different LCG parameters with the same or similar period could avoid that.
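To make the idea concrete, here is a rough Python sketch of "full-period LCG plus discarding out-of-range values". It is not the parameter selection from my library: for simplicity it assumes a power-of-two modulus (the smallest one >= N), where the full-period conditions reduce to "a mod 4 == 1 and c odd". The name lcg_permutation and its seed parameter are made up for this example.

```python
import random

def lcg_permutation(n, seed=None):
    """Yield each integer in 0..n-1 exactly once, in a scrambled order."""
    rng = random.Random(seed)   # only used to pick parameters and a start state
    m = 1
    while m < n:
        m *= 2                  # modulus: smallest power of two >= n
    a = (4 * rng.randrange(m) + 1) % m   # a - 1 is a multiple of 4
    c = (2 * rng.randrange(m) + 1) % m   # c is odd, so coprime to m
    x = rng.randrange(m)
    for _ in range(m):          # one full period visits every residue once
        if x < n:               # discard out-of-range values
            yield x
        x = (a * x + c) % m
```

For example, list(lcg_permutation(10, seed=42)) gives the numbers 0..9 in some scrambled order using O(1) state. Fewer than half of the generated values are discarded, since m < 2N. As noted above, a power-of-two modulus has weak low-order bits, so treat this as a demonstration of the technique rather than a good generator.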
I used this in practice for a partly-finished subtree-pruning-regrafting library that I never did much with, GPLed code on Github with comments explaining the algorithm, citing Knuth TAOCP for the conditions that produce period m, and Numerical Recipes for some PRNG quality folklore and best practices. The LCG-selection part does work correctly, but basically gives up and makes the multiplier 1 and adder 0 in some corner cases. It also skips the parameter search for tiny maxval = N, 6 or less, since for this library's purposes, 6 was few enough to just brute-force try all possibilities and see which tree gave the maximum likelihood.
struct lcg {
    unsigned int state;
    unsigned int a, c, m;
    unsigned int startstate;
};
/******************** Linear Congruential Generator setup ************/
/* to generate all possible SPRs in a pseudo-random order, we generate
* all the numbers between 0 and the number of possible SPRs once each
* without repetition using an LCG of the form: x_n+1 = x_n*a + c mod m.
*/
/*
* Knuth: TAOCP 3.2.1: ex 2: if a and m are relatively prime,
* the number X_0 will always appear in the period. (will return to start?)
*
* 3.2.1.2: Theorem A: An LCG will have period m iff:
* - c is relatively prime to m
* - b = a-1 is a multiple of p, for every prime p dividing m
* - b is a multiple of 4, if m is a multiple of 4
*
* 3.2.1.3: ex 4: m = 2**e >= 8 -> maximum potency when a mod 8 = 5.
* small multipliers are to be avoided.
*
* Numerical recipies suggests c = a prime close to (1/2 - sqrt(3)/6)*m
*/
/* make up some parameters for an LCG that will have the maximum period
* equal to the range, so every value is generated once.
* When maxval doesn't have any repeated prime factors, a = m+1,
* which is the same as a=1. It's not exactly random, but it does still
* mix up which SPRs are done.
*
* TODO: take advantage of the fact that maxval = floor(sqrt(maxval))*ceil(sqrt(maxval))
* could do that, but then the code would be less general-purpose
* successfully brute-force tested for maxval=1..1000.
*/
void findlcg(struct lcg *lcg_params, int maxval)
{
    unsigned int a, b, c, m = maxval;
    int i;
    primesetup (maxval+maxval/2);
    if (m<=6){ // will be either 6 or 2. Just loop in order
        b=0;
        c=1;
    }else{
        int divlimit = m;
        b=1; // b must be a multiple of all of m's prime factors
        if (!(m%2)){
            b=2;
            while (divlimit%2 == 0) divlimit /= 2;
        }
        for (i=3 ; i <= divlimit ; i+=2){
            if (is_prime(i) && m%i == 0){
                b *= i;
                while (divlimit%i == 0) divlimit /= i;
            }
        }
        if (!(m%4)){ // if m is a mult of 4, b must be.
            while (b%4) b *= 2;
        }
        /* make sure a isn't too small */
        while (b<sqrtf(m)) b*=7;
        if (b == m) b=0; // just give up and avoid overflow
        // Numerical Recipies says there is "lore" behind this... :)
        // TAOCP says it's useless unless the multiplier sucks (section 3.3.3, eq. 40)
        // That would be us.
        c = next_prime(max(5, (0.5 - sqrtf(3)/6.0)*m - 2));
        while (m%c == 0) c = next_prime(c+1);
        // Luckily we don't have to test for c>m, because it doesn't
        // happen with any m<100, and there are enough primes later...
        /* I've observed that when a == m, (e.g. a=13, c=13, m=72) you often get
         * two consecutive numbers... Do something to avoid that if it's a problem */
    }
    a = b+1;
    unsigned long long l = (unsigned long long)a * m;
    if (l > ULONG_MAX){
        fprintf(stderr, "spr: chosen Linear Congruential Generator is bogus\n"
                " x_n+1 = x_n*%u+%u mod %u\n"
                " a*m > ULONG_MAX, so it would overflow :(\n", a, c, m);
    }
    lcg_params->a = a;
    lcg_params->c = c;
    lcg_params->m = m;
    lcg_params->startstate = UINT_MAX;
    lcg_params->state = rand() % m;
}
My use-case was smallish trees so for prime finding I just used a straightforward Sieve to find true primes, not just relatively-prime which would be sufficient. Quality of pseudo-randomness was not a priority at the time, before moving on to other work. There might be room to spend more time choosing LCG parameters better than the algorithm shown here.