Create a numpy array of letters:
In [662]: letters = np.array(list(chr(ord('a') + i) for i in range(26))); letters
Out[662]:
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
dtype='<U1')
Use np.random.choice to draw n letters at random (with replacement) from letters to generate random text:
np.random.choice(letters, n)
Timings:
In [664]: n = 1024 ** 2
In [701]: %timeit np.random.choice(letters, n)
100 loops, best of 3: 15.1 ms per loop
Alternatively,
In [705]: %timeit np.random.choice(np.fromstring(letters, dtype='<U1'), n)
100 loops, best of 3: 14.1 ms per loop
Answer from coldspeed95 on Stack Overflow
python - Generating random text strings of a given pattern - Stack Overflow
Best way to generate random file names in Python - Stack Overflow
Create random objects and inserting into text file
You can multiply a string with a number x and it'll repeat that string x times like this:
"a"*5 will become "aaaaa"
I also want to point out that your last if-statement is largely redundant. The break is unnecessary because the while condition would have ended the loop anyway, and the myFile.close() is also not needed because the with-statement you use to open the file takes care of closing it for you. So I'd suggest removing all of that and moving your print to after the while loop instead.
Another thing to consider is that in your file the data types repeat in the same order every 4 objects, whilst your example sometimes has two objects of the same type in a row. You could build a new function that randomly selects a data type and then call that four times instead.
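The idea of one function that picks a data type at random could be sketched roughly as follows. The names random_object and GENERATORS are hypothetical, and the value ranges are assumptions borrowed from the question's code rather than anything the challenge prescribes.

```python
import random
import string


def random_alphabetical(length):
    # Lowercase letters only.
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))


def random_alphanumeric(length):
    # String repetition (' ' * n) supplies up to 10 spaces on each side.
    core = ''.join(random.choice(string.ascii_lowercase + string.digits)
                   for _ in range(length))
    return ' ' * random.randint(0, 10) + core + ' ' * random.randint(0, 10)


def random_integer(length):
    # length is ignored; it is accepted so all generators share one signature.
    return str(random.randint(0, 10000))


def random_real(length):
    # Assumed precision of 3 decimal places.
    return str(round(random.uniform(0.0, 10000.0), 3))


GENERATORS = [random_alphabetical, random_alphanumeric, random_integer, random_real]


def random_object():
    # Pick one generator at random, so types no longer repeat in a fixed order.
    length = random.randint(5, 30)
    return random.choice(GENERATORS)(length)


print(', '.join(random_object() for _ in range(4)))
```

Calling random_object() once per write keeps the loop body down to a single line.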
More on reddit.com
python - Writing a random number generator to a text file - Stack Overflow
#!/usr/bin/python
import random
import string
digits = "".join([random.choice(string.digits) for i in range(8)])
chars = "".join([random.choice(string.ascii_letters) for i in range(15)])
print(digits + chars)
EDIT: I liked the idea of using random.choice better than randint(), so I've updated the code to reflect that.
Note: this assumes both lowercase and uppercase characters are desired. If you want lowercase only, change the second list comprehension to read:
chars = "".join([random.choice(string.ascii_letters[:26]) for i in range(15)])
For uppercase only, flip the slice to [26:] instead.
See an example: Recipe 59873, Random Password Generation.
Building on the recipe, here is a solution to your question:
from random import choice
import string
def GenPasswd2(length=8, chars=string.ascii_letters + string.digits):
    return ''.join([choice(chars) for i in range(length)])
>>> GenPasswd2(8,string.digits) + GenPasswd2(15,string.ascii_letters)
'28605495YHlCJfMKpRPGyAw'
>>>
You could use the UUID module for generating a random string:
import uuid
filename = str(uuid.uuid4())
This is a valid choice, given that a UUID generator is extremely unlikely to produce a duplicate identifier (a file name, in this case):
Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
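As a small sketch of how this might be wrapped for file names: uuid4().hex gives the same identifier without dashes. The helper name random_filename and the default .txt extension are assumptions made for illustration.

```python
import uuid


def random_filename(extension=".txt"):
    # uuid4() is random-based; .hex is its 32-character hexadecimal form
    # without dashes.
    return uuid.uuid4().hex + extension


print(random_filename())
```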
Python has facilities to generate temporary file names, see http://docs.python.org/library/tempfile.html. For instance:
In [4]: import tempfile
Each call to tempfile.NamedTemporaryFile() results in a different temp file, and its name can be accessed with the .name attribute, e.g.:
In [5]: tf = tempfile.NamedTemporaryFile()
In [6]: tf.name
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'
In [7]: tf = tempfile.NamedTemporaryFile()
In [8]: tf.name
Out[8]: 'c:\\blabla\\locals~1\\temp\\tmpr8vvme'
Once you have the unique filename it can be used like any regular file. Note: By default the file will be deleted when it is
closed. However, if the delete parameter is False, the file is not
automatically deleted.
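A minimal sketch of the delete=False behaviour, assuming Python 3 (where the mode and suffix arguments are passed by keyword); the variable names are illustrative only:

```python
import os
import tempfile

# delete=False keeps the file on disk after it is closed.
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tf:
    tf.write("some data\n")
    kept_path = tf.name

# The file still exists here, so it can be reopened by name.
print(os.path.exists(kept_path))
with open(kept_path) as f:
    content = f.read()

# Nothing else will remove it, so clean up explicitly.
os.remove(kept_path)
```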
Full parameter set:
tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])
It is also possible to specify the prefix for the temporary file (one of the parameters that can be supplied when the file is created):
In [9]: tf = tempfile.NamedTemporaryFile(prefix="zz")
In [10]: tf.name
Out[10]: 'c:\\blabla\\locals~1\\temp\\zzrc3pzk'
Additional examples for working with temporary files can be found here
Hey guys, so I am trying to brush up on my Python skills and was given a challenge recently to create random objects and insert them into a text file with the output not exceeding 10MB. Then I have to read the contents of the file and print back to the console every occurrence of those 4 objects and what type they are.
This was the exact challenge given to me:
Write a Python script that will generate four types of printable random objects and store them in a single file. Each object will be separated by a ",". These are the 4 data types: floats, alphabetical strings, integers, alphanumerics.
The alphanumerics should contain a random number of whitespaces before and after it (not exceeding 10 spaces). The output should be 10MB in size.
Sample extracted output:
hisadfnnasd, 126263, assfdgsga12348fas, 13123.123, lizierdjfklaasf, 123192u3kjwekhf, 89181811238,122, nmarcysfa900jkifh , 3.781, 2.11, ....
Create a program that will read the generated file above and print to the console the object and its type. Spaces before and after the alphanumeric object must be stripped.
Sample output:
youruasdifafasd - alphabetical strings
127371237 - integer
asdfka12348fas - alphanumeric
13123.123 - real numbers
asjdfklasdjfklaasf - alphabetical strings
123192u3kjwekhf - alphanumeric
For the first part, this is the code that I currently have:
import random
import string
import os
minLength = 5
maxLength = 30
fileName = 'output_file.txt'
open(fileName, 'w')
fileSize = os.stat(fileName).st_size
def random_alphanumerics(length):
    key = ''
    for i in range(length):
        key += random.choice(string.ascii_lowercase + string.digits)
    return key
with open(fileName, 'a') as myFile:
    while fileSize < 10485760:
        length = random.randint(minLength, maxLength)
        myAlphabets = ''.join(random.choice(string.ascii_lowercase) for x in range(length))
        myInt = random.randint(0, 10000)
        myReal = round(random.uniform(0.0, 10000.0), length)
        myAlphanumerics = random_alphanumerics(length)
        myFile.write(myAlphabets + ', {}'.format(myInt) + ', {}'.format(myReal) + ', ' + myAlphanumerics + ', ')
        fileSize = os.stat(fileName).st_size
        print(fileSize)
        if fileSize == 10485760:
            break
            print('Done')
            myFile.close()
Now, I know this may not be the cleanest code, but it does output to a file and creates the objects as needed. The only thing I can't figure out is how to implement the random whitespaces before and after the alphanumeric strings.
And don't even get me started on the second bit... I know regex comes into play but I haven't used Python in so long, I can't even remember Regex commands to use. Any help there would also be greatly appreciated :)
Also, feel free to tear at my code and reprimand me if need be as I want to learn from you guys hehe.
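For the second part, regular expressions are not strictly required. Here is a minimal sketch, assuming the objects are comma-separated as in the challenge and that numbers are non-negative; the helper names classify and describe are hypothetical.

```python
def classify(token):
    """Return the type label for one stripped token."""
    if token.isalpha():
        return 'alphabetical strings'
    if token.isdigit():
        return 'integer'
    # A real number here is digits with exactly one decimal point.
    if token.count('.') == 1 and token.replace('.', '').isdigit():
        return 'real numbers'
    return 'alphanumeric'


def describe(text):
    # Split on commas, strip surrounding whitespace, skip empty leftovers.
    lines = []
    for raw in text.split(','):
        token = raw.strip()
        if token:
            lines.append('{} - {}'.format(token, classify(token)))
    return lines


sample = 'hisadfnnasd, 126263,   assfdgsga12348fas  , 13123.123, '
for line in describe(sample):
    print(line)
```

Reading the generated file would then be a matter of passing its contents to describe().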
EDIT: So I redid the code with the help of u/TheLiberius and the final version is below. The only issue I have is that it takes a long time to write the 10MB worth of data into the file. Any way to make it faster?
import random
import string
import os
### FUNCTIONS ###
# let's define a function for creating a string of random alphanumerical characters
def random_alphanumerics():
    length = random.randint(5, 30) # define the length of the alphanumeric string randomly between a range
    output = ''
    for i in range(length):
        output += random.choice(string.ascii_lowercase + string.digits) # create our alphanumeric string
    return output
# let's define a function for creating a string of random alphabetical characters
def random_string():
    length = random.randint(5, 30) # define the length of the alphabetical string randomly between a range
    output = ''.join(random.choice(string.ascii_lowercase) for x in range(length)) # we are using ascii lowercase so that encoding doesn't cause our file size to be unpredictable
    return output
# let's define a function for creating random integers and converting them to a string
def random_int():
    output = random.randint(0, 10000)
    intToStr = '{}'.format(output)
    return intToStr
# let's define a function for creating random floats and converting them to a string
def random_float():
    length = random.randint(1, 10) # round the float to a randomly chosen number of decimal places
    output = round(random.uniform(0.0, 10000.0), length)
    floatToStr = '{}'.format(output)
    return floatToStr
### time to write to our file!!! ###
# let's define our file name first
fileName = 'output_file.txt'
# we shall create and open our file
open(fileName, 'w')
# let's check the initial size of the file which should be 0
fileSize = os.stat(fileName).st_size
# alright let's open the file and append data to it
with open(fileName, 'a') as myFile:
    while fileSize < 10485760: # run the loop until fileSize reaches 10485760 bytes (10 MiB, i.e. 10.48576 MB)
        function_list = [random_alphanumerics, random_string, random_int, random_float] # put our functions into a list
        dataType = random.choice(function_list) # randomly choose a function to run
        if dataType == random_alphanumerics: # our alphanumeric string needs to have whitespaces before and after it so let's use an if statement for that
            output = random_alphanumerics()
            i = random.randint(0, 9) # the whitespaces shouldn't be more than 9
            output = ' '*i + output + ' '*i # put them all together
        else:
            output = dataType()
        myFile.write(output + ', ')
        fileSize = os.stat(fileName).st_size
        print(fileSize)
    # once loop is done, print final file size and close file
    print('Final file size:', fileSize / 1000000, 'MB')
    myFile.close()
All you need to do is place your write() statement inside of your for loop.
for count in range(12):
    # Get a random number.
    num = random.randint(1, 100)
    # Write 12 random integers in the range of 1-100 on one line
    # to the file.
    outfile.write(str(num))
Your write statement needs to be inside your for loop:
for count in range(12):
    # Get a random number.
    num = random.randint(1, 100)
    # Write 12 random integers in the range of 1-100 on one line
    # to the file.
    outfile.write(str(num) + ' ')  # adds a space, unless you want the numbers to be all together
Your open() call should be:
outfile = open('numbersmake.txt', 'a+')
That way it will not overwrite the text already written, and it will create a new 'numbersmake.txt' if it doesn't exist.
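Putting both points together, a self-contained sketch might look like this. The function name write_random_numbers and the use of the system temp directory are assumptions for illustration; the file name comes from the answer above.

```python
import os
import random
import tempfile


def write_random_numbers(path, count=12, mode='w'):
    # mode='w' starts a fresh file; pass mode='a' to keep existing content.
    # The with-statement closes the file automatically.
    with open(path, mode) as outfile:
        for _ in range(count):
            num = random.randint(1, 100)
            # The write happens inside the loop, once per number.
            outfile.write(str(num) + ' ')


numbers_path = os.path.join(tempfile.gettempdir(), 'numbersmake.txt')
write_random_numbers(numbers_path)
with open(numbers_path) as infile:
    print(infile.read())
```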
The easiest thing is to randomly select letters from the 25k file. The result then has approximately the same letter frequencies as the original.
import random
print(''.join(random.choice(original_text) for _ in range(500)))
You could do something like this:
import string
import random
def get_random_letter():
    # depends how you want to randomize getting your letter
    return random.choice(string.ascii_letters)
random_letters = []
for i in range(500):
    random_letter = get_random_letter()
    random_letters.append(random_letter)
with open("text.txt", 'w') as f:
    f.write("".join(random_letters))
You would change the "get_random_letter" definition depending on your probability model and return that character (in that case, you do not need to import random or string, these are just used for example).
Edit: To get the letter based on a certain weight you could use this:
import random
inputs = ['e', 'f', 'g', 'h']
weights = [10, 30, 50, 10]
def get_random_letter(inputs, weights):
    # pick a point in [0, total weight] and return the input whose cumulative range contains it
    r = random.uniform(0, sum(weights))
    current_cutoff = 0
    for index in range(len(weights)):
        current_cutoff = current_cutoff + weights[index]
        if r < current_cutoff:
            return inputs[index]
print(get_random_letter(inputs, weights))
which is derived from the post here: Returning a value at random based on a probability weights
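On Python 3.6 and later, the standard library's random.choices accepts weights directly, so the same weighted draw can be sketched without a hand-written cutoff loop. The variable names below are illustrative:

```python
import random

inputs = ['e', 'f', 'g', 'h']
weights = [10, 30, 50, 10]

# random.choices draws k items with replacement, with probability
# proportional to the given weights.
letters = random.choices(inputs, weights=weights, k=500)
text = ''.join(letters)
print(text[:40])
```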
We can do it with the following command:
base64 /dev/urandom | head -c 10000000 > file.txt
It creates a file named file.txt with a size of 10 MB (10,000,000 bytes).
Get the output of:
tr -dc A-Za-z0-9 </dev/urandom
and pipe it to a file.
You can use the head command with -c or -n to limit the file size.
For example, to generate a 1 kB file a.txt:
tr -dc A-Za-z0-9 </dev/urandom | head -c 1024 > a.txt
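Where a shell is not available, a rough Python counterpart could look like the sketch below. It uses the random module, which is not backed by /dev/urandom, so it is not a drop-in replacement where unpredictability matters. The function name write_random_text is hypothetical.

```python
import os
import random
import string
import tempfile

ALPHABET = string.ascii_letters + string.digits  # A-Za-z0-9


def write_random_text(path, size):
    # Every character is ASCII, so size characters means size bytes on disk.
    with open(path, 'w', encoding='ascii', newline='') as f:
        f.write(''.join(random.choices(ALPHABET, k=size)))


text_path = os.path.join(tempfile.gettempdir(), 'a.txt')
write_random_text(text_path, 1024)
print(os.path.getsize(text_path))
```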
This could be cleaned up a bit more, but the basic changes made are:
- adding a points container,
- changing file to f (you want to avoid defining variables with the same name as a Python built-in),
- changing the format parameter to accept the tuple p (it will automatically unpack),
- making some basic formatting changes.
All in all, you were very close - just a few basic things that needed tweaking.
import random
numpoints = 512
L = 20
points = set()
# Open f and write
with open("question1.xyz","w") as f:
    f.write("\ncomment goes here\n") #this is for the 2nd line in my xyz f
    while len(points) < numpoints:
        p = (random.randint(0, L), random.randint(0, L), random.randint(0, L))
        if p not in points:
            points.add(p)
            f.write('H %f %f %f\n' % p)
The following isn't any more efficient, but introduces the concept of recursion in order to generate your random point. The previous version works just fine - this is more for fun :)
import random
numpoints = 512
L = 20
points = set()
def RandomPoint(points):
    p = (random.randint(0, L), random.randint(0, L), random.randint(0, L))
    # if the point is already taken, recurse until a new one is found
    if p in points:
        p = RandomPoint(points)
    return p
# Open f and write
with open("question1.xyz","w") as f:
    f.write("\ncomment goes here\n") #this is for the 2nd line in my xyz f
    for point in range(0, numpoints):
        p = RandomPoint(points)
        points.add(p)
        f.write('H %f %f %f\n' % p)
Try something like this:
fh = open('filename', 'w')
fh.write(str(len(points)) + '\n')
fh.write("comment goes here\n")
for point in points:
    fh.write("H %1.6f %1.6f %1.6f\n" % (point[0],point[1],point[2]))
fh.flush()
fh.close()
IMHO - the following is completely redundant:
f.write(struct.pack("=I",random.randint(0,sys.maxint*2+1)))
There's absolutely no need to use struct.pack, just do something like:
import os
fileSizeInBytes = 1024
with open('output_filename', 'wb') as fout:
    fout.write(os.urandom(fileSizeInBytes)) # replace 1024 with the size you need, in bytes, if it is not unreasonably large
Then, if you need to re-use the file for reading integers, use struct.unpack at that point.
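A minimal sketch of that round trip, assuming the integers are 4-byte unsigned values (the "=I" format from the original code); the file name random_output.bin is hypothetical:

```python
import os
import struct
import tempfile

bin_path = os.path.join(tempfile.gettempdir(), 'random_output.bin')

# Write raw random bytes; no struct.pack is needed at this stage.
with open(bin_path, 'wb') as fout:
    fout.write(os.urandom(1024))

with open(bin_path, 'rb') as fin:
    data = fin.read()

# "=I" means a standard-size (4-byte) unsigned int in native byte order.
# iter_unpack requires len(data) to be a multiple of 4.
numbers = [value for (value,) in struct.iter_unpack('=I', data)]
print(len(numbers))
```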
(my use case is generating a file for a unit test so I just need a file that isn't identical with other generated files).
Another option is to just write a UUID4 to the file, but since I don't know the exact use case, I'm not sure that's viable.
The Python code you should write depends entirely on how you intend to use the random binary file. If you just need "rather good" randomness for multiple purposes, then the code of Jon Clements is probably the best.
However, on Linux OS at least, os.urandom relies on /dev/urandom, which is described in the Linux Kernel (drivers/char/random.c) as follows:
The /dev/urandom device [...] will return as many bytes as are requested. As more and more random bytes are requested without giving time for the entropy pool to recharge, this will result in random numbers that are merely cryptographically strong. For many applications, however, this is acceptable.
So the question is: is this acceptable for your application? If you prefer a more secure RNG, you could read bytes from /dev/random instead. The main drawback of this device is that it can block indefinitely if the Linux kernel is not able to gather enough entropy. There are also other cryptographically secure RNGs, like EGD.
Alternatively, if your main concern is execution speed and you just need some "light randomness" for a Monte Carlo method (i.e. unpredictability doesn't matter, uniform distribution does), you could consider generating your random binary file once and using it many times, at least for development.
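The generate-once, reuse-many-times idea could be sketched as below. The helper name load_random_bytes and the cache file name are assumptions for illustration.

```python
import os
import tempfile


def load_random_bytes(path, size):
    # Create the file only if it is missing or has the wrong size;
    # later calls reuse the same bytes.
    if not os.path.exists(path) or os.path.getsize(path) != size:
        with open(path, 'wb') as f:
            f.write(os.urandom(size))
    with open(path, 'rb') as f:
        return f.read()


cache_path = os.path.join(tempfile.gettempdir(), 'random_cache.bin')
first = load_random_bytes(cache_path, 4096)
second = load_random_bytes(cache_path, 4096)
print(first == second)
```

Because the same bytes come back on every call, runs are reproducible during development; deleting the file forces fresh data.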