Generate large random text files with python and NumPy

stackoverflow.com › questions › 45122635 › generate-large-random-text-files-with-python-and-numpy

Create a numpy array of letters:

In [662]: letters = np.array(list(chr(ord('a') + i) for i in range(26))); letters
Out[662]: 
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
       'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
      dtype='<U1')

Use np.random.choice to draw n random samples from letters, which generates the random text:

np.random.choice(letters, n)

Timings:

In [664]: n = 1024 ** 2

In [701]: %timeit np.random.choice(letters, n)
100 loops, best of 3: 15.1 ms per loop

Alternatively,

In [705]: %timeit np.random.choice(np.fromstring(letters, dtype='<U1'), n)
100 loops, best of 3: 14.1 ms per loop
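
To get from an in-memory array to a large file on disk, one option is to generate the text in chunks and write each chunk as bytes. The following is only a sketch and not part of the answer above: `write_random_letters`, `chunk`, and `seed` are hypothetical names, and it assumes NumPy's `default_rng` generator, drawing ASCII codes directly rather than calling np.random.choice on the letters array.

```python
import numpy as np


def write_random_letters(path, n_chars, chunk=1 << 20, seed=None):
    """Write n_chars random lowercase ASCII letters to path, chunk by chunk."""
    rng = np.random.default_rng(seed)  # hypothetical choice of generator
    written = 0
    with open(path, "wb") as f:
        while written < n_chars:
            size = min(chunk, n_chars - written)
            # Draw ASCII codes for 'a'..'z' (the upper bound is exclusive).
            codes = rng.integers(ord("a"), ord("z") + 1, size=size, dtype=np.uint8)
            f.write(codes.tobytes())
            written += size
    return written
```

Writing in chunks keeps memory use bounded by `chunk` bytes, so the target size can exceed available RAM.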

Answer from coldspeed95 on Stack Overflow


Chegg

chegg.com › engineering › computer science › computer science questions and answers › assignment 09 files, input, and output q1: use the following function (random_txt_file) to generate a random text file. write a function that read this file and save the number of each character in the generated text file in table format. def random_txt_file(file_name: str, number_of_char: int) : generate random text file es # n = 1024 # 1 mb of text

Solved Assignment 09 Files, Input, and Output Q1: Use the | Chegg.com

January 11, 2026 - Write a function that read this ... int) : Generate random text file ES # n = 1024 # 1 Mb of text '.join(random.choices(string.ascii_uppercase, k = chars - number_of_char)) with open(file_name, 'w+') as f: f.write(chars) The output file: Character Repetition А 300 B 287 с D D : : Z Q2. Write a Python function that ...

Discussions

python - Generating random text strings of a given pattern - Stack Overflow

I need to generate random text strings of a particular format. Would like some ideas so that I can code it up in Python. The format is . More on stackoverflow.com

stackoverflow.com

Best way to generate random file names in Python - Stack Overflow

In Python, what is a good, or the best way to generate some random text to prepend to a file(name) that I'm saving to a server, just to make sure it does not overwrite. Thank you! More on stackoverflow.com

stackoverflow.com

Create random objects and inserting into text file

You can multiply a string with a number x and it'll repeat that string x times like this:

"a"*5 will become "aaaaa"

I also want to point out that your last if-statement is largely redundant. The break is unnecessary because the while condition would have ended the loop anyway, and the myFile.close() is not needed because the with-statement you use to open the file takes care of closing it for you. So I'd suggest removing all of that and moving your print to after the while loop instead.

Another thing to consider is that in your file the datatypes repeat in the same order every 4 objects, whilst your example sometimes has two objects of the same type in a row. You could build a new function that randomly selects a datatype and then call that four times instead.

Top answer

1 of 5

#!/usr/bin/env python3

import random
import string

digits = "".join(random.choice(string.digits) for _ in range(8))
chars = "".join(random.choice(string.ascii_letters) for _ in range(15))
print(digits + chars)

EDIT: I liked the idea of using random.choice better than randint(), so I've updated the code to reflect that.

Note: this assumes both lowercase and uppercase characters are desired. If lowercase only, change the second line to read:

chars = "".join(random.choice(string.ascii_letters[:26]) for _ in range(15))

For uppercase only, flip that around so the slice is [26:] instead.

2 of 5

See an example - Recipe 59873: Random Password Generation .

Building on the recipe, here is a solution to your question :

from random import choice
import string

def GenPasswd2(length=8, chars=string.ascii_letters + string.digits):
    return ''.join([choice(chars) for i in range(length)])

>>> GenPasswd2(8,string.digits) + GenPasswd2(15,string.ascii_letters)
'28605495YHlCJfMKpRPGyAw'
>>>
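
If the generated string is meant to act as a password or token, the standard library's `secrets` module is designed for that purpose, whereas `random` is not intended for security use. A minimal sketch of the same digits-then-letters pattern, where `random_pattern` and its parameters are hypothetical names chosen for illustration:

```python
import secrets
import string


def random_pattern(n_digits=8, n_letters=15):
    """Return n_digits random digits followed by n_letters random ASCII letters."""
    digits = "".join(secrets.choice(string.digits) for _ in range(n_digits))
    letters = "".join(secrets.choice(string.ascii_letters) for _ in range(n_letters))
    return digits + letters


if __name__ == "__main__":
    print(random_pattern())
```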

GitHub

github.com › topics › random-text

random-text · GitHub Topics · GitHub

python writing python3 python2 random-text imagination ... Generate AI-driven or local random text, save it to file, and verify integrity with SHA-256 hashing in an easy-to-use pipeline.

Stack Overflow

stackoverflow.com › questions › 10501247 › best-way-to-generate-random-file-names-in-python

Best way to generate random file names in Python - Stack Overflow

Top answer

1 of 15

212

You could use the UUID module for generating a random string:

import uuid
filename = str(uuid.uuid4())

This is a valid choice, given that a UUID generator is extremely unlikely to produce a duplicate identifier (a file name, in this case):

Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
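
For the stated goal of prepending random text so an upload never overwrites an existing file, the UUID can be combined with exclusive-create mode, which raises an error instead of overwriting. This is a sketch under assumptions: `save_without_overwrite` and its arguments are hypothetical names, not something from the answer.

```python
import os
import uuid


def save_without_overwrite(directory, original_name, data):
    """Write data (bytes) to a new file whose name starts with a random UUID."""
    path = os.path.join(directory, f"{uuid.uuid4().hex}_{original_name}")
    # Mode "xb" fails with FileExistsError rather than overwriting a file.
    with open(path, "xb") as f:
        f.write(data)
    return path
```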

2 of 15

160

Python has facilities to generate temporary file names, see http://docs.python.org/library/tempfile.html. For instance:

In [4]: import tempfile

Each call to tempfile.NamedTemporaryFile() results in a different temp file, and its name can be accessed with the .name attribute, e.g.:

In [5]: tf = tempfile.NamedTemporaryFile()
In [6]: tf.name
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'

In [7]: tf = tempfile.NamedTemporaryFile()
In [8]: tf.name
Out[8]: 'c:\\blabla\\locals~1\\temp\\tmpr8vvme'

Once you have the unique filename it can be used like any regular file. Note: By default the file will be deleted when it is closed. However, if the delete parameter is False, the file is not automatically deleted.

Full parameter set:

tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])

It is also possible to specify the prefix for the temporary file (as one of the various parameters that can be supplied during the file creation):

In [9]: tf = tempfile.NamedTemporaryFile(prefix="zz")
In [10]: tf.name
Out[10]: 'c:\\blabla\\locals~1\\temp\\zzrc3pzk'

Additional examples for working with temporary files can be found here
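
The parameter list quoted above comes from the older documentation. As a rough Python 3 sketch of the same idea (keyword arguments, a custom prefix, and delete=False so the file survives closing), with `make_named_temp` being a hypothetical helper name:

```python
import os
import tempfile


def make_named_temp(text, prefix="zz", suffix=".txt"):
    """Create a uniquely named temp file containing text and return its path."""
    # delete=False keeps the file on disk after the with block closes it.
    with tempfile.NamedTemporaryFile(
        mode="w", prefix=prefix, suffix=suffix, delete=False
    ) as tf:
        tf.write(text)
        return tf.name


if __name__ == "__main__":
    path = make_named_temp("hello")
    print(path)
    os.remove(path)  # the caller is responsible for cleanup when delete=False
```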

reddit.com › r/learnpython › create random objects and inserting into text file

r/learnpython on Reddit: Create random objects and inserting into text file

September 8, 2015 -

Hey guys, so I am trying to brush up on my Python skills and was given a challenge recently to create random objects and insert them into a text file with the output not exceeding 10MB. Then I have to read the contents of the file and print back to the console every occurrence of those 4 objects and what type they are.

This was the exact challenge given to me:

Write a Python script that will generate four types of printable random objects and store them in a single file. Each object will be separated by a ",". These are the 4 data types: floats, alphabetical strings, integers, alphanumerics.

The alphanumerics should contain a random number of whitespaces before and after it (not exceeding 10 spaces). The output should be 10MB in size.

Sample extracted output:

hisadfnnasd, 126263, assfdgsga12348fas, 13123.123, lizierdjfklaasf, 123192u3kjwekhf, 89181811238,122, nmarcysfa900jkifh , 3.781, 2.11, ....

Create a program that will read the generated file above and print to the console the object and its type. Spaces before and after the alphanumeric object must be stripped.

Sample output:

youruasdifafasd - alphabetical strings
127371237 - integer
asdfka12348fas - alphanumeric
13123.123 - real numbers
asjdfklasdjfklaasf - alphabetical strings
123192u3kjwekhf - alphanumeric
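
The reading half of the challenge does not strictly need regex. One possible sketch, using only string methods, is shown below; `classify` and `describe_file` are hypothetical names, and the rules (an integer is all digits, a real number is digits with exactly one dot) are assumptions inferred from the sample output rather than part of the challenge text.

```python
def classify(token):
    """Return the type label for one comma-separated object."""
    s = token.strip()  # spaces around alphanumerics must be stripped
    if s.isalpha():
        return "alphabetical strings"
    if s.isdigit():
        return "integer"
    if s.count(".") == 1 and s.replace(".", "", 1).isdigit():
        return "real numbers"
    if s.isalnum():
        return "alphanumeric"
    return "unknown"


def describe_file(path):
    """Print each object in the file followed by its type."""
    with open(path) as f:
        for token in f.read().split(","):
            if token.strip():
                print(token.strip(), "-", classify(token))
```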

For the first part, this is the code that I currently have:

import random
import string
import os

minLength = 5
maxLength = 30

fileName = 'output_file.txt'

open(fileName, 'w')

fileSize = os.stat(fileName).st_size

def random_alphanumerics(length):
    key = ''
    for i in range(length):
        key += random.choice(string.ascii_lowercase + string.digits)
    return key

with open(fileName, 'a') as myFile:
    while fileSize < 10485760:
        length = random.randint(minLength, maxLength)
        myAlphabets = ''.join(random.choice(string.ascii_lowercase) for x in range(length))
        myInt = random.randint(0, 10000)
        myReal = round(random.uniform(0.0, 10000.0), length)
        myAlphanumerics = random_alphanumerics(length)
        myFile.write(myAlphabets + ', {}'.format(myInt) + ', {}'.format(myReal) + ', ' + myAlphanumerics + ', ')
        fileSize = os.stat(fileName).st_size
        print(fileSize)
        if fileSize == 10485760:
          break
          print('Done')
          myFile.close()

Now I know this may not be the cleanest code, it does output to a file and creates the objects as needed. The only thing I can't figure out is how to implement the random whitespaces before and after the alphanumeric strings.

And don't even get me started on the second bit... I know regex comes into play but I haven't used Python in so long, I can't even remember Regex commands to use. Any help there would also be greatly appreciated :)

Also, feel free to tear at my code and reprimand me if need be as I want to learn from you guys hehe.

EDIT: So I redid the code with the help of u/TheLiberius and the final version is below. The only issue I have is that it takes a long time to write the 10MB worth of data into the file. Any way to make it faster?

import random
import string
import os

### FUNCTIONS ###

# let's define a function for creating a string of random alphanumerical characters
def random_alphanumerics():
    length = random.randint(5, 30) # define the length of the alphanumeric string randomly between a range
    output = ''
    for i in range(length):
        output += random.choice(string.ascii_lowercase + string.digits) # create our alphanumeric string
    return output

# let's define a function for creating a string of random alphabetical characters
def random_string():
    length = random.randint(5, 30) # define the length of the alphabetical string randomly between a range
    output = ''.join(random.choice(string.ascii_lowercase) for x in range(length)) # we are using ascii lowercase so that encoding doesn't cause our file size to be unpredictable
    return output

# let's define a function for creating a random integers and converting them to a string
def random_int():
    output = random.randint(0, 10000)
    intToStr = '{}'.format(output)
    return intToStr

# let's define a function for creating random floats and converting them to a string
def random_float():
    length = random.randint(1, 10) # let's make sure the float is between a randomly chosen decimal place
    output = round(random.uniform(0.0, 10000.0), length)
    floatToStr = '{}'.format(output)
    return floatToStr

### time to write to our file!!! ###

# let's define our file name first
fileName = 'output_file.txt'

# we shall create and open our file
open(fileName, 'w')

# let's check the initial size of the file which should be 0
fileSize = os.stat(fileName).st_size

# alright let's open the file and append data to it
with open(fileName, 'a') as myFile:
    while fileSize < 10485760: # run the loop until fileSize is 10485760 bytes which should be shown as 10MB in any OS but in reality it is 10.48MB in actuality
        function_list = [random_alphanumerics, random_string, random_int, random_float] # put our functions into a list
        dataType = random.choice(function_list) # randomly choose a function to run
        if dataType == random_alphanumerics: # our alphanumeric string needs to have whitespaces before and after it so let's use an if statement for that
            output = random_alphanumerics()
            i = random.randint(0, 9) # the whitespaces shouldn't be more than 9
            output = ' '*i + output + ' '*i # put them altogether
        else:
            output = dataType()
        myFile.write(output + ', ')
        fileSize = os.stat(fileName).st_size
        print(fileSize)
    # once loop is done, print final file size and close file
    print('Final file size:', fileSize / 1000000, 'MB')
    myFile.close()
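
On the speed question, a likely contributor is that the loop calls os.stat and print once per object. A sketch of one alternative is to count the characters written in memory instead; with ASCII-only output, one character is one byte. `random_token` and `write_objects` are hypothetical names, and the left and right padding are drawn separately here, which is an assumption about what the challenge intends.

```python
import random
import string

TARGET = 10 * 1024 * 1024  # 10485760 bytes, as in the script above


def random_token():
    """Return one randomly chosen object as a string."""
    kind = random.choice(("alpha", "int", "float", "alnum"))
    length = random.randint(5, 30)
    if kind == "alpha":
        return "".join(random.choices(string.ascii_lowercase, k=length))
    if kind == "int":
        return str(random.randint(0, 10000))
    if kind == "float":
        return str(round(random.uniform(0.0, 10000.0), random.randint(1, 10)))
    body = "".join(random.choices(string.ascii_lowercase + string.digits, k=length))
    # Up to 10 spaces on each side, chosen independently.
    return " " * random.randint(0, 10) + body + " " * random.randint(0, 10)


def write_objects(path, target_bytes=TARGET):
    """Write comma-separated objects until at least target_bytes are written."""
    written = 0
    with open(path, "w", encoding="ascii", newline="") as f:
        while written < target_bytes:
            chunk = random_token() + ","
            f.write(chunk)
            written += len(chunk)  # ASCII only, so characters equal bytes
    return written
```

The file can overshoot the target by at most one object; truncating the last chunk would give an exact size if that matters.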


Stack Overflow

stackoverflow.com › questions › 28883905 › writing-a-random-number-generator-to-a-text-file

python - Writing a random number generator to a text file - Stack Overflow

Top answer

1 of 3

All you need to do is place your write() statement inside of your for loop.

for count in range(12):
    #Get a random number.
    num = random.randint(1, 100)
    #Write 12 random integers in the range of 1-100 on one line
    #to the file.
    outfile.write(str(num))

2 of 3

Your write statement needs to be inside your for loop:

for count in range(12):
    #Get a random number.
    num = random.randint(1, 100)
    #Write 12 random integers in the range of 1-100 on one line
    #to the file.
    outfile.write(str(num) + ' ')#adds a space, unless you want the numbers to be all together

Your open statement should be:

outfile = open('numbersmake.txt', 'a+')

So it will not overwrite the text already written, and it will create a new 'numbersmake.txt' if it doesn't exist.
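
Putting both points together (the write inside the loop, and append mode), a compact sketch could look like the following. `write_random_numbers` is a hypothetical name; the file name, count, and range are taken from the answers above, and the with-statement is an addition that closes the file automatically.

```python
import random


def write_random_numbers(path="numbersmake.txt", count=12, low=1, high=100):
    """Append count random integers in [low, high] to path on one line."""
    numbers = [random.randint(low, high) for _ in range(count)]
    # Mode "a" appends, and creates the file if it does not exist yet.
    with open(path, "a") as outfile:
        outfile.write(" ".join(str(num) for num in numbers) + "\n")
    return numbers
```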


PyPI

pypi.org › project › random-file-generator

random-file-generator · PyPI

December 11, 2020 - pip install random-file-generator ## generate_files_parallel Generates random text files concurrently using multiprocessing. ### Parameters - `number_of_files`: Number of files to generate.

      » pip install random-file-generator

Published May 03, 2024

Version 0.0.3

Homepage https://github.com/NishithKashyap/Generate-Random-Files

Javatpoint

javatpoint.com › python-program-to-generate-a-random-string

Python Program to generate a Random String - Javatpoint

July 23, 2025 - Python Program to generate a Random String with python, tutorial, tkinter, button, overview, entry, checkbutton, canvas, frame, environment set-up, first python program, basics, operators, etc.

Quora

quora.com › How-can-I-randomly-choose-a-line-from-a-TXT-file-in-Python-3

How to randomly choose a line from a TXT file in Python 3 - Quora

Answer (1 of 4): You don’t say how big the file is. If it is relatively small - read the whole file in (using readlines(), and then use random.choice() [code]import random def choose_line(file_name): """Choose a line at random from the text file""" with open(file_name, 'r') as file: lines ...

PyPI

pypi.org › project › random-text-generator

random-text-generator


DaniWeb

daniweb.com › programming › software-development › threads › 136320 › help-generate-random-word-function-from-fiveletter-txt-file

python - help! generate random word function from ... [SOLVED] | DaniWeb

October 9, 2024 - You read the contents of the file into the variable t. At the very least you should be trying to use subscripts on t (even though that would still be wrong). randint returns a random integer, not a word.

Stack Overflow

stackoverflow.com › questions › 27932540 › how-to-make-a-random-text-python

string - How to make a random text python - Stack Overflow

Top answer

1 of 3

The easiest thing is to randomly select letters from the 25k file. The result then has the same letter probabilities as the original.

import random
print(''.join(random.choice(original_text) for _ in range(500)))

2 of 3

You could do something like this:

import string
import random

def get_random_letter():
    # depends how you want to randomize getting your letter
    return random.choice(string.ascii_letters)

random_letters = []
for i in range(500):
    random_letter = get_random_letter()
    random_letters.append(random_letter)

with open("text.txt", 'w') as f:
    f.write("".join(random_letters))

You would change the "get_random_letter" definition depending on your probability model and return that character (in that case, you do not need to import random or string, these are just used for example).

Edit: To get the letter based on a certain weight you could use this:

import random

inputs = ['e', 'f', 'g', 'h']
weights = [10, 30, 50, 10]

def get_random_letter(inputs, weights):
    r = random.uniform(0, sum(weights))
    current_cutoff = 0
    for index in range(len(weights)):
        current_cutoff = current_cutoff + weights[index]
        if r < current_cutoff:
            return inputs[index]

print(get_random_letter(inputs, weights))

which is derived from the post here: Returning a value at random based on a probability weights
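
On Python 3.6 and later, the standard library's random.choices accepts a weights argument directly, so the cutoff loop can be replaced by a single call. A short sketch using the same inputs and weights as above; `random_weighted_text` is a hypothetical name:

```python
import random

inputs = ['e', 'f', 'g', 'h']
weights = [10, 30, 50, 10]


def random_weighted_text(length, population=inputs, weights=weights):
    """Return a string of `length` letters drawn with the given weights."""
    return "".join(random.choices(population, weights=weights, k=length))


if __name__ == "__main__":
    print(random_weighted_text(500))
```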

PyPI

pypi.org › project › dummy-file-generator

dummy-file-generator · PyPI

April 25, 2018 - You need to generate dummy files based on the content of the text files in your data_files folder, and these source text files need to have this plain text format: This tool picks random item from each of the files configured for your project in config.json and uses these values to populate the data for "columns" for each written row.

      » pip install dummy-file-generator

Published Oct 31, 2022

Version 1.1.21

Homepage https://github.com/datahappy1/dummy_file_generator

PYnative

pynative.com › home › python › random › generate random strings and passwords in python

Generate Random Strings and Passwords in Python

December 8, 2022 - ... Below is the list of string constants you can use to get a different set of characters as a source for creating a random string. ... We can generate the random string using the random module and string module.

Super User

superuser.com › questions › 692175 › how-to-create-a-random-txthuman-readable-text-like-ascii-file-in-linux

How to create a random .txt(Human readable text like ascii) file in linux - Super User

Top answer

1 of 6

155

We can do it with the following command:

base64 /dev/urandom | head -c 10000000 > file.txt

It creates a file named file.txt with a size of 10 MB.
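
For readers who want the same effect from Python rather than the shell, a rough analogue is sketched below. `write_base64_random` is a hypothetical name, and unlike the base64 command-line tool, which normally wraps its output into lines, this version writes one unbroken run of characters.

```python
import base64
import os


def write_base64_random(path, n_bytes):
    """Write exactly n_bytes of base64-encoded random data to path."""
    # Every 3 raw bytes become 4 base64 characters, so request slightly more
    # raw data than needed and truncate the encoded result.
    raw = os.urandom((n_bytes * 3) // 4 + 3)
    text = base64.b64encode(raw)[:n_bytes]
    with open(path, "wb") as f:
        f.write(text)
    return len(text)
```

This builds the whole payload in memory, which is fine for 10 MB but would need chunking for much larger files.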

2 of 6

Take the output of:

tr -dc A-Za-z0-9 </dev/urandom

and pipe it to a file.

You can use the head command with -c or -n to limit the file size.

Example that generates a 1 kB file, a.txt:

tr -dc A-Za-z0-9 </dev/urandom | head -c 1024 > a.txt

Stack Overflow

stackoverflow.com › questions › 12701816 › place-randomly-generated-numbers-into-text-file

python - Place randomly generated numbers into text file - Stack Overflow

Top answer

1 of 2

This could be cleaned up a bit more, but the basic changes made are:

adding a points container
changing file to f (you want to avoid defining variables with the same name as a Python built-in),
changing the format parameter to accept the tuple p (it will automatically unpack)
making some basic formatting changes.

All in all, you were very close - just a few basic things that needed tweaking.

import random

numpoints = 512
L = 20
points = set()

# Open f and write
with open("question1.xyz","w") as f:
    f.write("\ncomment goes here\n") #this is for the 2nd line in my xyz file
    while len(points) < numpoints:
        p = (random.randint(0, L), random.randint(0, L), random.randint(0, L))
        if p not in points:
            points.add(p)
            f.write('H %f %f %f\n' % p)

The following isn't any more efficient, but introduces the concept of recursion in order to generate your random point. The previous version works just fine - this is more for fun :)

import random

numpoints = 512
L = 20
points = set()

def RandomPoint(points):
    p = (random.randint(0, L), random.randint(0, L), random.randint(0, L))
    if p in points:
      p = RandomPoint(points)
    return p

# Open f and write
with open("question1.xyz","w") as f:
    f.write("\ncomment goes here\n") #this is for the 2nd line in my xyz file
    for point in range(0, numpoints):
        p = RandomPoint(points)
        points.add(p)
        f.write('H %f %f %f\n' % p)
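
Both versions above retry whenever a duplicate point comes up. An alternative sketch, not from the answer, is to sample distinct integer codes with random.sample and decode each code into a lattice point, which gives uniqueness without retries. `sample_unique_points` and `write_xyz` are hypothetical names; writing the point count on the first line is an assumption about the intended file layout.

```python
import random


def sample_unique_points(numpoints=512, L=20):
    """Return numpoints distinct (x, y, z) points with coordinates in 0..L."""
    side = L + 1
    # Each integer in range(side ** 3) maps to exactly one lattice point.
    codes = random.sample(range(side ** 3), numpoints)
    points = []
    for code in codes:
        xy, z = divmod(code, side)
        x, y = divmod(xy, side)
        points.append((x, y, z))
    return points


def write_xyz(path, points, comment="comment goes here"):
    """Write the point count, a comment line, then one H line per point."""
    with open(path, "w") as f:
        f.write(f"{len(points)}\n{comment}\n")
        for p in points:
            f.write("H %f %f %f\n" % p)
```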

2 of 2

Try something like this:

fh = open('filename', 'w')
fh.write(str(len(points)) + '\n')
fh.write("comment goes here\n")
for point in points:
    fh.write("H %1.6f %1.6f %1.6f\n" % (point[0],point[1],point[2]))
fh.flush()
fh.close()

Medium

stevenzych.medium.com › generate-random-text-in-python-with-numpy-and-string-formatting-9b6cd69de61d

Generate Random Text In Python With NumPy And String Formatting | by Steven Zych | Medium

January 10, 2021 - We’ll get “common” about 40 times in the same number of goes. We’ll use this knowledge to drive our text generator, so that we can tweak which values are more or less likely in any given spell it creates. And in order to do that, we need to talk about string formatting in Python.

Stack Overflow

stackoverflow.com › questions › 14275975 › creating-random-binary-files

python - Creating random binary files - Stack Overflow

Top answer

1 of 3

IMHO - the following is completely redundant:

f.write(struct.pack("=I",random.randint(0,sys.maxint*2+1)))

There's absolutely no need to use struct.pack, just do something like:

import os

fileSizeInBytes = 1024
with open('output_filename', 'wb') as fout:
    fout.write(os.urandom(fileSizeInBytes)) # set fileSizeInBytes to the size you need, in bytes, if it is not unreasonably large

Then, if you need to reuse the file for reading integers, apply struct.unpack at that point.

(my use case is generating a file for a unit test so I just need a file that isn't identical with other generated files).

Another option is to just write a UUID4 to the file, but since I don't know the exact use case, I'm not sure that's viable.
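
When the requested size is too large to hold in memory at once, the same os.urandom call can be made in chunks. A minimal sketch, where `write_random_bytes` and `chunk_size` are hypothetical names:

```python
import os


def write_random_bytes(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of random data to path, at most chunk_size at a time."""
    remaining = total_bytes
    with open(path, "wb") as fout:
        while remaining > 0:
            n = min(chunk_size, remaining)
            fout.write(os.urandom(n))
            remaining -= n
    return total_bytes
```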

2 of 3

The python code you should write completely depends on the way you intend to use the random binary file. If you just need a "rather good" randomness for multiple purposes, then the code of Jon Clements is probably the best.

However, on Linux OS at least, os.urandom relies on /dev/urandom, which is described in the Linux Kernel (drivers/char/random.c) as follows:

The /dev/urandom device [...] will return as many bytes as are requested. As more and more random bytes are requested without giving time for the entropy pool to recharge, this will result in random numbers that are merely cryptographically strong. For many applications, however, this is acceptable.

So the question is, is this acceptable for your application? If you prefer a more secure RNG, you could read bytes from /dev/random instead. The main drawback of this device is that it can block indefinitely if the Linux kernel is not able to gather enough entropy. There are also other cryptographically secure RNGs like EGD.

Alternatively, if your main concern is execution speed and you just need some "light randomness" for a Monte Carlo method (i.e. unpredictability doesn't matter, uniform distribution does), you could consider generating your random binary file once and reusing it many times, at least for development.