tensorflow data generator

Tensorflow2.x custom data generator with multiprocessing

With a tf.data pipeline, there are several places where you can parallelize. Depending on how your data are stored and read, you can parallelize reading. You can also parallelize augmentation, and you can prefetch data while you train, so that your GPU (or other accelerator) spends less time waiting for data.

In the code below, I have demonstrated how you can parallelize augmentation and add prefetching.

import numpy as np
import tensorflow as tf

x_shape = (32, 32, 3)
y_shape = ()  # A single item (not array).
classes = 10

# This is tf.data.experimental.AUTOTUNE in older tensorflow.
AUTOTUNE = tf.data.AUTOTUNE

def generator_fn(n_samples):
    """Return a function that takes no arguments and returns a generator."""
    def generator():
        for i in range(n_samples):
            # Synthesize an image and a class label.
            x = np.random.random_sample(x_shape).astype(np.float32)
            y = np.random.randint(0, classes, size=y_shape, dtype=np.int32)
            yield x, y
    return generator

def augment(x, y):
    return x * tf.random.normal(shape=x_shape), y

samples = 10
batch_size = 5
epochs = 2

# Create dataset.
gen = generator_fn(n_samples=samples)
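dataset = tf.data.Dataset.from_generator(
    generator=gen,
    # output_signature supersedes the deprecated output_types/output_shapes pair.
    output_signature=(
        tf.TensorSpec(shape=x_shape, dtype=tf.float32),
        tf.TensorSpec(shape=y_shape, dtype=tf.int32),
    ),
)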
# Parallelize the augmentation.
dataset = dataset.map(
    augment, 
    num_parallel_calls=AUTOTUNE,
    # Order does not matter.
    deterministic=False
)
dataset = dataset.batch(batch_size, drop_remainder=True)
# Prefetch some batches.
dataset = dataset.prefetch(AUTOTUNE)

# Prepare model.
model = tf.keras.applications.VGG16(weights=None, input_shape=x_shape, classes=classes)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train. Do not specify batch size because the dataset takes care of that.
model.fit(dataset, epochs=epochs)
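The text above also mentions parallelizing the reading step, which the code does not show. A minimal sketch of one common approach, assuming the data can be split into independent shards: create one from_generator per shard and merge them with Dataset.interleave. The names shard_generator, n_shards, and samples_per_shard are hypothetical. Each generator still runs Python code under the GIL, so expect gains mainly when the per-shard work is I/O-bound (for example, reading files).

```python
import numpy as np
import tensorflow as tf

x_shape = (32, 32, 3)
classes = 10
n_shards = 4           # hypothetical: number of independent data shards
samples_per_shard = 5  # hypothetical: samples produced by each shard

def shard_generator(shard_index):
    """Hypothetical reader for one shard; shard_index arrives as a NumPy scalar via args."""
    rng = np.random.default_rng(int(shard_index))
    for _ in range(samples_per_shard):
        # Synthesize an image and a label (stand-in for reading this shard from disk).
        x = rng.random(x_shape, dtype=np.float32)
        y = np.int32(rng.integers(0, classes))
        yield x, y

signature = (
    tf.TensorSpec(shape=x_shape, dtype=tf.float32),
    tf.TensorSpec(shape=(), dtype=tf.int32),
)

# One from_generator per shard index; interleave pulls from several shards concurrently.
dataset = tf.data.Dataset.range(n_shards).interleave(
    lambda i: tf.data.Dataset.from_generator(
        shard_generator, args=(i,), output_signature=signature
    ),
    cycle_length=n_shards,
    num_parallel_calls=tf.data.AUTOTUNE,
    deterministic=False,  # order across shards does not matter here
)
dataset = dataset.batch(5).prefetch(tf.data.AUTOTUNE)
```

The rest of the pipeline (map with augment, batch, prefetch, model.fit) is unchanged; only the source of elements differs.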

Answer from jkr on Stack Overflow

TensorFlow

tf.data: Build TensorFlow input pipelines | TensorFlow Core

Dataset.from_generator takes a callable as input, not an iterator. This allows it to restart the generator when it reaches the end. It takes an optional args argument, which is passed as the callable's arguments. The output_types argument (or output_signature in newer versions) is required because tf.data builds a tf.Graph internally, and graph edges require a tf.dtype.
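A small sketch of the two behaviors described above, restarting and args. count_up_to is a hypothetical generator function; passing the function itself (not count_up_to(3)) lets tf.data call it afresh on every pass.

```python
import tensorflow as tf

def count_up_to(n):
    """Generator function; n arrives as a NumPy integer because it is passed via args."""
    for i in range(n):
        yield i

ds = tf.data.Dataset.from_generator(
    count_up_to,
    args=(3,),
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64),
)

first_pass = [int(v) for v in ds]
second_pass = [int(v) for v in ds]  # the generator is restarted, not exhausted
print(first_pass, second_pass)  # [0, 1, 2] [0, 1, 2]
```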

Medium

Write your own Custom Data Generator for TensorFlow Keras | by Arjun Muraleedharan | Analytics Vidhya | Medium

March 25, 2021 - When you write a for loop with range(start, end, step), Python does not create a list with all the elements from start up to (but not including) end. Instead, range returns a lazy object that produces each value on the fly as the loop asks for it (strictly a range object rather than a generator, but with the same memory benefit). Have you ever encountered a problem where your dataset is too big to be loaded into memory at once and you run out of RAM?
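A stdlib-only sketch of the memory point above: a range object stays small regardless of its length, while materializing it as a list does not, and a generator function can likewise hand out batches on demand. The load_batches helper and the file names are hypothetical stand-ins for reading data from disk.

```python
import sys

# A range object stores only start/stop/step, so its size does not grow with length.
lazy = range(0, 1_000_000)
eager = list(lazy)  # materializes every element in memory
print(sys.getsizeof(lazy), sys.getsizeof(eager))

def load_batches(paths, batch_size):
    """Hypothetical loader: yields one batch at a time instead of loading everything."""
    batch = []
    for path in paths:
        # Stand-in for reading a file; only the current batch is held in memory.
        batch.append(f"data from {path}")
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

paths = [f"img_{i}.png" for i in range(5)]
batches = list(load_batches(paths, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```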

Discussions

tensorflow - Tensorflow2.x custom data generator with multiprocessing - Stack Overflow

I just upgraded to tensorflow 2.3. I want to make my own data generator for training. With tensorflow 1.x, I did this: def get_data_generator(test_flag): item_list = load_item_list(test_flag) p...

stackoverflow.com

python - How to train TensorFlow network using a generator to produce inputs? - Stack Overflow

The TensorFlow docs describe a bunch of ways to read data using TFRecordReader, TextLineReader, QueueRunner etc and queues. What I would like to do is much, much simpler: I have a python generator

stackoverflow.com

python - How to build a Custom Data Generator for Keras/tf.Keras where X images are being augmented and corresponding Y labels are also images - Stack Overflow

I am working on Image Binarization using UNet and have a dataset of 150 images and their binarized versions too. My idea is to augment the images randomly to make them look like they are differents...

stackoverflow.com
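For the case in the question above, where the labels are themselves images, every random transform must hit the image and its label image identically. A minimal sketch, assuming both are float32 tensors with the same height and width and a statically known channel count: concatenate along the channel axis, apply the random ops once, then split. augment_pair and the toy data are hypothetical; only flips are shown, though tf.image.random_crop on the stacked tensor works the same way.

```python
import tensorflow as tf

def augment_pair(image, mask):
    """Apply identical random flips to an image and its label image.

    Stacking along the channel axis means each random op draws once and
    transforms both tensors the same way.
    """
    image_channels = image.shape[-1]  # assumes a static channel count (3 here)
    stacked = tf.concat([image, mask], axis=-1)
    stacked = tf.image.random_flip_left_right(stacked)
    stacked = tf.image.random_flip_up_down(stacked)
    return stacked[..., :image_channels], stacked[..., image_channels:]

# Toy data: 150 random 64x64 RGB images with binarized single-channel masks.
images = tf.random.uniform((150, 64, 64, 3))
masks = tf.cast(images[..., :1] > 0.5, tf.float32)

dataset = (
    tf.data.Dataset.from_tensor_slices((images, masks))
    .map(augment_pair, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(16)
)
```

Because each toy mask is computed pixel-wise from its image, any mismatch between the two flips would show up as masks that no longer agree with their images.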

Dataset from generator is far slower than from tensor slices, anything I can improve?

And want some help on how to improve the performance using the generator. If I want to have a dataset with 10 elements for each records, how can I get better performance? Thanks so much in advance! import tensorflow as tf import numpy as np import time tf.compat.v1.enable_eager_execution( ...

github.com

May 25, 2022
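One likely cause of the slowdown in the question above is that every element from from_generator passes through a Python call, so with small records (10 values each) that per-call overhead can dominate. Two common remedies are sketched below, assuming fixed-size float records: yield whole chunks from the generator and unbatch them inside tf.data, or skip the generator and use from_tensor_slices when the data fit in memory. N_RECORDS, FEATURES, and CHUNK are hypothetical values.

```python
import numpy as np
import tensorflow as tf

N_RECORDS = 1000  # hypothetical dataset size
FEATURES = 10     # 10 values per record, as in the question
CHUNK = 256       # hypothetical number of records per Python call

data = np.random.random((N_RECORDS, FEATURES)).astype(np.float32)

def chunk_generator():
    """Yield many records per call instead of one, amortizing Python overhead."""
    for start in range(0, N_RECORDS, CHUNK):
        yield data[start:start + CHUNK]

# Remedy 1: chunked generator; unbatch() splits chunks back into single records.
chunked = tf.data.Dataset.from_generator(
    chunk_generator,
    output_signature=tf.TensorSpec(shape=(None, FEATURES), dtype=tf.float32),
).unbatch()

# Remedy 2: if everything fits in memory, avoid the generator entirely.
in_memory = tf.data.Dataset.from_tensor_slices(data)
```

With chunking, the Python generator runs about N_RECORDS / CHUNK times instead of N_RECORDS times, while unbatch itself runs inside the tf.data runtime.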
