This generator function yields mini-batches from the given inputs and targets:

import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    # Step through the data in blocks of `batchsize`; any trailing
    # samples that do not fill a complete batch are dropped.
    for start_idx in range(0, inputs.shape[0] - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]

and this shows how to use it for training:

import logging

for n in range(n_epochs):  # xrange on Python 2
    for batch in iterate_minibatches(X, Y, batch_size, shuffle=True):
        x_batch, y_batch = batch
        l_train, acc_train = f_train(x_batch, y_batch)

    l_val, acc_val = f_val(Xt, Yt)
    logging.info('epoch %d, train_loss %f, acc %f, val_loss %f, acc %f',
                 n, l_train, acc_train, l_val, acc_val)

You need to define f_train, f_val and the other functions yourself, depending on the optimisation library (e.g. Lasagne, Keras) you are using.
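As a hedged illustration only, here is a minimal NumPy sketch of what f_train and f_val might look like for logistic regression. These are hypothetical stand-ins for the compiled training/validation functions a library such as Lasagne or Keras would give you; the parameter shapes, learning rate, and loss are assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(20,))  # weights for 20 input features (assumed)
b = 0.0
lr = 0.1  # learning rate (assumed)

def _forward(x):
    # Sigmoid probabilities for a linear model.
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

def f_train(x_batch, y_batch):
    """One gradient step; returns (loss, accuracy) on the batch."""
    global W, b
    p = _forward(x_batch)
    eps = 1e-12
    loss = -np.mean(y_batch * np.log(p + eps)
                    + (1 - y_batch) * np.log(1 - p + eps))
    grad = p - y_batch                         # d(loss)/d(logits)
    W -= lr * x_batch.T @ grad / len(y_batch)  # mean gradient step
    b -= lr * grad.mean()
    acc = np.mean((p > 0.5) == y_batch)
    return loss, acc

def f_val(x, y):
    """Loss and accuracy without updating the parameters."""
    p = _forward(x)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return loss, np.mean((p > 0.5) == y)
```

The key point is only the interface: both functions take a batch of inputs and targets and return a (loss, accuracy) pair, which is what the training loop above consumes.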

Answer from Ash on Stack Overflow

The following function returns (yields) mini-batches. It is based on the function provided by Ash, but also handles the last mini-batch correctly when the batch size does not evenly divide the number of samples.

import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        # Clamp the end index so the final batch may be smaller
        # than batchsize instead of being dropped.
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]
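A quick self-contained check of the last-batch behaviour (the function is copied from the answer above so the snippet runs on its own; the toy arrays are made up for illustration). With 10 samples and a batch size of 4, the batches have sizes 4, 4, and 2, and shuffling keeps each input row paired with its target.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]

# Row i of X is [2i, 2i+1] and Y[i] = i, so pairings are easy to verify.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

sizes = [len(xb) for xb, yb in iterate_minibatches(X, Y, batchsize=4)]
print(sizes)  # prints [4, 4, 2]

# With shuffle=True, inputs and targets stay aligned within every batch.
for xb, yb in iterate_minibatches(X, Y, batchsize=4, shuffle=True):
    assert np.array_equal(xb[:, 0] // 2, yb)
```

The earlier version from the first answer would have silently dropped the final 2 samples every epoch.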