I am going to present four different methods for computing such a kernel, followed by a comparison of their run-time.

Using pure numpy

Here, I use the fact that ||x-y||^2 = ||x||^2 + ||y||^2 - 2 * x^T * y.

import numpy as np

X_norm = np.sum(X ** 2, axis = -1)
K = var * np.exp(-gamma * (X_norm[:,None] + X_norm[None,:] - 2 * np.dot(X, X.T)))
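To see that the broadcasting expression really computes the Gaussian kernel, here is a small sanity check against a direct two-loop evaluation (the values of X, gamma and var below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
gamma, var = 0.5, 2.0

# Vectorized kernel via ||x-y||^2 = ||x||^2 + ||y||^2 - 2 * x^T * y
X_norm = np.sum(X ** 2, axis=-1)
K = var * np.exp(-gamma * (X_norm[:, None] + X_norm[None, :] - 2 * np.dot(X, X.T)))

# Direct evaluation of var * exp(-gamma * ||x_i - x_j||^2) for every pair
K_ref = np.array([[var * np.exp(-gamma * np.sum((x - y) ** 2)) for y in X] for x in X])
print(np.allclose(K, K_ref))  # True
```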

Using numexpr

numexpr is a Python package that allows efficient, parallelized array operations on numpy arrays. We can use it as follows to perform the same computation as above:

import numpy as np
import numexpr as ne

X_norm = np.sum(X ** 2, axis = -1)
K = ne.evaluate('v * exp(-g * (A + B - 2 * C))', {
        'A' : X_norm[:,None],
        'B' : X_norm[None,:],
        'C' : np.dot(X, X.T),
        'g' : gamma,
        'v' : var
})

Using scipy.spatial.distance.pdist

We could also use scipy.spatial.distance.pdist to compute a non-redundant (condensed) array of pairwise squared Euclidean distances, apply the kernel to that array, and then transform it into a square matrix:

import numpy as np
from scipy.spatial.distance import pdist, squareform

K = squareform(var * np.exp(-gamma * pdist(X, 'sqeuclidean')))
K[np.arange(K.shape[0]), np.arange(K.shape[1])] = var  # pdist omits self-pairs, so squareform leaves zeros on the diagonal; the kernel value there is var * exp(0) = var
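The diagonal fix in the last line is needed because pdist returns only the n(n-1)/2 distances between distinct pairs, and squareform places zeros on the diagonal. A small illustration (with made-up values):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.arange(12.0).reshape(4, 3)
gamma, var = 0.1, 5.0

d = pdist(X, 'sqeuclidean')               # condensed form: 4*3/2 = 6 entries, no self-pairs
K = squareform(var * np.exp(-gamma * d))  # square matrix, but with zeros on the diagonal
np.fill_diagonal(K, var)                  # kernel value at zero distance is var * exp(0) = var
```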

Using sklearn.metrics.pairwise.rbf_kernel

sklearn provides a built-in method for direct computation of an RBF kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

K = var * rbf_kernel(X, gamma = gamma)

Run-time comparison

I use 25,000 random samples of 512 dimensions for testing and perform experiments on an Intel Core i7-7700HQ (4 cores @ 2.8 GHz). More precisely:

X = np.random.randn(25000, 512)
gamma = 0.01
var = 5.0

Each method is run 7 times, and the mean and standard deviation of the time per execution are reported.

|               Method                |       Time        |
|-------------------------------------|-------------------|
| numpy                               | 24.2 s ± 1.06 s   |
| numexpr                             | 8.89 s ± 314 ms   |
| scipy.spatial.distance.pdist        | 2min 59s ± 312 ms |
| sklearn.metrics.pairwise.rbf_kernel | 13.9 s ± 757 ms   |
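For reference, a timing loop in this spirit (7 runs, reporting mean ± standard deviation) could look like the sketch below. It uses a much smaller problem size than the experiment above so that it finishes quickly, and times only the pure-numpy variant:

```python
import timeit
import numpy as np

# Small stand-in for the 25000 x 512 experiment
X = np.random.randn(500, 64)
gamma, var = 0.01, 5.0

def numpy_kernel():
    X_norm = np.sum(X ** 2, axis=-1)
    return var * np.exp(-gamma * (X_norm[:, None] + X_norm[None, :] - 2 * np.dot(X, X.T)))

# 7 runs of one execution each, mirroring %timeit's default number of runs
times = timeit.repeat(numpy_kernel, repeat=7, number=1)
print(f"{np.mean(times):.4f} s \u00b1 {np.std(times):.4f} s per run")
```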

First of all, scipy.spatial.distance.pdist is surprisingly slow.

numexpr is almost 3 times faster than the pure numpy method, but this speed-up factor will vary with the number of available CPUs.

sklearn.metrics.pairwise.rbf_kernel is not the fastest way, but only a bit slower than numexpr.

Answer from Callidior on Stack Overflow

Second answer
Well, you are already doing a lot of optimization in your answer post. I would like to add a few more (mostly tweaks), building upon the winner from your answer post, which seems to be the numexpr-based method.

Tweak #1

First off, np.sum(X ** 2, axis = -1) could be optimized with np.einsum. Though this part isn't the biggest overhead, optimization of any sort won't hurt. That summation can be expressed as -

X_norm = np.einsum('ij,ij->i',X,X)
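A quick check that the einsum expression is equivalent to the original summation:

```python
import numpy as np

# np.einsum('ij,ij->i', X, X) computes the squared norm of each row
# without materializing the intermediate array X ** 2.
X = np.random.randn(100, 16)
assert np.allclose(np.einsum('ij,ij->i', X, X), np.sum(X ** 2, axis=-1))
```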

Tweak #2

Secondly, we could leverage the BLAS functions exposed by SciPy and, if allowed, use a single-precision dtype for a noticeable performance improvement over double precision. Hence, np.dot(X, X.T) can be computed with SciPy's sgemm like so -

sgemm(alpha=1.0, a=X, b=X, trans_b=True)
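Note that sgemm works in single precision, so its result matches np.dot(X, X.T) only up to float32 accuracy. A small check:

```python
import numpy as np
from scipy.linalg.blas import sgemm

# sgemm computes alpha * A @ B (here with B transposed); float64 inputs
# are cast to float32, so the result agrees with np.dot only approximately.
X = np.random.randn(50, 8)
G = sgemm(alpha=1.0, a=X, b=X, trans_b=True)
print(G.dtype)  # float32
assert np.allclose(G, np.dot(X, X.T), rtol=1e-4, atol=1e-4)
```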

A few more tweaks: rearranging the negative sign with gamma lets us feed more work to sgemm, and we can also fold gamma into the alpha term.

Tweaked implementations

Thus, with these two optimizations, we have two more variants (if I may put it that way) of the numexpr method, listed below -

import numpy as np
import numexpr as ne
from scipy.linalg.blas import sgemm

def app1(X, gamma, var):
    # Tweak #1: einsum for the squared row norms (sign folded in)
    X_norm = -np.einsum('ij,ij->i', X, X)
    return ne.evaluate('v * exp(g * (A + B + 2 * C))', {
        'A' : X_norm[:,None],
        'B' : X_norm[None,:],
        'C' : np.dot(X, X.T),
        'g' : gamma,
        'v' : var
    })

def app2(X, gamma, var):
    # Tweak #2: gamma folded into the norms and into sgemm's alpha
    X_norm = -gamma * np.einsum('ij,ij->i', X, X)
    return ne.evaluate('v * exp(A + B + C)', {
        'A' : X_norm[:,None],
        'B' : X_norm[None,:],
        'C' : sgemm(alpha=2.0*gamma, a=X, b=X, trans_b=True),
        'v' : var
    })

Runtime test

The numexpr-based method from your answer post -

def app0(X, gamma, var):
    X_norm = np.sum(X ** 2, axis = -1)
    return ne.evaluate('v * exp(-g * (A + B - 2 * C))', {
            'A' : X_norm[:,None],
            'B' : X_norm[None,:],
            'C' : np.dot(X, X.T),
            'g' : gamma,
            'v' : var
    })

Timings and verification -

In [165]: # Setup
     ...: X = np.random.randn(10000, 512)
     ...: gamma = 0.01
     ...: var = 5.0

In [166]: %timeit app0(X, gamma, var)
     ...: %timeit app1(X, gamma, var)
     ...: %timeit app2(X, gamma, var)
1 loop, best of 3: 1.25 s per loop
1 loop, best of 3: 1.24 s per loop
1 loop, best of 3: 973 ms per loop

In [167]: np.allclose(app0(X, gamma, var), app1(X, gamma, var))
Out[167]: True

In [168]: np.allclose(app0(X, gamma, var), app2(X, gamma, var))
Out[168]: True
🌐
GitHub
github.com › xbeat › Machine-Learning › blob › main › The Mathematics of RBF Kernel in Python.md
Machine-Learning/The Mathematics of RBF Kernel in Python.md at main · xbeat/Machine-Learning
Low gamma values mean far away points have a high influence, while high gamma values mean only nearby points have a high influence. import numpy as np import matplotlib.pyplot as plt def rbf_kernel(x, y, gamma): return np.exp(-gamma * ...
Author   xbeat
🌐
scikit-learn
scikit-learn.org › stable › modules › generated › sklearn.metrics.pairwise.rbf_kernel.html
rbf_kernel — scikit-learn 1.8.0 documentation
>>> from sklearn.metrics.pairwise import rbf_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> rbf_kernel(X, Y) array([[0.71, 0.51], [0.51, 0.71]])
🌐
scikit-learn
scikit-learn.org › stable › auto_examples › svm › plot_rbf_parameters.html
RBF SVM parameters — scikit-learn 1.8.0 documentation
Download Jupyter notebook: plot_rbf_parameters.ipynb · Download Python source code: plot_rbf_parameters.py
🌐
GitHub
github.com › topics › rbf-kernel
rbf-kernel · GitHub Topics · GitHub
Open Source Code for Data-Driven Dimensional Analysis. data-driven physics rbf-kernel dimensional-analysis voronoi mechanics realworld fuzzy-clustering noisy-data ... deep-neural-networks gpu rbf-kernel vgg image-classification object-detection nas bert-model training-free-nas large-language-model ... Image Processing and classification using Machine Learning : Image Classification using Open CV and SVM machine learning model · python opencv machine-learning svm rbf-kernel scikit-learn pandas classification indian one-to-one rbf hu-moments one-vs-rest dances
Find elsewhere
🌐
DZone
dzone.com › data engineering › ai/ml › svm rbf kernel parameters with code examples
SVM RBF Kernel Parameters With Code Examples
July 28, 2020 - In this post, you will learn about SVM RBF (Radial Basis Function) kernel hyperparameters with the python code example.
🌐
Bogotobogo
bogotobogo.com › python › scikit-learn › scikit_machine_learning_Linearly_Separable_NonLinearly_RBF_Separable_Data_SVM_GUI.php
scikit-learn : Supervised Learning - Radial Basis Function kernel, RBF - 2020
"In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function used in support vector machine classification." - Radial basis function kernel ... Let's see how a nonlinear classification problem looks like using a sample dataset created by XOR logical operation (outputs true only when inputs differ - one is true, the other is false). In the code below, we create XOR gate dataset (500 samples with either a class label of 1 or -1) using NumPy's logical_xor function:
🌐
scikit-learn
scikit-learn.org › 1.5 › modules › generated › sklearn.metrics.pairwise.rbf_kernel.html
rbf_kernel — scikit-learn 1.5.2 documentation
>>> from sklearn.metrics.pairwise import rbf_kernel >>> X = [[0, 0, 0], [1, 1, 1]] >>> Y = [[1, 0, 0], [1, 1, 0]] >>> rbf_kernel(X, Y) array([[0.71..., 0.51...], [0.51..., 0.71...]])
🌐
VitalFlux
vitalflux.com › home › data science › svm rbf kernel parameters: python examples
SVM RBF Kernel Parameters: Python Examples - Analytics Yogi
April 15, 2023 - It can thus be understood that the selection of appropriate values of Gamma is important. Here is the code which is used. svm = SVC(kernel='rbf', random_state=1, gamma=0.008, C=0.1) svm.fit(X_train_std, y_train)
🌐
scikit-learn
scikit-learn.org › 1.5 › modules › generated › sklearn.gaussian_process.kernels.RBF.html
scikit-learn: RBF Kernel
Determines whether the gradient with respect to the log of the kernel hyperparameter is computed.
🌐
Kaggle
kaggle.com › code › manmohan291 › 16-sklearn-svm-rbf-kernel
16 SKLearn - SVM RBF Kernel
Checking your browser before accessing www.kaggle.com · Click here if you are not automatically redirected after 5 seconds
🌐
Towards Data Science
towardsdatascience.com › home › latest › svm classifier and rbf kernel – how to make better models in python
SVM Classifier and RBF Kernel - How to Make Better Models in Python | Towards Data Science
January 23, 2025 - SVM with RBF kernel and high gamma. See how it was created in the Python section at the end of this story. Image by author. It is essential to understand how different Machine Learning algorithms work to succeed in your Data Science projects. I have written this story as part of the series that dives into each ML algorithm explaining its mechanics, supplemented by Python code examples and intuitive visualizations.
🌐
Read the Docs
gpy.readthedocs.io › en › deploy › _modules › GPy › kern › src › rbf.html
Source code for GPy.kern.src.rbf
[docs] def parameters_changed(self): if self.use_invLengthscale: self.lengthscale[:] = 1./np.sqrt(self.inv_l+1e-200) super(RBF,self).parameters_changed() [docs] def get_one_dimensional_kernel(self, dim): """ Specially intended for Grid regression.
🌐
Quark Machine Learning
quarkml.com › home › data science › machine learning
The RBF kernel in SVM: A Complete Guide - Quark Machine Learning
April 6, 2025 - In this article, we’ll discuss what exactly makes this kernel so powerful, look at its working, and study examples of it in action. We’ll also provide code samples for implementing the RBF kernel from scratch in Python that illustrates how to use the RBF kernel on your own data sets.
Another answer

This type of SVM is often implemented with the SMO algorithm. You may want to check the originally published version (Platt, John. Fast Training of Support Vector Machines using Sequential Minimal Optimization, in Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, C. Burges, A. Smola, eds., MIT Press (1998)), but it is quite complicated, at least for me.

A somewhat simplified version is presented in the Stanford Lecture Notes, but the derivation of all the formulas has to be found elsewhere (e.g. these random notes I found on the Internet).

As an alternative, I can offer my own variation of the SMO algorithm. It is highly simplified; the implementation contains little more than 30 lines of code:

import numpy as np

class SVM:
  def __init__(self, kernel='linear', C=10000.0, max_iter=100000, degree=3, gamma=1):
    self.kernel = {'poly':lambda x,y: np.dot(x, y.T)**degree,
                   'rbf':lambda x,y:np.exp(-gamma*np.sum((y-x[:,np.newaxis])**2,axis=-1)),
                   'linear':lambda x,y: np.dot(x, y.T)}[kernel]
    self.C = C
    self.max_iter = max_iter

  def restrict_to_square(self, t, v0, u):
    # Clip the step t so that v0 + t*u stays inside the box [0, C]^2
    t = (np.clip(v0 + t*u, 0, self.C) - v0)[1]/u[1]
    return (np.clip(v0 + t*u, 0, self.C) - v0)[0]/u[0]

  def fit(self, X, y):
    self.X = X.copy()
    self.y = y * 2 - 1  # map {0, 1} labels to {-1, +1}
    self.lambdas = np.zeros_like(self.y, dtype=float)
    self.K = self.kernel(self.X, self.X) * self.y[:,np.newaxis] * self.y

    for _ in range(self.max_iter):
      for idxM in range(len(self.lambdas)):
        idxL = np.random.randint(0, len(self.lambdas))
        Q = self.K[[[idxM, idxM], [idxL, idxL]], [[idxM, idxL], [idxM, idxL]]]
        v0 = self.lambdas[[idxM, idxL]]
        k0 = 1 - np.sum(self.lambdas * self.K[[idxM, idxL]], axis=1)
        u = np.array([-self.y[idxL], self.y[idxM]])
        t_max = np.dot(k0, u) / (np.dot(np.dot(Q, u), u) + 1E-15)
        self.lambdas[[idxM, idxL]] = v0 + u * self.restrict_to_square(t_max, v0, u)

    idx, = np.nonzero(self.lambdas > 1E-15)  # indices of the support vectors
    self.b = np.sum((1.0-np.sum(self.K[idx]*self.lambdas, axis=1))*self.y[idx])/len(idx)

  def decision_function(self, X):
    return np.sum(self.kernel(X, self.X) * self.y * self.lambdas, axis=1) + self.b
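The 'rbf' entry of the kernel dictionary builds the full kernel matrix by broadcasting: y - x[:, np.newaxis] has shape (n_x, n_y, d), and summing over the last axis gives the pairwise squared distances. Isolated from the class, with made-up inputs:

```python
import numpy as np

# The class's 'rbf' kernel in isolation: broadcasting produces an
# (n_x, n_y, d) array, reduced over the last axis to squared distances.
gamma = 1.0
rbf = lambda x, y: np.exp(-gamma * np.sum((y - x[:, np.newaxis]) ** 2, axis=-1))

x = np.random.randn(4, 2)
y = np.random.randn(3, 2)
K = rbf(x, y)

# Reference: explicit loop over all pairs
K_ref = np.array([[np.exp(-gamma * np.sum((xi - yj) ** 2)) for yj in y] for xi in x])
assert K.shape == (4, 3) and np.allclose(K, K_ref)
```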

In simple cases it performs not much worse than sklearn.svm.SVC; a comparison is shown below.

I have posted this code, along with some more code producing images for comparison, on GitHub. For a more elaborate explanation with formulas, you may want to refer to my preprint on ResearchGate.

UPDATE: A live version is now available, see GitHub Pages.
