Not sure how this is done in any specific library, but here is what I would try.

Given two sets of points ($N_1$ points in the first set and $N_2$ in the second set): $\{\vec{p}_i\}_{i=1\dots N_1}$ and $\{\vec{q}_i\}_{i=1\dots N_2}$, all in $n$-dimensional space ($\vec{p}_i, \vec{q}_i \in \mathbb{R}^n$).

I would compute the 'centres of mass' of the two sets:

$$\vec{P} = \frac{1}{N_1}\sum_{i=1}^{N_1}\vec{p}_i, \qquad \vec{Q} = \frac{1}{N_2}\sum_{i=1}^{N_2}\vec{q}_i$$

I would then use the mid-point between the two centres of mass,

$$\vec{M} = \frac{1}{2}\left(\vec{P} + \vec{Q}\right),$$

as the point for the hyper-plane.

Then I would use the vector connecting the two centres of mass,

$$\vec{\Delta} = \vec{P} - \vec{Q},$$

as the normal for the hyper-plane. Let's define the unit normal $\hat{n} = \vec{\Delta}/\lVert\vec{\Delta}\rVert$.

A single point and a normal vector, in $n$-dimensional space, will uniquely define an $(n-1)$-dimensional hyper-plane. To actually parametrize it you will need to find a set of $(n-1)$ orthonormal vectors $\{\vec{e}_i\}_{i=1\dots n-1}$ that all satisfy $\vec{e}_i \cdot \hat{n} = 0$.

This set can be created by a Gram-Schmidt type process, starting from your trivial basis and then ensuring that every new vector is orthogonal to all vectors already in the set and to $\hat{n}$.

Once you have done that, any point on the hyper-plane will be uniquely described by $(n-1)$ coordinates $(c_1, \dots, c_{n-1})$, and will correspond to the following point in the original $n$-dimensional space: $\vec{M} + \sum_{i=1}^{n-1} c_i\,\vec{e}_i$.

Answer from Cryo on Stack Exchange
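That construction is straightforward to code directly. Here is a minimal NumPy sketch (the function names are mine, for illustration, not from any particular library):

```python
import numpy as np

def hyperplane_from_point_sets(P, Q):
    """Return a point M on the hyper-plane and its unit normal n_hat.

    P is an (N1, n) array of points, Q an (N2, n) array.
    """
    P_bar = P.mean(axis=0)                 # centre of mass of the first set
    Q_bar = Q.mean(axis=0)                 # centre of mass of the second set
    M = 0.5 * (P_bar + Q_bar)              # mid-point: a point on the hyper-plane
    delta = P_bar - Q_bar                  # vector connecting the centres of mass
    n_hat = delta / np.linalg.norm(delta)  # unit normal
    return M, n_hat

def hyperplane_basis(n_hat):
    """Gram-Schmidt: build n-1 orthonormal vectors orthogonal to n_hat."""
    basis = [n_hat]
    for e in np.eye(n_hat.size):           # start from the trivial basis
        v = e - sum((e @ u) * u for u in basis)  # remove components along the set so far
        if np.linalg.norm(v) > 1e-10:      # skip the one (near-)dependent candidate
            basis.append(v / np.linalg.norm(v))
    return np.array(basis[1:])             # drop n_hat itself: shape (n-1, n)

def to_ambient(c, M, E):
    """Map hyper-plane coordinates (c_1, ..., c_{n-1}) to M + sum_i c_i e_i."""
    return M + c @ E
```

Note that this mid-point/centroid hyper-plane is a heuristic: unlike the maximum-margin hyperplanes discussed in the results below, it is not guaranteed to separate the two sets even when they are linearly separable.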
SVM Tutorial
svm-tutorial.com › home › svm - understanding the math - the optimal hyperplane
SVM - Understanding the math : the optimal hyperplane
April 30, 2023 - Finding the biggest margin is the same thing as finding the optimal hyperplane. ... The region bounded by the two hyperplanes will be the biggest possible margin. If it is so simple, why does everybody have so much pain understanding SVM?
MIT
web.mit.edu › 6.034 › wwwbob › svm-notes-long-08.pdf
An Idiot’s guide to Support vector machines (SVMs), R. Berwick, Village Idiot
We can show that the optimal hyperplane stems from the function class with the lowest “capacity” = # of independent features/parameters we can twiddle [note this is ‘extra’ material not covered in the lectures… you don’t have to know this]. Recall from 1-layer nets: which separating hyperplane? In general, lots of possible solutions for a, b, c (an infinite number!); the Support Vector Machine (SVM) finds an optimal…
GeeksforGeeks
geeksforgeeks.org › separating-hyperplanes-in-svm
Separating Hyperplanes in SVM - GeeksforGeeks
September 15, 2021 - ... Support vectors are the data ... surface. The optimal hyperplane comes from the function class with the lowest capacity, i.e. the minimum number of independent features/parameters....
Medium
medium.com › @apurvjain37 › support-vector-machines-s-v-m-hyperplane-and-margins-ee2f083381b4
Support Vector Machines(S.V.M) — Hyperplane and Margins | by apurv jain | Medium
September 25, 2020 - The hyperplane will be generated in an iterative manner by SVM so that the error can be minimized. The goal of SVM is to divide the datasets into classes to find a maximum marginal hyperplane (MMH).
Saedsayad
saedsayad.com › support_vector_machine.htm
Support Vector Machine - Classification (SVM)
ResearchGate
researchgate.net › figure › The-support-vector-machines-SVM-method-the-optimal-hyperplane-separates-the-two_fig1_343092219
The support vector machines (SVM) method: the optimal hyperplane... | Download Scientific Diagram
A. C. Teodoro · A. M. C. Lima ... Machine learning (ML) algorithms have shown great performance in geological remote sensing applications. The study area of this work was the Fregeneda–Almendra region (Spain–Portugal), where the support vector machine (SVM) was employed. Lithium (Li)-pegmatite exploration using satellite data presents some challenges since pegmatites are, by nature,... ... The optimal hyperplane is the decision boundary that maximizes the distance of the margin between the class bounding hyperplanes (also called supporting hyperplanes).
ScienceDirect
sciencedirect.com › topics › computer-science › separating-hyperplane
Separating Hyperplane - an overview | ScienceDirect Topics
The geometric margin quantifies the distance between a data point and the separating hyperplane, and maximizing this margin leads to the optimal hyperplane [3, 8]. The margin is $1/\lVert w \rVert$ on each side, and the maximal margin is $2/\lVert w \rVert$, where $\lVert w \rVert$ is the Euclidean norm of the normal vector.
Analytics Vidhya
analyticsvidhya.com › home › support vector machine (svm)
Support Vector Machine (SVM)
April 21, 2025 - That’s why ‘c’ is a hyperparameter and we find the optimal value of ‘c’ using GridSearchCV and cross-validation. The most interesting feature of SVM is that it can even work with a non-linear dataset and for this, we use “Kernel ...
Shuzhan Fan
shuzhanfan.github.io › 2018 › 05 › understanding-mathematics-behind-support-vector-machines
Understanding the mathematics behind Support Vector Machines
May 7, 2018 - If you are familiar with the perceptron, it finds the hyperplane by iteratively updating its weights and trying to minimize the cost function. However, if you run the algorithm multiple times, you probably will not get the same hyperplane every time. SVM doesn’t suffer from this problem.
ResearchGate
researchgate.net › figure › Optimal-hyperplane-in-Support-Vector-Machine_fig1_264534919
Optimal hyperplane in Support Vector Machine | Download Scientific Diagram
Figure 1 shows the optimal hyperplane in SVM that separates two datasets, the vectors near the hyperplane are called the Support Vectors (SVs). The accuracy of a SVM model largely depends on the selection of the kernel parameters [16], since these parameters have a significant impact on the performance of kernel method.
ResearchGate
researchgate.net › figure › How-to-find-the-optimal-hyperplane_fig1_4279258
How to find the optimal hyperplane | Download Scientific Diagram
This family of classifiers has ... for two point sets can be found by constructing the convex hulls of the two classes and then equally dividing the shortest distance between the hulls....
ScienceDirect
sciencedirect.com › topics › engineering › hyperplanes
Hyperplanes - an overview | ScienceDirect Topics
Such data points are known as the support vectors, and they influence the position and orientation of the separation surface (hyperplane). The distance from the support vectors to the hyperplane is known as the margin, and the SVM algorithms maximize it to identify the optimal hyperplane [51–53]. ...
Stack Overflow
stackoverflow.com › questions › 37998045 › hyperplane-equation-in-svm
vector - hyperplane equation in SVM - Stack Overflow
Find the midpoint between them, convolve with the normal vector, and there's your optimum plane. ... I am not clear about your points in first 2 para of your answer, ...
Analytics Vidhya
analyticsvidhya.com › home › beginner’s guide to support vector machine(svm)
SVM | Support Vector Machine | How does SVM work
March 12, 2021 - We have to select a hyperplane, for which the margin, i.e the distance between support vectors and hyper-plane is maximum. Even a little interference in the position of these support vectors can change the hyper-plane.
Stanford NLP Group
nlp.stanford.edu › IR-book › html › htmledition › support-vector-machines-the-linearly-separable-case-1.html
Support vector machines: The linearly separable case
Since each example's distance from the hyperplane is $r_i = y_i(\vec{w}^{\,T}\vec{x}_i + b)/\lVert\vec{w}\rVert$, the geometric margin is $\rho = 2/\lVert\vec{w}\rVert$. Our desire is still to maximize this geometric margin. That is, we want to find $\vec{w}$ and $b$ such that: ... For all $(\vec{x}_i, y_i)$, $y_i(\vec{w}^{\,T}\vec{x}_i + b) \ge 1$. Maximizing $2/\lVert\vec{w}\rVert$ is the same as minimizing $\lVert\vec{w}\rVert/2$. This gives the final standard formulation of an SVM as a minimization problem: We are now optimizing ...
Stack Exchange
stats.stackexchange.com › questions › 107722 › finding-optimal-hyperplane
svm - Finding optimal hyperplane - Cross Validated
July 12, 2014 - This looks like an NP-hard problem. If no additional info about alpha, n and Vi is available, I would try to use simulated annealing to find a good enough solution. ... Maybe not! A hyperplane is not able to separate any set of points. So there is a clear constraint to the problem.
ScienceDirect
sciencedirect.com › topics › computer-science › maximum-margin-hyperplane
Maximum Margin Hyperplane - an overview | ScienceDirect Topics
This feature is important for the discussion of support vector classifiers and SVMs, which we will discuss later in this chapter. Quite simply, we can define the maximum margin hyperplane through an optimization problem, with the determination of an objective function that respects the condition that the value of the margin (M, hereinafter) is maximized.
Medium
medium.com › deep-math-machine-learning-ai › chapter-3-support-vector-machine-with-math-47d6193c82be
Chapter 3: Support Vector machine with Math. | by Madhu Sanjeevi ( Mady ) | Deep Math Machine learning.ai | Medium
June 25, 2018 - if we maximize the margin (distance) between two hyperplanes and then divide by 2, we get the decision boundary. ... Observe this picture. ... so either we save the w and b values and keep going or we adjust the parameters (w and b) and keep going.
Top answer (1 of 2, score 5)

This is an excellent question and one I struggled with as well.

Firstly, the margin is not fixed. As your diagram shows, the margin $m = \frac{2}{\lVert w \rVert}$, which is a function of the 2-norm of the $w$ parameter, nothing else. So the margin is maximized by minimizing the norm of $w$.

But let's back up to see why.

We have some data represented as vectors $x_i \in \mathbb{R}^n$ and each $x_i$ is associated with a binary label $y_i \in \{-1,1\}$, for $i = 1, \dots, N$. We could have made the labels anything, but choosing -1 and 1 is mathematically convenient.

An (affine) hyperplane is the generalization of a line in n-dimensional space defined as the set of points $x$ such that $w^Tx + b = 0$, where $w, x \in \mathbb{R}^n$ and $w^Tx$ is the dot (inner) product between these vectors. The choice of $w$ will change the orientation of the hyperplane and the choice of $b$ will determine its offset from the origin.
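A quick numeric illustration of that definition (a sketch, assuming NumPy; the numbers are made up):

```python
import numpy as np

w, b = np.array([1.0, -2.0]), 3.0   # w sets the orientation, b the offset
x_on = np.array([1.0, 2.0])         # w @ x_on + b = 1 - 4 + 3 = 0: on the hyperplane
x_off = np.array([2.0, 0.0])        # w @ x_off + b = 2 + 3 = 5 > 0: positive side
print(w @ x_on + b, w @ x_off + b)  # 0.0 5.0
```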

We want to find a hyperplane (i.e. choice of $w, b$) that separates the data $x_i$ according to their class labels. We assume our data can be perfectly separated by a line (or hyperplane), i.e. it is linearly separable. So we want a hyperplane such that when $y_i = -1$ then $w^Tx_i + b \le 0$ and when $y_i = 1$ then $w^Tx_i + b \ge 0$. There's actually a fairly straightforward algorithm called the perceptron algorithm that can find a hyperplane meeting those constraints.
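For concreteness, here is a minimal sketch of that perceptron algorithm (assuming NumPy and labels in $\{-1, 1\}$, which keep the update rule compact):

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Find *a* separating hyperplane (w, b) for linearly separable data.

    X: (N, n) data matrix; y: (N,) labels in {-1, +1}.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:       # x_i is on the wrong side (or on the plane)
                w, b = w + y_i * x_i, b + y_i  # nudge the hyperplane toward x_i
                mistakes += 1
        if mistakes == 0:                      # every constraint is satisfied
            break
    return w, b
```

Which hyperplane it converges to depends on the order of the data, which is exactly the problem the maximum-margin criterion below solves.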

But there are an infinite number of hyperplanes (i.e. choices of $w, b$) that can satisfy the constraints of separating the data classes, so in order for our hyperplane to work well in classifying future data, we want it to optimally separate the data classes such that it is not biased toward one class or another and is situated perfectly between them with maximal space on either side (maximal margins).

The easier way to set this up is that what we really want is to define two parallel hyperplanes, one just on the inside boundary of class $y_i = -1$ and the other just on the inside boundary of the class $y_i = 1$, then the actual decision boundary will be the parallel hyperplane exactly in the middle of these.

In other words, if we have our decision boundary hyperplane as $w^Tx + b = 0$, then we want to find a $\delta > 0$ such that we can define two parallel hyperplanes on either side: $w^Tx + b = \pm \delta$. We'd like to maximize $\delta$ so that the distance between these two boundary hyperplanes is maximal; that will give us maximal separation between the classes.

But if we keep $\delta$ a variable, then changing the norm of $w$, changing $b$ or changing $\delta$ will change the margin between the two hyperplanes. We really only want to optimize $w, b$. Moreover, if we change $\delta$ by any scalar amount, then we can just scale $w, b$ an opposite amount, so we can fix $\delta$ to anything we want and still be able to adjust the margin by modifying $w, b$.

This is where the $\pm 1$ comes from: it is from arbitrarily fixing $\delta = 1$ because it is a convenient choice.

So we define our two parallel separating hyperplanes as: $$ w^Tx + b = \pm 1$$

Then the margin is the distance between these two parallel hyperplanes. The signed displacement from the origin to the hyperplane $w^Tx + b = \delta$ is $\frac{b - \delta}{\Vert w \Vert}$ (measured along $-w$), and we can use this fact to compute the distance between our hyperplanes. The displacement to the hyperplane at $\delta = -1$ is $\frac{b+1}{\Vert w \Vert}$ and to the one at $\delta = +1$ is $\frac{b-1}{\Vert w \Vert}$, so the distance between them (aka the margin $m$) is: $$m = \frac{b+1}{\Vert w \Vert} - \frac{b-1}{\Vert w \Vert} = \frac{2}{\Vert w \Vert}$$

So the optimal hyperplane will be found by maximizing $m$, i.e. $$ \underset{w}{\max}\,\frac{2}{\Vert w \Vert} \;\equiv\; \underset{w}{\min}\,\frac{1}{2}\Vert w \Vert$$ subject to the constraint $y_i(w^Tx_i + b) \ge 1$ for all $i$.

Now, if we had arbitrarily chosen $\delta = \pi$ instead of 1, the margin would be $\frac{2\pi}{\Vert w \Vert}$, but this is just a change in scaling and the optimization problem is identical in form.
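You can sanity-check all of this numerically. A sketch using scikit-learn (with a linear kernel, SVC exposes the learned $w$ and $b$ as `coef_` and `intercept_`; a large C approximates the hard-margin problem above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)),   # class -1 cluster
               rng.normal(+2, 0.5, (20, 2))])  # class +1 cluster
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# The support vectors lie on the boundary hyperplanes w^T x + b = ±1 ...
print(clf.decision_function(clf.support_vectors_))  # values close to ±1
# ... and the margin between those hyperplanes is m = 2 / ||w||.
print(2 / np.linalg.norm(w))
```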

Answer 2 of 2 (score 0)

1 and -1 are just the standard, practical choice (if ultimately arbitrary). You could theoretically replace them with h and -h, where h is any positive number. Whichever values you choose, the weights will be massaged accordingly during the optimization process and the relative result will be the same. And that's the key - it's all relative - the only difference would be, in a manner of speaking, the relative "units". The size of the margin may be fixed at "1", but what "1" "means" is relative to the values of the decision function, which depend on the weights.
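That scaling argument is easy to verify numerically; a sketch, assuming NumPy, showing that rescaling $(w, b)$ by $h > 0$ moves the boundary values to $\pm h$ without changing the decision boundary or the geometric margin:

```python
import numpy as np

w, b, h = np.array([3.0, 4.0]), 1.0, 2.5
x = np.array([0.2, -0.1])

# Scaling (w, b) by h scales every decision value by h, so the sign
# (the predicted class) of every point is unchanged:
print(h * (w @ x + b), (h * w) @ x + h * b)   # identical values
# A point with w^T x + b = ±1 now satisfies (hw)^T x + hb = ±h, and
# the geometric margin is the same: 2h / ||h*w|| == 2 / ||w||.
print(2 * h / np.linalg.norm(h * w), 2 / np.linalg.norm(w))
```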