Not sure how this is done in any specific library, but here is what I would try.

Given two sets of points ($N_1$ points in the first set and $N_2$ points in the second set), $\{\mathbf{p}_i\}_{i=1}^{N_1}$ and $\{\mathbf{q}_j\}_{j=1}^{N_2}$, all in $n$-dimensional space ($\mathbf{p}_i, \mathbf{q}_j \in \mathbb{R}^n$):

I would compute the 'centres of mass' of the two sets:

$$\bar{\mathbf{p}} = \frac{1}{N_1}\sum_{i=1}^{N_1}\mathbf{p}_i, \qquad \bar{\mathbf{q}} = \frac{1}{N_2}\sum_{j=1}^{N_2}\mathbf{q}_j.$$

I would then use the mid-point between the two centres of mass,

$$\mathbf{c} = \tfrac{1}{2}\left(\bar{\mathbf{p}} + \bar{\mathbf{q}}\right),$$

as the point for the hyper-plane.

Then I would use the vector connecting the two centres of mass,

$$\mathbf{d} = \bar{\mathbf{q}} - \bar{\mathbf{p}},$$

as the normal for the hyper-plane. Let's define the unit normal $\hat{\mathbf{n}} = \mathbf{d}/\lVert\mathbf{d}\rVert$.

A single point and a normal vector, in $n$-dimensional space, uniquely define an $(n-1)$-dimensional hyper-plane. To actually do it you will need to find a set of vectors $\hat{\mathbf{e}}_1, \dots, \hat{\mathbf{e}}_{n-1}$ that are mutually orthonormal and all orthogonal to $\hat{\mathbf{n}}$.

This set can be created by a Gram-Schmidt-type process, starting from your trivial basis and then ensuring that every new vector is orthogonal to all vectors already in the set and to $\hat{\mathbf{n}}$.

Once you have done that, any point on the hyper-plane will be uniquely described by coordinates $(t_1, \dots, t_{n-1})$, and will correspond to the following point in the original $n$-dimensional space:

$$\mathbf{x}(t_1, \dots, t_{n-1}) = \mathbf{c} + \sum_{k=1}^{n-1} t_k\,\hat{\mathbf{e}}_k.$$

Answer from Cryo on Stack Exchange
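
A minimal NumPy sketch of this construction (the example data, variable names, and the Gram-Schmidt loop are my own, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(loc=-2.0, size=(50, 4))   # first set: N1 points in R^n
Q = rng.normal(loc=+2.0, size=(60, 4))   # second set: N2 points in R^n

p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)   # centres of mass
c = 0.5 * (p_bar + q_bar)                       # point on the hyper-plane
d = q_bar - p_bar
n_hat = d / np.linalg.norm(d)                   # unit normal

# Gram-Schmidt: build n-1 orthonormal vectors, all orthogonal to n_hat.
basis = [n_hat]
for e in np.eye(P.shape[1]):                    # start from the trivial basis
    v = e - sum(np.dot(e, b) * b for b in basis)
    if np.linalg.norm(v) > 1e-10:               # skip vectors already in the span
        basis.append(v / np.linalg.norm(v))
E = np.array(basis[1:])                          # the e_k's, shape (n-1, n)

# Any point on the hyper-plane: x(t) = c + sum_k t_k * e_k
t = rng.normal(size=P.shape[1] - 1)
x = c + t @ E
print(np.isclose(np.dot(x - c, n_hat), 0.0))     # True: x lies on the plane
```
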
🌐
Medium
medium.com › @apurvjain37 › support-vector-machines-s-v-m-hyperplane-and-margins-ee2f083381b4
Support Vector Machines(S.V.M) — Hyperplane and Margins | by apurv jain | Medium
September 25, 2020 - The hyperplane will be generated in an iterative manner by SVM so that the error can be minimized. The goal of SVM is to divide the datasets into classes to find a maximum marginal hyperplane (MMH).
🌐
SVM Tutorial
svm-tutorial.com › home › svm - understanding the math - the optimal hyperplane
SVM - Understanding the math : the optimal hyperplane
April 30, 2023 - If I have a margin delimited by two hyperplanes (the dark blue lines in Figure 2), I can find a third hyperplane passing right in the middle of the margin. Finding the biggest margin, is the same thing as finding the optimal hyperplane. ... The region bounded by the two hyperplanes will be the biggest possible margin. If it is so simple why does everybody have so much pain understanding SVM ? It is because as always the simplicity requires some abstraction and mathematical terminology to be well understood.
🌐
MIT
web.mit.edu › 6.034 › wwwbob › svm-notes-long-08.pdf pdf
1 An Idiot’s guide to Support vector machines (SVMs) R. Berwick, Village Idiot
Hyperplane? • In general, lots of possible solutions for a, b, c (an infinite number!) • Support Vector Machine (SVM) finds an optimal solution • Support vectors • Maximize margin • SVMs maximize the margin
🌐
GeeksforGeeks
geeksforgeeks.org › separating-hyperplanes-in-svm
Separating Hyperplanes in SVM - GeeksforGeeks
September 15, 2021 - ... Support vectors are the data ... optimal decision surface. The optimal hyperplane comes from the function class with the lowest capacity i.e minimum number of independent features/parameters....
🌐
YouTube
youtube.com › mahesh huddar
How to Select Best Hyperplane in SVM | Support Vector Machine in Machine Learning by Mahesh Huddar - YouTube
The hyperplane function for two variables is b + a₁x₁ + a...
Published   May 12, 2023
Views   21K
🌐
Stack Overflow
stackoverflow.com › questions › 37998045 › hyperplane-equation-in-svm
vector - hyperplane equation in SVM - Stack Overflow
Find the midpoint between them, convolve with the normal vector, and there's your optimum plane. ... I am not clear about your points in first 2 para of your answer, ...
🌐
Saedsayad
saedsayad.com › support_vector_machine.htm
Support Vector Machine - Classification (SVM)
🌐
Analytics Vidhya
analyticsvidhya.com › home › support vector machine (svm)
Support Vector Machine (SVM)
April 21, 2025 - NOTE: Since we are plotting the ... boundary a “hyperplane” · The best hyperplane is that plane that has the maximum distance from both the classes, and this is the main aim of SVM....
🌐
Analytics Vidhya
analyticsvidhya.com › home › beginner’s guide to support vector machine(svm)
SVM | Support Vector Machine | How does SVM work
March 12, 2021 - We have to select a hyperplane, for which the margin, i.e the distance between support vectors and hyper-plane is maximum. Even a little interference in the position of these support vectors can change the hyper-plane. As we have a clear idea of the terminologies related to SVM, let’s now see how the algorithm works.
Top answer
1 of 2
5

This is an excellent question and one I struggled with as well.

Firstly, the margin is not fixed. As your diagram shows, the margin $m = \frac{2}{\lVert w \rVert}$, which is a function of the 2-norm of the $w$ parameter, nothing else. So the margin is maximized by minimizing the norm of $w$.

But let's back up to see why.

We have some data represented as vectors $x_i \in \mathbb{R}^n$ and each $x_i$ is associated with a binary label $y_i \in \{-1,1\}$, for $i \in \mathbb{N}$. We could have made the labels anything, but choosing -1 and 1 is mathematically convenient.

An (affine) hyperplane is the generalization of a line in n-dimensional space defined as the set of points $x$ such that $w^Tx + b = 0$, where $w, x \in \mathbb{R}^n$ and $w^Tx$ is the dot (inner) product between these vectors. The choice of $w$ will change the orientation of the hyperplane and the choice of $b$ will determine its offset from the origin.

We want to find a hyperplane (i.e. choice of $w, b$) that separates the data $x_i$ according to their class labels. We assume our data can be perfectly separated by a line (or hyperplane), i.e. it is linearly separable. So we want a hyperplane such that when $y_i = -1$ then $w^Tx_i + b \le 0$ and when $y_i = 1$ then $w^Tx_i + b \ge 0$. There's actually a fairly straightforward algorithm called the perceptron algorithm that can find a hyperplane meeting those constraints.
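
As an aside, here is a minimal sketch of that perceptron algorithm (my own illustration, assuming labels in $\{-1, 1\}$); it finds some separating hyperplane for linearly separable data, not the maximum-margin one:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Return (w, b) of a separating hyperplane w^T x + b = 0, if one is found."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):           # y_i in {-1, +1}
            if yi * (w @ xi + b) <= 0:     # misclassified (or on the boundary)
                w += yi * xi               # nudge the hyperplane toward x_i
                b += yi
                errors += 1
        if errors == 0:                    # all constraints satisfied
            break
    return w, b

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron(X, y))                    # some (w, b) separating the two classes
```
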

But there are an infinite number of hyperplanes (i.e. choices of $w, b$) that can satisfy the constraints of separating the data classes, so in order for our hyperplane to work well in classifying future data, we want it to optimally separate the data classes such that it is not biased toward one class or another and is situated perfectly between them with maximal space on either side (maximal margins).

The easier way to set this up is to note that what we really want is to define two parallel hyperplanes, one just on the inside boundary of the class $y_i = -1$ and the other just on the inside boundary of the class $y_i = 1$; the actual decision boundary will then be the parallel hyperplane exactly in the middle of these.

In other words, if we have our decision boundary hyperplane as $w^Tx + b = 0$, then we want to find a $\delta > 0$ such that we can define two parallel hyperplanes on either side: $w^Tx + b = 0 \pm \delta$. We'd like to maximize $\delta$ so that the distance between these two boundary hyperplanes is maximal; that will give us maximal separation between the classes.

But if we keep $\delta$ a variable, then changing the norm of $w$, changing $b$ or changing $\delta$ will change the margin between the two hyperplanes. We really only want to optimize $w, b$. Moreover, if we change $\delta$ by any scalar amount, then we can just scale $w, b$ an opposite amount, so we can fix $\delta$ to anything we want and still be able to adjust the margin by modifying $w, b$.

This is where the $\pm 1$ comes from: it comes from arbitrarily fixing $\delta = 1$, because it is a convenient choice.

So we define our two parallel separating hyperplanes as: $$ w^Tx + b = \pm 1$$

Then the margin is the distance between these two parallel hyperplanes. The displacement from the origin to a hyperplane is $\frac{b}{\Vert w \Vert}$, and we can use this fact to compute the distance between our hyperplanes. The distance from the origin to the first hyperplane is $\frac{b+1}{\Vert w \Vert}$ and to the second hyperplane is $\frac{b-1}{\Vert w \Vert}$, so the distance between them is (aka margin $m$): $$m = \frac{b+1}{\Vert w \Vert} - \frac{b-1}{\Vert w \Vert}$$ $$m = \frac{2}{\Vert w \Vert} $$
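
A quick numerical sanity check of this margin formula, using toy numbers of my own:

```python
import numpy as np

w = np.array([3.0, 4.0])    # ||w|| = 5
b = 2.0

# One point on each boundary hyperplane, found by moving along w from the origin,
# so that w^T x + b = +1 and w^T x + b = -1 respectively.
x_plus = (1 - b) * w / (w @ w)
x_minus = (-1 - b) * w / (w @ w)

margin = np.linalg.norm(x_plus - x_minus)    # distance measured along the normal
print(margin, 2 / np.linalg.norm(w))         # both 0.4
```
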

So the optimal hyperplane will be found by maximizing $m$; maximizing $\frac{2}{\Vert w \Vert}$ over $w$ is equivalent to solving $$ \min_{w}\ \frac{1}{2}\Vert w \Vert$$ subject to the constraints $y_i(w^Tx_i + b) \ge 1$.

Now, if we had arbitrarily chosen $\delta = \pi$ instead of 1, the margin would be $\frac{2\pi}{\Vert w \Vert}$, but this is just a change in scaling and the optimization problem is identical in form.
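
For a concrete end-to-end check, here is a sketch using scikit-learn's SVC with a linear kernel (the toy data is my own; a very large C approximates the hard-margin problem stated above):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, cluster_std=0.8, random_state=1)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin =", 2 / np.linalg.norm(w))

# The support vectors lie on the boundary hyperplanes w^T x + b = +/- 1:
print(np.round(clf.support_vectors_ @ w + b, 3))
```
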

2 of 2
0

1 and -1 are just the standard, practical choice (if ultimately arbitrary). You could theoretically replace them with h and -h, where h is any positive number. Whichever values you choose, the weights will be massaged accordingly during the optimization process and the relative result will be the same. And that's the key - it's all relative - the only difference would be, in a manner of speaking, the relative "units". The size of the margin may be fixed at "1", but what "1" "means" is relative to the values of the decision function, which depend on the weights.
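
A small sketch illustrating this on a toy 1-D problem (my own construction, using scipy.optimize rather than a real SVM solver): setting the constraint threshold to h instead of 1 just rescales $(w, b)$; the decision boundary and the geometric margin stay the same.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 1-D data: x = -1 labelled -1, x = +1 labelled +1.
X = np.array([-1.0, 1.0])
y = np.array([-1.0, 1.0])

def fit_hard_margin(h):
    """Minimise 0.5*w^2 subject to y_i*(w*x_i + b) >= h."""
    cons = [{"type": "ineq", "fun": lambda p, xi=xi, yi=yi: yi * (p[0] * xi + p[1]) - h}
            for xi, yi in zip(X, y)]
    res = minimize(lambda p: 0.5 * p[0] ** 2, x0=[1.0, 0.0], constraints=cons)
    return res.x                              # (w, b)

for h in (1.0, np.pi):
    w, b = fit_hard_margin(h)
    # Boundary w*x + b = 0 sits at x = -b/w regardless of h; only (w, b) rescale.
    print(f"h={h:.3f}: w={w:.3f}, b={b:.3f}, boundary at x={-b / w:.3f}, "
          f"margin={2 * h / abs(w):.3f}")
```
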

🌐
ScienceDirect
sciencedirect.com › topics › engineering › hyperplanes
Hyperplanes - an overview | ScienceDirect Topics
A hyperplane is a decision boundary to classify the samples, its location depends upon support vectors, that is, the closest data points to the hyperplane. The dimension of a hyperplane is governed by the feature space's dimension. The challenge of finding the hyperplane in nonlinear spaces ...
🌐
Medium
medium.com › deep-math-machine-learning-ai › chapter-3-support-vector-machine-with-math-47d6193c82be
Chapter 3: Support Vector machine with Math. | by Madhu Sanjeevi ( Mady ) | Deep Math Machine learning.ai | Medium
June 25, 2018 - How can we find the optimal hyperplane(yellow line) ? if we maximize the margin(distance) between two hyperplanes then divide by 2 we get the decision boundary. ... Observe this picture.
🌐
Wikipedia
en.wikipedia.org › wiki › Support_vector_machine
Support vector machine - Wikipedia
1 week ago - The parameters of the maximum-margin hyperplane are derived by solving the optimization. There exist several specialized algorithms for quickly solving the quadratic programming (QP) problem that arises from SVMs, mostly relying on heuristics for breaking the problem down into smaller, more manageable chunks. Another approach is to use an interior-point method that uses Newton-like iterations to find ...
🌐
Giskard
giskard.ai › knowledge › glossary › hyperplane
What is Hyperplane | Giskard
March 14, 2024 - Approaches for finding an optimal hyperplane vary among algorithm types; for example, minimizing the sum of squared errors is used in linear regression, while maximizing the margin between hyperplanes and the nearest data points is used in SVMs.
🌐
Deepchecks
deepchecks.com › glossary › hyperplane
What is Hyperplane? Role in Machine Learning
May 27, 2024 - In support vector machines (SVMs), the maximum margin hyperplane is found by solving a convex optimization problem that seeks to maximize the margin while also ensuring that all data points are classified correctly.
🌐
Towards Data Science
towardsdatascience.com › home › latest › understand support vector machines
Understand Support Vector Machines | Towards Data Science
January 16, 2025 - The hyperplane equation is represented by a vector and a bias term. The vector in the equation is orthogonal to the hyperplane and the bias term represents the amount of offset from the origin. We are almost done now. Remember the purpose of SVM: to find the hyperplane that leads to the biggest ...
Top answer
1 of 2
6

The prediction function $f(\mathbf{z})$ for an SVM model is, up to a scaling factor of $\lVert\mathbf{w}\rVert$, the signed distance of $\mathbf{z}$ to the separating hyperplane. The separating hyperplane itself is the locus $f(\mathbf{z}) = 0$.

For a linear SVM, the separating hyperplane's normal vector $\mathbf{w}$ can be written in input space, and we get:

$$f(\mathbf{z}) = \langle \mathbf{w}, \mathbf{z} \rangle + \rho = \mathbf{w}^T\mathbf{z} + \rho,$$

with $\rho$ the model's bias term.

If a kernel function $\kappa(\mathbf{u},\mathbf{v})=\langle \varphi(\mathbf{u}), \varphi(\mathbf{v})\rangle$ is used, $\mathbf{w}$ typically can no longer be expressed in input space, but only in the space spanned by the embedding function $\varphi(\cdot)$. Then we obtain the following:

$$\begin{align} f(\mathbf{z}) &= \langle \mathbf{w}, \varphi(\mathbf{z})\rangle + \rho = \mathbf{w}^T\varphi(\mathbf{z}) + \rho, \\ &= \sum_{i\in SV} y_i\alpha_i \kappa(\mathbf{x}_i,\mathbf{z}) + \rho, \end{align}$$ with $y$ the vector of labels, $\alpha$ the vector of support values, $\mathbf{x}$'s the support vectors.
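
The following sketch (my own, using scikit-learn) reconstructs $f(\mathbf{z})$ from the fitted support vectors, support values and bias, and checks it against the library's decision_function:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

Z = X[:5]                                           # a few query points
K = rbf_kernel(clf.support_vectors_, Z, gamma=0.5)  # kappa(x_i, z) for each SV x_i
# dual_coef_ already stores y_i * alpha_i; intercept_ is rho
f_manual = clf.dual_coef_ @ K + clf.intercept_

print(np.allclose(f_manual.ravel(), clf.decision_function(Z)))  # True
```
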

2 of 2
1

You seem a little bit confused. First of all, try to read this tutorial, which in my opinion is a good introduction. Anyway, we want to find a hyperplane because we want a rule to discriminate between different classes. In the end you put your test set into the feature space and see where every sample is located with respect to the hyperplane. For example, if an element of the test set is on the "right" of the hyperplane, you label it "class 1"; if the sample is on the "left", you label it "class 2". Obviously things are more complex than this, but that is the basic idea behind the SVM concept.
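
A small sketch of that decision rule with scikit-learn (toy data and names are my own): the predicted class is just the side of the hyperplane a sample falls on, i.e. the sign of the decision function.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=3)
X_train, y_train, X_test = X[:40], y[:40], X[40:]

clf = SVC(kernel="linear").fit(X_train, y_train)

side = clf.decision_function(X_test) > 0             # "right" vs "left" of the plane
labels = np.where(side, clf.classes_[1], clf.classes_[0])
print(np.array_equal(labels, clf.predict(X_test)))   # True: predict() is this rule
```
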