Very simply, if you choose any number other than 1, you can simply scale it away again. Consider $$w^Tx + b = \pm\delta.$$ Now, divide both sides of the equation by $\delta$, and we get $$\frac{w^T}{\delta}x + \frac{b}{\delta} = \pm 1.$$ Which means that we can define $w' = w/\delta$ and $b' = b/\delta$, and we have $$w'^Tx + b' = \pm 1.$$ And because the goal is to minimise $\lVert w\rVert$ (in order to maximise the size of the margin, $2/\lVert w\rVert$), it doesn't matter if we scale by some constant such as $1/\delta$ first - and so, we can use $(w', b')$ in place of $(w, b)$, and the choice of $\delta$ is irrelevant (aside from needing to be a positive real number).

Since it's irrelevant, might as well make it the simplest possible positive real number, 1.

As for why it "sets the scale" of the problem, think of it this way: changing would change the scaling of (that is, choosing would make twice as big, for example). And so, just as changing changes the scale, so too does setting set the scale - it keeps it fixed, rather than having it vary from instance to instance.

Answer from Glen O on Stack Exchange
SVM Tutorial
svm-tutorial.com › home › svm - understanding the math - the optimal hyperplane
SVM - Understanding the math : the optimal hyperplane
April 30, 2023 - Our goal is to maximize the margin. Among all possible hyperplanes meeting the constraints, we will choose the hyperplane with the smallest $\lVert w\rVert$ because it is the one which will have the biggest margin. ... Solving this problem is like solving an equation. Once we have solved it, we will have found the couple $(w, b)$ for which $\lVert w\rVert$ is the smallest possible and the constraints we fixed are met.
Stack Overflow
stackoverflow.com › questions › 37998045 › hyperplane-equation-in-svm
vector - hyperplane equation in SVM - Stack Overflow
The positive margin hyperplane equation is w.x-b=1, the negative margin hyperplane equation is w.x-b=-1, and the middle(optimum) hyperplane equation is w.x-b=0). I understand how a hyperplane equation can be got by using a normal vector of that ...
MIT
web.mit.edu › 6.034 › wwwbob › svm-notes-long-08.pdf
1 An Idiot’s guide to Support vector machines (SVMs) R. Berwick, Village Idiot
Hyperplane? In general, there are lots of possible solutions for a, b, c (an infinite number!). The Support Vector Machine (SVM) finds an optimal solution: SVMs maximize the margin.
GeeksforGeeks
geeksforgeeks.org › machine learning › support-vector-machine-algorithm
Support Vector Machine (SVM) Algorithm - GeeksforGeeks
The larger the margin, the better the model performs on new and unseen data. Hyperplane: a decision boundary separating different classes in feature space, represented by the equation wx + b = 0 in linear classification.
Towards Data Science
towardsdatascience.com › home › latest › understand support vector machines
Understand Support Vector Machines | Towards Data Science
January 16, 2025 - The hyperplane equation is represented by a vector and a bias term. The vector in the equation is orthogonal to the hyperplane and the bias term represents the amount of offset from the origin.
Shuzhan Fan
shuzhanfan.github.io › 2018 › 05 › understanding-mathematics-behind-support-vector-machines
Understanding the mathematics behind Support Vector Machines
May 7, 2018 - SVM works by finding the optimal hyperplane which could best separate the data. The question then comes up as how do we choose the optimal hyperplane and how do we compare the hyperplanes. Let’s first consider the equation of the hyperplane \(w\cdot x + b=0\).
GeeksforGeeks
geeksforgeeks.org › separating-hyperplanes-in-svm
Separating Hyperplanes in SVM - GeeksforGeeks
September 15, 2021 - Support vectors are the data points that are close to the decision boundary; they are the data points most difficult to classify, and they hold the key to the SVM's optimal decision surface. The optimal hyperplane comes from the function class with the lowest capacity, i.e. the minimum number of independent features/parameters.
Top answer · 1 of 2 · score 5

This is an excellent question and one I struggled with as well.

Firstly, the margin is not fixed. As your diagram shows, the margin $m = \frac{2}{\lVert w \rVert}$, which is a function of the 2-norm of the $w$ parameter, nothing else. So the margin is maximized by minimizing the norm of $w$.

But let's back up to see why.

We have some data represented as vectors $x_i \in \mathbb{R}^n$ and each $x_i$ is associated with a binary label $y_i \in \{-1,1\}$, for $i \in \mathbb{N}$. We could have made the labels anything, but choosing -1 and 1 is mathematically convenient.

An (affine) hyperplane is the generalization of a line in n-dimensional space defined as the set of points $x$ such that $w^Tx + b = 0$, where $w, x \in \mathbb{R}^n$ and $w^Tx$ is the dot (inner) product between these vectors. The choice of $w$ will change the orientation of the hyperplane and the choice of $b$ will determine its offset from the origin.
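For concreteness, a tiny 2D check of these two facts (the specific $w$ and $b$ are just example values): $w$ is orthogonal to any direction lying inside the hyperplane, and $b$ controls how far the hyperplane sits from the origin.

```python
import numpy as np

# Example hyperplane (a line in 2D): w^T x + b = 0 with w = (1, 2), b = -3
w = np.array([1.0, 2.0])
b = -3.0

p1 = np.array([3.0, 0.0])    # 1*3 + 2*0 - 3 = 0, so p1 lies on the hyperplane
p2 = np.array([1.0, 1.0])    # 1*1 + 2*1 - 3 = 0, so p2 lies on the hyperplane

print(np.dot(w, p2 - p1))              # 0.0: w is orthogonal to the hyperplane
print(abs(b) / np.linalg.norm(w))      # distance of the hyperplane from the origin
```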

We want to find a hyperplane (i.e. choice of $w, b$) that separates the data $x_i$ according to their class labels. We assume our data can be perfectly separated by a line (or hyperplane), i.e. it is linearly separable. So we want a hyperplane such that when $y_i = -1$ then $w^Tx_i + b \le 0$ and when $y_i = 1$ then $w^Tx_i + b \ge 0$. There's actually a fairly straightforward algorithm called the perceptron algorithm that can find a hyperplane meeting those constraints.
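As an aside, here is a minimal sketch of that perceptron algorithm (the function name and the epoch cap are my own illustrative choices). It finds *some* $(w, b)$ satisfying the sign constraints when the data are linearly separable, with no guarantee about the margin:

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Return (w, b) with y_i * (w^T x_i + b) > 0 for all i, if the data
    are linearly separable; otherwise give up after max_epochs passes."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                w, b = w + yi * xi, b + yi      # nudge the hyperplane toward xi's side
                mistakes += 1
        if mistakes == 0:                       # every constraint is satisfied
            break
    return w, b
```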

But there are an infinite number of hyperplanes (i.e. choices of $w, b$) that can satisfy the constraints of separating the data classes, so in order for our hyperplane to work well in classifying future data, we want it to optimally separate the data classes such that it is not biased toward one class or another and is situated perfectly between them with maximal space on either side (maximal margins).

The easier way to set this up is that what we really want is to define two parallel hyperplanes, one just on the inside boundary of class $y_i = -1$ and the other just on the inside boundary of the class $y_i = 1$, then the actual decision boundary will be the parallel hyperplane exactly in the middle of these.

In other words, if we have our decision boundary hyperplane as $w^Tx + b = 0$, then we want to find a $\delta > 0$ such that we can define two parallel hyperplanes on either side: $w^Tx + b = \pm \delta$. We'd like to maximize $\delta$ so that the distance between these two boundary hyperplanes is maximal, as that will give us maximal separation between the classes.

But if we keep $\delta$ a variable, then changing the norm of $w$, changing $b$ or changing $\delta$ will change the margin between the two hyperplanes. We really only want to optimize $w, b$. Moreover, if we change $\delta$ by any scalar amount, then we can just scale $w, b$ an opposite amount, so we can fix $\delta$ to anything we want and still be able to adjust the margin by modifying $w, b$.

This is where the $\pm 1$ comes from: it results from arbitrarily fixing $\delta = 1$, because 1 is a convenient choice.

So we define our two parallel separating hyperplanes as: $$ w^Tx + b = \pm 1$$

Then the margin is the distance between these two parallel hyperplanes. The displacement from the origin to a hyperplane is $\frac{b}{\Vert w \Vert}$, and we can use this fact to compute the distance between our hyperplanes. The distance from the origin to the first hyperplane is $\frac{b+1}{\Vert w \Vert}$ and to the second hyperplane is $\frac{b-1}{\Vert w \Vert}$, so the distance between them is (aka margin $m$): $$m = \frac{b+1}{\Vert w \Vert} - \frac{b-1}{\Vert w \Vert}$$ $$m = \frac{2}{\Vert w \Vert} $$
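A quick numerical check of that distance calculation (the $w$ and $b$ below are arbitrary): take the point on each of the $\pm 1$ hyperplanes closest to the origin and measure their separation along the unit normal $w/\Vert w\Vert$; it comes out to $2/\Vert w\Vert$ regardless of $b$.

```python
import numpy as np

w = np.array([3.0, 4.0])          # made-up weights, ||w|| = 5
b = -0.7                          # made-up bias
u = w / np.linalg.norm(w)         # unit normal shared by all three parallel hyperplanes

# Closest point to the origin on each margin hyperplane w^T x + b = +/- 1
x_plus  = ( 1 - b) * w / np.dot(w, w)
x_minus = (-1 - b) * w / np.dot(w, w)

margin = abs(np.dot(x_plus - x_minus, u))
print(margin, 2 / np.linalg.norm(w))   # both 0.4, i.e. 2 / ||w||
```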

So the optimal hyperplane will be found by maximizing $m$, i.e. $$ \max_{w,b}\frac{2}{\Vert w \Vert} \quad\Longleftrightarrow\quad \min_{w,b}\frac{1}{2}\Vert w \Vert,$$ subject to the constraints $y_i(w^Tx_i + b) \ge 1$ for all $i$.

Now, if we had arbitrarily chosen $\delta = \pi$ instead of 1, the margin would be $\frac{2\pi}{\Vert w \Vert}$, but this is just a change in scaling and the optimization problem is identical in form.
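To see the whole derivation end to end, here is a sketch (not the canonical implementation) that assumes scikit-learn is available: fitting a linear SVM with a very large $C$ approximates the hard-margin problem above, and the margin can then be read off the learned $w$ as $2/\Vert w\Vert$, with the support vectors landing on the $\pm 1$ hyperplanes.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, made-up, linearly separable data set
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# Very large C approximates the hard-margin problem: the constraints
# y_i (w^T x_i + b) >= 1 are enforced (up to numerical tolerance)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin:", 2 / np.linalg.norm(w))
print(np.abs(X[clf.support_] @ w + b))   # ~1 for every support vector
```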

2 of 2 · score 0

1 and -1 are just the standard, practical choice (if ultimately arbitrary). You could theoretically replace them with h and -h, where h is any positive number. Whichever values you choose, the weights will be massaged accordingly during the optimization process and the relative result will be the same. And that's the key - it's all relative - the only difference would be, in a manner of speaking, the relative "units". The size of the margin may be fixed at "1", but what "1" "means" is relative to the values of the decision function, which depend on the weights.
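A small made-up check of that "relative units" point: multiplying $(w, b)$ by any $h > 0$ scales the decision function's values by $h$, but the classifications and the geometric margin $2h/\Vert hw\Vert = 2/\Vert w\Vert$ stay exactly the same.

```python
import numpy as np

w, b, h = np.array([3.0, 4.0]), -0.7, 2.5   # arbitrary parameters and a positive h

X = np.random.randn(200, 2)
pred_1 = np.sign(X @ w + b)                 # decisions under the +/-1 convention
pred_h = np.sign(X @ (h * w) + h * b)       # decisions under the +/-h convention

assert np.array_equal(pred_1, pred_h)       # same classifications everywhere
print(2 / np.linalg.norm(w), 2 * h / np.linalg.norm(h * w))   # same margin
```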

Medium
medium.com › @priyankaparashar54 › support-vector-machine-and-its-mathematical-implementation-c0bdd8b4c699
Support Vector Machine and it’s Mathematical Implementation | by Priyanka Parashar | Medium
June 20, 2020 - SV points are very critical in determining the hyperplane because if the position of the vectors changes the hyperplane’s position is altered. Technically this hyperplane can also be called as margin maximizing hyperplane. These are the points that help us build our SVM’s.
Bbk
titan.dcs.bbk.ac.uk › ~ale › dsta+dsat › dsta+dsat-6 › dsta-ZM-21-SVMs-excerpts-v2.pdf
CHAPTER 21 Support Vector Machines
Note that $\delta^* \neq 0$, since h(x) is assumed to be a separating hyperplane, and Eq. (21.3) ... Consider the equation of the hyperplane [Eq.
Analytics Vidhya
analyticsvidhya.com › home › support vector machine (svm)
Support Vector Machine (SVM)
April 21, 2025 - By this I wanted to show you that the parallel lines depend on the (w, b) of our hyperplane: if we multiply the equation of the hyperplane by a factor greater than 1, the parallel lines will shrink, and if we multiply by a factor less than 1, they expand. We can now say that these lines will move as we make changes to (w, b), and this is how it gets optimized. But what is the optimization function? Let's calculate it. We know that the aim of SVM is to maximize this margin, i.e. the distance (d).
Medium
medium.com › deep-math-machine-learning-ai › chapter-3-support-vector-machine-with-math-47d6193c82be
Chapter 3: Support Vector machine with Math. | by Madhu Sanjeevi ( Mady ) | Deep Math Machine learning.ai | Medium
June 25, 2018 - This is a high-level view of what SVM does. The yellow dashed line is the line which separates the data (we call this line the 'Decision Boundary' (hyperplane) in SVM); the other two lines (also hyperplanes) help us make the right decision boundary.
Saedsayad
saedsayad.com › support_vector_machine.htm
Support Vector Machine - Classification (SVM)
Analytics Vidhya
analyticsvidhya.com › home › support vector machine (svm)
Guide on Support Vector Machine (SVM) Algorithm
April 21, 2025 - To just get the projection we can simply take the unit vector of B because it will be in the direction of B but its magnitude will be 1. Hence now the equation becomes: ... Now let’s move to the next part and see how we will use this in SVM. Consider a random point X and we want to know whether it lies on the right side of the plane or the left side of the plane (positive or negative). To find this first we assume this point is a vector (X) and then we make a vector (w) which is perpendicular to the hyperplane.
MathWorks
mathworks.com › matlabcentral › answers › 713438-finding-svm-hyperplane-equation-for-2nd-order-polynomial
Finding SVM hyperplane equation for 2nd order polynomial - MATLAB Answers - MATLAB Central
January 10, 2021 - Hello, I was using the Support Vector Machine model calculation for a 2nd order hyperplane to separate two classes: SVMmodel = fitcsvm(predictors,response, ... 'KernelFunction', 'polynomial',...
Stack Overflow
stackoverflow.com › questions › 37946465 › visualizing-hyperplane-equation-of-svm
machine learning - visualizing hyperplane equation of SVM - Stack Overflow
If x is a vector on a hyperplane, then x.w = 0 is the equation of the hyperplane. Unfortunately, we do not want any of your x to be on the hyperplane. In the case of SVM, you do not know any vector x on the hyperplane.
Analytics Vidhya
analyticsvidhya.com › home › the mathematics behind support vector machine algorithm (svm)
The Mathematics Behind Support Vector Machine Algorithm (SVM)
January 16, 2025 - (How and why I say "main separator" and not just any separator we will cover while understanding the math behind SVM.) The equation of the main separator line is called a hyperplane equation.