The optimization objective of an SVM is to choose w and b so that the margin around the separating hyperplane is maximized.

Mathematically speaking, it is a constrained optimization task, solved via Lagrange multipliers and the KKT (Karush-Kuhn-Tucker) conditions.

The following video explains this in simple terms for the linearly separable case:

https://www.youtube.com/watch?v=1NxnPkZM9bc

How this is calculated is explained in more detail here, for both the primal and dual formulations:

https://www.csie.ntu.edu.tw/~cjlin/talks/rome.pdf

Answer from codeslord on Stack Overflow
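
To make these quantities concrete, here is a minimal sketch (the toy data and all parameter choices are assumptions for illustration) using scikit-learn's SVC with a linear kernel: after fitting, coef_ and intercept_ hold w and b, dual_coef_ holds the signed Lagrange multipliers y_i * alpha_i of the support vectors, and the margin width is 2/||w||.

import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (an assumption for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("y_i * alpha_i =", clf.dual_coef_)             # signed Lagrange multipliers
print(clf.decision_function(clf.support_vectors_))   # ~ +/-1 on the support vectors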
๐ŸŒ
GeeksforGeeks
geeksforgeeks.org โ€บ machine learning โ€บ using-a-hard-margin-vs-soft-margin-in-svm
Using a Hard Margin vs Soft Margin in SVM - GeeksforGeeks
July 23, 2025 - In the image, the hyperplane is defined by black solid line and the dashed lines on both the sides of hyperplane are margin. The data points falling on the margin are support vectors. The image illustrates a hard margin scenario, here there is no data point falling between the margins hence, ensuring perfect separation. Mathematically, for a linearly separable dataset, the decision function of a hard margin SVM can be expressed as:
๐ŸŒ
scikit-learn
scikit-learn.org โ€บ stable โ€บ auto_examples โ€บ svm โ€บ plot_svm_margin.html
SVM Margins Example โ€” scikit-learn 1.8.0 documentation
This is sqrt(1+a^2) away vertically in # 2-d. margin = 1 / np.sqrt(np.sum(clf.coef_**2)) yy_down = yy - np.sqrt(1 + a**2) * margin yy_up = yy + np.sqrt(1 + a**2) * margin # plot the line, the points, and the nearest vectors to the plane plt.figure(fignum, figsize=(4, 3)) plt.clf() plt.plot(xx, yy, "k-") plt.plot(xx, yy_down, "k--") plt.plot(xx, yy_up, "k--") plt.scatter( clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80, facecolors="none", zorder=10, edgecolors="k", ) plt.scatter( X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.get_cmap("RdBu"), edgecolors="k" ) plt.axis("tight") x_min =
๐ŸŒ
GeeksforGeeks
geeksforgeeks.org โ€บ machine learning โ€บ support-vector-machine-algorithm
Support Vector Machine (SVM) Algorithm - GeeksforGeeks
So from the above figure, we choose L2 as hard margin. Let's consider a scenario like shown below: ... Here, we have one blue ball in the boundary of the red ball. The blue ball in the boundary of red ones is an outlier of blue balls. The SVM algorithm has the characteristics to ignore the outlier and finds the best hyperplane that maximizes the margin.
Published ย  3 weeks ago
๐ŸŒ
Baeldung
baeldung.com โ€บ home โ€บ artificial intelligence โ€บ deep learning โ€บ using a hard margin vs. soft margin in svm
Using a Hard Margin vs. Soft Margin in SVM | Baeldung on Computer Science
February 13, 2025 - Letโ€™s start with a set of data points that we want to classify into two groups. We can consider two cases for these data: either they are linearly separable, or the separating hyperplane is non-linear. When the data is linearly separable, and we donโ€™t want to have any misclassifications, we use SVM with a hard margin.
๐ŸŒ
MathWorks
mathworks.com โ€บ statistics and machine learning toolbox โ€บ classification โ€บ support vector machine classification
margin - Find classification margins for support vector machine (SVM) classifier - MATLAB
Approximately 25% of the margins from the full model are less than those from the model with fewer predictors. This result suggests that the model trained with all the predictors is better. ... SVM classification model, specified as a ClassificationSVM model object or CompactClassificationSVM model object returned by fitcsvm or compact, respectively.
๐ŸŒ
EITCA
eitca.org โ€บ home โ€บ what is the significance of the margin in svm and how is it related to support vectors?
What is the significance of the margin in SVM and how is it related to support vectors? - EITCA Academy
August 7, 2023 - The margin is defined as the distance between the hyperplane and the nearest data points from each class. The larger the margin, the better the generalization performance of the SVM model. The significance of the margin lies in its ability to handle the trade-off between model complexity and ...
๐ŸŒ
MIT
web.mit.edu โ€บ 6.034 โ€บ wwwbob โ€บ svm-notes-long-08.pdf pdf
1 An Idiotโ€™s guide to Support vector machines (SVMs) R. Berwick, Village Idiot
Typically, there can be lots of input features xi. Output: set of weights w (or wi), one for each feature, whose linear combination predicts the value of y. (So far, ... The margin (gutter) of a separating hyperplane is d+ + dโ€“.
๐ŸŒ
Stanford NLP Group
nlp.stanford.edu โ€บ IR-book โ€บ html โ€บ htmledition โ€บ support-vector-machines-the-linearly-separable-case-1.html
Support vector machines: The linearly separable case
While some learning methods such ... be looking for a decision surface that is maximally far away from any data point. This distance from the decision surface to the closest data point determines the margin of the classifier....
Find elsewhere
๐ŸŒ
Medium
medium.com โ€บ @apurvjain37 โ€บ support-vector-machines-s-v-m-hyperplane-and-margins-ee2f083381b4
Support Vector Machines(S.V.M) โ€” Hyperplane and Margins | by apurv jain | Medium
September 25, 2020 - An SVM model is basically a representation of different classes in a hyperplane in multidimensional space. The hyperplane will be generated in an iterative manner by SVM so that the error can be minimized. The goal of SVM is to divide the datasets into classes to find a maximum marginal hyperplane ...
Top answer
1 of 2
1

The optimization objective of SVM is to reduce w, b in such a way that we have the maximum margin with the hyperplane.

Mathematically speaking, it is a nonlinear optimization task which is solved by KKT (Karush-Kunn-Tucker) conditions, using lagrange multipliers.

The following video explains this in simple terms for linearly seperable case

https://www.youtube.com/watch?v=1NxnPkZM9bc

Also how this is calculated is better explained here for both linear and primal cases.

https://www.csie.ntu.edu.tw/~cjlin/talks/rome.pdf

Answer 2 of 2

The margin between the separating hyperplane and the class boundaries of an SVM is an essential feature of this algorithm.

See, you have two hyperplanes: (1) w^t x + b >= 1 if y = 1, and (2) w^t x + b <= -1 if y = -1. This says that any vector with label y = 1 must lie either on or behind hyperplane (1). The same applies to the vectors with label y = -1 and hyperplane (2).

Note: if those requirements can be fulfilled, it implicitly means the dataset is linearly separable. This makes sense because otherwise no such margin can be constructed.

So, what an SVM tries to find is a decision boundary which is halfway between (1) and (2). Let's define this boundary as (3) w^t x + b = 0. Note that (1), (2) and (3) are parallel hyperplanes, because they share the same parameters w and b. The parameter w determines the direction of those planes; recall that a vector always has a direction and a magnitude/length.

The question is now: how can one calculate the hyperplane (3)? Equations (1) and (2) tell us that any vector with label y = 1 which is closest to (3) lies exactly on hyperplane (1), so (1) becomes w^t x + b = 1 for such x. The same applies to the closest vectors with a negative label and (2). The vectors lying on these planes are called 'support vectors', and the decision boundary (3) depends only on them, because one can simply subtract (2) from (1) for the support vectors and get:

(w^t x_+ + b) - (w^t x_- + b) = 1 - (-1)  =>  w^t x_+ - w^t x_- = 2

Note: x_+ and x_- are different support vectors, one on each plane.

Now we want to keep the direction of w but ignore its length, to get the shortest distance between (3) and the other planes; this distance is a perpendicular line segment from (3) to the others. To do so, divide both sides by the length of w, which turns w into the unit normal vector perpendicular to (3): (w^t x_+ - w^t x_-) / ||w|| = 2 / ||w||. The left-hand side is exactly the perpendicular distance between the two planes, so that distance is 2 / ||w||. This distance must be maximized.

Edit: as others state here, use Lagrange multipliers or the SMO algorithm to minimize 1/2 ||w||^2 subject to y_i (w^t x_i + b) >= 1 for all i. This is the convex form of the optimization problem for the primal SVM.
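
As a rough illustration of that primal problem (not how SVM libraries actually solve it), the sketch below minimizes 1/2 ||w||^2 under the constraints y_i (w^t x_i + b) >= 1 with a generic solver, scipy's SLSQP; the data and starting point are assumptions.

import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (an assumption for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables: t = (w1, w2, b).
def objective(t):
    w = t[:2]
    return 0.5 * (w @ w)               # 1/2 ||w||^2

# One inequality constraint per sample: y_i (w^t x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda t, xi=xi, yi=yi: yi * (t[:2] @ xi + t[2]) - 1.0}
    for xi, yi in zip(X, y)
]

# A feasible starting point, found by inspection of the toy data.
res = minimize(objective, x0=np.array([1.0, 1.0, -2.0]), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, " b =", b, " margin =", 2 / np.linalg.norm(w))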

๐ŸŒ
Medium
medium.com โ€บ @nandiniverma78988 โ€บ support-vector-machines-optimizing-margin-for-classification-9de241ec2c0c
Support Vector Machines: Optimizing Margin for Classification | by NANDINI VERMA | Medium
December 2, 2023 - 3. Margin: โ€” The margin is the distance between the hyperplane and the nearest data point from either class. โ€” SVM aims to maximize this margin, which helps improve the generalization of the model.
Answer 1 of 3

Let $\textbf{x}_0$ be a point in the hyperplane $\textbf{wx} - b = -1$, i.e., $\textbf{wx}_0 - b = -1$. To measure the distance between hyperplanes $\textbf{wx}-b=-1$ and $\textbf{wx}-b=1$, we only need to compute the perpendicular distance from $\textbf{x}_0$ to plane $\textbf{wx}-b=1$, denoted as $r$.

Note that $\frac{\textbf{w}}{\|\textbf{w}\|}$ is a unit normal vector of the hyperplane $\textbf{wx}-b=1$. We have $$ \textbf{w}(\textbf{x}_0 + r\frac{\textbf{w}}{\|\textbf{w}\|}) - b = 1 $$ since $\textbf{x}_0 + r\frac{\textbf{w}}{\|\textbf{w}\|}$ should be a point in hyperplane $\textbf{wx}-b = 1$ according to our definition of $r$.

Expanding this equation, we have \begin{align*} & \textbf{wx}_0 + r\frac{\textbf{w}\textbf{w}}{\|\textbf{w}\|} - b = 1 \\ \implies &\textbf{wx}_0 + r\frac{\|\textbf{w}\|^2}{\|\textbf{w}\|} - b = 1 \\ \implies &\textbf{wx}_0 + r\|\textbf{w}\| - b = 1 \\ \implies &\textbf{wx}_0 - b = 1 - r\|\textbf{w}\| \\ \implies &-1 = 1 - r\|\textbf{w}\|\\ \implies & r = \frac{2}{\|\textbf{w}\|} \end{align*}
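
A quick numeric check of this derivation (the w and b below are arbitrary example values): starting from a point $\textbf{x}_0$ on $\textbf{wx}-b=-1$ and walking a distance $r = 2/\|\textbf{w}\|$ along the unit normal lands exactly on $\textbf{wx}-b=1$.

import numpy as np

w = np.array([3.0, 4.0])               # arbitrary example normal, ||w|| = 5
b = 2.0                                # arbitrary example offset

x0 = (b - 1) / (w @ w) * w             # a point with w.x0 - b = -1
r = 2 / np.linalg.norm(w)              # the claimed distance between the planes
x1 = x0 + r * w / np.linalg.norm(w)    # move r along the unit normal w/||w||

print(w @ x0 - b)                      # -1.0: x0 lies on the first hyperplane
print(w @ x1 - b)                      # +1.0: x1 lands on the second hyperplane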

Answer 2 of 3

Let $\textbf{x}_+$ be a positive example on one gutter, such that $$\textbf{w} \cdot \textbf{x}_+ - b = 1$$

Let $\textbf{x}_-$ be a negative example on another gutter, such that $$\textbf{w} \cdot \textbf{x}_- - b = -1$$

The width of the margin is the scalar projection of $\textbf{x}_+ - \textbf{x}_-$ onto the unit normal vector, that is, the dot product of $\textbf{x}_+ - \textbf{x}_-$ and $\frac{\textbf{w}}{\|\textbf{w}\|}$:

\begin{align} width & = (\textbf{x}_+ - \textbf{x}_-) \cdot \frac{\textbf{w}}{\|\textbf{w}\|} \\ & = \frac {(\textbf{x}_+ - \textbf{x}_-) \cdot {\textbf{w}}}{\|\textbf{w}\|} \\ & = \frac{\textbf{x}_+ \cdot \textbf{w} - \textbf{x}_-\cdot \textbf{w}}{\|\textbf{w}\|} \\ & = \frac{(1+b)-(-1+b)}{\lVert \textbf{w} \rVert} \\ & = \frac{2}{\|\textbf{w}\|} \end{align}

The above derivation follows MIT 6.034 (Artificial Intelligence).
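
A short numeric sketch of the projection argument (w and b are arbitrary example values): whichever points we pick on the two gutters, the scalar projection of their difference onto $\frac{\textbf{w}}{\|\textbf{w}\|}$ comes out as $\frac{2}{\|\textbf{w}\|}$.

import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, 4.0])               # arbitrary example normal, ||w|| = 5
b = 2.0
w_hat = w / np.linalg.norm(w)          # unit normal vector

def point_on_gutter(c):
    # An arbitrary point x with w.x - b = c.
    x = (b + c) / (w @ w) * w          # particular solution on the plane
    t = rng.normal(size=2)
    t -= (t @ w_hat) * w_hat           # tangent component keeps w.x - b = c
    return x + t

x_plus, x_minus = point_on_gutter(+1.0), point_on_gutter(-1.0)
width = (x_plus - x_minus) @ w_hat     # scalar projection onto the unit normal
print(width, 2 / np.linalg.norm(w))    # both 0.4, for any choice of gutter points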

๐ŸŒ
EITCA
eitca.org โ€บ home โ€บ what is the significance of the margin in a support vector machine (svm)?
What is the significance of the margin in a support vector machine (SVM)? - EITCA Academy
August 7, 2023 - It allows the SVM to have a better tolerance for noise and outliers in the training data, leading to a more robust model. The margin also helps in achieving a balance between maximizing the separation between classes and minimizing the classification error. The SVM algorithm aims to find the hyperplane that maximizes the margin while ensuring that the data points are correctly classified.
Answer 1 of 4

"A geometric margin is simply the euclidean distance between a certain x (data point) to the hyperlane. "

I don't think that is a proper definition for the geometric margin, and I believe that is what is confusing you. The geometric margin is just a scaled version of the functional margin.

You can think of the functional margin as a test function that tells you whether a particular point is properly classified or not. The geometric margin is the functional margin divided by ||w||.

If you check the formula for the functional margin, y (w^T x + b), you can notice that, independently of the label, the result is positive for properly classified points (e.g. sign(1 * 5) = 1 and sign(-1 * -5) = 1) and negative otherwise. If you divide that by ||w||, you get the geometric margin.

Why does the geometric margin exist?

Well, to maximize the margin you need more than just the sign; you need a notion of magnitude. The functional margin gives you a number, but without a reference you can't tell whether the point is actually far away from or close to the decision plane. The geometric margin tells you not only whether the point is properly classified, but also the magnitude of that distance in units of ||w||.
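
A minimal numeric sketch of the distinction (the point, label, w and b below are assumptions for illustration): rescaling (w, b) changes the functional margin but leaves the geometric margin untouched.

import numpy as np

w = np.array([2.0, -1.0])              # example parameters
b = 0.5
x = np.array([1.0, 3.0])               # one labeled training point
y = -1

def margins(w, b):
    f = y * (w @ x + b)                # functional margin: sign = correctness
    return f, f / np.linalg.norm(w)    # geometric margin: distance in units of ||w||

print(margins(w, b))                   # (0.5, 0.2236...): correctly classified
print(margins(10 * w, 10 * b))         # (5.0, 0.2236...): only the functional margin scales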

Answer 2 of 4

The functional margin represents the correctness and confidence of the prediction, provided the magnitude of the vector w orthogonal to the hyperplane stays constant.

By correctness, the functional margin should always be positive: if wx + b is negative, then y is -1, and if wx + b is positive, y is 1. If the functional margin is negative, the sample is assigned to the wrong group.

By confidence, the functional margin can change for two reasons: 1) the sample (x_i, y_i) changes, or 2) the vector w orthogonal to the hyperplane is rescaled (by scaling w and b together). If the vector w remains the same all the time, no matter how large its magnitude is, we can determine how confidently the point is grouped into the right side: the larger the functional margin, the more confident we can be that the point is classified correctly.

But if the functional margin is defined without keeping the magnitude of w fixed, then we define the geometric margin as mentioned above: the functional margin of a training example normalized by the magnitude of w. Under this constraint, the value of the geometric margin results only from the samples and not from the scaling of w.

The geometric margin is invariant to rescaling of the parameters, which is the only difference between the geometric margin and the functional margin.

EDIT:

The introduction of the functional margin plays two roles: 1) it gives intuition for the maximization of the geometric margin, and 2) it transforms the geometric-margin maximization problem into the minimization of the magnitude of the vector orthogonal to the hyperplane.

Since rescaling the parameters w and b changes nothing meaningful, and the functional margin scales in the same way as the parameters, the two formulations are equivalent: we can arbitrarily fix ||w|| = 1 and maximize the geometric margin, or we can rescale the parameters so that the functional margin is 1 and then minimize ||w||.
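
A sketch of that equivalence under the same kind of assumptions as before (toy data, generic scipy solver): formulation (a) fixes ||w|| = 1 and maximizes the geometric margin gamma; formulation (b) fixes the functional margin to 1 and minimizes ||w||. The optimal gamma from (a) matches 1/||w|| from (b).

import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (an assumption for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# (a) variables t = (w1, w2, b, gamma): maximize gamma s.t. ||w|| = 1
#     and y_i (w.x_i + b) >= gamma for all i.
cons_a = ([{"type": "eq", "fun": lambda t: t[:2] @ t[:2] - 1.0}] +
          [{"type": "ineq",
            "fun": lambda t, xi=xi, yi=yi: yi * (t[:2] @ xi + t[2]) - t[3]}
           for xi, yi in zip(X, y)])
res_a = minimize(lambda t: -t[3], x0=[0.7, 0.7, -2.0, 0.1], constraints=cons_a)

# (b) variables t = (w1, w2, b): minimize 1/2 ||w||^2 s.t. functional margin >= 1.
cons_b = [{"type": "ineq",
           "fun": lambda t, xi=xi, yi=yi: yi * (t[:2] @ xi + t[2]) - 1.0}
          for xi, yi in zip(X, y)]
res_b = minimize(lambda t: 0.5 * (t[:2] @ t[:2]), x0=[1.0, 1.0, -2.0],
                 constraints=cons_b)

print(res_a.x[3], 1 / np.linalg.norm(res_b.x[:2]))   # the two margins agree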

๐ŸŒ
Wikipedia
en.wikipedia.org โ€บ wiki โ€บ Support_vector_machine
Support vector machine - Wikipedia
1 week ago - In machine learning, support vector machines (SVMs, also support vector networks) are supervised max-margin models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories, SVMs are one of the most studied models, ...
๐ŸŒ
scikit-learn
scikit-learn.org โ€บ 1.5 โ€บ auto_examples โ€บ svm โ€บ plot_svm_margin.html
SVM Margins Example โ€” scikit-learn 1.5.2 documentation
This is sqrt(1+a^2) away vertically in # 2-d. margin = 1 / np.sqrt(np.sum(clf.coef_**2)) yy_down = yy - np.sqrt(1 + a**2) * margin yy_up = yy + np.sqrt(1 + a**2) * margin # plot the line, the points, and the nearest vectors to the plane plt.figure(fignum, figsize=(4, 3)) plt.clf() plt.plot(xx, yy, "k-") plt.plot(xx, yy_down, "k--") plt.plot(xx, yy_up, "k--") plt.scatter( clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80, facecolors="none", zorder=10, edgecolors="k", ) plt.scatter( X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.get_cmap("RdBu"), edgecolors="k" ) plt.axis("tight") x_min =
๐ŸŒ
Towards Data Science
towardsdatascience.com โ€บ home โ€บ latest โ€บ support vector machines โ€“ soft margin formulation and kernel trick
Support Vector Machines - Soft Margin Formulation and Kernel Trick | Towards Data Science
January 21, 2025 - Let us compare this with SVMโ€™s objective which handles the linearly separable cases (as given below). ... We see that only ฮพ_i terms are extra in the modified objective and everything else is the same. Point to note: In the final solution, ฮป_is corresponding to points that are closest to the margin and on the wrong side of the margin (i.e.
๐ŸŒ
Quora
quora.com โ€บ What-is-the-intuition-behind-margin-in-SVM
What is the intuition behind margin in SVM? - Quora
Answer: Let's say you've found a hyperplane that completely separates the two classes in your training set. We expect that when new data comes along (i.e. your test set), the new data will look like your training data. Points that should be classified as one class or the other should lie near the...
๐ŸŒ
DEV Community
dev.to โ€บ harsimranjit_singh_0133dc โ€บ support-vector-machines-from-hard-margin-to-soft-margin-1bj1
Support Vector Machines: From Hard Margin to Soft Margin - DEV Community
August 12, 2024 - The term "Hard Margin" comes from the fact that the algorithm requires all data points to be classified with a margin of at least 1. In other words, there are no allowances for misclassification. These strict requirements are why it's called a "hard" margin ยท The goal of Hard margin SVM is to maximize the margin between the two classes.
๐ŸŒ
Quora
quora.com โ€บ What-is-the-mathematical-definition-of-margin-in-support-vector-machine-SVM
What is the mathematical definition of margin in support vector machine (SVM)? - Quora
Answer (1 of 2): Iโ€™ve explained SVMs in detail here โ€” In layman's terms, how does SVM work? โ€” including what is the margin. In short, you want to find a line that separates the points in two classes, while being as far as possible from each class. So in the figure below, the bold line ...