
Consider building an SVM over the (very small) data set shown in the picture. For an example like this, the maximum-margin weight vector will be parallel to the shortest line connecting points of the two classes, that is, the line between $(1,1)$ and $(2,3)$, giving a weight vector of $(1,2)$. The optimal decision surface is orthogonal to that line and intersects it at the halfway point. Therefore, it passes through $(1.5, 2)$. So, the SVM decision boundary is

$$x_1 + 2x_2 - 5.5 = 0.$$
Working algebraically, with the standard constraint that $y_i(\textbf{w} \cdot \textbf{x}_i + b) \geq 1$, we seek to minimize $\|\textbf{w}\|$. This happens when this constraint is satisfied with equality by the two support vectors. Further, we know that the solution is $\textbf{w} = (a, 2a)$ for some $a$, since $\textbf{w}$ is parallel to $(1,2)$. So we have that:

$$a + 2a + b = -1 \qquad\text{and}\qquad 2a + 6a + b = 1$$
Therefore $a = 2/5$ and $b = -11/5$, and $\textbf{w} = (2/5, 4/5)$. So the optimal hyperplane is given by

$$\textbf{w} = (2/5, 4/5) \quad\text{and}\quad b = -11/5.$$
The margin is

$$\frac{2}{\|\textbf{w}\|} = \frac{2}{\sqrt{(2/5)^2 + (4/5)^2}} = \frac{2}{2/\sqrt{5}} = \sqrt{5}.$$
This answer can be confirmed geometrically by examining the picture.
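As a quick sanity check, here is a minimal numeric sketch of the result above, assuming (as reconstructed from the worked values) that the two support vectors are $(1,1)$ with label $-1$ and $(2,3)$ with label $+1$:

```python
import numpy as np

# Support vectors as reconstructed from the worked example above
# (an assumption, since the original picture is not reproduced here).
x_neg = np.array([1.0, 1.0])  # label -1
x_pos = np.array([2.0, 3.0])  # label +1

w = np.array([2 / 5, 4 / 5])
b = -11 / 5

# Both support vectors satisfy the margin constraints with equality.
print(w @ x_neg + b)  # -1.0
print(w @ x_pos + b)  # +1.0

# The margin width 2/||w|| equals sqrt(5).
print(2 / np.linalg.norm(w), np.sqrt(5))  # 2.2360..., 2.2360...
```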
Answer from Ehsan Keramat on Stack Overflow
Let $\textbf{x}_0$ be a point in the hyperplane $\textbf{w}\textbf{x} - b = -1$, i.e., $\textbf{w}\textbf{x}_0 - b = -1$. To measure the distance between the hyperplanes $\textbf{w}\textbf{x} - b = 1$ and $\textbf{w}\textbf{x} - b = -1$, we only need to compute the perpendicular distance from $\textbf{x}_0$ to the plane $\textbf{w}\textbf{x} - b = 1$, denoted as $r$.

Note that $\frac{\textbf{w}}{\|\textbf{w}\|}$ is a unit normal vector of the hyperplane $\textbf{w}\textbf{x} - b = 1$. We have

$$\textbf{w}\left(\textbf{x}_0 + r\frac{\textbf{w}}{\|\textbf{w}\|}\right) - b = 1$$

since $\textbf{x}_0 + r\frac{\textbf{w}}{\|\textbf{w}\|}$ should be a point in the hyperplane $\textbf{w}\textbf{x} - b = 1$ according to our definition of $r$.
Expanding this equation, we have \begin{align*} & \textbf{wx}_0 + r\frac{\textbf{w}\textbf{w}}{\|\textbf{w}\|} - b = 1 \\ \implies &\textbf{wx}_0 + r\frac{\|\textbf{w}\|^2}{\|\textbf{w}\|} - b = 1 \\ \implies &\textbf{wx}_0 + r\|\textbf{w}\| - b = 1 \\ \implies &\textbf{wx}_0 - b = 1 - r\|\textbf{w}\| \\ \implies &-1 = 1 - r\|\textbf{w}\|\\ \implies & r = \frac{2}{\|\textbf{w}\|} \end{align*}
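This can be checked numerically. A small sketch, with $\textbf{w}$ and $b$ chosen arbitrarily for illustration:

```python
import numpy as np

# Arbitrary hyperplane parameters, chosen only for illustration.
w = np.array([3.0, 4.0])
b = 2.0

# A point x0 on the hyperplane w.x - b = -1:
x0 = (b - 1) * w / (w @ w)
print(w @ x0 - b)  # -1.0

# Moving from x0 along the unit normal by r = 2/||w||
# lands exactly on the hyperplane w.x - b = +1.
r = 2 / np.linalg.norm(w)
x1 = x0 + r * w / np.linalg.norm(w)
print(w @ x1 - b)  # +1.0
```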

Let $\textbf{x}_+$ be a positive example on one gutter, such that $\textbf{x}_+ \cdot \textbf{w} + b = 1$.
Let $\textbf{x}_-$ be a negative example on the other gutter, such that $\textbf{x}_- \cdot \textbf{w} + b = -1$.
The width of the margin is the scalar projection of $\textbf{x}_+ - \textbf{x}_-$ onto the unit normal vector $\frac{\textbf{w}}{\|\textbf{w}\|}$, that is, the dot product of $\textbf{x}_+ - \textbf{x}_-$ and $\frac{\textbf{w}}{\|\textbf{w}\|}$:
\begin{align} width & = (\textbf{x}_+ - \textbf{x}_-) \cdot \frac{\textbf{w}}{\|\textbf{w}\|} \\ & = \frac {(\textbf{x}_+ - \textbf{x}_-) \cdot {\textbf{w}}}{\|\textbf{w}\|} \\ & = \frac{\textbf{x}_+ \cdot \textbf{w} \,{\bf -}\, \textbf{x}_-\cdot \textbf{w}}{\|\textbf{w}\|} \\ & = \frac{1-b-(-1-b)}{\lVert \textbf{w} \rVert} \\ & = \frac{2}{\|\textbf{w}\|} \end{align}
The above refers to MIT 6.034 Artificial Intelligence
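The same identity is easy to verify numerically; the hyperplane parameters below are illustrative assumptions, and the gutter points are constructed to satisfy $\textbf{x} \cdot \textbf{w} + b = \pm 1$:

```python
import numpy as np

# Illustrative hyperplane parameters (assumptions, not from the answer).
w = np.array([1.0, 2.0])
b = -0.5

# One point on each gutter: w.x + b = +1 and w.x + b = -1.
x_pos = (1 - b) * w / (w @ w)
x_neg = (-1 - b) * w / (w @ w)

# Scalar projection of (x_pos - x_neg) onto the unit normal w/||w||.
width = (x_pos - x_neg) @ (w / np.linalg.norm(w))
print(width, 2 / np.linalg.norm(w))  # both equal 2/||w||
```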
The optimization objective of an SVM is to choose w and b such that the margin around the separating hyperplane is maximized.
Mathematically speaking, this is a nonlinear optimization task, which is solved via the KKT (Karush-Kuhn-Tucker) conditions, using Lagrange multipliers.
The following video explains this in simple terms for the linearly separable case:
https://www.youtube.com/watch?v=1NxnPkZM9bc
Also, how this is calculated is explained in more detail here, for both the primal and dual cases:
https://www.csie.ntu.edu.tw/~cjlin/talks/rome.pdf
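In practice a library solver does this work for you. Below is a minimal sketch using scikit-learn (the toy data set is an assumption for illustration): with a linear kernel and a large C, the fitted coef_ and intercept_ approximate the hard-margin w and b, and the margin width is 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy data set (illustrative assumption).
X = np.array([[1, 1], [0, 2], [2, 3], [3, 3]], dtype=float)
y = np.array([-1, -1, 1, 1])

# A large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```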
The margin between the separating hyperplane and the class boundaries is an essential feature of the SVM algorithm.
See, you have two hyperplanes: (1) w^t x + b = 1 and (2) w^t x + b = -1. Any vector with label y = 1 must lie either on or behind hyperplane (1), i.e. w^t x + b >= 1; likewise, any vector with label y = -1 must lie on or behind hyperplane (2), i.e. w^t x + b <= -1.
Note: if those requirements can be fulfilled, it implicitly means the dataset is linearly separable. This makes sense, because otherwise no such margin can be constructed.
So, what an SVM tries to find is a decision boundary which is half-way between (1) and (2). Let's define this boundary as (3) w^t x + b = 0. What you see here is that (1), (2) and (3) are parallel hyperplanes, because they share the same parameters w and b. The parameter w holds the direction of those planes. Recall that a vector always has a direction and a magnitude/length.
The question is now: how can one calculate the hyperplane (3)? The equations (1) and (2) tell us that any vector with label y = 1 which is closest to (3) lies exactly on the hyperplane (1), hence (1) becomes w^t x + b = 1 for such x. The same applies to the closest vectors with a negative label and (2). The vectors on these planes are called 'support vectors', and the decision boundary (3) depends only on those, because one can simply subtract (2) from (1) for the support vectors and get:

(w^t x_+ + b) - (w^t x_- + b) = 1 - (-1)  =>  w^t x_+ - w^t x_- = 2

Note: x_+ and x_- here are different support vectors, one on each of the two planes.
Now we want the direction of w while ignoring its length, in order to get the shortest distance between (3) and the other planes. This distance is a perpendicular line segment from (3) to the others. To do so, one can divide by the length of w to get the unit normal vector perpendicular to (3), hence (w^t x_+ - w^t x_-)/||w|| = 2/||w||. The left-hand side is exactly the perpendicular distance between the two planes, so that distance is 2/||w||. This distance must be maximized.
Edit:
As others state here, use Lagrange multipliers or the SMO algorithm to minimize the term

1/2 ||w||^2

s.t. y_i(w^t x_i + b) >= 1 for all i

This is the convex form of the optimization problem for the primal SVM.
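As a rough sketch (not code from any of the sources above), this primal problem can be handed directly to a general-purpose constrained solver. The toy data set is an assumption:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative assumption).
X = np.array([[1, 1], [0, 2], [2, 3], [3, 3]], dtype=float)
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Decision variables packed as p = [w1, w2, b].
def objective(p):
    w = p[:2]
    return 0.5 * (w @ w)  # 1/2 ||w||^2

# One inequality constraint per sample: y_i (w^t x_i + b) - 1 >= 0.
cons = [{"type": "ineq", "fun": lambda p, x=x, t=t: t * (p[:2] @ x + p[2]) - 1}
        for x, t in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))
```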
Solving the SVM problem by inspection
By inspection we can see that the decision boundary is the line $x_2 = x_1 - 3$. Using the formula $w^T x + b = 0$ we can obtain a first guess of the parameters as
$$ w = [1,-1] \ \ b = -3$$
Using these values we would obtain the following width between the support vectors: $\frac{2}{\sqrt{2}} = \sqrt{2}$. Again by inspection we see that the width between the support vectors is in fact of length $4 \sqrt{2}$ meaning that these values are incorrect.
Recall that scaling the boundary by a factor of $c$ does not change the boundary line, hence we can generalize the equation as
$$ cx_1 - cx_2 - 3c = 0$$ $$ w = [c,-c] \ \ b = -3c$$
Plugging back into the equation for the width we get
\begin{aligned} \frac{2}{\|w\|} & = 4 \sqrt{2} \\ \frac{2}{\sqrt{2}\,c} & = 4 \sqrt{2} \\ c & = \frac{1}{4} \end{aligned}
Hence the parameters are in fact $$ w = [\frac{1}{4},-\frac{1}{4}] \ \ b = -\frac{3}{4}$$
To find the values of $\alpha_i$ we can use the following two constraints which come from the dual problem:
$$ w = \sum_i^m \alpha_i y^{(i)} x^{(i)} $$ $$\sum_i^m \alpha_i y^{(i)} = 0 $$
And using the fact that $\alpha_i = 0$ for every vector that is not a support vector (leaving only the 3 vectors on the gutters in this case), we obtain the system of simultaneous linear equations: \begin{aligned} \begin{bmatrix} 6 \alpha_1 - 2 \alpha_2 - 3 \alpha_3 \\ -1 \alpha_1 - 3 \alpha_2 - 4 \alpha_3 \\ \alpha_1 - \alpha_2 - \alpha_3 \end{bmatrix} & = \begin{bmatrix} 1/4 \\ -1/4 \\ 0 \end{bmatrix} \\ \alpha & = \begin{bmatrix} 1/16 \\ 1/16 \\ 0 \end{bmatrix} \end{aligned}
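A quick numeric check of this system. The support-vector coordinates $(6,-1)$, $(2,3)$, $(3,4)$ with labels $+1, -1, -1$ are inferred here from the coefficients above, since the original diagram is not reproduced:

```python
import numpy as np

# Support vectors and labels inferred from the coefficients above (assumption).
X = np.array([[6, -1], [2, 3], [3, 4]], dtype=float)
y = np.array([1.0, -1.0, -1.0])

# Rows 1-2: sum_i alpha_i y_i x_i = w ; row 3: sum_i alpha_i y_i = 0.
A = np.vstack([y * X.T, y])
rhs = np.array([1 / 4, -1 / 4, 0.0])

alpha = np.linalg.solve(A, rhs)
print(alpha)  # [0.0625 0.0625 0.    ] i.e. [1/16, 1/16, 0]
```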
Source
- https://ai6034.mit.edu/wiki/images/SVM_and_Boosting.pdf
- Full post here
Instead of computing the width between the support vectors (which in this case was easy, because two of them happened to be directly across from each other over the decision line), it may be more convenient to use the fact that the support vectors take the values $\pm1$ under the decision function:
$$ cx_1 - cx_2 -3c =0 $$
represents the line, but using the point $B=(2,3)$ with target $-1$ in the diagram, we should have
$$ c(2) - c(3) -3c =-1$$
and hence (again) $c=1/4$.
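A final sanity check of the fitted parameters (again assuming the three support vectors inferred above):

```python
import numpy as np

w = np.array([1 / 4, -1 / 4])
b = -3 / 4

# Assumed support vectors and their labels (inferred from the post).
points = {(6, -1): +1, (2, 3): -1, (3, 4): -1}
for x, label in points.items():
    value = w @ np.array(x, dtype=float) + b
    print(x, value, label)  # value should match the label

print("margin width =", 2 / np.linalg.norm(w))  # 4*sqrt(2) ≈ 5.657
```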