You can get the support vectors using clf.support_vectors_.
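
Alongside support_vectors_, the fitted classifier also exposes support_ (the row indices of the support vectors in the training set) and n_support_ (the number of support vectors per class). A minimal sketch on toy data (my own illustration, just to show the attributes):

from sklearn import svm
from sklearn.datasets import make_blobs

# toy two-class data, just to demonstrate the attributes
X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = svm.SVC(kernel='linear', C=1).fit(X, y)

print(clf.support_vectors_)  # coordinates of the support vectors
print(clf.support_)          # their row indices into X
print(clf.n_support_)        # number of support vectors per class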

Plotting the support vectors:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

# we create 40 separable points
np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20

# fit the model
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X, Y)

# get the separating hyperplane
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - (clf.intercept_[0]) / w[1]


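# plot the parallels to the separating hyperplane that pass through the
# support vectors; the perpendicular margin width is 1 / ||w||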
margin = 1 / np.sqrt(np.sum(clf.coef_ ** 2))
yy_down = yy - np.sqrt(1 + a ** 2) * margin
yy_up = yy + np.sqrt(1 + a ** 2) * margin

plt.figure(1, figsize=(4, 3))
plt.clf()
plt.plot(xx, yy, 'k-')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')

plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80,
            facecolors='none', zorder=10, edgecolors='k')
plt.scatter(X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.cm.Paired,
            edgecolors='k')

plt.axis('tight')
x_min = -4.8
x_max = 4.2
y_min = -6
y_max = 6

XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
Z = clf.predict(np.c_[XX.ravel(), YY.ravel()])

# Put the result into a color plot
Z = Z.reshape(XX.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(XX, YY, Z, cmap=plt.cm.Paired)

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)

plt.xticks(())
plt.yticks(())

plt.show()

Answer from seralouk on Stack Overflow

A second answer, from the same Stack Overflow thread:

Let me assume we are talking about liblinear (e.g. scikit-learn's LinearSVC) instead of sklearn's SVC.

The answer can be found in the LIBLINEAR FAQ. In short, you can't. You need to modify the source code.

Q: How could I know which training instances are support vectors?

Some LIBLINEAR solvers consider the primal problem, so support vectors are not obtained during the training procedure. For dual solvers, we output only the primal weight vector w, so support vectors are not stored in the model. This is different from LIBSVM.

To know support vectors, you can modify the following loop in solve_l2r_l1l2_svc() of linear.cpp to print out indices:

    for(i=0; i<l; i++)
    {
        v += alpha[i]*(alpha[i]*diag[GETI(i)] - 2);
        if(alpha[i] > 0)
            ++nSV;
    }

Note that we group data in the same class together before calling this subroutine. Thus the order of your training instances has been changed. You can sort your data (e.g., positive instances before negative ones) before using liblinear. Then indices will be the same.
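
On the scikit-learn side the same limitation applies to LinearSVC, which wraps liblinear and exposes no support_vectors_ attribute. A workaround sketch (my own, not from the answer above): under the usual hinge-loss formulation, the samples that would be support vectors are exactly those whose functional margin is at most 1, which can be recovered from decision_function:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# toy data: two slightly overlapping classes
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

clf = LinearSVC(C=1.0).fit(X, y)

# decision_function returns w . x + b; samples with
# y_signed * (w . x + b) <= 1 lie on or inside the margin
y_signed = 2 * y - 1  # map {0, 1} labels to {-1, +1}
functional_margin = y_signed * clf.decision_function(X)
support_idx = np.flatnonzero(functional_margin <= 1 + 1e-10)
print(support_idx)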

How does a SVM choose its support vectors? (Quora)

In hard-margin SVM, "If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible".

The first thing to do is plot the points to see whether linear separability holds.

Consider the convex hulls of the two sets of points. No point in the interior of either hull can serve as a support vector, so we can ignore all interior points altogether.

Because the linear separability is visually evident, we can easily draw some line that separates these classes. Parallel to that line runs a whole family of lines, and every member of the family that passes strictly between the two hulls also separates the classes.

Any such line lies at some distance from the nearest vertex of class 1 and at some distance from the nearest vertex of class -1. Those distances are found by dropping a segment straight from each of those vertices onto the line. This enables us to connect these vertices of the classes with a three-edge path

where the middle edge moves along the separating line itself.

On the other hand, the segment joining those two vertices directly follows a straight path.

This straight path consists of the hypotenuses of two right triangles. Two of their edges are the segments projecting the support points onto the separating line. Consequently the length of the straight segment is at least as great as the perpendicular distance between the two classes across the separating line.

If, instead, we choose the separating line perpendicular to the segment joining the two vertices, the three-edge path coincides with that segment.

The previous inequality, valid for all separating lines, reduces to an equality in this special case.

You can exploit these ideas to prove generally that

the optimal hard-margin hyperplane bisects any line segment that realizes the shortest distance between two linearly separable subsets of Euclidean space.

The support vectors are the endpoints of such a segment. The distance from the optimal hyperplane to either class is half the distance between the classes.

As an exercise, redo this analysis after replacing one of the points of class 1: a line segment realizing the shortest distance between the classes will then connect a different pair of points.

(Why did I write "a" line segment? Redo the problem once more after replacing that point with a suitable pair of points: you should find many line segments that realize the shortest distance between the classes.)


An hour or so spent studying convex analysis -- definitions of convexity, of convex hulls, characterizations and properties of convex sets, and the proofs of equivalence of those characterizations -- will have an immediate payoff for the intuition as well as your understanding of SVM (and of many fundamental ideas of optimization, too).
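
To make the convex-hull picture concrete, here is a small sketch (my own illustration, not from the answer above) that draws the hulls of two separable classes together with the support vectors found by a near-hard-margin SVC; the support vectors all lie on the hull boundaries:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull
from sklearn import svm

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) - [2, 2], rng.randn(20, 2) + [2, 2]]
y = np.array([0] * 20 + [1] * 20)

# a large C approximates a hard-margin SVM on separable data
clf = svm.SVC(kernel='linear', C=1e6).fit(X, y)

for cls in (0, 1):
    pts = X[y == cls]
    hull = ConvexHull(pts)
    # hull.vertices indexes the hull's corner points in order
    plt.fill(pts[hull.vertices, 0], pts[hull.vertices, 1], alpha=0.2)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors='k')
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=120, facecolors='none', edgecolors='k')
plt.show()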


Short answer

The support vectors are those points for which the Lagrange multipliers are not zero (there is more to a Support Vector Machine than just $b$).

Long answer

Hard Margin

For a simple hard-margin SVM, we have to solve the following minimisation problem:

$$\min_{\boldsymbol{w}, b} \frac{1}{2} \|\boldsymbol{w}\|^2$$ subject to $$\forall i : y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \geq 0$$

The solution can be found with help of Lagrange multipliers $\alpha_i$. In the process of minimising the Lagrange function, it can be found that $$\boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i.$$ Therefore, $\boldsymbol{w}$ only depends on those samples for which $\alpha_i \neq 0$.

Additionally, the Karush-Kuhn-Tucker conditions require that the solution satisfies $$\alpha_i (y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1) = 0.$$ To compute $b$, we need a sample whose constraint is tight; by this condition that is guaranteed whenever $\alpha_i > 0$, so that $y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 = 0$ can be solved for $b$. Hence, $b$ depends only on those samples for which $\alpha_i > 0$.

Therefore, we can conclude that the solution depends on all samples for which $\alpha_i > 0$.
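
In scikit-learn this identity is directly observable: for a fitted SVC, dual_coef_ stores the products $y_i \alpha_i$ for the support vectors only, so with a linear kernel $\boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i$ can be checked numerically. A small sketch on synthetic data:

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) - [2, 2], rng.randn(20, 2) + [2, 2]]
y = [0] * 20 + [1] * 20

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# dual_coef_ holds y_i * alpha_i for the support vectors only
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))  # True: w is built from the support vectors alone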

Soft Margin

For the C-SVM, commonly known as the soft-margin SVM, the minimisation problem is given by:

$$\min_{\boldsymbol{w}, b} \frac{1}{2} \|\boldsymbol{w}\|^2 + C \sum_i \xi_i$$ subject to $$\forall i : \begin{aligned}y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 + \xi_i & \geq 0 \\ \xi_i &\geq 0\end{aligned}$$

Using Lagrange multipliers $\alpha_i$ and $\lambda_i = (C - \alpha_i)$, the weights are (again) given by $$\boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i,$$ and therefore $\boldsymbol{w}$ depends only on samples for which $\alpha_i \neq 0$.

Due to the Karush-Kuhn-Tucker conditions, the solution must satisfy

$$\begin{align} \alpha_i (y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 + \xi_i) & = 0 \\ (C - \alpha_i) \xi_i & = 0, \end{align}$$

which allow us to compute $b$ from any sample with $\alpha_i > 0$ and $\xi_i = 0$. By the second condition, $\xi_i$ must be zero whenever $\alpha_i < C$. Therefore, $b$ depends on those samples for which $0 < \alpha_i < C$.

Therefore, we can conclude that the solution depends on all samples for which $\alpha_i > 0$. After all, $\boldsymbol{w}$ still depends on those samples for which $\alpha_i = C$.
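
The same distinction is visible in scikit-learn: since the entries of dual_coef_ are $y_i \alpha_i$ and the box constraint caps each $\alpha_i$ at C, the margin support vectors ($0 < \alpha_i < C$) can be separated from the bound ones ($\alpha_i = C$). A sketch, with overlapping classes so that both kinds occur:

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) - [1, 1], rng.randn(20, 2) + [1, 1]]
y = [0] * 20 + [1] * 20

clf = SVC(kernel='linear', C=1.0).fit(X, y)

alpha = np.abs(clf.dual_coef_).ravel()  # alpha_i for each support vector
margin_svs = clf.support_[alpha < clf.C - 1e-8]   # 0 < alpha_i < C: on the margin
bound_svs = clf.support_[alpha >= clf.C - 1e-8]   # alpha_i = C: inside the margin or misclassified
print(len(margin_svs), "margin SVs,", len(bound_svs), "bound SVs")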


Solving the SVM problem by inspection

By inspection we can see that the decision boundary is the line $x_2 = x_1 - 3$. Using the formula $w^T x + b = 0$ we can obtain a first guess of the parameters as

$$ w = [1,-1] \ \ b = -3$$

Using these values we would obtain the following width between the support vectors: $\frac{2}{\sqrt{2}} = \sqrt{2}$. Again by inspection we see that the width between the support vectors is in fact $4 \sqrt{2}$, meaning that these values are incorrectly scaled.

Recall that scaling the boundary by a factor of $c$ does not change the boundary line, hence we can generalize the equation as

$$ cx_1 - cx_2 - 3c = 0$$ $$ w = [c,-c] \ \ b = -3c$$

Plugging back into the equation for the width we get

\begin{aligned} \frac{2}{||w||} & = 4 \sqrt{2} \\ \frac{2}{\sqrt{2}c} & = 4 \sqrt{2} \\ c & = \frac{1}{4} \end{aligned}

Hence the parameters are in fact $$ w = [\frac{1}{4},-\frac{1}{4}] \ \ b = -\frac{3}{4}$$

To find the values of $\alpha_i$ we can use the following two constraints which come from the dual problem:

$$ w = \sum_i^m \alpha_i y^{(i)} x^{(i)} $$ $$\sum_i^m \alpha_i y^{(i)} = 0 $$

And using the fact that $\alpha_i > 0$ only for the support vectors (i.e. 3 vectors in this case) we obtain the system of simultaneous linear equations: \begin{aligned} \begin{bmatrix} 6 \alpha_1 - 2 \alpha_2 - 3 \alpha_3 \\ -1 \alpha_1 - 3 \alpha_2 - 4 \alpha_3 \\ \alpha_1 - \alpha_2 - \alpha_3 \end{bmatrix} & = \begin{bmatrix} 1/4 \\ -1/4 \\ 0 \end{bmatrix} \\ \alpha & = \begin{bmatrix} 1/16 \\ 1/16 \\ 0 \end{bmatrix} \end{aligned}
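
As a quick numerical check of this system (a sketch; the matrix below is exactly the one written above):

import numpy as np

# first two rows: the components of w = sum_i alpha_i y_i x_i
# last row: the constraint sum_i alpha_i y_i = 0
A = np.array([[ 6.0, -2.0, -3.0],
              [-1.0, -3.0, -4.0],
              [ 1.0, -1.0, -1.0]])
b = np.array([1 / 4, -1 / 4, 0.0])

print(np.linalg.solve(A, b))  # [0.0625 0.0625 0.] i.e. [1/16, 1/16, 0]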

Source

  • https://ai6034.mit.edu/wiki/images/SVM_and_Boosting.pdf
  • Full post here
A second answer:
Instead of computing the width between the support vectors (which in this case was easy because two of them happened to be directly across from each other over the decision line), it might be more convenient to use the fact that the support vectors take the values $\pm 1$ under the decision function:

$$ cx_1 - cx_2 -3c =0 $$

represents the line, but using the point $B=(2,3)$ with target $-1$ in the diagram, we should have

$$ c(2) - c(3) -3c =-1$$

and hence (again) $c=1/4$.
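
As a sanity check, the same parameters drop out of a near-hard-margin fit. The support-point coordinates below are inferred from the coefficient matrix in the first answer, so treat them as an assumption about the original figure:

import numpy as np
from sklearn.svm import SVC

# assumed support points: (6, -1) labelled +1, (2, 3) and (3, 4) labelled -1
X = np.array([[6.0, -1.0], [2.0, 3.0], [3.0, 4.0]])
y = [1, -1, -1]

# a very large C approximates the hard-margin solution
clf = SVC(kernel='linear', C=1e10).fit(X, y)
print(clf.coef_, clf.intercept_)  # approximately [[0.25 -0.25]] and [-0.75]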
