For a general kernel it is difficult to interpret the SVM weights; however, for the linear SVM there actually is a useful interpretation:

1) Recall that in a linear SVM, the result is a hyperplane that separates the classes as well as possible. The weights represent this hyperplane by giving you the coordinates of a vector orthogonal to the hyperplane - these are the coefficients given by svm.coef_. Let's call this vector w.

2) What can we do with this vector? Its direction gives us the predicted class: take the dot product of any point with w (and add the intercept b), and the sign tells you which side of the hyperplane the point falls on. If it is positive, the point belongs to the positive class; if it is negative, it belongs to the negative class.

3) Finally, you can even learn something about the importance of each feature. This is my own interpretation, so convince yourself first. Suppose the SVM found only one feature useful for separating the data; then the hyperplane would be orthogonal to that feature's axis. So you could say that the absolute size of a coefficient, relative to the others, indicates how important that feature was for the separation (assuming the features are on comparable scales, e.g. standardized). For example, if only the first coordinate is used for the separation, w will be of the form (x, 0) where x is some non-zero number: the first coefficient has non-zero magnitude while the second is exactly zero.

Answer from Bitwise on Stack Exchange
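To make points 1-3 concrete, here is a minimal sketch using scikit-learn. The toy dataset and variable names below are made up for illustration; they are not from the original answer.

import numpy as np
from sklearn.svm import SVC

# Toy data: only the first feature separates the two classes
X = np.array([[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 0.0]])
y = np.array([-1, -1, 1, 1])

svm = SVC(kernel='linear').fit(X, y)
w = svm.coef_[0]             # vector orthogonal to the separating hyperplane (point 1)
b = svm.intercept_[0]

# Point 2: the sign of w.x + b gives the side of the hyperplane, i.e. the predicted class
point = np.array([7.0, 2.0])
print(np.sign(w @ point + b))         # agrees with svm.predict([point])

# Point 3: relative magnitudes |w_j| as a rough indication of feature importance
print(np.abs(w) / np.abs(w).sum())    # the first feature should dominate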


A second answer from the same Stack Exchange thread:

I am trying to interpret the variable weights given by fitting a linear SVM.

A good way to understand how the weights are calculated and how to interpret them in the case of linear SVM is to perform the calculations by hand on a very simple example.

Example

Consider the following dataset which is linearly separable

import numpy as np
X = np.array([[3, 4], [1, 4], [2, 3], [6, -1], [7, -1], [5, -3]])
y = np.array([-1, -1, -1, 1, 1, 1])

Solving the SVM problem by inspection

By inspection we can see that the boundary line that separates the points with the largest "margin" is the line $x_1 - x_2 - 3 = 0$. Since the weights of the SVM are proportional to the equation of this decision line (a hyperplane in higher dimensions), a first guess of the parameters would be

$$ w = (1, -1), \quad b = -3 $$
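(Not part of the original answer: a quick matplotlib plot, reusing X and y from the snippet above, makes the inspection step visible.)

import matplotlib.pyplot as plt

plt.scatter(X[y == -1, 0], X[y == -1, 1], label='class -1')
plt.scatter(X[y == 1, 0], X[y == 1, 1], label='class +1')
x1 = np.linspace(0, 8, 100)
plt.plot(x1, x1 - 3, 'k--', label='x1 - x2 - 3 = 0')   # candidate boundary: x2 = x1 - 3
plt.legend()
plt.show()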

SVM theory tells us that the "width" of the margin is given by $\frac{2}{\|w\|}$. Using the above guess we would obtain a width of $\frac{2}{\sqrt{2}} = \sqrt{2}$, which, by inspection, is incorrect: the actual width is $4\sqrt{2}$, the distance between the closest points of the two classes, $(2,3)$ and $(6,-1)$.

Recall that scaling the boundary equation by a factor of $c$ does not change the boundary line, hence we can generalize the equation as

$$ cx_1 - cx_2 - 3c = 0$$

Plugging back into the equation for the width we get

$$ \begin{aligned} \frac{2}{\|w\|} & = 4 \sqrt{2} \\ \frac{2}{\sqrt{2}\,c} & = 4 \sqrt{2} \\ c & = \frac{1}{4} \end{aligned} $$

Hence the parameters (or coefficients) are in fact

$$ w = \left(\tfrac{1}{4},\ -\tfrac{1}{4}\right), \quad b = -\tfrac{3}{4} $$

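(A quick NumPy check of the hand calculation, not part of the original answer: every point should have a decision value of absolute value at least 1, with equality on the margin lines, and the margin width should equal the distance between the support vectors.)

w = np.array([0.25, -0.25])
b = -0.75

print(X @ w + b)                     # |value| >= 1 for every point; (2,3) gives -1, (6,-1) gives +1
print(2 / np.linalg.norm(w))         # margin width 2/||w|| = 4*sqrt(2) ~ 5.657
print(np.linalg.norm(np.array([6, -1]) - np.array([2, 3])))   # distance between support vectors, also 4*sqrt(2)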
(I'm using scikit-learn)

So am I, here's some code to check our manual calculations

from sklearn.svm import SVC

# A very large C approximates a hard-margin SVM, which is what we solved by hand above
clf = SVC(C=1e5, kernel='linear')
clf.fit(X, y)
print('w = ', clf.coef_)
print('b = ', clf.intercept_)
print('Indices of support vectors = ', clf.support_)
print('Support vectors = ', clf.support_vectors_)
print('Number of support vectors for each class = ', clf.n_support_)
print('Coefficients of the support vector in the decision function = ', np.abs(clf.dual_coef_))
  • w = [[ 0.25 -0.25]]
  • b = [-0.75]
  • Indices of support vectors = [2 3]
  • Support vectors = [[ 2.  3.] [ 6. -1.]]
  • Number of support vectors for each class = [1 1]
  • Coefficients of the support vector in the decision function = [[0.0625 0.0625]]
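The dual coefficients tie back to the weights: for a linear kernel, $w = \sum_i \alpha_i y_i x_i$ over the support vectors, and clf.dual_coef_ stores exactly the products $\alpha_i y_i$. A quick check with the fitted model above:

# Recover w from the dual solution: sum of (alpha_i * y_i) * x_i over the support vectors
print(clf.dual_coef_ @ clf.support_vectors_)   # [[ 0.25 -0.25]], identical to clf.coef_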

Does the sign of the weight have anything to do with class?

Not really; the sign of the weights has to do with the equation of the boundary plane, i.e. which side of the boundary is treated as the positive class.
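For instance, refitting on the same points with the labels negated flips which side counts as positive, and with it the sign of every weight. A small check, reusing X and y from above (the variable name clf_flipped is ours):

clf_flipped = SVC(C=1e5, kernel='linear').fit(X, -y)   # same points, labels negated
print(clf_flipped.coef_, clf_flipped.intercept_)       # approximately [[-0.25  0.25]] and [0.75]: the same boundary, negated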

 

Source

https://ai6034.mit.edu/wiki/images/SVM_and_Boosting.pdf

A related answer, addressing whether $w \cdot x + b = 0$ is the equation of a general hyperplane and why the margin width is $\frac{2}{\|w\|}$:

Yes, it is the equation of a general hyperplane.

As to the second question: suppose for some $r>0$ we have two hyperplanes $w_r\cdot x_1 + b_r = r$ and $w_r\cdot x_{-1} + b_r = -r$. Take two points, $x_1$ and $x_{-1}$, one on each of the hyperplanes; then we can use the Cauchy-Schwarz inequality to find a lower bound on the distance between these two points. (Note that the distance between $x_1$ and $x_{-1}$ is just $\|x_1-x_{-1}\|$.) \begin{align} \frac{2r}{\|w_r\|} &= \frac{(r-b_r) -(-r-b_r)}{\|w_r\|} = \frac{w_r\cdot x_1 - w_r\cdot x_{-1}}{\|w_r\|} =\frac{w_r}{\|w_r\|}\cdot (x_1-x_{-1})\\ &\overset{\text{C.S.}}{\leq} \underbrace{\left\|\frac{w_r}{\|w_r\|} \right\|}_{=\,1}\;\|x_1-x_{-1}\| = \|x_1-x_{-1}\| \end{align}

The margin is the minimum distance from a point on one hyperplane to the other. This minimum is attained (by Cauchy-Schwarz) when $x_1-x_{-1}$ is parallel to $w$ (and therefore perpendicular to the hyperplane, which is geometrically intuitive).

So the margin would have a width of $\frac{2r}{\|w_r\|}$.

However, this form of the SVM may be expressed as $$\text{Minimize}\quad \|w_r\|\quad\text{s.t.}\quad y_i(w_r\cdot x_i+b_r) \geq r\; \text{for $i=1,\dotsc,n$}$$ By defining $w_r = rw_1$ and $b_r=rb_1$, $$\text{Minimize}\quad \|w_r\|=r\|w_1\|\quad\text{s.t.}\quad y_i(w_r/r\cdot x_i+b_r/r) \geq 1\; \text{for $i=1,\dotsc,n$}$$ which is the same as the program: $$\text{Minimize}\quad \|w_1\|\quad\text{s.t.}\quad y_i(w_1\cdot x_i+b_1) \geq 1\; \text{for $i=1,\dotsc,n$}$$

So, although the first and last minimization problems produce different normal vectors and biases ($w_r$, $b_r$ versus $w_1$, $b_1$), the width of the margin remains unchanged; that is, the margin width is independent of $r$:

$$\frac{2r}{\|w_r\|} = \frac{2r}{\|(rw_1)\|}= \frac{2}{\|w_1\|}$$
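As a concrete check against the worked example earlier on this page: the first guess $w = (1, -1)$, $b = -3$ corresponds to $r = 4$ (the support vectors $(2,3)$ and $(6,-1)$ give $w \cdot x + b = -4$ and $+4$ respectively), while the canonical form uses $r = 1$ with $w = (\tfrac{1}{4}, -\tfrac{1}{4})$, $b = -\tfrac{3}{4}$. Both give the same margin width:

$$ \frac{2 \cdot 4}{\|(1,-1)\|} = \frac{8}{\sqrt{2}} = 4\sqrt{2} = \frac{2}{\|(\tfrac{1}{4}, -\tfrac{1}{4})\|} $$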
