Answer from Bitwise on Stack Exchange

For a general kernel it is difficult to interpret the SVM weights; however, for the linear SVM there is actually a useful interpretation:
1) Recall that in a linear SVM, the result is a hyperplane that separates the classes as well as possible. The weights represent this hyperplane by giving you the coordinates of a vector which is orthogonal to the hyperplane - these are the coefficients given by svm.coef_. Let's call this vector w.
2) What can we do with this vector? Its direction gives us the predicted class: if you take the dot product of any point with w, you can tell on which side of the hyperplane it lies. If the dot product is positive, the point belongs to the positive class; if it is negative, it belongs to the negative class.
3) Finally, you can even learn something about the importance of each feature. This is my own interpretation, so convince yourself first. Suppose the SVM found only one feature useful for separating the data; then the hyperplane would be orthogonal to that axis. So you could say that the absolute size of a coefficient, relative to the other ones, gives an indication of how important that feature was for the separation. For example, if only the first coordinate is used for the separation, w will be of the form (x, 0) where x is some non-zero number, and then |x| > 0.
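To make point 3 concrete, here is a minimal sketch (my own, on a made-up toy dataset) in which only the first feature separates the classes, so the learned w should come out close to the form (x, 0):

import numpy as np
from sklearn.svm import SVC

# Toy data: the classes differ only in the first feature; the second is uninformative.
X_toy = np.array([[2, 3], [1, -1], [2, -2],    # class -1
                  [8, 3], [9, -1], [8, -2]])   # class +1
y_toy = np.array([-1, -1, -1, 1, 1, 1])

clf_toy = SVC(kernel='linear', C=1e5).fit(X_toy, y_toy)
print(clf_toy.coef_)              # roughly [[0.33, 0.0]]: the second weight is (near) zero
print(np.abs(clf_toy.coef_[0]))   # absolute weights, read as relative feature importance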
I am trying to interpret the variable weights given by fitting a linear SVM.
A good way to understand how the weights are calculated, and how to interpret them in the case of a linear SVM, is to perform the calculations by hand on a very simple example.
Example
Consider the following dataset, which is linearly separable:
import numpy as np
X = np.array([[3, 4], [1, 4], [2, 3], [6, -1], [7, -1], [5, -3]])
y = np.array([-1, -1, -1, 1, 1, 1])

Solving the SVM problem by inspection
By inspection we can see that the boundary line that separates the points with the largest "margin" is the line $x_2 = x_1 - 3$. Since the weights of the SVM are proportional to the equation of this decision line (hyperplane in higher dimensions), using
$$ w^T x + b = 0 $$
a first guess of the parameters would be
$$ w = [1, -1] \quad b = -3 $$
SVM theory tells us that the "width" of the margin is given by $\frac{2}{||w||}$. Using the above guess we would obtain a width of $\frac{2}{\sqrt{2}} = \sqrt{2}$, which, by inspection, is incorrect: the width is $4\sqrt{2}$.
Recall that scaling the boundary by a factor of $c$ does not change the boundary line, hence we can generalize the equation as
$$ cx_1 - cx_2 - 3c = 0 $$
$$ w = [c, -c] \quad b = -3c $$
Plugging back into the equation for the width we get
$$ \begin{aligned} \frac{2}{||w||} & = 4 \sqrt{2} \\ \frac{2}{c\sqrt{2}} & = 4 \sqrt{2} \\ c & = \frac{1}{4} \end{aligned} $$
Hence the parameters (or coefficients) are in fact
$$ w = \left[ \frac{1}{4}, -\frac{1}{4} \right] \quad b = -\frac{3}{4} $$
(I'm using scikit-learn)
So am I; here's some code to check our manual calculations:
from sklearn.svm import SVC

# A large C approximates the hard-margin SVM solved by hand above
clf = SVC(C=1e5, kernel='linear')
clf.fit(X, y)

print('w = ', clf.coef_)
print('b = ', clf.intercept_)
print('Indices of support vectors = ', clf.support_)
print('Support vectors = ', clf.support_vectors_)
print('Number of support vectors for each class = ', clf.n_support_)
print('Coefficients of the support vector in the decision function = ', np.abs(clf.dual_coef_))
- w = [[ 0.25 -0.25]]
- b = [-0.75]
- Indices of support vectors = [2 3]
- Support vectors = [[ 2. 3.] [ 6. -1.]]
- Number of support vectors for each class = [1 1]
- Coefficients of the support vector in the decision function = [[0.0625 0.0625]]
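As a quick extra check (my own addition, reusing clf and np from above), the hand-derived margin width and the relation between the dual coefficients and w can both be verified from these outputs:

# The margin width 2/||w|| should equal the 4*sqrt(2) found by inspection.
print(2 / np.linalg.norm(clf.coef_))           # ~5.657 = 4*sqrt(2)

# w is the weighted sum of the support vectors: w = sum_i alpha_i * y_i * x_i.
# In scikit-learn, dual_coef_ already stores alpha_i * y_i, so:
print(clf.dual_coef_ @ clf.support_vectors_)   # [[ 0.25 -0.25]], matching clf.coef_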
Does the sign of the weight have anything to do with class?
Not really; the sign of the weights has to do with the equation of the boundary plane.
Source
https://ai6034.mit.edu/wiki/images/SVM_and_Boosting.pdf
In the linear case, the hyperplane can always be defined with d+1 numbers, where d is the dimension of the input space, while the number of actual support vectors may be much larger. By computing this hyperplane (let's call it w) you get a more compact model, which can then be used to perform classification:
cl(x) = sgn(w'x + b)
where w' is the transpose of w.
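As a minimal sketch of this decision rule (my own check, reusing clf, X, and np from the worked example above), the sign of w'x + b reproduces the classifier's predictions:

# Evaluate sgn(w'x + b) by hand and compare with scikit-learn's predictions.
w = clf.coef_[0]            # vector orthogonal to the separating hyperplane
b = clf.intercept_[0]       # bias term
scores = X @ w + b          # w'x + b for every training point
print(np.sign(scores).astype(int))   # should match...
print(clf.predict(X))                # ...the labels predicted by the model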
Things get much more tricky in the kernelized version, as w is expressed in terms of the feature-space projection, which may be unknown (or too expensive to compute), so one cannot get an explicit equation for such an object (it is no longer a hyperplane in the input space, but rather a hyperplane in a very rich feature space).
"Support vectors are the elements of the training set that would change the position of the dividing hyperplane if removed." The weights represent this hyperplane by providing the coordinates of a vector that is orthogonal to the hyperplane. "Computes the weighted sum of the support vectors" mathematically means sign(w'*x +b), when x is the support vectors and w' is the transpose of weight vectors, the value of w'x+b is 0 and it represents the decision boundary. When a new x reaches, the sign(w'x+b) will determine which class it belongs to.
For those x in the training sample that have a weight of 0, the sample does not contribute to the hyperplane; including such an x as a support vector would either increase the classification error or decrease the margin.
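To illustrate the quoted definition with the worked example above (my own check, reusing X, y, SVC, and np from earlier): removing a point that is not a support vector leaves the hyperplane untouched, while removing a support vector moves it.

# Point 0 ([3, 4]) is not a support vector (the support indices were [2 3]):
# refitting without it should leave w and b unchanged.
clf_drop = SVC(C=1e5, kernel='linear').fit(np.delete(X, 0, axis=0), np.delete(y, 0))
print(clf_drop.coef_, clf_drop.intercept_)        # still [[ 0.25 -0.25]] [-0.75]

# Point 2 ([2, 3]) is a support vector: refitting without it changes the hyperplane.
clf_drop_sv = SVC(C=1e5, kernel='linear').fit(np.delete(X, 2, axis=0), np.delete(y, 2))
print(clf_drop_sv.coef_, clf_drop_sv.intercept_)  # different weights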
Here is a reference tutorial with plenty of figures for more details.