The way to compute the gradient here is calculus (analytically, NOT numerically!). Differentiating the loss function with respect to $w_{y_i}$ gives:

$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$

and with respect to $w_j$ for $j \neq y_i$:

$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0)\, x_i$

The $\mathbb{1}$ is just the indicator function, so each term contributes only when its margin condition holds. When you write this in code, the example you provided is exactly the answer.
Since you are following the cs231n example, you should definitely check the notes and videos if needed.
Hope this helps!
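As an illustrative sketch (not the asker's original code), the two gradient formulas can be vectorized with NumPy over a whole batch. The function name, array shapes, and `delta` margin below are my assumptions:

```python
import numpy as np

def svm_grad(W, X, y, delta=1.0):
    """Vectorized multi-class SVM hinge loss and gradient (sketch).

    W: (C, D) weights, one row per class; X: (N, D) inputs;
    y: (N,) correct-class indices.
    """
    N = X.shape[0]
    scores = X @ W.T                               # (N, C): s_j = w_j^T x_i
    correct = scores[np.arange(N), y][:, None]     # (N, 1): s_{y_i}
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(N), y] = 0                   # skip the j == y_i term
    loss = margins.sum() / N
    mask = (margins > 0).astype(X.dtype)           # indicator 1(margin > 0)
    mask[np.arange(N), y] = -mask.sum(axis=1)      # -(count) for the correct class
    dW = mask.T @ X / N                            # accumulate x_i per row of W
    return loss, dW
```

The mask trick encodes both formulas at once: each violating class row gets $+x_i$, and the correct class row gets $-x_i$ times the number of violations.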
(Answer from dexhunter on Stack Overflow.)
If the margin term is less than zero, the loss is zero, so the gradient with respect to $W$ is also zero. If it is greater than zero, the gradient with respect to $W$ is the partial derivative of the loss.
Let's use the example of the SVM loss function for a single datapoint:
$L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta) \right]$
Where $\Delta$ is the desired margin.
We can differentiate the function with respect to the weights. For example, taking the gradient with respect to $w_{yi}$ we obtain:
$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$
Where $\mathbb{1}$ is the indicator function that is one if the condition inside is true or zero otherwise. While the expression may look scary when it is written out, when you're implementing this in code you'd simply count the number of classes that didn't meet the desired margin (and hence contributed to the loss function) and then the data vector $x_i$ scaled by this number is the gradient. Notice that this is the gradient only with respect to the row of $W$ that corresponds to the correct class. For the other rows where $j \neq y_i$ the gradient is:
$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i$
Once you derive the expression for the gradient, it is straightforward to implement the expressions and use them to perform the gradient update.
Taken from the Stanford CS231n optimization notes posted on GitHub.
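The derivation above translates into code almost line by line. Here is a minimal single-datapoint sketch, assuming `W` holds one class weight vector per row; the names are illustrative, not taken from the notes:

```python
import numpy as np

def svm_loss_grad_single(W, x, y, delta=1.0):
    """Hinge loss and gradient for one datapoint (sketch).

    W: (C, D) weights (rows are class vectors w_j),
    x: (D,) input, y: index of the correct class.
    """
    scores = W @ x                      # s_j = w_j^T x
    loss = 0.0
    dW = np.zeros_like(W)
    num_violations = 0                  # classes that failed the margin
    for j in range(W.shape[0]):
        if j == y:
            continue
        margin = scores[j] - scores[y] + delta
        if margin > 0:                  # indicator 1(...) is true
            loss += margin
            dW[j] += x                  # grad w.r.t. w_j is x_i
            num_violations += 1
    dW[y] -= num_violations * x         # grad w.r.t. w_{y_i} is -(count) x_i
    return loss, dW
```

This mirrors the prose exactly: count the violating classes, scale $x_i$ by that count for the correct-class row, and add $x_i$ to each violating row.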
First of all, note that the multi-class hinge loss is a function of $W_r$:
\begin{equation}
l(W_r) = \max( 0, 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i - W_{y_i} \cdot x_i)
\end{equation}
Next, the max function is non-differentiable at $0$, so we need to compute its subgradient:
\begin{equation}
\frac{\partial l(W_r)}{\partial W_r} =
\begin{cases}
\{0\}, & W_{y_i}\cdot x_i > 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \\
\{x_i\}, & W_{y_i}\cdot x_i < 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i\\
\{\alpha x_i\}, & \alpha \in [0,1], W_{y_i}\cdot x_i = 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i
\end{cases}
\end{equation}
In the second case, $W_{y_i}$ is independent of $W_r$. This definition of the subgradient of the multi-class hinge loss is analogous to the subgradient of the binary hinge loss.
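The three cases can be sketched directly in code. The helper below is hypothetical (its name, the `alpha` tie-breaking parameter, and the explicit zero for rows not attaining the max are my assumptions); it returns one valid subgradient of $l(W_r)$ with respect to row $W_r$:

```python
import numpy as np

def hinge_subgradient_wr(W, x, y, r, alpha=0.5):
    """One subgradient of l(W_r) = max(0, 1 + max_{r != y_i} W_r.x - W_{y_i}.x)
    w.r.t. row W_r. alpha in [0, 1] is the free choice at the kink."""
    others = np.delete(np.arange(W.shape[0]), y)
    best = others[np.argmax(W[others] @ x)]   # row attaining the inner max
    lhs = W[y] @ x
    rhs = 1 + W[best] @ x
    if r != best:
        return np.zeros_like(x)               # W_r does not attain the max
    if lhs > rhs:
        return np.zeros_like(x)               # case 1: loss is 0 locally
    if lhs < rhs:
        return x                              # case 2: margin active, grad is x_i
    return alpha * x                          # case 3: tie, any alpha*x_i works
```

At the non-differentiable point any `alpha` in $[0, 1]$ gives a valid subgradient, which is why the third case is a set rather than a single vector.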