As for scipy.optimize, you are misusing its optimization methods. Both Newton-CG and BFGS assume your cost function is smooth, which is not the case here. If you use a robust gradient-free method like Nelder-Mead, you will converge to the right point in most cases (I have tried it).
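To illustrate, here is a minimal sketch using a toy non-smooth cost (a stand-in for your hinge-loss cost, not your actual code) where Nelder-Mead still lands at the kink:

```python
import numpy as np
from scipy import optimize

# Toy non-smooth cost with a kink at the optimum [0.5, 0.5],
# standing in for the hinge-loss cost in the question
def cost(theta):
    return np.sum(np.abs(theta - 0.5))

res = optimize.minimize(cost, np.zeros(2), method='Nelder-Mead')
print(res.x)  # close to [0.5, 0.5]
```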

Your problem can, in theory, be solved by gradient descent, but only if you adapt it to a non-smooth function. Currently, your algorithm approaches the optimum quickly but then starts jumping around instead of converging, due to a large learning rate combined with a sharp change in gradient where the maximum in the cost function switches from 0 to positive.

You can calm these oscillations down by decreasing the learning rate each time the cost fails to decrease relative to the previous iteration:

def train(self):

    #----------Optimize using scipy.optimize----------
    if self.method=='optimize':
        opt=optimize.minimize(self.costFunc,self.theta,args=(self.xdata,self.ydata),\
                jac=self.jac,method='BFGS')
        self.theta=opt.x

    #---------Optimize using Gradient descent---------
    elif self.method=='GD':
        costs=[]
        lr=self.lr

        for ii in range(self.n_iter):
            dj=self.jac(self.theta,self.xdata,self.ydata)
            old_theta = self.theta.copy()
            self.theta=self.theta-lr*dj
            cii=self.costFunc(self.theta,self.xdata,self.ydata)

            # if cost goes up, decrease learning rate and restore theta
            if len(costs) > 0 and cii > costs[-1]:
                lr *= 0.9
                self.theta = old_theta
                cii = costs[-1]  # record the cost of the restored theta
            costs.append(cii)

        self.costs=numpy.array(costs)

    return self

This small amendment to your code results in much better convergence:

and in parameters that are pretty close to the optimum, like [0.50110433 0.50076661] or [0.50092616 0.5007394 ].

In modern applications (like neural networks) this adaptation of the learning rate is built into advanced gradient descent algorithms like Adam, which continually track the running mean and variance of the gradient.

Update. This second part of the answer concerns the second version of the code.

About Adam. You get an exploding vt because of the line vt=vt/(1-beta2**t). You should apply the bias correction only to the value of vt used to compute the gradient step, not to the value carried over to the next iteration, like here:

...
mt=beta1*mt_1+(1-beta1)*dj       # first-moment estimate
vt=beta2*vt_1+(1-beta2)*dj**2    # second-moment estimate
mt_temp=mt/(1-beta1**t)          # bias-corrected, used for the step only
vt_temp=vt/(1-beta2**t)
old_theta=self.theta
self.theta=self.theta-lr*mt_temp/(numpy.sqrt(vt_temp)+epsilon)
mt_1=mt                          # carry the *uncorrected* moments forward
vt_1=vt
...
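For reference, here is a minimal self-contained Adam loop with the correction applied that way. The hyperparameter values are the usual defaults, and the toy subgradient at the bottom is an assumption for illustration, not your jac:

```python
import numpy as np

def adam_minimize(jac, theta, lr=0.01, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, n_iter=2000):
    """Minimal Adam loop: bias-corrected estimates are used only for
    the step; the raw moments carry over to the next iteration."""
    mt_1 = np.zeros_like(theta)
    vt_1 = np.zeros_like(theta)
    for t in range(1, n_iter + 1):
        dj = jac(theta)
        mt = beta1 * mt_1 + (1 - beta1) * dj
        vt = beta2 * vt_1 + (1 - beta2) * dj**2
        mt_temp = mt / (1 - beta1**t)   # bias correction, step only
        vt_temp = vt / (1 - beta2**t)
        theta = theta - lr * mt_temp / (np.sqrt(vt_temp) + epsilon)
        mt_1 = mt                       # uncorrected moments carried over
        vt_1 = vt
    return theta

# Toy subgradient of the non-smooth cost sum(|theta - 0.5|)
theta_opt = adam_minimize(lambda th: np.sign(th - 0.5), np.zeros(2))
print(theta_opt)  # approaches [0.5, 0.5]
```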

About instability. Both the Nelder-Mead method and gradient descent depend on the initial parameter values; that's the sad truth. You can try to improve convergence by running more GD iterations and decaying the learning rate in a smarter way, or by decreasing the tolerance parameters xatol and fatol for the Nelder-Mead method.
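For example (a sketch with the same toy cost as above; xatol and fatol default to 1e-4 in scipy's Nelder-Mead, so tightening them makes the simplex shrink further before stopping):

```python
import numpy as np
from scipy import optimize

# Toy non-smooth cost standing in for the real one
cost = lambda th: np.sum(np.abs(th - 0.5))
res = optimize.minimize(cost, np.zeros(2), method='Nelder-Mead',
                        options={'xatol': 1e-8, 'fatol': 1e-8,
                                 'maxiter': 10000})
print(res.x)
```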

However, even if you achieve perfect convergence (parameter values like [ 1.81818459 -1.81817712 -4.09093887] in your case), you still have a problem. Convergence can be roughly checked by the following code:

print(mysvm.costFunc(numpy.concatenate([mysvm.theta, [mysvm.b]]), mysvm.xdata, mysvm.ydata))
print(mysvm.costFunc(numpy.concatenate([mysvm.theta, [mysvm.b+1e-3]]), mysvm.xdata, mysvm.ydata))
print(mysvm.costFunc(numpy.concatenate([mysvm.theta, [mysvm.b-1e-3]]), mysvm.xdata, mysvm.ydata))
print(mysvm.costFunc(numpy.concatenate([mysvm.theta-1e-3, [mysvm.b]]), mysvm.xdata, mysvm.ydata))
print(mysvm.costFunc(numpy.concatenate([mysvm.theta+1e-3, [mysvm.b]]), mysvm.xdata, mysvm.ydata))

which results in

6.7323592305075515
6.7335116664996
6.733895813394582
6.745819882839341
6.741974212439457

Your cost increases if you perturb theta or the intercept in either direction, so the solution is locally optimal. But then sklearn's solution is not optimal (from the point of view of mysvm), because the code

print(mysvm.costFunc(numpy.concatenate([clf.coef_[0], clf.intercept_]), mysvm.xdata, mysvm.ydata))

prints 40.31527145374271! This means you have reached a local minimum, but sklearn's SVM has minimized something different.

And if you read the documentation of sklearn, you can find what's wrong: they minimize sum(errors) * C + 0.5 * penalty, while you minimize mean(errors) * C + 0.5 * penalty! This is the most probable cause of the discrepancy.
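The two objectives differ only in how C scales the data term: with n samples, the mean-based cost with parameter C equals the sum-based cost with C/n. A toy numeric check (the arrays are made-up stand-ins, not your data):

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.random(50)   # stand-in for per-sample hinge losses
penalty = 1.3             # stand-in for ||w||**2
C, n = 10.0, len(errors)

mean_cost = errors.mean() * C + 0.5 * penalty
sum_cost = errors.sum() * (C / n) + 0.5 * penalty
print(np.isclose(mean_cost, sum_cost))  # True
```

Consequently, fitting your mean-based cost with C multiplied by the number of samples should target the same optimum as sklearn's sum-based objective.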

Answer from David Dale on Stack Overflow