An SVM (support-vector machine) is a special kind of linear model. From a theoretical view, training one is a convex optimization problem, and we can get the global optimum in polynomial time. There are many different optimization approaches.

In the past, people used general quadratic programming (QP) solvers. Nowadays, specialized approaches like SMO and others are used.

sklearn's specialized SVM optimizers are based on liblinear and libsvm. There are many documents and research papers if you are interested in the algorithms.

Keep in mind that SVC (libsvm) and LinearSVC (liblinear) make different assumptions about the optimization problem, which results in different performance on the same task (with a linear kernel, LinearSVC is in general much more efficient than SVC, but some tasks, e.g. anything needing a non-linear kernel, can't be tackled by LinearSVC).
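
As a rough sketch of that difference (a toy problem, assuming a recent sklearn; exact scores depend on the data): both estimators fit a linear classifier here, but note they optimize slightly different objectives, so the learned coefficients need not match exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Toy, mostly linearly separable problem
X, y = make_classification(n_samples=500, n_features=10,
                           class_sep=2.0, random_state=0)

# libsvm-based: supports kernels, but scales roughly quadratically in n_samples
svc = SVC(kernel='linear').fit(X, y)

# liblinear-based: linear models only, but scales much better
lin_svc = LinearSVC(dual=False).fit(X, y)

print(svc.score(X, y), lin_svc.score(X, y))
```

On a dataset this small both finish instantly; the scaling gap only shows up once n_samples grows into the tens of thousands (see the timing experiment further below).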

SGD is an optimizer based on Stochastic Gradient Descent (a general optimization method!) which can tackle many different convex optimization problems. Actually, it's more or less the same method used in all those deep-learning approaches, so people use it in the non-convex setting too, throwing away the theoretical guarantees.

sklearn says: "Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to discriminative learning of linear classifiers under convex loss functions." It's actually even more versatile, but here it's enough to note that it subsumes (some) SVMs, logistic regression and others.

Now SGD-based optimization is very different from QP and the others. Take QP, for example: there are no hyper-parameters to tune. This is a bit simplified, as there can be tuning, but it's not needed to guarantee convergence and performance! (The theory of QP solvers, e.g. the interior-point method, is much more robust.)

SGD-based optimizers (and first-order methods in general) are very hard to tune, and they need tuning! Learning rates, and learning-rate schedules in general, are the parameters to look at, as convergence depends on them (in theory and in practice)!
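
A small illustration of that sensitivity (a sketch on synthetic data; the specific eta0 values are arbitrary choices): the same model on the same data, with only the constant learning rate varied.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           class_sep=2.0, random_state=0)

# Same model, same data -- only the learning rate differs
scores = {}
for eta0 in (1e-4, 1e-2, 1.0, 100.0):
    clf = SGDClassifier(learning_rate='constant', eta0=eta0,
                        max_iter=1000, tol=1e-3, random_state=0)
    scores[eta0] = clf.fit(X, y).score(X, y)
print(scores)
```

In practice you would rarely fix a constant rate like this; sklearn's default 'optimal' schedule is a more sensible starting point, but even then the regularization strength alpha interacts with the schedule and usually needs a search.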

It's a very complex topic, but here are some simplified rules:

  • Specialized SVM methods

    • scale worse with the number of samples
    • do not need hyper-parameter tuning
  • SGD-based methods

    • scale better for huge data in general
    • need hyper-parameter tuning
    • solve only a subset of the tasks approachable by the above (no kernel methods!)
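
The "scales better for huge data" point is partly because SGD supports out-of-core learning via partial_fit: you can stream minibatches and never hold the full dataset in memory. A minimal sketch (synthetic minibatches with a simple linear ground truth):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(loss='hinge', random_state=0)
classes = np.array([0, 1])  # must be declared up front for partial_fit

# Stream the data in minibatches -- the full set never sits in memory
for _ in range(100):
    X_batch = rng.normal(size=(200, 10))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

SVC and LinearSVC have no equivalent: they need the whole training set at once.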

My opinion: use the (easier to use) LinearSVC as long as it's working, given your time budget!

Just to make it clear: I highly recommend grabbing some dataset (e.g. from within sklearn) and doing some comparisons between those candidates. The need for parameter tuning is not a theoretical problem! You will see non-optimal (objective / loss) results in the SGD case quite easily!
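
One way to run such a comparison (a sketch using the breast-cancer dataset bundled with sklearn; the grid over alpha is a deliberately small, arbitrary choice):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# LinearSVC: reasonable out of the box
svc_score = cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()

# SGD: untuned defaults vs. a small grid over the regularization strength
sgd_default = cross_val_score(SGDClassifier(random_state=0), X, y, cv=5).mean()
grid = GridSearchCV(SGDClassifier(random_state=0),
                    {'alpha': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]}, cv=5)
sgd_tuned = grid.fit(X, y).best_score_

print(svc_score, sgd_default, sgd_tuned)
```

The gap between the default and the tuned SGD scores is exactly the "need for tuning" mentioned above; LinearSVC gets its score without any search.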

And always remember: Stochastic Gradient Descent is sensitive to feature scaling (see the docs). This is more or less a consequence of first-order methods.
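
The standard remedy is to put a StandardScaler in front of the SGD model, as sklearn's docs recommend (a sketch on the breast-cancer dataset, whose raw features have wildly different scales):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # features on very different scales

# Raw features vs. standardized (zero mean, unit variance) features
raw = cross_val_score(SGDClassifier(random_state=0), X, y, cv=5).mean()
scaled = cross_val_score(
    make_pipeline(StandardScaler(), SGDClassifier(random_state=0)), X, y, cv=5
).mean()
print(raw, scaled)
```

Using a pipeline also ensures the scaler is fit on each training fold only, avoiding leakage into the CV estimate.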

Answer from sascha on Stack Overflow

SVC (SVM) uses kernel-based optimisation: the input data is implicitly transformed into a higher-dimensional space, which makes it possible to identify more complex boundaries between classes. SVC can perform both linear and non-linear classification.

SVC can perform linear classification by setting the kernel parameter to 'linear': svc = SVC(kernel='linear')

SVC can perform non-linear classification by setting the kernel parameter to 'poly' or 'rbf' (the default): svc = SVC(kernel='poly') or svc = SVC(kernel='rbf')

SGDClassifier uses stochastic gradient descent, where the optimal coefficients are found by an iterative process. SGDClassifier can perform only linear classification.

SGDClassifier fits a linear SVC (SVM) model when the loss parameter is set to 'hinge' (which is the default), i.e. SGDClassifier(loss='hinge')
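
To see that equivalence in practice (a sketch on standardized synthetic data; the two optimizers solve slightly different objectives, so expect very similar but not identical predictions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=5,
                           class_sep=2.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Same model family (linear SVM), two different optimizers
sgd = SGDClassifier(loss='hinge', random_state=0).fit(X, y)
lin = LinearSVC(dual=False).fit(X, y)

# Fraction of points on which the two classifiers agree
agreement = (sgd.predict(X) == lin.predict(X)).mean()
print(agreement)
```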


When SVMs and SGD can't be combined

SVMs are often used in combination with the kernel trick, which enables classification of non-linearly separable data. This answer explains why you wouldn't use stochastic gradient descent to solve a kernelised SVM: https://stats.stackexchange.com/questions/215524/is-gradient-descent-possible-for-kernelized-svms-if-so-why-do-people-use-quadr

Linear SVMs

If we stick to linear SVMs, then we can run an experiment using sklearn, as it provides wrappers over libsvm (SVC) and liblinear (LinearSVC), and also offers the SGDClassifier. I recommend reading the documentation of libsvm and liblinear to understand what is happening under the hood.

Comparison on example dataset

Below is a comparison of computational performance and accuracy over a randomly generated dataset (which may not be representative of your problem). You should alter the problem to fit your requirements.

import time
import numpy as np
import matplotlib.pyplot as plt

from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Randomly generated dataset
# Linear function + noise
np.random.seed(0)
X = np.random.normal(size=(50000, 10))
coefs = np.random.normal(size=10)
epsilon = np.random.normal(size=50000)
y = (X @ coefs + epsilon) > 0

# Classifiers to compare
algos = {
    'LibSVM': {
        'model': SVC(),
        'max_n': 4000,
        'time': [],
        'error': []
    },
    'LibLinear': {
        'model': LinearSVC(dual=False),
        'max_n': np.inf,
        'time': [],
        'error': []
    },
    'SGD': {
        'model': SGDClassifier(max_iter=1000, tol=1e-3),
        'max_n': np.inf,
        'time': [],
        'error': []
    }
}

splits = list(range(100, 1000, 100)) + \
         list(range(1500, 5000, 500)) + \
         list(range(6000, 50000, 1000))
for i in splits:
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=1-i/50000,
                                                        random_state=0)
    for k, v in algos.items():
        if i < v['max_n']:
            model = v['model']
            t0 = time.time()
            model.fit(X_train, y_train)
            t1 = time.time()
            v['time'].append(t1 - t0)
            preds = model.predict(X_test)
            e = (preds != y_test).sum() / len(y_test)
            v['error'].append(e)

Plotting the results, we see that the traditional libsvm solver cannot be used on large n, while the liblinear and SGD implementations scale well computationally.

plt.figure()
for k, v in algos.items():
    plt.plot(splits[:len(v['time'])], v['time'], label='{} time'.format(k))
plt.legend()
plt.semilogx()
plt.title('Time comparison')
plt.show()

Plotting the error, we see that SGD is worse than LibSVM for the same training set, but if you have a large training set this becomes a minor point. The liblinear algorithm performs best on this dataset:

plt.figure()
for k, v in algos.items():
    plt.plot(splits[:len(v['error'])], v['error'], label='{} error'.format(k))
plt.legend()
plt.semilogx()
plt.title('Error comparison')
plt.xlabel('Number of training examples')
plt.ylabel('Error')
plt.show()
