Ways to speed up SVR training in scikit-learn
Exporting Python Scikit-learn model to trading platform in C++
How well do Support Vector Machines scale to very large training datasets?
It should be fine with an online linear SVM that is explicitly formulated for these kinds of problems. LibLinear is one of these, and I think they have Java hooks, although I'm not sure about that. There are also Pegasos and Bottou's stochastic quasi-Newton work.
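If you're staying in scikit-learn anyway, something along these lines is the analogue (a rough sketch on made-up data, not a drop-in solution): SGDRegressor with the epsilon-insensitive loss is the SGD/Pegasos-style counterpart of a linear SVR, and partial_fit lets you stream the data in chunks instead of loading everything at once.

```python
# Sketch of an online linear SVR in scikit-learn (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100_000, 50)                        # stand-in for a large training set
y = X @ rng.randn(50) + 0.1 * rng.randn(100_000)

scaler = StandardScaler()
# epsilon-insensitive loss == linear SVR objective, trained by SGD
model = SGDRegressor(loss="epsilon_insensitive", epsilon=0.1, alpha=1e-4)

# Stream the data in chunks; no kernel matrix is ever built or held in memory.
chunk = 10_000
for start in range(0, X.shape[0], chunk):
    Xb = X[start:start + chunk]
    yb = y[start:start + chunk]
    Xb = scaler.partial_fit(Xb).transform(Xb)     # incremental scaling, approximate for early chunks
    model.partial_fit(Xb, yb)
```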
If you're using a kernel SVM, the bottleneck is at minimum the computation of the kernel matrix, which is O(N²), but on larger datasets you can expect training to behave closer to O(N³). If you want a reference for this, google "The Tradeoffs of Large Scale Learning" by Bottou.
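If you want to see that scaling for yourself, a crude timing loop like this (synthetic data, purely illustrative) shows the fit time growing much faster than linearly as the sample count doubles:

```python
# Time a kernel SVR on growing subsets to observe the superlinear scaling.
import time
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.randn(20_000, 20)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(20_000)

for n in (1_000, 2_000, 4_000, 8_000):
    t0 = time.time()
    SVR(kernel="rbf", C=1.0).fit(X[:n], y[:n])
    print(f"n={n:>6}  fit time: {time.time() - t0:.1f}s")
```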
You can make approximate kernel SVMs work on large-scale data, but it involves approximating the kernel by a set of projections (random Fourier features, Nyström, and the like), and it's probably more hassle than it's worth for an initial test.
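For reference, scikit-learn does ship that projection idea as kernel approximation; a rough sketch on synthetic data (not from the original answer) would be random Fourier features followed by a fast linear SVR on the projected features:

```python
# Approximate RBF-kernel SVR: random Fourier features + linear SVR (illustrative sketch).
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

rng = np.random.RandomState(0)
X = rng.randn(50_000, 20)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(50_000)

# 500 random projections approximate the RBF kernel; training stays linear in N.
approx_rbf_svr = make_pipeline(
    RBFSampler(gamma=0.1, n_components=500, random_state=0),
    LinearSVR(C=1.0, max_iter=5_000),
)
approx_rbf_svr.fit(X, y)
```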
As for the runtime of the linear SVM, I'm guessing blindly here: you haven't given enough information about your machine's specs or the data, but probably somewhere between an hour and a couple of days. The numbers get better if your feature vectors are sparse.
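On the sparse point: if you keep the features in a scipy CSR matrix, scikit-learn's linear solvers accept it directly and the cost tracks the number of non-zeros rather than the full dense dimensionality. A toy example (synthetic sparse data, purely illustrative):

```python
# Linear SVR-style model trained directly on a sparse CSR matrix.
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = sp.random(200_000, 10_000, density=0.001, format="csr", random_state=0)
y = rng.randn(200_000)

SGDRegressor(loss="epsilon_insensitive").fit(X, y)
```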