GeeksforGeeks
geeksforgeeks.org › machine learning › major-kernel-functions-in-support-vector-machine-svm
Major Kernel Functions in Support Vector Machine (SVM) - GeeksforGeeks
Flexible Feature Mapping: Can use ... learned representations, or hybrid metrics. Custom Distance Metrics: Supports non-Euclidean measures like cosine, correlation, Hamming. Better Accuracy for Unique Data: Works well when standard kernels fail to capture real patterns. Complexity Trade-off: May require mathematical checks to ensure SVM ...
Published November 8, 2025
scikit-learn
scikit-learn.org › stable › modules › svm.html
1.4. Support Vector Machines — scikit-learn 1.8.0 documentation
Proper choice of C and gamma is critical to the SVM’s performance. One is advised to use GridSearchCV with C and gamma spaced exponentially far apart to choose good values. ... You can define your own kernels by either giving the kernel as a python function or by precomputing the Gram matrix. Classifiers with custom kernels behave the same way as any other classifiers, except that: Field support_vectors_ is now empty, only indices of support vectors are stored in support_
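A minimal sketch of both tips from this snippet - a C/gamma grid spaced exponentially, and a custom kernel given as a Python function - assuming scikit-learn and NumPy; the toy dataset and grid bounds are illustrative, not from the docs:

```python
# Sketch: exponentially spaced C/gamma search, plus a custom kernel callable.
# Dataset and grid values are illustrative, not from the cited docs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# C and gamma spaced exponentially far apart, as the docs advise.
param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-4, 1, 6)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)

# A custom kernel can be any callable that returns the Gram matrix
# between two sets of samples.
def linear_kernel(A, B):
    return A @ B.T

clf = SVC(kernel=linear_kernel).fit(X, y)
print(clf.score(X, y))
```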
[D] Is SVM/Are kernel methods still relevant?
Very relevant. Most real-world problems don't need deep learning; in many problem spaces with limited training data available, classical ML methods like boosting and SVMs still perform quite well. On the academic side, tons of work on faster approximate kernel learning is now becoming hyper-relevant to improving the scalability of transformers, as you can interpret many of the attention mechanisms in use as a kind of kernel.
Why don't researchers use the kernel method in neural networks?
TL;DR: Because what a neural network does is essentially "learning a kernel".

A kernel takes your input features, transforms them into some other space, and then essentially works with those transformed features (when you apply the kernel trick, you do all this implicitly, but that doesn't change this fact), and then does a linear classification in that new space. A neural network takes your input features, transforms them into some other space (the activations of the hidden layers), and then does a linear classification in that new space.

The main difference is that a neural network learns this representation on its own. You can influence the network a bit, e.g. by applying specific regularizers, switching activation functions, or changing training-algorithm specifics. But in the grand scheme of things, the network will learn some obscure feature space that you won't understand and that you will have a hard time interpreting. A kernel, on the other hand, was designed by someone who had a very specific representation in mind. The advantage is that you can put a shit-ton of domain knowledge into your kernel (e.g. when using mismatch kernels for DNA sequences). On the other hand, the learning algorithm (e.g. the SVM) has no way of influencing this representation during learning. Maybe there are a few kinks in the kernel representation that make sense to humans but are useless to a classifier. A neural network can adapt its representation; an SVM can't.

Now, with that said, there are a few issues with putting a well-designed kernel on top of a neural network. First, the kernel has to work with the activations of the NN, which are usually an uninterpretable representation. Whereas a domain expert could look at input features, know what each feature stands for, and apply domain knowledge to them, they will have a hard time interpreting the activations of particular units. So one of the biggest advantages of kernels (designing problem-specific kernels) is lost. Second, you often can't backpropagate through a kernel, so you can't just include, say, a string kernel or a graph-similarity kernel as part of your neural network; it's very hard to backpropagate gradients through the expressions involved in such a kernel.

That leaves you with non-specific kernels such as the RBF or polynomial kernel. Those don't care about the underlying representation and don't include any domain knowledge. Still, the RBF kernel is just something that we know "works well" for all classification problems, since it simply exploits geometric features of its input space (points that are close by are likely to have the same label). So what you could do is take the hidden representation and apply an SVM with a non-specific kernel to it, as e.g. done in the paper linked by u/aydind. There has indeed been very little work in that direction (AFAIK).
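A rough sketch of that last idea - fitting an SVM with a non-specific (RBF) kernel on a network's hidden representation - assuming scikit-learn; the architecture, dataset, and hyperparameters are illustrative, and this is not the setup of the linked paper:

```python
# Sketch: train an MLP, recompute its hidden-layer activations by hand,
# then fit an RBF-kernel SVM on that learned representation.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)

# Recompute the hidden activations from the learned weights
# (MLPClassifier uses ReLU activations by default).
hidden = np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

svm_on_hidden = SVC(kernel="rbf").fit(hidden, y)
print(svm_on_hidden.score(hidden, y))
```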
Where do Support Vector Machines perform badly?
Well, with each kernel the SVM becomes a different algorithm, so this is not a very well-defined question. One drawback of SVMs is that you have to choose the kernel: you have to provide the true structure of the data as an input, while other algorithms, like neural networks or random forests, try to find the structure automatically. You also have to tune the kernel parameters and the C parameter, which can be time-consuming and can hurt performance if you do it wrong. Here is a link on the subject. So you can either find some non-linear dataset and try to fit it without a kernel to show that it doesn't work, or set some random values for the C parameter to show that they decrease the accuracy. Or you can find a very simple and big dataset and show that the SVM takes very long to train while a simple logistic regression gets the same results in less time.
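A minimal sketch of the first suggestion - a non-linear dataset where an SVM without a non-linear kernel fails - assuming scikit-learn; the dataset choice is illustrative:

```python
# Sketch: concentric circles are not linearly separable, so a linear
# kernel scores near chance while an RBF kernel separates them easily.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)

print("linear:", SVC(kernel="linear").fit(X, y).score(X, y))  # near chance
print("rbf:   ", SVC(kernel="rbf").fit(X, y).score(X, y))     # near 1.0
```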
Can someone explain Kernel Trick intuitively?
Introduction

Suppose you have an N-dimensional dataset and you would like to apply some linear machine learning technique to it, e.g. find a linear separating hyperplane - but it's not working, because the shape of the dataset is too non-linear. One way to go is to try finding a non-linear separator, e.g. stop looking for hyperplanes and start looking for higher-order surfaces. We're not interested in this option because linear is simpler. Another way to go is to transform your input variables so that the shape of the dataset becomes more linear. E.g. if there's a clear parabola in the shape, you might want to square one of the variables to linearise it.

Transforming the dataset

Note that you don't have to preserve the dimensionality of the original dataset when doing this transformation! E.g. all you need to do to look for a polynomial separator of order 3 in a 2-dimensional dataset is to map each point (x, y) to the 6-dimensional vector (x, x², x³, y, y², y³). The amount of information is the same, but now even a linear classifier can make use of polynomial trends; and you can easily train a linear classifier on the transformed dataset, which gives you a non-linear classifier on the original dataset.

The kernel trick

Let's call your transformation function F. Most linear machine learning techniques can be implemented using only the dot product operation: if you can compute a . b for any two points of your dataset, you often don't need anything else - you don't even need to know the dimensionality of the points. So what if you knew a function K(a, b) such that K(a, b) = F(a) . F(b)? Then, during learning, every time you needed to compute F(a) . F(b), you'd just compute K(a, b) - you wouldn't need to actually apply the function F; the transformation would exist only "implicitly". K is called the kernel.

Magic of the kernel trick: the transformed dataset can be implicit

This opens up quite a number of possibilities. For example, you no longer need the transformed dataset to be small-dimensional (to fit in memory), or even finite-dimensional, or even to exist at all - you only need the function K to obey a number of dot-product-like properties. With a complicated enough K, you may get arbitrarily precise separation - the only danger is overfitting.

Why does it work

Let us now try to understand how the shape of the K function corresponds to the kinds of surfaces it can expose in your dataset.

SVM classifiers and their simple linear case

Recall that an SVM classifier for an M-point dataset looks like: class(x) = sign(sum(i=1..M)(w_i * (x_i . x)) + b) for some set of support weights w_i and a constant b. It just so happens that, in an N-dimensional space, there can be no more than N linearly independent vectors in the dataset, so setting w_i to non-zero for more than N values of i is simply redundant. So you can replace this formula with sign(w . x + b) for a single N-dimensional vector w - i.e. for linear classification, you don't need to interpret your data points as "support vectors" and give them "weights"; you can explicitly specify the hyperplane by a single vector w and an offset b.

Using a kernel in an SVM classifier

But for kernel methods, you need to use the original form, with the dot product replaced by the kernel: class(x) = sign(sum(i=1..M)(w_i * K(x_i, x)) + b).
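The identity at the heart of the trick, K(a, b) = F(a) . F(b), is easy to check numerically. A minimal sketch assuming NumPy, using the degree-2 polynomial kernel K(a, b) = (a . b)² and its standard explicit feature map (both chosen for illustration; they are not from the post above):

```python
# Sketch: verify K(a, b) = F(a) . F(b) for the degree-2 polynomial kernel
# K(a, b) = (a . b)^2 with explicit map F(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
import numpy as np

def F(x):
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def K(a, b):
    return np.dot(a, b) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# Both print 1.0 (up to floating point): the kernel computes the dot
# product in the transformed space without ever applying F.
print(K(a, b), np.dot(F(a), F(b)))
```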
Example: Gaussian kernel SVM

For example, suppose K is a "radial basis function" kernel ( http://en.wikipedia.org/wiki/RBF_kernel ): K(w, x) = exp(-||w - x||² / sigma²). Then this basically means "class(x) = weighted sum of exponentially decreasing distances from x to points in the dataset". Note how dramatically this differs from the linear case, even though the method is the same.

It is really enlightening to see how the surfaces "K(w, x) = const" look for a fixed w, or "K(w1, x) + K(w2, x) = const". Note how, for a linear kernel, the shape of K(w, x) = const is no different from K(w1, x) + K(w2, x) = const - they're both planes - but for a non-linear kernel they're different. This is where it "clicked" for me.

At this point, I think you're ready to consume examples of the kernel trick (of possible F or K functions) found on the internet - my favourite reference on that is http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html .
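The expanded form class(x) = sign(sum(i=1..M)(w_i * K(x_i, x)) + b) can also be reproduced from a fitted model. A sketch assuming scikit-learn's SVC, whose dual_coef_, support_vectors_, and intercept_ attributes play the roles of the w_i, the x_i, and b; the dataset is illustrative:

```python
# Sketch: rebuild SVC's RBF decision function from its support vectors,
# dual coefficients (the w_i above), and intercept (the b above).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

# sum_i w_i * K(x_i, x) + b, where the sum runs over support vectors only.
K = rbf_kernel(X, clf.support_vectors_, gamma=1.0)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_

print(np.allclose(manual, clf.decision_function(X)))  # True
```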
Videos
03:35
Kernel Support Vector Machine - What is Kernel SVM? - YouTube
03:18
The Kernel Trick in Support Vector Machine (SVM) - YouTube
12:02
SVM Kernels : Data Science Concepts - YouTube
20:41
SVM Kernels In-depth Intuition- Polynomial Kernels Part 3 | Machine ...
15:09
Support Vector Machines (3): Kernels - YouTube
Medium
medium.com › @abhishekjainindore24 › svm-kernels-and-its-type-dfc3d5f2dcd8
SVM kernels and its type. Support Vector Machines (SVMs) are a… | by Abhishek Jain | Medium
September 11, 2024 - The primary goal of an SVM is to find a hyperplane that best separates different classes of data points. However, in many real-world scenarios, the data is not linearly separable in the original feature space. Kernels help by implicitly mapping the original feature space into a higher-dimensional space where the data might be more easily separable.
Wikipedia
en.wikipedia.org › wiki › Kernel_method
Kernel method - Wikipedia
November 24, 2025 - In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types ...
scikit-learn
scikit-learn.org › stable › auto_examples › svm › plot_svm_kernels.html
Plot classification boundaries with different SVM Kernels — scikit-learn 1.8.0 documentation
The radial basis function (RBF) kernel, also known as the Gaussian kernel, is the default kernel for Support Vector Machines in scikit-learn. It measures similarity between two data points in infinite dimensions and then approaches classification ...
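A one-line check of the default mentioned in this snippet, assuming scikit-learn:

```python
# Sketch: SVC uses the RBF (Gaussian) kernel unless told otherwise.
from sklearn.svm import SVC

print(SVC().kernel)  # 'rbf'
```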
DataFlair
data-flair.training › blogs › svm-kernel-functions
Kernel Functions-Introduction to SVM Kernel & Examples - DataFlair
July 28, 2025 - Also, kernel functions help SVMs to function optimally in high-dimensional space while at the same time avoiding the computation of high-dimensional data space coordinates. Because of this ability to map inputs into higher-dimensional feature spaces, SVMs can be used effectively in various machine learning techniques such as classification, regression and outlier detection.
Kaggle
kaggle.com › code › residentmario › kernels-and-support-vector-machine-regularization
Kernels and support vector machine regularization
freeCodeCamp
freecodecamp.org › news › svm-kernels-how-to-tackle-nonlinear-data-in-machine-learning
SVM Kernels Explained: How to Tackle Nonlinear Data in Machine Learning
January 7, 2025 - When building a classification algorithm, real-world data often has non-linear relationships, and many machine learning classification algorithms struggle with non-linear data. But in this article, we'll be looking at how Support Vector Machine (SVM) kernel functions can help to solve ...
Columbia University
columbia.edu › ~mh2078 › MachineLearningORFE › SVMs_MasterSlides.pdf pdf
Machine Learning for OR & FE Support Vector Machines (and the Kernel Trick)
Support vector machines are non-probabilistic binary linear classifiers. The use of basis functions and the kernel trick mitigates the constraint of the ... SVMs are also used for multi-class classification.
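Although the slides describe SVMs as binary classifiers, the multi-class use they mention is available out of the box: scikit-learn's SVC builds one-vs-one binary classifiers internally. A minimal sketch on the iris dataset (the dataset choice is mine, not the slides'):

```python
# Sketch: a binary-by-nature SVM applied to a 3-class problem.
# scikit-learn's SVC trains one-vs-one binary classifiers under the hood.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.classes_)     # three classes
print(clf.score(X, y))
```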
Wikipedia
en.wikipedia.org › wiki › Support_vector_machine
Support vector machine - Wikipedia
1 week ago - Developed at AT&T Bell Laboratories, SVMs are one of the most studied models, being based on statistical learning frameworks of VC theory proposed by Vapnik (1982, 1995) and Chervonenkis (1974). In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the kernel trick, representing the data only through a set of pairwise similarity comparisons between the original data points using a kernel function, which transforms them into coordinates in a higher-dimensional feature space.
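"Representing the data only through a set of pairwise similarity comparisons" corresponds, in scikit-learn, to fitting on a precomputed Gram matrix. A minimal sketch with an illustrative dataset:

```python
# Sketch: fit an SVM given only pairwise similarities (a Gram matrix),
# never the original coordinates.
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, random_state=0)

gram = rbf_kernel(X, X)                       # pairwise similarities only
clf = SVC(kernel="precomputed").fit(gram, y)

# Prediction also works purely from similarities to the training points.
print(clf.score(rbf_kernel(X, X), y))
```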
GeeksforGeeks
geeksforgeeks.org › machine learning › support-vector-machine-algorithm
Support Vector Machine (SVM) Algorithm - GeeksforGeeks
To perform this classification, ... and target labels). SVC(kernel="linear", C=1): Creates a Support Vector Classifier with a linear kernel and regularization parameter C=1....
Published 3 weeks ago
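A runnable version of the call described in the snippet above, with an illustrative toy dataset standing in for the tutorial's data:

```python
# Sketch: the SVC(kernel="linear", C=1) call from the snippet.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)
clf = SVC(kernel="linear", C=1)  # linear kernel, regularization C=1
clf.fit(X, y)
print(clf.score(X, y))
```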
Louphix Digital Hub
aicodesnippet.com › machine-learning › support-vector-machines › understanding-kernels-in-support-vector-machines-svms.html
SVM Kernels Explained: Theory, Usage, and Code Examples
The svm.SVC class provides a convenient way to specify the desired kernel. Each kernel has its own set of hyperparameters that can be tuned to optimize the model's performance for a specific dataset. The examples highlight the importance of splitting the data into training and testing sets ...
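A minimal sketch of what the page describes - choosing kernels via svm.SVC, each with its own hyperparameters, and evaluating on a held-out split. The dataset and settings are illustrative assumptions:

```python
# Sketch: each kernel with its own tunable hyperparameters, scored on a
# held-out test split.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn import svm

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

models = {
    "linear": svm.SVC(kernel="linear", C=1.0),
    "poly":   svm.SVC(kernel="poly", degree=3, coef0=1.0),
    "rbf":    svm.SVC(kernel="rbf", gamma="scale"),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```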
Reddit
reddit.com › r/machinelearning › [d] is svm/are kernel methods still relevant?
r/MachineLearning on Reddit: [D] Is SVM/Are kernel methods still relevant?
July 28, 2021 -
Hi,
I did some research on kernel methods back in 2010-2011, just before deep learning gained momentum. I am wondering whether SVM, especially the non-linear variant (:-|), is still relevant in academia or industry? I have looked in DBLP and could barely find recent papers other than those just applying SVM to some dataset. Any examples (e.g. cases of SVM beating XGBoost) are more than welcome.
Top answer 1 of 8
14
In my personal, non-research opinion, I strongly believe that the simplest method that is in line with the data-generating process is to be preferred. If your model (of that process!) supports the hypothesis that a certain symmetry exists in your data, and you find a kernel that fulfills that symmetry, I would give it a try, since linearizing a problem makes life and model interpretation easier. In practice, because of computational considerations, I would rather use it in data preprocessing, such as in kernel PCA, before applying a random forest ;-)
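A minimal sketch of that kernel-PCA-then-random-forest suggestion, assuming scikit-learn; the dataset, kernel, and gamma are illustrative choices:

```python
# Sketch: kernel PCA as preprocessing, then a random forest on the
# linearized features.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)

pipe = make_pipeline(
    KernelPCA(n_components=2, kernel="rbf", gamma=2.0),
    RandomForestClassifier(random_state=0),
)
pipe.fit(X, y)
print(pipe.score(X, y))
```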
2 of 8
10
SVMs sit in an odd position. They require that you have some prior knowledge about the distribution of your data in relation to the target variable. The problem is that if your data is simple enough that you know this information, then you can probably use a simpler modeling technique. If your data is too complex to know this, then a more general and powerful technique like NNs will usually perform better.