In this lesson, you'll learn how to create SVMs with non-linear decision boundaries using kernels!
You will be able to:
- Define the kernel trick and explain why it is important in an SVM model
- Describe a radial basis function kernel
- Describe a sigmoid kernel
- Describe a polynomial kernel
- Determine when it is best to use specific kernels within SVM
In the previous lab, you looked at a plot where a linear boundary was clearly not sufficient to separate the two classes cleanly. Another example where a linear boundary would not work well is shown below. How would you draw a max margin classifier here? The intuitive solution is to draw an arc around the circles, separating them from the surrounding diamonds. To generate non-linear boundaries such as this, you use what is known as a kernel.
The idea behind kernel methods is to create (nonlinear) combinations of the original features and project them onto a higher-dimensional space. For example, take a look at how this dataset could be transformed with an appropriate kernel from a two-dimensional dataset onto a new three-dimensional feature space.
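To make this concrete, here is a minimal sketch of that kind of transformation. The dataset and the quadratic feature map $z = x_1^2 + x_2^2$ are illustrative choices, not prescribed by this lesson:

```python
import numpy as np
from sklearn.datasets import make_circles

# Generate a 2D dataset where one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

# Map onto a third feature: z = x1^2 + x2^2 (squared distance from the origin)
z = X[:, 0] ** 2 + X[:, 1] ** 2
X_3d = np.column_stack([X, z])

# In the new 3D space a flat plane (a linear boundary) separates the classes:
# the inner circle has small z, the outer ring has large z
print("mean z, inner class:", z[y == 1].mean())
print("mean z, outer class:", z[y == 0].mean())
```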
There are several kernels, and an overview can be found in this lesson, as well as in the scikit-learn documentation here. The idea is that kernels are inner products in a transformed space.
The linear kernel is the one you've seen so far: it simply creates linear decision boundaries. The linear kernel is represented by the inner product of two observations: $K(x, x') = \langle x, x' \rangle$.
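For example, fitting a linear-kernel SVC looks like this (a minimal sketch on a toy dataset):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two linearly separable blobs
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# kernel='linear' computes K(x, x') = <x, x'>, giving a straight-line boundary
clf = SVC(kernel='linear')
clf.fit(X, y)
print(clf.support_vectors_.shape)  # the support vectors that define the margin
```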
There are two parameters to tune when training an SVM with the Radial Basis Function (RBF) kernel:
- The parameter $C$ is common to all SVM kernels. By tuning the $C$ parameter when using kernels, you can provide a trade-off between misclassification of the training set and simplicity of the decision function. A high $C$ will classify as many samples correctly as possible (and might potentially lead to overfitting).
- $\gamma$ defines how much influence a single training example has. The larger $\gamma$ is, the closer other examples must be to be affected.
The RBF kernel is specified as:

$$K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$$
Gamma has a strong effect on the results: a $\gamma$ that is too large will lead to overfitting, while a $\gamma$ that is too small will produce a decision boundary that is too smooth to capture the structure of the data. In scikit-learn, you can specify a value for gamma using the `gamma` parameter. The default gamma value is `'auto'`: if no other gamma is specified, gamma is set to $1/\text{n\_features}$.
The polynomial kernel is specified as:

$$K(x, x') = (\gamma \langle x, x' \rangle + r)^d$$
- $d$ can be specified by the parameter `degree`. The default degree is 3.
- $r$ can be specified by the parameter `coef0`. The default is 0.
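A minimal sketch of a polynomial-kernel SVC (the dataset and `coef0` value are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# degree sets d and coef0 sets r in (gamma * <x, x'> + r)^d
clf = SVC(kernel='poly', degree=3, coef0=1.0)
clf.fit(X, y)
print(clf.score(X, y))
```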
The sigmoid kernel is specified as:

$$K(x, x') = \tanh(\gamma \langle x, x' \rangle + r)$$
This kernel is similar to the sigmoid function in logistic regression.
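Fitting a sigmoid-kernel SVC follows the same pattern (a minimal sketch; the dataset is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# kernel='sigmoid' uses tanh(gamma * <x, x'> + r); r is set via coef0
clf = SVC(kernel='sigmoid', coef0=0.0)
clf.fit(X, y)
print(clf.score(X, y))
```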
NuSVC is similar to SVC, but adds an additional parameter, $\nu$, to control the number of support vectors and margin errors. $\nu$ is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors, and must lie in the interval $(0, 1]$.
Just like SVC, NuSVC implements the "one-against-one" approach when there are more than 2 classes. This means that when there are $n$ classes, $\frac{n(n-1)}{2}$ classifiers are created, each one trained to distinguish one pair of classes.
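Here is a minimal sketch of NuSVC on a three-class dataset (the `nu` value is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.svm import NuSVC

X, y = load_iris(return_X_y=True)  # 3 classes, so one-against-one is used

# nu bounds the fraction of margin errors (upper) and support vectors (lower)
clf = NuSVC(nu=0.1)
clf.fit(X, y)
print(clf.score(X, y))
```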
LinearSVC is similar to SVC, but instead of the "one-against-one" method, a "one-vs-rest" method is used. So in this case, when there are $n$ classes, just $n$ classifiers are created, each one distinguishing one class from all the others.
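And the corresponding minimal sketch for LinearSVC (`max_iter` is raised here only to help convergence on this toy example):

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# one-vs-rest: with 3 classes, 3 linear models are trained,
# each separating one class from the other two
clf = LinearSVC(max_iter=10000)
clf.fit(X, y)
print(clf.coef_.shape)  # (3, 4): one weight vector per class
```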
You can make predictions using support vector machines. SVC can give a probability score per class; however, this is not done by default. You'll need to set the `probability` argument equal to `True`. Scikit-learn internally performs cross-validation to compute the probabilities, so you can expect that setting `probability` to `True` makes the calculations longer. For large datasets, computation can take considerable time to execute.
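A minimal sketch of requesting probability estimates (the dataset is illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# probability=True triggers internal cross-validation to calibrate
# probabilities, which makes fitting noticeably slower on large datasets
clf = SVC(kernel='rbf', probability=True, random_state=42)
clf.fit(X, y)
print(clf.predict_proba(X[:3]))  # one probability per class, per sample
```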
Great! You now have a basic understanding of how to use kernel functions in Support Vector Machines. You'll do just that in the upcoming lab!