Every data scientist should have SVM in their toolbox. Learn how to master this versatile model with a hands-on introduction.
Among the many Machine Learning models available, there is one whose versatility makes it a must-have in every data scientist's toolbox: the Support Vector Machine (SVM).
SVM is a powerful and versatile algorithm which, at its core, delineates optimal hyperplanes in a high-dimensional space, effectively separating the different classes of a dataset. But it doesn't stop there! Its effectiveness is not limited to classification: SVM is also well suited to regression and outlier detection tasks.
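To make this concrete, here is a minimal sketch of the three corresponding estimators that Scikit-Learn exposes. The toy data is purely illustrative and assumed for this example:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, SVR, OneClassSVM

# Purely illustrative toy data: two 2-D blobs.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

SVC().fit(X, y)       # classification
SVR().fit(X, y)       # regression (y here is just 0/1, but any continuous target works)
OneClassSVM().fit(X)  # unsupervised outlier / novelty detection
```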
One feature makes the SVM approach particularly effective. Unlike KNN, which processes the entire dataset, the SVM decision function depends only on the subset of data points located near the decision boundary. These points are called support vectors, and the mathematics behind this idea will be explained in simple terms in the upcoming sections.
By focusing on these points, the Support Vector Machine stays computationally frugal, making it ideal for tasks involving medium-sized or even moderately large datasets.
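As a quick preview, here is a minimal sketch (again on assumed toy data) showing that a fitted classifier exposes exactly this small subset through its `support_vectors_` attribute:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed toy dataset: 200 points in two well-separated blobs.
X, y = make_blobs(n_samples=200, centers=2, random_state=42)

clf = SVC(kernel="linear").fit(X, y)

# Only a handful of the 200 training points end up as support vectors;
# the decision boundary is determined by these points alone.
print(clf.support_vectors_.shape)  # (k, 2), with k much smaller than 200
print(clf.n_support_)              # number of support vectors per class
```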
As I do in all my articles, I won't just explain the theoretical concepts: I will also provide coding examples to help you familiarize yourself with the Scikit-Learn (sklearn) Python library.
At its core, SVM classification has the elegant simplicity of linear algebra. Imagine a dataset in two-dimensional space, with two distinct classes to be separated. Linear SVM tries to separate the two classes with the best possible straight line.
What does "best" mean in this context? SVM searches for the optimal separating line: one that not only separates the classes, but does so with the maximum possible distance from the closest training instances of each class. That distance is called the margin, and the data points that lie on its edge are the support vectors, as the sketch below illustrates.
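Here is a minimal sketch of this idea, once more on assumed toy data. With a linear kernel, the fitted weight vector w and intercept b define the line w·x + b = 0, and the margin width works out to 2/||w||; the very large C used here to approximate a hard margin is my choice for illustration, not a requirement:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed toy dataset: two linearly separable blobs.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A very large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]

# The margin edges are the lines w·x + b = +1 and w·x + b = -1,
# so the distance between them is 2 / ||w||.
print(f"margin width: {2 / np.linalg.norm(w):.3f}")

# The support vectors lie exactly on those margin edges.
print(clf.support_vectors_)
```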