Introduction
In this post, I’ll be implementing a linear Support Vector Machine (SVM) from scratch in Python. This is the sixth post in the “Machine Learning from Scratch” series.
SVMs are powerful classifiers that find the optimal hyperplane to separate classes by maximizing the margin between them. They’re particularly effective in high-dimensional spaces.
Support Vector Machine
A Support Vector Machine aims to find a hyperplane that best separates two classes while maximizing the distance (margin) to the nearest data points from each class. These nearest points are called support vectors.
For a linear SVM, we want to find a weight vector w and a bias b such that every training point (xᵢ, yᵢ), with labels yᵢ ∈ {−1, 1}, satisfies:

yᵢ(w·xᵢ − b) ≥ 1

Since the margin width is 2/‖w‖, maximizing the margin is equivalent to minimizing ‖w‖² subject to these constraints. Because real data is rarely perfectly separable, we instead minimize a regularized hinge loss, which we can do with gradient descent.
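Concretely, the objective minimized by the code below is a standard soft-margin formulation, where λ controls the regularization strength and n is the number of samples:

J(w, b) = λ‖w‖² + (1/n) Σᵢ max(0, 1 − yᵢ(w·xᵢ − b))

Taking the (sub)gradient one sample at a time gives the two update cases you'll see in fit:

If yᵢ(w·xᵢ − b) ≥ 1 (the point is correctly classified, outside the margin): ∂J/∂w = 2λw and ∂J/∂b = 0
Otherwise (the margin is violated): ∂J/∂w = 2λw − yᵢxᵢ and ∂J/∂b = yᵢ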
Implementation
I’m using numpy for numerical computations. For testing, I’ll use train_test_split and datasets from scikit-learn.
The SVM class has the following methods:
- __init__: constructor that sets the hyperparameters: learning rate, lambda parameter, and number of iterations.
- fit: trains the SVM using gradient descent.
- predict: makes predictions using the learned hyperplane.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets


class SVM:
    def __init__(self, lr=0.001, lambda_param=0.01, num_iter=1000):
        self.lr = lr                      # learning rate for gradient descent
        self.lambda_param = lambda_param  # regularization strength (lambda)
        self.num_iter = num_iter          # number of passes over the training set
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        num_samples, num_features = X.shape
        # Ensure the labels are in {-1, 1}, as the hinge loss requires
        y_ = np.where(y <= 0, -1, 1)

        self.weights = np.zeros(num_features)
        self.bias = 0

        for _ in range(self.num_iter):
            for idx, x_i in enumerate(X):
                # Is this sample on the correct side of the margin?
                condition = y_[idx] * (np.dot(x_i, self.weights) - self.bias) >= 1
                if condition:
                    # Correctly classified: only the regularization term contributes
                    self.weights -= self.lr * (2 * self.lambda_param * self.weights)
                else:
                    # Margin violated: include the hinge-loss gradient as well
                    self.weights -= self.lr * (2 * self.lambda_param * self.weights - np.dot(x_i, y_[idx]))
                    self.bias -= self.lr * y_[idx]

    def predict(self, X):
        # Classify by the sign of the signed distance to the hyperplane
        linear_output = np.dot(X, self.weights) - self.bias
        return np.sign(linear_output)
Now let’s test the model on a binary classification dataset.
def accuracy(y_test, predictions):
    # Fraction of test labels predicted exactly
    return np.sum(y_test == predictions) / len(y_test)


if __name__ == '__main__':
    # Two well-separated Gaussian blobs give a linearly separable problem
    X, y = datasets.make_blobs(
        n_samples=500, n_features=2, centers=2,
        cluster_std=1.5, random_state=42
    )
    # Relabel the classes from {0, 1} to {-1, 1}
    y = np.where(y == 0, -1, 1)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = SVM(lr=0.001, lambda_param=0.01, num_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    acc = accuracy(y_test, predictions)
    print(f"Accuracy: {acc}")
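To sanity-check the learned hyperplane visually, one option is a quick plot; this is a minimal sketch, assuming matplotlib is installed, and plot_decision_boundary is a hypothetical helper that reuses model, X, and y from the script above. The decision boundary is the line where w·x − b = 0, and the margins sit where w·x − b = ±1:

import matplotlib.pyplot as plt  # assumed available; not used elsewhere in this post

def plot_decision_boundary(model, X, y):
    # Scatter the two classes
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', alpha=0.5)

    # For a 2D linear model, w0*x0 + w1*x1 - b = offset solves to
    # x1 = (offset + b - w0*x0) / w1
    x0 = np.array([X[:, 0].min(), X[:, 0].max()])
    w, b = model.weights, model.bias
    for offset, style in [(0, '-'), (-1, '--'), (1, '--')]:
        x1 = (offset + b - w[0] * x0) / w[1]
        plt.plot(x0, x1, 'k' + style)

    plt.show()

plot_decision_boundary(model, X, y)

The solid line is the decision boundary and the dashed lines are the margins; on well-separated blobs, most points should fall outside the dashed band.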
The linear SVM achieves strong accuracy on this linearly separable data. For data that isn’t linearly separable, the kernel trick can be used to implicitly map it into a higher-dimensional space where a separating hyperplane exists, though that’s beyond the scope of this basic implementation.
That’s all for this post. Thanks for reading!