Machine Learning from Scratch: Support Vector Machine

Table of Contents

  • Introduction
  • Support Vector Machine
  • Implementation

Introduction

In this post, I’ll be implementing a linear Support Vector Machine (SVM) from scratch in Python. This is the sixth post in the “Machine Learning from Scratch” series.

SVMs are powerful classifiers that find the optimal hyperplane to separate classes by maximizing the margin between them. They’re particularly effective in high-dimensional spaces.

Support Vector Machine

A Support Vector Machine aims to find a hyperplane that best separates two classes while maximizing the distance (margin) to the nearest data points from each class. These nearest points are called support vectors.

For a linear SVM, we want to find a weight vector w and a bias b such that every training point (xᵢ, yᵢ), with label yᵢ ∈ {−1, 1}, satisfies:

yᵢ(w · xᵢ − b) ≥ 1

The distance between the two margin hyperplanes is 2/||w||, so maximizing the margin amounts to minimizing ||w||² subject to these constraints. In practice we minimize a regularized hinge loss instead, which we can do with simple (sub)gradient descent.
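Written out, the per-sample objective and the update rules that the fit method below performs look like this (I'm writing λ for lambda_param and η for the learning rate lr; these symbols are just labels for the code's hyperparameters):

J_i(w, b) = \lambda \|w\|^2 + \max(0,\; 1 - y_i (w \cdot x_i - b))

\text{if } y_i (w \cdot x_i - b) \ge 1: \quad w \leftarrow w - \eta \cdot 2 \lambda w

\text{otherwise:} \quad w \leftarrow w - \eta (2 \lambda w - y_i x_i), \qquad b \leftarrow b - \eta y_i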

Implementation

I’m using numpy for numerical computations. For testing, I’ll use train_test_split and datasets from scikit-learn.

The SVM class has the following methods:

  • __init__: Constructor to set hyperparameters like learning rate, lambda parameter, and iterations.
  • fit: Method to train the SVM using gradient descent.
  • predict: Method to make predictions using the learned hyperplane.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets

class SVM:
    def __init__(self, lr=0.001, lambda_param=0.01, num_iter=1000):
        self.lr = lr                      # learning rate for gradient descent
        self.lambda_param = lambda_param  # regularization strength
        self.num_iter = num_iter          # number of passes over the training data
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        num_samples, num_features = X.shape
        # Make sure the labels are -1/1, as the hinge loss expects
        y_ = np.where(y <= 0, -1, 1)

        self.weights = np.zeros(num_features)
        self.bias = 0

        for _ in range(self.num_iter):
            for idx, x_i in enumerate(X):
                # Is this sample on the correct side of the margin?
                condition = y_[idx] * (np.dot(x_i, self.weights) - self.bias) >= 1
                if condition:
                    # Only the regularization term contributes to the gradient
                    self.weights -= self.lr * (2 * self.lambda_param * self.weights)
                else:
                    # The hinge loss is active: update both the weights and the bias
                    self.weights -= self.lr * (2 * self.lambda_param * self.weights - y_[idx] * x_i)
                    self.bias -= self.lr * y_[idx]

    def predict(self, X):
        # Classify by the sign of the signed distance to the hyperplane
        linear_output = np.dot(X, self.weights) - self.bias
        return np.sign(linear_output)

Now let’s test the model on a binary classification dataset.

def accuracy(y_test, predictions):
    # Fraction of test points whose predicted label matches the true label
    return np.sum(y_test == predictions) / len(y_test)


if __name__ == '__main__':
    # Two well-separated Gaussian blobs make a linearly separable toy problem
    X, y = datasets.make_blobs(
        n_samples=500, n_features=2, centers=2,
        cluster_std=1.5, random_state=42
    )
    # Relabel the classes as -1/1 to match the SVM's convention
    y = np.where(y == 0, -1, 1)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = SVM(lr=0.001, lambda_param=0.01, num_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    acc = accuracy(y_test, predictions)
    print(f"Accuracy: {acc}")

The linear SVM achieves strong accuracy on linearly separable data. For data that isn't linearly separable, the kernel trick can be used to implicitly map the points into a higher-dimensional space, though that's beyond the scope of this basic implementation.
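
If you're curious what the kernel trick looks like in practice without implementing it, scikit-learn's SVC supports non-linear kernels out of the box. Here's a minimal sketch on a toy dataset of concentric circles; the dataset and hyperparameters are just illustrative choices, not part of the implementation above.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes
X, y = datasets.make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An RBF kernel implicitly maps the points into a higher-dimensional space
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)
print(f"Kernel SVM accuracy: {clf.score(X_test, y_test)}")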

That’s all for this post. Thanks for reading!