 # Code to train a Support Vector Machine?

Hello,
In the MITx 6.86x “Machine Learning with Python: From Linear Models to Deep Learning” course we were presented with a recitation that uses scikit-learn to train an SVM model (code follows).

In particular, the key functions are `model = linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=alpha[i])`, which builds the model, and `score = ms.cross_val_score(model, X, y, cv=5)`, which automatically partitions the training set into 5 folds, each time training the SVM on 4 of them (using stochastic gradient descent, or so they told us) and using the remaining one, different each time, as a validation set, returning a vector of scores.
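
To make sure I understand the mechanics before porting, here is my (untested) plain-Julia sketch of what I believe `cross_val_score` does under the hood; `fitscore` is a hypothetical placeholder for “train on the 4 training folds, return the accuracy on the validation fold”:

```julia
using Random

# Hand-rolled k-fold cross-validation (sketch, untested).
# fitscore(Xtrain, ytrain, Xval, yval) is a hypothetical callback that
# trains a model and returns its validation accuracy.
function cross_val_score(fitscore, X, y; k = 5)
    n = size(X, 1)
    idx = shuffle(1:n)                  # random permutation of the row indices
    folds = [idx[f:k:n] for f in 1:k]   # k roughly equal-sized folds
    scores = Float64[]
    for fold in folds
        train = setdiff(idx, fold)      # the remaining k-1 folds
        push!(scores, fitscore(X[train, :], y[train], X[fold, :], y[fold]))
    end
    return scores                       # one score per fold, like sklearn's version
end
```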

I would like to port it to Julia, but before coding inefficient gradient descent and training methods myself, I wonder whether efficient versions of these algorithms already exist.
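
For example, I stumbled on LIBSVM.jl; if I read its README correctly, something like the sketch below should train a linear SVM (untested; note that LIBSVM solves the SVM dual with SMO rather than running SGD, so it is not a literal port of `SGDClassifier`, and the dummy `X`/`y` are only there so the snippet runs):

```julia
using LIBSVM

# Dummy stand-in data just so the snippet runs: 200 points, 30 features
X = randn(200, 30)
y = rand([0, 1], 200)

# LIBSVM.jl expects a (nfeatures × ninstances) matrix, i.e. observations
# in columns, hence the transpose. cost is LIBSVM's C parameter (roughly
# the inverse of sklearn's alpha regularization strength).
model = svmtrain(Matrix(X'), y; kernel = Kernel.Linear, cost = 1.0)

# svmpredict returns the predicted labels and the decision values
ŷ, _ = svmpredict(model, Matrix(X'))
println("training accuracy = ", sum(ŷ .== y) / length(y))
```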

Also, how could I access the dataset from Julia? Through PyCall?
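
Something like this (untested) is what I have in mind, assuming scikit-learn is installed in the Python environment that PyCall uses and a recent PyCall with the `obj.attr` syntax:

```julia
using PyCall

skdatasets = pyimport("sklearn.datasets")  # needs scikit-learn in PyCall's Python
cancer = skdatasets.load_breast_cancer()

X = cancer.data    # NumPy array, auto-converted to a 569×30 Julia matrix
y = cancer.target  # labels: 0 = malignant, 1 = benign
```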

Please consider that I know nothing about Machine Learning (this is just the first unit…).

Here is the Python code from the recitation:

```python
#!/usr/bin/env python
# coding: utf-8

### Scikit-Learn: https://scikit-learn.org/stable/

# Imports
import numpy as np
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for plotting
from sklearn import datasets
from sklearn import preprocessing
from sklearn import linear_model
from sklearn import model_selection as ms

# Load the Wisconsin breast cancer dataset bundled with scikit-learn
cancer_data = datasets.load_breast_cancer()
y = cancer_data.target # Training labels ('malignant' = 0, 'benign' = 1)
X = cancer_data.data # 30 attributes; https://scikit-learn.org/stable/datasets/index.html#breast-cancer-dataset
X = preprocessing.scale(X) # scale each attribute to zero mean and unit variance

# Plot the first 2 attributes of the training points
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)
plt.xlabel('Tumor Radius')
plt.ylabel('Tumor Texture')
plt.grid(True)
plt.show()

alpha = np.arange(1e-15,1,0.005) # Range of hyperparameter values 1E-15 to 1 by 0.005
val_scores = np.zeros((len(alpha),1)) # Initialize validation score for each alpha value

for i in range(len(alpha)): # for each alpha value
    # Set up SVM with hinge loss and l2 norm regularization
    model = linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=alpha[i])
    # Calculate cross validation scores for 5-fold cross-validation
    score = ms.cross_val_score(model, X, y, cv=5)
    val_scores[i] = score.mean() # Calculate mean of the 5 fold scores

# Plot how cross-validation score changes with alpha
plt.plot(alpha,val_scores)
plt.xlim(0,1)
plt.xlabel('alpha')
plt.ylabel('Mean Cross-Validation Accuracy')
plt.grid(True)
plt.show()

# Determine the alpha that maximizes the cross-validation score
ind = np.argmax(val_scores)
alpha_star = alpha[ind]
print('alpha_star =', alpha_star)

plt.plot(alpha,val_scores)
plt.plot(np.ones(11)*alpha_star,np.arange(0,1.1,0.1),'--r')
plt.xlim(0,1)
plt.ylim(0.94,0.98)
plt.xlabel('alpha')
plt.ylabel('Mean Cross-Validation Accuracy')
plt.grid(True)
plt.show()

# Train model with alpha_star
model_star = linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=alpha_star)
model_trained = model_star.fit(X,y)
print('Training Accuracy =', model_trained.score(X,y))
# Training Accuracy = 0.9806678383128296

# Plot the decision boundary of the trained model
# The boundary satisfies w0*x0 + w1*x1 + b = 0, so in the (x0, x1) plane its
# slope is -w0/w1 (the intercept is ignored here; the data were scaled to zero mean)
slope = -model_trained.coef_[0,0]/model_trained.coef_[0,1]
x1 = np.arange(-10,10,0.5)
y1 = slope*x1
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)
plt.plot(x1,y1,'--k')
plt.xlim(-4,4)
plt.ylim(-6,6)
plt.show()
```