Support Vector Regression


Support Vector Regression: Support Vector Regression (SVR) is quite different from other regression models. It uses the Support Vector Machine (SVM, a classification algorithm) machinery to predict a continuous variable. While other linear regression models try to minimize the error between the predicted and the actual values, Support Vector Regression tries to fit the best line within a predefined error threshold. What SVR does, in this sense, is sort all candidate prediction lines into two types: those that stay within the error boundary (the space bounded by two parallel lines) and those that do not. Lines that fall outside the error boundary are not considered, because the difference between their predicted values and the actual values exceeds the error threshold, 𝞮 (epsilon). The lines that stay within it are kept, built on the support vectors, to predict the value of an unknown point. The following illustration will help you grasp this concept.
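In other words, errors smaller than 𝞮 are simply ignored. A standard way to write this down (a textbook formulation, not spelled out in the original illustration) is the 𝞮-insensitive loss, which is zero inside the tube and grows linearly outside it:

L_\varepsilon(y, \hat{y}) = \max(0, \; |y - \hat{y}| - \varepsilon)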


[Illustration: a regression line with boundary lines at ±𝞮 on either side, forming the error tube]



To understand the above image, you first need to learn some important definitions.


  1. Kernel: A kernel is a function used to map lower-dimensional data points into a higher-dimensional space. Since SVR performs linear regression in that higher-dimensional space, this function is crucial. Common kernels include the Polynomial kernel, the Gaussian (RBF) kernel, and the Sigmoid kernel (see the short sketch after this list).

  2. Hyper Plane: In a Support Vector Machine, the hyperplane is the line that separates the two data classes, possibly in a higher dimension than the original data. In SVR, the hyperplane is the line used to predict the continuous value.

  3. Boundary Line: The two lines drawn parallel to the hyperplane, at a distance of the error threshold 𝞮 (epsilon) on either side, are known as the boundary lines. These lines create a margin, or tube, around the data points.

  4. Support Vector: The data points closest to the boundary lines; their distance to the boundary is minimal, and they determine where the hyperplane and the boundary lines sit.
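To make definition 1 concrete, here is a minimal sketch of the Gaussian (RBF) kernel as a similarity function. The bandwidth sigma=1.0 and the sample points are arbitrary values chosen for illustration:

import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    # Similarity is 1.0 for identical points and decays toward 0.0
    # as the squared distance between the points grows
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

a = np.array([1.0, 2.0])
b = np.array([1.5, 2.5])
print(gaussian_kernel(a, b))       # nearby points -> similarity close to 1
print(gaussian_kernel(a, 10 * b))  # distant points -> similarity close to 0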


From the above illustration, you can clearly see the idea: the tube tries to fit as many instances as possible without violating the margin, and its width is controlled by the error threshold 𝞮 (epsilon). In classification, the support vectors are used to define the hyperplane that separates the two classes. Here, these vectors are used to perform linear regression.



How Does SVR Work?






To build the estimator, we follow a few steps:

  1. Collect a training set of input-output pairs.
  2. Choose a kernel and its parameters, along with any regularization.
  3. Form the correlation (kernel) matrix, K.
  4. Solve for the coefficient vector, α.
  5. Use those coefficients to predict the value of an unknown point.

Let's do these steps one by one.

Here we choose a Gaussian kernel:

k(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{2\sigma^2} \right)

where \sigma controls the width of the kernel.

Now we form the correlation matrix:

K_{ij} = k(x_i, x_j) + \lambda \, \delta_{ij}

In the equation above, we evaluate our kernel for all pairs of points in the training set and add the regularizer \lambda along the diagonal (\delta_{ij} is 1 when i = j and 0 otherwise), resulting in the matrix K.




Then we estimate the coefficient vector \alpha by inverting the correlation matrix:

\alpha = K^{-1} y

where y is the vector of target values from the training set. With the coefficients in hand, the estimate for a new point x^* is

y^* = \sum_i \alpha_i \, k(x_i, x^*)

Overall, following all these steps, our SVR model is ready to predict unknown values.
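Putting the steps together, here is a minimal NumPy sketch of the whole procedure. The toy training set, the bandwidth sigma, and the regularizer lam are all hypothetical values chosen for illustration; this is the simplified exact-solve estimator described above, not scikit-learn's SVR:

import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-(x1 - x2) ** 2 / (2 * sigma ** 2))

# Step 1: a toy training set (hypothetical values)
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([1.2, 1.9, 3.2, 3.9, 5.1])

# Step 2: kernel bandwidth and regularizer (assumed values)
sigma, lam = 1.0, 1e-3

# Step 3: correlation matrix K_ij = k(x_i, x_j) + lam * delta_ij
K = gaussian_kernel(X_train[:, None], X_train[None, :], sigma) + lam * np.eye(len(X_train))

# Step 4: solve K @ alpha = y, i.e. alpha = K^-1 y
alpha = np.linalg.solve(K, y_train)

# Step 5: estimate y* = sum_i alpha_i * k(x_i, x*)
x_star = 2.5
y_star = alpha @ gaussian_kernel(X_train, x_star, sigma)
print(y_star)  # falls between the targets of the neighbouring training points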



Support Vector Regression in Python: Now we are going to turn this idea into working Python code. We will use the Position_Salaries.csv dataset. Let's have a look at the data:

[Table: sample rows of Position_Salaries.csv, with Position, Level, and Salary columns]

You can download the dataset from here.


This dataset contains the position and level of some employees, and the salary is calculated according to the level. Let's check the graph of this dataset.


The graph shows that the data is non-linear. Now, what if we want to know the salary for a level of 6.5? To predict it, we will implement Support Vector Regression.


First of all, import the essential libraries


# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Let's import the dataset and make the feature matrix and the dependent variable vector

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # 1:2 keeps X as a 2-D array, as scikit-learn expects
y = dataset.iloc[:, 2].values

Now we need to feature-scale the data. Scikit-learn's SVR class does not apply feature scaling for us, and the RBF kernel is sensitive to the scale of its inputs, so we standardize both X and y:

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y.reshape(-1, 1)).ravel()  # StandardScaler expects a 2-D array

Now we fit the SVR model to the dataset, choosing the RBF (Gaussian) kernel:

# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)

With the model fitted, let's predict the salary for level 6.5. Remember that the model works in scaled units, so we scale the input first and then map the prediction back to salary units:

# Predicting a new result
y_pred = regressor.predict(sc_X.transform([[6.5]]))     # scale the input level
y_pred = sc_y.inverse_transform(y_pred.reshape(-1, 1))  # back to salary units

Finally, we can now visualize what our model has done!

# Visualising the SVR results (note: X and y are still in scaled units here)
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
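The curve above is only drawn at the training points themselves, so it can look jagged. As an optional refinement (a small sketch reusing the regressor fitted above), evaluating the model on a denser grid of levels gives a smoother curve:

# Visualising the SVR results on a denser grid (smoother curve)
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()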

Let's have a look at the graph.