Support Vector Machine
A superstar among supervised learning algorithms, equally applicable to classification and regression on both linear and non-linear data
SVM is one of the most popular supervised machine learning methods. It can be used equally for classification and regression, but it is mostly used for classification. The principle of SVM is to find a hyperplane that separates the training data points into their labelled categories. The model is fitted to the training data and then uses the learned hyperplane to predict the class of test points.
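To make this train-then-predict workflow concrete, here is a minimal sketch using scikit-learn; the library, dataset, and parameter choices are illustrative assumptions, not something specified in this article:

```python
# A minimal sketch of SVM classification with scikit-learn (assumed
# available); the dataset and parameters are illustrative.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small labelled dataset and split it into train and test points.
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a hyperplane-based classifier on the training sample points...
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# ...and use it to predict the class of the test points.
print("predicted classes:", clf.predict(X_test[:5]))
print("test accuracy:", clf.score(X_test, y_test))
```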
Consider the following set of points from two classes, shown in the graph.
Looking at the figure, we can see that the points can be separated by a hyperplane (a line in two dimensions), with the + class points above the line and the − class points below it. Keep in mind that there can be many hyperplanes that separate the given points in different ways, as shown in the figure. Each of these hyperplanes is valid, since it separates the points successfully, but our objective is to find the optimal hyperplane.
Separating hyperplanes
The hyperplane SVM chooses as the best is the one at maximum distance from the data points of each category. For a given hyperplane, one can compute the distance between it and the closest data point of each class; doubling this distance gives the margin.
The margin is a kind of no man's land in which no data point lies. From the figure we can observe that the margin width depends entirely on how far the nearest points are from the hyperplane. The optimal hyperplane is therefore the one with the biggest margin, and the objective of SVM is to find the hyperplane with maximum margin from the training data.
There are many hyperplanes that might classify the data. One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes. So we choose the hyperplane such that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane.
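For linearly separable data with labels y_i in {+1, −1}, this maximum-margin hyperplane is the solution of the standard textbook optimization problem (the notation here is the usual one, not something defined in this article):

$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}\quad\text{subject to}\quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b)\ \ge\ 1\ \ \text{for all } i$$

The margin width works out to $2/\lVert\mathbf{w}\rVert$, so minimizing $\lVert\mathbf{w}\rVert$ maximizes the margin.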
Hard vs. Soft margin
If the data is linearly separable:
• we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible.
• the region bounded by these two hyperplanes is called the “margin”.
• the maximum-margin hyperplane is the hyperplane that lies halfway between them.
When the data are NOT linearly separable:
• Hinge loss function: adds a penalty for crossing over the margin.
• The penalty is proportional to the distance from the margin.
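In that case, the soft-margin objective trades margin width against hinge-loss penalties. Writing it out (with C as the usual regularization constant, standard textbook notation rather than something defined in this article):

$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}\;+\;C\sum_i \max\bigl(0,\ 1 - y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b)\bigr)$$

The hinge term is zero for points on the correct side of the margin and grows linearly with the distance by which a point crosses it.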
Kernel
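When no linear separator exists in the original input space, the kernel trick implicitly maps the data into a higher-dimensional feature space where a linear hyperplane may exist, without ever computing the mapping explicitly. Here is a rough sketch of how switching kernels changes the result, using scikit-learn's SVC on synthetic data; the dataset and parameters are illustrative assumptions, not taken from this article:

```python
# A rough sketch comparing kernels on data that is not linearly
# separable; make_circles and the parameters are illustrative.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear kernel struggles, while an RBF kernel implicitly works in a
# higher-dimensional feature space where the classes become separable.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```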
Advantages of SVM
1. The SVM model works well with high-dimensional data.
2. The SVM model works equally well with linearly and non-linearly separable data.
3. Training the model is relatively simple and easy.
Disadvantages of SVM
1. Selection of the right kernel and parameters can be computationally expensive.
Applications of SVM
1. Pattern classification
Applying the Support Vector approach to a particular practical problem involves resolving a number of questions based on the problem definition and the design involved with it.
2. Text categorization
The task of text categorization is the classification of natural text documents into a fixed number of predefined categories based on their content. Since a document can be assigned to more than one category, this is not a multi-class classification problem, but it can be viewed as a series of binary classification problems, one for each category (see the sketch after this list).
3. Image classification
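To illustrate the “series of binary classifiers” view of text categorization mentioned above, here is a minimal sketch using scikit-learn's one-vs-rest wrapper around a linear SVM; the toy documents and category labels are invented for illustration:

```python
# A minimal sketch of multi-label text categorization as a series of
# binary SVM classifiers, one per category; the corpus is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = [
    "stock markets fell sharply on interest rate fears",
    "the team won the championship final last night",
    "new vaccine trial shows promising immune response",
    "central bank raises rates as inflation climbs",
]
labels = [{"finance"}, {"sports"}, {"health"}, {"finance"}]

# One binary indicator column per category.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# Bag-of-words features, then one LinearSVC per category.
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)

# Predict categories for a new document.
pred = clf.predict(vec.transform(["rates rise as markets react"]))
print(mlb.inverse_transform(pred))
```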
SVM for Classification
SVM is a useful technique for data classification. Even though neural networks are often considered easier to use, they sometimes produce unsatisfactory results. A classification task usually involves training and testing data consisting of data instances. Each instance in the training set contains one target value and several attributes. The goal of SVM is to produce a model that predicts the target value of instances in the testing set, given only their attributes.
Classification in SVM is an example of supervised learning: known labels help indicate whether the system is performing correctly. This information points to a desired response, validating the accuracy of the system, or can be used to help the system learn to act correctly. A step in SVM classification involves identifying the features that are intimately connected to the known classes. This is called feature selection or feature extraction. Feature selection and SVM classification together are useful even when prediction of unknown samples is not necessary: they can be used to identify the key feature sets involved in whatever processes distinguish the classes.
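A hedged sketch of combining feature selection with SVM classification in a single pipeline follows; SelectKBest, the dataset, and k=10 are illustrative choices, not something this article prescribes:

```python
# A sketch of feature selection followed by SVM classification;
# SelectKBest with k=10 is an illustrative choice.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Keep the 10 features most associated with the known class labels,
# then classify with an SVM.
model = make_pipeline(SelectKBest(f_classif, k=10), SVC(kernel="linear"))
model.fit(X, y)

# The selected features identify which measurements best distinguish
# the classes, which is useful even without predicting new samples.
selected = model.named_steps["selectkbest"].get_support(indices=True)
print("selected feature indices:", selected)
```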
SVM for Regression
SVMs can also be applied to regression problems by introducing an alternative loss function, modified to include a distance measure. The regression can be linear or non-linear. Linear models mainly use the following loss functions: the ε-insensitive loss function, the quadratic loss function, and the Huber loss function. As with classification problems, a non-linear model is usually required to adequately model the data. In the same manner as the non-linear SVC approach, a non-linear mapping can be used to map the data into a high-dimensional feature space where linear regression is performed; the kernel approach is again employed to address the curse of dimensionality. The choice of loss function should be based on prior knowledge of the problem and the distribution of the noise; in the absence of such information, Huber's robust loss function has been shown to be a good alternative.
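A minimal sketch of the ε-insensitive regression variant using scikit-learn's SVR; the synthetic data and hyperparameters are illustrative assumptions:

```python
# A minimal sketch of support vector regression with the
# epsilon-insensitive loss; data and hyperparameters are illustrative.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# An RBF kernel maps the data into a feature space where linear
# regression is performed; errors smaller than epsilon are ignored.
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)
reg.fit(X, y)
print("R^2 on training data:", reg.score(X, y))
```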
Conclusion
Support Vector Machines are one of the best approaches to data modeling. They combine generalization control with a technique for handling high dimensionality. The kernel mapping provides a common basis for most of the commonly employed model architectures, enabling comparisons to be performed [8]. In classification problems, generalization control is obtained by maximizing the margin, which corresponds to minimizing the weight vector in a canonical framework. The solution is obtained as a set of support vectors, which can be sparse. Minimization of the weight vector can also be used as a criterion in regression problems, with a modified loss function. Traditional classification approaches perform poorly when working directly in very high-dimensional representations, but Support Vector Machines can avoid these pitfalls.