SVM Margin Formula Derivation
When introduced to the SVM algorithm, we all came across the formula for the width of the margin:

$$\text{width} = \frac{2}{\lVert \mathbf{w} \rVert} \tag{1}$$

where w is the vector that identifies the hyperplane; it points perpendicular to the margin and is learned during training.
In this short article I will walk you through the derivation of equation (1). Before diving into the meat of the derivation, let's talk about the dot product of two vectors. The dot product of two vectors is a scalar and is a measure of their similarity.
Consider two vectors of equal magnitude: if they are perpendicular, their dot product is 0; if they point in opposite directions, their dot product is negative; if they point in the same direction, their dot product is positive and maximal.
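As a quick numeric sanity check of these three cases (a minimal sketch of my own, using NumPy; not part of the original derivation):

```python
import numpy as np

a = np.array([1.0, 0.0])

perpendicular = np.array([0.0, 1.0])   # same magnitude, rotated 90 degrees
opposite = np.array([-1.0, 0.0])       # same magnitude, opposite direction
same = np.array([1.0, 0.0])            # same magnitude, same direction

print(np.dot(a, perpendicular))  # 0.0  -> perpendicular vectors
print(np.dot(a, opposite))       # -1.0 -> negative for opposite directions
print(np.dot(a, same))           # 1.0  -> maximal and positive
```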
The dot product is closely related to the orthogonal projection of one vector onto another:
Figure 1. Vector projection.
where a and b are two vectors and $a_p$ is the projection of the vector a onto b (its signed length is $\lVert \mathbf{a} \rVert \cos\theta$, with $\theta$ the angle between the two vectors). The dot product between the two vectors is equal to:

$$\mathbf{a} \cdot \mathbf{b} = \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert \cos\theta = a_p \, \lVert \mathbf{b} \rVert \tag{2}$$

Rearranging (2) gives:

$$a_p = \mathbf{a} \cdot \frac{\mathbf{b}}{\lVert \mathbf{b} \rVert} \tag{3}$$

Thus, the projection of the vector a onto the vector b is equal to the dot product between a and the unit vector pointing in the same direction as b (note that $\mathbf{b}/\lVert \mathbf{b} \rVert$ is that unit vector). That's all we need for the derivation of (1).
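To make equation (3) concrete, here is a small numeric check (again my own sketch, not from the article):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([2.0, 0.0])

b_unit = b / np.linalg.norm(b)  # unit vector in the direction of b
a_p = np.dot(a, b_unit)         # scalar projection of a onto b, equation (3)

print(a_p)  # 3.0: the component of a along the x-axis, as expected
```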
The hyperplane with its normal vector w, the margin and the two support vectors ($\mathbf{X}_-$, $\mathbf{X}_+$) are depicted in the figure below.
Figure 2. SVM decision boundaries.
The difference between the right and left support vectors is the orange vector. Note that it starts on the left edge of the margin and ends on the right edge. To calculate the width of the margin we project this difference vector onto a unit vector perpendicular to the margin, exactly as in (3). We already have such a perpendicular vector: that's w. The only thing left to do is to normalize it. Thus:

$$\text{width} = (\mathbf{X}_+ - \mathbf{X}_-) \cdot \frac{\mathbf{w}}{\lVert \mathbf{w} \rVert} \tag{4}$$

Remember that in the SVM algorithm the two support vectors are special because they lie exactly on the edges of the margin, so they satisfy the decision function with equality (here b is the bias of the hyperplane):

$$\mathbf{w} \cdot \mathbf{X}_+ + b = +1, \qquad \mathbf{w} \cdot \mathbf{X}_- + b = -1 \tag{5}$$

Let's plug (5) into (4) and simplify:

$$\text{width} = \frac{\mathbf{w} \cdot \mathbf{X}_+ - \mathbf{w} \cdot \mathbf{X}_-}{\lVert \mathbf{w} \rVert} = \frac{(1 - b) - (-1 - b)}{\lVert \mathbf{w} \rVert} = \frac{2}{\lVert \mathbf{w} \rVert}$$
And we are done!
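If you want to verify equation (1) empirically, the sketch below (my own addition; it assumes scikit-learn is installed and uses a tiny linearly separable toy dataset) trains a linear SVM with a large C to approximate the hard-margin case, then compares 2/||w|| with the geometric gap between the two margin edges, measured as in (4):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable dataset: two clusters in 2D.
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin SVM used in the derivation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
width_from_formula = 2.0 / np.linalg.norm(w)  # equation (1)

# Geometric check: project the support vectors onto the unit normal w/||w||;
# the gap between the two margin edges is the width, as in equation (4).
projections = clf.support_vectors_ @ (w / np.linalg.norm(w))
width_from_geometry = projections.max() - projections.min()

print(width_from_formula, width_from_geometry)  # the two values should match
```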