Cosine Similarity

The cosine similarity is the measure of similarity between two vectors. More precisely, it is the cosine measure of the angle between two vectors. This metric is largely used in text similarity ML problems as it bypasses the need for caring about the magnitude of vectors.

Mathematically, it is represented as:

$$ d(a,b) = \frac{ a*b } { |a| |b| } $$

where:
$a*b$ is the dot product of vectors
$ |a|$ and $|b|$ are the dot products of the vectors

The formula can be expanded with summation as follows:

$$ d(a, b) = \frac { \sum_{i=1}^n {a_i b_i} } { \sqrt { \sum_{i=1}^n {a_i^2}} \sqrt {\sum_{i=1}^n{b_i^2}} } $$

Here is an example of computing cosine similarity. For example, suppose we have two points: A = (6.3, 4.1)
B = (3.2, 1.7)

$$ d(a, b) = \frac { (6.3*3.2) + (4.1*1.7) } { \sqrt { (6.3^2) + (4.1^2)} \sqrt { (3.2^2) + (1.7^2) } } $$

$$ d(a, b) = \frac {27.13} { 7.516 * 3.623} $$

$$ d(a, b) = \frac {27.13} {27.2368} $$

$$ d(a, b) = .996 $$

Implementation in Python:

import numpy as np
a = np.array([6.3, 4.1])
b = np.array([3.2, 1.7])

np.dot(a, b)/(np.sqrt(np.dot(a, a)) * np.sqrt( np.dot(b, b) ))
0.9960776759242224