Ways to Measure Distance

There are several ways to measure distance between two points in a multidimensional space. The choice of distance metric depends on the specific problem and the characteristics of the data.

Euclidean Distance

The Euclidean distance is the straight-line distance between two points in a multidimensional space. For points A = (x1, ..., xn) and B = (y1, ..., yn), it is calculated as the square root of the sum of the squared differences between the corresponding coordinates of the two points.

d(A, B) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)

For example, let’s say we have two points in a 2D space:

  • Point A: (1, 2)
  • Point B: (4, 6)

The Euclidean distance between these two points is calculated as:

sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = sqrt(25) = 5
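
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name euclidean_distance is chosen here for the example, and points are assumed to be plain sequences of numbers.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```

On Python 3.8 or later, the standard library's math.dist(a, b) computes the same quantity.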

Manhattan Distance

The Manhattan distance (also called the taxicab or L1 distance) is the sum of the absolute differences between the corresponding coordinates of the two points. It corresponds to the distance traveled when movement is restricted to axis-aligned steps, as on a grid of city blocks.

d(A, B) = |x1 - y1| + |x2 - y2| + ... + |xn - yn|

For example, let’s say we have two points in a 2D space:

  • Point A: (1, 2)
  • Point B: (4, 6)

The Manhattan distance between these two points is calculated as:

|1 - 4| + |2 - 6| = 3 + 4 = 7
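
A short Python sketch of the same calculation follows. The function name manhattan_distance is illustrative, and points are assumed to be sequences of numbers of equal length.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan_distance((1, 2), (4, 6)))  # 7
```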

Minkowski Distance

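The Minkowski distance is a generalization of the Euclidean and Manhattan distances. It is calculated by raising the absolute differences between the corresponding coordinates to the power p, summing them, and taking the p-th root of the result. With p = 1 it reduces to the Manhattan distance, and with p = 2 it reduces to the Euclidean distance.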

d(A, B) = (|x1 - y1|^p + |x2 - y2|^p + ... + |xn - yn|^p)^(1/p)

For example, let’s say we have two points in a 2D space:

  • Point A: (1, 2)
  • Point B: (4, 6)

The Minkowski distance between these two points with a power of 2 is calculated as:

(|1 - 4|^2 + |2 - 6|^2)^(1/2) = (9 + 16)^(1/2) = 25^(1/2) = 5

This matches the Euclidean distance between the same two points.

The Minkowski distance between these two points with a power of 3 is calculated as:

(|1 - 4|^3 + |2 - 6|^3)^(1/3) = (27 + 64)^(1/3) = 91^(1/3) ≈ 4.498
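
As a rough Python sketch of the general formula, including the final 1/p root, the function below takes the power p as a parameter. The name minkowski_distance and the check that p is at least 1 are choices made for this example.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Sum the p-th powers of the absolute differences, then take the p-th root.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski_distance((1, 2), (4, 6), 1))  # 7.0 (same as Manhattan)
print(minkowski_distance((1, 2), (4, 6), 2))  # 5.0 (same as Euclidean)
print(round(minkowski_distance((1, 2), (4, 6), 3), 3))  # 4.498
```

Taking the root is what keeps the result on the same scale as the coordinates, so distances computed with different values of p remain comparable.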

Hamming Distance

The Hamming distance is a measure of the difference between two strings of equal length. It is calculated as the number of positions at which the corresponding symbols are different.

For example, let’s say we have two strings:

  • String A: AABB
  • String B: ABBC

The Hamming distance between these two strings is calculated as:

  • Position 1: A vs A (same)
  • Position 2: A vs B (different)
  • Position 3: B vs B (same)
  • Position 4: B vs C (different)

d(A, B) = 2
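
The position-by-position comparison above can be sketched in Python as follows. The function name hamming_distance is illustrative; the sketch raises an error for strings of different lengths, since the Hamming distance is only defined for equal-length inputs.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    # Each mismatched pair contributes True, which counts as 1 in the sum.
    return sum(c1 != c2 for c1, c2 in zip(s, t))

print(hamming_distance("AABB", "ABBC"))  # 2
```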

Jaccard Distance

The Jaccard distance is a measure of the dissimilarity between two sets. It is calculated as 1 minus the Jaccard similarity, where the Jaccard similarity is the size of the intersection divided by the size of the union of the two sets.

Jaccard Similarity = |A ∩ B| / |A ∪ B|

For example, let’s say we have two sets:

  • Set A: {1, 2, 3}
  • Set B: {2, 3, 4}

The Jaccard similarity between these two sets is calculated as:

|A ∩ B| / |A ∪ B| = |{2, 3}| / |{1, 2, 3, 4}| = 2 / 4 = 0.5

Therefore, the Jaccard distance is: 1 - 0.5 = 0.5
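
A minimal Python sketch using built-in sets is shown below. The function name jaccard_distance is illustrative, and returning 0.0 for two empty sets is a convention assumed here, since the formula is undefined when the union is empty.

```python
def jaccard_distance(a, b):
    """1 minus the ratio of intersection size to union size."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1 - len(a & b) / len(union)

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```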

Cosine Distance

The cosine distance is a measure of the dissimilarity between two vectors based on the angle between them. It is calculated as 1 minus the cosine similarity, where the cosine similarity is the dot product of the two vectors divided by the product of their magnitudes.

Cosine Similarity = (A · B) / (||A|| × ||B||)

For example, let’s say we have two vectors:

  • Vector A: [1, 2, 3]
  • Vector B: [2, 3, 4]

To compute the cosine distance between these two vectors, first find the dot product and the magnitudes:

A · B = 1 × 2 + 2 × 3 + 3 × 4 = 2 + 6 + 12 = 20

||A|| = sqrt(1^2 + 2^2 + 3^2) = sqrt(14)

||B|| = sqrt(2^2 + 3^2 + 4^2) = sqrt(29)

Cosine Similarity = 20 / (sqrt(14) × sqrt(29)) ≈ 20 / 20.149 ≈ 0.9926

Therefore, the cosine distance is: 1 - 0.9926 = 0.0074
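
The same steps can be sketched in Python using only the standard library. The function name cosine_distance is illustrative, and raising an error for zero-length vectors is an assumption made here, because the similarity is undefined when either magnitude is zero.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed behavior: reject zero vectors rather than return a value.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1 - dot / (norm_a * norm_b)

print(round(cosine_distance([1, 2, 3], [2, 3, 4]), 4))  # 0.0074
```

Because only the angle matters, scaling either vector by a positive constant leaves the result unchanged.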