Chapter 5-2. Types of Distances and Similarities
Recommended Article: 【Statistics】 Chapter 5. Statistics
1. Overview
2. Types of Norm Concepts
3. Types of Distance Concepts
4. Types of Similarity Concepts
1. Overview
⑴ The difference between norm and distance: often used interchangeably.
① norm
② distance function (metric)
③ If a norm is defined, a distance 𝑑 can be defined from it, e.g., d(x, y) = ||x − y|| (a numerical check is sketched below).
④ However, the converse does not hold: a distance may be defined without any corresponding norm existing (a metric need not be homogeneous or translation-invariant).
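As a minimal illustration of ③ (assuming NumPy's np.linalg.norm as the norm, with hypothetical random vectors), the induced distance d(x, y) = ||x − y|| can be checked numerically against the triangle inequality:

import numpy as np

def induced_distance(x, y, ord=2):
    # distance induced by a norm: d(x, y) = ||x - y||
    return np.linalg.norm(np.asarray(x) - np.asarray(y), ord=ord)

# numerically check the triangle inequality d(x, z) <= d(x, y) + d(y, z)
rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 5))
assert induced_distance(x, z) <= induced_distance(x, y) + induced_distance(y, z)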
⑵ The difference between distance and similarity: often used interchangeably.
① Commonality: Saying two data points are close (short distance) is equivalent to saying they are similar. In other words, distance ∝ 1 / similarity.
② Commonality: In machine learning, terms like loss function, error function, and cost function also refer to the difference between the true value and the predicted value (∝ 1 / similarity).
③ Difference: while a distance function (metric) has a rigorous mathematical definition, loss functions and similarity measures need not satisfy the metric axioms.
⑶ Various types of distances and concepts of similarity.
Figure 1. Various types of distances and concepts of similarity
2. Types of Norm Concepts
⑴ Type 1. L1-norm: ||x||₁ = Σᵢ |xᵢ|
⑵ Type 2. L2-norm: ||x||₂ = √(Σᵢ xᵢ²)
⑶ Type 3. p-norm: ||x||ₚ = (Σᵢ |xᵢ|ᵖ)^(1/p)
Figure 2. Shape of the unit ball ||x||ₚ = 1 for different values of p
⑷ Type 4. Frobenius norm: ||A||F = √(Σᵢ,ⱼ aᵢⱼ²), i.e., the L2-norm applied to all entries of a matrix (see the sketch below)
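A short sketch (using NumPy, with hypothetical example values) of how the four norms above can be computed for a vector x and a matrix A:

import numpy as np

x = np.array([3.0, -4.0, 1.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

l1  = np.sum(np.abs(x))                    # L1-norm: sum of absolute values
l2  = np.sqrt(np.sum(x ** 2))              # L2-norm: Euclidean length
p   = 3
lp  = np.sum(np.abs(x) ** p) ** (1 / p)    # p-norm (here p = 3)
fro = np.sqrt(np.sum(A ** 2))              # Frobenius norm of a matrix

# the same values via np.linalg.norm
assert np.isclose(l1,  np.linalg.norm(x, 1))
assert np.isclose(l2,  np.linalg.norm(x, 2))
assert np.isclose(fro, np.linalg.norm(A, 'fro'))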
3. Types of Distance Concepts
⑴ Type 1. L1 Loss Function (L1-distance, MAE, city-block distance, taxicab distance, rectilinear distance, Manhattan distance; widely used in sparse learning and compressed sensing)
① In other words, distance is measured along axis-aligned, right-angled paths (shapes like ‘ㄱ’ and ‘ㄴ’, as a taxi drives on a city grid) rather than along the straight line between two points.
② MAPE (mean absolute percentage error) is defined similarly: MAPE = (100/n) · Σᵢ |yᵢ − ŷᵢ| / |yᵢ| (%), i.e., the absolute error expressed as a percentage of the true value (see the sketch below).
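A small sketch of the L1 loss (MAE), the Manhattan distance, and MAPE, assuming hypothetical y_true and y_pred arrays with no zeros in y_true:

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae       = np.mean(np.abs(y_true - y_pred))                   # L1 loss / MAE
manhattan = np.sum(np.abs(y_true - y_pred))                    # city-block (L1) distance
mape      = 100 * np.mean(np.abs((y_true - y_pred) / y_true))  # MAPE in percent
print(mae, manhattan, mape)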
⑵ Type 2. L2 Loss Function (L2-distance, MSE): Euclidean distance using the Pythagorean theorem (standard)
① PSNR (Peak Signal-to-Noise Ratio): the ratio of the maximum possible pixel value (peak signal) to the MSE (Mean Squared Error), expressed in decibels: PSNR = 10 · log₁₀(MAX² / MSE) (see the sketch below).
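A corresponding sketch of the L2 loss (MSE) and PSNR, assuming hypothetical 8-bit image patches so that the peak value MAX = 255:

import numpy as np

img_true = np.array([[52, 55], [61, 59]], dtype=np.float64)
img_pred = np.array([[50, 60], [60, 58]], dtype=np.float64)

mse  = np.mean((img_true - img_pred) ** 2)   # L2 loss / MSE
psnr = 10 * np.log10(255.0 ** 2 / mse)       # PSNR in decibels (MAX = 255)
print(mse, psnr)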
⑶ Type 3. Maximum Metric (Chebyshev Distance, Supremum Distance, Uniform Distance, Chessboard Distance, 𝑑∞ Distance)
① Defined as ||x − y||∞, the p-norm distance with p = ∞; the name recalls the chess king, which needs exactly max(|Δx|, |Δy|) moves to travel between two squares (hence “chessboard distance”).
② Only the dimension (feature) with the largest absolute difference determines the distance; all other dimensions are ignored (see the sketch below).
③ Applications: Sample entropy in signal processing theory and robust control in control theory.
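A sketch of the maximum metric on hypothetical vectors: only the coordinate with the largest absolute difference determines the distance (scipy.spatial.distance.chebyshev gives the same result):

import numpy as np
from scipy.spatial.distance import chebyshev

x = np.array([1.0, 4.0, 7.0])
y = np.array([2.0, 9.0, 6.0])

d_inf = np.max(np.abs(x - y))   # dominated by the 2nd coordinate (|4 - 9| = 5)
assert d_inf == chebyshev(x, y)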
⑷ Type 4. Cross-Entropy: the most common special case is binary cross-entropy (BCE), BCE = −[y·log(p) + (1 − y)·log(1 − p)] (see the sketch below).
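A minimal sketch of binary cross-entropy for labels y ∈ {0, 1} and predicted probabilities p, with clipping added to avoid log(0); the example values are hypothetical:

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # BCE = -mean[ y*log(p) + (1 - y)*log(1 - p) ]
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])))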
⑸ Type 5. Distance in Information Theory
⑹ Type 6. Delaunay Triangulation
⑺ Type 7. Vector arithmetic
① 7-1. Dot Product: Vector inner product
② 7-2. Hadamard product: Element-wise multiplication
⑻ Type 8. Linkage Metric: Defines distance between clusters
⑼ Type 9. Hamming Distance
① Each data point is represented as a string of symbols (often bits), and the distance is the number of positions at which the two strings differ. It is widely used in information and coding theory.
② Example: (0, 1, 1, 0, 0, 1) and (1, 1, 1, 1, 0, 0) have different values at the 1st, 4th, and 6th positions, so the Hamming distance is 3.
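The example in ② can be reproduced directly as a count of mismatching positions:

import numpy as np

a = np.array([0, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1, 1, 0, 0])
print(np.sum(a != b))  # 3 (positions 1, 4, and 6 differ)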
⑽ Type 10. Standardized Distance:
① Distance standardized by the measurement unit of the variable.
② Formula:
d(i, j)² = (Xi − Xj)ᵀ D⁻¹ (Xi − Xj)
○ Xi: vector of the starting data point
○ Xj: vector of the ending data point
○ D: diagonal matrix of the sample variances (see the sketch below)
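A sketch of the standardized distance, assuming each row of a hypothetical data matrix is one observation so that D is the diagonal matrix of sample variances:

import numpy as np

def standardized_distance(xi, xj, data):
    # D: diagonal matrix of the sample variances of each variable
    D_inv = np.diag(1.0 / np.var(data, axis=0, ddof=1))
    diff = np.asarray(xi) - np.asarray(xj)
    return np.sqrt(diff @ D_inv @ diff)

data = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0], [4.0, 20.0]])
print(standardized_distance(data[0], data[2], data))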
⑾ Type 11. Mahalanobis Distance
Figure 3. Mahalanobis Distance
① A statistical distance that considers both the standardization of variables and the correlation between variables (shape of the data distribution).
② Formula: When trying to determine the distance d between two data points Xi and Xj, the following formula is used:
d(i, j)² = (Xi − Xj)ᵀ S⁻¹ (Xi − Xj)
○ Xi: vector of the starting data point
○ Xj: vector of the ending data point
○ S: sample covariance matrix
○ Since S (and hence S⁻¹) is positive semi-definite, (Xi − Xj)ᵀ S⁻¹ (Xi − Xj) ≥ 0 holds.
○ S can be singular (non-invertible) if the data contain linearly dependent variables or if the number of samples is small relative to the dimensionality.
③ Advantages: unlike Euclidean distance, it is scale-invariant, accounts for the shape of the data distribution and the correlation between variables, and is useful for outlier detection.
○ If Euclidean distance is a spherical distance, Mahalanobis distance is an elliptical distance.
④ Limitations: Assumes the normality of the data. The process of calculating the sample covariance matrix is computationally intensive.
⑤ Python code
import numpy as np
from scipy.linalg import inv

def mahalanobis_distance(x, y, covariance_matrix, regularization=1e-6):
    # Add a small regularization term to the diagonal so the covariance matrix is invertible
    regularized_cov = covariance_matrix + np.eye(covariance_matrix.shape[0]) * regularization
    x_minus_y = np.array(x) - np.array(y)
    covariance_matrix_inv = inv(regularized_cov)
    distance = np.dot(np.dot(x_minus_y, covariance_matrix_inv), x_minus_y.T)
    return np.sqrt(distance)

# Example usage:
x = [1, 2, 3]
y = [4, 5, 6]
data = np.array([x, y])
cov_matrix = np.cov(data, rowvar=False)  # Here, we're just using the covariance of x and y for simplicity
print(mahalanobis_distance(x, y, cov_matrix))
⑿ Type 12. Levenshtein Distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn string A into string B; a measure of how similar the two strings are (see the sketch below).
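A minimal dynamic-programming sketch of the Levenshtein (edit) distance, counting insertions, deletions, and substitutions; the example strings are hypothetical:

def levenshtein(a, b):
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(a)][len(b)]

print(levenshtein("kitten", "sitting"))  # 3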
⒀ Type 13. Minkowski Distance
① A generalization of the L1 and L2 distances: d(x, y) = (Σᵢ |xᵢ − yᵢ|^m)^(1/m) for order m ≥ 1 (see the sketch after this list).
② When m = 1, it is equivalent to Manhattan distance.
③ When m = 2, it is equivalent to Euclidean distance.
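A sketch of the Minkowski distance showing that m = 1 and m = 2 recover the Manhattan and Euclidean distances; the example vectors are hypothetical:

import numpy as np

def minkowski(x, y, m):
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** m) ** (1.0 / m)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski(x, y, 1))  # Manhattan distance: 7.0
print(minkowski(x, y, 2))  # Euclidean distance: 5.0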
⒁ Type 14. Hausdorff Distance
① Formalization: For two sets A = {a1, …, ap} and B = {b1, …, bq},
H(A, B) = max(h(A, B), h(B, A))
② Directed Hausdorff distance: the largest distance from a point in A to its nearest point in B (see the sketch below),
h(A, B) = max_{a ∈ A} min_{b ∈ B} ||a − b||
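A brute-force sketch of the directed and symmetric Hausdorff distances for two small hypothetical point sets (scipy.spatial.distance.directed_hausdorff provides an optimized version):

import numpy as np

def directed_hausdorff_bf(A, B):
    # h(A, B) = max over a in A of the distance to a's nearest point in B
    return max(min(np.linalg.norm(a - b) for b in B) for a in A)

def hausdorff(A, B):
    return max(directed_hausdorff_bf(A, B), directed_hausdorff_bf(B, A))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 1.0], [3.0, 0.0]])
print(hausdorff(A, B))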
⒂ Type 15. Focal Loss
① Formalization
FL = −(1 − pₜ)^γ · log(pₜ)
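A sketch of the focal loss for binary classification, where pₜ is the predicted probability of the true class and γ down-weights easy examples (γ = 0 reduces to cross-entropy); the example values are hypothetical:

import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, p, 1 - p)   # probability assigned to the true class
    return -np.mean((1 - p_t) ** gamma * np.log(p_t))

print(focal_loss(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])))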
⒃ Type 16. Sørensen–Dice Coefficient (Dice Distance)
① Formalization
2 |A ∩ B| / (|A| + |B|) (see the sketch below)
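A sketch of the Dice coefficient for two hypothetical sets; 1 − Dice is the corresponding Dice distance:

def dice_coefficient(A, B):
    A, B = set(A), set(B)
    return 2 * len(A & B) / (len(A) + len(B))

print(dice_coefficient({1, 2, 3, 4}, {3, 4, 5}))  # 2*2 / (4+3) ≈ 0.571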
⒄ Type 17. Wasserstein Distance (Kantorovich–Rubinstein metric, Earth Mover’s Distance, EMD)
⒅ Type 18. Jensen-Shannon distance
⒆ Type 19. total variation (TV) distance
⒇ Type 20. Kolmogorov–Smirnov distance
⒇ Type 21. Hellinger distance: defined between probability distributions; when only samples are available, the densities are typically estimated first (e.g., by kernel density estimation).
⒇ Type 22. Huber loss function
⒇ Type 23. Bhattacharyya loss
⒇ Type 24. ELBO (evidence lower bound)
⒇ Type 25. Aitchison distance: a distance defined on the simplex, used for compositional data.
⒇ Type 26. Bray–Curtis distance: BCij = 1 − 2Cij / (Si + Sj) (see the sketch after this list)
① i = one site, j = another site
② Si = total count of specimens at site i, Sj = total count of specimens at site j
③ Cij = sum of the lesser counts of the species found at both sites
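A sketch of the Bray–Curtis dissimilarity from hypothetical species-count vectors at two sites (scipy.spatial.distance.braycurtis gives an equivalent result for non-negative counts):

import numpy as np

def bray_curtis(counts_i, counts_j):
    counts_i, counts_j = np.asarray(counts_i), np.asarray(counts_j)
    C_ij = np.sum(np.minimum(counts_i, counts_j))   # sum of the lesser counts of shared species
    S_i, S_j = counts_i.sum(), counts_j.sum()       # total counts at each site
    return 1 - 2 * C_ij / (S_i + S_j)

print(bray_curtis([6, 7, 4], [10, 0, 6]))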
⒇ Type 27. Fourier loss
⒇ Type 28. Cook’s distance: measures how much a single observation influences a regression fit; used, for example, to flag outlier samples in bioinformatics.
⒇ Type 29. Levenshtein edit distance
4. Types of Similarity Concepts
⑴ Type 1. Pearson Correlation Coefficient
① Given the standard deviations σx, σy of X and Y, ρ(X, Y) = Cov(X, Y) / (σx σy).
② Characteristics
○ Correlation between two variables measured on an interval or ratio scale
○ Applicable to continuous variables
○ Assumes normality
○ Widely used in practice
⑵ Type 2. Spearman Correlation Coefficient
① Define x’ = rank(x) and y’ = rank(y); the Spearman coefficient is the Pearson correlation coefficient computed on x’ and y’.
② Characteristics
○ A method for measuring the correlation between two ordinal-scale variables
○ A non-parametric approach targeting ordinal variables
○ Suitable for data with many zeros
○ Sensitive to deviations or errors in the data
○ Produces higher values compared to Kendall’s correlation coefficient
⑶ Type 3. Kendall Correlation Coefficient
① Counting concordant and discordant pairs among all n(n − 1)/2 pairs of observations, τ = (number of concordant pairs − number of discordant pairs) / (n(n − 1)/2) (see the code sketch after this list).
② Characteristics
○ A method for measuring the correlation between two ordinal-scale variables
○ A non-parametric approach designed for ordinal variables
○ Suitable for data with many zeros
○ Effective for small sample sizes or when there are many ties in the data
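The three correlation coefficients above are available in scipy.stats; a minimal sketch with hypothetical data:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

r, _   = stats.pearsonr(x, y)    # linear correlation of the raw values
rho, _ = stats.spearmanr(x, y)   # Pearson correlation of the ranks
tau, _ = stats.kendalltau(x, y)  # based on concordant/discordant pairs
print(r, rho, tau)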
⑷ Type 4. Matthews correlation coefficient (MCC)
⑸ Type 5. χ²
① For measurement data (xm, ym) and an approximating function f(x), χ² = Σm (ym − f(xm))² / σm², where σm is the measurement uncertainty (often taken as 1 when unknown).
⑹ Type 6. SSIM (Structural Similarity Index Measure)
① Image similarity comparison algorithm
import numpy as np

def SSIM(x, y):
    # assumption: x and y are grayscale images with the same dimensions
    def mean(img):
        return np.mean(img)

    def sigma(img):
        return np.std(img)

    def cov(img1, img2):
        img1_ = np.array(img1[:, :], dtype=np.float64)
        img2_ = np.array(img2[:, :], dtype=np.float64)
        return np.mean(img1_ * img2_) - mean(img1) * mean(img2)

    K1 = 0.01
    K2 = 0.03
    L = 255  # dynamic range of pixel values (0 to 255)
    C1 = K1 * K1 * L * L
    C2 = K2 * K2 * L * L
    C3 = C2 / 2

    l = (2 * mean(x) * mean(y) + C1) / (mean(x)**2 + mean(y)**2 + C1)      # luminance term
    c = (2 * sigma(x) * sigma(y) + C2) / (sigma(x)**2 + sigma(y)**2 + C2)  # contrast term
    s = (cov(x, y) + C3) / (sigma(x) * sigma(y) + C3)                      # structure term
    return l * c * s
⑺ Type 7. Mutual Information
① Principle: Can the second image be predicted given the first image?
② Useful for analyzing the relationship between two images obtained from different modalities
○ Example: In MRI, T1-weighted and T2-weighted images have many inverted points; mutual information considers this.
③ Code
import numpy as np

def mutual_information(img1, img2):
    # img1 and img2 are 3-channel color images; use the first channel of each
    a = img1[:, :, 0]
    b = img2[:, :, 0]

    # joint histogram of the two images
    hgram, x_edges, y_edges = np.histogram2d(a.ravel(), b.ravel(), bins=20)

    pxy = hgram / float(np.sum(hgram))  # joint probability distribution
    px = np.sum(pxy, axis=1)            # marginal for x over y
    py = np.sum(pxy, axis=0)            # marginal for y over x
    px_py = px[:, None] * py[None, :]   # broadcast to multiply marginals
    nzs = pxy > 0                       # only non-zero pxy values contribute to the sum
    return np.sum(pxy[nzs] * np.log(pxy[nzs] / px_py[nzs]))
⑻ Type 8. Relative Entropy (Kullback–Leibler divergence, KL divergence, KLD): DKL(P || Q) = Σᵢ P(i) log(P(i) / Q(i)) (see the sketch below)
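A sketch of the KL divergence between two hypothetical discrete distributions p and q (scipy.stats.entropy(p, q) computes the same quantity in nats):

import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))   # D_KL(p || q), assuming q > 0 wherever p > 0
assert np.isclose(kl, entropy(p, q))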
⑼ Type 9. Sinkhorn divergence (entropic Wasserstein distance)
⑽ Type 10. Cressie-Read power divergence
⑾ Type 11. Mr (Manders’ Colocalization Coefficient)
① Ratio of overlapping pixels between two different monochrome images
② tMr (thresholded Mr): Mr calculated after treating pixel values below a chosen threshold as background (set to zero)
③ Background: the Pearson correlation coefficient is not well suited to comparing monochrome images because it can take negative values
④ Feature 1. Ranges from 0 to 1
⑤ Feature 2. Sensitive to background pixel values, but not heavily influenced by values of overlapping pixels
⑥ Feature 3. Dependent on Pearson correlation
⑦ Step 1. First, use Pearson correlation to obtain p-value and test for colocalization
⑧ Step 2. If colocalization is present, calculate tM1 and tM2 values
⑨ Usage: ImageJ
⑿ Type 12. Jaccard Similarity (IoU, intersection over union)
① Jaccard score: for two sets A and B, J(A, B) = |A ∩ B| / |A ∪ B| (see the sketch below).
② Takes values between 0 and 1: A value of 1 indicates that the two sets are identical, while a value of 0 indicates that they have no common elements.
③ Commonly used as a distance measure (1 − J) for nominal or binary variables.
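A sketch of the Jaccard similarity for two hypothetical sets; as IoU, the same ratio is computed over the intersection and union areas of bounding boxes or masks:

def jaccard(A, B):
    A, B = set(A), set(B)
    return len(A & B) / len(A | B)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 / 4 = 0.5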
⒀ Type 13. Cosine Similarity
① Cosine value: for two vectors A and B, cos θ = (A · B) / (||A|| ||B||) (see the sketch below).
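A sketch of the cosine similarity for two hypothetical vectors:

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # 1.0 (parallel vectors)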
⒁ Type 14. Euclidean Similarity
① Euclidean distance: for two vectors A and B, d(A, B) = ||A − B||₂ = √(Σᵢ (Aᵢ − Bᵢ)²); the similarity is often taken as 1 / (1 + d(A, B)).
⒂ Type 15. Coverage score
① For two sets A and B,
⒃ Type 16. Fisher Exact Test
① For two sets A and B,
⒄ Type 17. Faiss: a library for efficient similarity search and clustering of dense vectors, developed by Meta.
⒅ Type 18. Smith–Waterman similarity: Used for evaluating the similarity between nucleic acid or amino acid sequences.
⒆ Type 19. Maximal information coefficient (MIC)
⒇ Type 20. Spectral Similarity: compares the spectra (eigenvalues/eigenvectors) of matrices A and B; related to the Laplacian matrix and graph theory.
⒇ Type 21. SCC (Stratum-Adjusted Correlation Coefficient): A Pearson correlation coefficient that considers weights based on strata.
Input: 2022.08.02 16:03
Modified: 2023.08.23 14:28