Chapter 5-2. Types of Distances and Similarities
Recommended Article: 【Statistics】 Chapter 5. Statistics
1. Overview
4. Types of Similarity Concepts
1. Overview
⑴ The difference between norm and distance: often used interchangeably.
① norm
② distance function (metric)
③ If a norm is defined, a distance 𝑑 can be defined.
④ However, the existence of a corresponding norm is not guaranteed just because a distance is defined.
⑵ The difference between distance and similarity: often used interchangeably.
① Commonality: Saying two data points are close (short distance) is equivalent to saying they are similar. In other words, distance ∝ 1 / similarity.
② Commonality: In machine learning, terms like loss function, error function, and cost function also refer to the difference between the true value and the predicted value (∝ 1 / similarity).
③ Difference: While a distance function is rigorously defined in linear algebra, it’s not necessary for loss functions or similarity measures to satisfy that definition.
⑶ Various types of distances and concepts of similarity.
Figure 1. Various types of distances and concepts of similarity
2. Types of Norm Concepts
⑴ Type 1. L1-norm
⑵ Type 2. L2-norm
⑶ Type 3. p-norm
⑷ Type 4. Frobenius norm
3. Types of Distance Concepts
⑴ Type 1. L1 Loss Function (L1-distance, MAE, city-block distance, taxicab distance, rectilinear distance, Manhattan distance, sparse learning, compressed sensing)
① In other words, it is a method of calculating distance by setting paths in shapes like ‘ㄱ’ and ‘ㄴ’.
⑵ Type 2. L2 Loss Function (L2-distance, MSE): Euclidean distance using the Pythagorean theorem (standard)
⑶ Type 3. Cross Entropy: Typically has a binary cross-entropy (BCE).
⑷ Type 4. Distance in Information Theory
⑸ Type 5. Delaunay Triangulation(Delaunay triangulation)
⑹ Type 6. Dot Product: Vector inner product
⑺ Type 7. Linkage Metric: Defines distance between clusters
⑻ Type 8. Hamming Distance
① Assign binary values to each data point and measure the distance between data points based on the difference in values. This is often used in information theory.
② Example: (0, 1, 1, 0, 0, 1) and (1, 1, 1, 1, 0, 0) have different values at the 1st, 4th, and 6th positions, so the Hamming distance is 3.
⑼ Type 9. Standardized Distance:
① Distance standardized by the measurement unit of the variable.
② Formula:
d(i, j)2 = (Xi - Xj)T D-1 (Xi - Xj)
○ Xi: Starting point matrix
○ Xj: Endpoint matrix
○ D: Sample variance (diagonal) matrix
⑽ Type 10. Mahalanobis Distance**
Figure 2. Mahalanobis Distance
① A statistical distance that considers both the standardization of variables and the correlation between variables (shape of the data distribution).
② Formula: When trying to determine the distance d between two data points Xi and Xj, the following formula is used:
d(i, j)2 = (Xi - Xj)T S-1 (Xi - Xj)
○ Xi : Starting point matrix
○ Xj : Endpoint matrix
○ S : Sample covariance matrix
③ Advantages: Unlike Euclidean distance, it is scale-free, considers data correlation, and has benefits such as outlier detection.
④ Limitations: Assumes the normality of the data. The process of calculating the sample covariance matrix is computationally intensive.
⑤ Python code
import numpy as np
from scipy.linalg import inv
def mahalanobis_distance(x, y, covariance_matrix, regularization=1e-6):
# Add regularization to the covariance matrix's diagonal
regularized_cov = covariance_matrix + np.eye(covariance_matrix.shape[0]) * regularization
x_minus_y = np.array(x) - np.array(y)
covariance_matrix_inv = inv(regularized_cov)
distance = np.dot(np.dot(x_minus_y, covariance_matrix_inv), x_minus_y.T)
return np.sqrt(distance)
# Example usage:
x = [1, 2, 3]
y = [4, 5, 6]
data = np.array([x, y])
cov_matrix = np.cov(data, rowvar=False) # Here, we're just using the covariance of x and y for simplicity
print(mahalanobis_distance(x, y, cov_matrix))
⑾ Type 11. Levenshtein Distance: An algorithm that determines how similar two strings, A and B, are to each other.
⑿ Type 12. Minkowski Distance
① Distance in m-dimensional Minkowski space.
② When m = 1, it is equivalent to Manhattan distance.
③ When m = 2, it is equivalent to Euclidean distance.
⒀ Type 13. Hausdorff Distance
① Formalization: For two sets A = {a1, …, ap} and B = {b1, …, bq},
H(A, B) = max(h(A, B), h(B, A))
② directed Hausdorff distance: The distance between the two points in A and B that are furthest apart,
h(A, B) = maxa ∈ A minb ∈ B || a - b ||
⒁ Type 14. Focal Loss
① Formalization
FL = -(1 - Pt)γ log (Pt)
⒂ Type 15. Sørensen–Dice Coefficient (Dice Distance)
① Formalization
2 | A ∩ B| / | A ∪ B |
⒃ Type 16. Gromov-Wasserstein distance (Kantorovich–Rubinstein metric, Earth Mover’s Distance, EMD)
⒄ Type 17. Sinkhorn divergence
⒅ Type 18. Cressie-Read power divergence
⒆ Type 19. Jensen-Shannon distance
⒇ Type 20. total variation (TV) distance
⒇ Type 21. Kolmogorov-Sminrov distance
⒇ Type 22. Hellinger distance: Requires kernel density estimation for probability density function (pdf).
⒇ Type 24. Huber loss function
⒇ Type 25. Bhattacharyya loss
⒇ Type 26. evidence lower bound (ELBO)
⒇ Type 27. Aitchison distance: Concept of distance in a simplex.
⒇ Type 28. Bray Curtis distance
BCij = 1 - 2Cij / (Si + Sj)
① i = one site, j = another site
② Si = numbers of species in i, Sj = numbers of species in j
③ Cij = the less number of the overlapping sites in species
⒇ Type 29. Fourier loss
4. Types of Similarity Concepts
⑴ Type 1. Pearson Correlation Coefficient(Pearson correlation coefficient)
① Given the standard deviations σx, σy of X and Y,
⑵ Type 2. Spearman Correlation Coefficient(Spearman correlation coefficient)
① Define x’ = rank(x) and y’ = rank(x),
⑶ Type 3. Kendall Correlation Coefficient(Kendall correlation coefficient)
① Define correlation for concordant and discordant pairs,
⑷ Type 4. Matthew correlation coefficient (MCC)
⑸ Type 5. χ2
① For measurement data xm, ym, and the approximating function f(x),
⑹ Type 6.SSIM
① Image similarity comparison algorithm
def SSIM(x, y):
# assumption : x and y are grayscale images with the same dimension
import numpy as np
def mean(img):
return np.mean(img)
def sigma(img):
return np.std(img)
def cov(img1, img2):
img1_ = np.array(img1[:,:], dtype=np.float64)
img2_ = np.array(img2[:,:], dtype=np.float64)
return np.mean(img1_ * img2_) - mean(img1) * mean(img2)
K1 = 0.01
K2 = 0.03
L = 256 # when each pixel spans 0 to 255
C1 = K1 * K1 * L * L
C2 = K2 * K2 * L * L
C3 = C2 / 2
l = (2 * mean(x) * mean(y) + C1) / (mean(x)**2 + mean(y)**2 + C1)
c = (2 * sigma(x) * sigma(y) + C2) / (sigma(x)**2 + sigma(y)**2 + C2)
s = (cov(x, y) + C3) / (sigma(x) * sigma(y) + C3)
return l * c * s
⑺ Type 7. Mutual Information
① Principle: Can the second image be predicted given the first image?
② Useful for analyzing the relationship between two images obtained from different modalities
○ Example: In MRI, T1-weighted and T2-weighted images have many inverted points; mutual information considers this.
③ Code
def mutual_information(img1, img2):
import numpy as np
import cv2
import matplotlib.pyplot as plt
# img1 and img2 are 3-channel color images
a = img1[:,:,0:1].reshape(img1.shape[0], img1.shape[1])
b = img2[:,:,0:1].reshape(img2.shape[0], img2.shape[1])
hgram, x_edges, y_edges = np.histogram2d(
a.ravel(),
b.ravel(),
bins=20
)
pxy = hgram / float(np.sum(hgram))
px = np.sum(pxy, axis=1) # marginal for x over y
py = np.sum(pxy, axis=0) # marginal for y over x
px_py = px[:, None] * py[None, :] # Broadcast to multiply marginals
nzs = pxy > 0 # Only non-zero pxy values contribute to the sum
return np.sum(pxy[nzs] * np.log(pxy[nzs] / px_py[nzs]))
⑻ Type 8. Relative entropy(Kullback-Leibler divergence, KL divergence, KLD)
⑼ Type 9. Mr (Thresholded Mander’s Colocalization Coefficient)
① Ratio of overlapping pixels between two different monochrome images
② tMr (Thresholded Mr): Mr calculated considering values below a specific threshold as background with zero values
③ Background: Pearson correlation coefficient is not suitable for comparing monochrome images due to its negative values
④ Feature 1. Ranges from 0 to 1
⑤ Feature 2. Sensitive to background pixel values, but not heavily influenced by values of overlapping pixels
⑥ Feature 3. Dependent on Pearson correlation
⑦ Step 1. First, use Pearson correlation to obtain p-value and test for colocalization
⑧ Step 2. If colocalization is present, calculate tM1 and tM2 values
⑨ Usage: ImageJ
⑽ Type 10. Jaccard Similarity (IoU, intersection over union)
① Jaccard score: For two sets A and B,
⑾ Type 11. Cosine Similarity
① Cosine value: For two vectors A and B,
⑿ Type 12. Euclidean Similarity
① Euclidean distance: For two vectors A and B,
⒀ Type 13. Coverage Score
① For two sets A and B,
⒁ Type 14. Fisher Exact Test
① For two sets A and B,
⒂ Type 15. Faiss: Faiss is a library for efficient similarity search and clustering of dense vectors. Developed by Meta.
⒃ Type 16. Smith–Waterman similarity: Used for evaluating the similarity between nucleic acid or amino acid sequences.
⒄ Type 17. Maximal information coefficient (MIC)
⒅ Type 18. Spectral similarity: For the 𝑘 k-th eigenvalues 𝜆 𝐴 𝑘 λ Ak and 𝜆 𝐵 𝑘 λ Bk of matrices 𝐴 A and 𝐵 B, respectively.
Input: 2022.08.02 16:03
Modified: 2023.08.23 14:28