Gromov-Wasserstein distance (Kantorovich–Rubinstein metric, Earth Mover’s Distance, EMD)
Recommended Post: 【Statistics】 Chapter 5-2. Distance Functions and Similarity
1. Overview
2. Code
1. Overview
⑴ Joint Probability Distribution (Joint Probability Mass Function, Joint Probability Distribution; Coupling)
① Discrete Random Variables: For X ={x1, ···, xm}, Y ={y1, ···, yn}, the function π(xi, yj) = π(X = xi, Y = yj)
② Continuous Random Variables: For ∂2F(x, y) / ∂x ∂y = π(x, y), the function π(x, y)
③ Property 1. π(x, y) ≥ 0
④ Property 2. ∑∑ π(x, y) = 1
⑵ Marginal Probability Distribution
① Definition: Transforming the joint probability distribution into a distribution of either random variable X or Y
② Marginal Probability Distribution for Discrete Random Variables
③ Marginal Probability Distribution for Continuous Random Variables
⑶ Kontorovich’s problem
① Definition: Given the probability distribution of X (i.e., πX), the probability distribution of Y (i.e., πY), and the cost function (i.e., c(x, y)), the problem is to find the joint probability distribution π that minimizes the expected cost.
○ X is called the source, Y the target.
② Assumption: There must exist a functional relationship between X and Y ( ∵ for π(x, y) to have practical significance)
③ Instead of πX, conditions like {x1, x2, ···, xn} may be given (but πX=x1 = πX=x2 = ··· = πX=xn)
④ Example: Problem of pairing {x1, x2, ···, xn} and {y1, y2, ···, yn} in a close manner.
Figure 1. Example of Kontorovich’s problem
⑤ Kantorovich-Rubinstein duality (KD) problem is merely a modification of the above definition.
⑷ Optimal Transport Plan
① Definition: The solution to Kontorovich’s problem, forming a convex set
② Existence of Solution: Always exists under mild conditions like lower semicontinuous
③ Methods to find the solution
○ North West Corner Method
○ Vogel’s Approximation Method
○ Least-Cost Method
○ Various Numerical Analysis Techniques
④ Example 1. Deciding how to distribute resources when demand and supply are fixed
Figure 2. Optimal Transport Plan in Economics
⑤ Example 2. Differences in Optimal Transport Plans for Continuous and Discrete Variables
Figure 3. Differences in Optimal Transport Plans for Continuous and Discrete Variables
⑸ Wasserstein Distance
① The expected value of the cost function in a special Kontorovich’s problem scenario where the cost function c(x, y) is the Euclidean distance d(x, y)
○ Resulting in the expected value of a distance with the same unit as the distance
○ Measuring how much the shape of a pile of sand changes when moved from one place to another, which is exactly the Wasserstein distance
Figure 4. Process of Moving a Pile of Sand
② Can also measure the distance between two sample groups in different metric spaces
○ Example: When X is a set of 2D data points, and Y is a set of 3D data points
③ Example: WGAN (Wasserstein Generative Adversarial Neural Network)
⑹ Applications
① Application 1. p-th order Wasserstein distance: Using the p-th moment (p-th moment, p ≥ 1) and then raising it to the power of 1/p to produce the expected value of a distance with the same unit as Euclidean distance.
② Application 2. Partial Wasserstein Distance: Calculating the Wasserstein distance using only part of X and Y
③ Application 3. Entropic Wasserstein Distance: Incorporating product measure (i.e., X ⊗ Y), Relative Entropy(Kullback-Leibler divergence)
○ Entropic Wasserstein distance is also known as Sinkhorn divergence
○ Advantage 1. Makes Kontorovich’s problem differentiable, allowing the use of gradient-based optimization methods
○ Advantage 2. Introduces 𝝐 to provide flexibility in adjusting the optimal transport plan
④ Application 4. Robust Wasserstein Distance: Using cλ(x, y) = min { c(x, y), 2λ } instead of c(x, y)
⑤ Application 5. Sinkhorn-Knopp Algorithm: By normalizing the cost function, it enables ultra-fast optimization in optimal transport.
2. Code
⑴ Example 1. Generate X = {x1, x2, ···, xn}, Y = {y1, y2, ···, yn} as an example, and calculate the 2D-partial Wasserstein distance (ref)
Input: Nov 4, 2023, 02:09