Gromov-Wasserstein distance (Kantorovich–Rubinstein metric, Earth Mover’s Distance, EMD)
Recommended Post: 【Statistics】 Chapter 5-2. Distance Functions and Similarity
1. Overview
5. Applications
1. Overview
⑴ Joint Probability Distribution (Joint Probability Mass Function, Joint Probability Distribution; Coupling)
① Discrete Random Variables: For X ={x1, ···, xm}, Y ={y1, ···, yn}, the function π(xi, yj) = π(X = xi, Y = yj)
② Continuous Random Variables: For ∂2F(x, y) / ∂x ∂y = π(x, y), the function π(x, y)
③ Property 1. π(x, y) ≥ 0
④ Property 2. ∑∑ π(x, y) = 1
⑵ Marginal Probability Distribution
① Definition: Transforming the joint probability distribution into a distribution of either random variable X or Y
② Marginal Probability Distribution for Discrete Random Variables
③ Marginal Probability Distribution for Continuous Random Variables
2. Kontorovich’s problem
⑴ Definition: Given the probability distribution of X (i.e., πX), the probability distribution of Y (i.e., πY), and the cost function (i.e., c(x, y)), the problem is to find the joint probability distribution π that minimizes the expected cost.
① X is called the source, Y the target.
Figure 1. Kontorovich’s problem
⑵ Assumption: There must exist a functional relationship between X and Y ( ∵ for π(x, y) to have practical significance)
⑶ Instead of πX, conditions like {x1, x2, ···, xn} may be given (but πX=x1 = πX=x2 = ··· = πX=xn)
⑷ Example: Problem of pairing {x1, x2, ···, xn} and {y1, y2, ···, yn} in a close manner.
Figure 2. Example of Kontorovich’s problem
⑸ Kantorovich-Rubinstein duality (KD) problem is merely a modification of the above definition.
3. Optimal Transport Plan
⑴ Definition: The solution to Kontorovich’s problem, forming a convex set
⑵ Existence of Solution: Always exists under mild conditions like lower semicontinuous
⑶ Methods to find the solution
① North West Corner Method
② Vogel’s Approximation Method
③ Least-Cost Method
④ Various Numerical Analysis Techniques
⑷ Example 1. Deciding how to distribute resources when demand and supply are fixed
Figure 3. Optimal Transport Plan in Economics
⑸ Example 2. Differences in Optimal Transport Plans for Continuous and Discrete Variables
Figure 4. Differences in Optimal Transport Plans for Continuous and Discrete Variables
4. Wasserstein Distance
⑴ The expected value of the cost function in a special Kontorovich’s problem scenario where the cost function c(x, y) is the Euclidean distance d(x, y)
① Resulting in the expected value of a distance with the same unit as the distance
② Measuring how much the shape of a pile of sand changes when moved from one place to another, which is exactly the Wasserstein distance
Figure 5. Process of Moving a Pile of Sand
⑵ Can also measure the distance between two sample groups in different metric spaces
① Example: When X is a set of 2D data points, and Y is a set of 3D data points
⑶ Example: WGAN (Wasserstein Generative Adversarial Neural Network)
5. Applications
⑴ Application 1. p-th order Wasserstein distance: Using the p-th moment (p-th moment, p ≥ 1) and then raising it to the power of 1/p to produce the expected value of a distance with the same unit as Euclidean distance.
⑵ Application 2. Partial Wasserstein Distance: Calculating the Wasserstein distance using only part of X and Y
① Example 1. Generate X = {x1, x2, ···, xn}, Y = {y1, y2, ···, yn} as an example, and calculate the 2D-partial Wasserstein distance (ref)
⑶ Application 3. Entropic Wasserstein Distance: Incorporating product measure (i.e., X ⊗ Y), Relative Entropy(Kullback-Leibler divergence)
① Entropic Wasserstein distance is also known as Sinkhorn divergence
② Advantage 1. Makes Kontorovich’s problem differentiable, allowing the use of gradient-based optimization methods
③ Advantage 2. Introduces 𝝐 to provide flexibility in adjusting the optimal transport plan
⑷ Application 4. Robust Wasserstein Distance: Using cλ(x, y) = min { c(x, y), 2λ } instead of c(x, y)
⑸ Application 5. Sinkhorn-Knopp Algorithm: By normalizing the cost function, it enables ultra-fast optimization in optimal transport.
⑹ Application 6. collective OT (COT): A technique that simultaneously optimizes optimal transport across multiple pairs. (e.g., COMMOT)
Figure 6. Diagram of COMMOT
Input: Nov 4, 2023, 02:09