Graph Theory

Recommended Reading: 【Mathematics】 Mathematics Table of Contents

1. Basics

2. Applications

1. Basics

⑴ Components of a graph

① V: Node

② E: Edge

③ G: Graph

④ V(G) = {v₁, ···, v_n}

⑤ v_i ~ v_j: When edge e_ij connects vertices v_i, v_j.

⑵ Degree of Graph G: Number of nodes, i.e.,

V(G)

⑶ Size of Graph G: Number of edges, i.e.,

E(G)

⑷ Degree of node v: Number of edges connected to node v, i.e., deg(v).

① deg(v_i) = ∑_{v_i~v_j} 1

② δ(G) refers to the minimum degree of nodes in G, and Δ(G) refers to the maximum degree of nodes in G.

③ ∑ deg(v_i) = 2 E(G)

④ If all the nodes of a graph have the same degree, the graph is regular.

⑤ The nodes of an Eulerian graph have even degree.

⑸ Simple graph and complete graph

① Simple graph: A graph in which at most one edge exists between any two nodes, and self-loops (edges connecting a node to itself) are not allowed.

Figure 1. Simple Graph

② Complete graph: A graph in which all nodes are connected by unique edges.

Figure 2. Complete Graph

○ strongly connected (= irreducible): A state where any node i can reach any other node j within the graph.

○ The number of edges in a complete graph is _NC₂ = N(N-1) / 2.

⑹ Subgraph: H is a subgraph of G if V(H) ⊆ V(G) and E(H) ⊆ E(G).

① A subgraph H is an induced subgraph of G if two vertices of V(H) are adjacent if and only if they are adjacent in G.

② clique: a complete subgraph of a graph.

③ Walk: A path like v₀ → v₁ → … → v_m, connected by edges v₀v₁, v₁v₂, …, v_m-1v_m.

○ v₀ is called the initial vertex, and v_m is called the final vertex.

○ The number of edges in the walk is called the length.

④ Path: A walk where all nodes are distinct.

⑤ Cycle: A walk where v₀ = v_m.

⑥ Forest: A graph containing no cycles.

⑦ Tree: A connected forest

⑧ Period: The period of a node i is the greatest common divisor of all paths from node i to itself.

○ Example: If two nodes A and B are connected by two edges such that A = B, the period of each node is 2.

⑨ Aperiodic: When the period of all nodes is 1.

○ aperiodic ⊂ irreduicible

○ Example: A walk where each node has a path leading back to itself is aperiodic.

⑩ Regular

○ regular ⊂ irreducible

○ For some natural number k, all elements of the k-th power of the transition matrix M, M^k, are positive (i.e., non-zero values).

⑺ Types of graph

① Undirected graph: A graph with undirected edges

② Directed graph: A graph with directed edges

③ k-partite graph: If its set of vertices admits a partition into k classes such that the vertices of the same class are not adjacent.

⑻ Centrality of a network

① Degree centrality shows people with many social connections.

② Closeness centrality indicates who is at the heart of a social network.

③ Betweenness centrality describes people who connect social circles.

④ Eigenvector centrality is high among influential people in the network.

⑼ Adjacency matrix: Represented as A.

① Definition: When the number of nodes is n, an n × n adjacency matrix is defined as, A_i,j = 1 (if v_i ~ v_j), A_i,j = 0 (otherwise)

Figure 3. Example of adjancy matrix

② Real-valued function on the set of the graph’s vertices, f: V → ℝ

Figure 4. Example of the real-valued function f

③ The following fomula holds

④ Symmetrically normalized adjacency matrix: Ã = D^-1/2AD^-1/2 (cf. D_ii = d(v_i))

⑽ Incidence matrix: Represented as ∇.

① Incidence matrix of undirected graph is defined as, B_i,j = 1 (if vertex v_i is incident with edge e_j), B_i,j = 0 (otherwise)

Figure 5. Example of incidence matrix

② Incidence matrix of directed graph

③ Both cases where nodes are rows and edges are columns (e.g., Fig. 5) and where edges are rows and nodes are columns (e.g., Fig. 6) are being used.

⑾ Laplacian matrix

① Overview

○ Definition: L = ∇^T∇ = D - A (cf. D_ii = d(v_i))

○ Multiplicity: # of connected components of the graph

Figure 6. Example of Laplacian matrix

○ Here, ∇ is as follows:

Matrix([
    [1, -1,  0,  0],  # e1: v1 -> v2
    [1,  0, -1,  0],  # e2: v1 -> v3
    [0,  1, -1,  0],  # e3: v2 -> v3
    [0,  1,  0, -1]  # e4: v2 -> v4
])

② Modification of Laplacian matrix

○ Normalized graph Laplacian: L_n = D^-1/2LD^-1/2 = I - D^-1/2AD^-1/2. symmetric, semi-definite positive

○ Transition matrix of Markov chain: L^t = D^-1A

○ Random-walk graph Laplacian: L_r = D^-1L = I - L_t = D^-1/2L_nD^1/2

③ Feature 1. L has an eigenvalue of 0. L1_n = 0

○ If there are k disconnected graphs, there are k eigenvalues of 0.

○ The closer the eigenvalue is to 0, the more connected the graph is.

○ Fiedler vector: The eigenvector corresponding to the second smallest eigenvalue after λ₁ = 0. It represents the overall structure of the graph.

④ Feature 2. (Lf)(v_i) = ∑_{v_j~v_i} (f(v_i) - f(v_j)): Refer to Fig. 4.

⑤ Feature 3. L is symmetric and positive semi-definite.

○ Quadratic form: f^TLf = ∑_{e_i,j, i < j} (f(v_i) - f(v_j))² = (1/2) ∑_{e_i,j} (f(v_i) - f(v_j))² ≥ 0

○ In the example of Fig 4, f^TLf = 9.27

○ That is, L has n non-negative, real-valued eigen values.

⑥ Feature 4. Given Y = [y₁ y₂ ⋯ y_m] ∈ ℝ^k×m,

○ tr(Y^TLY) = ∑_{e_i,j} y_i - y_j ²

⑦ Feature 5. Rayleigh Quotient

○ λ₂, the Fiedler value (algebraic connectivity), represents the amount of energy required to differentiate or separate values.

○ A large λ₂ indicates that the data is difficult to separate, suggesting a well-connected overall structure.

○ A small λ₂ means the data is easy to separate, indicating a weakly connected structure or one that is clustered into distinct groups.

⑧ Application 1. Laplacian operator for an undirected weighted graph

⑨ Application 2. Spectral Clustering (ref)

○ The original problem

○ It seeks to find “min-cuts” in a graph: vertex partition with minimum inter-cluster edge weight while normalizing the cluster volume.

○ Definition of cut

○ Definition of volume

○ 2-cluster optimization problem

○ Multi-cluster optimization problem

○ Equivalent to finding one-hot encoding of cluster labels X ∈ {0,1}^N×K that satisfies

○ A solution using the Laplacian matrix

○ Problem definition

○ Reason why the condition y^TDy=1 is necessary: Without this condition, there exists a trivial solution y=0, and for any nonzero y, there exists a scalar c with 0≤c<1 such that cy makes the objective function arbitrarily small.

○ Reason why the condition y^TD1=0 is necessary: This condition is necessary and sufficient for y≠1 (orthogonality). Without it, the minimum of the objective function is always 0 and always achieved at y=1 (by the method of Lagrange multipliers).

○ In the original spectral clustering problem, relaxing the one-hot constraint and adding the condition X^TDX = I makes it equivalent to this solution.

○ Step 1: Calculate the distance between the given n data points to create an n × n adjacency matrix A. The Gaussian kernel is often used.

○ Step 2: Based on the adjacency matrix, calculate the Laplacian matrix: L_n or L_r can also be used.

○ Step 3: Calculate the matrix of eigenvectors W = [ω₂, ···, ω_k] ∈ ℝ^n×k corresponding to the k smallest eigenvalues of the Laplacian matrix. Exclude ω₁ = 1, which corresponds to λ₁ = 0, as it is trivial.

○ Step 4: The n data points are naturally embedded into k dimensions: Y = W^T = [y₁, ⋯, y_n], with each column vector representing the embedded data points.

○ Step 5: Perform clustering, such as K-means clustering, on the column vectors of Y.

⑩ Application 3. The Laplacian matrix can be used as a Fourier transform operator (e.g., G-FuNK)

⑿ Degree distribution: The ratio of nodes with degree k among all nodes, i.e., P(k) = N_k /

V(G)

⒀ Distance: The length of the shortest path between two nodes. If no such path exists, the distance is defined as ∞.

⒁ Clustering coefficient: If the degree of a specific node is k_i and the number of edges between the surrounding nodes connected to that node is e_i, then:

① For example, if A is connected to B, C, and D, and there are additional edges B-C and C-D, then for A, e_i = 2, k_i = 3, C_i = 2 / 3.

② k_i (k_i - 1) / 2: The maximum number of edges that could be formed between the surrounding nodes.

③ C_i has a value between 0 and 1, with a value close to 1 indicating that the surrounding nodes are fully connected.

2. Applications

⑴ Pigeonhole principle

⑵ The number of unique graphs depending on the number of nodes: Nearly perfectly follows an exponential function.

⑶ Dijkstra and Floyd algorithms

⑷ Algorithm for counting the number of subgraphs

⑸ The power of belief

⑹ Social network theory or six degrees of separation

① The theory that any two people on Earth are connected by no more than six intermediaries.

② Proof: If one person knows an average of 100 people, then after six steps, 100⁶ = 1 billion, reaching a similar conclusion.

⑺ Perron-Frobenius theorem

① Theorem 1. If the Markov chain with transition matrix P is strongly connected, then there is exactly one steady-state distribution q.

○ The steady-state distribution satisfies Pq = q.

② Theorem 2. If it is also aperiodic, then:

○ P_ij: The probability of transitioning from node j to node i. ∑_i P_ij = 1.

○ 2-1. the matrix P^k converges to a matrix with columns all equal to q as k → ∞; i.e., P_ij(k) converges to q_i (for any i and j) as k → ∞

○ 2-2. for any initial state x₀ is, the Markov Chain {x_k} converges to q as k → ∞

⑻ Google’s PageRank Algorithm

① Step 1. Represent web pages as a graph and express it as a Markov process transition matrix P.

② Step 2. For sink nodes, forcibly assign the transition probability to other pages as 1/n: P → P’.

③ Step 3. P’’ = (1 - α) P’ + (α / n) 1.

○ Here, 1 refers to an n × n matrix filled with ones.

○ While P’ is not guaranteed to be regular, P’’ is always guaranteed to be regular.

○ P’’ is a Markov chain defined by Google’s founders, Sergey Brin and Larry Page.

④ Step 4. Search for the steady-state vector x such that P’‘x = x.

○ Since P’’ is regular, it is strongly connected, which ensures the existence of a unique steady-state.

○ The state-transition probability at this point can be considered as the importance of the recommended page.

⑼ Four color theorem

① Any map can be colored with at most four colors, ensuring that no two adjacent regions share the same color.

② The theorem was proven by verifying that the four-color theorem holds in all cases using a computer.

⑽ Eulerian path problem or Königsberg bridge problem

① An Eulerian path, which passes through each node exactly once, exists in a graph if and only if there are 0 or 2 nodes with odd degrees.

② If there are 2 nodes with odd degrees, one of these nodes is the starting point and the other is the endpoint.

③ There is no path that crosses all seven bridges of Königsberg exactly once.

⑾ Ramsey theory

① In any sufficiently large graph, a certain structure (e.g., a complete graph of acquaintances or a subgraph of strangers) must exist.

② R(s, t): The minimum number of nodes such that a subgraph of s strangers or a subgraph of t acquaintances is guaranteed to exist.

③ Theorem 1. R(3, 3) = 6: Among any 6 people, there are either 3 mutual friends or 3 mutual strangers. This can be proven by counting possibilities.

④ Theorem 2. R(3, 3, 3) = 17.

⑤ Theorem 3. R(3, 3, 4) = 30.

⑥ Theorem 4. Recurrence relation of R(s, t): R(s, t) ≤ R(s-1, t) + R(s, t-1).

⑦ Theorem 5. Existence of an upper bound for R(s, t): Proven through the recurrence relation of R(s, t) and mathematical induction (ref).

⑧ Theorem 6. R_k(s₁, s₂, …, s_k) ≤ R_k/2(R(s₁, s₂), R(s₃, s₄), …, R(s_k-1, s_k)).

⑿ Schur’s theorem

① For any k ≥ 2, there exists an n > 3 such that under k-coloring, there always exists a (x, y, z) satisfying x + y = z, colored the same way in {1, 2, …, n}. Proven through Ramsey theory.

② This is connected to Fermat’s Last Theorem, revealing the existence of (x, y, z) where x^m + y^m = z^m (mod p).

⒀ Erdős–Rényi Random Graph

① A very large graph emerges at a certain threshold, even with a simple rule of connecting nodes randomly.

Input: 2024.10.01 19:44

616

Graph Theory

1. Basics

2. Applications

results matching ""

No results matching ""