Chapter 20. Autoencoder
1. Overview
2. Type 1. Variational Autoencoder
3. Type 2. Graph Autoencoder
4. Type 3. Matrix Factorization
5. Type 4. Latent Topic Model
6. Type 5. Generative Adversarial Network
7. Type 6. Flow Model
1. Overview
⑴ Definition: A neural network that compresses input data as much as possible, then reconstructs the compressed data back to its original form
① Encoder: An artificial neural network from the input layer to the hidden layer. Also called a recognition network
○ Recognition network: A specialized network that detects patterns and assigns meaning
② Decoder: An artificial neural network from the hidden layer to the output layer. Also called a generative network
○ Generative network: A network capable of generating new data very similar to the input data
③ The encoder and decoder are trained together
Figure 1. Structure of an Autoencoder
⑵ Formalization
Z = Eθ(X), X̂ = Dϕ(Z)
① Z: Latent variable
② Eθ: Encoder network
③ X̂: Reconstructed input
④ Dϕ: Decoder network
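As a concrete illustration of this structure, below is a minimal sketch assuming PyTorch; the layer sizes (784 inputs, 32 hidden units) are arbitrary values chosen for the example, not part of the definition.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Autoencoder(nn.Module):
    def __init__(self, n_input=784, n_hidden=32):  # illustrative sizes
        super().__init__()
        # Encoder E_theta (recognition network): input layer -> smaller hidden layer
        self.encoder = nn.Sequential(nn.Linear(n_input, n_hidden), nn.ReLU())
        # Decoder D_phi (generative network): hidden layer -> output layer with as many nodes as the input
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_input), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)      # latent variable Z
        return self.decoder(z)   # reconstructed input X_hat

model = Autoencoder()
x = torch.rand(16, 784)         # dummy batch
loss = F.mse_loss(model(x), x)  # reconstruction loss; encoder and decoder are trained together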
⑶ Characteristics
① Unsupervised learning neural network
② The encoder performs dimensionality reduction
③ The decoder functions as a generative model
④ The number of nodes in the input layer is equal to that in the output layer
⑤ The number of nodes in the hidden layer is less than in the input layer
⑥ Dimensionality reduction algorithms can also be considered examples of autoencoders
⑷ Effects
① Effect 1. Data compression: If the hidden layer has fewer nodes than the input layer, the network can compress the input data
② Effect 2. Data abstraction: Converts complex data into a multidimensional vector, enabling classification or recombination of input data
○ These multidimensional vectors are also called hidden representations or features
○ PCA is a representative recombination algorithm
③ Effect 3. Latent variable inference: If priors forming the input data are treated as latent variables, the autoencoder can infer them
○ Also referred to as deconvolution
2. Type 1. Variational Autoencoder
⑴ Overview
① Uses a latent space and variational inference so that the latent variables follow a known probability distribution
② Depending on the type of distribution, examples include the ZINB autoencoder (zero-inflated negative binomial autoencoder)
⑵ Structure
① Encoder: Maps input data to a probability distribution in the hidden layer, i.e., a parametrized distribution
② Decoder: Treats the parameterized distribution as a prior in Bayesian inference and maps to the output layer
○ Training: Minimizes the difference (KL divergence) between the parameterized approximate posterior and the true posterior; in practice, the cost function is the evidence lower bound (ELBO), which is maximized
③ Difference from autoencoder: Autoencoder is deterministic, whereas variational autoencoder is probabilistic
④ Because its latent distribution is matched to a known prior, the variational autoencoder organizes its latent space more naturally than a regular autoencoder
⑶ Formalization
Z ~ qθ(Z | X), X̂ = Dϕ(Z)
① Z: Latent variable
② qθ: Conditional probability distribution
③ X̂: Reconstructed input
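A minimal sketch of this probabilistic encoder and decoder, again assuming PyTorch: the encoder outputs the mean and log-variance of qθ(Z | X), a latent sample is drawn with the reparameterization trick, and the cost function adds a KL term to the reconstruction error. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, n_input=784, n_latent=16):  # illustrative sizes
        super().__init__()
        self.enc = nn.Linear(n_input, 2 * n_latent)   # outputs (mu, log_var) of q_theta(Z | X)
        self.dec = nn.Linear(n_latent, n_input)       # decoder D_phi

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterization: Z ~ q_theta(Z | X)
        return torch.sigmoid(self.dec(z)), mu, log_var

model = VAE()
x = torch.rand(8, 784)
x_hat, mu, log_var = model(x)
recon = F.mse_loss(x_hat, x, reduction='sum')                   # reconstruction term
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL(q_theta(Z | X) || N(0, I))
loss = recon + kl                                               # negative ELBO (up to constants)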
3. Type 2. Graph Autoencoder
⑴ Uses graph theory (adjacency and Laplacian matrices) to define a loss function analogous to the autoencoder reconstruction loss (a minimal sketch follows below)
⑵ When graph autoencoder is combined with variational inference, it is called a variational graph autoencoder (VGAE)
⑶ Comparison of AE, GAE, VAE, VGAE loss functions
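As a reference point for these losses, the graph autoencoder of Kipf and Welling reconstructs the adjacency matrix from node embeddings with an inner-product decoder, Â = σ(ZZᵀ). A toy numpy sketch of that reconstruction loss (the embeddings are random here; in practice they come from a graph-convolutional encoder):
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy adjacency matrix
Z = np.random.rand(4, 2)                    # node embeddings (stand-in for a GCN encoder output)

A_hat = 1 / (1 + np.exp(-Z @ Z.T))          # inner-product decoder: sigmoid(Z Z^T)
eps = 1e-9
loss = -np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))   # cross-entropy against A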
4. Type 3. Matrix Factorization
⑴ Definition: An algorithm that decomposes a known matrix A into the product of matrices W and H. A ~ W × H
① Matrix A: Represents sample-feature. Known from the samples
② Matrix H: Represents variable-feature
③ Similar to K-means clustering and PCA
④ Since autoencoders include non-linear transformations, they are a broader concept than matrix factorization
⑤ The following algorithms are based on the least-squares method, but gradient descent can also be used
⑵ Algorithm: Find U and V such that R = UV, where R ∈ ℝ^(5×4), U ∈ ℝ^(5×2), V ∈ ℝ^(2×4)
import numpy as np

R = np.outer([3, 1, 4, 2., 3], [1, 1, 0, 1]) + np.outer([1, 3, 2, 1, 2], [1, 1, 4, 1])  # rank-2 target matrix
U = np.random.rand(5, 2)
V = np.random.rand(2, 4)
for i in range(3):  # alternating least squares
    V, _, _, _ = np.linalg.lstsq(U, R, rcond=None)       # fix U, solve U V = R for V
    Ut, _, _, _ = np.linalg.lstsq(V.T, R.T, rcond=None)  # fix V, solve V.T U.T = R.T for U.T
    U = Ut.T
U @ V  # it is similar to R!
⑶ 3-1. NMF (non-negative matrix factorization)
import numpy as np

R = np.outer([3, 1, 4, 2., 3], [1, 1, 0, 1]) + np.outer([1, 3, 2, 1, 2], [1, 1, 4, 1])
U = np.random.rand(5, 2)
V = np.random.rand(2, 4)
for i in range(20):  # alternating least squares with a non-negativity projection
    V, _, _, _ = np.linalg.lstsq(U, R, rcond=None)
    V = np.where(V >= 0, V, 0)   # set negative entries to zero
    Ut, _, _, _ = np.linalg.lstsq(V.T, R.T, rcond=None)
    U = Ut.T
    U = np.where(U >= 0, U, 0)   # set negative entries to zero
U @ V  # it is similar to R!
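For comparison, scikit-learn also provides an NMF implementation; a minimal usage sketch on the same R (the number of components and random state are arbitrary):
import numpy as np
from sklearn.decomposition import NMF

R = np.outer([3, 1, 4, 2., 3], [1, 1, 0, 1]) + np.outer([1, 3, 2, 1, 2], [1, 1, 4, 1])
model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(R)    # non-negative factor (5 x 2)
H = model.components_         # non-negative factor (2 x 4)
W @ H                         # it is similar to R!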
⑷ 3-2. Matrix completion (Netflix algorithm): Performs matrix factorization on masked R
import numpy as np

R = np.outer([3, 1, 4, 2., 3], [1, 1, 0, 1]) + np.outer([1, 3, 2, 1, 2], [1, 1, 4, 1])
mask = np.array([[1., 1, 1, 1],   # 1 = observed entry, 0 = missing entry
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
U = np.random.rand(5, 2)
V = np.random.rand(2, 4)
RR = U @ V
RR = RR * (1 - mask) + R * mask   # observed entries from R, missing entries from the current estimate
for i in range(20):
    V, _, _, _ = np.linalg.lstsq(U, RR, rcond=None)      # factor the completed matrix RR, not R
    V = np.where(V >= 0, V, 0)
    Ut, _, _, _ = np.linalg.lstsq(V.T, RR.T, rcond=None)
    U = Ut.T
    U = np.where(U >= 0, U, 0)
    RR = U @ V
    RR = RR * (1 - mask) + R * mask
U @ V  # it is similar to R, even at the masked entries!
⑸ 3-3. SVD (singular value decomposition)
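A minimal numpy sketch of SVD-based low-rank approximation, reusing the rank-2 matrix R from above:
import numpy as np

R = np.outer([3, 1, 4, 2., 3], [1, 1, 0, 1]) + np.outer([1, 3, 2, 1, 2], [1, 1, 4, 1])
U, s, Vt = np.linalg.svd(R, full_matrices=False)    # R = U diag(s) Vt
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation; R has rank 2, so R_approx ≈ R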
⑹ Application 1. Cell type classification
① To obtain cell type proportions from scRNA-seq data collected from tissues
② Reducing confounding effects due to cell type heterogeneity is important
③ 1-1. Constrained linear regression (see the sketch after this list)
④ 1-2. Reference-based approach
○ 1-2-1. CIBERSORT (cell-type identification by estimating relative subsets of RNA transcripts): Can check cell type proportions and a p-value per sample
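The constrained-regression idea behind 1-1 and the reference-based approach can be sketched with non-negative least squares on a toy example; the signature matrix, noise level, and proportions below are made up for illustration, and this is not the CIBERSORT algorithm itself (which uses support vector regression).
import numpy as np
from scipy.optimize import nnls

genes, cell_types = 50, 3
S = np.random.rand(genes, cell_types)               # signature matrix (gene x cell type), assumed known
true_p = np.array([0.5, 0.3, 0.2])                  # hidden true proportions (toy example only)
bulk = S @ true_p + 0.01 * np.random.randn(genes)   # observed mixed expression

p, _ = nnls(S, bulk)     # constrained regression: bulk ≈ S p with p >= 0
p = p / p.sum()          # normalize so the cell type proportions sum to 1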
⑺ Application 2. Joint NMF: Enables extension to multi-omics
⑻ Application 3. Metagene extraction
⑼ Application 4. Starfysh: An algorithm that infers archetypes in spatial transcriptomics and determines representative anchors for each archetype.
① Step 1. Construct the autoencoder
Y = WBX
○ X ∈ ℝ^(S×G): Input data (spot × gene)
○ D: Number of archetypes
○ B ∈ ℝ^(D×S): Encoder. Infers archetypes; for each archetype, the distribution across all spots must sum to 1
○ H = BX ∈ ℝ^(D×G): Latent variable (archetype × gene)
○ W ∈ ℝ^(S×D): Decoder. Reconstructs the input data; for each spot, the weights across archetypes must sum to 1
○ Y = WBX ∈ ℝ^(S×G): Reconstructed input
② Step 2. Solve the optimization problem to compute W and B (a toy sketch appears after this list)
③ Step 3. Spots with the highest weights for each archetype in the W matrix are selected as anchor spots
④ Step 4. Granularity adjustment: When distances between archetypes are close, merge or adjust them using a hierarchical structure
⑤ Step 5. For each anchor, search nearest spots to form archetypal communities and identify marker genes
⑥ Step 6. If a signature gene set is given, add archetypal marker genes to the existing set and recalculate the anchors
○ In this case, use the stable marriage matching algorithm to pair each archetype with the most similar signature
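A toy numpy sketch of the reconstruction in Steps 1-3, with the normalization constraints described above (each row of B sums to 1 across spots, each row of W sums to 1 across archetypes); it only illustrates the structure and is not the Starfysh implementation.
import numpy as np

S, G, D = 6, 5, 2                      # spots, genes, archetypes (toy sizes)
X = np.random.rand(S, G)               # input spot x gene matrix

B = np.random.rand(D, S)
B = B / B.sum(axis=1, keepdims=True)   # encoder: each archetype's weights over spots sum to 1
W = np.random.rand(S, D)
W = W / W.sum(axis=1, keepdims=True)   # decoder: each spot's weights over archetypes sum to 1

H = B @ X                              # latent archetype x gene matrix
Y = W @ H                              # reconstructed input, Y = W B X
anchors = W.argmax(axis=0)             # per archetype, the spot with the highest weight (anchor spots)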
5. Type 4. Latent Topic Model
⑴ 4-1. Unigram
⑵ 4-2. LSI (latent semantic indexing)
① Application 1. Probabilistic LSI
○ Limitation 1. Not suitable as a generative model for natural language documents: Difficult to probabilistically model unseen documents
○ Limitation 2. The number of parameters increases linearly with the number of documents used in training
⑶ 4-3. Latent Dirichlet Allocation (LDA)
① Overview
○ Definition: Given a document × word proportion matrix, decompose it into a document × topic matrix and a topic × word proportion matrix.
○ In the LDA model, the latent multinomial variable is called a topic.
○ Like NMF, generates a sparse matrix
② Formalization
○ Premise: Parameters α and β, a topic mixture θ drawn from a Dirichlet(α) distribution, a set z of N topic assignments, and a set w of N words
○ Objective: Use Bayes' theorem to infer θ and z from the observed words
○ Joint probability distribution: p(θ, z, w | α, β) = p(θ | α) ∏ₙ p(zₙ | θ) p(wₙ | zₙ, β), with the product taken over the N words
○ Uses variational EM (expectation-maximization) to estimate θ and z
③ Application 1. Two-level model: On the hidden topic variable z
④ Python Programming: sklearn.decomposition.LatentDirichletAllocation
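A minimal usage sketch with scikit-learn; the toy corpus and parameter values are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cells express genes", "genes encode proteins",
        "movies tell stories", "stories have plots"]
counts = CountVectorizer().fit_transform(docs)     # document x word count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)              # document x topic proportions (θ per document)
topic_word = lda.components_                       # topic x word weights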
6. Type 5. Generative Adversarial Network
Will be updated later.
7. Type 6. Flow Model
⑴ XVFI (eXtreme Video Frame Interpolation): A video frame-interpolation model based on optical flow.
⑵ FILM (Frame Interpolation for Large Motion): Encoder + U-Net-like decoder
Posted: 2023.06.27 00:55
Modified: 2025.06.22 18:49