Chapter 7. Dimension Reduction Algorithm
Recommended post : 【Algorithm】 Algorithm Index
1. Overview
2. Type 1. Principal Component Analysis (PCA)
3. Type 2. SNE, symmetric-SNE, tSNE
4. Type 3. UMAP
5. Type 4. SVD
6. Type 5. ICA
7. Others
1. Overview
⑴ Definition of Dimension Reduction
① When given high-dimensional data X^(1), ···, X^(n) ∈ ℝ^p,
② find an appropriate subspace V ⊂ ℝ^p (where dim(V) = k),
③ and calculate the projection of each X^(i) onto V, denoted X̃^(i) (a minimal numpy sketch follows below).
④ It is an unsupervised algorithm.
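A minimal numpy sketch of this projection on random toy data; the subspace below is simply the span of the first k coordinate axes, chosen for illustration (any orthonormal basis B would do).
import numpy as np

# Project p-dimensional toy data onto a k-dimensional subspace V with orthonormal basis B
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # n = 100 samples in R^5

k = 2
B = np.eye(5)[:, :k]                     # p x k matrix whose columns form an orthonormal basis of V

X_tilde = X @ B @ B.T                    # projection of each sample onto V (still expressed in R^5)
X_coords = X @ B                         # k-dimensional coordinates of the projected points
print(X_tilde.shape, X_coords.shape)     # (100, 5) (100, 2)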
⑵ Characteristics
① Information preservation
② Easier model training : with fewer input variables, machine learning models need fewer parameters, so training is faster and computationally cheaper.
③ Easier result interpretation : Visualization of high-dimensional data.
④ Noise reduction
⑤ It can reveal the true dimensionality of the given data.
⑶ Methods
① Feature selection
○ Selecting a few important variables from the existing ones and discarding the rest.
○ Among variables that are highly correlated with each other or have a high variance inflation factor (VIF), keep only one (see the sketch after this list).
② Feature extraction
○ Combine all variables to derive new variables that best represent the data.
○ Techniques that combine existing variables to create new ones.
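A small sketch of the feature-selection idea on toy data: among a highly correlated pair of variables, keep only one. The 0.9 threshold and all variable names are illustrative choices, not values from this post.
import numpy as np

# Drop one variable from each highly correlated pair (toy data, illustrative threshold)
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200), rng.normal(size=200)])

corr = np.corrcoef(X, rowvar=False)              # 3 x 3 correlation matrix
to_drop = set()
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > 0.9:                # highly correlated pair found
            to_drop.add(j)                       # keep variable i, drop variable j

X_selected = np.delete(X, sorted(to_drop), axis=1)
print(X_selected.shape)                          # (200, 2)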
2. Type 1. Principal Component Analysis (PCA)
⑴ Overview
① Definition : A dimension reduction technique that transforms correlated variables into orthogonal variables.
② In other words, it is a dimension reduction method that keeps only a few meaningful coordinate axes.
③ The orthogonal variables are called principal components; they can be used in place of the original variables when they retain most of the information in the original data.
④ It originates from the awareness that the given dimensionality and the true dimensionality of the data may differ.
⑤ Introduced by Pearson in 1901.
⑥ PCA can be defined not only in subspace (passing through the origin) but also in affine space.
⑦ In practice, center the data by subtracting the mean and then perform PCA on the resulting subspace.
⑧ PCA does not require multivariate normality of the given data.
⑨ It can be used when the data is linearly distributed.
⑩ Whitening : when standardizing, it is better to standardize the features along the principal component directions.
⑵ Premises
① Let e_1, ···, e_p denote a basis of the space the data lives in (this set of basis vectors need not be orthogonal).
○ e_1, ···, e_p can be regarded as a basis representing the given dimensions (orthogonality not required).
② Goal : find an orthogonal basis Z_1, ···, Z_k of such a subspace (k < p) (a typical unsupervised problem setting).
○ Z_1, ···, Z_k can be regarded as a basis representing the true dimensionality of the data.
③ Random vector
○ X represents a single data point as a random vector; x_1, ···, x_p are its components.
④ Mean vector : Also called population mean.
⑤ Variance-covariance matrix
⑥ Unbiased sample variance-covariance matrix : computed from the design matrix X and the centered matrix Xc.
○ Used when the population variance-covariance matrix is unknown.
○ Since X ∈ ℝ^(n×p), XᵀX is the product of a p×n and an n×p matrix, so XᵀX ∈ ℝ^(p×p).
⑦ Expression of the goal orthogonal basis
○ Z_k := the k-th principal component (PC), Z_k ∈ ℝ^(p×1).
⑧ Characteristics of the goal orthogonal basis : Easily derived from linear algebra concepts.
○ Var(·), Cov(·, ·), etc., are scalar variance-related quantities in ℝ^(1×1).
○ Do not confuse them with the variance-covariance matrix.
⑶ Definition of the PCA problem
⑷ Solution to the PCA problem
① Method 1. Eigen-decomposition of ∑.
○ Align well along each coordinate axis so that the covariance matrix becomes diagonal.
○ If the coordinate axes (e.g., x, y, z) are aligned with the directions of data variation, the covariance between any two axes, such as x and y, becomes 0.
○ Note that (PᵀX)ᵀ = Xᵀ(Pᵀ)ᵀ = XᵀP, and, since P is a constant matrix, E[P] = P and E[Pᵀ] = Pᵀ.
○ Since P is a rotation (orthogonal) transformation, its transpose is also its inverse (Pᵀ = P⁻¹), leading to the following:
○ Defining P = [p_1, p_2, ···, p_N] and Z = cov(X), we obtain the following:
○ Hence, Z p_i = λ_i p_i, where λ_i is an eigenvalue and p_i is the corresponding eigenvector.
○ Remove eigenvalues smaller than a certain threshold η, and perform dimension reduction by projecting the data set onto the remaining eigenvectors.
○ Conclusion 1. The d-th principal component PCd is the eigenvector corresponding to the d-th largest eigenvalue.
○ Conclusion 2. Meaning of eigenvectors with the same eigenvalue : the corresponding principal components carry the same amount of variance.
② Method 2. Singular value decomposition (SVD) of the centered matrix Xc : the standard method used in current PCA implementations.
○ Step 1: Center the data to obtain the centered matrix Xc.
○ Step 2: Define C = (1/n) XcᵀXc = ∑ (the covariance matrix).
○ Step 3: Since C is symmetric, it can be orthogonally diagonalized as C = VDVᵀ, where V is the same V as in the SVD Xc = U∑Vᵀ (a minimal numpy sketch of both methods follows this list).
③ Method 3. Lagrange multiplier under equality constraint.
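The following is a minimal numpy sketch of Methods 1 and 2 on random toy data; the variable names and the toy generating matrix are illustrative only and not part of the original derivation.
import numpy as np

# Method 1: eigen-decomposition of the covariance matrix (toy data, illustrative names)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3, 0, 0], [1, 1, 0], [0, 0, 0.1]])

Xc = X - X.mean(axis=0)                    # Step 1: center the data
C = Xc.T @ Xc / len(Xc)                    # covariance matrix (p x p)

eigvals, eigvecs = np.linalg.eigh(C)       # eigh handles symmetric matrices; ascending order
order = np.argsort(eigvals)[::-1]          # reorder eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
scores = Xc @ eigvecs[:, :k]               # projection onto the top-k principal components

# Method 2: the SVD of Xc gives the same principal directions (columns of V), up to sign
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(np.abs(Vt[:k]), np.abs(eigvecs[:, :k].T)))   # True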
⑸ Determining the number of axes
① Method 1. Scree plot for the fraction of total variance.
○ Definition of the fraction of total variance : the proportion of the total variance explained by the i-th principal component, λ_i / (λ_1 + ··· + λ_p), often reported cumulatively (a short numpy sketch follows at the end of this subsection).
○ Graphical representation.
Figure 1. Example of axis setting.
Figure 2. Scree plot : Shows the amount of variance explained by each principal component.
② Method 2. Scree plot for eigenvalues.
③ Method 3. Use in-sample 10-fold cross-validation to determine the number of axes that minimizes the MSPE (mean squared prediction error).
Figure 3. Determination of the number of axes.
④ Since information about the remaining axes is lost, PCA functions as a lossy compression algorithm.
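A minimal sketch of computing the variance fractions behind a scree plot; the toy data and its made-up scales are illustrative only.
import numpy as np

# Fraction of total variance per principal component (toy data, illustrative scales)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([5, 3, 2, 1, 0.5, 0.1])

Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(Xc.T @ Xc / len(Xc))[::-1]   # eigenvalues, descending

explained = eigvals / eigvals.sum()       # fraction of total variance per PC
cumulative = np.cumsum(explained)         # cumulative fraction shown in a scree plot
print(np.round(explained, 3))
print(np.round(cumulative, 3))
# Pick the number of axes where the cumulative fraction flattens (the "elbow").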
⑹ Summary
① Comparison of prediction performance.
Figure 4. Prediction performance comparison.
3. Type 2. SNE, symmetric-SNE, tSNE
⑴ A dimension reduction algorithm that attempts to preserve local neighborhoods in the data.
⑵ Nonlinear and non-deterministic.
⑶ Requires familiarity with cost functions, update (gradient-based) algorithms, and other concepts also used in deep learning; a minimal scikit-learn sketch is shown below.
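A minimal sketch using scikit-learn's TSNE on random toy data; the perplexity and other settings below are illustrative defaults, not values prescribed in this post.
import numpy as np
from sklearn.manifold import TSNE

# Embed 50-dimensional toy data into 2 dimensions with t-SNE (illustrative settings)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))

X_embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_embedded.shape)   # (300, 2)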
4. Type 3. UMAP
5. Type 4. Singular Value Decomposition (SVD)
⑴ A method that extracts singular values from an m × n matrix and uses them to reduce the dimensionality of the given data.
⑵ Formula : Let A be an m × n matrix with rank r.
① Then there exists an m × n matrix ∑ whose diagonal entries are the r nonzero singular values of A, σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0,
② and there exist an m × m orthogonal matrix U and an n × n orthogonal matrix V such that A = U∑Vᵀ.
○ The SVD is not unique since we can switch the sign of a column of U and the sign of the corresponding column of V.
③ Python code
import numpy as np

# np.linalg.svd computes the SVD of a matrix
A = np.array([[4, 11, 14],
              [8, 7, -2]])
U, s, Vt = np.linalg.svd(A)
print(U)    # m x m orthogonal matrix
print(s)    # s is the vector of nonzero singular values
print(Vt)   # transpose of the n x n orthogonal matrix V
⑶ Application 1. AᵀA = V(∑ᵀ∑)Vᵀ : the singular values of A are the square roots of the eigenvalues of AᵀA; the eigenvalues beyond rank(A) are 0. A quick numerical check on the matrix above follows.
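The check below reuses the same matrix A from the code above and assumes only numpy; the values in the comments follow from the computation itself.
import numpy as np

# The nonzero eigenvalues of A^T A are the squares of the singular values of A
A = np.array([[4, 11, 14],
              [8, 7, -2]])

U, s, Vt = np.linalg.svd(A)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigenvalues of A^T A, descending
print(s**2)       # approximately [360., 90.]
print(eigvals)    # approximately [360., 90., 0.]; the (numerically tiny) third eigenvalue reflects rank(A) = 2 < 3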
⑷ Application 2. Reduced SVD : in the reduced version, both U and V have orthonormal columns, i.e., UᵀU = I and VᵀV = I. However, U and V are not square in this version, so they are not orthogonal matrices.
A = np.array([[4, 11, 14],
              [8, 7, -2]])
# full_matrices=False returns the reduced SVD: here U is 2 x 2 and Vt is 2 x 3
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U)
print(s)
print(Vt)
⑸ Application 3. rank-k approximation
① For k < rank(A), A can be approximated by the product of an m × k matrix and a k × n matrix, so only m·k + k·n numbers need to be stored instead of m·n, which achieves compression.
② Given k, how to find the optimal rank-k approximation of A:
○ Write A = UΣVᵀ = σ_1 u_1 v_1ᵀ + ··· + σ_r u_r v_rᵀ with σ_1 ≥ ··· ≥ σ_r, and drop the terms with the smallest singular values σ_r, σ_(r−1), ··· until only the k largest remain.
○ The resulting Frobenius distance between A^(k) and A is ‖A − A^(k)‖_F = √(σ_(k+1)² + ··· + σ_r²) (a short numpy check follows).
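A short numpy check of the rank-k approximation on a random toy matrix; the matrix size and the choice of k are arbitrary, for illustration only.
import numpy as np

# Rank-k approximation via truncated SVD (toy matrix, illustrative k)
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]                  # best rank-k approximation

frob_error = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(frob_error, np.sqrt(np.sum(s[k:]**2))))     # True: error equals the root sum of squared dropped singular values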
⑹ Application 4. PCA (principal component analysis)
① Center the data.
② Let Xc be the centered data and C = (1/n) XcᵀXc.
③ C is symmetric, so it is orthogonally diagonalizable as C = VDVᵀ, where V is the same V as in the SVD Xc = U∑Vᵀ (a short numpy sketch follows).
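A minimal numpy sketch of this route on random toy data; the variable names are illustrative and the code only restates the steps above.
import numpy as np

# PCA obtained directly from the SVD of the centered data (toy data, illustrative names)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

Xc = X - X.mean(axis=0)                        # step 1: center
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

C = Xc.T @ Xc / len(Xc)                        # step 2: covariance matrix
D = np.diag(s**2 / len(Xc))                    # step 3: C = V D V^T with V taken from the SVD
print(np.allclose(C, Vt.T @ D @ Vt))           # True

scores = Xc @ Vt.T                             # principal component scores (equal to U * s)
print(np.allclose(scores, U * s))              # True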
6. Type 5. Independent Component Analysis (ICA)
⑴ Unlike PCA, it reduces dimensionality by statistically separating multivariate signals into independent components.
⑵ The independent components are assumed to follow non-Gaussian (non-normal) distributions; a minimal scikit-learn sketch is shown below.
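A minimal sketch using scikit-learn's FastICA to unmix two toy non-Gaussian sources; the sources, mixing matrix, and settings are illustrative only.
import numpy as np
from sklearn.decomposition import FastICA

# Unmix two non-Gaussian toy sources from their linear mixtures with FastICA
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # sinusoidal source
s2 = np.sign(np.sin(3 * t))               # square-wave source (clearly non-Gaussian)
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                # mixing matrix
X = S @ A.T                               # observed mixed signals

S_est = FastICA(n_components=2, random_state=0).fit_transform(X)
print(S_est.shape)                        # (2000, 2): estimated independent components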
7. Others
⑴ Factor analysis
① Definition : A method of interpreting the structure within the data by grouping similar variables based on their correlations.
② Assumes the existence of latent variables that cannot be observed in the data.
③ Often used in social sciences or surveys.
⑵ Multi-dimensional scaling (MDS)
① Definition : A statistical technique for visualizing proximity and grouping of objects in a lower-dimensional space by identifying patterns and structures latent in the data.
② Measures the similarities and dissimilarities between objects and represents them as points in a 2D or 3D space.
⑶ Linear discriminant analysis (LDA)
① Definition : A method that reduces dimensionality by projecting data onto a lower-dimensional space, similar to PCA.
⑷ CCA (Canonical-correlation analysis) : Used in the Seurat pipeline.
⑸ RPCA (Reciprocal principal component analysis) : Used in the Seurat pipeline.
⑹ PHATE
⑺ ODA
⑻ LLE
⑼ LSA
⑽ PLSA (probabilistic latent semantic analysis)
① Used to decompose mixed components from the latent class model.
② Application : Used for dimension reduction in mass spectrometry data.
Input: 2021.12.3 16:22