
Chapter 7. Dimension Reduction Algorithm

Recommended post : 【Algorithm】 Algorithm Index


1. Overview

2. Type 1. Principal Component Analysis (PCA)

3. Type 2. SNE, symmetric-SNE, tSNE

4. Type 3. UMAP

5. Type 4. SVD

6. Type 5. ICA

7. Others





1. Overview

⑴ Definition of Dimension Reduction

① When given high-dimensional data X(1), ···, X(n) ∈ ℝp,

② Find an appropriate subspace V ⊂ ℝp (where dim(V) = k),

③ Calculate the projection of X onto V, denoted as X̃.

④ It is an unsupervised algorithm.
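
To make the projection in ① through ③ concrete, here is a minimal numpy sketch; the toy data and the orthonormal basis W of V are made-up illustrations, not taken from the text above.

import numpy as np

# Toy data: n = 5 points in R^3 (p = 3), purely illustrative
X = np.array([[2.0, 1.0, 0.5],
              [1.5, 0.8, 0.4],
              [3.0, 1.6, 0.7],
              [0.5, 0.2, 0.1],
              [2.5, 1.3, 0.6]])

# Assumed orthonormal basis of a k = 2 dimensional subspace V of R^3
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # columns span V

# Projection of each row of X onto V, expressed in the original coordinates
X_tilde = X @ W @ W.T        # shape (5, 3)

# Coordinates of the projected points with respect to the basis W
X_low = X @ W                # shape (5, 2): the reduced representation
print(X_tilde)
print(X_low)
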

⑵ Characteristics

① Information preservation : The reduced representation retains as much of the variation in the original data as possible.

② Easier model training : Models need fewer parameters, so training is faster and computationally cheaper.

③ Easier result interpretation : Visualization of high-dimensional data.

④ Noise reduction

⑤ It can reveal the true dimensionality of the given data.

⑶ Methods

① Feature selection

○ Selecting a few important variables from the existing ones and discarding the rest.

○ Among variables that are highly correlated with one another or have a high variance inflation factor (VIF), keep only one and drop the rest (see the sketch after this list).

② Feature extraction

○ Combine the existing variables to derive a smaller number of new variables that best represent the data (e.g., the principal components in PCA).
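
As a rough illustration of ① feature selection, the sketch below iteratively drops collinear variables by their variance inflation factor using statsmodels; the toy data and the VIF threshold of 10 are assumptions for illustration, not values from the text.

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
cols = [0, 1, 2]

# Iteratively drop the variable with the largest VIF until all VIFs fall
# below the (assumed) threshold of 10, keeping one of each collinear group
while True:
    vif = [variance_inflation_factor(X[:, cols], i) for i in range(len(cols))]
    worst = int(np.argmax(vif))
    if vif[worst] < 10.0 or len(cols) == 1:
        break
    cols.pop(worst)

print("kept columns:", cols)
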



2. Type 1. Principal Component Analysis (PCA)

⑴ Overview

① Definition : A dimension reduction technique that transforms correlated variables into orthogonal variables.

② In other words, it is a dimension reduction method that leaves only a few meaningful coordinate systems.

③ The orthogonal variables are called principal components, which can be used when they contain as much information as the original data.

④ It originates from the awareness that the given dimensionality and the true dimensionality of the data may differ.

⑤ Introduced by Pearson in 1901.

⑥ PCA can be defined not only in subspace (passing through the origin) but also in affine space.

⑦ In practice, center the data by subtracting the mean and then perform PCA on the resulting subspace.

⑧ PCA does not require multivariate normality of the given data.

⑨ It can be used when the data is linearly distributed.

⑩ Whitening : When standardizing, it is better to standardize the features along the principal component directions, i.e., scale each principal component to unit variance (see the sketch below).
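
A minimal numpy sketch of the whitening idea in ⑩, assuming whitening means rotating onto the principal components and scaling each component to unit variance; the toy data are made up.

import numpy as np

rng = np.random.default_rng(1)
# Toy correlated data (illustrative)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
Xc = X - X.mean(axis=0)                      # center the data

# Eigen-decomposition of the sample covariance matrix
C = Xc.T @ Xc / (Xc.shape[0] - 1)
eigval, eigvec = np.linalg.eigh(C)           # ascending eigenvalues

# PCA whitening: rotate onto the principal components, then scale each
# component to unit variance
Z = Xc @ eigvec / np.sqrt(eigval)
print(np.cov(Z, rowvar=False).round(3))      # approximately the identity matrix
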

⑵ Premises

① Denote a basis of the space containing the data by e1, ···, ep (these basis vectors need not be orthogonal).

○ e1, ···, ep can be considered as bases representing the given dimensions (no need for orthogonality).

② Goal : Find an orthogonal basis Z1, ···, Zk of such a subspace (k < p) (a typical unsupervised problem setting).

○ Z1, ···, Zk can be considered as bases representing the true dimensionality of the data.




③ Random vector

○ X represents a single data point : x1, ···, xp are its components.


X = (x1, x2, ···, xp)T ∈ ℝp×1


④ Mean vector : Also called the population mean.


μ = E[X] = (E[x1], E[x2], ···, E[xp])T


⑤ Variance-covariance matrix


∑ = Var(X) = E[(X − μ)(X − μ)T] ∈ ℝp×p


⑥ Unbiased sample variance-covariance matrix : With the design matrix X ∈ ℝn×p and the centered matrix Xc,


S = XcTXc / (n − 1)


○ Used when the population variance-covariance matrix is unknown.

○ Since X ∈ ℝn×p, XTX ∈ ℝp×n × ℝn×p = ℝp×p.

⑦ Expression of the goal orthogonal basis




○ Zk : = k-th PC ∈ ℝp×1.

⑧ Characteristics of the goal orthogonal basis : Easily derived from linear algebra concepts.


Var(ZkTX) = ZkT∑Zk ∈ ℝ, Cov(ZiTX, ZjTX) = ZiT∑Zj ∈ ℝ


○ Var, Cov, etc., are scalar values related to variance, belonging to ℝ1×1.

○ Do not confuse with the variance-covariance matrix.

⑶ Definition of the PCA problem


Z1 = argmax Var(aTX) subject to aTa = 1

Zk = argmax Var(aTX) subject to aTa = 1 and Cov(aTX, ZiTX) = 0 for i = 1, ···, k − 1


⑷ Solution to the PCA problem

① Method 1. Eigen-decomposition of ∑.

○ The idea is to rotate the data so that it aligns with the coordinate axes, making the covariance matrix diagonal.




○ If the directions of greatest data variation are aligned with the x, y, z axes, the covariance between the axes (e.g., between x and y) becomes 0.


Cov(PTX) = E[(PTX)(PTX)T] − E[PTX] E[PTX]T = PT (E[XXT] − E[X]E[X]T) P = PT Cov(X) P


○ Note that (PTX)T = XT(PT)T = XTP, and since P is a constant matrix, E[P] = P and E[PT] = PT.

○ Since P is a rotation (orthogonal) transformation, its transpose is its inverse (PT = P−1), leading to the following:


D = PT Cov(X) P ⟺ Cov(X) = PDPT, where D = Cov(PTX) is diagonal


○ With P = [p1, p2, ···, pN] and Z = cov(X) defined, we obtain:


ZP = PD, where D = diag(λ1, λ2, ···, λN)


○ Hence, Zpi = λipi, where λi is an eigenvalue and pi is the corresponding eigenvector of the covariance matrix.

○ Remove eigenvalues smaller than a certain threshold η, and perform dimension reduction by projecting the data set onto the remaining eigenvectors.

Conclusion 1. The d-th principal component PCd is the eigenvector corresponding to the d-th largest eigenvalue.

Conclusion 2. Eigenvectors that share the same eigenvalue correspond to principal components that explain the same amount of variance.

② Method 2. Singular value decomposition (SVD) of the centered matrix Xc : The standard method used in current PCA implementations (see the numpy sketch after this list).

○ Step 1 : Center the data to obtain the centered matrix Xc.

○ Step 2 : Define C = (1/n) XcTXc = ∑ (the covariance matrix).

○ Step 3 : Since C is symmetric, it can be orthogonally diagonalized as C = VDVT, where V is the same as the V in the SVD Xc = U∑VT.

③ Method 3. Lagrange multiplier under an equality constraint : maximizing Var(aTX) = aT∑a subject to aTa = 1 via the Lagrangian aT∑a − λ(aTa − 1) gives ∑a = λa, recovering the same eigenvector problem as Method 1.
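
The following minimal numpy sketch contrasts Method 1 (eigen-decomposition of the covariance matrix) and Method 2 (SVD of the centered matrix Xc) on made-up data; the principal components agree up to sign, since the decomposition is only determined up to sign.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))   # toy data, n = 100, p = 4
Xc = X - X.mean(axis=0)                                    # centered matrix Xc

# Method 1: eigen-decomposition of the sample covariance matrix
C = Xc.T @ Xc / (Xc.shape[0] - 1)
eigval, eigvec = np.linalg.eigh(C)
order = np.argsort(eigval)[::-1]                           # sort descending
eigval, eigvec = eigval[order], eigvec[:, order]

# Method 2: SVD of the centered matrix (rows of Vt are the same PCs)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigval_from_svd = s**2 / (Xc.shape[0] - 1)

print(np.allclose(eigval, eigval_from_svd))                # same eigenvalues
print(np.allclose(np.abs(eigvec), np.abs(Vt.T)))           # same PCs up to sign
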

⑸ Determining the number of axes

① Method 1. Scree plot of the fraction of total variance (a minimal sketch is given after this list).

○ Definition of the fraction of total variance.


fraction of total variance explained by the first k principal components = (λ1 + λ2 + ··· + λk) / (λ1 + λ2 + ··· + λp)


○ Graphical representation.



Figure 1. Example of axis setting.



Figure 2. Scree plot : Shows the amount of variance explained by each principal component.


② Method 2. Scree plot of the eigenvalues.

③ Method 3. Use in-sample 10-fold cross-validation to determine the number of axes k that minimizes the mean squared prediction error (MSPE).



Figure 3. Determination of the number of axes.


④ Since information about the remaining axes is lost, PCA functions as a lossy compression algorithm.
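
A minimal sketch of Method 1 using scikit-learn's PCA; the toy data and the 90% cut-off on the cumulative fraction of total variance are illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))    # toy data (illustrative)

pca = PCA().fit(X)                       # fit all principal components
frac = pca.explained_variance_ratio_     # λ_i / Σ λ_j, the fraction of total variance
cum = np.cumsum(frac)

print(frac.round(3))                     # the values shown in a scree plot
k = int(np.argmax(cum >= 0.90)) + 1      # smallest k explaining at least 90% of variance
print("number of axes k =", k)
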

⑹ Summary

① Comparison of prediction performance.



Figure 4. Prediction performance comparison.


3. Type 2. SNE, symmetric-SNE, tSNE

⑴ A dimension reduction algorithm that attempts to preserve local neighborhoods in the data.

⑵ Nonlinear and non-deterministic.

⑶ Understanding it requires concepts such as a cost function and an iterative update algorithm, similar to those used in deep learning (see the sketch below).
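
A minimal scikit-learn sketch of t-SNE; the random toy data and the perplexity value are illustrative assumptions.

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                       # toy high-dimensional data

# Nonlinear, non-deterministic embedding into 2 dimensions; perplexity controls
# the effective number of neighbors whose local structure is preserved
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)                                     # (300, 2)
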



4. Type 3. UMAP
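
No details are given here, so the following is only a minimal sketch of how UMAP is typically applied, assuming the third-party umap-learn package; the toy data and parameter values are illustrative.

import numpy as np
import umap                                   # third-party package: umap-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                # toy high-dimensional data

# n_neighbors balances local vs. global structure; min_dist controls how
# tightly points are packed in the low-dimensional embedding
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=0)
emb = reducer.fit_transform(X)
print(emb.shape)                              # (300, 2)
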



5. Type 4. Singular Value Decomposition (SVD)

⑴ A method that extracts singular values from an m × n matrix and uses them to reduce the dimensionality of the given data.

⑵ Formula : Let A be an m × n matrix with rank r.

① Then there exists an m × n matrix ∑ whose diagonal entries are the r nonzero singular values of A, σ1 ≥ σ2 ≥ ··· ≥ σr > 0


∑ = [ D  0 ; 0  0 ] ∈ ℝm×n, where D = diag(σ1, σ2, ···, σr)


② and there exist an m × m orthogonal matrix U and an n × n orthogonal matrix V such that

A = U∑VT

○ The SVD is not unique, since we can switch the sign of a column of U together with the sign of the corresponding column of V.


③ Python code


import numpy as np

# np.linalg.svd computes the SVD of a matrix
A = np.array([[4, 11, 14],
              [8, 7, -2]])
U, s, Vt = np.linalg.svd(A)
print(U)    # m x m orthogonal matrix
print(s)    # singular values (all nonzero here, since rank(A) = 2)
print(Vt)   # transpose of the n x n orthogonal matrix V


⑶ Application 1. ATA = V(∑T∑)VT : The singular values of A are the square roots of the nonzero eigenvalues of ATA; the remaining eigenvalues are 0 according to the rank (a quick check follows below).
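
A quick numpy check of Application 1 on the matrix A used above: the squares of the singular values match the nonzero eigenvalues of ATA.

import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])

s = np.linalg.svd(A, compute_uv=False)        # singular values of A
eig = np.linalg.eigvalsh(A.T @ A)[::-1]       # eigenvalues of A^T A, descending

print(s**2)        # squares of the singular values
print(eig)         # one extra eigenvalue is 0 because rank(A) = 2 < 3
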

⑷ Application 2. Reduced SVD : In the reduced version, both U and V have orthonormal columns, so UTU = I and VTV = I; however, U and V are not square in this version, so they are not orthogonal matrices.




import numpy as np

A = np.array([[4, 11, 14],
              [8, 7, -2]])
# full_matrices=False computes the reduced SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U)    # m x min(m, n): orthonormal columns, but not square in general
print(s)
print(Vt)   # min(m, n) x n


⑸ Application 3. Rank-k approximation

① For k < rank(A), the m × n matrix A can be approximated by the product of an m × k matrix and a k × n matrix, so only mk + kn numbers need to be stored instead of mn, achieving compression.


A ≈ A(k) = σ1u1v1T + σ2u2v2T + ··· + σkukvkT


② Given k, how to find the optimal rank-k approximation of A:

○ Write A = U∑VT = σ1u1v1T + ··· + σrurvrT, where σ1 ≥ ··· ≥ σr; keep the k largest singular values and drop the smaller ones σr, σr−1, ···, σk+1.

○ The Frobenius distance between A(k) and A is then given as follows (see the numpy sketch after this list):


‖A − A(k)‖F = √(σk+1² + σk+2² + ··· + σr²)
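
A minimal numpy sketch of the rank-k approximation and the Frobenius error formula above; the random matrix and the choice k = 2 are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))                   # toy matrix (illustrative)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Rank-k approximation: keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error equals the square root of the sum of the dropped σ_i^2
err = np.linalg.norm(A - A_k, 'fro')
print(err, np.sqrt(np.sum(s[k:]**2)))         # the two values agree
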


⑹ Application 4. PCA (principal component analysis)

① Center the data

② Xc is the centered data and C = (1/n) XcTXc

③ C is symmetric, so it is orthogonally diagonalizable as C = VDVT. V is the same as in Xc = U∑VT in SVD.



6. Type 5. Independent Component Analysis (ICA)

⑴ Unlike PCA, it reduces dimensionality by statistically separating multivariate signals into independent components.

⑵ The independent components are assumed to follow non-normal (non-Gaussian) distributions; at most one component may be Gaussian (see the sketch below).
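
A minimal scikit-learn sketch of ICA (FastICA) separating two mixed signals; the source signals and mixing matrix are illustrative assumptions.

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                            # non-Gaussian source 1
s2 = np.sign(np.cos(3 * t))                   # non-Gaussian source 2
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                    # assumed mixing matrix
X = S @ A.T                                   # observed mixed signals

# Recover statistically independent components (up to order and scale)
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
print(S_hat.shape)                            # (2000, 2)
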



7. Others

⑴ Factor analysis

① Definition : A method of interpreting the structure within the data by grouping similar variables based on their correlations.

② Assumes the existence of latent variables that cannot be observed in the data.

③ Often used in social sciences or surveys.

⑵ Multi-dimensional scaling (MDS)

① Definition : A statistical technique for visualizing proximity and grouping of objects in a lower-dimensional space by identifying patterns and structures latent in the data.

② Measures the similarities and dissimilarities between objects and represents them as points in a 2D or 3D space.

⑶ Linear discriminant analysis (LDA)

① Definition : A supervised method that, like PCA, reduces dimensionality by projecting data onto a lower-dimensional space, but chooses the projection that best separates the known classes.

⑷ CCA (Canonical-correlation analysis) : Used in the Seurat pipeline.

⑸ RPCA (Reciprocal principal component analysis) : Used in the Seurat pipeline.

⑹ PHATE

⑺ ODA

⑻ LLE

⑼ LSA

⑽ PLSA (probabilistic latent semantic analysis)

① Used to decompose mixed components from the latent class model.

② Application : Used for dimension reduction in mass spectrometry data.



Input: 2021.12.3 16:22
