
Chapter 7-1. SNE, symmetric-SNE, t-SNE

Recommended post: 【Algorithm】 Chapter 7. Dimensionality Reduction Algorithms


1. Overview

2. Type 1. SNE

3. Type 2. Symmetric-SNE

4. Type 3. t-SNE



1. Overview

⑴ Terminology

① Manifold: A shape in a high-dimensional space that can, in practice, be represented in a lower dimension

② Data point: Also called a high-dimensional data point




③ Map point: Also called a low-dimensional data point




⑵ Characteristics

① Introduces the concept of probability to solve the drawbacks of MDS (multidimensional scaling)

Characteristic 1. A dimensionality reduction algorithm that tries to preserve local neighborhoods in the data: it uses a local metric, so extremely distant data points contribute almost nothing

Characteristic 2. Nonlinear and non-deterministic



2. Type 1. SNE

⑴ Definition: A method that converts Euclidean distances in high-dimensional space into conditional probabilities that represent similarity between data points

① Euclidean distance: Distance using the Pythagorean theorem

② Mapping: A correspondence that assigns each high-dimensional data point to a low-dimensional map point




⑵ Conditional probabilities expressing similarity of high-dimensional data


$$p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$


① $x_i$: High-dimensional data point

② $x_j$: High-dimensional data point

③ $\sigma_i$: Standard deviation of the Gaussian distribution centered at xi
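As a small illustration, this conditional probability can be computed with plain Python (a minimal sketch; the function name and toy points below are my own, not from the original post):

```python
import math

def p_cond(X, i, j, sigma_i):
    """Conditional probability p_{j|i}: similarity of x_j given x_i,
    using a Gaussian centered at x_i with standard deviation sigma_i."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    num = math.exp(-sq_dist(X[i], X[j]) / (2 * sigma_i ** 2))
    den = sum(math.exp(-sq_dist(X[i], X[k]) / (2 * sigma_i ** 2))
              for k in range(len(X)) if k != i)
    return num / den

# toy data: three 2-D points, two close together and one far away
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(p_cond(X, 0, 1, sigma_i=1.0))  # the nearby point gets almost all the mass
```

Note that the probabilities over all $j \neq i$ sum to 1, which is what makes $p_{j \mid i}$ a proper conditional distribution.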

⑶ Conditional probabilities expressing similarity of low-dimensional data


$$q_{j \mid i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}$$


① Here the Gaussian bandwidth is fixed at $\sigma = 1/\sqrt{2} \approx 0.707106781$, so the factor $2\sigma^2$ in the exponent equals 1

⑷ Main assumption: if the mapping is good, the following holds


$$p_{j \mid i} \approx q_{j \mid i}$$


① KL divergence


$$KL(P_i \parallel Q_i) = \sum_j p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}}$$


② Cost function: Sum of KL divergences over all data points


$$C = \sum_i KL(P_i \parallel Q_i) = \sum_i \sum_j p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}}$$


③ Strategy: Update Q in the direction that minimizes the cost function
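The cost in ② can be evaluated directly in plain Python (a minimal sketch; the P and Q below are toy distributions of my own):

```python
import math

def kl_cost(P, Q):
    """Sum of KL divergences over all points: C = sum_i KL(P_i || Q_i).
    P and Q are lists of conditional distributions, with P[i][j] = p_{j|i}."""
    c = 0.0
    for p_i, q_i in zip(P, Q):
        for p, q in zip(p_i, q_i):
            if p > 0:
                c += p * math.log(p / q)
    return c

P      = [[0.9, 0.1], [0.5, 0.5]]
Q_good = [[0.9, 0.1], [0.5, 0.5]]   # mapping preserves the similarities
Q_bad  = [[0.1, 0.9], [0.5, 0.5]]   # mapping inverts them
print(kl_cost(P, Q_good))  # 0.0: a perfect mapping costs nothing
print(kl_cost(P, Q_bad))   # > 0: mismatch is penalized
```

Because $p \log(p/q)$ is zero exactly when $p = q$, the cost vanishes precisely under the main assumption $p_{j \mid i} \approx q_{j \mid i}$.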

⑸ Perplexity


$$\mathrm{Perp}(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_j p_{j \mid i} \log_2 p_{j \mid i}$$


① $\mathrm{Perp}(P_i)$: A measure of the effective number of neighbors of data point $x_i$


Figure 1. SNE shape according to Perp

(Note that the above example is t-SNE)


② In general, perplexity is set between 5 and 50

③ Experimentally observed to have a range of 1 ≤ perplexity ≤ 99

④ Set $\sigma_i$ differently for each $i$ so as to have a constant perplexity

○ As σ increases, perplexity increases (perplexity is a monotonically increasing function of σ)

○ Define σi using the bisection method

○ 1st. Set LEFT and RIGHT values

○ 2nd. MID ← (LEFT + RIGHT) / 2

○ 3rd. $\sigma_i$ ← MID

○ 4th. If Perp($\sigma_i$) < Perp, then LEFT ← MID; move to 2nd

○ 5th. If Perp($\sigma_i$) > Perp, then RIGHT ← MID; move to 2nd

○ 6th. After repeating N times, define the MID value as $\sigma_i$
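The bisection steps above can be sketched directly in plain Python (the toy points, search bounds, and iteration count are my own choices):

```python
import math

def perplexity(X, i, sigma_i):
    """Perp(P_i) = 2^H(P_i) for the Gaussian conditional distribution at x_i."""
    d = [sum((u - v) ** 2 for u, v in zip(X[i], X[k]))
         for k in range(len(X)) if k != i]
    w = [math.exp(-dk / (2 * sigma_i ** 2)) for dk in d]
    s = sum(w)
    p = [wk / s for wk in w]
    h = -sum(pk * math.log2(pk) for pk in p if pk > 0)
    return 2 ** h

def find_sigma(X, i, target, n_iter=50):
    """Bisection for sigma_i: since perplexity increases with sigma,
    raise LEFT when Perp is too small and lower RIGHT when too large."""
    left, right = 1e-10, 1e4            # 1st. set LEFT and RIGHT
    for _ in range(n_iter):             # 6th. repeat N times
        mid = (left + right) / 2        # 2nd/3rd. sigma_i <- MID
        if perplexity(X, i, mid) < target:
            left = mid                  # 4th. Perp too small -> larger sigma
        else:
            right = mid                 # 5th. Perp too large -> smaller sigma
    return (left + right) / 2

X = [(0.0,), (1.0,), (2.0,), (10.0,)]
s = find_sigma(X, 0, target=2.0)
print(round(perplexity(X, 0, s), 3))  # ≈ 2.0, the requested perplexity
```

Running this per point $i$ is exactly how a constant perplexity is enforced with a different $\sigma_i$ for each point.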

⑹ Rate of change of the cost function


$$\frac{\partial C}{\partial y_i} = 2 \sum_j \left( p_{j \mid i} - q_{j \mid i} + p_{i \mid j} - q_{i \mid j} \right)(y_i - y_j)$$


① $p_{j \mid i} - q_{j \mid i} + p_{i \mid j} - q_{i \mid j}$: Mismatch between similarity of data points and similarity of map points

② $y_i - y_j$: Distance on the map

⑺ Updating algorithm


$$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\partial C}{\partial \mathcal{Y}} + \alpha(t)\left( \mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)} \right)$$


① Learning rate η: Generally around 100

② Momentum α: Introduce momentum to avoid falling into a local minimum. Around 0.5 in the early stage, around 0.8 in the later stage

③ Iteration number t: Generally around 1000
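The momentum update can be sketched on a toy quadratic cost (not the full SNE gradient; the gradient term is subtracted here so the iteration descends, and the cost, step size, and names are illustrative):

```python
def momentum_descent(grad, y0, eta=0.1, n_iter=1000):
    """y(t) = y(t-1) - eta * dC/dy + alpha(t) * (y(t-1) - y(t-2)),
    with alpha = 0.5 in the early stage and 0.8 later, as in the text."""
    y_prev, y = y0, y0
    for t in range(n_iter):
        alpha = 0.5 if t < 250 else 0.8
        y_next = y - eta * grad(y) + alpha * (y - y_prev)
        y_prev, y = y, y_next
    return y

# toy cost C(y) = (y - 3)^2, so dC/dy = 2(y - 3); the minimum is at y = 3
y_star = momentum_descent(lambda y: 2 * (y - 3.0), y0=0.0)
print(round(y_star, 6))  # converges to 3.0
```

The momentum term $\alpha(t)(y^{(t-1)} - y^{(t-2)})$ carries the iterate through shallow local minima that plain gradient steps would get stuck in.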

⑻ Application: Setting $\sigma_i$ so that $\mathrm{Perp}(P_i)$ is constant in SNE

① Definition




② Updating algorithm




③ Conclusion: Since $\sigma_i$ varies on a log scale, a constant learning rate should not be applied





3. Type 2. Symmetric-SNE

⑴ Overview

① Problems of SNE: Optimization is difficult; the crowding problem occurs

② Symmetric-SNE: An improved version of SNE that replaces the conditional probabilities with symmetric joint probabilities, which simplifies the gradient

③ t-SNE: Builds on symmetric-SNE; when computing similarity between two map points, it uses the Student t-distribution rather than a normal distribution

⑵ Joint probability expressing similarity of high-dimensional data


$$p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}$$


⑶ Joint probability expressing similarity of low-dimensional data


$$q_{ij} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq l} \exp\left(-\lVert y_k - y_l \rVert^2\right)}$$


⑷ Cost function


$$C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$


① SNE strategy: Minimize the KL divergence between the conditional probability distributions $p_{j \mid i}$ and $q_{j \mid i}$

② Symmetric-SNE strategy: Minimize the KL divergence between the joint probability distributions $p_{ij}$ and $q_{ij}$

⑸ Rate of change of the cost function


$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$$
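The symmetrization in ⑵ and the gradient in ⑸ can be sketched in plain Python (toy values and 1-D map points; all helper names are mine):

```python
def symmetrize(P_cond):
    """p_ij = (p_{j|i} + p_{i|j}) / (2n), built from conditional probabilities."""
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]

def grad_i(P, Q, Y, i):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j), for 1-D map points."""
    return 4 * sum((P[i][j] - Q[i][j]) * (Y[i] - Y[j])
                   for j in range(len(Y)) if j != i)

# toy 3-point example: asymmetric conditionals (diagonal entries are 0)
P_cond = [[0.0, 0.7, 0.3],
          [0.6, 0.0, 0.4],
          [0.5, 0.5, 0.0]]
P = symmetrize(P_cond)
print(P[0][1] == P[1][0])                   # True: the joint matrix is symmetric
print(abs(sum(map(sum, P)) - 1.0) < 1e-12)  # True: mass sums to 1 over all pairs
```

The gradient has one term per pair instead of the four mismatch terms of SNE, which is the sense in which optimization becomes easier.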


4. Type 3. t-SNE

⑴ Overview

① t-SNE has better expressive power than SNE (ref)

② References: (ref1, ref2, ref3)

⑵ Joint probability expressing similarity of high-dimensional data: Convert to Gaussian-distribution similarity


$$p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}, \quad p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$


⑶ Joint probability expressing similarity of low-dimensional data: Convert to t-distribution similarity with 1 degree of freedom


$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$


⑷ Cost function


$$C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$


⑸ Rate of change of the cost function


$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$
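The Student-t kernel from ⑶ and this gradient can be sketched together in plain Python (1-D map points; the function names and toy values are illustrative):

```python
def q_matrix(Y):
    """q_ij under a Student t-distribution with 1 degree of freedom:
    (1 + ||y_i - y_j||^2)^-1, normalized over all pairs k != l."""
    n = len(Y)
    w = [[1.0 / (1.0 + (Y[i] - Y[j]) ** 2) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    s = sum(map(sum, w))
    return [[wij / s for wij in row] for row in w]

def tsne_grad_i(P, Q, Y, i):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1."""
    return 4 * sum((P[i][j] - Q[i][j]) * (Y[i] - Y[j])
                   / (1.0 + (Y[i] - Y[j]) ** 2)
                   for j in range(len(Y)) if j != i)

Y = [0.0, 1.0, 4.0]
Q = q_matrix(Y)
# the heavy t-tail keeps real mass on the far pair (0.0, 4.0), which is
# what relieves the crowding problem relative to a Gaussian kernel
print(round(Q[0][2], 4))
print(abs(tsne_grad_i(Q, Q, Y, 0)) < 1e-12)  # True: gradient vanishes when P = Q
```

With a Gaussian kernel the (0.0, 4.0) pair would receive weight on the order of $e^{-16}$; the t-kernel's polynomial tail is why moderately distant clusters stay separated on the map.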


⑹ Updating algorithm


$$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\partial C}{\partial \mathcal{Y}} + \alpha(t)\left( \mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)} \right)$$


① Learning rate η: Generally around 100

② Momentum α: Introduce momentum to avoid falling into a local minimum. Around 0.5 in the early stage, around 0.8 in the later stage

③ Iteration number t: Generally around 1000

④ Note: In the paper, there is a (+) in front of η, but I think it should be (-) when η > 0

⑺ R programming


install.packages("Rtsne")
library(Rtsne)

# data consists of only numbers; no rownames, no colnames
data <- read.csv("C:/Users/data.csv")

tsne <- Rtsne(data, dims = 2, perplexity = 30, verbose = TRUE, max_iter = 500)
# Performing PCA
# Read the 14893 x 50 data matrix successfully!
# OpenMP is working. 1 threads.
# Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
# Computing input similarities...
# Building tree...
# - point 10000 of 14893
# Done in 5.06 seconds (sparsity = 0.007533)!
# Learning embedding...
# Iteration 50: error is 103.272821 (50 iterations in 4.11 seconds)
# Iteration 100: error is 88.450651 (50 iterations in 4.85 seconds)
# Iteration 150: error is 79.372908 (50 iterations in 3.18 seconds)
# Iteration 200: error is 76.927413 (50 iterations in 3.10 seconds)
# Iteration 250: error is 75.783877 (50 iterations in 3.02 seconds)
# Iteration 300: error is 2.946416 (50 iterations in 2.96 seconds)
# Iteration 350: error is 2.515109 (50 iterations in 2.84 seconds)
# Iteration 400: error is 2.244128 (50 iterations in 2.88 seconds)
# Iteration 450: error is 2.053382 (50 iterations in 2.86 seconds)
# Iteration 500: error is 1.912981 (50 iterations in 2.90 seconds)
# Fitting performed in 32.67 seconds.

exeTimeTsne <- system.time(Rtsne(data, dims = 2, perplexity = 30, verbose = TRUE, max_iter = 500))

plot(tsne$Y, t = 'n', xlab = "Y1", ylab = "Y2", main = "tsne")
text(tsne$Y, labels = "○", col = "lightblue")

write.csv(tsne$Y, "C:/Users/map.csv")


① Rtsne cannot allocate vectors larger than 2.2 Gb (e.g. a 10000 × 10000 matrix)

② Processing a 1000 × 10000 matrix takes 1 minute 30 seconds

③ ERROR message: “Remove duplicates before running TSNE”

○ Solution: Rtsne(data, dims = 2, perplexity = 30, verbose = TRUE, max_iter = 500, check_duplicates = FALSE)



Input: 2019.10.05 17:32
