
Chapter 7. Continuous Probability Distribution

Higher category: 【Statistics】 Statistics Overview


1. Uniform distribution

2. Normal distribution 

3. Gamma distribution 

4. Exponential distribution

5. Beta distribution 

6. Pareto distribution 

7. Logistic distribution

8. Dirichlet distribution

9. Gumbel model





1. Uniform distribution

⑴ definition: a probability distribution whose probability density is constant over the entire range of the random variable

⑵ probability density function: if X ~ U[a, b], then p(x) = 1 / (b - a) · I{a ≤ x ≤ b}




Figure 1. graph of p(x) versus x for X ~ U[1, 9]


① Python programming: Bokeh is used for web-page visualization 


from bokeh.plotting import figure, output_file, show

output_file("uniform_distribution.html")
p = figure(width=400, height=400, title = "Uniform Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
# the density of U[1, 9] is constant at 1 / (b - a) = 1/8 over [1, 9]
p.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8], 
       line_width=2)
show(p)


⑶ statistics

① moment generating function


M(t) = E(e^(tX)) = (e^(bt) - e^(at)) / ((b - a)t) for t ≠ 0, and M(0) = 1


② average: E(X) = (a + b) / 2


image


③ variance: VAR(X) = (b - a)² / 12


image


④ for a joint uniform distribution over a region, the marginal density at a given value equals the length of the cross-section there divided by the total area of the region
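
As a quick numerical check of ② and ③ above, a sketch assuming scipy (note that scipy.stats.uniform is parametrized as loc = a, scale = b - a):


import numpy as np
from scipy.stats import uniform

a, b = 1, 9
U = uniform(loc=a, scale=b - a)        # scipy convention: loc = a, scale = b - a
print(U.mean(), (a + b) / 2)           # both 5.0
print(U.var(), (b - a)**2 / 12)        # both ≈ 5.333
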

⑷ Example

Example problems for uniform distribution

Example problems for joint uniform distribution



2. Normal distribution 

⑴ definition: the limiting distribution of the binomial probability nCx θ^x (1 - θ)^(n-x) as n → ∞

① because it is observed so universally, it is called the normal distribution

② generally, the standard normal distribution density function is expressed as φ(·) and the cumulative distribution function as Φ(·)

③ central limit theorem: if X = ∑Xi is a sum of independent, identically distributed random variables, the distribution of X (suitably standardized) approaches a normal distribution as n → ∞

④ first derived as an approximation to the binomial distribution (De Moivre, 1721); a numerical sketch of this approximation appears after this list

⑤ used to analyze errors of astronomical observations (Gauss, 1809)

○ for this reason, it is also known as the Gaussian distribution
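
A small sketch (assuming scipy) of the binomial-to-normal approximation noted in ④, comparing Binomial(n, θ) probabilities with the normal density N(nθ, nθ(1 - θ)):


import numpy as np
from scipy.stats import binom, norm

n, theta = 100, 0.3
k = np.arange(10, 51)
exact = binom.pmf(k, n, theta)
approx = norm.pdf(k, loc=n * theta, scale=np.sqrt(n * theta * (1 - theta)))
print(np.abs(exact - approx).max())    # small already for n = 100, and shrinks as n grows
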

⑵ probability density function


f(x) = (1 / (σ√(2π))) exp(-(x - μ)² / (2σ²)), -∞ < x < ∞




Figure 2. probability density function of standard normal distribution


① Python programming: Bokeh is used for web-page visualization 


# see https://stackoverflow.com/questions/10138085/how-to-plot-normal-distribution
import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show

output_file("normal_distribution.html")
x = np.linspace(-3, 3, 100)
y = stats.norm.pdf(x, 0, 1)

p = figure(width=400, height=400, title = "Normal Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)
show(p)


⑶ statistics

① moment generating function


M(t) = exp(μt + σ²t²/2)


② average: E(X) = μ


image


③ variance: VAR(X) = σ²


image


⑷ characteristic

characteristic 1. symmetric around μ

characteristic 2. if X ~ N(μ, σ²), then Y = aX + b ~ N(aμ + b, a²σ²)


image


characteristic 3. if Xi ~ N(μi, σi²) are independent, then X = ∑Xi ~ N(∑μi, ∑σi²)

characteristic 4. uncorrelatedness: if X and Y are jointly normal and uncorrelated, X and Y are independent

⑸ standard normal distribution 

① definition: a normal distribution with a mean of 0 and a standard deviation of 1

② standardization: if X ~ N(μ, σ²), then Z = (X - μ) / σ ~ N(0, 1)

③ cumulative distribution function Φ(z) of the standard normal distribution 


Φ(z) = P(Z ≤ z) = ∫ from -∞ to z of (1/√(2π)) e^(-t²/2) dt


④ z_α: the value such that the probability of exceeding it is α, i.e. P(Z > z_α) = α
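
For example, tail probabilities and z_α values can be computed from Φ with scipy (a minimal sketch; the parameter values below are illustrative):


from scipy.stats import norm

mu, sigma = 10, 2
z = (13 - mu) / sigma           # standardization: Z = (X - mu) / sigma
print(1 - norm.cdf(z))          # P(X > 13) = P(Z > 1.5) ≈ 0.0668

alpha = 0.05
print(norm.ppf(1 - alpha))      # z_0.05 ≈ 1.645, i.e. P(Z > 1.645) = 0.05
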

⑹ normal distribution table 


image

Table 1. normal distribution table


⑺ Example

Example problems for normal distribution

Example problems for central limit theorem

Application 1. Log-Normal Distribution

① Definition: The distribution of a random variable whose logarithm follows a normal distribution. In other words, the random variable itself is an exponential function where the exponent is a normally distributed random variable.

② Mathematical Representation: If ln X ~ N(μ, σ²), then

○ E[X] = exp(μ + σ²/2) (∵ derived from the moment-generating function of the normal distribution)

○ E[X²] = exp(2μ + 2σ²) (∵ derived from the moment-generating function)

○ Var(X) = E[X²] - (E[X])²

○ By the central limit theorem, the sample mean X̄ approximately follows a normal distribution with mean exp(μ + σ²/2) and variance Var(X) / n.

③ Example: In sequencing data, count values per sample/cell/spot often follow a log-normal distribution.
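
A minimal simulation sketch of ② (arbitrary parameter values, assuming numpy):


import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
x = np.exp(rng.normal(mu, sigma, size=1_000_000))    # ln X ~ N(mu, sigma^2)

print(x.mean(), np.exp(mu + sigma**2 / 2))            # E[X], both ≈ 3.08
print(x.var(), np.exp(2*mu + 2*sigma**2) - np.exp(2*mu + sigma**2))   # Var(X) = E[X²] - (E[X])²
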

Application 2. Cauchy Distribution

① Definition: the distribution of the ratio X1 / X2 of two independent standard normal random variables.

Application 3. Rayleigh Distribution

① Definition: the distribution of the instantaneous value of the envelope of a zero-mean, narrowband noise signal.

② If X and Y are independent random variables following N(0, σ²), then (X² + Y²)^(1/2) follows a Rayleigh distribution with scale parameter σ (see the simulation sketch below).

③ Mathematical formulation

○ Probability density function: f(x) = (x / σ²) exp(-x² / (2σ²)), x ≥ 0

○ Cumulative distribution function: F(x) = 1 - exp(-x² / (2σ²)), x ≥ 0

○ Mean and variance: E(X) = σ√(π/2), VAR(X) = (2 - π/2)σ²
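
A simulation sketch of ②, comparing the simulated envelope with scipy.stats.rayleigh (scale = σ):


import numpy as np
from scipy.stats import rayleigh

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0, sigma, 1_000_000)
y = rng.normal(0, sigma, 1_000_000)
r = np.sqrt(x**2 + y**2)                      # envelope of two independent N(0, σ²) components

print(r.mean(), rayleigh.mean(scale=sigma))   # both ≈ σ√(π/2) ≈ 2.507
print(r.var(), rayleigh.var(scale=sigma))     # both ≈ (2 - π/2)σ² ≈ 1.717
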



3. Gamma distribution 

⑴ gamma function

definition 1. for x > 0, 


Γ(x) = ∫₀^∞ t^(x-1) e^(-t) dt


definition 2. 


image


③ characteristic

○ Γ(-3/2) = 4/3 √π

○ Γ(-1/2) = -2 √π 

○ Γ(1/2) = √π 

○ Γ(1) = 1

○ Γ(3/2) = 1/2 √π

○ Γ(a + 1) = aΓ(a)

○ Γ(n + 1) = n! 
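
These values can be verified directly with scipy.special.gamma (a quick sketch):


import math
from scipy.special import gamma

print(gamma(0.5), math.sqrt(math.pi))           # Γ(1/2) = √π ≈ 1.7725
print(gamma(-0.5), -2 * math.sqrt(math.pi))     # Γ(-1/2) = -2√π
print(gamma(-1.5), 4/3 * math.sqrt(math.pi))    # Γ(-3/2) = (4/3)√π
print(gamma(1.5), 0.5 * math.sqrt(math.pi))     # Γ(3/2) = (1/2)√π
print(gamma(6), math.factorial(5))              # Γ(n + 1) = n!
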

⑵ gamma distribution

① probability density function: for x, r, λ > 0, 


f(x) = (λ^r / Γ(r)) x^(r-1) e^(-λx), x > 0




Figure 3. probability density function of gamma distribution


○ Python programming: Bokeh is used for web-page visualization 


# see https://www.statology.org/gamma-distribution-in-python/

import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show

output_file("gamma_distribution.html")
x = np.linspace(0, 40, 100)
y1 = stats.gamma.pdf(x, a = 5, scale = 3)
y2 = stats.gamma.pdf(x, a = 2, scale = 5)
y3 = stats.gamma.pdf(x, a = 4, scale = 2)

p = figure(width=400, height=400, title = "Gamma Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'shape=5, scale=3')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'shape=2, scale=5')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'shape=4, scale=2')

show(p)


② meaning

○ the probability distribution of time until the r-th event occurs

○ r (shape parameter): the number of events being waited for

○ λ (rate parameter): the average number of events per unit period

○ β (scale parameter): β = 1 / λ
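
As a sketch of this interpretation: the waiting time until the r-th event of a rate-λ Poisson process is a sum of r independent Exponential(λ) waiting times, and its distribution matches Gamma(r, λ) (scipy uses scale = 1/λ):


import numpy as np
from scipy.stats import gamma, kstest

rng = np.random.default_rng(0)
r, lam = 4, 2.0
# time until the 4th event = sum of 4 independent Exponential(λ) inter-arrival times
waits = rng.exponential(scale=1/lam, size=(200_000, r)).sum(axis=1)

print(kstest(waits, gamma(a=r, scale=1/lam).cdf).statistic)   # close to 0: the two distributions agree
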

⑶ statistics

① moment generating function


M(t) = (λ / (λ - t))^r, for t < λ


② average: E(X) = r / λ 


image


③ variance: VAR(X) = r / λ²


image


⑷ relationship with different probability distributions

①  binomial distribution


image


② negative binomial distribution 


image


③ beta distribution


image


④ Chi-squared distribution: When λ = 1/2 and r = ν/2, a chi-squared distribution with ν degrees of freedom is obtained.


f(x) = (1/2)^(ν/2) x^(ν/2 - 1) e^(-x/2) / Γ(ν/2), x > 0
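
A one-line check of this special case with scipy (λ = 1/2 corresponds to scale = 2):


import numpy as np
from scipy.stats import gamma, chi2

nu = 5
x = np.linspace(0.1, 20, 50)
print(np.allclose(gamma.pdf(x, a=nu/2, scale=2), chi2.pdf(x, df=nu)))   # True
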



4. Exponential distribution

⑴ Overview

① A probability distribution that measures the time elapsed from a designated point until a certain event occurs.

○ In other words, the duration until the first occurrence of the event.

○ Derivation: For an event that occurs λ times per unit time,


P(X > x) = P(no event occurs in [0, x]) = e^(-λx), so F(x) = 1 - e^(-λx) and f(x) = λe^(-λx)


② Special case of the gamma distribution with shape parameter r = 1

③ meaning of parameters

○ β (survival or scale parameter): β = 1 / λ

○ λ (rate parameter): average number of events per unit period

○ cf. Poisson distribution: there the duration is fixed and the number of events is the random variable; here the number of events (the first occurrence) is fixed and the waiting time is the random variable

⑵ probability density function: for x > 0, 


f(x) = λe^(-λx), x > 0




Figure 4. probability density function of exponential distribution


① Python programming: Bokeh is used for web-page visualization 


# see https://www.alphacodingskills.com/scipy/scipy-exponential-distribution.php

import numpy as np
from scipy.stats import expon
from bokeh.plotting import figure, output_file, show

output_file("exponential_distribution.html")
x = np.arange(-1, 10, 0.1)
y = expon.pdf(x, 0, 2)

p = figure(width=400, height=400, title = "Exponential Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2, legend_label = 'loc=0, scale=2')

show(p)


⑶ statistics

① moment generating function


M(t) = λ / (λ - t), for t < λ


② average: E(X) = 1 / λ

○ meaning: intuitively, if events occur λ times per unit time on average, the expected waiting time until the next event is 1 / λ


image


③ variance: VAR(X) = 1 / λ²


image


⑷ memorylessness

① definition


P(X > s + t | X > s) = P(X > t), for all s, t ≥ 0


② example: if a battery's lifetime follows an exponential distribution, the time it has already been in use does not affect the distribution of its remaining lifetime
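
A simulation sketch of memorylessness, P(X > s + t | X > s) = P(X > t), with arbitrary values of λ, s, t:


import numpy as np

rng = np.random.default_rng(0)
lam, s, t = 0.5, 2.0, 3.0
x = rng.exponential(scale=1/lam, size=2_000_000)

cond = (x > s + t).sum() / (x > s).sum()   # P(X > s + t | X > s)
print(cond, np.exp(-lam * t))              # both ≈ P(X > t) = e^(-λt) ≈ 0.223
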


⑸ Example

Example problems for exponential distribution



5. Beta distribution 

⑴ beta function: for α, β > 0, 


B(α, β) = ∫₀¹ x^(α-1) (1 - x)^(β-1) dx


⑵ beta distribution


f(x) = x^(α-1) (1 - x)^(β-1) / B(α, β), 0 ≤ x ≤ 1



Figure 5. probability density function of beta distribution


① Python programming: Bokeh is used for web-page visualization 


# see https://vitalflux.com/beta-distribution-explained-with-python-examples/
import numpy as np
from scipy.stats import beta
from bokeh.plotting import figure, output_file, show

output_file("beta_distribution.html")
x = np.linspace(0, 1, 100)
y1 = beta.pdf(x, 2, 8)
y2 = beta.pdf(x, 5, 5)
y3 = beta.pdf(x, 8, 2)

p = figure(width=400, height=400, title = "Beta Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'a=2, b=8')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'a=5, b=5')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'a=8, b=2')

show(p)


② E(X) = α ÷ (α + β) 

③ VAR(X) = αβ ÷ ((α + β)²(α + β + 1))

⑶ relationship with the gamma function


B(α, β) = Γ(α)Γ(β) / Γ(α + β)

⑷ characteristic

① commutative law: B(α, β) = B(β, α) 

② equivalent expression


drawing

③ Beta Binomial Distribution

○ The distribution of the number of successes when an event with a beta distribution is repeated several times

○ The beta binomial distribution has greater variance than the binomial distribution
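
A simulation sketch of these two points: draw p ~ Beta(α, β), then a Binomial(n, p) count, and compare its variance with a plain Binomial(n, α/(α+β)):


import numpy as np

rng = np.random.default_rng(0)
a, b, n, size = 2, 8, 20, 1_000_000
p = rng.beta(a, b, size)                           # success probability drawn anew for each trial
beta_binom = rng.binomial(n, p)                    # beta-binomial counts
plain_binom = rng.binomial(n, a / (a + b), size)   # fixed p = E[p] = 0.2

print(beta_binom.var(), plain_binom.var())         # ≈ 8.7 vs ≈ 3.2: the beta-binomial variance is larger
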

⑸ generalized beta distribution



6. Pareto distribution

⑴ simple Pareto distribution

① probability density function: for shape parameter a, 


f(x) = a / x^(a+1), x ≥ 1



Figure 6. probability density function of simple Pareto distribution


○ Python programming: Bokeh is used for web-page visualization   


# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pareto.html

import numpy as np
from scipy.stats import pareto
from bokeh.plotting import figure, output_file, show

output_file("pareto_distribution.html")
x = np.linspace(1, 10, 100)
y1 = pareto.pdf(x, 1)
y2 = pareto.pdf(x, 2)
y3 = pareto.pdf(x, 3)

p = figure(width=400, height=400, title = "Pareto Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'a=1')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'a=2')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'a=3')

show(p)


② cumulative distribution function


F(x) = 1 - x^(-a), x ≥ 1

⑵ generalized Pareto distribution

① probability density function: for scale parameter b,


f(x) = a b^a / x^(a+1), x ≥ b

② cumulative distribution function


F(x) = 1 - (b / x)^a, x ≥ b


7. Logistic distribution

⑴ simple logistic distribution

① probability density function


f(x) = e^(-x) / (1 + e^(-x))²



Figure 7. simple logistic distribution


○ Python programming: Bokeh is used for web-page visualization 


# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.logistic.html

import numpy as np
from scipy.stats import logistic
from bokeh.plotting import figure, output_file, show

output_file("logistic_distribution.html")
x = np.linspace(-10, 10, 200)   # the standard logistic density is symmetric about 0
y = logistic.pdf(x)

p = figure(width=400, height=400, title = "Logistic Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)

show(p)


⑵ generalized logistic distribution

① probability density function


drawing


8. Dirichlet distribution

⑴ Overview

① A multivariate extension of the beta distribution, where each random variable always takes a value between 0 and 1, and their sum must be 1.

② Due to the constraint that the sum of the proportions in the Dirichlet distribution is fixed at 1, optimization using this distribution is somewhat more complex than with other distributions.

③ It is useful because it allows analysis on a simplex (vectors of non-negative proportions that sum to 1).

⑵ Probability density function: For x = (x1, ···, xD) and positive parameters (λ1, ···, λD)


f(x) = Γ(λ1 + ··· + λD) / (Γ(λ1) ··· Γ(λD)) · x1^(λ1-1) ··· xD^(λD-1), where xi ≥ 0 and ∑xi = 1


Figure 8. Dirichlet distribution


⑶ Dirichlet-Multinomial conjugacy


If p ~ Dir(λ1, ···, λD) and x | p ~ Multinomial(n, p), then p | x ~ Dir(λ1 + x1, ···, λD + xD)
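
A minimal numpy sketch: Dirichlet samples lie on the simplex, and the conjugate posterior update simply adds the observed counts to the parameters:


import numpy as np

rng = np.random.default_rng(0)
lam = np.array([2.0, 3.0, 5.0])       # prior Dir(λ1, λ2, λ3)
p = rng.dirichlet(lam, size=4)
print(p.sum(axis=1))                  # every sample sums to 1

counts = np.array([10, 4, 6])         # observed multinomial counts x
print(lam + counts)                   # posterior parameters of Dir(λ + x): [12. 7. 11.]
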



9. Gumbel model

⑴ Gumbel-Softmax

① Let z be a categorical variable with class probabilities π1, π2, ···, πk.

○ e.g., π = [0.2, 0.3, 0.5]

② Categorical samples are encoded as k-dimensional one-hot vectors lying on the (k−1)-dimensional simplex Δ^(k-1).

○ Reason: Since the sum of all probabilities is 1, the degrees of freedom are reduced by 1.

○ e.g., Class 1, 2, and 3 correspond to [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively.

③ Gumbel-Softmax uses softmax to produce continuous outputs, but as τ approaches 0, the Gumbel-Softmax output approaches argmax, yielding a one-hot vector (a code sketch appears at the end of this subsection).


yi = exp((log πi + gi) / τ) / ∑j exp((log πj + gj) / τ), for i = 1, ···, k, where g1, ···, gk are i.i.d. samples from Gumbel(0, 1)


○ The original xi = log πi cannot be reconstructed from y after passing through the softmax function because information is lost, making the inverse transformation impossible.

○ To compensate for this, an equivalent sampling process is defined that subtracts off the last element, (xk + gk) / τ, before the softmax:


스크린샷 2025-07-15 오후 3 05 02
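
A minimal numpy sketch of the Gumbel-Softmax sampling step in ③ (π and τ chosen arbitrarily; as τ → 0 the output approaches a one-hot argmax vector):


import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(pi, tau):
    # g_i ~ Gumbel(0, 1), obtained as -log(-log(U)) with U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(size=len(pi))))
    logits = (np.log(pi) + g) / tau
    y = np.exp(logits - logits.max())     # subtract the max for numerical stability
    return y / y.sum()                    # softmax: a point on the simplex

pi = np.array([0.2, 0.3, 0.5])
print(gumbel_softmax_sample(pi, tau=1.0))    # smooth vector on the simplex
print(gumbel_softmax_sample(pi, tau=0.05))   # nearly one-hot
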


⑵ Gumbel Model

① The probability density of a Gumbel distribution with scale β = 1 and mean μ at z is


f(z, μ) = e^(μ - z - e^(μ - z))


② We first derive the density of the "centered" multivariate Gumbel distribution:


스크린샷 2025-07-15 오후 3 06 49


③ We can now compute the density of this distribution by marginalizing gk:


image


⑶ Categorical Reparameterization with Gumbel-Softmax

① Given samples u1, ···, uk-1 from the centered Gumbel distribution, we can apply a deterministic transformation ℎ to yield the first k−1 coordinates of the sample from the Gumbel-Softmax:


image


② The primary contribution of this work is the reparameterizable Gumbel-Softmax distribution, whose corresponding estimator affords low-variance path-derivative gradients for the categorical distribution.

③ For learning, there is a tradeoff between small temperatures, where samples are close to one-hot but the variance of the gradients is large, and large temperatures, where samples are smooth but the variance of the gradients is small. In practice, we start at a high temperature and anneal to a small but non-zero temperature.

④ Gumbel-Softmax allows us to backpropagate through y ~ qϕ(y | x) for single-sample gradient estimation, and achieves a cost of O(D + I + G) per training step (a dramatic speedup), where D, I, and G are the computational costs of sampling from qϕ(y | x), qϕ(z | x, y), and pϕ(x | y, z).

⑤ Gumbel-Softmax and ST Gumbel-Softmax outperform existing stochastic gradient estimators: Score-Function (SF), DARN, MuProp, Straight-Through (ST), and Slope-Annealed ST.



Posted: 2019.06.19 00:27
