
Chapter 7. Continuous Probability Distribution

Higher category: 【Statistics】 Statistics Overview


1. Uniform distribution

2. Normal distribution 

3. Gamma distribution 

4. Exponential distribution

5. Beta distribution 

6. Pareto distribution 

7. Logistic distribution

8. Dirichlet distribution

9. Gumbel model





1. Uniform distribution

⑴ definition: a probability distribution whose probability density is constant over the entire range of the random variable

⑵ probability density function: if X ~ U[a, b], then p(x) = 1 / (b - a) · I{a ≤ x ≤ b}




Figure 1. graph of p(x) versus x for X ~ U[1, 9]


① Python programming: Bokeh is used for web-page visualization 


from bokeh.plotting import figure, output_file, show

output_file("uniform_distribution.html")
p = figure(width=400, height=400, title = "Uniform Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
# the density of U[1, 9] is constant at 1 / (b - a) = 1/8 over [1, 9]
p.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8], 
       line_width=2)
show(p)


⑶ statistics

① moment generating function


M(t) = E(e^(tX)) = (e^(bt) - e^(at)) / ((b - a)t) for t ≠ 0, and M(0) = 1


② average: E(X) = (a + b) / 2


image


③ variance: VAR(X) = (b - a)² / 12


image


④ for a joint uniform distribution over a region, the marginal density at a given value equals the length of the cross-section there divided by the total area of the region
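
As a quick numerical check of ② and ③ above, a sketch assuming scipy (note that scipy.stats.uniform is parametrized as loc = a, scale = b - a):


import numpy as np
from scipy.stats import uniform

a, b = 1, 9
U = uniform(loc=a, scale=b - a)        # scipy convention: loc = a, scale = b - a
print(U.mean(), (a + b) / 2)           # both 5.0
print(U.var(), (b - a)**2 / 12)        # both ≈ 5.333
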

⑷ Example

Example problems for uniform distribution

Example problems for joint uniform distribution



2. Normal distribution 

⑴ definition: the limiting distribution of the binomial probability nCx θ^x (1 - θ)^(n-x) as n → ∞

① because it is observed so universally, it is called the normal distribution

② generally, the standard normal distribution density function is expressed as φ(·) and the cumulative distribution function as Φ(·)

③ central limit theorem: if X = ∑Xi is a sum of independent, identically distributed random variables, the distribution of X (suitably standardized) approaches a normal distribution as n → ∞

④ first derived as an approximation to the binomial distribution (De Moivre, 1721); a numerical sketch of this approximation appears after this list

⑤ used to analyze errors of astronomical observations (Gauss, 1809)

○ for this reason, it is also known as the Gaussian distribution
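
A small sketch (assuming scipy) of the binomial-to-normal approximation noted in ④, comparing Binomial(n, θ) probabilities with the normal density N(nθ, nθ(1 - θ)):


import numpy as np
from scipy.stats import binom, norm

n, theta = 100, 0.3
k = np.arange(10, 51)
exact = binom.pmf(k, n, theta)
approx = norm.pdf(k, loc=n * theta, scale=np.sqrt(n * theta * (1 - theta)))
print(np.abs(exact - approx).max())    # small already for n = 100, and shrinks as n grows
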

⑵ probability density function


f(x) = (1 / (σ√(2π))) exp(-(x - μ)² / (2σ²)), -∞ < x < ∞




Figure 2. probability density function of standard normal distribution


① Python programming: Bokeh is used for web-page visualization 


# see https://stackoverflow.com/questions/10138085/how-to-plot-normal-distribution
import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show

output_file("normal_distribution.html")
x = np.linspace(-3, 3, 100)
y = stats.norm.pdf(x, 0, 1)

p = figure(width=400, height=400, title = "Normal Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)
show(p)


⑶ statistics

① moment generating function


M(t) = exp(μt + σ²t²/2)


② average: E(X) = μ


image


③ variance: VAR(X) = σ²


image


⑷ characteristic

characteristic 1. symmetric around μ

characteristic 2. if X ~ N(μ, σ²), then Y = aX + b ~ N(aμ + b, a²σ²)


image


characteristic 3. if Xi ~ N(μi, σi²) are independent, then X = ∑Xi ~ N(∑μi, ∑σi²)

characteristic 4. uncorrelatedness: if X and Y are jointly normal and uncorrelated, X and Y are independent

⑸ standard normal distribution 

① definition: a normal distribution with a mean of 0 and a standard deviation of 1

② standardization: if X ~ N(μ, σ²), then Z = (X - μ) / σ ~ N(0, 1)

③ cumulative distribution function Φ(z) of the standard normal distribution 


Φ(z) = P(Z ≤ z) = ∫ from -∞ to z of (1/√(2π)) e^(-t²/2) dt


④ z_α: the value such that the probability of exceeding it is α, i.e. P(Z > z_α) = α
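
For example, tail probabilities and z_α values can be computed from Φ with scipy (a minimal sketch; the parameter values below are illustrative):


from scipy.stats import norm

mu, sigma = 10, 2
z = (13 - mu) / sigma           # standardization: Z = (X - mu) / sigma
print(1 - norm.cdf(z))          # P(X > 13) = P(Z > 1.5) ≈ 0.0668

alpha = 0.05
print(norm.ppf(1 - alpha))      # z_0.05 ≈ 1.645, i.e. P(Z > 1.645) = 0.05
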

⑹ normal distribution table 


image

Table 1. normal distribution table


⑺ Example

Example problems for normal distribution

Example problems for central limit theorem

Application 1. Log-Normal Distribution

① Definition: The distribution of a random variable whose logarithm follows a normal distribution. In other words, the random variable itself is an exponential function where the exponent is a normally distributed random variable.

② Mathematical Representation: If ln X ~ N(μ, σ²), then

○ E[X] = exp(μ + σ²/2) (∵ derived from the moment-generating function of the normal distribution)

○ E[X²] = exp(2μ + 2σ²) (∵ derived from the moment-generating function)

○ Var(X) = E[X²] - (E[X])²

○ By the central limit theorem, the sample mean X̄ approximately follows a normal distribution with mean exp(μ + σ²/2) and variance Var(X) / n.

③ Example: In sequencing data, count values per sample/cell/spot often follow a log-normal distribution.
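
A minimal simulation sketch of ② (arbitrary parameter values, assuming numpy):


import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
x = np.exp(rng.normal(mu, sigma, size=1_000_000))    # ln X ~ N(mu, sigma^2)

print(x.mean(), np.exp(mu + sigma**2 / 2))            # E[X], both ≈ 3.08
print(x.var(), np.exp(2*mu + 2*sigma**2) - np.exp(2*mu + sigma**2))   # Var(X) = E[X²] - (E[X])²
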

Application 2. Cauchy Distribution

① Definition: the distribution of the ratio X1 / X2 of two independent standard normal random variables.

Application 3. Rayleigh Distribution

① Definition: the distribution of the instantaneous value of the envelope of a zero-mean, narrowband noise signal.

② If X and Y are independent random variables following N(0, σ²), then (X² + Y²)^(1/2) follows a Rayleigh distribution with scale parameter σ (see the simulation sketch below).

③ Mathematical formulation

○ Probability density function: f(x) = (x / σ²) exp(-x² / (2σ²)), x ≥ 0

○ Cumulative distribution function: F(x) = 1 - exp(-x² / (2σ²)), x ≥ 0

○ Mean and variance: E(X) = σ√(π/2), VAR(X) = (2 - π/2)σ²
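
A simulation sketch of ②, comparing the simulated envelope with scipy.stats.rayleigh (scale = σ):


import numpy as np
from scipy.stats import rayleigh

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0, sigma, 1_000_000)
y = rng.normal(0, sigma, 1_000_000)
r = np.sqrt(x**2 + y**2)                      # envelope of two independent N(0, σ²) components

print(r.mean(), rayleigh.mean(scale=sigma))   # both ≈ σ√(π/2) ≈ 2.507
print(r.var(), rayleigh.var(scale=sigma))     # both ≈ (2 - π/2)σ² ≈ 1.717
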



3. Gamma distribution 

⑴ gamma function

definition 1. for x > 0, 


Γ(x) = ∫₀^∞ t^(x-1) e^(-t) dt


definition 2. 


image


③ characteristic

○ Γ(-3/2) = 4/3 √π

○ Γ(-1/2) = -2 √π 

○ Γ(1/2) = √π 

○ Γ(1) = 1

○ Γ(3/2) = 1/2 √π

○ Γ(a + 1) = aΓ(a)

○ Γ(n + 1) = n! 
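
These values can be verified directly with scipy.special.gamma (a quick sketch):


import math
from scipy.special import gamma

print(gamma(0.5), math.sqrt(math.pi))           # Γ(1/2) = √π ≈ 1.7725
print(gamma(-0.5), -2 * math.sqrt(math.pi))     # Γ(-1/2) = -2√π
print(gamma(-1.5), 4/3 * math.sqrt(math.pi))    # Γ(-3/2) = (4/3)√π
print(gamma(1.5), 0.5 * math.sqrt(math.pi))     # Γ(3/2) = (1/2)√π
print(gamma(6), math.factorial(5))              # Γ(n + 1) = n!
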

⑵ gamma distribution

① probability density function: for x, r, λ > 0, 


f(x) = (λ^r / Γ(r)) x^(r-1) e^(-λx), x > 0




Figure 3. probability density function of gamma distribution


○ Python programming: Bokeh is used for web-page visualization 


# see https://www.statology.org/gamma-distribution-in-python/

import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show

output_file("gamma_distribution.html")
x = np.linspace(0, 40, 100)
y1 = stats.gamma.pdf(x, a = 5, scale = 3)
y2 = stats.gamma.pdf(x, a = 2, scale = 5)
y3 = stats.gamma.pdf(x, a = 4, scale = 2)

p = figure(width=400, height=400, title = "Gamma Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'shape=5, scale=3')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'shape=2, scale=5')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'shape=4, scale=2')

show(p)


② meaning

○ the probability distribution of time until the r-th event occurs

○ r (shape parameter): the number of events being waited for

○ λ (rate parameter): the average number of events per unit period

○ β (scale parameter): β = 1 / λ
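
As a sketch of this interpretation: the waiting time until the r-th event of a rate-λ Poisson process is a sum of r independent Exponential(λ) waiting times, and its distribution matches Gamma(r, λ) (scipy uses scale = 1/λ):


import numpy as np
from scipy.stats import gamma, kstest

rng = np.random.default_rng(0)
r, lam = 4, 2.0
# time until the 4th event = sum of 4 independent Exponential(λ) inter-arrival times
waits = rng.exponential(scale=1/lam, size=(200_000, r)).sum(axis=1)

print(kstest(waits, gamma(a=r, scale=1/lam).cdf).statistic)   # close to 0: the two distributions agree
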

⑶ statistics

① moment generating function


M(t) = (λ / (λ - t))^r, for t < λ


② average: E(X) = r / λ 


image


③ variance: VAR(X) = r / λ²


image


⑷ relationship with different probability distributions

①  binomial distribution


image


② negative binomial distribution 


image


③ beta distribution


image


④ Chi-squared distribution: When λ = 1/2 and r = ν/2, a chi-squared distribution with ν degrees of freedom is obtained.


f(x) = (1/2)^(ν/2) x^(ν/2 - 1) e^(-x/2) / Γ(ν/2), x > 0
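
A one-line check of this special case with scipy (λ = 1/2 corresponds to scale = 2):


import numpy as np
from scipy.stats import gamma, chi2

nu = 5
x = np.linspace(0.1, 20, 50)
print(np.allclose(gamma.pdf(x, a=nu/2, scale=2), chi2.pdf(x, df=nu)))   # True
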



4. Exponential distribution

⑴ Overview

① A probability distribution that measures the time elapsed from a designated point until a certain event occurs.

○ In other words, the duration until the first occurrence of the event.

○ Derivation: For an event that occurs λ times per unit time,


P(X > x) = P(no event occurs in [0, x]) = e^(-λx), so F(x) = 1 - e^(-λx) and f(x) = λe^(-λx)


② Special case of the gamma distribution with shape parameter r = 1

③ meaning of parameters

○ β (survival or scale parameter): β = 1 / λ

○ λ (rate parameter): average number of events per unit period

○ cf. Poisson distribution: there the duration is fixed and the number of events is the random variable; here the number of events (the first occurrence) is fixed and the waiting time is the random variable

⑵ probability density function: for x > 0, 


f(x) = λe^(-λx), x > 0




Figure 4. probability density function of exponential distribution


① Python programming: Bokeh is used for web-page visualization 


# see https://www.alphacodingskills.com/scipy/scipy-exponential-distribution.php

import numpy as np
from scipy.stats import expon
from bokeh.plotting import figure, output_file, show

output_file("exponential_distribution.html")
x = np.arange(-1, 10, 0.1)
y = expon.pdf(x, 0, 2)

p = figure(width=400, height=400, title = "Exponential Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2, legend_label = 'loc=0, scale=2')

show(p)


⑶ statistics

① moment generating function


M(t) = λ / (λ - t), for t < λ


② average: E(X) = 1 / λ

○ meaning: intuitively, if events occur λ times per unit time on average, the expected waiting time until the next event is 1 / λ


image


③ variance: VAR(X) = 1 / λ²


image


⑷ memorylessness

① definition


P(X > s + t | X > s) = P(X > t), for all s, t ≥ 0


② example: if a battery's lifetime follows an exponential distribution, the time it has already been in use does not affect the distribution of its remaining lifetime
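
A simulation sketch of memorylessness, P(X > s + t | X > s) = P(X > t), with arbitrary values of λ, s, t:


import numpy as np

rng = np.random.default_rng(0)
lam, s, t = 0.5, 2.0, 3.0
x = rng.exponential(scale=1/lam, size=2_000_000)

cond = (x > s + t).sum() / (x > s).sum()   # P(X > s + t | X > s)
print(cond, np.exp(-lam * t))              # both ≈ P(X > t) = e^(-λt) ≈ 0.223
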


⑸ Example

Example problems for exponential distribution



5. Beta distribution 

⑴ beta function: for α, β > 0, 


B(α, β) = ∫₀¹ x^(α-1) (1 - x)^(β-1) dx


⑵ beta distribution


f(x) = x^(α-1) (1 - x)^(β-1) / B(α, β), 0 ≤ x ≤ 1



Figure 5. probability density function of beta distribution


① Python programming: Bokeh is used for web-page visualization 


# see https://vitalflux.com/beta-distribution-explained-with-python-examples/
import numpy as np
from scipy.stats import beta
from bokeh.plotting import figure, output_file, show

output_file("beta_distribution.html")
x = np.linspace(0, 1, 100)
y1 = beta.pdf(x, 2, 8)
y2 = beta.pdf(x, 5, 5)
y3 = beta.pdf(x, 8, 2)

p = figure(width=400, height=400, title = "Beta Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'a=2, b=8')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'a=5, b=5')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'a=8, b=2')

show(p)


② E(X) = α ÷ (α + β) 

③ VAR(X) = αβ ÷ ((α + β)²(α + β + 1))

⑶ relationship with the gamma function


B(α, β) = Γ(α)Γ(β) / Γ(α + β)

⑷ characteristic

① commutative law: B(α, β) = B(β, α) 

② equivalent expression


drawing

③ Beta Binomial Distribution

○ The distribution of the number of successes when an event with a beta distribution is repeated several times

○ The beta binomial distribution has greater variance than the binomial distribution
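
A simulation sketch of these two points: draw p ~ Beta(α, β), then a Binomial(n, p) count, and compare its variance with a plain Binomial(n, α/(α+β)):


import numpy as np

rng = np.random.default_rng(0)
a, b, n, size = 2, 8, 20, 1_000_000
p = rng.beta(a, b, size)                           # success probability drawn anew for each trial
beta_binom = rng.binomial(n, p)                    # beta-binomial counts
plain_binom = rng.binomial(n, a / (a + b), size)   # fixed p = E[p] = 0.2

print(beta_binom.var(), plain_binom.var())         # ≈ 8.7 vs ≈ 3.2: the beta-binomial variance is larger
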

⑸ generalized beta distribution



6. Pareto distribution

⑴ simple Pareto distribution

① probability density function: for shape parameter a, 


f(x) = a / x^(a+1), x ≥ 1



Figure 6. probability density function of simple Pareto distribution


○ Python programming: Bokeh is used for web-page visualization   


# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pareto.html

import numpy as np
from scipy.stats import pareto
from bokeh.plotting import figure, output_file, show

output_file("pareto_distribution.html")
x = np.linspace(1, 10, 100)
y1 = pareto.pdf(x, 1)
y2 = pareto.pdf(x, 2)
y3 = pareto.pdf(x, 3)

p = figure(width=400, height=400, title = "Pareto Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'a=1')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'a=2')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'a=3')

show(p)


② cumulative distribution function


F(x) = 1 - x^(-a), x ≥ 1

⑵ generalized Pareto distribution

① probability density function: for scale parameter b,


f(x) = a b^a / x^(a+1), x ≥ b

② cumulative distribution function


F(x) = 1 - (b / x)^a, x ≥ b


7. Logistic distribution

⑴ simple logistic distribution

① probability density function


f(x) = e^(-x) / (1 + e^(-x))²



Figure 7. simple logistic distribution


○ Python programming: Bokeh is used for web-page visualization 


# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.logistic.html

import numpy as np
from scipy.stats import logistic
from bokeh.plotting import figure, output_file, show

output_file("logistic_distribution.html")
x = np.linspace(-10, 10, 200)   # the standard logistic density is symmetric about 0
y = logistic.pdf(x)

p = figure(width=400, height=400, title = "Logistic Distribution", 
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)

show(p)


⑵ generalized logistic distribution

① probability density function


drawing


8. Dirichlet distribution

⑴ Overview

① A multivariate extension of the beta distribution, where each random variable always takes a value between 0 and 1, and their sum must be 1.

② Due to the constraint that the sum of the proportions in the Dirichlet distribution is fixed at 1, optimization using this distribution is somewhat more complex than with other distributions.

③ It is useful because it allows analysis on a simplex (vectors of non-negative proportions that sum to 1).

⑵ Probability density function: For x = (x1, ···, xD) and positive parameters (λ1, ···, λD)


f(x) = Γ(λ1 + ··· + λD) / (Γ(λ1) ··· Γ(λD)) · x1^(λ1-1) ··· xD^(λD-1), where xi ≥ 0 and ∑xi = 1


Figure 8. Dirichlet distribution


⑶ Dirichlet-Multinomial conjugacy


If p ~ Dir(λ1, ···, λD) and x | p ~ Multinomial(n, p), then p | x ~ Dir(λ1 + x1, ···, λD + xD)
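
A minimal numpy sketch: Dirichlet samples lie on the simplex, and the conjugate posterior update simply adds the observed counts to the parameters:


import numpy as np

rng = np.random.default_rng(0)
lam = np.array([2.0, 3.0, 5.0])       # prior Dir(λ1, λ2, λ3)
p = rng.dirichlet(lam, size=4)
print(p.sum(axis=1))                  # every sample sums to 1

counts = np.array([10, 4, 6])         # observed multinomial counts x
print(lam + counts)                   # posterior parameters of Dir(λ + x): [12. 7. 11.]
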



9. Gumbel model

⑴ Gumbel-Softmax

① Let z be a categorical variable with class probabilities π1, π2, ···, πk.

○ e.g., π = [0.2, 0.3, 0.5]

② Categorical samples are encoded as k-dimensional one-hot vectors lying on the (k−1)-dimensional simplex Δ^(k-1).

○ Reason: Since the sum of all probabilities is 1, the degrees of freedom are reduced by 1.

○ e.g., Class 1, 2, and 3 correspond to [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively.

③ Gumbel-Softmax uses softmax to produce continuous outputs, but as τ approaches 0, the Gumbel-Softmax output approaches argmax, yielding a one-hot vector (a code sketch appears at the end of this subsection).


yi = exp((log πi + gi) / τ) / ∑j exp((log πj + gj) / τ), for i = 1, ···, k, where g1, ···, gk are i.i.d. samples from Gumbel(0, 1)


○ The original xi = log πi cannot be reconstructed from y after passing through the softmax function because information is lost, making the inverse transformation impossible.

○ To compensate for this, an equivalent sampling process is defined that subtracts off the last element, (xk + gk) / τ, before the softmax:


스크린샷 2025-07-15 오후 3 05 02
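
A minimal numpy sketch of the Gumbel-Softmax sampling step in ③ (π and τ chosen arbitrarily; as τ → 0 the output approaches a one-hot argmax vector):


import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(pi, tau):
    # g_i ~ Gumbel(0, 1), obtained as -log(-log(U)) with U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(size=len(pi))))
    logits = (np.log(pi) + g) / tau
    y = np.exp(logits - logits.max())     # subtract the max for numerical stability
    return y / y.sum()                    # softmax: a point on the simplex

pi = np.array([0.2, 0.3, 0.5])
print(gumbel_softmax_sample(pi, tau=1.0))    # smooth vector on the simplex
print(gumbel_softmax_sample(pi, tau=0.05))   # nearly one-hot
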


⑵ Gumbel Model

① The probability density of a Gumbel distribution with scale β = 1 and mean μ at z is


f(z, μ) = e^(μ - z - e^(μ - z))


② We first derive the density of the "centered" multivariate Gumbel distribution:


스크린샷 2025-07-15 오후 3 06 49


③ We can now compute the density of this distribution by marginalizing gk:


image


⑶ Categorical Reparameterization with Gumbel-Softmax

① Given samples u1, ···, uk-1 from the centered Gumbel distribution, we can apply a deterministic transformation ℎ to yield the first k−1 coordinates of the sample from the Gumbel-Softmax:


image


② The primary contribution of this work is the reparameterizable Gumbel-Softmax distribution, whose corresponding estimator affords low-variance path-derivative gradients for the categorical distribution.

③ For learning, there is a tradeoff between small temperatures, where samples are close to one-hot but the variance of the gradients is large, and large temperatures, where samples are smooth but the variance of the gradients is small. In practice, we start at a high temperature and anneal to a small but non-zero temperature.

④ Gumbel-Softmax allows us to backpropagate through y ~ qϕ(y | x) for single-sample gradient estimation, and achieves a cost of O(D + I + G) per training step (a dramatic speedup), where D, I, and G are the computational costs of sampling from qϕ(y | x), qϕ(z | x, y), and pϕ(x | y, z).

⑤ Gumbel-Softmax and ST Gumbel-Softmax outperform existing stochastic gradient estimators: Score-Function (SF), DARN, MuProp, Straight-Through (ST), and Slope-Annealed ST.



Posted: 2019.06.19 00:27
