Chapter 7. Continuous Probability Distributions
a. Q-Q plot
1. uniform distribution
⑴ definition: a probability distribution whose density is constant over an interval [a, b] and zero outside it, so all subintervals of the same length have the same probability
⑵ probability density function: X ~ u[a, b], p(x) = 1 / (b - a) I{a ≤ x ≤ b}
Figure 1. graph of x - p(x) on X ~ u[1, 9]
① Python programming: Bokeh is used for web-page visualization
from bokeh.plotting import figure, output_file, show
output_file("uniform_distribution.html")
p = figure(width=400, height=400, title = "Uniform Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8],
line_width=2)
show(p)
⑶ statistics
① moment generating function: M(t) = E[e^(tX)] = (e^(tb) - e^(ta)) / (t(b - a)) for t ≠ 0, and M(0) = 1
② average: E(X) = (a + b) / 2
③ variance: VAR(X) = (b - a)² / 12
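④ marginal distribution: for a uniform distribution over a region in the plane, the marginal density at x equals the length of the region's cross-section at x divided by the total area of the region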
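As a quick numerical check of the mean and variance formulas, the sketch below compares them with scipy.stats.uniform for the X ~ u[1, 9] example of Figure 1. It assumes SciPy's parametrization, in which u[a, b] is written as loc = a, scale = b - a; the helper name uniform_moments is illustrative only.

```python
from scipy import stats


def uniform_moments(a, b):
    """Mean and variance of u[a, b] from the closed-form formulas."""
    mean = (a + b) / 2
    var = (b - a) ** 2 / 12
    return mean, var


a, b = 1, 9
# SciPy's uniform covers [loc, loc + scale], so u[a, b] is loc=a, scale=b - a
dist = stats.uniform(loc=a, scale=b - a)

print(uniform_moments(a, b))
print(dist.mean(), dist.var())
```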
⑷ Example
2. Normal distribution
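⑴ definition: the limiting form of the binomial probability nCx θ^x (1 - θ)^(n-x) as n → ∞, after standardizing x by its mean nθ and standard deviation √(nθ(1 - θ)) (de Moivre-Laplace theorem)
① because it is observed so widely, it is called the "normal" distribution
② the density function of the standard normal distribution is usually written φ(·) and its cumulative distribution function Φ(·)
③ central limit theorem: if X1, …, Xn are i.i.d. with mean μ and finite variance σ² and X = ∑Xi, then (X - nμ) / (σ√n) converges in distribution to the standard normal distribution as n → ∞
④ first derived as an approximation to the binomial distribution (De Moivre, 1721)
⑤ used to analyze measurement errors in astronomy (Gauss, 1809)
○ for this reason, it is also known as the Gaussian distribution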
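To see the binomial-to-normal limit in numbers, the sketch below compares the binomial probabilities nCx θ^x (1 - θ)^(n-x) with the normal density that has the same mean nθ and variance nθ(1 - θ). The values n = 1000 and θ = 0.3 are arbitrary choices for illustration; for large n the largest pointwise gap is small compared with the peak probability.

```python
import numpy as np
from scipy import stats

n, theta = 1000, 0.3
mu = n * theta                            # binomial mean
sigma = np.sqrt(n * theta * (1 - theta))  # binomial standard deviation

k = np.arange(250, 351)
binom_pmf = stats.binom.pmf(k, n, theta)
normal_pdf = stats.norm.pdf(k, loc=mu, scale=sigma)

max_abs_error = np.max(np.abs(binom_pmf - normal_pdf))
print(binom_pmf.max(), max_abs_error)
```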
⑵ probability density function: for X ~ N(μ, σ²), p(x) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²))
Figure 2. probability density function of standard normal distribution
① Python programming: Bokeh is used for web-page visualization
# see https://stackoverflow.com/questions/10138085/how-to-plot-normal-distribution
import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show
output_file("normal_distribution.html")
x = np.linspace(-3, 3, 100)
y = stats.norm.pdf(x, 0, 1)
p = figure(width=400, height=400, title = "Normal Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)
show(p)
⑶ statistics
① moment generating function: M(t) = exp(μt + σ²t² / 2)
② average: E(X) = μ
③ variance: VAR(X) = σ²
⑷ characteristic
① characteristic 1. symmetric around μ
② characteristic 2. if X ~ N(μ, σ²) and a ≠ 0, then Y = aX + b ~ N(aμ + b, a²σ²)
③ characteristic 3. if the Xi are independent and Xi ~ N(μi, σi²), then X = ∑Xi ~ N(∑μi, ∑σi²)
④ characteristic 4. uncorrelatedness: if X and Y are jointly normal and uncorrelated, X and Y are independent
⑸ standard normal distribution
① definition: a normal distribution with a mean of 0 and a standard deviation of 1
② standardization: if X ~ N(μ, σ²), then Z = (X - μ) / σ ~ N(0, 1)
③ cumulative distribution function of the standard normal distribution: Φ(z) = P(Z ≤ z), the area under φ to the left of z
④ zα: the value satisfying P(Z > zα) = α, i.e. Φ(zα) = 1 - α
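The sketch below computes zα with SciPy and checks standardization. stats.norm.ppf is the inverse of Φ, so zα = Φ⁻¹(1 - α). The N(100, 15²) example and the value x = 110 are arbitrary illustrations, not taken from the text.

```python
from scipy import stats


def z_alpha(alpha):
    """Upper-tail critical value: P(Z > z_alpha) = alpha, i.e. Phi(z_alpha) = 1 - alpha."""
    return stats.norm.ppf(1 - alpha)  # stats.norm.isf(alpha) gives the same value


for alpha in (0.10, 0.05, 0.025, 0.01):
    print(alpha, round(z_alpha(alpha), 3))

# Standardization: P(X <= x) for X ~ N(mu, sigma^2) equals Phi((x - mu) / sigma)
mu, sigma, x = 100.0, 15.0, 110.0
z = (x - mu) / sigma
p_standardized = stats.norm.cdf(z)
p_direct = stats.norm.cdf(x, loc=mu, scale=sigma)
print(p_standardized, p_direct)
```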
⑹ normal distribution table
Table 1. normal distribution table
⑺ Example
⑻ Application 1. Log-Normal Distribution
① Definition: the distribution of a random variable whose logarithm follows a normal distribution; equivalently, X = e^Y where Y is a normally distributed random variable
② Mathematical Representation: if ln X ~ N(μ, σ²), then
○ E[X] = exp(μ + σ² / 2) (∵ evaluate the moment generating function of ln X at t = 1)
○ E[X²] = exp(2μ + 2σ²) (∵ evaluate the moment generating function of ln X at t = 2)
○ Var(X) = E[X²] - (E[X])² = (exp(σ²) - 1) exp(2μ + σ²)
○ by the central limit theorem, for large n the sample mean X̄ approximately follows a normal distribution with mean exp(μ + σ² / 2) and variance Var(X) / n
③ Example: In sequencing data, count values per sample/cell/spot often follow a log-normal distribution.
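The log-normal moment formulas can be checked against scipy.stats.lognorm, which by SciPy's convention takes s = σ and scale = exp(μ) when ln X ~ N(μ, σ²). The Monte Carlo part illustrates the approximate normality of X̄ by comparing the spread of many sample means with Var(X) / n; μ = 1, σ = 0.5, n = 200 and the seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.5
# SciPy convention for ln X ~ N(mu, sigma^2): s = sigma, scale = exp(mu)
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

mean_formula = np.exp(mu + sigma ** 2 / 2)
second_moment = np.exp(2 * mu + 2 * sigma ** 2)
var_formula = second_moment - mean_formula ** 2

print(dist.mean(), mean_formula)
print(dist.var(), var_formula)

# Monte Carlo: sample means of n draws, repeated reps times
rng = np.random.default_rng(0)
n, reps = 200, 2000
sample_means = rng.lognormal(mean=mu, sigma=sigma, size=(reps, n)).mean(axis=1)
print(sample_means.mean(), mean_formula)
print(sample_means.var(), var_formula / n)
```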
⑼ Application 2. Cauchy Distribution
① Definition: the distribution of the ratio X1 / X2 of two independent standard normal random variables X1 and X2; this ratio follows the standard Cauchy distribution
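A quick simulation of the ratio definition: if X1 and X2 are independent standard normals, the quartiles of X1 / X2 should be close to those of the standard Cauchy distribution, namely -1, 0 and 1. Quartiles are compared instead of the sample mean because the Cauchy distribution has no mean. The sample size and seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x1 = rng.standard_normal(200_000)
x2 = rng.standard_normal(200_000)
ratio = x1 / x2  # X1 / X2 with X1, X2 independent N(0, 1)

probs = [0.25, 0.5, 0.75]
empirical_q = np.quantile(ratio, probs)
cauchy_q = stats.cauchy.ppf(probs)  # standard Cauchy quartiles: -1, 0, 1
print(empirical_q, cauchy_q)
```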
3. gamma distribution
⑴ gamma function
① definition 1. for x > 0, Γ(x) = ∫ from 0 to ∞ of t^(x-1) e^(-t) dt
② definition 2.
③ characteristic
○ Γ(-3/2) = 4/3 √π
○ Γ(-1/2) = -2 √π
○ Γ(1/2) = √π
○ Γ(1) = 1
○ Γ(3/2) = 1/2 √π
○ Γ(a + 1) = aΓ(a)
○ Γ(n + 1) = n!
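The listed gamma-function values, the recursion, and definition 1 can be checked with Python's math.gamma (which accepts negative non-integer arguments) and SciPy's quad integrator. The test points a = 2.7, n = 5 and x = 3.5 are arbitrary.

```python
import math

from scipy import integrate

sqrt_pi = math.sqrt(math.pi)

# each entry: (value computed by math.gamma, value stated in the text)
checks = {
    "Γ(-3/2) = 4/3 √π": (math.gamma(-1.5), 4 / 3 * sqrt_pi),
    "Γ(-1/2) = -2 √π": (math.gamma(-0.5), -2 * sqrt_pi),
    "Γ(1/2) = √π": (math.gamma(0.5), sqrt_pi),
    "Γ(1) = 1": (math.gamma(1.0), 1.0),
    "Γ(3/2) = 1/2 √π": (math.gamma(1.5), 0.5 * sqrt_pi),
    "Γ(a + 1) = aΓ(a), a = 2.7": (math.gamma(3.7), 2.7 * math.gamma(2.7)),
    "Γ(n + 1) = n!, n = 5": (math.gamma(6), float(math.factorial(5))),
}
for label, (computed, stated) in checks.items():
    print(label, math.isclose(computed, stated, rel_tol=1e-12))

# definition 1: Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt, evaluated numerically at x = 3.5
x = 3.5
integral, _ = integrate.quad(lambda t: t ** (x - 1) * math.exp(-t), 0, math.inf)
print(integral, math.gamma(x))
```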
⑵ gamma distribution
① probability density function: for x, r, λ > 0, p(x) = λ^r x^(r-1) e^(-λx) / Γ(r)
Figure 3. probability density function of gamma distribution
○ Python programming: Bokeh is used for web-page visualization
# see https://www.statology.org/gamma-distribution-in-python/
import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show
output_file("gamma_distribution.html")
x = np.linspace(0, 40, 100)
y1 = stats.gamma.pdf(x, a = 5, scale = 3)
y2 = stats.gamma.pdf(x, a = 2, scale = 5)
y3 = stats.gamma.pdf(x, a = 4, scale = 2)
p = figure(width=400, height=400, title = "Gamma Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'shape=5, scale=3')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'shape=2, scale=5')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'shape=4, scale=2')
show(p)
② meaning
○ the probability distribution of the waiting time until the r-th event occurs in a Poisson process with rate λ (r a positive integer)
○ r (shape parameter)
○ λ (rate parameter): the average number of events per unit period
○ β (scale parameter): β = 1 / λ
⑶ statistics
① moment generating function: M(t) = (λ / (λ - t))^r for t < λ
② average: E(X) = r / λ
③ variance: VAR(X) = r / λ²
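The mean and variance formulas, and the "time until the r-th event" interpretation, can be checked numerically. The sketch assumes SciPy's convention stats.gamma(a=r, scale=1/λ) and builds the r-th event time as a sum of r independent exponential waiting times with rate λ; r = 4, λ = 0.5 and the seed are arbitrary.

```python
import numpy as np
from scipy import stats

r, lam = 4, 0.5
# SciPy: shape a = r, scale = 1 / rate
dist = stats.gamma(a=r, scale=1 / lam)

print(dist.mean(), r / lam)       # E(X) = r / λ
print(dist.var(), r / lam ** 2)   # VAR(X) = r / λ²

# time of the r-th event = sum of r independent Exp(λ) inter-arrival times
rng = np.random.default_rng(1)
arrival_r = rng.exponential(scale=1 / lam, size=(100_000, r)).sum(axis=1)
print(arrival_r.mean(), arrival_r.var(), np.median(arrival_r), dist.median())
```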
⑷ relationship with different probability distributions
① binomial distribution
② negative binomial distribution
③ beta distribution
4. Exponential distribution
⑴ Overview
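① definition: the special case of the gamma distribution with shape parameter r = 1
○ that is, the distribution of the waiting time until the first event occurs
② meaning of parameters
○ β (survival parameter, i.e. the scale parameter): β = 1 / λ
○ λ (rate parameter): average number of events per unit period
③ relationship with the Poisson distribution: in the Poisson distribution the period is fixed and the number of events is the random variable, whereas in the exponential distribution the waiting time until an event is the random variable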
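The relationship with the Poisson distribution can be illustrated by simulation: if the gaps between events are independent Exp(λ) waiting times, the number of events in a fixed unit period should behave like a Poisson(λ) count, with mean and variance both near λ and P(0 events) near e^(-λ). The helper count_events is hypothetical, written only for this illustration; λ = 3 and the seed are arbitrary.

```python
import numpy as np


def count_events(rate, horizon, rng):
    """Count events in [0, horizon] when the gaps between events are Exp(rate)."""
    t, count = 0.0, 0
    while True:
        t += rng.exponential(scale=1 / rate)  # waiting time until the next event
        if t > horizon:
            return count
        count += 1


rng = np.random.default_rng(7)
lam = 3.0
counts = np.array([count_events(lam, 1.0, rng) for _ in range(20_000)])
print(counts.mean(), counts.var())
print(np.mean(counts == 0), np.exp(-lam))
```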
⑵ probability density function: for x > 0, p(x) = λe^(-λx)
Figure 4. probability density function of exponential distribution
① Python programming: Bokeh is used for web-page visualization
# see https://www.alphacodingskills.com/scipy/scipy-exponential-distribution.php
import numpy as np
from scipy.stats import expon
from bokeh.plotting import figure, output_file, show
output_file("exponential_distribution.html")
x = np.arange(-1, 10, 0.1)
y = expon.pdf(x, 0, 2)
p = figure(width=400, height=400, title = "Exponential Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2, legend_label = 'loc=0, scale=2')
show(p)
⑶ statistics
① moment generating function: M(t) = λ / (λ - t) for t < λ
② average: E(X) = 1 / λ
○ meaning: if events occur at an average rate of λ per unit period, the average waiting time until an event is intuitively 1 / λ
③ variance: VAR(X) = 1 / λ²
⑷ memorylessness
① definition: P(X > s + t | X > s) = P(X > t) for all s, t ≥ 0
② example: if battery lifetime follows an exponential distribution, the time the battery has already been used does not change the distribution of its remaining lifetime
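Memorylessness and the moment formulas can be verified with the survival function of scipy.stats.expon, assuming SciPy's convention scale = 1 / λ. The values λ = 0.25, s = 3 and t = 5 are arbitrary (read s as hours already used and t as additional hours, for example).

```python
import numpy as np
from scipy import stats

lam = 0.25
# SciPy's expon uses scale = 1 / λ
dist = stats.expon(scale=1 / lam)

print(dist.mean(), 1 / lam)        # E(X) = 1 / λ
print(dist.var(), 1 / lam ** 2)    # VAR(X) = 1 / λ²

# memorylessness: P(X > s + t | X > s) = P(X > s + t) / P(X > s) = P(X > t)
s, t = 3.0, 5.0
conditional = dist.sf(s + t) / dist.sf(s)
unconditional = dist.sf(t)
print(conditional, unconditional, np.exp(-lam * t))
```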
⑸ Example
5. beta distribution
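⑴ beta function: for α, β > 0, B(α, β) = ∫ from 0 to 1 of t^(α-1) (1 - t)^(β-1) dt
⑵ beta distribution: for 0 < x < 1, p(x) = x^(α-1) (1 - x)^(β-1) / B(α, β)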
Figure 5. probability density function of beta distribution
① Python programming: Bokeh is used for web-page visualization
# see https://vitalflux.com/beta-distribution-explained-with-python-examples/
import numpy as np
from scipy.stats import beta
from bokeh.plotting import figure, output_file, show
output_file("beta_distribution.html")
x = np.linspace(0, 1, 100)
y1 = beta.pdf(x, 2, 8)
y2 = beta.pdf(x, 5, 5)
y3 = beta.pdf(x, 8, 2)
p = figure(width=400, height=400, title = "Beta Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'a=2, b=8')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'a=5, b=5')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'a=8, b=2')
show(p)
② E(X) = α ÷ (α + β)
③ VAR(X) = αβ ÷ ((α + β)²(α + β + 1))
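The mean and variance formulas can be compared with scipy.stats.beta for the three parameter pairs plotted in Figure 5, and the beta function can be checked against the identity B(α, β) = Γ(α)Γ(β) / Γ(α + β) and the commutative law. The helper beta_moments and the point (2.5, 4.0) are illustrative choices.

```python
import math

from scipy import special, stats


def beta_moments(a, b):
    """E(X) and VAR(X) of Beta(a, b) from the closed-form formulas."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# the three parameter pairs plotted in Figure 5
for a, b in [(2, 8), (5, 5), (8, 2)]:
    dist = stats.beta(a, b)
    print((a, b), beta_moments(a, b), (dist.mean(), dist.var()))

# beta function via the gamma function: B(a, b) = Γ(a)Γ(b) / Γ(a + b)
a, b = 2.5, 4.0
via_gamma = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
print(via_gamma, special.beta(a, b), special.beta(b, a))
```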
⑶ relationship with gamma function: B(α, β) = Γ(α)Γ(β) / Γ(α + β)
⑷ characteristic
① commutative law: B(α, β) = B(β, α)
② equivalent expression
⑸ generalized beta distribution
6. Pareto distribution
⑴ simple Pareto distribution
① probability density function: for shape parameter a > 0 and x ≥ 1, p(x) = a / x^(a+1)
Figure 6. probability density function of simple Pareto distribution
○ Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pareto.html
import numpy as np
from scipy.stats import pareto
from bokeh.plotting import figure, output_file, show
output_file("pareto_distribution.html")
x = np.linspace(1, 10, 100)
y1 = pareto.pdf(x, 1)
y2 = pareto.pdf(x, 2)
y3 = pareto.pdf(x, 3)
p = figure(width=400, height=400, title = "Pareto Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color = 'red', legend_label = 'a=1')
p.line(x, y2, line_width=2, color = 'green', legend_label = 'a=2')
p.line(x, y3, line_width=2, color = 'blue', legend_label = 'a=3')
show(p)
② cumulative distribution function: for x ≥ 1, F(x) = 1 - x^(-a)
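The simple Pareto density and cumulative distribution function can be compared with scipy.stats.pareto, whose standard form (loc = 0, scale = 1) has support x ≥ 1 and whose shape parameter, named b in SciPy, plays the role of a here. The value a = 2 is arbitrary.

```python
import numpy as np
from scipy import integrate
from scipy.stats import pareto

a = 2.0
x = np.linspace(1, 10, 50)

pdf_formula = a / x ** (a + 1)   # p(x) = a / x^(a+1), x >= 1
cdf_formula = 1 - x ** (-a)      # F(x) = 1 - x^(-a), x >= 1

print(np.allclose(pareto.pdf(x, a), pdf_formula))
print(np.allclose(pareto.cdf(x, a), cdf_formula))

# F(10) should equal the integral of the density from 1 to 10
area, _ = integrate.quad(lambda t: a / t ** (a + 1), 1, 10)
print(area, 1 - 10 ** (-a))
```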
⑵ generalized Pareto distribution
① probability density function: for scale parameter b,
② probability distribution function
7. logistic distribution
⑴ simple logistic distribution
① probability density function: for -∞ < x < ∞, p(x) = e^(-x) / (1 + e^(-x))²
Figure 7. simple logistic distribution
○ Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.logistic.html
import numpy as np
from scipy.stats import logistic
from bokeh.plotting import figure, output_file, show
output_file("logistic_distribution.html")
x = np.linspace(-10, 10, 200)  # the standard logistic density is centered at 0
y = logistic.pdf(x)
p = figure(width=400, height=400, title = "Logistic Distribution",
tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)
show(p)
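The sketch below assumes that the simple logistic distribution is the standard logistic with location 0 and scale 1, which is the default of scipy.stats.logistic. It checks the density p(x) = e^(-x) / (1 + e^(-x))² and the CDF F(x) = 1 / (1 + e^(-x)) against SciPy, and also the identity p(x) = F(x)(1 - F(x)).

```python
import numpy as np
from scipy.stats import logistic

x = np.linspace(-10, 10, 201)
pdf_formula = np.exp(-x) / (1 + np.exp(-x)) ** 2   # p(x) = e^(-x) / (1 + e^(-x))²
cdf_formula = 1 / (1 + np.exp(-x))                  # F(x) = 1 / (1 + e^(-x))

print(np.allclose(logistic.pdf(x), pdf_formula))
print(np.allclose(logistic.cdf(x), cdf_formula))
# the density can also be written as F(x)(1 - F(x))
print(np.allclose(pdf_formula, cdf_formula * (1 - cdf_formula)))
```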
⑵ generalized logistic distribution
① probability density function
8. Dirichlet distribution
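⑴ Overview: it has drawn attention because it is a distribution over the simplex (vectors of positive components that sum to 1), which makes it suitable for analyzing proportions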
⑵ Probability density function: for x = (x1, …, xD) with xi > 0 and x1 + ⋯ + xD = 1, and positive parameters (λ1, …, λD), p(x) = Γ(λ1 + ⋯ + λD) / (Γ(λ1) ⋯ Γ(λD)) · x1^(λ1 - 1) ⋯ xD^(λD - 1)
Figure 8. Dirichlet distribution
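A minimal sketch of the Dirichlet density Γ(∑λi) / ∏Γ(λi) · ∏ xi^(λi - 1), checked against scipy.stats.dirichlet, together with samples from NumPy's Generator.dirichlet to show that every draw lies on the simplex and that the sample mean is close to λi / ∑λj. The helper dirichlet_pdf and the parameter values are illustrative assumptions; the density is computed on the log scale so the gamma functions do not overflow for large parameters.

```python
import math

import numpy as np
from scipy.stats import dirichlet


def dirichlet_pdf(x, alpha):
    """Dirichlet density Γ(Σα_i) / ΠΓ(α_i) · Π x_i^(α_i - 1), computed via logs."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_norm + log_kernel)


alpha = [2.0, 3.0, 5.0]
x = [0.2, 0.3, 0.5]  # a point on the simplex: positive entries summing to 1
print(dirichlet_pdf(x, alpha), dirichlet.pdf(x, alpha))

rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=10_000)
print(np.allclose(samples.sum(axis=1), 1.0))  # every draw lies on the simplex
print(samples.mean(axis=0))                   # close to alpha / sum(alpha)
```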
Input : 2019.06.19 00:27