Chapter 7. Continuous Probability Distribution
Higher category : 【Statistics】 Statistics Overview
a. Q-Q plot
1. uniform distribution
⑴ definition: a probability distribution whose density is constant over every value the random variable can take in [a, b]
⑵ probability density function: X ~ U[a, b], p(x) = 1 / (b - a) · I{a ≤ x ≤ b}
① Python programming: Bokeh is used for web-page visualization
from bokeh.plotting import figure, output_file, show

output_file("uniform_distribution.html")
p = figure(width=400, height=400, title="Uniform Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
# density of U[1, 9]: constant height 1 / (9 - 1) = 1/8
p.line([1, 2, 3, 4, 5, 6, 7, 8, 9], [1/8] * 9, line_width=2)
show(p)
⑶ statistics
① moment generating function: M(t) = (e^(tb) - e^(ta)) / (t(b - a)) for t ≠ 0, and M(0) = 1
② average : E(X) = (a + b) / 2
③ variance: VAR(X) = (b - a)^2 / 12
⑷ note
① for a uniform distribution over a two-dimensional region, the marginal density at a point is the length of the cross-section at that point ÷ the total area
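The mean and variance formulas in ⑶ can be sanity-checked numerically. The following is a minimal sketch, assuming the example endpoints a = 1 and b = 9 (the interval drawn in the plot above); the helper name uniform_moments and the midpoint Riemann sum are illustrative choices, not part of the original notes.

```python
# Check E(X) = (a + b) / 2 and VAR(X) = (b - a)^2 / 12 for X ~ U[a, b]
# by integrating against the constant density with a midpoint Riemann sum.

def uniform_moments(a, b, n=100_000):
    """Return (mean, variance) of U[a, b] computed numerically."""
    width = (b - a) / n
    density = 1.0 / (b - a)
    # midpoints of n equal sub-intervals of [a, b]
    xs = [a + (i + 0.5) * width for i in range(n)]
    mean = sum(x * density * width for x in xs)
    var = sum((x - mean) ** 2 * density * width for x in xs)
    return mean, var

a, b = 1.0, 9.0  # example endpoints (an assumption)
mean, var = uniform_moments(a, b)
print(mean, (a + b) / 2)        # numerical mean vs. closed form
print(var, (b - a) ** 2 / 12)   # numerical variance vs. closed form
```

Both printed pairs should agree to several decimal places.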
2. normal distribution
⑴ definition : the limiting form of the binomial probability nCx θ^x (1 - θ)^(n-x) as n → ∞
① it is called the normal distribution because it is observed so widely
② generally, the standard normal distribution density function is expressed as φ(·) and the cumulative distribution function as Φ(·)
③ central limit theorem: if X = ∑X_i is the sum of n independent, identically distributed random variables with finite variance, the standardized sum converges to the normal distribution as n → ∞
④ first derived as an approximation to the binomial distribution (De Moivre, 1721)
⑤ used to analyze measurement error in astronomy (Gauss, 1809)
○ for this reason, it is also known as the Gaussian distribution
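The binomial-to-normal limit in the definition can be illustrated with a short comparison. This is only a sketch under stated assumptions: θ = 0.3 and the sample sizes 10, 100, 1000 are arbitrary example values, and binomial_pmf and max_gap are hypothetical helper names. It compares the binomial probability at each x with the density of N(nθ, nθ(1 - θ)) and reports the largest absolute difference.

```python
import math
from statistics import NormalDist

def binomial_pmf(n, x, theta):
    """P(X = x) for X ~ Binomial(n, theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)

def max_gap(n, theta):
    """Largest |binomial pmf - matching normal density| over x = 0..n."""
    mu = n * theta
    sigma = math.sqrt(n * theta * (1 - theta))
    normal = NormalDist(mu, sigma)
    return max(abs(binomial_pmf(n, x, theta) - normal.pdf(x))
               for x in range(n + 1))

theta = 0.3  # example success probability (an assumption)
gaps = {n: max_gap(n, theta) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, gap)  # the gap should shrink as n grows
```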
⑵ probability density function: for X ~ N(μ, σ^2), f(x) = 1 / (σ√(2π)) · exp(-(x - μ)^2 / (2σ^2))
① Python programming: Bokeh is used for web-page visualization
# see https://stackoverflow.com/questions/10138085/how-to-plot-normal-distribution
import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show

output_file("normal_distribution.html")
x = np.linspace(-3, 3, 100)
y = stats.norm.pdf(x, 0, 1)  # standard normal density: mean 0, standard deviation 1
p = figure(width=400, height=400, title="Normal Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)
show(p)
⑶ statistics
① moment generating function: M(t) = exp(μt + σ^2 t^2 / 2)
② average : E(X) = μ
③ variance : VAR(X) = σ^2
⑷ characteristic
① characteristic 1. symmetric around μ
② characteristic 2. if X ~ N(μ, σ^2), Y = aX + b ~ N(aμ + b, a^2 σ^2)
③ characteristic 3. if X_i ~ N(μ_i, σ_i^2) are independent, X = ∑X_i ~ N(∑μ_i, ∑σ_i^2)
④ characteristic 4. uncorrelatedness : if X and Y are jointly normal and uncorrelated, X and Y are independent
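Characteristics 2 and 3 can be tried out with the standard library's statistics.NormalDist, which supports multiplying by a constant, adding a constant, and adding two independent normal variables. The parameter values below are arbitrary examples chosen for illustration.

```python
from statistics import NormalDist

# Characteristic 2: Y = aX + b ~ N(a*mu + b, a^2 * sigma^2)
X = NormalDist(mu=2.0, sigma=3.0)
a, b = -2.0, 1.0            # example coefficients (an assumption)
Y = X * a + b
print(Y.mean, Y.variance)   # compare with a*mu + b and a^2 * sigma^2

# Characteristic 3: the sum of independent normals is normal,
# with the means added and the variances added.
X1 = NormalDist(mu=1.0, sigma=2.0)
X2 = NormalDist(mu=-4.0, sigma=1.5)
S = X1 + X2                 # NormalDist treats X1 and X2 as independent
print(S.mean, S.variance)   # compare with 1 + (-4) and 2^2 + 1.5^2
```

Note that the addition of two NormalDist objects assumes independence, which matches the condition needed for characteristic 3.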
⑸ standard normal distribution
① definition: a normal distribution with a mean of 0 and a standard deviation of 1
② normalization: if X ~ N(μ, σ^2), then Z = (X - μ) / σ ~ N(0, 1)
③ cumulative distribution function of the standard normal distribution: Φ(z) = P(Z ≤ z) = ∫_{-∞}^{z} φ(t) dt
④ z_α : the value for which the probability that Z exceeds z_α is α, i.e., P(Z > z_α) = α
⑹ normal distribution table
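Values that would be read from a normal distribution table can also be computed directly. A minimal sketch using the standard library's statistics.NormalDist follows; the helper z_alpha and the example numbers (1.96, α = 0.025, X ~ N(100, 15^2)) are assumptions chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_alpha(alpha):
    """Upper-tail critical value: P(Z > z_alpha) = alpha."""
    return Z.inv_cdf(1 - alpha)

print(Z.cdf(1.96))     # table lookup of Phi(1.96)
print(z_alpha(0.025))  # critical value for alpha = 0.025

# Normalization: P(X <= x) for X ~ N(mu, sigma^2) equals Phi((x - mu) / sigma)
mu, sigma, x = 100.0, 15.0, 130.0
z = (x - mu) / sigma
print(Z.cdf(z), NormalDist(mu, sigma).cdf(x))  # the two values should match
```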
3. gamma distribution
⑴ gamma function
① definition 1. for x > 0, Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt
② definition 2.
③ characteristic
○ Γ(-3/2) = (4/3)√π
○ Γ(-1/2) = -2 √π
○ Γ(1/2) = √π
○ Γ(1) = 1
○ Γ(3/2) = (1/2)√π
○ Γ(a + 1) = aΓ(a)
○ Γ(n + 1) = n!
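The listed values and identities can be confirmed with math.gamma from the standard library, which also accepts negative non-integer arguments. The dictionary name checks and the test point a = 2.7 are illustrative choices.

```python
import math

sqrt_pi = math.sqrt(math.pi)

# (computed value, value claimed in the list above)
checks = {
    "gamma(-3/2)": (math.gamma(-1.5), 4 / 3 * sqrt_pi),
    "gamma(-1/2)": (math.gamma(-0.5), -2 * sqrt_pi),
    "gamma(1/2)": (math.gamma(0.5), sqrt_pi),
    "gamma(1)": (math.gamma(1.0), 1.0),
    "gamma(3/2)": (math.gamma(1.5), 0.5 * sqrt_pi),
}
for name, (computed, claimed) in checks.items():
    print(name, computed, claimed)

# Recurrence gamma(a + 1) = a * gamma(a) at an arbitrary example point
a = 2.7
recurrence_ok = math.isclose(math.gamma(a + 1), a * math.gamma(a))

# gamma(n + 1) = n! for small non-negative integers n
factorial_ok = all(math.isclose(math.gamma(n + 1), math.factorial(n))
                   for n in range(10))
print(recurrence_ok, factorial_ok)
```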
⑵ gamma distribution
① probability density function: for x, r, λ > 0, f(x) = λ^r x^(r-1) e^(-λx) / Γ(r)
○ Python programming: Bokeh is used for web-page visualization
# see https://www.statology.org/gamma-distribution-in-python/
import numpy as np
import scipy.stats as stats
from bokeh.plotting import figure, output_file, show

output_file("gamma_distribution.html")
x = np.linspace(0, 40, 100)
# scipy parameterizes the gamma distribution by shape a and scale (= 1 / rate)
y1 = stats.gamma.pdf(x, a=5, scale=3)
y2 = stats.gamma.pdf(x, a=2, scale=5)
y3 = stats.gamma.pdf(x, a=4, scale=2)
p = figure(width=400, height=400, title="Gamma Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color='red', legend_label='shape=5, scale=3')
p.line(x, y2, line_width=2, color='green', legend_label='shape=2, scale=5')
p.line(x, y3, line_width=2, color='blue', legend_label='shape=4, scale=2')
show(p)
② meaning
○ the probability distribution of time until the r-th event occurs
○ r (shape parameter)
○ λ (rate parameter): the average number of events per unit period
○ β (scale parameter): β = 1 / λ
⑶ statistics
① moment generating function: M(t) = (λ / (λ - t))^r for t < λ
② average : E(X) = r / λ
③ variance: VAR(X) = r / λ^2
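The interpretation "time until the r-th event" suggests a simple simulation check of the mean and variance: for integer r, the waiting time is the sum of r independent exponential waiting times with rate λ. The sketch below assumes the example values r = 3 and λ = 2, a fixed seed, and a hypothetical helper time_to_rth_event.

```python
import random
import statistics

def time_to_rth_event(r, lam, rng):
    """Waiting time until the r-th event: sum of r exponential gaps with rate lam."""
    return sum(rng.expovariate(lam) for _ in range(r))

rng = random.Random(0)   # fixed seed so the run is reproducible
r, lam = 3, 2.0          # example shape and rate (assumptions)
samples = [time_to_rth_event(r, lam, rng) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)
print(sample_mean, r / lam)       # should be close to r / lambda
print(sample_var, r / lam ** 2)   # should be close to r / lambda^2
```

Because this is a Monte Carlo estimate, the printed values agree only approximately with the closed forms.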
⑷ relationship with different probability distributions
① binomial distribution
② negative binomial distribution
③ beta distribution
4. exponential distribution
⑴ Overview
① definition: the special case of the gamma distribution with shape parameter r = 1
○ that is, the distribution of the waiting time until the first event occurs
② meaning of parameters
○ β (survival parameter): β = 1 / λ
○ λ (rate parameter): the average number of events per unit period
③ comparison with the Poisson distribution: in the Poisson distribution the duration is fixed and the number of events is the random variable, whereas in the exponential distribution the waiting time is the random variable
⑵ probability density function: for x > 0, f(x) = λ e^(-λx)
① Python programming : Bokeh is used for web-page visualization
# see https://www.alphacodingskills.com/scipy/scipy-exponential-distribution.php
import numpy as np
from scipy.stats import expon
from bokeh.plotting import figure, output_file, show

output_file("exponential_distribution.html")
x = np.arange(-1, 10, 0.1)
y = expon.pdf(x, 0, 2)  # loc=0, scale=2 (scale = 1 / rate, so the rate is 0.5)
p = figure(width=400, height=400, title="Exponential Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2, legend_label='loc=0, scale=2')
show(p)
⑶ statistics
① moment generating function: M(t) = λ / (λ - t) for t < λ
② average: E(X) = 1 / λ
○ meaning: if λ events occur per unit period on average, the average waiting time until the next event is intuitively 1 / λ
③ variance: VAR(X) = 1 / λ^2
⑷ memorylessness
① definition: P(X > s + t | X > s) = P(X > t) for all s, t ≥ 0
② example: when battery lifetime follows an exponential distribution, the time already used does not affect the distribution of the remaining lifetime
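Memorylessness can be checked directly from the survival function P(X > x) = e^(-λx) of the exponential distribution. In this sketch the rate λ = 0.5 and the (s, t) pairs are arbitrary example values, and survival and conditional_survival are hypothetical helper names.

```python
import math

def survival(x, lam):
    """P(X > x) for an exponential random variable with rate lam."""
    return math.exp(-lam * x)

def conditional_survival(s, t, lam):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, lam) / survival(s, lam)

lam = 0.5  # example rate (an assumption)
pairs = [(1.0, 2.0), (3.0, 2.0), (10.0, 2.0)]
results = [(conditional_survival(s, t, lam), survival(t, lam)) for s, t in pairs]
for (s, t), (cond, uncond) in zip(pairs, results):
    # the elapsed time s does not change the chance of surviving t more units
    print(s, t, cond, uncond)
```

In battery terms, the chance of lasting two more hours is the same whether the battery has already run for 1, 3, or 10 hours.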
5. beta distribution
⑴ beta function: for α, β > 0, B(α, β) = ∫_0^1 x^(α-1) (1 - x)^(β-1) dx
⑵ beta distribution: for 0 < x < 1, f(x) = x^(α-1) (1 - x)^(β-1) / B(α, β)
① Python programming: Bokeh is used for web-page visualization
# see https://vitalflux.com/beta-distribution-explained-with-python-examples/
import numpy as np
from scipy.stats import beta
from bokeh.plotting import figure, output_file, show

output_file("beta_distribution.html")
x = np.linspace(0, 1, 100)
y1 = beta.pdf(x, 2, 8)
y2 = beta.pdf(x, 5, 5)
y3 = beta.pdf(x, 8, 2)
p = figure(width=400, height=400, title="Beta Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color='red', legend_label='a=2, b=8')
p.line(x, y2, line_width=2, color='green', legend_label='a=5, b=5')
p.line(x, y3, line_width=2, color='blue', legend_label='a=8, b=2')
show(p)
② E(X) = α ÷ (α + β)
③ VAR(X) = αβ ÷ ((α + β)^2 (α + β + 1))
⑶ relationship with gamma function: B(α, β) = Γ(α)Γ(β) / Γ(α + β)
⑷ characteristic
① commutative law: B(α, β) = B(β, α)
② equivalent expression
⑸ generalized beta distribution
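The link between the beta function and the gamma function, B(α, β) = Γ(α)Γ(β) / Γ(α + β), can be checked numerically against the integral definition ∫_0^1 x^(α-1) (1 - x)^(β-1) dx. This sketch assumes the example parameters α = 2, β = 8 from the plot above and uses a midpoint Riemann sum, which is only reliable here because α, β ≥ 1 keeps the integrand bounded. The function names are illustrative.

```python
import math

def beta_integral(alpha, beta, n=100_000):
    """Midpoint approximation of the integral defining B(alpha, beta)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x ** (alpha - 1) * (1 - x) ** (beta - 1)
    return total * h

def beta_via_gamma(alpha, beta):
    """B(alpha, beta) expressed through the gamma function."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

alpha, beta = 2.0, 8.0  # example parameters (an assumption)
by_integral = beta_integral(alpha, beta)
by_gamma = beta_via_gamma(alpha, beta)
print(by_integral, by_gamma)

# Commutative law: B(alpha, beta) = B(beta, alpha)
print(beta_via_gamma(alpha, beta), beta_via_gamma(beta, alpha))

# Mean of the beta distribution: B(alpha + 1, beta) / B(alpha, beta) = alpha / (alpha + beta)
mean = beta_via_gamma(alpha + 1, beta) / by_gamma
print(mean, alpha / (alpha + beta))
```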
6. Pareto distribution
⑴ simple Pareto distribution
① probability density function: for shape parameter a > 0 and x ≥ 1, f(x) = a / x^(a+1)
○ Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pareto.html
import numpy as np
from scipy.stats import pareto
from bokeh.plotting import figure, output_file, show

output_file("pareto_distribution.html")
x = np.linspace(1, 10, 100)
y1 = pareto.pdf(x, 1)
y2 = pareto.pdf(x, 2)
y3 = pareto.pdf(x, 3)
p = figure(width=400, height=400, title="Pareto Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y1, line_width=2, color='red', legend_label='a=1')
p.line(x, y2, line_width=2, color='green', legend_label='a=2')
p.line(x, y3, line_width=2, color='blue', legend_label='a=3')
show(p)
② cumulative distribution function: for x ≥ 1, F(x) = 1 - x^(-a)
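As a consistency check between the density and the distribution function, the sketch below assumes the standard simple Pareto form with density a / x^(a+1) on x ≥ 1 (the form used by scipy.stats.pareto with shape a) and compares a numerical integral of the density with the closed form 1 - x^(-a). The helper names and the example values a = 2 and x = 5 are assumptions.

```python
def pareto_pdf(x, a):
    """Simple Pareto density with shape a, supported on x >= 1."""
    return a / x ** (a + 1) if x >= 1 else 0.0

def pareto_cdf(x, a):
    """Closed-form distribution function 1 - x^(-a) for x >= 1."""
    return 1 - x ** (-a) if x >= 1 else 0.0

def integrate_pdf(upper, a, n=200_000):
    """Midpoint approximation of the integral of the density over [1, upper]."""
    h = (upper - 1.0) / n
    return sum(pareto_pdf(1.0 + (i + 0.5) * h, a) for i in range(n)) * h

a, upper = 2.0, 5.0  # example shape and evaluation point (assumptions)
numeric = integrate_pdf(upper, a)
closed = pareto_cdf(upper, a)
print(numeric, closed)  # both should be close to 1 - 1/25
```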
⑵ generalized Pareto distribution
① probability density function: for scale parameter b,
② probability distribution function
7. logistic distribution
⑴ simple logistic distribution
① probability density function: f(x) = e^(-x) / (1 + e^(-x))^2
○ Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.logistic.html
import numpy as np
from scipy.stats import logistic
from bokeh.plotting import figure, output_file, show

output_file("logistic_distribution.html")
# range symmetric around 0 so that the whole bell shape is visible
x = np.linspace(-10, 10, 100)
y = logistic.pdf(x)
p = figure(width=400, height=400, title="Logistic Distribution",
           tooltips=[("x", "$x"), ("y", "$y")])
p.line(x, y, line_width=2)
show(p)
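The simple logistic distribution is closely tied to the sigmoid function: assuming the standard form with density e^(-x) / (1 + e^(-x))^2 (the one documented for scipy.stats.logistic), its distribution function is F(x) = 1 / (1 + e^(-x)) and the density equals F(x)(1 - F(x)). The sketch below checks this identity and the symmetry around 0 at a few arbitrary example points.

```python
import math

def logistic_cdf(x):
    """Distribution function of the standard logistic distribution (the sigmoid)."""
    return 1 / (1 + math.exp(-x))

def logistic_pdf(x):
    """Density of the standard logistic distribution."""
    e = math.exp(-x)
    return e / (1 + e) ** 2

points = [-3.0, -1.0, 0.0, 0.5, 2.0]  # example evaluation points (assumptions)
identity_ok = all(
    math.isclose(logistic_pdf(x), logistic_cdf(x) * (1 - logistic_cdf(x)))
    for x in points
)
symmetry_ok = all(math.isclose(logistic_pdf(x), logistic_pdf(-x)) for x in points)
print(identity_ok, symmetry_ok, logistic_pdf(0.0))
```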
⑵ generalized logistic distribution
① probability density function
8. Dirichlet distribution
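⑴ Overview : attracts attention because it is a distribution over the simplex, i.e., over vectors with non-negative components that sum to 1
⑵ Probability density function : for x = (x_1, ..., x_D) with x_i ≥ 0 and ∑x_i = 1, and positive parameters (λ_1, ..., λ_D), f(x) = Γ(∑λ_i) / ∏Γ(λ_i) · ∏ x_i^(λ_i - 1)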
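A Dirichlet sample can be drawn by normalizing independent gamma variables: if G_i ~ Gamma(λ_i, 1), then (G_1, ..., G_D) / ∑G_i follows the Dirichlet distribution with parameters (λ_1, ..., λ_D). The sketch below uses this construction with the example parameters (2, 3, 5) and a fixed seed, both assumptions, and checks that each sample lies on the simplex and that the component means are close to λ_i / ∑λ_i. The helper name sample_dirichlet is hypothetical.

```python
import random

def sample_dirichlet(lams, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(lam_i, 1) draws."""
    gammas = [rng.gammavariate(lam, 1.0) for lam in lams]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)   # fixed seed so the run is reproducible
lams = (2.0, 3.0, 5.0)   # example parameters (an assumption)
samples = [sample_dirichlet(lams, rng) for _ in range(20_000)]

# every sample has non-negative components that sum to 1 (a point on the simplex)
on_simplex = all(abs(sum(s) - 1.0) < 1e-9 and min(s) >= 0 for s in samples)

means = [sum(s[i] for s in samples) / len(samples) for i in range(len(lams))]
expected = [lam / sum(lams) for lam in lams]
print(on_simplex)
print(means, expected)   # sample means vs. lam_i / sum(lam)
```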
Input : 2019.06.19 00:27