Chapter 6. Discrete probability distribution
1. Uniform distribution
⑴ definition: a probability distribution that assigns the same probability to each of the n possible values of the random variable
⑵ probability mass function: $p(x) = \frac{1}{n} I\{x = x_1, \cdots, x_n\}$
Figure 1. probability mass function of uniform distribution
① Python programming: Bokeh is used for web-page visualization
from bokeh.plotting import figure, output_file, show
output_file("uniform_distribution.html")
graph = figure(width = 400, height = 400, title = "Uniform Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
x = [1, 2, 3, 4, 5, 6, 7, 8]
top = [1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8]
width = 0.5
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑶ Example
2. Bernoulli distribution
⑴ Bernoulli trial: a trial whose outcome is either success (X = 1) or failure (X = 0)
⑵ Bernoulli distribution: the probability distribution of the outcome of a single Bernoulli trial with success probability θ
⑶ probability mass function: p(x) = θ I{x = 1} + (1 - θ) I{x = 0}
Figure 2. probability mass function of Bernoulli distribution at θ = 0.6
① Python programming: Bokeh is used for web-page visualization
from bokeh.plotting import figure, output_file, show
output_file("Bernoulli_distribution.html")
x = [0, 1]
top = [0.4, 0.6]
width = 0.5
graph = figure(width = 400, height = 400, title = "Bernoulli Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑷ statistics
① moment generating function: $M(t) = E(e^{tX}) = (1 - \theta) + \theta e^{t}$
② mean: E(X) = θ
③ variance: $\mathrm{VAR}(X) = E(X^{2}) - E(X)^{2} = \theta - \theta^{2} = \theta (1 - \theta)$
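The mean and variance above can also be read off the moment generating function, since M'(0) = E(X) and M''(0) = E(X²). The following is a minimal sketch, assuming the standard Bernoulli form M(t) = (1 - θ) + θe^t; the helper names `bernoulli_mgf` and `moments_from_mgf` are illustrative and do not come from any library.

```python
import math

def bernoulli_mgf(t, theta):
    """MGF of a Bernoulli(theta) variable: E[exp(tX)] = (1 - theta) + theta * exp(t)."""
    return (1 - theta) + theta * math.exp(t)

def moments_from_mgf(mgf, h=1e-4):
    """Approximate E(X) = M'(0) and E(X^2) = M''(0) with central differences."""
    first = (mgf(h) - mgf(-h)) / (2 * h)
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2
    return first, second

theta = 0.6
mean, second_moment = moments_from_mgf(lambda t: bernoulli_mgf(t, theta))
variance = second_moment - mean**2

# Compare with the closed forms theta and theta * (1 - theta)
print(mean, variance)
```

The finite-difference step h is an arbitrary choice; the results agree with θ and θ(1 - θ) only up to numerical error.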
3. Binomial distribution
⑴ definition: the probability distribution of the number of successes when a Bernoulli trial is repeated independently n times
① the number of trials n and the success probability θ are fixed
⑵ probability mass function
① $p(x) = {}_{n}C_{x} \, \theta^{x} (1 - \theta)^{n-x}$, x = 0, 1, ···, n
② p(x): the probability of exactly x successes in n trials
③ ${}_{n}C_{x}$: the number of ways to choose which x of the trials 1, 2, ···, n are the successes
④ $\theta^{x}$: the probability that the x chosen trials all succeed
⑤ $(1 - \theta)^{n-x}$: the probability that the remaining n - x trials all fail
Figure 3. probability mass function of binomial distribution at n = 30, p = 0.6
⑥ Python programming: Bokeh is used for web-page visualization
# see https://www.geeksforgeeks.org/python-binomial-distribution/
from scipy.stats import binom
from bokeh.plotting import figure, output_file, show
output_file("binomial_distribution.html")
n = 30
p = 0.6
x = list(range(n+1))
top = [binom.pmf(r,n,p) for r in x]
width = 0.5
graph = figure(width = 400, height = 400, title = "Binomial Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑶ statistics
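① idea: the i-th Bernoulli trial Xi follows the Bernoulli distribution with parameter θ, and X = X1 + ··· + Xn with the Xi independent
② moment generating function: $M(t) = \left(1 - \theta + \theta e^{t}\right)^{n}$
③ mean: E(X) = nθ
④ variance: VAR(X) = nθ(1 - θ)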
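As a quick numerical check of these statistics, the probability mass function can be summed directly. This is a sketch using only the standard library; `binomial_pmf` is an illustrative helper, and n = 30, θ = 0.6 are taken from Figure 3.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """P(X = x) for x successes in n independent trials with success probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 30, 0.6
pmf = [binomial_pmf(x, n, theta) for x in range(n + 1)]

total = sum(pmf)                                              # should be 1
mean = sum(x * p for x, p in enumerate(pmf))                  # should match n * theta
variance = sum((x - mean)**2 * p for x, p in enumerate(pmf))  # should match n * theta * (1 - theta)

print(total, mean, variance)
```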
⑷ Example problems for binomial distribution
4. Multinomial distribution
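⑴ multinomial trial: an extension of the Bernoulli trial to three or more possible outcomes
⑵ multinomial distribution: the joint probability distribution of the outcome counts when a multinomial trial is repeated independently n times
⑶ probability mass function
① premise: x1 + x2 + ··· + xk = n and θ1 + θ2 + ··· + θk = 1
② $p(x_1, x_2, \cdots, x_k) = {}_{n}C_{x_1} \times {}_{n-x_1}C_{x_2} \times \cdots \times {}_{x_k}C_{x_k} \times \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_k^{x_k} = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_k^{x_k}$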
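The chain of combinations in the probability mass function collapses to the multinomial coefficient n! / (x1! ··· xk!). The sketch below checks this on one example; the function names and the counts and probabilities are illustrative assumptions, not part of the original notes.

```python
from math import comb, factorial, prod

def coefficient_by_chain(counts):
    """C(n, x1) * C(n - x1, x2) * ... : choose the trials for each outcome in turn."""
    remaining = sum(counts)
    coeff = 1
    for x in counts:
        coeff *= comb(remaining, x)
        remaining -= x
    return coeff

def coefficient_by_factorials(counts):
    """n! / (x1! * x2! * ... * xk!)"""
    return factorial(sum(counts)) // prod(factorial(x) for x in counts)

def multinomial_pmf(counts, thetas):
    """Joint probability of observing the given outcome counts."""
    return coefficient_by_chain(counts) * prod(t**x for t, x in zip(thetas, counts))

counts = [2, 1, 3]          # hypothetical outcome counts, n = 6
thetas = [0.2, 0.3, 0.5]    # hypothetical outcome probabilities, summing to 1

chain = coefficient_by_chain(counts)
direct = coefficient_by_factorials(counts)
prob = multinomial_pmf(counts, thetas)
print(chain, direct, prob)
```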
5. Hypergeometric distribution
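⑴ definition: when M of N items in total are successes and n items are drawn without replacement, the probability distribution of the number of successes among the n drawn items
⑵ probability mass function: $p(x) = \frac{{}_{M}C_{x} \; {}_{N-M}C_{n-x}}{{}_{N}C_{n}}$, max(0, n - (N - M)) ≤ x ≤ min(n, M)
Figure 4. probability mass function of hypergeometric distribution at [M, n, N] = [20, 7, 12] in SciPy's notation (population size 20, 7 successes in the population, 12 draws)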
① Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.hypergeom.html
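import numpy as np
from scipy.stats import hypergeom
from bokeh.plotting import figure, output_file, show
output_file("hypergeometric_distribution.html")
# SciPy's argument order: M = population size, n = number of successes in the population, N = number of draws
[M, n, N] = [20, 7, 12]
rv = hypergeom(M, n, N)
x = np.arange(0, n+1)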
top = rv.pmf(x)
width = 0.5
graph = figure(width = 400, height = 400, title = "Hypergeometric Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑶ statistics
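① mean: E(X) = nM / N
○ analogous to the binomial mean E(X) = nθ with θ = M / N
② variance: VAR(X) = [(N - n) / (N - 1)] × [nM / N] × [1 - M / N]
⑷ the relationship with the binomial distribution
① conditional distribution of binomial variables: if X and Y are independent binomial variables with the same θ, the distribution of X given X + Y is hypergeometric
② limit of the hypergeometric distribution: as N → ∞ with M / N → θ and n fixed, it converges to the binomial distribution with parameters n and θ
③ the binomial distribution corresponds to sampling with replacement, whereas the hypergeometric distribution corresponds to sampling without replacement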
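The limiting relationship can be illustrated numerically: with the success fraction M / N held at θ, the hypergeometric probabilities approach the binomial ones as the population grows. This is a sketch under assumed values (n = 5, θ = 0.4, and a few population sizes); it uses the notation of these notes (N total, M successes, n draws), not SciPy's argument order.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x): x successes in n draws without replacement from N items, M of them successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binomial_pmf(x, n, theta):
    """P(X = x): x successes in n draws with replacement, success probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 5, 0.4  # assumed illustration values

def max_gap(N):
    """Largest pointwise difference between the two pmfs when M / N = theta."""
    M = round(theta * N)
    return max(abs(hypergeom_pmf(x, N, M, n) - binomial_pmf(x, n, theta))
               for x in range(n + 1))

gaps = {N: max_gap(N) for N in (20, 200, 2000)}
print(gaps)  # the gap shrinks as N grows

# Mean of the hypergeometric distribution for N = 20, M = 8: should match n * M / N
mean_small = sum(x * hypergeom_pmf(x, 20, 8, n) for x in range(n + 1))
print(mean_small)
```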
6. Geometric distribution
⑴ definition: when each independent trial succeeds with probability θ, the probability distribution of the number of trials up to and including the first success
① the success probability is fixed and the number of trials varies
⑵ probability mass function: $p(x) = \theta (1 - \theta)^{x-1} I\{x = 1, 2, \cdots\}$
Figure 5. geometric distribution at θ = 0.5
① Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.geom.html
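import numpy as np
from scipy.stats import geom
from bokeh.plotting import figure, output_file, show
output_file("geometric_distribution.html")
n = 10
p = 0.5  # success probability θ, as stated for Figure 5
x = np.arange(1, n+1)  # the support starts at 1 (number of trials until the first success)
top = geom.pmf(x, p)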
width = 0.5
graph = figure(width = 400, height = 400, title = "Geometric Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑶ statistics
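① moment generating function: $M(t) = \frac{\theta e^{t}}{1 - (1 - \theta) e^{t}}$ for $t < -\ln(1 - \theta)$
② mean: E(X) = 1 / θ
○ meaning: intuitively, (average number of trials) × (probability of success) = 1, i.e., one success is expected per 1 / θ trials
③ variance: $\mathrm{VAR}(X) = (1 - \theta) / \theta^{2}$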
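The mean and variance can be checked by summing the probability mass function over a long but finite range. This is a sketch assuming θ = 0.5 and a cut-off of 200 terms, beyond which the remaining probability is negligible; `geometric_pmf` is an illustrative helper.

```python
def geometric_pmf(x, theta):
    """P(first success occurs on trial x), x = 1, 2, ..."""
    return theta * (1 - theta)**(x - 1)

theta = 0.5
xs = range(1, 200)  # truncated support; the tail mass beyond this is negligible

mean = sum(x * geometric_pmf(x, theta) for x in xs)
variance = sum((x - mean)**2 * geometric_pmf(x, theta) for x in xs)

# Compare with 1 / theta and (1 - theta) / theta**2
print(mean, variance)
```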
7. Negative binomial distribution
⑴ definition: when each independent trial succeeds with probability θ, the probability distribution of the number of trials until the r-th success is achieved
① in the binomial distribution, the number of trials and the success probability are fixed, and the number of successes varies
② in the negative binomial distribution, the number of successes and the success probability are fixed, and the number of trials varies
⑵ probability mass function
① type 1. fixes the number of successes at r: $p(x) = {}_{x-1}C_{r-1} \, \theta^{r} (1 - \theta)^{x-r}$, x = r, r + 1, ···
○ x: number of trials
○ r: number of successes
○ θ: probability of success
○ ${}_{x-1}C_{r-1}$: the number of cases in which the x-th trial is a success and exactly r - 1 of the previous x - 1 trials are successes
② type 2. fixes the number of failures at r*: $p(k) = {}_{k+r^{*}-1}C_{k} \, p^{k} (1 - p)^{r^{*}}$, k = 0, 1, ···
○ k: number of successes
○ r*: number of failures
○ p: probability of success
○ ${}_{k+r^{*}-1}C_{k}$: the number of cases in which the (k + r*)-th trial is a failure and exactly r* - 1 of the previous k + r* - 1 trials are failures
③ graph
Figure 6. probability mass function of negative binomial distribution at r = 5, θ = 0.6
④ Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.nbinom.html
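import numpy as np
from scipy.stats import nbinom
from bokeh.plotting import figure, output_file, show
output_file("negative_binomial_distribution.html")
n = 5  # number of successes r
p = 0.6  # success probability θ
x = np.arange(0, 13)  # SciPy's nbinom counts the failures before the n-th success, not the trials
top = nbinom.pmf(x, n, p)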
width = 0.5
graph = figure(width = 400, height = 400, title = "Negative Binomial Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑶ statistics
① statistics for type 1
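○ idea: X = ∑Xi = X1 + ··· + Xr
○ Xi: the number of trials from just after the (i - 1)-th success up to and including the i-th success; the Xi are independent and each follows the geometric distribution
○ moment generating function: $M(t) = \left[\frac{\theta e^{t}}{1 - (1 - \theta) e^{t}}\right]^{r}$
○ mean: E(X) = r / θ
○ variance: $\mathrm{VAR}(X) = r(1 - \theta) / \theta^{2}$
② statistics for type 2
○ mean: $E(X) = r^{*} p / (1 - p)$
○ variance: $\mathrm{VAR}(X) = r^{*} p / (1 - p)^{2}$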
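Both parameterizations can be checked against these statistics by summing over a truncated support. The sketch below assumes the standard forms of the two probability mass functions (type 1 counts trials up to the r-th success, type 2 counts successes before the r*-th failure); the function names, the values r = r* = 5 and θ = p = 0.6, and the cut-off of 400 terms are illustrative choices.

```python
from math import comb

def nb_trials_pmf(x, r, theta):
    """Type 1: P(the r-th success occurs on trial x), x = r, r + 1, ..."""
    return comb(x - 1, r - 1) * theta**r * (1 - theta)**(x - r)

def nb_successes_pmf(k, r_star, p):
    """Type 2: P(k successes occur before the r*-th failure), k = 0, 1, ..."""
    return comb(k + r_star - 1, k) * p**k * (1 - p)**r_star

def mean_and_variance(support, pmf):
    """Moments from a (truncated) support and a pmf callable."""
    mean = sum(v * pmf(v) for v in support)
    variance = sum((v - mean)**2 * pmf(v) for v in support)
    return mean, variance

r, theta = 5, 0.6
mean1, var1 = mean_and_variance(range(r, 400), lambda x: nb_trials_pmf(x, r, theta))

r_star, p = 5, 0.6
mean2, var2 = mean_and_variance(range(0, 400), lambda k: nb_successes_pmf(k, r_star, p))

print(mean1, var1)  # compare with r / theta and r * (1 - theta) / theta**2
print(mean2, var2)  # compare with r* p / (1 - p) and r* p / (1 - p)**2
```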
⑷ example
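① situation: in each game, one of n types of figures is handed out uniformly at random
② X: the number of games to watch until all n types of figures are collected
③ question: E(X)
④ idea: X = X1 + ··· + Xn
⑤ Xi: the number of games you have to watch to obtain the i-th new type after collecting i - 1 types; it follows the geometric distribution with success probability (n - i + 1) / n
⑥ $E(X) = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right)$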
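The decomposition into geometric waiting times can be compared with a direct simulation. This is a sketch assuming each figure type is equally likely in every game; n = 6, the seed, and the number of simulated runs are arbitrary illustration values.

```python
from fractions import Fraction
import random

def expected_games(n):
    """E(X) as the sum of geometric means n / (n - i + 1), i = 1, ..., n."""
    return sum(Fraction(n, n - i + 1) for i in range(1, n + 1))

def simulate_games(n, rng):
    """Play games until all n figure types have been seen; return the number of games."""
    seen = set()
    games = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each type is equally likely
        games += 1
    return games

n = 6
exact = expected_games(n)

rng = random.Random(0)  # fixed seed so the run is repeatable
runs = 20000
estimate = sum(simulate_games(n, rng) for _ in range(runs)) / runs

print(exact, float(exact), estimate)
```

The simulated average should land close to the exact value, with the usual sampling noise.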
⑸ Example problems for negative binomial distribution
8. Negative hypergeometric distribution
⑴ definition
① situation: out of N items, k are successes
② question: when items are drawn one at a time without replacement, the distribution of the number of failures drawn before the first success
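One way to write down this distribution is to note that X = x requires the first x draws to be failures and the next draw to be a success. The sketch below builds the probability mass function from that observation with exact fractions; the function name and the values N = 10, k = 3 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def failures_before_first_success_pmf(x, N, k):
    """P(X = x): the first x draws are all failures and draw x + 1 is a success."""
    all_failures = Fraction(comb(N - k, x), comb(N, x))  # x failures among the first x draws
    then_success = Fraction(k, N - x)                    # a success among the N - x items left
    return all_failures * then_success

N, k = 10, 3
pmf = [failures_before_first_success_pmf(x, N, k) for x in range(N - k + 1)]

total = sum(pmf)                              # should be exactly 1
mean = sum(x * p for x, p in enumerate(pmf))  # compare with (N - k) / (k + 1)

print(total, mean)
```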
9. Poisson distribution
⑴ definition: for an event that occurs λ times on average per unit time, the Poisson distribution is the probability distribution of the number of times the event occurs in a unit time
① λ: parameter (λ ∈ ℝ, λ > 0)
② for a time interval k times as long as the unit time, the Poisson distribution with λ* = kλ is used
③ widely used in practice
⑵ probability mass function
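① idea: limit of the binomial distribution
② if the unit time is divided into n equal parts, the probability of the event occurring in each part is λ / n
③ the probability that the event occurs x times in a unit time: $p_n(x) = {}_{n}C_{x} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n-x}$
④ probability mass function: taking the limit of ③ as n → ∞ gives $p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} I\{x = 0, 1, 2, \cdots\}$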
⑤ graph
Figure 7. probability mass function of Poisson distribution at λ = 0.6
⑥ Python programming: Bokeh is used for web-page visualization
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html
import numpy as np
from scipy.stats import poisson
from bokeh.plotting import figure, output_file, show
output_file("poisson_distribution.html")
lam = 0.6
x = np.arange(0, 4)
top = poisson.pmf(x, lam)
width = 0.5
graph = figure(width = 400, height = 400, title = "Poisson Distribution",
tooltips=[("x", "$x"), ("y", "$y")] )
graph.vbar(x, top = top, width = width, color = "navy", alpha = 0.5)
show(graph)
⑶ statistics
① moment generating function: $M(t) = e^{\lambda (e^{t} - 1)}$
② mean: E(X) = λ
③ variance: VAR(X) = λ
⑷ characteristic
① the sum of independent random variables that follow Poisson distributions also follows a Poisson distribution: if X ~ Poisson(λ1) and Y ~ Poisson(λ2) are independent, then X + Y ~ Poisson(λ1 + λ2)
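This closure property can be checked numerically by convolving two Poisson probability mass functions and comparing the result with a single Poisson whose parameter is the sum. This is a sketch with assumed parameters λ1 = 0.6 and λ2 = 1.5; `poisson_pmf` and `sum_pmf` are illustrative helpers.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

lam1, lam2 = 0.6, 1.5  # assumed illustration values

def sum_pmf(s):
    """P(X + Y = s) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2), by convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(s - j, lam2) for j in range(s + 1))

# Largest difference from Poisson(lam1 + lam2) over the first few values
max_gap = max(abs(sum_pmf(s) - poisson_pmf(s, lam1 + lam2)) for s in range(15))
print(max_gap)  # only floating-point rounding error remains
```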
⑸ relationship with binomial distribution
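① conditional distribution of Poisson variables: if X ~ Poisson(λ1) and Y ~ Poisson(λ2) are independent, the distribution of X given X + Y = n is binomial with parameters n and λ1 / (λ1 + λ2)
② the limit of the binomial distribution (n → ∞ with nθ = λ held fixed): Poisson distribution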
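The limit can be observed numerically by fixing λ, setting θ = λ / n, and letting n grow. This is a sketch with λ = 0.6 (the value used in Figure 7) and a few assumed values of n; the helper names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, theta):
    """P(X = x) for x successes in n trials with success probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 0.6

def max_gap(n):
    """Largest pointwise difference between Binomial(n, lam / n) and Poisson(lam) on x = 0..5."""
    return max(abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam)) for x in range(6))

gaps = {n: max_gap(n) for n in (10, 100, 1000)}
print(gaps)  # the gap shrinks as n grows
```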
⑹ example
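① situation: calls arrive at an average rate of 30 per hour
② question: the probability of getting exactly 2 calls in 3 minutes
③ as λ = 30 per hour and 3 minutes is 1 / 20 of an hour, λ* = 30 ÷ 20 = 1.5
④ calculation: $P(X = 2) = \frac{e^{-1.5} \, 1.5^{2}}{2!} \approx 0.251$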
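The arithmetic of this example can be reproduced in a few lines. This is a sketch using only the standard library, with the Poisson probability mass function e^(-λ) λ^x / x! written out directly; the variable names are illustrative.

```python
from math import exp, factorial

rate_per_hour = 30    # average number of calls per hour
window_minutes = 3    # length of the interval of interest

# Rescale the rate to the 3-minute window: 30 * (3 / 60) = 1.5
lam_star = rate_per_hour * window_minutes / 60

# P(X = 2) for a Poisson variable with parameter lam_star
prob = exp(-lam_star) * lam_star**2 / factorial(2)
print(lam_star, prob)
```

With SciPy available, `poisson.pmf(2, 1.5)` from `scipy.stats` should give the same value.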
⑺ Example problems for Poisson distribution
Input: 2019.06.18 23:48