Chapter 3. Probability Space
Higher category: 【Statistics】 Statistics Overview
1. set theory
a. Inclusion-Exclusion Principle
1. Set theory
⑴ case(outcome): can be defined arbitrarily. marked as ωi
⑵ sample space: set Ω that satisfies the following conditions
① Ω = { ω1, ω2, ···, ωn }
② ωi and ωj are disjoint
③ Ω includes all possible results
④ ((1, 2, 3), (3, 4, 5, 6)) (×)
⑤ ((1, 2), (3, 4, 5, 6)) (○)
⑶ field: a set F of all subsets of Ω
① assumption: Ω = {1, 2, 3}
② F = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
⑷ events: each element of F, i.e. subset of sample space
① total event: Ω
② null event: ∅
③ complementary event: Ac = Ω - A
④ union event: A ∪ B
⑤ intersection event: A ∩ B
⑥ mutually exclusive events: relationship between events A and B that satisfy A ∩ B = ∅
⑦ when number of cases is n, the number of events is 2n
⑸ probability
① intuitive definition: probability of a particular event = number of a particular event ÷ number of all cases
② axiomatic approach: proposed by Kolmogorov, the father of statistics
○ probability is a kind of function: F → ℝ
○ axiom 1. P(E) ≥ 0
○ axiom 2. P(Ω) = 1
○ axiom 3. addition rule of probability: about disjoint Ei and Ej
○ Theorem 1. P(∅) = 0
○ P(U) = P(U) + P(∅) ⇔ P(∅) = 0
○ note that even though E ≠ ∅, P(E) can be 0
○ Theorem 2. P(Ec) = 1 - P(E)
○ 1 = P(E ∪ Ec) = P(E) + P(Ec)
○ Theorem 3. E ⊂ F → P(E) ≤ P(F)
○ F = E ∪ (Ec ∩ F) → P(F) = P(E) + P(Ec ∩ F) ≥ P(E)
○ Theorem 4. P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
○ E ∪ F = (Ec ∩ F) ∪ (E ∩ F) ∪ (E ∩ Fc)
○ P(E ∪ F) = (P(F) - P(E ∩ F)) + P(E ∩ F) + (P(E) - P(E ∩ F))
○ Theorem 5. De Morgan’s law
○ (A ∪ B)c = ((Ac ∩ B) ∪ (A ∩ Bc) ∪ (A ∩ B))c = (U - Ac ∩ Bc)c = Ac ∩ Bc
○ Let C = Ac, D = Bc, then C ∩ D = (Cc ∪ Dc)c ⇔ (C ∩ D)c = Cc ∪ Dc
③ Cromwell’s rule: it states that we should not use probabilities of 1 or 0, except for logical statements
⑹ inclusion-exclusion principle
① example: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(C ∩ A) + P(A ∩ B ∩ C)
⑺ Example problems for set theory
2. Conditional probability
⑴ conditional probability
① P(A | B): the probability that event A will occur when event B already occurs
② expression : P(A | B) = P(A ∩ B) ÷ P(B)
③ P(A | B) ≠ P(B | A)
④ multiplication rule of probability: P(A ∩ B) = P(A) × P(B | A) = P(B) × P(A | B)
⑤ Multiplication theorem of probabilities with more than one premise
○ The following equation is self-evident because there are implicitly certain prerequisites (e.g., α, β) for any probability
⑵ chain rule of conditional probability
① example : P(A ∩ B ∩ C) = P(A) × P(B | A) × P(C | A ∩ B)
⑶ independence of events: if the conditional probability is equal to the intrinsic probability
① expression: P(A ∩ B) = P(A) × P(B)
② application
○ P(A | B) = P(A ∩ B) ÷ P(B) = P(A)P(A B) = P(A ∩ B) ÷ P(B) = P(A)
○ P(B | A) = P(A ∩ B) ÷ P(A) = P(B)
○ E and F are independent ⇔ Ec and F are independent ⇔ E and Fc are independent ⇔ Ec and Fc are independent
○ if P(A) = 0, A and B are always independent
○ P(A), P(B) > 0, and A and B are independent → A and B are not disjoint
○ if A and B are disjoint, P(A) = 0, P(B) = 0, or A and B are not independent (contrapositive proposition)
③ intuitive meaning
○ information about one variable does not tell you any information about the other
④ relationship with conditional probabilities
○ Pr(X = x | Y = y) = Pr(X = x)
⑷ mutual independence
① condition 1.
② condition 2. includes condition 1
⑸ conditional independence
① definition
② conditional independence is not always independent
③ independence is not always conditional independence
⑹ law of total probability
① partition
○ definition 1. (exhaustive) Ω = B1 ∪ B2 ∪ ··· ∪ Bn
○ definition 2. (disjoint) Bi ∩ Bj = ∅
② conclusion
○ tip: drawing a Ben diagram is easy to understand
③ important: used to induce Bayes’ theorem
⑺ Bayes’ theroem
① assumption 1. S = B1 ∪ B2 ∪ ··· ∪ Bn
② assumption 2. ∀ i, j, Bi ∩ Bj = ∅
③ conclusion
④ Importance 1. a totally unrelated P (A | B) may be used to calculate P (B | A)
⑤ Importance 2. results (posterior, a posterior) cand predict causes (prior, a pryori)
○ A: events observed as a result
○ B: a causal event
○ P(Bi): prior
○ P(Bi | A) : when A is observed, the probability that Bi is the cause (posterior)
○ P(A | Bi) : when Bi is the cause, the probability of A occurring (likelihood)
○ infers probabilities without identifying the population
○ claims that probability is nothing but human belief
⑻ example 1. diagnostic rate
① percentage of cancer patients
P(A) = 1%
② positive probability (hit rate) at diagnosis
P(B)
③ positive probability when diagnosing cancer patients
P(B A) = 95%
④ probability of false positive when diagnosing non-cancer patients
P(B Ac) = 5%
⑤ question : probability of a positive person really getting cancer during diagnosis
P(A B)
⑥ proportion of positive in diagnosis: the law of total probability is used
P(B) = P(B A) + P(B Ac) = 0.95 × 0.01 + 0.05 × 0.99
⑦ probability that any person is a cancer patient and is positive in diagnosis
P(A ∩ B) = P(B A) × P(A) = 0.95 × 0.01
⑧ answer
P(A B) = P(A ∩ B) ÷ P(B) = 0.16
⑨ interpretation : since there are many people who are not cancer patients, the false positive rate is quite high
⑼ Example 2. Monty hall problem
① situation
○ supercar stands behind one of the three doors
○ show participant choose one of the three doors
○ the show host opens any of the doors that the show participant did not open
○ show participant can stick to or change their existing selections
○ question: which choice is reasonable?
② premise
○ show participant first select door 1
○ show host opens door 3
③ definition
○ P(i): probability of supercar behind door 1
○ P(ii): probability of supercar behind door 2
○ P(iii): probability of supercar behind door 3
○ P(Ⅰ): probability of the show host choosing door 1
○ P(Ⅱ): probability of the show host choosing door 2
○ P(Ⅲ): probability of the show host choosing door 3
○ P(i), P(ii), and P(iii) are priors (causes)
○ P(Ⅰ), P(Ⅱ), and P(Ⅲ) are posteriors (results)
④ conditional probability
○ P(ⅰ | Ⅲ): when the show host selects door 3, the probability that there is a supercar behind door 1
○ P(Ⅲ | ⅰ): if there’s a supercar behind door 1, the probability that the show host will choose door 3
⑤ Bayes’ theorem
○ P(ⅰ | Ⅲ)
○ P(ⅱ | Ⅲ)
○ P(ⅲ | Ⅲ)
⑥ conclusion: it’s advantageous for show participant to change their choices
Input: 2019.06.16 22:02