Lecture 3-3. Sigma Algebra(σ-algebra)
Recommended post: 【Statistics】 Lecture 3. Probability Space
1. Sigma Algebra
2. Random Variable
3. Filtration
4. Appendix
1. Sigma Algebra
⑴ Probability Space (Ω, ℱ)
① Ω: Sample Space
② ℱ: Sigma Algebra (σ-algebra, event space), i.e., a collection of subsets of Ω
○ Example 1. When Ω = {1, 2, 3}, the σ-algebra ℱ = {∅, Ω} corresponds to the case of knowing nothing
○ Example 2. When Ω = {1, 2, 3}, the σ-algebra ℱ = 2^Ω = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, Ω} corresponds to the case of being able to see all events
○ Example 3. When Ω = {1, 2, 3}, the σ-algebra ℱ = {∅, {1}, {2, 3}, Ω} represents an intermediate case
③ ω ∈ Ω: Realized sample. In a random process, it means a sample path.
⑵ Algebra
① Condition 1. non-empty: Ω ∈ ℱ or ∅ ∈ ℱ holds
② Condition 2. Closed under complement: If A ∈ ℱ, then Aᶜ = Ω − A ∈ ℱ also holds
○ Together with the non-empty condition, this implies that both ∅ and Ω must be elements of ℱ
③ Condition 3. Closed under finite union: If A, B ∈ ℱ, then A ∪ B ∈ ℱ also holds
○ By induction, if A1, ⋯, An ∈ ℱ, then ∪i Ai = A1 ∪ ⋯ ∪ An ∈ ℱ also holds
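On a finite sample space, the closure conditions above can be verified by brute force (and countable unions coincide with finite unions when Ω is finite, so the same checker also covers the σ-algebra case below). A minimal Python sketch; the helper name `is_sigma_algebra` and the example collections are illustrative:

```python
from itertools import chain, combinations

def is_sigma_algebra(omega, F):
    """Check the (sigma-)algebra axioms on a finite sample space.

    omega : frozenset of outcomes
    F     : set of frozensets (candidate event collection)
    """
    if omega not in F:                      # non-empty: Omega must be an event
        return False
    for A in F:
        if omega - A not in F:              # closed under complement
            return False
    for A in F:
        for B in F:
            if A | B not in F:              # closed under (finite) union
                return False
    return True

omega = frozenset({1, 2, 3})
F_trivial = {frozenset(), omega}                                     # "know nothing"
F_partial = {frozenset(), frozenset({1}), frozenset({2, 3}), omega}  # intermediate
F_power   = {frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(4))}                       # see everything

print(is_sigma_algebra(omega, F_trivial))   # True
print(is_sigma_algebra(omega, F_partial))   # True
print(is_sigma_algebra(omega, F_power))     # True
print(is_sigma_algebra(omega, {frozenset(), frozenset({1}), omega}))  # False
```

The last check fails because {∅, {1}, Ω} is missing the complement {2, 3}, illustrating why closure under complement forces the "intermediate" collection of Example 3.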
⑶ Sigma Algebra (σ-algebra)
① Motivation: When the sample space is very large (e.g., Ω = ℝ), a probability cannot be consistently assigned to every subset, so taking ℱ = 2^Ω fails and ℱ must be restricted to a suitable σ-algebra. Related to Carathéodory’s extension theorem.
② Condition 1. Must be an algebra
③ Condition 2. Closed under countably infinite unions: If A1, A2, ⋯ ∈ ℱ, then ∪i∈ℕ Ai = A1 ∪ A2 ∪ ⋯ ∈ ℱ
④ Intuitive meaning of sigma algebra
○ A collection of subsets on a non-empty set Ω
○ The set of all events to which probability can be assigned
○ The set of all functions/random variables that can be generated
⑤ σ-algebras can vary in size
○ Trivial σ-algebra: {∅, ℝ} (smallest)
○ σ(𝒜): The smallest σ-algebra containing all elements of 𝒜, i.e., generated by 𝒜
○ Borel σ-algebra: ℬ(ℝ) (the smallest σ-algebra containing all open sets)
○ Countable/co-countable σ-algebra: the collection of sets that are countable or co-countable
○ Power-set σ-algebra: 𝒫(ℝ) (largest)
○ σ-algebra of Lebesgue measurable sets: ℒ (larger than Borel; a prototypical “completion”)
○ Intersections of σ-algebras are again σ-algebras.
⑥ Borel σ-algebra: The smallest sigma algebra containing all open sets
○ Ω = ℝ, ℱ = ℬ(ℝ)
○ Using the properties of a σ-algebra, from open intervals one obtains closed intervals, half-open intervals, singletons {x}, and finite unions such as [1, 3] ∪ [4, 5], all of which are included in the Borel σ-algebra
○ Complicated sets like the set of rationals and irrationals are also Borel sets
○ More complex sets obtained by countably many unions, differences, or intersections of intervals are all Borel sets
○ Not limited only to ℝ; can be defined for any topological space X: for example, on [0,1], on ℝn, or on any general topological space, each has its own Borel σ-algebra
○ In fact, there exist subsets of ℝ that are not Borel sets, e.g., Lebesgue non-measurable sets such as Vitali sets; this is related to uncountable infinity and the axiom of choice
2. Random Variable
⑴ Probability Distribution
① A function that assigns a probability to each event in ℱ, i.e., ℙ: ℱ → [0, 1]
② Condition 1. ℙ(Ω) = 1
③ Condition 2. Countable additivity: for countably many mutually exclusive {Ai}i∈ℕ, ℙ(∪i∈ℕ Ai) = Σi∈ℕ ℙ(Ai)
○ Disjoint: Ai ∩ Aj = ∅
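On a finite Ω both conditions can be checked exactly. A sketch with exact rational arithmetic; the point masses are an assumed example, not from the text:

```python
from fractions import Fraction

# A probability measure on Omega = {1, 2, 3} defined by point masses.
# Fraction avoids floating-point rounding in the additivity check.
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def P(event):
    """Probability of an event (a subset of Omega)."""
    return sum(p[w] for w in event)

omega = {1, 2, 3}
assert P(omega) == 1                       # Condition 1: P(Omega) = 1

A, B = {1}, {2, 3}                         # disjoint: A ∩ B = ∅
assert A & B == set()
assert P(A | B) == P(A) + P(B)             # additivity on disjoint events
print(P(A | B))                            # 1
```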
⑵ Random Variable (measurable function): Linking events to values
① Expression 1. If the function X satisfies ∀B ∈ ℬ(ℝ), X⁻¹(B) ∈ ℱ, meaning ℙ(X ∈ B) is well-defined, then X is measurable, and such a function is called a random variable.
② Expression 2. X: Ω → ℝ is a random variable ⇔ X⁻¹(A) = {ω ∈ Ω: X(ω) ∈ A} ∈ ℱ ∀A ∈ ℬ(ℝ)
○ Meaning 1. Every Borel set has an inverse image lying in ℱ, i.e., X is ℱ-measurable. If Ω is a topological space with ℱ its Borel σ-algebra and X is continuous, this holds, since preimages of open sets are open and hence Borel.
○ Meaning 2. The collection of inverse images {X⁻¹(A): A ∈ ℬ(ℝ)} is a subset of ℱ
○ ℱ can be viewed as equivalent to the collection of all random variables (or functions) measurable with respect to it
○ Example: the distribution (pushforward measure) of X is ℙX(A) = ℙ(X⁻¹(A)) for A ∈ ℬ(ℝ)
③ Expression 3. X is measurable ⇔ ∀a ∈ ℝ, {ω : X(ω) ≤ a} ∈ ℱ
④ Precise distinction between a random variable and “measurable”
○ Measurable can be defined without a measure: it only requires the pairs (Ω, ℱ) and (ℝ, 𝒢). In practice, we usually take 𝒢 = ℬ(ℝ).
○ A random variable is a measurable function on a probability space, i.e., with the measure ℙ included.
⑤ Example 1. Bernoulli distribution
○ Domain = Ω = {Head, Tail}
○ ℱ = 2^Ω = {∅, {Head}, {Tail}, {Head, Tail}}
○ Codomain = {0, 1}
○ 𝒢 = 2^Codomain = {∅, {0}, {1}, {0, 1}}
○ As the inverse image of every element of 𝒢 is an element of ℱ, X : Ω → {0, 1} is measurable.
⑥ Example 2. An example of a function which is not measurable
○ Ω = [0, 1], ℱ = {∅, [0, 1]}, with, e.g., X(ω) = ω
○ 𝒢 = ℬ(ℝ) contains [0, 1/2], but X⁻¹([0, 1/2]) = [0, 1/2] is not an element of ℱ.
○ Thus X : (Ω, ℱ) → (ℝ, 𝒢) is not measurable. Specifically, X is said to be “not ℱ-measurable”, and ℱ needs more information (more sets) for X to become measurable.
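Both examples can be replayed by brute-force preimage checking on finite collections. Here 𝒢 is the finite power-set σ-algebra of the codomain, a stand-in for ℬ(ℝ), and the helper names are illustrative:

```python
def preimage(X, event):
    """Inverse image X^{-1}(event) as a frozenset of sample points."""
    return frozenset(w for w in X if X[w] in event)

def is_measurable(X, F, G):
    """X is F-measurable iff the preimage of every set in G lies in F."""
    return all(preimage(X, B) in F for B in G)

# Example 1: Bernoulli coin flip with F = 2^Omega -> measurable.
X = {"Head": 1, "Tail": 0}
F = {frozenset(), frozenset({"Head"}), frozenset({"Tail"}),
     frozenset({"Head", "Tail"})}
G = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
print(is_measurable(X, F, G))        # True

# Example 2 (analogue): the trivial F is too coarse for the same X,
# because X^{-1}({1}) = {Head} is not an element of F_triv.
F_triv = {frozenset(), frozenset({"Head", "Tail"})}
print(is_measurable(X, F_triv, G))   # False
```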
⑦ General measurable space
○ Definition: If X: Ω → Ω1 between two measurable spaces (Ω, ℱ) and (Ω1, ℱ1) satisfies X⁻¹(A) ∈ ℱ for all A ∈ ℱ1, then X is called a random variable (measurable function)
⑧ Random Process (stochastic process)
○ Definition: X: ℐ × Ω → E such that for each i ∈ ℐ, X(i, ·): Ω → E is a random variable
⑶ π-class and λ-class
① Definition of π-class: 𝒞 ⊂ 2^Ω such that A, B ∈ 𝒞 implies A ∩ B ∈ 𝒞 (closed under finite intersections)
② Definition of λ-class: a collection of subsets of Ω that contains Ω, is closed under complements, and is closed under countable disjoint unions
③ Property of λ-class: a class that is both a π-class and a λ-class is a σ-algebra
④ Dynkin’s theorem (π-λ theorem): If 𝒞 is a π-class, ℒ is a λ-class, and 𝒞 ⊂ ℒ, then σ(𝒞) ⊂ ℒ
⑷ Stationarity
① Strictly stationary: the joint distribution of (Xt1, ⋯, Xtn) is invariant under time shifts, i.e., it equals that of (Xt1+h, ⋯, Xtn+h) for every shift h
② Wide-sense stationary: the mean 𝔼[Xt] is constant and Cov(Xt, Xt+h) depends only on the lag h; a strictly stationary process with finite second moments is also wide-sense stationary
⑸ Independence
① Definition of independence using joint distribution: ℙ(X1 ∈ B1, X2 ∈ B2) = ℙ(X1 ∈ B1) ℙ(X2 ∈ B2) ∀B1, B2 ∈ ℬ(ℝ)
② Definition of independence using moments
③ Definition of independence using moment generating function
④ Definition of independence using σ-algebra: σ(X1) and σ(X2) are independent (where σ(X) = {X⁻¹(A): A ∈ ℬ(ℝ)})
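For two fair coin flips, the product-rule definition ① can be verified exhaustively, with all subsets of the finite codomain {0, 1} standing in for the Borel sets B1, B2. The setup below is an assumed toy example:

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Two fair coin flips: Omega = {HH, HT, TH, TT}, uniform measure.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P = {w: Fraction(1, 4) for w in omega}
X1 = {w: (1 if w[0] == "H" else 0) for w in omega}   # first flip
X2 = {w: (1 if w[1] == "H" else 0) for w in omega}   # second flip

def prob(pred):
    """Probability of the event {w : pred(w)}."""
    return sum(P[w] for w in omega if pred(w))

def subsets(s):
    """All subsets of a finite set (as tuples)."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# P(X1 in B1, X2 in B2) = P(X1 in B1) P(X2 in B2) for ALL B1, B2
independent = all(
    prob(lambda w: X1[w] in B1 and X2[w] in B2)
    == prob(lambda w: X1[w] in B1) * prob(lambda w: X2[w] in B2)
    for B1, B2 in product(subsets({0, 1}), repeat=2)
)
print(independent)   # True
```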
⑹ Markov Process
① Bayes’ Rule: ℙ(A | B) = ℙ(A ∩ B) / ℙ(B) if ℙ(B) > 0
② Conditional expectation 𝔼[X | 𝒢]: the 𝒢-measurable random variable satisfying ∫G 𝔼[X | 𝒢] dℙ = ∫G X dℙ for all G ∈ 𝒢
③ Markov process: ∀A ∈ 𝓔 and times t1 < t2 < ⋯ < tn, ℙ(Xtₙ ∈ A | Xt₁, Xt₂, ⋯, Xtₙ₋₁) = ℙ(Xtₙ ∈ A | Xtₙ₋₁), i.e., the current state depends only on the immediately preceding state
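The defining identity can be checked exactly for a finite chain: build the joint law of (X0, X1, X2) from a transition matrix and verify that conditioning on the extra past state X0 changes nothing. The matrix and initial distribution below are assumed, illustrative numbers:

```python
from fractions import Fraction

# A two-state Markov chain: transition matrix T and initial distribution mu.
T = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
mu = [Fraction(1, 3), Fraction(2, 3)]

# Joint law of (X0, X1, X2) built from mu and T.
joint = {(i, j, k): mu[i] * T[i][j] * T[j][k]
         for i in range(2) for j in range(2) for k in range(2)}

def cond(k, j, i=None):
    """P(X2 = k | X1 = j) or, with i given, P(X2 = k | X1 = j, X0 = i)."""
    if i is None:
        num = sum(p for (a, b, c), p in joint.items() if b == j and c == k)
        den = sum(p for (a, b, c), p in joint.items() if b == j)
    else:
        num = joint[(i, j, k)]
        den = sum(joint[(i, j, c)] for c in range(2))
    return num / den

# Markov property: the extra past state X0 is irrelevant given X1.
ok = all(cond(k, j, i) == cond(k, j) == T[j][k]
         for i in range(2) for j in range(2) for k in range(2))
print(ok)   # True
```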
3. Filtration
⑴ Doob’s Theorem
① σ(X1, X2, ···, Xn): The smallest σ-algebra making X1, X2, ···, Xn measurable
② Doob’s Theorem (Doob–Dynkin lemma): a random variable Y is σ(X1, X2, ⋯, Xn)-measurable if and only if Y = g(X1, X2, ⋯, Xn) for some Borel-measurable function g
③ The larger the σ-algebra, the greater the number of measurable functions with respect to it — i.e., more information
⑵ Filtration
① A collection of σ-algebras arranged in an increasing order by inclusion
② Ordered by ⊆, and if ℱ1 ⊆ ℱ2, then ℱ2 is after ℱ1
③ For convenience, with time index t = 0, 1, 2, ⋯, filtration is {ℱt}t∈ℤ+, satisfying ℱs ⊆ ℱt for all s ≤ t
④ Intuitive meaning: Represents a situation where information increases as time passes and observations accumulate
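The coin-flip filtration makes this concrete: with Ω the length-3 flip sequences, ℱt is generated by the first t flips, and the σ-algebras grow with t. A sketch; the helper `sigma_t` is illustrative:

```python
from itertools import product, chain, combinations

# Sample paths of 3 coin flips; F_t is generated by the first t flips,
# i.e. by the partition of Omega into blocks agreeing on flips 1..t.
omega = list(product("HT", repeat=3))

def sigma_t(t):
    """F_t: all unions of the atoms {paths sharing a prefix of length t}."""
    atoms = {}
    for w in omega:
        atoms.setdefault(w[:t], set()).add(w)
    blocks = [frozenset(b) for b in atoms.values()]
    # every union of atoms (Omega is finite, so this enumerates F_t exactly)
    return {frozenset(chain.from_iterable(c))
            for r in range(len(blocks) + 1)
            for c in combinations(blocks, r)}

F = [sigma_t(t) for t in range(4)]
print([len(Ft) for Ft in F])   # [2, 4, 16, 256]: information grows with t
print(all(F[s] <= F[t] for s in range(4) for t in range(s, 4)))   # True: F_s ⊆ F_t
```

At t = 0 only {∅, Ω} is known; by t = 3 every one of the 2^8 events is decidable, which is exactly the "information increases over time" picture.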
⑶ Martingale
① Property of conditional expectation
○ For any random variable Y, 𝔼[Y | X1, ···, Xn] = 𝔼[Y | σ(X1, ···, Xn)] holds
○ Reason: σ(X1, ···, Xn) is equivalent to the set of all functions generated by X1, ···, Xn
○ Additionally, when σ(Y) ⊂ σ(Z), the tower property 𝔼[𝔼[X | Z] | Y] = 𝔼[𝔼[X | Y] | Z] = 𝔼[X | Y] holds: the coarser conditioning wins.
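The tower property can be verified exactly on a small space, computing conditional expectation by block averaging over a finite partition (an assumed toy setup; `cond_exp` is an illustrative helper):

```python
from fractions import Fraction

def cond_exp(X, partition, P):
    """E[X | sigma(partition)]: average X over the block containing omega."""
    out = {}
    for block in partition:
        pb = sum(P[w] for w in block)
        avg = sum(P[w] * X[w] for w in block) / pb
        for w in block:
            out[w] = avg
    return out

omega = [1, 2, 3, 4]
P = {w: Fraction(1, 4) for w in omega}
X = {w: Fraction(w) for w in omega}

coarse = [frozenset({1, 2, 3, 4})]               # sigma(Y): knows nothing
fine   = [frozenset({1, 2}), frozenset({3, 4})]  # sigma(Z): sigma(Y) ⊂ sigma(Z)

lhs = cond_exp(cond_exp(X, fine, P), coarse, P)  # E[ E[X|Z] | Y ]
rhs = cond_exp(X, coarse, P)                     # E[X|Y]
print(lhs == rhs)   # True
print(lhs[1])       # 5/2
```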
② Martingale: A stochastic process {Xt}t∈ℤ+ adapted to filtration {ℱt}t∈ℤ+ satisfies all the following conditions
○ Condition 1. For all t ∈ ℤ+, Xt is ℱt-measurable
○ If s ≤ t ≤ s′, then ℱs ⊆ ℱt ⊆ ℱs′; Xt is ℱt-measurable, hence also ℱs′-measurable, but in general not ℱs-measurable (∵ lack of information)
○ Condition 2. For all t ∈ ℤ+, 𝔼[|Xt|] is finite
○ Condition 3. For all t ∈ ℤ+, 𝔼[Xt | ℱs] = Xs, almost surely for all s ≤ t
○ Interpretation: Given only the information up to time s (ℱs), the best prediction of Xt is Xs itself.
○ Remark: The martingale property constrains only predictions of the future from the past. For s > t, 𝔼[Xt | ℱs] = Xt holds regardless of whether (Xt) is a martingale (assuming integrability).
○ Equivalently, for s < t, 𝔼[Xs | ℱt] = Xs also holds, because Xs is ℱs-measurable and ℱs ⊆ ℱt, so Xs is already ℱt-measurable.
③ Note: An i.i.d. process is generally not a martingale, since 𝔼[Xt | ℱs] = 𝔼[Xt] ≠ Xs unless the process is constant; however, the partial sums of mean-zero i.i.d. variables do form a martingale
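The canonical positive example is the simple symmetric random walk (partial sums of i.i.d. ±1 steps). Its martingale property, Condition 3 with t' = t + 1, can be checked exactly over all length-3 paths; a sketch with illustrative helper names:

```python
from itertools import product
from fractions import Fraction

# Simple symmetric random walk X_t = sum of the first t steps, each ±1
# with probability 1/2: the canonical discrete-time martingale.
T = 3
paths = list(product([-1, 1], repeat=T))
P = {p: Fraction(1, 2) ** T for p in paths}

def X(path, t):
    """Walk position after t steps."""
    return sum(path[:t])

def cond_exp_next(prefix, t):
    """E[X_{t+1} | F_t]: average X_{t+1} over paths sharing the first t steps."""
    block = [p for p in paths if p[:t] == prefix]
    return sum(P[p] * X(p, t + 1) for p in block) / sum(P[p] for p in block)

martingale = all(
    cond_exp_next(p[:t], t) == X(p, t)      # E[X_{t+1} | F_t] = X_t
    for p in paths for t in range(T)
)
print(martingale)   # True
```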
4. Appendix
⑴ Dynkin’s theorem
⑵ Bounded convergence theorem
⑶ Fatou’s lemma
⑷ de Moivre’s formula
⑸ Stirling’s formula
⑹ Borel-Cantelli lemma
⑺ Kolmogorov’s maximal inequality
⑻ Carathéodory’s extension theorem
⑼ Fubini-Tonelli theorem
⑽ Kolmogorov’s extension theorem (KET)
Input: 2025.09.07 08:40