Chapter 5. Statistical Quantity
1. Expected value
⑴ Definition: the expected value of a random variable X, written E(X), is the value of X obtained on average over repeated realizations
① discrete random variable
② continuous random variable
⑵ joint probability distribution function
① discrete random variable
② continuous random variable
⑶ Properties of expected values
① Linearity: E(aX + bY + c) = aE(X) + bE(Y) + c
② If X and Y are independent, E(XY) = E(X) × E(Y)
⑷ example
① X: when n hats are shuffled and each of n people draws one without replacement, X is the number of people who draw their own hat
② point of the problem: finding the distribution p(X) first and then computing E(X) is hard; linearity of expectation avoids this
③ X = X1 + ··· + Xn, where Xi = 1 if the i-th person draws his own hat and Xi = 0 otherwise
④ Approach 1. number of cases
⑤ Approach 2. by symmetry, every person is equally likely to draw his own hat regardless of draw order, so E(Xi) = 1/n for all i and E(X) = 1
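The symmetry argument can be cross-checked numerically. A minimal Monte Carlo sketch (Python/NumPy here, while the notes' own snippets use R; `mean_matches` is an illustrative name):

```python
import numpy as np

# Monte Carlo check of the hat-matching example: n hats are shuffled and
# X counts how many people receive their own hat. By linearity of
# expectation E(X) = n * (1/n) = 1 for every n.
rng = np.random.default_rng(0)

def mean_matches(n, trials=100_000):
    # Each row is an independent random permutation of 0..n-1;
    # a fixed point of the permutation is a person who got his own hat.
    perms = rng.permuted(np.tile(np.arange(n), (trials, 1)), axis=1)
    return (perms == np.arange(n)).sum(axis=1).mean()

for n in (3, 10, 50):
    print(n, mean_matches(n))  # each average stays close to 1
```

For every n the simulated average stays near 1, matching E(X) = Σ E(Xi) = n · (1/n) = 1.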
⑸ Cauchy distribution: the expected value is not defined
⑹ Example problems for expected value
2. Standard deviation
⑴ Deviation
① Definition: D = X - E(X)
② Characteristic 1. E(D) = E(X - E(X)) = E(X) - E(X) = 0
⑵ Variance
① Definition: when E(X) = μ, VAR(X) = E((X - μ)²) = E(D²)
② Characteristic 1. VAR(X) = E(X²) - μ²
○ Proof: VAR(X) = E((X - μ)²) = E(X²) - 2μE(X) + μ² = E(X²) - 2μ² + μ² = E(X²) - μ²
③ Characteristic 2. VAR(aX + b) = a² VAR(X)
④ Characteristic 3. introduction to covariance: VAR(X + Y) = VAR(X) + VAR(Y) + 2 COV(X, Y)
○ Created by R.A. Fisher in 1936.
○ Proof
○ Generalization
○ Additivity under independence: when X and Y are independent, VAR(X + Y) = VAR(X) + VAR(Y)
○ Definition of covariance for data: given n distinct pairs (x1, y1), ···, (xn, yn), the covariance of x and y is given as follows
○ If duplicate pairs are allowed, the definition is modified by introducing the sample proportion pi; note that if yi = xi for every i, the covariance reduces to the variance
○ Two-dimensional covariance matrix Σ (where x = (x1, x2)^T = (x, y)^T)
○ Σ = E[(x - E[x])(x - E[x])^T] holds not only in two dimensions but in n dimensions as well
⑤ Characteristic 4. VAR(X) = 0 ⇔ P(X = constant) = 1 (∵ Chebyshev inequality)
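Characteristics 2-3 and the covariance matrix Σ can be verified on data. A Python/NumPy sketch using population (1/n) moments (variable names are illustrative):

```python
import numpy as np

# Numerical sketch of the variance identities above, using sample moments
# with the 1/n convention so the identities hold exactly.
rng = np.random.default_rng(1)
x = rng.normal(0, 2, 100_000)
y = 0.5 * x + rng.normal(0, 1, 100_000)     # correlated with x

var = lambda v: v.var()                      # population (1/n) variance
cov = lambda u, v: ((u - u.mean()) * (v - v.mean())).mean()

# Characteristic 2: VAR(aX + b) = a^2 VAR(X)
assert abs(var(3 * x + 7) - 9 * var(x)) < 1e-6
# Characteristic 3: VAR(X + Y) = VAR(X) + VAR(Y) + 2 COV(X, Y)
assert abs(var(x + y) - (var(x) + var(y) + 2 * cov(x, y))) < 1e-6

# Covariance matrix Sigma = E[(x - E[x])(x - E[x])^T]
sigma = np.cov(np.vstack([x, y]), bias=True)  # bias=True -> 1/n normalization
print(sigma)  # [[VAR(X), COV(X,Y)], [COV(X,Y), VAR(Y)]]
```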
⑶ Standard deviation
① Definition: the standard deviation of X, written σ or SD(X), equals √VAR(X) ⇔ σ² = VAR(X)
② Idea: the variance has different units from X, whereas the standard deviation has the same units as X
③ Characteristic: the variance and σ are always non-negative; the covariance can be negative
⑷ Coefficient of variation
① The standard deviation divided by the mean: CV = σ / μ
② Used to relatively compare the degree of scattering of data with different units of measurement
3. Covariance and correlation coefficient
⑴ Covariance
① definition: with E(X) = μx and E(Y) = μy,
○ COV(X, Y) = σxy = E{(X - μx)(Y - μy)}
② meaning: the degree to which Y tends to change as X changes
③ characteristic 1. COV(X, Y) = E(XY) - E(X)E(Y)
○ proof: COV(X, Y) = E((X - μx)(Y - μy)) = E(XY) - μxE(Y) - μyE(X) + μxμy = E(XY) - μxμy
④ characteristic 2. if X = Y, COV(X, Y) = VAR(X)
⑤ characteristic 3. if X and Y are independent, COV(X, Y) = 0
○ proof: COV(X, Y) = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0
○ because independence is the stronger condition, COV(X, Y) = 0 does not allow the conclusion that X and Y are independent
⑥ characteristic 4. COV(aX + b, cY + d) = ac COV(X, Y)
⑦ characteristic 5. COV(a1 X1 + a2 X2, Y) = a1 COV(X1, Y) + a2 COV(X2, Y)
⑧ Limitation: by characteristic 4, the covariance mixes association with scale information, so its magnitude alone cannot quantify association
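A small sketch of this limitation (Python/NumPy): rescaling the data multiplies the covariance by ac even though the association itself is unchanged.

```python
import numpy as np

# Rescaling (e.g. meters -> centimeters) changes the covariance by a*c,
# so its magnitude mixes association with measurement scale.
rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
y = x + rng.normal(size=10_000)

cov = lambda u, v: ((u - u.mean()) * (v - v.mean())).mean()
print(cov(x, y))              # some value c0
print(cov(100 * x, 100 * y))  # 10_000 * c0: same association, different number
```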
⑵ correlation coefficient: also referred to as Pearson correlation coefficient
① definition: with σx and σy the standard deviations of X and Y, ρ(X, Y) = COV(X, Y) / (σxσy)
○ Multiple correlation coefficients: the representation of correlation coefficients when there are three or more variables
○ Complete correlation: ρ = 1
○ No correlation: ρ = 0
② Background: introduced to keep only the association information while removing the scale information; cf. the limitation of the covariance
③ Characteristics
○ Correlation between two variables measured on an interval or ratio scale.
○ Targeted towards continuous variables.
○ Assumption of normality.
○ Widely utilized in most cases.
④ characteristic 1. -1 ≤ ρ(X, Y) ≤ 1 (correlation inequality)
○ proof: Cauchy-Schwarz inequality
○ ρ(X, Y) = 1: X and Y are in a perfect positive linear relationship
○ ρ(X, Y) = -1: X and Y are in a perfect negative linear relationship
○ ρ(X, Y) = 0 does not mean X and Y are independent
○ Exception 1. p(x) = ⅓ · I{x ∈ {-1, 0, 1}}, Y = X²
○ COV(X, Y) = E(XY) - E(X)E(Y) = E(X³) - E(X)E(X²) = 0 - 0 = 0
○ but p(x = 1, y = 1) = ⅓ while p(x = 1) × p(y = 1) = ⅓ × ⅔ = 2/9, so p(x, y) ≠ p(x) × p(y)
○ this contradicts the definition of independence, so X and Y are not independent even though ρ = 0
○ Exception 2. S = {(x, y) | -1 ≤ x ≤ 1, x² ≤ y ≤ x² + 1/10}, p(x, y) = 5 · I{(x, y) ∈ S}
○ COV(X, Y) = E(XY) - E(X)E(Y) = E(XY) = 0 (E(X) = 0 and E(XY) = 0 by symmetry in x)
○ independence would require p(x, y) = p(x) × p(y); the joint density is the constant 5 on S, but p(y) is not constant, so the product cannot equal it everywhere
○ this contradicts the definition of independence, so X and Y are not independent even though ρ = 0
⑤ characteristic 2. ρ(X, X) = 1, ρ(X, -X) = -1
⑥ characteristic 3. ρ(X, Y) = ρ(Y, X)
⑦ characteristic 4. exclusion of scale information: ρ(aX + b, cY + d) = ρ(X, Y) (for ac > 0; if ac < 0 the sign is flipped)
○ Proof: ρ(aX + b, cY + d) = COV(aX + b, cY + d) / (aσx · cσy) = COV(X, Y) / (σxσy) = ρ(X, Y), assuming a, c > 0
⑧ characteristic 5. association information: |ρ(X, Y)| = 1 holds if and only if Y = aX + b for some constants a ≠ 0 and b
○ proof of forward direction: The idea of setting Z comes from simple regression analysis
○ proof of the reverse direction
○ null hypothesis H0: correlation coefficient = 0
○ alternative hypothesis H1: correlation coefficient ≠ 0
○ calculation of the t statistic: for the correlation coefficient r obtained from a sample of size n, t = r√(n - 2) / √(1 - r²)
○ this statistic follows a Student t distribution with n - 2 degrees of freedom
○ cor(x, y)
○ cor(x, y, method = "pearson")
○ cor.test(x, y)
○ cor.test(x, y, method = "pearson")
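Alongside the R calls above, the test statistic can be computed by hand. A Python/NumPy sketch, assuming the standard formula t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom (`pearson_t` and the toy data are illustrative):

```python
import numpy as np

# Manual computation of the t statistic for testing H0: rho = 0,
# using t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
def pearson_t(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]          # sample Pearson correlation
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return r, t

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 5]
r, t = pearson_t(x, y)
print(r, t)   # compare against the t distribution with n - 2 = 4 df
```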
⑶ Spearman correlation coefficient
① definition: with x′ = rank(x) and y′ = rank(y),
② Characteristics
○ A method of measuring the correlation between two variables that are in ordinal scale.
○ A non-parametric method targeting ordinal variables.
○ Advantageous in data with many tied values.
○ More sensitive to errors and discrepancies in the data than Kendall’s τ.
○ Tends to yield higher values than Kendall’s correlation coefficient.
○
cor(x, y, method = "spearman")
○
cor.test(x, y, method = "spearman")
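Spearman's coefficient is the Pearson correlation computed on the ranks. A Python/NumPy sketch for tie-free data (tied values would require midranks; `rank` and `spearman` are illustrative names):

```python
import numpy as np

# Spearman's rho = Pearson correlation of the ranks.
def rank(v):
    # Rank 1..n; for distinct values this matches R's rank().
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    rx, ry = rank(np.asarray(x)), rank(np.asarray(y))
    return np.corrcoef(rx, ry)[0, 1]

# A monotone but non-linear relation: Spearman rates it as perfect
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]       # y = x^3
print(spearman(x, y))         # close to 1
```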
⑷ Kendall correlation coefficient
① definition: defined about concordant pair and discordant pair
② Characteristics
○ A method of measuring the correlation between two variables that are in ordinal scale.
○ A non-parametric method targeting ordinal variables.
○ Advantageous in data with many tied values.
○ Useful when the sample size is small or when there are many tied values in the data.
③ Procedure
○ step 1. sort the (x, y) pairs in ascending order of x
○ step 2. for each yi, count the number of concordant pairs in which yj > yi (assuming j > i)
○ step 3. for each yi, count the number of discordant pairs in which yj < yi (assuming j > i)
○ step 4. define the correlation coefficient as τ = (nc - nd) / (n(n - 1) / 2)
○ nc: total number of concordant pairs
○ nd: total number of discordant pairs
○ n: size of x and y
○ cor(x, y, method = "kendall")
○ cor.test(x, y, method = "kendall")
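The procedure's steps 1-4 can be implemented directly. A Python sketch, assuming the standard definition τ = (nc - nd) / (n(n - 1)/2) and ignoring ties (`kendall_tau` is an illustrative name):

```python
from itertools import combinations

# Direct implementation of steps 1-4: count concordant and discordant
# pairs after sorting by x; ties are ignored in this sketch.
def kendall_tau(x, y):
    pairs = sorted(zip(x, y))                   # step 1: sort by x
    nc = nd = 0
    for (_, yi), (_, yj) in combinations(pairs, 2):
        if yj > yi:                             # step 2: concordant pair
            nc += 1
        elif yj < yi:                           # step 3: discordant pair
            nd += 1
    n = len(x)
    return (nc - nd) / (n * (n - 1) / 2)        # step 4

print(kendall_tau([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]))  # (7 - 3) / 10 = 0.4
```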
⑸ Matthews correlation coefficient (MCC)
⑹ χ²: a measure of goodness of fit of an approximation
① for measured data (xm, ym) and an approximating function f(x), χ² = Σm (ym - f(xm))²
② the approximating function is obtained by differentiating χ² with respect to the parameters of f and finding the minimum point
③ used in non-linear regression, e.g. fitting a quadratic approximating function
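A sketch of the idea (Python/NumPy): for a quadratic f, setting the derivatives of χ² = Σ (ym - f(xm))² with respect to the coefficients to zero gives the least-squares system, which `np.polyfit` solves.

```python
import numpy as np

# chi^2 = sum_m (y_m - f(x_m))^2; the derivative condition d(chi^2)/d(coef) = 0
# yields the least-squares fit computed by np.polyfit.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x ** 2 - 3 * x + 1            # data lying on an exact quadratic

a, b, c = np.polyfit(x, y, deg=2)     # f(x) = a x^2 + b x + c
chi2 = np.sum((y - (a * x ** 2 + b * x + c)) ** 2)
print(a, b, c, chi2)                  # coefficients near 2, -3, 1; chi^2 near 0
```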
4. Anscombe’s quartet
⑴ shows that the mean, standard deviation, and correlation coefficient alone cannot describe the shape of a data set
⑵ example 1
Figure 1. example of Anscombe’s quartet
⑶ example 2
Figure 2. 2nd example of Anscombe’s quartet
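The point can be reproduced numerically with the first two of Anscombe's four published data sets (values from Anscombe, 1973); a Python/NumPy sketch:

```python
import numpy as np

# First two of Anscombe's (1973) four data sets: nearly identical summary
# statistics, very different shapes (set II lies on an exact parabola).
x  = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for y in (y1, y2):
    print(round(y.mean(), 2), round(y.std(ddof=1), 2),
          round(np.corrcoef(x, y)[0, 1], 3))
# both rows: mean ~7.50, sd ~2.03, r ~0.816
```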
5. Order statistics
⑴ Overview
① Assumption: X1, ···, Xn are mutually independent
② Definition: Y1 < ··· < Yn is obtained by sorting X1, ···, Xn in increasing order
⑵ Statistic
① Joint probability distribution
② Marginal probability distribution
③ Expected value
⑶ Example problems for order statistics
① Question Type: Questions are asked on the distribution and statistics of the maximum or minimum values out of n values, or the distribution of the k-th order statistic.
② Example 1: A random sample of size 3 is drawn from a uniform distribution on [0, 1]. Calculate the probability that the maximum value of the sample is greater than 0.7.
○ Solution
Pr(Y > 0.7) = 1 - (Pr(X ≤ 0.7))³ = 1 - 0.7³ = 0.657
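A Monte Carlo cross-check of Example 1 (Python/NumPy sketch):

```python
import numpy as np

# P(max of 3 iid Uniform(0,1) draws > 0.7) = 1 - 0.7^3 = 0.657
rng = np.random.default_rng(3)
m = rng.uniform(size=(200_000, 3)).max(axis=1)   # sample maxima
print((m > 0.7).mean())   # close to 0.657
print(1 - 0.7 ** 3)
```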
③ Example 2: X follows an exponential distribution with a mean of 1. A sample of size 3 is drawn. Calculate the expected value of the median of the three values.
○ Solution
fY(x) = (3! / (1!1!1!)) · (1 - e^(-x)) · e^(-x) · e^(-x) = 6(e^(-2x) - e^(-3x))
∴ E[Y] = ∫₀^∞ 6x(e^(-2x) - e^(-3x)) dx = 5/6
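A Monte Carlo cross-check of Example 2 (Python/NumPy sketch):

```python
import numpy as np

# E[median of 3 iid Exponential(mean 1) draws] = 5/6
rng = np.random.default_rng(4)
med = np.median(rng.exponential(scale=1.0, size=(500_000, 3)), axis=1)
print(med.mean())   # close to 5/6 = 0.8333...
```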
6. Conditional statistics
⑴ Conditional expectation
① Definition
② Characteristic
○ E(XY | Y) = YE(X | Y)
○ E(aX1 + bX2 | Y) = aE(X1 | Y) + bE(X2 | Y)
③ Law of iterated expectation
○ Lemma
○ Proof
○ Example
when Y is chosen uniformly at random from [0, ℓ] and then X uniformly at random from [0, Y], the law of iterated expectation gives E(X) = E(E(X | Y)) = E(Y/2) = ℓ/4
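A Monte Carlo sketch of this example (Python/NumPy; L = 8 is an arbitrary choice):

```python
import numpy as np

# Y ~ Uniform(0, L), then X | Y = y ~ Uniform(0, y).
# Iterated expectation: E(X) = E(E(X | Y)) = E(Y / 2) = L / 4.
rng = np.random.default_rng(5)
L = 8.0
y = rng.uniform(0, L, size=500_000)
x = rng.uniform(0, y)            # per-sample upper bound via broadcasting
print(x.mean(), L / 4)           # both close to 2.0
```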
④ Mean independence
○ Independence ⊂ mean independence ⊂ uncorrelatedness
○ Mean independence: E(X | Y) = E(X)
○ Uncorrelatedness: if the correlation coefficient is 0
○ Normal distribution: if X and Y are jointly normal and uncorrelated, then X and Y are independent
⑵ conditional variance
① Definition: the conditional variance of Y given the random variable X
② Law of total variance (decomposition of variance)
○ lemma
○ Proof
○ Meaning
○ Situation: when X ~ P1(θ), Y ~ P2(X)
○ use P2 to calculate VAR(Y | X) and E(Y | X)
○ use P1 to calculate E{·}, VAR{·}
○ E(VAR(X | Y)): intra-group variance
○ VAR(E(X | Y)): inter-group variance
○ Example 1.
○ X: laid-off worker’s unemployment period
○ probability density function of X: exponential distribution
○ 20% of the total workforce: skilled labor force. λ = 0.4
○ 80% of the total workforce: unskilled workers. λ = 0.1
○ calculation of VAR(X)
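The calculation can be sketched as follows (Python; the group probabilities and rates come from the example, and for Exp(λ) the mean is 1/λ and the variance 1/λ²):

```python
# Law of total variance for the lay-off example: the group G is "skilled"
# with prob 0.2 (X | G ~ Exp(lambda = 0.4)) or "unskilled" with prob 0.8
# (X | G ~ Exp(lambda = 0.1)).
p = [0.2, 0.8]
lam = [0.4, 0.1]
means = [1 / l for l in lam]               # E(X | G) = 1/lambda
vars_ = [1 / l ** 2 for l in lam]          # VAR(X | G) = 1/lambda^2

e_var = sum(pi * v for pi, v in zip(p, vars_))                  # E(VAR(X | G)): intra-group
e_m = sum(pi * m for pi, m in zip(p, means))
var_e = sum(pi * m ** 2 for pi, m in zip(p, means)) - e_m ** 2  # VAR(E(X | G)): inter-group
print(e_var, var_e, e_var + var_e)         # 81.25, 9.0, total VAR(X) = 90.25
```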
○ Example 2.
○ Question: Let P be the proportion of policyholders that renew their auto policies. P varies by agent. P follows a beta distribution with mean 0.8 and variance 0.25. A group of 10 policyholders is selected from all policyholders of an insurance company. Let N be the number of policyholders who renew their auto policies. Calculate Var[N].
○ Solution: Var[N] = E[Var[N | P]] + Var[E[N | P]] = E[10P(1 - P)] + Var[10P] = 10E[P] - 10E[P²] + 100Var[P] = 24.1
○ Note: the 10 policyholders share the same P, so their renewal indicators are not independent; therefore Var[N] ≠ Σi Var[Ni]
Posted: 2019.06.17 14:15