Chapter 5. Statistical Quantity
1. Expected value
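⑴ definition : the expected value of the random variable X, i.e. E(X), is the value X takes on average when the experiment is repeated many times
① discrete random variable : E(X) = Σ x · p(x), summed over all values x of X
② continuous random variable : E(X) = ∫ x · f(x) dx
⑵ joint probability distribution function : for a function g(X, Y),
① discrete random variable : E(g(X, Y)) = Σx Σy g(x, y) · p(x, y)
② continuous random variable : E(g(X, Y)) = ∫∫ g(x, y) · f(x, y) dx dy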
⑶ the properties of expected values
① linearity : E(aX + bY + c) = aE(X) + bE(Y) + c
② if X and Y are independent, E(XY) = E(X) × E(Y)
⑷ example
① X: the hats of n people are mixed and each person draws one hat without replacement; X is the number of people who draw their own hat
② motivation: computing E(X) by first deriving the distribution p(X) is difficult
③ X = X1 + ··· + Xn. Xi : 1 if the i-th person draws his own hat, 0 otherwise
④ approach 1. counting cases: P(Xi = 1) = (n - 1)! / n! = 1 / n
⑤ approach 2. symmetry: whether or not the i-th person draws first, each hat is equally likely to be his, so P(Xi = 1) = 1 / n
⑥ conclusion: by linearity, E(X) = E(X1) + ··· + E(Xn) = n × (1 / n) = 1
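The hat example can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the exact version enumerates all n! assignments (so it is practical only for small n), and the simulated version assumes a uniformly random shuffle. Both should give an average close to 1 regardless of n, since each P(Xi = 1) = 1 / n.

```python
import random
from fractions import Fraction
from itertools import permutations

def exact_expected_matches(n):
    """Average number of people holding their own hat over all n! assignments."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for person, hat in enumerate(p) if person == hat) for p in perms)
    return Fraction(total, len(perms))

def simulated_expected_matches(n, trials, seed=0):
    """Monte Carlo estimate of E(X) using random shuffles (seed fixed for repeatability)."""
    rng = random.Random(seed)
    hats = list(range(n))
    total = 0
    for _ in range(trials):
        rng.shuffle(hats)
        total += sum(1 for person, hat in enumerate(hats) if person == hat)
    return total / trials

print(exact_expected_matches(5))              # exact value for n = 5
print(simulated_expected_matches(10, 20000))  # estimate for n = 10
```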
⑸ Cauchy distribution: the expected value is not defined, because ∫ |x| · f(x) dx diverges
2. Standard deviation
⑴ deviation
① definition: D = X - E(X)
② characteristic 1. E(D) = E(X - E(X)) = E(X) - E(X) = 0
⑵ variance
① definition: when E(X) = μ, VAR(X) = E((X - μ)²) = E(D²)
② characteristic 1. VAR(X) = E(X²) - μ²
○ proof: VAR(X) = E((X - μ)²) = E(X²) - 2μE(X) + μ² = E(X²) - 2μ² + μ² = E(X²) - μ²
③ characteristic 2. VAR(aX + b) = a² VAR(X)
④ characteristic 3. introduction to covariance: VAR(X + Y) = VAR(X) + VAR(Y) + 2 COV(X, Y)
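○ proof: with μx = E(X) and μy = E(Y), VAR(X + Y) = E(((X - μx) + (Y - μy))²) = E((X - μx)²) + E((Y - μy)²) + 2E((X - μx)(Y - μy)) = VAR(X) + VAR(Y) + 2 COV(X, Y)
○ generalization: VAR(X1 + ··· + Xn) = Σi VAR(Xi) + 2 Σi<j COV(Xi, Xj)
○ additivity: when X and Y are independent, COV(X, Y) = 0, so VAR(X + Y) = VAR(X) + VAR(Y)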
○ Definition of covariance : given a data set of distinct points (x1, y1), ···, (xn, yn) with means x̄ and ȳ, the covariance of x and y is COV(x, y) = (1/n) Σi (xi - x̄)(yi - ȳ)
○ If repeated points are allowed, the definition is modified by introducing the sample proportion pi of each point: COV(x, y) = Σi pi (xi - x̄)(yi - ȳ). If yi = xi, then covariance = variance
○ Two-dimensional covariance matrix Σ (where x = (x1, x2)ᵀ = (x, y)ᵀ): Σ = [[VAR(X), COV(X, Y)], [COV(X, Y), VAR(Y)]]
○ Σ = E[(x - E[x])(x - E[x])ᵀ] holds not only for two dimensions but also for n dimensions.
⑤ characteristic 4. VAR(X) = 0 ⇔ P(X = constant) = 1 (∵ Chebyshev inequality)
⑶ standard deviation
① definition: the standard deviation of X, written σ or SD(X), is σ = √VAR(X), i.e. σ² = VAR(X)
② idea: the variance is in squared units of X, whereas the standard deviation is in the same units as X
③ characteristic: variance and σ are always non-negative, whereas covariance can be negative
⑷ coefficient of variation
① definition: CV = σ / μ, the standard deviation divided by the mean
② Used to compare the relative dispersion of data measured in different units
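The variance identities in this section can be verified on a small data set. The following is a minimal sketch, assuming the population (divide-by-n) convention; the data values and helper names are made up for illustration only.

```python
from math import sqrt, isclose

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Population covariance: average product of deviations from the means."""
    mu_u, mu_v = mean(u), mean(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / len(u)

def var(values):
    """Variance is the covariance of a variable with itself."""
    return cov(values, values)

# Illustrative data, not taken from the text.
x = [2.0, 4.0, 4.0, 5.0, 7.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
a, b = 3.0, -1.0

shortcut = mean([v * v for v in x]) - mean(x) ** 2   # E(X^2) - mu^2
scaled = var([a * v + b for v in x])                 # VAR(aX + b)
total = var([p + q for p, q in zip(x, y)])           # VAR(X + Y)
cv = sqrt(var(x)) / mean(x)                          # coefficient of variation

print(isclose(shortcut, var(x)))
print(isclose(scaled, a * a * var(x)))
print(isclose(total, var(x) + var(y) + 2 * cov(x, y)))
print(cv)
```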
3. Covariance and correlation coefficient
⑴ covariance
① definition: with E(X) = μx and E(Y) = μy, COV(X, Y) = E((X - μx)(Y - μy))
② meaning : the degree to which Y changes together with X
③ characteristic 1. COV(X, Y) = E(XY) - E(X)E(Y)
○ proof : COV(X, Y) = E((X - μx)(Y - μy)) = E(XY) - μxE(Y) - μyE(X) + μxμy = E(XY) - μxμy
④ characteristic 2. if X = Y, COV(X, Y) = VAR(X)
⑤ characteristic 3. if X and Y are independent, COV(X, Y) = 0
○ proof: COV(X, Y) = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0
○ because independence is the stronger condition, COV(X, Y) = 0 does not imply that X and Y are independent
⑥ characteristic 4. COV(aX + b, cY + d) = ac COV(X, Y)
⑦ characteristic 5. COV(a1 X1 + a2 X2, Y) = a1 COV(X1, Y) + a2 COV(X2, Y)
⑧ limitation: by characteristic 4, covariance mixes the strength of the association with the scale of the variables, so it cannot express the association alone
⑵ correlation coefficient: also referred to as the Pearson correlation coefficient
① definition: with σx and σy the standard deviations of X and Y, ρ(X, Y) = COV(X, Y) / (σx σy)
○ Multiple correlation coefficient: the form of the correlation coefficient used when there are three or more variables
○ Complete correlation: |ρ| = 1
○ No correlation: ρ = 0
② background: to express only the association, with the scale information removed. This addresses the limitation of covariance
③ Characteristics
○ Correlation between two variables measured on an interval or ratio scale.
○ Targeted towards continuous variables.
○ Assumption of normality.
○ Widely utilized in most cases.
④ characteristic 1. -1 ≤ ρ(X, Y) ≤ 1 (correlation inequality)
○ proof: Cauchy-Schwarz inequality
○ ρ(X, Y) = 1: perfect positive linear relationship between X and Y
○ ρ(X, Y) = -1: perfect negative linear relationship between X and Y
○ ρ(X, Y) = 0 does not mean X and Y are independent
○ counterexample 1. p(x) = ⅓ I{x = -1, 0, 1} , Y = X²
○ COV(X, Y) = E(XY) - E(X)E(Y) = E(X³) - E(X)E(X²) = 0 - 0 = 0
○ however, p(1, 1) = ⅓, p(x = 1) = ⅓, p(y = 1) = ⅔, so p(x, y) ≠ p(x) × p(y)
○ this violates the definition of independence
○ counterexample 2. S = {(x, y) | -1 ≤ x ≤ 1, x² ≤ y ≤ x² + 1/10}, p = 5 I{(x, y) ∈ S}
○ COV(X, Y) = E(XY) - E(X)E(Y) = E(XY) = 0, since E(X) = 0 and E(XY) = 0 by symmetry in x
○ independence requires p(x, y) = p(x) × p(y). Here p(x, y) = 5 and p(x) = ½ are constant on S, so p(y) would have to be constant as well, but it is not
○ this violates the definition of independence
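The first example (X uniform on {-1, 0, 1}, Y = X²) is small enough to check exactly. The sketch below uses exact fractions; the helper name `expect` is hypothetical.

```python
from fractions import Fraction

# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Marginal and joint probabilities at the point (1, 1).
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
p_11 = joint[(1, 1)]

print(covariance)            # zero covariance
print(p_11, p_x1 * p_y1)     # joint probability vs product of marginals
```

The covariance is exactly zero while the joint probability at (1, 1) differs from the product of the marginals, so the variables are uncorrelated but dependent.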
⑤ characteristic 2. ρ(X, X) = 1, ρ(X, -X) = -1
⑥ characteristic 3. ρ(X, Y) = ρ(Y, X)
⑦ characteristic 4. exclusion of size information: ρ(aX + b, cY + d) = ρ(X, Y) when ac > 0 (the sign flips when ac < 0)
○ proof: ρ(aX + b, cY + d) = COV(aX + b, cY + d) ÷ (|a|σx · |c|σy) = ac COV(X, Y) ÷ (|ac| σxσy) = ρ(X, Y) for ac > 0
⑧ characteristic 5. association information : |ρ(X, Y)| = 1 is a necessary and sufficient condition for Y = aX + b (a ≠ 0, b constant) with probability 1
○ proof of forward direction: the idea of setting Z comes from simple regression analysis
○ proof of reverse direction
⑨ significance test
○ null hypothesis H0 : correlation coefficient = 0
○ alternative hypothesis H1 : correlation coefficient ≠ 0
○ calculation of the t statistic: for the correlation coefficient r obtained from the sample, t = r √(n - 2) / √(1 - r²)
○ under H0, this statistic follows the Student t distribution with n - 2 degrees of freedom (n is the sample size)
⑩ R functions
○ cor(x, y)
○ cor(x, y, method = "pearson")
○ cor.test(x, y)
○ cor.test(x, y, method = "pearson")
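For readers without R at hand, the sample Pearson coefficient and its t statistic t = r √(n - 2) / √(1 - r²) can be computed by hand. This is a minimal sketch with made-up data and hypothetical function names; it computes only the statistic, not the p-value that cor.test reports.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Illustrative data, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(r, t)

# Rescaling with positive factors leaves r unchanged (characteristic 4).
print(pearson_r([10 * v + 3 for v in x], [2 * v - 1 for v in y]))
```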
⑶ Spearman correlation coefficient
① definition: with x’ = rank(x) and y’ = rank(y), the Pearson correlation coefficient of the ranks, i.e. ρ(x’, y’) = COV(x’, y’) / (σx’ σy’)
② Characteristics
○ A method of measuring the correlation between two variables that are in ordinal scale.
○ A non-parametric method targeting ordinal variables.
○ Advantageous in data with many ties (zeroes).
○ Sensitive to deviations or errors within the data.
○ Tends to yield higher values than Kendall’s correlation coefficient.
③ R functions
○ cor(x, y, method = "spearman")
○ cor.test(x, y, method = "spearman")
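Since the Spearman coefficient is the Pearson coefficient applied to ranks, it can be sketched in a few lines. The version below assumes tied values receive the average of their positions (a common convention); the function names and data are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotone but non-linear relationship: ranks agree perfectly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y))
print(pearson_r(x, y))  # smaller, because the relationship is not linear
```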
⑷ Kendall correlation coefficient
① definition: defined in terms of concordant pairs and discordant pairs
② Characteristics
○ A method of measuring the correlation between two variables that are in ordinal scale.
○ A non-parametric method targeting ordinal variables.
○ Advantageous in data with many ties (zeroes).
○ Useful when the sample size is small or when there are many tied values in the data.
③ Procedure
○ step 1. sort the (x, y) pairs in ascending order of x
○ step 2. for each yi, count the concordant pairs, i.e. the j > i with yj > yi
○ step 3. for each yi, count the discordant pairs, i.e. the j > i with yj < yi
○ step 4. define the correlation coefficient as follows: τ = (nc - nd) / (n(n - 1) / 2)
○ nc : total number of concordant pairs
○ nd : total number of discordant pairs
○ n : number of observations, i.e. the common length of x and y
④ R functions
○ cor(x, y, method = "kendall")
○ cor.test(x, y, method = "kendall")
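The four-step procedure can be sketched directly. This version assumes the simple form τ = (nc - nd) / (n(n - 1) / 2), in which tied pairs count as neither concordant nor discordant; to my understanding R applies a tie correction, so results may differ from cor(x, y, method = "kendall") when ties are present. The function name is hypothetical.

```python
def kendall_tau(x, y):
    """Kendall correlation from concordant and discordant pair counts (no tie correction)."""
    pairs = sorted(zip(x, y))          # step 1: order the pairs by x
    n = len(pairs)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            if pairs[j][0] == pairs[i][0]:
                continue               # tie in x: neither concordant nor discordant
            if pairs[j][1] > pairs[i][1]:
                nc += 1                # step 2: concordant pair
            elif pairs[j][1] < pairs[i][1]:
                nd += 1                # step 3: discordant pair
    return (nc - nd) / (n * (n - 1) / 2)   # step 4

# Five of the six pairs are concordant and one is discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```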
⑸ χ² : a measure of the goodness of fit of an approximation
① If the measured data are (xm, ym) and the approximating function is f(x), then χ² = Σm (ym - f(xm))²
② The approximating function is obtained by finding the minimum of χ², i.e. the point where the derivatives of χ² with respect to the parameters of f vanish
③ Used in non-linear regression, such as fitting a quadratic approximating function
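As an illustration of minimizing χ² by differentiation, the sketch below fits the one-parameter quadratic f(x) = a·x². It assumes the unweighted form χ² = Σ (ym - f(xm))², which may differ from the form intended here; the data and function names are made up. Setting dχ²/da = 0 gives a = Σ x²y / Σ x⁴.

```python
def chi_square(a, xs, ys):
    """Sum of squared residuals for the model f(x) = a * x^2 (unweighted, by assumption)."""
    return sum((y - a * x * x) ** 2 for x, y in zip(xs, ys))

def fit_quadratic_coefficient(xs, ys):
    """Solve d(chi^2)/da = -2 * sum(x^2 * (y - a x^2)) = 0 for a."""
    return sum(x * x * y for x, y in zip(xs, ys)) / sum(x ** 4 for x in xs)

# Illustrative noisy measurements of roughly y = 2 x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.9, 8.2, 17.8]

a_hat = fit_quadratic_coefficient(xs, ys)
print(a_hat, chi_square(a_hat, xs, ys))

# Moving away from a_hat in either direction increases chi^2.
print(chi_square(a_hat - 0.01, xs, ys), chi_square(a_hat + 0.01, xs, ys))
```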
4. Anscombe’s quartet
⑴ shows that the mean, standard deviation, and correlation coefficient alone cannot describe the shape of a data set: four data sets with nearly identical summary statistics look very different when plotted
⑵ example 1
⑶ example 2
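The point can be reproduced numerically with the first two of Anscombe's four data sets. The values below are the published ones as I recall them (worth checking against a reference copy before reuse); the `summary` helper is hypothetical. The two sets share the same x values, and their summary statistics agree to about two decimals even though the first is roughly linear and the second is curved.

```python
from math import sqrt

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def summary(xs, ys):
    """Return (mean of y, sample standard deviation of y, Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return my, sqrt(syy / (n - 1)), sxy / sqrt(sxx * syy)

mean1, sd1, r1 = summary(x, y1)
mean2, sd2, r2 = summary(x, y2)
print(mean1, sd1, r1)
print(mean2, sd2, r2)
```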
5. Order statistics
⑴ assumption : X1, ···, Xn are mutually independent
⑵ definition : Y1 < ··· < Yn are the values X1, ···, Xn rearranged in ascending order
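⑶ joint probability distribution: if the Xi also share a common density f, g(y1, ···, yn) = n! f(y1) ··· f(yn) for y1 < ··· < yn
⑷ marginal probability distribution: with F the common distribution function, gk(y) = n! / ((k - 1)! (n - k)!) × F(y)ᵏ⁻¹ × (1 - F(y))ⁿ⁻ᵏ × f(y)
⑸ expected value: E(Yk) = ∫ y · gk(y) dy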
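A quick simulation can make the expected value of an order statistic concrete. The sketch below assumes the Xi are independent and uniform on [0, 1], a case chosen for illustration and not taken from the text, for which E(Yk) = k / (n + 1). The function name is hypothetical.

```python
import random

def simulated_order_stat_mean(n, k, trials, seed=0):
    """Monte Carlo estimate of E(Y_k), the k-th smallest of n uniform(0, 1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        total += sample[k - 1]   # k is 1-based
    return total / trials

n, k = 5, 2
estimate = simulated_order_stat_mean(n, k, 20000)
print(estimate, k / (n + 1))   # estimate vs theoretical value
```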
6. Conditional statistics
⑴ conditional expectation
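① definition: E(X | Y = y) = Σ x · p(x | y) (discrete) or ∫ x · f(x | y) dx (continuous); E(X | Y) is this function of y evaluated at Y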
② characteristic
○ E(XY | Y) = YE(X | Y)
○ E(aX1 + bX2 | Y) = aE(X1 | Y) + bE(X2 | Y)
③ law of iterated expectation
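○ lemma: E(E(X | Y)) = E(X)
○ proof: E(E(X | Y)) = ∫ (∫ x · f(x | y) dx) f(y) dy = ∫∫ x · f(x, y) dx dy = E(X)
○ example: when a point Y is selected uniformly at random on [0, ℓ], and then a point X is selected uniformly at random on [0, y], E(X | Y) = Y / 2, so E(X) = E(Y / 2) = ℓ / 4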
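The two-stage example can be simulated to check the law of iterated expectation: since E(X | Y) = Y / 2 and E(Y) = ℓ / 2, the overall mean of X should be close to ℓ / 4. This is a rough sketch with a hypothetical function name and a fixed seed for repeatability.

```python
import random

def simulate_mean_x(length, trials, seed=0):
    """Draw Y uniformly on [0, length], then X uniformly on [0, Y]; return the average X."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = rng.uniform(0.0, length)
        x = rng.uniform(0.0, y)
        total += x
    return total / trials

length = 1.0
estimate = simulate_mean_x(length, 20000)
print(estimate, length / 4)   # estimate vs the value from E(E(X | Y))
```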
④ mean independence
○ independence ⊂ mean independence ⊂ uncorrelatedness
○ mean independence: E(Y | X) = E(Y)
○ uncorrelatedness: the correlation coefficient is 0
○ normal distribution: if X and Y are jointly normal and uncorrelated, then X and Y are independent
⑵ conditional variance
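① definition: the conditional variance of Y given a random variable X is VAR(Y | X) = E((Y - E(Y | X))² | X) = E(Y² | X) - (E(Y | X))²
② law of total variance (decomposition of variance)
○ lemma: VAR(Y) = E(VAR(Y | X)) + VAR(E(Y | X))
○ proof: VAR(Y) = E(Y²) - (E(Y))² = E(VAR(Y | X) + (E(Y | X))²) - (E(E(Y | X)))² = E(VAR(Y | X)) + VAR(E(Y | X))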
○ meaning
○ situation: when X ~ P1(θ), Y ~ P2(X)
○ use P2 to calculate VAR(Y | X) and E(Y | X)
○ use P1 to calculate E{·}, VAR{·}
○ E(VAR(X | Y)) : intra-group variance
○ VAR(E(X | Y)) : inter-group variance
○ example
○ X : laid-off worker’s unemployment period
○ probability density function of X: exponential distribution
○ 20% of the total workforce: skilled labor force. λ = 0.4
○ 80% of the total workforce: unskilled workers. λ = 0.1
○ calculation of VAR(X)
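The calculation can be carried out with the law of total variance. The sketch below assumes λ is the rate parameter of the exponential distribution, so that each group has mean 1 / λ and variance 1 / λ²; if λ were meant as a scale parameter the numbers would change. Variable names are illustrative.

```python
from fractions import Fraction

# (weight, rate) for each group; rate interpretation of lambda is an assumption.
groups = [
    (Fraction(1, 5), Fraction(2, 5)),   # skilled: 20%, lambda = 0.4
    (Fraction(4, 5), Fraction(1, 10)),  # unskilled: 80%, lambda = 0.1
]

group_means = [1 / rate for _, rate in groups]      # E(X | group) = 1 / lambda
group_vars = [1 / rate ** 2 for _, rate in groups]  # VAR(X | group) = 1 / lambda^2
weights = [w for w, _ in groups]

overall_mean = sum(w * m for w, m in zip(weights, group_means))
within = sum(w * v for w, v in zip(weights, group_vars))                          # E(VAR(X | Y))
between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, group_means))  # VAR(E(X | Y))
total_variance = within + between

print(float(within), float(between), float(total_variance))
```

Under this assumption the intra-group part is 81.25, the inter-group part is 9, and VAR(X) = 90.25.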
Input : 2019.06.17 14:15