Chapter 17. Non-linear Regression Analysis
Higher category: 【Statistics】 Statistics Overview
1. quadratic regression model
2. polynomial regression model
3. logarithm regression model
4. probability model
5. interaction
1. quadratic regression model
⑴ formula
data:image/s3,"s3://crabby-images/02f68/02f68d906fbd281314e82ce5ca7091cd6a6307d8" alt="drawing"
⑵ determination of coefficients
① a multiple linear regression model is used : Xi and Xi² are treated as two different regressors and interpreted accordingly
② this is possible because Xi and Xi² are not perfectly multicollinear (Xi² is not a linear function of Xi)
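A minimal sketch of the idea in ⑵: the quadratic model is fitted by ordinary least squares with Xi and Xi² entered as two separate regressors. The data, the helper names `solve_linear_system` and `fit_quadratic`, and the use of the normal equations are illustrative assumptions, not part of the original notes.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    # Design matrix: x and x^2 enter as two separate regressors.
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

Because the data are generated without noise, the fit should recover the generating coefficients up to rounding error.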
⑶ linearity test
data:image/s3,"s3://crabby-images/49a15/49a15049e2134f84fdf7ed0e1aed789dc2c10e98" alt="drawing"
⑷ confidence interval of the amount of change
① effect : the change in Y caused by a unit change in X is as follows
data:image/s3,"s3://crabby-images/b7781/b7781ab2c230863ddbf9b8c1d3f5712b28bbdab1" alt="drawing"
② marginal effect
data:image/s3,"s3://crabby-images/49855/498554f147c51a873a82335a2aa25dc992f055ea" alt="drawing"
③ standard error of the estimated amount of change
data:image/s3,"s3://crabby-images/41147/41147ef170d5e0d96fffb84f84f540d0e77da59a" alt="drawing"
④ confidence interval of the amount of change
data:image/s3,"s3://crabby-images/4c732/4c732b18462c165712ab87db49b4401b71748455" alt="drawing"
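For reference, the quantities in ⑷ can be written out as below. This is a sketch under two assumptions: the model is Y = β0 + β1 X + β2 X² + u, and the standard error is derived from the F statistic testing that the change equals zero. The notation in the original figures may differ.

```latex
\begin{aligned}
\Delta \hat{Y} &= \hat{\beta}_1 \,\Delta x + \hat{\beta}_2 \left[ (x + \Delta x)^2 - x^2 \right] \\
\frac{\partial \hat{Y}}{\partial x} &= \hat{\beta}_1 + 2 \hat{\beta}_2 x \\
SE(\Delta \hat{Y}) &= \frac{|\Delta \hat{Y}|}{\sqrt{F}},
  \qquad F \text{ tests } H_0 : \beta_1 \,\Delta x + \beta_2 \left[ (x + \Delta x)^2 - x^2 \right] = 0 \\
95\% \text{ CI} &: \ \Delta \hat{Y} \pm 1.96 \, SE(\Delta \hat{Y})
\end{aligned}
```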
2. polynomial regression model
⑴ general equation
data:image/s3,"s3://crabby-images/e4df8/e4df88a51db5faf8f9737557d02d1d198b872238" alt="drawing"
⑵ determination of coefficients : multiple linear regression model is used
⑶ linearity test
data:image/s3,"s3://crabby-images/b04bf/b04bf0b55106bc93e3e5dadff82382a2278ec432" alt="drawing"
⑷ choosing the degree (order) of the polynomial, method 1: top-down method
① the most commonly adopted method
② 1st. set the maximum value of r
③ 2nd. test H0 : βr = 0
④ 3rd. if H0 is rejected, r is the degree of regression line
⑤ 4th. if H0 is not rejected, eliminate the term Xi^r and repeat the 2nd step for β(r-1), ···
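The top-down steps above can be sketched as a short loop. The callback `p_value_of_top_term` is hypothetical: it stands for whatever routine fits the degree-r polynomial and returns the p value for H0 : βr = 0. The p values in the example are made up for illustration.

```python
def choose_degree_top_down(p_value_of_top_term, r_max, alpha=0.05):
    """Return the selected degree, or 0 if no term is significant.

    p_value_of_top_term(r) is a hypothetical callback that fits the
    degree-r polynomial and returns the p value for H0: beta_r = 0.
    """
    for r in range(r_max, 0, -1):
        if p_value_of_top_term(r) < alpha:
            return r  # H0 rejected: r is the degree of the regression line
        # H0 not rejected: drop X^r and test the next lower degree
    return 0


# Illustrative p values only (not computed from real data).
fake_p_values = {4: 0.62, 3: 0.08, 2: 0.001, 1: 0.0004}
degree = choose_degree_top_down(fake_p_values.get, r_max=4)
print(degree)
```

With these made-up p values the loop stops at the quadratic term, the first one whose p value falls below the chosen level.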
⑸ choosing the degree (order) of the polynomial, method 2: bottom-up method
① tests whether adding a term one order higher significantly improves the fit to the given sample
② procedure
○ 1st. assume that, proceeding bottom-up, the coefficients of all terms up to the (r-1)-order polynomial have already been found significant
○ 2nd. add a r-order term
○ 3rd. calculate the sum of squares by the r-order regression line (degree of freedom: r)
○ 4th. subtract the sum of squares by the (r-1)-order regression line (degree of freedom: r-1) from the value obtained from 3rd step
○ 5th. calculate the sum of squares by the residual of the r-order regression line (degree of freedom: n-1-r)
○ 6th. calculate the mean square by dividing the value obtained from 5th step by (n-1-r)
○ 7th. divide the difference of the sum of squares obtained from 4th step by the mean square obtained from 6th step: the degree of freedom of the difference of the sum of squares is 1
○ 8th. calculate the p value of the F statistic obtained in the 7th step under the F(1, n-1-r) distribution
③ example
○ problem situation
model | sum of squares | df | mean square |
---|---|---|---|
linear | 3971.46 | 1 | 3971.46 |
error | 372515.09 | 18 | 20695.28 |
quadratic | 367833.58 | 2 | 183916.79 |
error | 8652.97 | 17 | 509.10 |
cubic | 369211.71 | 3 | 123070.57 |
error | 7274.84 | 16 | 454.68 |
○ table of results
model | sum of squares | df | sum of squares of residuals | df of residuals | mean square of residuals | F ratio | p value |
---|---|---|---|---|---|---|---|
quadratic | 367833.58 | 2 | 8652.97 | 17 | 509.10 | | |
linear | 3971.46 | 1 | | | | | |
difference | 363862.12 | 1 | | | | F(1, 17) = 714.72 | p < 0.001 |
cubic | 369211.71 | 3 | 7274.84 | 16 | 454.68 | | |
quadratic | 367833.58 | 2 | | | | | |
difference | 1378.13 | 1 | | | | F(1, 16) = 3.03 | NS |
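The F ratios in the example can be recomputed from the sums of squares in the first table. The sketch below follows the 4th, 6th, and 7th steps of the procedure; the function name `sequential_f` is an illustrative choice. Recomputing the quadratic mean square as 8652.97 / 17 gives a value slightly different from the tabulated 509.10, so the recomputed quadratic F ratio comes out slightly above the tabulated 714.72.

```python
def sequential_f(ss_reg_high, ss_reg_low, ss_resid_high, df_resid_high):
    """F ratio for adding one term: (SS_r - SS_{r-1}) / (SSE_r / (n - 1 - r))."""
    ss_difference = ss_reg_high - ss_reg_low      # 4th step, 1 degree of freedom
    mean_square = ss_resid_high / df_resid_high   # 6th step
    return ss_difference / mean_square            # 7th step


# Values taken from the example table (n = 20 observations).
f_quadratic = sequential_f(367833.58, 3971.46, 8652.97, 17)
f_cubic = sequential_f(369211.71, 367833.58, 7274.84, 16)

print(round(f_quadratic, 2))
print(round(f_cubic, 2))
```

The 5% critical value of F(1, 16) is about 4.49, so the cubic term (F ≈ 3.03) is not significant while the quadratic term clearly is, in line with the table.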
④ drawbacks: type Ⅰ error accumulates across the sequential tests
○ a statistical test is designed to analyze a phenomenon that is observed in a single shot
○ it is very difficult to analyze a phenomenon of a certain probability conditional on another phenomenon that has already occurred with a different probability
○ this difficulty means that the statistic may not follow the F distribution
○ the bottom-up method of degree determination analyzes a phenomenon of one probability conditional on a phenomenon of another probability
○ the phenomenon that has already occurred refers to the (r-1)-order regression equation
○ the phenomenon being analyzed refers to the r-order regression equation
○ (note) it cannot be clearly shown that the statistic built from the difference of the sums of squares follows the F distribution
3. logarithm regression model
⑴ (note) logarithmic approximation
data:image/s3,"s3://crabby-images/b4db5/b4db539ea9ac46b20306fd56e6167db7baa4fe9d" alt="drawing"
⑵ class 1. linear-log model
① formula
data:image/s3,"s3://crabby-images/d4a23/d4a2367ac76e936b07eaa6e09c6a514f4aa5473c" alt="drawing"
② if Xi increases by 1%, Yi increases by 0.01β1
⑶ class 2. log-linear model
① formula
data:image/s3,"s3://crabby-images/f904c/f904cb863268718f5e86a880dc86015c018bf5aa" alt="drawing"
② if Xi increases by 1, Yi increases by 100β1%
⑷ class 3. log-log model
① formula
data:image/s3,"s3://crabby-images/a4c6d/a4c6d347d8cf2acc30ca50a7cbdcdab799fa7033" alt="drawing"
② if Xi increases by 1%, Yi increases by β1%
⑸ the more appropriate of the log-linear model and the log-log model can be selected by comparing their adjusted R²
⑹ it is meaningless to compare the linear-log model with the other two models in this way, because its dependent variable is Yi rather than ln Yi
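The three interpretations above rest on the logarithmic approximation ln(1 + x) ≈ x for small x. A quick numerical check is sketched below; the coefficients `beta0`, `beta1` and the evaluation point `x` are hypothetical.

```python
import math

beta0, beta1 = 2.0, 0.05  # hypothetical coefficients


def linear_log(x):
    """Y = beta0 + beta1 * ln(X)"""
    return beta0 + beta1 * math.log(x)


def log_linear(x):
    """ln(Y) = beta0 + beta1 * X"""
    return math.exp(beta0 + beta1 * x)


def log_log(x):
    """ln(Y) = beta0 + beta1 * ln(X)"""
    return math.exp(beta0 + beta1 * math.log(x))


x = 10.0
# Linear-log: a 1% rise in X changes Y by about 0.01 * beta1 (in units of Y).
change_linear_log = linear_log(1.01 * x) - linear_log(x)
# Log-linear: a one-unit rise in X changes Y by about 100 * beta1 percent.
pct_log_linear = 100 * (log_linear(x + 1) / log_linear(x) - 1)
# Log-log: a 1% rise in X changes Y by about beta1 percent.
pct_log_log = 100 * (log_log(1.01 * x) / log_log(x) - 1)

print(change_linear_log, 0.01 * beta1)
print(pct_log_linear, 100 * beta1)
print(pct_log_log, beta1)
```

The approximation is close only when the change is small; with a large β1 the exact percentage change in the log-linear model, 100(exp(β1) - 1), departs noticeably from 100β1.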
4. probability model: the case of dependent variable being a binary variable
⑴ linear probability model (LPM)
① formula
data:image/s3,"s3://crabby-images/a7ec2/a7ec27c4f9d01e671fdaec73b1abde78a2a01558" alt="drawing"
data:image/s3,"s3://crabby-images/89592/89592d2528d979899daedd14915e5fa08596df1a" alt="drawing"
② issue : the predicted probability is not guaranteed to lie between 0 and 1
⑵ probit regression model
① overview : the most frequently used probability model
② formula
○ simple model
data:image/s3,"s3://crabby-images/7ad6a/7ad6ab6a8f39830dc8bcb7ffd28beb35d5db61c9" alt="drawing"
data:image/s3,"s3://crabby-images/d50f4/d50f46b3062f0530dd7221cc1d71fe6b4665ec7c" alt="drawing"
○ multiple model
data:image/s3,"s3://crabby-images/9f8ad/9f8ad24746652576db43dadcfa83d7b3abc8d3fb" alt="drawing"
③ effect
○ formula
data:image/s3,"s3://crabby-images/49f25/49f25208de813e827fe6ae8888b7924a266de507" alt="drawing"
○ marginal effect
data:image/s3,"s3://crabby-images/67fac/67facc8eeab4c696dede86860b48aad0efb78d34" alt="drawing"
④ statistical estimation
○ the estimator of each coefficient has no closed form : the maximum likelihood estimator is found numerically
○ once obtained, the maximum likelihood estimator is consistent and asymptotically normal
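A minimal sketch of the probit effect and marginal effect, assuming the simple model P(Y = 1 | X) = Φ(β0 + β1 X). The coefficients are hypothetical rather than estimated, and the standard normal CDF is built from `math.erf`.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """phi(z), the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


beta0, beta1 = -1.0, 0.8  # hypothetical, not estimated from data


def probit_probability(x):
    return std_normal_cdf(beta0 + beta1 * x)


def probit_effect(x, dx):
    """Change in the predicted probability when x moves to x + dx."""
    return probit_probability(x + dx) - probit_probability(x)


def probit_marginal_effect(x):
    """Derivative of the predicted probability with respect to x."""
    return std_normal_pdf(beta0 + beta1 * x) * beta1


print(probit_probability(1.25))
print(probit_effect(1.0, 1.0))
print(probit_marginal_effect(1.0))
```

Unlike in a linear model, both quantities depend on the starting value of x, which is why the effect has to be reported at a chosen point.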
⑶ logistic regression model
① formula
○ logistic function
data:image/s3,"s3://crabby-images/b0cf8/b0cf8a34e08551fe4f895f0ed6f1568f3138a14e" alt="drawing"
○ modeling: plug the linear predictor βx + β0 into the logistic function, which serves as the inverse of the link function
data:image/s3,"s3://crabby-images/813d6/813d616b2b53866320136b3c0817537062e66a04" alt="drawing"
○ log-odds (logarithm of the odds) : also called the logit. It is the logarithm of the odds p / (1 - p) and takes values from negative infinity to positive infinity.
data:image/s3,"s3://crabby-images/5139b/5139ba95d9547dd24158d33f8c5b06503a7f54ce" alt="drawing"
data:image/s3,"s3://crabby-images/57067/570676a8d88a47243bc2ba15149b3e1830d66691" alt="drawing"
○ The logistic function is the inverse function of the logit function.
○ The logistic function transforms an input variable that takes values from negative infinity to positive infinity into an output variable that ranges from 0 to 1.
○ assume the independent variable is not a one-dimensional variable xi but a multidimensional vector xi, and model yi with the Bernoulli distribution
data:image/s3,"s3://crabby-images/19fd6/19fd6a3dbb34447a8d84248fdfa79aae8f4cbc49" alt="drawing"
○ definition of the likelihood function L(θ) and the log-likelihood function ℓ(θ) : the negative log-likelihood coincides with the cross-entropy
data:image/s3,"s3://crabby-images/ebd16/ebd1616fbc2d93bfe1be1d01d03a7a9e0835688e" alt="drawing"
○ lemma : the cross-entropy (negative log-likelihood) is a convex function of θ, so a local optimum is also the global optimum. The proof is complicated and omitted here
○ step 1. definition of gradient
data:image/s3,"s3://crabby-images/2520f/2520f8fe01090812b9716975b4a93b044cde7b2d" alt="drawing"
○ step 2. definition of Hessian matrix
data:image/s3,"s3://crabby-images/0ce6a/0ce6a71c62a663789c67742b5050651e4aee8628" alt="drawing"
○ step 3. take the second-order Taylor expansion around θk and obtain the maximizer θk+1 = θk + dk of this quadratic approximation
data:image/s3,"s3://crabby-images/7ab5a/7ab5ac711d32cf9710256179bc6671dc2abb81d9" alt="drawing"
data:image/s3,"s3://crabby-images/8810f/8810feb17c022e290ab6e264a335935e500b02cd" alt="drawing"
○ step 4. repeating this update of θk (the Newton-Raphson method) converges to the global maximum
○ the solution is obtained numerically; the estimator of each coefficient has no closed form.
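Steps 1 to 4 can be sketched for the simplest case of one regressor plus an intercept, where the 2 × 2 Hessian can be inverted by hand. The data and the function name `fit_logistic_newton` are illustrative assumptions; the sketch also assumes the classes overlap (no perfect separation), since otherwise the maximum likelihood estimate does not exist.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(xs, ys, iterations=25):
    """Maximize the log-likelihood of P(y = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood: sum of (y - p) * (1, x).
        g0 = g1 = 0.0
        # Entries of X^T W X with W = diag(p (1 - p)); the Hessian is its negative.
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step d = (X^T W X)^{-1} g, using the explicit 2x2 inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
    return b0, b1


# Hypothetical data with overlapping classes.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_newton(xs, ys)
print(b0, b1)
```

At the returned solution the gradient is numerically zero, which is the first-order condition for the maximum of the log-likelihood.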
③ idea for the proof of consistency (notation may differ from the above)
data:image/s3,"s3://crabby-images/4f5aa/4f5aa0f7cc6a6f41474b90af6a8aeb3438581c72" alt="drawing"
④ idea for the proof of asymptotic normality (notation may differ from the above)
data:image/s3,"s3://crabby-images/e319c/e319ce1b19fd39006b3f7df2669a223dbce98f95" alt="drawing"
⑤ application : multiclass classification
○ introduction: because logistic regression is a binary classifier, it cannot be applied directly to multiclass classification
○ method 1. first classify class 1 vs {2, 3}, then classify class 2 vs 3
○ method 2. softmax function
○ definition
data:image/s3,"s3://crabby-images/4b322/4b322bf286e3061794ba37f445cf5d1c3b0074ee" alt="drawing"
○ softmax function in multiclass classification
data:image/s3,"s3://crabby-images/aedac/aedacdd735ca35f758cc54a543cc6d61946f8075" alt="drawing"
○ proof : logistic regression is a special case of the softmax function (the case of two classes)
data:image/s3,"s3://crabby-images/8a972/8a97275f9cbda7a9f8fef5e880d6a44a18c0d0ba" alt="drawing"
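A small numerical check of the statement that logistic regression is the two-class case of softmax. The scores `z0` and `z1` are arbitrary illustrative values, and subtracting the maximum score is a standard numerical safeguard rather than part of the definition.

```python
import math


def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# With two classes, softmax depends only on the score difference z1 - z0,
# which reproduces the logistic function.
z0, z1 = 0.3, 1.7
p_softmax = softmax([z0, z1])[1]
p_logistic = logistic(z1 - z0)
print(p_softmax, p_logistic)
```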
⑷ Comparison of LPM, probit, and logistic regression models
① coefficients cannot be compared directly across the LPM, probit, and logistic models because the models differ
○ example 1. comparison between probit regression model and logistic regression model
○ the two fitted curves are very similar
data:image/s3,"s3://crabby-images/830b5/830b5af30869cba9eba0106680536b66ab6fbe88" alt="drawing"
○ the coefficients differ substantially, but this difference has no mathematical meaning because each model places its coefficients on a different scale
data:image/s3,"s3://crabby-images/417d7/417d7f9f816640694c0754a397c2a033c9900b7b" alt="drawing"
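One way to see why the curves match while the coefficients do not: the logistic curve nearly coincides with the probit curve once its argument is rescaled. The constant 1.702 below is a commonly quoted rescaling factor and is an assumption of this sketch, not something taken from the figures above.

```python
import math


def probit_curve(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_curve(z):
    return 1.0 / (1.0 + math.exp(-z))


SCALE = 1.702  # commonly quoted rescaling constant; an assumption here

grid = [i / 100.0 for i in range(-600, 601)]
# Largest vertical gap between the two curves, with and without rescaling.
max_gap_rescaled = max(abs(probit_curve(z) - logistic_curve(SCALE * z)) for z in grid)
max_gap_raw = max(abs(probit_curve(z) - logistic_curve(z)) for z in grid)
print(max_gap_rescaled, max_gap_raw)
```

The rescaled gap stays around one percentage point, so a logistic coefficient is expected to be roughly 1.6 to 1.8 times the probit coefficient fitted to the same data, even though the two fitted curves are nearly identical.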
⑸ Dirichlet regression model
① Overview : Used for regression analysis of data that lie on a simplex (compositional data), taking that topology into account.
data:image/s3,"s3://crabby-images/2f134/2f134a9484c5d1ed25781e9d1fb735759ab12ac1" alt="drawing"
② Sample space: A multi-dimensional vector representing the proportions or probabilities of each item
data:image/s3,"s3://crabby-images/4806b/4806bd14155499175f2bb818bf4e58d9963519a9" alt="drawing"
○ Non-negative data
○ Unit-sum
○ D: The number of components, i.e., the size of the dimension
○ d = D - 1
③ Dirichlet distribution: Has drawn attention because it is defined on the simplex and can therefore model such data directly.
④ Estimation of Dirichlet regression:
○ Aitchison (2003) first introduced the log-ratio transformation.
○ Log-likelihood function: Given n data points,
data:image/s3,"s3://crabby-images/25a1a/25a1aff9ed7f283dab410a68288c2728521c63ed" alt="drawing"
○ Zero data can be a problem in the log-likelihood function.
○ Solution 1: Replace zero data with very small nonzero values, as proposed by Palarea-Albaladejo, Martín-Fernández, and others.
○ Solution 2: Use a dual model that handles zero data separately, as proposed by Zadora, Scealy, Welsh, Stewart, Field, Bear, Billheimer, and others.
○ Solution 3: Use an improved regression model that robustly applies to zero data, as proposed by Tsagris, Stewart, and others.
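A minimal sketch of why zero data break the log-likelihood, using the log density of a single Dirichlet observation with fixed parameters. In a regression the parameters would depend on covariates, which is omitted here. The parameter values are hypothetical, and `replace_zeros` is a crude stand-in for Solution 1, not the specific method of the cited authors.

```python
import math


def dirichlet_log_density(x, alpha):
    """Log density of Dirichlet(alpha) at a composition x on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def replace_zeros(x, eps=1e-6):
    """Replace zeros with eps and renormalize so the parts still sum to 1."""
    shifted = [max(xi, eps) for xi in x]
    total = sum(shifted)
    return [xi / total for xi in shifted]


alpha = [2.0, 3.0, 4.0]         # hypothetical parameters
x = [0.2, 0.3, 0.5]
print(dirichlet_log_density(x, alpha))

x_with_zero = [0.0, 0.4, 0.6]   # math.log(0.0) would raise ValueError here
x_fixed = replace_zeros(x_with_zero)
print(dirichlet_log_density(x_fixed, alpha))
```

The result for the repaired observation depends strongly on the chosen `eps`, which is one reason the dual-model and robust approaches listed above exist.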
5. interaction
⑴ modeling
① interaction regressor or interaction term is introduced
data:image/s3,"s3://crabby-images/56c34/56c3482a4ebb394c4f8bf8e28f78574f4ce9b904" alt="drawing"
② coefficients cannot be compared with those of models without interaction terms
③ interactions among three or more variables can also be defined
⑵ effect: the change of Yi by a unit change of Xi is as follows
data:image/s3,"s3://crabby-images/aeee4/aeee4cd066511640764f3988cdf2573f9b1712d0" alt="drawing"
⑶ elasticity
① intuitively, it measures how steep the slope is, that is, how large its absolute value is
② in microeconomics, elasticity is conventionally reported with its sign flipped, that is, the slope multiplied by (-1)
⑷ application: interaction with a binary variable
① modeling
data:image/s3,"s3://crabby-images/382eb/382eb4652ebaafc8107b9a3a294e7e6f8f904f64" alt="drawing"
② effect : the change in Y caused by a unit change in X is as follows
data:image/s3,"s3://crabby-images/93fda/93fda086c3139c12ee2709e24c3536cee84ab9fb" alt="drawing"
③ H0 : the hypothesis that Y is not affected by D can be tested with the F statistic for β2 = β3 = 0 (determinant check)
④ H0 : the hypothesis that the change in Y caused by a unit change in X is not affected by D can be tested with the t statistic for β3 = 0
⑤ the entire regression equation can be recovered from the regression line for D = 0 and the regression line for D = 1
data:image/s3,"s3://crabby-images/64231/642316ecd4cfee1f6cea5cecda3641c2f656fd60" alt="drawing"
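A sketch of point ⑤, assuming the model has the form Y = β0 + β1 X + β2 D + β3 X·D, as the tests on β2 and β3 above suggest. The function names and the coefficient values are illustrative.

```python
def group_lines(b0, b1, b2, b3):
    """Split Y = b0 + b1*X + b2*D + b3*X*D into the lines for D = 0 and D = 1."""
    line_d0 = (b0, b1)             # (intercept, slope) when D = 0
    line_d1 = (b0 + b2, b1 + b3)   # (intercept, slope) when D = 1
    return line_d0, line_d1


def coefficients_from_lines(line_d0, line_d1):
    """Recover (b0, b1, b2, b3) from the two group-specific lines."""
    (a0, s0), (a1, s1) = line_d0, line_d1
    return a0, s0, a1 - a0, s1 - s0


lines = group_lines(1.0, 2.0, 0.5, -0.3)   # hypothetical coefficients
recovered = coefficients_from_lines(*lines)
print(lines)
print(recovered)
```

Here β2 is the gap between the two intercepts and β3 is the gap between the two slopes, which is why testing β3 = 0 asks whether D changes the effect of X.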
⑸ application: interaction of two binary variables (dummy variables)
① modeling
data:image/s3,"s3://crabby-images/a5feb/a5feb47e69b39a328b7f373dacb64890b47dace6" alt="drawing"
② the regression equation can be derived from the 2 × 2 table of group means of Y for D1 and D2
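A sketch of how the 2 × 2 table determines the coefficients, assuming the model is Y = β0 + β1 D1 + β2 D2 + β3 D1·D2 and that the table holds the mean of Y in each cell. The cell means below are made up.

```python
def coefficients_from_cell_means(means):
    """means[d1][d2] is the average of Y in the cell (D1 = d1, D2 = d2)."""
    b0 = means[0][0]                    # baseline cell
    b1 = means[1][0] - means[0][0]      # effect of D1 when D2 = 0
    b2 = means[0][1] - means[0][0]      # effect of D2 when D1 = 0
    # Interaction: the extra change when both dummies are switched on.
    b3 = means[1][1] - means[1][0] - means[0][1] + means[0][0]
    return b0, b1, b2, b3


cell_means = [[10.0, 12.0], [15.0, 20.0]]   # hypothetical cell means
b0, b1, b2, b3 = coefficients_from_cell_means(cell_means)
print(b0, b1, b2, b3)
```

Because the model has four coefficients and the table has four cells, the fitted values reproduce the cell means exactly.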
Input : 2019.06.21 12:10