Read the new data set where students’ math achievement and IQ were collected across schools and countries. The nesting structure has three levels; that is, students:schools:countries.
dat1 <- read.table("lecture13ex1.csv", sep=",", header=TRUE)
See descriptive statistics of all variables within the data set.
library(psych)
describe(dat1)
## vars n mean sd median trimmed mad min
## countryid 1 96347 46.75 27.07 47.00 46.69 35.58 1.00
## schoolid 2 96347 2424.47 1391.43 2434.00 2430.31 1790.98 1.00
## studentid 3 96347 48174.00 27813.13 48174.00 48174.00 35711.39 1.00
## math 4 96347 53.05 10.19 53.00 53.07 10.38 10.00
## iq 5 96347 89.89 16.26 90.00 90.05 16.31 10.00
## private 6 96347 0.51 0.50 1.00 0.51 0.00 0.00
## quality 7 96347 53.37 12.64 55.80 54.26 12.16 27.00
## opportunity 8 96347 48.11 7.58 45.19 46.87 4.95 40.16
## max range skew kurtosis se
## countryid 93.00 92.00 0.01 -1.21 0.09
## schoolid 4787.00 4786.00 -0.03 -1.21 4.48
## studentid 96347.00 96346.00 0.00 -1.20 89.60
## math 97.00 87.00 -0.01 -0.08 0.03
## iq 156.00 146.00 -0.11 0.02 0.05
## private 1.00 1.00 -0.05 -2.00 0.00
## quality 78.20 51.20 -0.56 -0.58 0.04
## opportunity 69.79 29.63 1.31 0.84 0.02
Let’s start with the null model which math
is the
dependent variable. Note that (1|schoolid)
and
(1|countryid)
means that the random intercepts are varied
across schools and countries, respectively.
m10 <- lmer(math ~ 1 + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE)
summary(m10)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + (1 | schoolid) + (1 | countryid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 687159.2 687197.1 -343575.6 687151.2 96343
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3023 -0.6571 -0.0020 0.6547 4.3026
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 39.1543 6.2573
## countryid (Intercept) 0.3918 0.6259
## Residual 64.8436 8.0526
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.0481 0.1156 458.8
Next, check the null model where iq
is the dependent
variable.
m11 <- lmer(iq ~ 1 + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m11)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: iq ~ 1 + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 712240.0 712277.9 -356116.0 712232.0 96343
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6916 -0.6557 0.0020 0.6563 3.9177
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 99.51 9.975
## countryid (Intercept) 82.70 9.094
## Residual 81.16 9.009
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 89.640 0.955 93.87
For the following analysis, iq
is subtracted by 100 and
divided by 15, which are the mean and standard deviation of the standard
IQ score. The transformation will help model easier to converge.
dat1$iqc <- (dat1$iq - 100)/15
Centering in three-level models is quite complicated. In this
example, two predictors will be centered: iqc
and
private
. iqc
is a level-1 predictor. First,
make the school means of IQ and deviation scores of IQ from school
means. This process is similar to those in two-level models.
dat1$aveschooliqc <- ave(dat1$iqc, dat1$schoolid)
dat1$diffschooliqc <- dat1$iqc - dat1$aveschooliqc
From the school means of IQ, the country means of IQ and school-level deviation scores of IQ from country means will be calculated. To find the country means, the school-level dataset is calculated.
dat1school <- dat1[!duplicated(dat1$schoolid),]
Next, calculate the country means of IQ and school-level deviation scores of IQ from country means.
dat1school$avecountryiqc <- ave(dat1school$aveschooliqc, dat1school$countryid)
dat1school$diffcountryiqc <- dat1school$aveschooliqc - dat1school$avecountryiqc
private
is a level-2 predictor. Thus, the centering must
be implemented in the school-level data.
dat1school$aveprivate <- ave(dat1school$private, dat1school$countryid)
dat1school$diffprivate <- dat1school$private - dat1school$aveprivate
Four new transformed variables in the school-level data will be attached in the original data. First, the school ID between both data are matched. The result will show that the school in each row of the original data is in which rows in the school-level data.
matchit <- match(dat1$schoolid, dat1school$schoolid)
Then, the result of the match
function is used to attach
transformed variables from the school-level data to the student-level
(original) data.
dat1$avecountryiqc <- dat1school[matchit, "avecountryiqc"]
dat1$diffcountryiqc <- dat1school[matchit, "diffcountryiqc"]
dat1$aveprivate <- dat1school[matchit, "aveprivate"]
dat1$diffprivate <- dat1school[matchit, "diffprivate"]
After this, many models centered and uncentered variables are shown to illustrate their differences.
First, only deviation-from-school IQ is used in the model. The slope represents the student-level effect (controlling schools) of IQ on math achievement.
m1xa1 <- lmer(math ~ 1 + diffschooliqc + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1xa1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 686606.8 686654.2 -343298.4 686596.8 96342
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3126 -0.6554 -0.0010 0.6537 4.2581
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 39.1797 6.2594
## countryid (Intercept) 0.3918 0.6259
## Residual 64.4522 8.0282
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.04809 0.11562 458.80
## diffschooliqc 1.04171 0.04418 23.58
##
## Correlation of Fixed Effects:
## (Intr)
## diffscholqc 0.000
Next, the predictor is uncentered IQ. The slope represents all student-level (controlling schools), school-level (controlling countries), and country-levle effects. The slopes in three levels are equally constrained.
m1xa2 <- lmer(math ~ 1 + iqc + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1xa2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + iqc + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 686654.0 686701.4 -343322.0 686644.0 96342
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3186 -0.6556 -0.0011 0.6537 4.2594
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 39.3368 6.2719
## countryid (Intercept) 0.7775 0.8818
## Residual 64.4556 8.0284
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.67777 0.13561 395.83
## iqc 0.93657 0.04139 22.63
##
## Correlation of Fixed Effects:
## (Intr)
## iqc 0.207
Next, deviation-from-school IQ, deviation-from-country IQ, and country IQ means are the predictors. The slope of deviation-from-school IQ represents the student-level effect (controlling schools) of IQ on math achievement. The slope of deviation-from-country IQ represents the school-level effect (controlling countries) of IQ on math achievement. The slope of country IQ means represents the country-level effect of IQ on math achievement.
m1xb1 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + avecountryiqc + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1xb1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + diffcountryiqc + avecountryiqc + (1 |
## schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 686605.8 686672.1 -343295.9 686591.8 96340
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3125 -0.6553 -0.0010 0.6537 4.2579
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 39.1370 6.2560
## countryid (Intercept) 0.3898 0.6243
## Residual 64.4520 8.0282
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.97429 0.17251 307.075
## diffschooliqc 1.04171 0.04418 23.581
## diffcountryiqc 0.30603 0.14078 2.174
## avecountryiqc -0.10970 0.19023 -0.577
##
## Correlation of Fixed Effects:
## (Intr) dffsch dffcnt
## diffscholqc 0.000
## diffcntryqc -0.001 0.000
## avecontryqc 0.743 0.000 -0.002
Next, deviation-from-school IQ, school IQ means, and country IQ means are the predictors. The slope of deviation-from-school IQ represents the student-level effect (controlling schools) of IQ on math achievement. The slope of school IQ means still represents the school-level effect (controlling countries) of IQ on math achievement. However, the slope of country IQ means represents the country-level effect subtracted by the school-level effect.
m1xb2 <- lmer(math ~ 1 + diffschooliqc + aveschooliqc + avecountryiqc + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1xb2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + aveschooliqc + avecountryiqc + (1 |
## schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 686605.8 686672.1 -343295.9 686591.8 96340
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3125 -0.6553 -0.0010 0.6537 4.2579
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 39.1370 6.2560
## countryid (Intercept) 0.3898 0.6243
## Residual 64.4520 8.0282
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.97429 0.17251 307.075
## diffschooliqc 1.04171 0.04418 23.581
## aveschooliqc 0.30603 0.14078 2.174
## avecountryiqc -0.41573 0.23685 -1.755
##
## Correlation of Fixed Effects:
## (Intr) dffsch avschl
## diffscholqc 0.000
## aveschoolqc -0.001 0.000
## avecontryqc 0.597 0.000 -0.596
Next, student IQ scores, school IQ means, and country IQ means are the predictors. The slope of student IQ scores represents the student-level effect (controlling schools) of IQ on math achievement. The slope of school IQ means represents the school-level effect subtracted by the student-level effect. The slope of country IQ means represents the country-level effect subtracted by the school-level effect.
m1xb3 <- lmer(math ~ 1 + iqc + aveschooliqc + avecountryiqc + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1xb3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + iqc + aveschooliqc + avecountryiqc + (1 | schoolid) +
## (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 686605.8 686672.1 -343295.9 686591.8 96340
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3125 -0.6553 -0.0010 0.6537 4.2579
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 39.1370 6.2560
## countryid (Intercept) 0.3898 0.6243
## Residual 64.4520 8.0282
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.97429 0.17251 307.075
## iqc 1.04171 0.04418 23.581
## aveschooliqc -0.73568 0.14755 -4.986
## avecountryiqc -0.41573 0.23685 -1.755
##
## Correlation of Fixed Effects:
## (Intr) iqc avschl
## iqc 0.000
## aveschoolqc -0.001 -0.299
## avecontryqc 0.597 0.000 -0.568
Let’s illustrate the impact of centering on level-2 predictors.
Initially, the deviation-from-country calculated from
private
is used. The slope represents the school-level
effect (controlling countries) of type of schools on math
achievement.
m1wa1 <- lmer(math ~ 1 + diffprivate + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1wa1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffprivate + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 683548.4 683595.8 -341769.2 683538.4 96342
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2685 -0.6573 -0.0024 0.6568 4.2666
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 15.895 3.9868
## countryid (Intercept) 0.836 0.9143
## Residual 64.847 8.0528
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.0230 0.1152 460.42
## diffprivate 9.6575 0.1307 73.87
##
## Correlation of Fixed Effects:
## (Intr)
## diffprivate -0.001
Next, the original private
variable is the predictor.
The slope represents both school-level effect (controlling countries)
and country-level effect of type of schools on math achievement. The
effects on both levels are equally constrained.
m1wa2 <- lmer(math ~ 1 + private + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1wa2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + private + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 683514.3 683561.7 -341752.2 683504.3 96342
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2663 -0.6575 -0.0015 0.6565 4.2681
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 15.8884 3.9860
## countryid (Intercept) 0.4717 0.6868
## Residual 64.8470 8.0528
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 48.1443 0.1169 411.67
## private 9.6409 0.1300 74.15
##
## Correlation of Fixed Effects:
## (Intr)
## private -0.564
Next, the deviation-from-country calculated from private
and the country means of private
are used as predictors.
The first slope represents the school-level effect (controlling
countries) of type of schools on math achievement. The second slope
represents the country-level effect.
m1wb1 <- lmer(math ~ 1 + diffprivate + aveprivate + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1wb1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffprivate + aveprivate + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 683514.7 683571.6 -341751.4 683502.7 96341
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2663 -0.6578 -0.0015 0.6567 4.2683
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 15.8884 3.9860
## countryid (Intercept) 0.4561 0.6753
## Residual 64.8471 8.0528
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 48.9286 0.6335 77.230
## diffprivate 9.6579 0.1307 73.883
## aveprivate 8.0930 1.2364 6.546
##
## Correlation of Fixed Effects:
## (Intr) dffprv
## diffprivate -0.001
## aveprivate -0.989 0.000
Next, the original private
variable and the country
means of private
are the predictors. The first slope
represents both school-level effect (controlling countries) and
country-level effect of type of schools on math achievement. The second
slope represent the country-level effect subtracted by the school-level
effect.
m1wb2 <- lmer(math ~ 1 + private + aveprivate + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m1wb2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + private + aveprivate + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 683514.7 683571.6 -341751.4 683502.7 96341
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2663 -0.6578 -0.0015 0.6567 4.2683
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 15.8884 3.9860
## countryid (Intercept) 0.4561 0.6753
## Residual 64.8471 8.0528
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 48.9286 0.6335 77.230
## private 9.6579 0.1307 73.883
## aveprivate -1.5648 1.2432 -1.259
##
## Correlation of Fixed Effects:
## (Intr) privat
## private -0.001
## aveprivate -0.983 -0.105
Let’s combine all school-mean and country-mean centered variables. Also, add country’s educational quality and country’s educational opportunity indices in the model. The full random intercept model is as follows:
dat1country <- dat1[!duplicated(dat1$countryid),]
apply(dat1country, 2, mean)
## countryid schoolid studentid math iq
## 47.0000000 2412.3763441 47916.0000000 48.9354839 88.6129032
## private quality opportunity iqc aveschooliqc
## 0.0000000 53.3268817 47.9759140 -0.7591398 -0.7138807
## diffschooliqc avecountryiqc diffcountryiqc aveprivate diffprivate
## -0.0452591 -0.6911782 -0.0227025 0.5058572 -0.5058572
m13 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + (1|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m13)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc +
## 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality -
## 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 682935.3 683039.5 -341456.7 682913.3 96336
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2348 -0.6588 -0.0024 0.6569 4.2182
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolid (Intercept) 15.8440 3.9804
## countryid (Intercept) 0.3116 0.5582
## Residual 64.4547 8.0284
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.026113 0.087048 609.157
## diffschooliqc 1.041705 0.044177 23.580
## diffcountryiqc 0.405794 0.095543 4.247
## I(avecountryiqc + 0.691) -0.514889 0.218560 -2.356
## diffprivate 9.665964 0.130507 74.065
## I(aveprivate - 0.506) 7.904676 1.142031 6.922
## I(quality - 53.33) 0.007607 0.014989 0.508
## I(opportunity - 47.98) 0.056968 0.021359 2.667
##
## Correlation of Fixed Effects:
## (Intr) dffsch dffcnt I(+0.6 dffprv I(-0.5 I(-53.
## diffscholqc 0.000
## diffcntryqc 0.001 0.000
## I(vc+0.691) -0.023 0.000 -0.003
## diffprivate -0.001 0.000 0.015 0.000
## I(vp-0.506) -0.012 0.000 0.000 0.127 0.001
## I(ql-53.33) 0.012 0.000 0.001 -0.518 0.001 0.010
## I(pp-47.98) -0.008 0.000 0.000 -0.012 -0.001 -0.041 -0.711
Next, let’s check whether the student-level effect
(diffschooliqc
) on math is random across
schools. See that the only differences between two
models in anova
is
(1 + diffschooliqc|schoolid)
. The effect was not
significant.
m14 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + (1 + diffschooliqc|schoolid) + (1|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead", calc.derivs=FALSE))
## boundary (singular) fit: see help('isSingular')
anova(m13, m14)
## Data: dat1
## Models:
## m13: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 | countryid)
## m14: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 + diffschooliqc | schoolid) + (1 | countryid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m13 11 682935 683040 -341457 682913
## m14 13 682939 683062 -341457 682913 0.2966 2 0.8622
Next, check whether the student-level effect
(diffschooliqc
) on math is random across
countries. See that the only differences between two
models in anova
is
(1 + diffschooliqc|countryid)
. The effect was
significant.
m15 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + (1|schoolid) + (1 + diffschooliqc|countryid), data=dat1, REML=FALSE)
anova(m13, m15)
## Data: dat1
## Models:
## m13: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 | countryid)
## m15: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc | countryid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m13 11 682935 683040 -341457 682913
## m15 13 682914 683038 -341444 682888 24.951 2 3.819e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, check whether the school-level effect
(diffcountryiqc
) on math is random across
countries. The last model, m15
, is used as
the restricted model. The full model added the random effect of
diffcountryiqc
across countries, which shown in
(1 + diffschooliqc + diffcountryiqc|countryid)
. The effect
was not significant. Then, m15
is retained.
m15 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + (1|schoolid) + (1 + diffschooliqc|countryid), data=dat1, REML=FALSE)
anova(m13, m15)
## Data: dat1
## Models:
## m13: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 | countryid)
## m15: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc | countryid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m13 11 682935 683040 -341457 682913
## m15 13 682914 683038 -341444 682888 24.951 2 3.819e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, check whether the school-level effect of type
of schools (diffprivate
) on math is random across
countries. m15
is used as the restricted
model. The full model added the random effect of
diffprivate
across countries, which shown in
(1 + diffschooliqc + diffprivate|countryid)
. The effect was
significant. Then, the full model, m17
, is retained.
m17 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + (1|schoolid) + (1 + diffschooliqc + diffprivate|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(m15, m17)
## Data: dat1
## Models:
## m15: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc | countryid)
## m17: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc + 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality - 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc + diffprivate | countryid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m15 13 682914 683038 -341444 682888
## m17 16 682893 683045 -341430 682861 27.361 3 4.944e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the output of the final random-slope model.
summary(m17)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc +
## 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality -
## 53.33) + I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc +
## diffprivate | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 682893.0 683044.6 -341430.5 682861.0 96331
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2371 -0.6574 -0.0020 0.6555 4.2284
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolid (Intercept) 15.4898 3.9357
## countryid (Intercept) 0.3185 0.5643
## diffschooliqc 0.1646 0.4057 -0.13
## diffprivate 1.4769 1.2153 -0.01 0.36
## Residual 64.3947 8.0246
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.025482 0.087039 609.214
## diffschooliqc 1.050549 0.061301 17.137
## diffcountryiqc 0.398762 0.095029 4.196
## I(avecountryiqc + 0.691) -0.526992 0.218097 -2.416
## diffprivate 9.674217 0.181380 53.337
## I(aveprivate - 0.506) 8.034948 1.139680 7.050
## I(quality - 53.33) 0.009231 0.014959 0.617
## I(opportunity - 47.98) 0.056229 0.021321 2.637
##
## Correlation of Fixed Effects:
## (Intr) dffsch dffcnt I(+0.6 dffprv I(-0.5 I(-53.
## diffscholqc -0.061
## diffcntryqc 0.001 0.000
## I(vc+0.691) -0.022 -0.002 -0.003
## diffprivate -0.007 0.172 0.010 0.000
## I(vp-0.506) -0.012 -0.001 0.000 0.127 0.000
## I(ql-53.33) 0.012 0.001 0.001 -0.518 0.001 0.010
## I(pp-47.98) -0.008 -0.001 0.000 -0.012 -0.001 -0.042 -0.711
Only one random effect is in the school level, which is the random intercept. However, three random effects are in the country level: (a) random intercepts, (b) the effect of student-level IQ, and (c) the effect of school-level type of schools. The correlation between random effects are plotted here. Note that the random intercepts are the adjusted means of math achievement when the education quality and opportunity of a country are equal to their grand means.
ranefm17 <- ranef(m17)
names(ranefm17)
## [1] "schoolid" "countryid"
ranefm17country <- ranefm17$countryid
plot(ranefm17country)
Because the random slopes are only in the country level. Thus, four country-level predictors are used to predict both random slopes. As a result, there are eight cross-level interactions.
m18 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + diffschooliqc*I(avecountryiqc+0.691) + diffschooliqc*I(aveprivate-0.506) + diffschooliqc*I(quality-53.33) + diffschooliqc*I(opportunity-47.98) + diffprivate*I(avecountryiqc+0.691) + diffprivate*I(aveprivate-0.506) + diffprivate*I(quality-53.33) + diffprivate*I(opportunity-47.98) + (1|schoolid) + (1 + diffschooliqc + diffprivate|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
## boundary (singular) fit: see help('isSingular')
summary(m18)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc +
## 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality -
## 53.33) + I(opportunity - 47.98) + diffschooliqc * I(avecountryiqc +
## 0.691) + diffschooliqc * I(aveprivate - 0.506) + diffschooliqc *
## I(quality - 53.33) + diffschooliqc * I(opportunity - 47.98) +
## diffprivate * I(avecountryiqc + 0.691) + diffprivate * I(aveprivate -
## 0.506) + diffprivate * I(quality - 53.33) + diffprivate *
## I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc +
## diffprivate | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 682910.1 683137.5 -341431.1 682862.1 96323
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2370 -0.6578 -0.0016 0.6570 4.2232
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolid (Intercept) 15.62031 3.9523
## countryid (Intercept) 0.19396 0.4404
## diffschooliqc 0.07027 0.2651 -1.00
## diffprivate 1.20862 1.0994 -0.22 0.22
## Residual 64.41749 8.0261
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.025569 0.078997 671.238
## diffschooliqc 1.051519 0.052241 20.128
## diffcountryiqc 0.397736 0.095326 4.172
## I(avecountryiqc + 0.691) -0.513586 0.198817 -2.583
## diffprivate 9.678679 0.173578 55.760
## I(aveprivate - 0.506) 7.926608 1.038274 7.634
## I(quality - 53.33) 0.007056 0.013614 0.518
## I(opportunity - 47.98) 0.057469 0.019334 2.972
## diffschooliqc:I(avecountryiqc + 0.691) -0.155255 0.131403 -1.182
## diffschooliqc:I(aveprivate - 0.506) 1.381428 0.683646 2.021
## diffschooliqc:I(quality - 53.33) 0.021002 0.008970 2.341
## diffschooliqc:I(opportunity - 47.98) -0.012144 0.012723 -0.955
## I(avecountryiqc + 0.691):diffprivate -0.190015 0.436370 -0.435
## diffprivate:I(aveprivate - 0.506) 2.104978 2.305515 0.913
## diffprivate:I(quality - 53.33) 0.053550 0.029953 1.788
## diffprivate:I(opportunity - 47.98) -0.105116 0.042527 -2.472
##
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Unfortunately, the warning, boundary (singular) fit
, is
shown. It means that at least a pair of random effects have a perfect
correlation. The result is not trustworthy. Usually, a random effect
should be dropped. However, in this case, the random effect of
student-level IQ is predicted by two school-level predictors. The
perfect random effect is not shown even though the two added cross-level
interactions were not significant. Thus, this model is used for probing
interactions.
m19 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc+0.691) + diffprivate + I(aveprivate-0.506) + I(quality-53.33) + I(opportunity-47.98) + diffschooliqc*diffcountryiqc + diffschooliqc*I(avecountryiqc+0.691) + diffschooliqc*diffprivate + diffschooliqc*I(aveprivate-0.506) + diffschooliqc*I(quality-53.33) + diffschooliqc*I(opportunity-47.98) + diffprivate*I(avecountryiqc+0.691) + diffprivate*I(aveprivate-0.506) + diffprivate*I(quality-53.33) + diffprivate*I(opportunity-47.98) + (1|schoolid) + (1 + diffschooliqc + diffprivate|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m19)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: math ~ 1 + diffschooliqc + diffcountryiqc + I(avecountryiqc +
## 0.691) + diffprivate + I(aveprivate - 0.506) + I(quality -
## 53.33) + I(opportunity - 47.98) + diffschooliqc * diffcountryiqc +
## diffschooliqc * I(avecountryiqc + 0.691) + diffschooliqc *
## diffprivate + diffschooliqc * I(aveprivate - 0.506) + diffschooliqc *
## I(quality - 53.33) + diffschooliqc * I(opportunity - 47.98) +
## diffprivate * I(avecountryiqc + 0.691) + diffprivate * I(aveprivate -
## 0.506) + diffprivate * I(quality - 53.33) + diffprivate *
## I(opportunity - 47.98) + (1 | schoolid) + (1 + diffschooliqc +
## diffprivate | countryid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 682897.1 683143.5 -341422.6 682845.1 96321
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2382 -0.6581 -0.0022 0.6558 4.2254
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolid (Intercept) 15.4932 3.9361
## countryid (Intercept) 0.3176 0.5635
## diffschooliqc 0.1359 0.3687 -0.13
## diffprivate 1.2304 1.1092 -0.01 0.34
## Residual 64.3936 8.0246
## Number of obs: 96347, groups: schoolid, 4787; countryid, 93
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.025597 0.086988 609.575
## diffschooliqc 1.051312 0.058743 17.897
## diffcountryiqc 0.397862 0.095013 4.187
## I(avecountryiqc + 0.691) -0.513380 0.218358 -2.351
## diffprivate 9.678933 0.173924 55.650
## I(aveprivate - 0.506) 7.924078 1.141082 6.944
## I(quality - 53.33) 0.007428 0.014977 0.496
## I(opportunity - 47.98) 0.057172 0.021349 2.678
## diffschooliqc:diffcountryiqc 0.028850 0.066203 0.436
## diffschooliqc:I(avecountryiqc + 0.691) -0.161863 0.147438 -1.098
## diffschooliqc:diffprivate 0.073798 0.089377 0.826
## diffschooliqc:I(aveprivate - 0.506) 1.370775 0.767711 1.786
## diffschooliqc:I(quality - 53.33) 0.021495 0.010087 2.131
## diffschooliqc:I(opportunity - 47.98) -0.012237 0.014372 -0.851
## I(avecountryiqc + 0.691):diffprivate -0.190150 0.437168 -0.435
## diffprivate:I(aveprivate - 0.506) 2.104055 2.309722 0.911
## diffprivate:I(quality - 53.33) 0.053836 0.030011 1.794
## diffprivate:I(opportunity - 47.98) -0.105363 0.042618 -2.472
##
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
Before probing interactions, analyze the new model without
I()
.
dat1$avecountryiqcc <- dat1$avecountryiqc + 0.691
dat1$aveprivatec <- dat1$aveprivate - 0.506
dat1$qualityc <- dat1$quality - 53.33
dat1$opportunityc <- dat1$opportunity - 47.98
m20 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + avecountryiqcc + diffprivate + aveprivatec + qualityc + opportunityc + diffschooliqc*diffcountryiqc + diffschooliqc*avecountryiqcc + diffschooliqc*diffprivate + diffschooliqc*aveprivatec + diffschooliqc*qualityc + diffschooliqc*opportunityc + diffprivate*avecountryiqcc + diffprivate*aveprivatec + diffprivate*qualityc + diffprivate*opportunityc + (1|schoolid) + (1 + diffschooliqc + diffprivate|countryid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
Let’s pick values for probing interactions. The M, M -
SD, and M + SD are used fordiffschooliqc
(using student-level SD), qualityc
, and
opportunityc
(using country-level SDs).
mdiq <- mean(dat1$diffschooliqc)
sddiq <- sd(dat1$diffschooliqc)
mq <- mean(dat1country$quality)
sdq <- sd(dat1country$quality)
mo <- mean(dat1country$opportunity)
sdo <- sd(dat1country$opportunity)
diffschooliqcval <- c(mdiq - sddiq, mdiq, mdiq + sddiq)
qualitycval <- c(mq - sdq, mq, mq + sdq) - 53.33
opportunitycval <- c(mo - sdo, mo, mo + sdo) - 47.98
The interactions
package is usually used for probing
interactions. However, the sim_slopes
function took very
long time to run the analysis (longer than 12 hours). I wrote a function
to analyze simple slopes with much shorter time. To be honest, I am
still not sure why the sim_slopes
function took so long. To
use my function, the quicksimslopeslme4.R
file must be
placed in the working directory. Then, run the following function to
read the function in the file.
source("quicksimslopeslme4.R")
My function is called quick_sim_slopes
. The features are
not as many as the sim_slopes
function but it saved a lot
of time. First, investigate the interaction between the student-level
effect of IQ and the country’s educational quality. Check the slopes of
the student-level effect of IQ at each level of country’s educational
quality.
quick_sim_slopes(model=m20, pred="diffschooliqc", modx="qualityc", modx.values=qualitycval)
## $`Conditional Intercept`
## modx.values est se z p
## 1 -12.61890497 52.93187 0.20710909 255.5748 0
## 2 -0.00311828 53.02557 0.08698733 609.5781 0
## 3 12.61266841 53.11928 0.20891541 254.2621 0
##
## $`Conditional Slope`
## modx.values est se z p
## 1 -12.61890497 0.7800707 0.13869915 5.624192 0
## 2 -0.00311828 1.0512453 0.05874261 17.895790 0
## 3 12.61266841 1.3224199 0.14160476 9.338810 0
Plot the student-level effect of IQ at each level of country’s
educational quality using the interactions
package.
library(interactions)
interact_plot(model=m20, pred="diffschooliqc", modx="qualityc", modx.values=qualitycval)
Next, check the slopes of the country’s educational quality at each level of student’s IQ deviation from school means.
quick_sim_slopes(model=m20, pred="qualityc", modx="diffschooliqc", modx.values=diffschooliqcval)
## $`Conditional Intercept`
## modx.values est se z p
## 1 -5.854771e-01 52.41008 0.09539799 549.3835 0
## 2 -5.835356e-19 53.02560 0.08698788 609.5746 0
## 3 5.854771e-01 53.64112 0.09164473 585.3159 0
##
## $`Conditional Slope`
## modx.values est se z p
## 1 -5.854771e-01 -0.005157213 0.01641730 -0.3141328 0.753420
## 2 -5.835356e-19 0.007427539 0.01497748 0.4959138 0.619955
## 3 5.854771e-01 0.020012290 0.01577583 1.2685409 0.204605
Plot the the country’s educational quality effect at each level of student’s IQ deviation from school means.
interact_plot(model=m20, pred="qualityc", modx="diffschooliqc", modx.values=diffschooliqcval)
Next, investigate the interaction between the school-level effect of school types and the country’s educational opportunity. Check the slopes of the school-level effect of school type at each level of country’s educational opportunity.
quick_sim_slopes(model=m20, pred="diffprivate", modx="opportunityc", modx.values=opportunitycval)
## $`Conditional Intercept`
## modx.values est se z p
## 1 -7.475441355 52.59821 0.18238446 288.3919 0
## 2 -0.004086022 53.02536 0.08698863 609.5666 0
## 3 7.467269312 53.45252 0.18098416 295.3436 0
##
## $`Conditional Slope`
## modx.values est se z p
## 1 -7.475441355 10.466564 0.3640795 28.74802 0
## 2 -0.004086022 9.679364 0.1739255 55.65234 0
## 3 7.467269312 8.892163 0.3615502 24.59454 0
Plot the school-level effect of school type at each level of country’s educational opportunity.
interact_plot(model=m20, pred="diffprivate", modx="opportunityc", modx.values=opportunitycval)
Note that users can run the effect of opportunityc
at
each level of diffprivate
but it is very hard to interpret.
So this setup will be skipped.
Read the new data set where the standardized test data were collected from the same schools three years in a row. However, students who took the tests in each year were different across time points. Thus, the nesting structure has three levels; that is, students:years:schools.
dat2 <- read.table("lecture13ex2.csv", sep=",", header=TRUE)
See descriptive statistics of all variables within the data set.
describe(dat2)
## vars n mean sd median trimmed mad min max range
## subjectid 1 22642 11321.50 6536.33 11321.5 11321.50 8392.26 1 22642 22641
## timeid 2 22642 152.07 87.22 153.0 152.52 112.68 1 300 299
## schoolid 3 22642 51.02 29.08 51.0 51.17 37.06 1 100 99
## test 4 22642 50.00 10.00 50.0 50.05 10.38 13 90 77
## year 5 22642 2018.01 0.82 2018.0 2018.02 1.48 2017 2019 2
## public 6 22642 0.51 0.50 1.0 0.51 0.00 0 1 1
## skew kurtosis se
## subjectid 0.00 -1.20 43.44
## timeid -0.04 -1.21 0.58
## schoolid -0.04 -1.21 0.19
## test -0.05 -0.04 0.07
## year -0.02 -1.50 0.01
## public -0.04 -2.00 0.00
Center the year such that 2017, 2018, and 2019 are 0, 1, 2, respectively.
dat2$yearc <- dat2$year - 2017
Next, check whether schools can significantly explain variance in standardized tests. The random intercept across schools was significant.
m21 <- lmer(test ~ 1 + yearc + (1|timeid), data=dat2, REML=FALSE)
m22 <- lmer(test ~ 1 + yearc + (1|timeid) + (1|schoolid), data=dat2, REML=FALSE)
anova(m21, m22)
## Data: dat2
## Models:
## m21: test ~ 1 + yearc + (1 | timeid)
## m22: test ~ 1 + yearc + (1 | timeid) + (1 | schoolid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m21 4 152462 152494 -76227 152454
## m22 5 152311 152351 -76150 152301 153.42 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the changes in standardized tests from 2017 to 2019 where the slopes are equal across schools.
summary(m22)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: test ~ 1 + yearc + (1 | timeid) + (1 | schoolid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 152310.8 152350.9 -76150.4 152300.8 22637
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1126 -0.6591 -0.0068 0.6689 3.9155
##
## Random effects:
## Groups Name Variance Std.Dev.
## timeid (Intercept) 14.69 3.833
## schoolid (Intercept) 35.80 5.983
## Residual 46.39 6.811
## Number of obs: 22642, groups: timeid, 300; schoolid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.7911 0.6970 68.57
## yearc 2.2008 0.2768 7.95
##
## Correlation of Fixed Effects:
## (Intr)
## yearc -0.397
Next, allow the rates of changes to be different across schools. The random slope was significant.
m23 <- lmer(test ~ 1 + yearc + (1|timeid) + (1 + yearc|schoolid), data=dat2, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(m22, m23)
## Data: dat2
## Models:
## m22: test ~ 1 + yearc + (1 | timeid) + (1 | schoolid)
## m23: test ~ 1 + yearc + (1 | timeid) + (1 + yearc | schoolid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m22 5 152311 152351 -76150 152301
## m23 7 152261 152317 -76123 152247 53.968 2 1.91e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
See the output of the random slope model.
summary(m23)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: test ~ 1 + yearc + (1 | timeid) + (1 + yearc | schoolid)
## Data: dat2
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 152260.8 152317.0 -76123.4 152246.8 22635
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1140 -0.6572 -0.0027 0.6676 3.9245
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## timeid (Intercept) 7.688 2.773
## schoolid (Intercept) 23.215 4.818
## yearc 7.003 2.646 0.31
## Residual 46.388 6.811
## Number of obs: 22642, groups: timeid, 300; schoolid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.7904 0.5492 87.022
## yearc 2.2014 0.3342 6.588
##
## Correlation of Fixed Effects:
## (Intr)
## yearc -0.011
Make the scatterplot to visualize the relationship between the schools’ averaged standardized tests scores in 2017 and the rate of changes in each school.
plot(ranef(m23)$schoolid, xlab="School Averaged Test Score in 2017", ylab="School Change in Test Score per Year")
Next, check whether both random intercepts and random slopes can be explained by types of schools (public vs. private).
m24 <- lmer(test ~ 1 + yearc*public + (1|timeid) + (1 + yearc|schoolid), data=dat2, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(m24)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: test ~ 1 + yearc * public + (1 | timeid) + (1 + yearc | schoolid)
## Data: dat2
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 152197.7 152270.0 -76089.9 152179.7 22633
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1095 -0.6573 -0.0025 0.6701 3.9244
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## timeid (Intercept) 7.688 2.773
## schoolid (Intercept) 18.166 4.262
## yearc 3.477 1.865 -0.03
## Residual 46.387 6.811
## Number of obs: 22642, groups: timeid, 300; schoolid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 50.0372 0.7090 70.571
## yearc 4.0804 0.3910 10.434
## public -4.4942 1.0022 -4.484
## yearc:public -3.7575 0.5528 -6.797
##
## Correlation of Fixed Effects:
## (Intr) yearc public
## yearc -0.321
## public -0.707 0.227
## yearc:publc 0.227 -0.707 -0.320
The interaction was significant. Set the values to probe the interaction.
publicval <- c(0, 1)
yearval <- c(0, 1, 2)
Test the rate of changes at each type of school.
quick_sim_slopes(model=m24, pred="yearc", modx="public", modx.values=publicval)
## $`Conditional Intercept`
## modx.values est se z p
## 1 0 50.03721 0.7090314 70.57121 0
## 2 1 45.54299 0.7083046 64.29859 0
##
## $`Conditional Slope`
## modx.values est se z p
## 1 0 4.0803648 0.3910467 10.4344705 0.000000
## 2 1 0.3228416 0.3907220 0.8262693 0.408651
Plot the rate of changes at each type of school.
interact_plot(model=m24, pred="yearc", modx="public", modx.values=publicval)
Test the differences between each type of schools at each year.
quick_sim_slopes(model=m24, pred="public", modx="yearc", modx.values=yearval)
## $`Conditional Intercept`
## modx.values est se z p
## 1 0 50.03721 0.7090314 70.57121 0
## 2 1 54.11757 0.6912191 78.29293 0
## 3 2 58.19794 0.8710208 66.81578 0
##
## $`Conditional Slope`
## modx.values est se z p
## 1 0 -4.494218 1.0022081 -4.484317 7e-06
## 2 1 -8.251742 0.9774164 -8.442402 0e+00
## 3 2 -12.009265 1.2318388 -9.749055 0e+00
Plot the differences between each type of schools at each year.
interact_plot(model=m24, pred="public", modx="yearc", modx.values=yearval)
Use the final model with cross-level interactions to find the proportion of variances at each level explained.
m20 <- lmer(math ~ 1 + diffschooliqc + diffcountryiqc + avecountryiqcc
+ diffprivate + aveprivatec
+ qualityc + opportunityc
+ diffschooliqc*diffcountryiqc + diffschooliqc*avecountryiqcc
+ diffschooliqc*diffprivate + diffschooliqc*aveprivatec
+ diffschooliqc*qualityc + diffschooliqc*opportunityc
+ diffprivate*avecountryiqcc + diffprivate*aveprivatec
+ diffprivate*qualityc + diffprivate*opportunityc
+ (1|schoolid) + (1 + diffschooliqc + diffprivate|countryid),
data=dat1, REML=FALSE,
control = lmerControl(optimizer ="Nelder_Mead"))
Use the model.matrix
function to get the
data.frame
containing predictors used in the model. Note
that if I()
is used, the values are already transformed in
the result of the model.matrix
function.
dat1_1 <- model.matrix(m20)
Save the results of summary
function for future use.
sumoutm20 <- summary(m20)
Next, predictors at each level must be listed and put in the
r-squared calculation function. Level-1 predictors are
diffschooliqc
and all interactions involving
diffschooliqc
.
lv1_covs <- c("diffschooliqc", "diffschooliqc:diffcountryiqc",
"diffschooliqc:avecountryiqcc",
"diffschooliqc:diffprivate",
"diffschooliqc:aveprivatec",
"diffschooliqc:qualityc",
"diffschooliqc:opportunityc")
Level-2 predictors are diffschooliqc
,
diffprivate
, and all interactions between level-2 and
level-3 predictors.
lv2_covs <- c("diffschooliqc", "diffprivate",
"avecountryiqcc:diffprivate",
"diffprivate:aveprivatec",
"diffprivate:qualityc",
"diffprivate:opportunityc")
Level-3 predictors are listed as follows.
lv3_covs <- c("avecountryiqcc", "aveprivatec",
"qualityc", "opportunityc")
Next, the covariance matrix between random effects are extracted.
ranefvar20 <- as.matrix(Matrix::bdiag(VarCorr(m20)))
The r2mlm3_manual
function in the r2mlm
package is used. The arguments of this function is explained as
follows:
data
is the data frame from the
model.matrix
functionl1_covs
is the vector of level-1 predictorsl2_covs
is the vector of level-2 predictorsl3_covs
is the vector of level-3 predictorsrandom_covs12
is the vector of level-1 predictors that
are varied across level-2 unitsrandom_covs13
is the vector of level-1 predictors that
are varied across level-3 unitsrandom_covs23
is the vector of level-2 predictors that
are varied across level-3 unitsgamma_1
is the fixed effect estimates of the level-1
predictorsgamma_2
is the fixed effect estimates of the level-2
predictorsgamma_3
is the fixed effect estimates of the level-3
predictorsTau12
is the covariance matrix of the random intercept
at level 2 attached with the random slopes of level-1 predictors across
level-2 unitsTau13
is the covariance matrix of the random intercept
at level 3 attached with the random slopes of level-1 predictors across
level-3 unitsTau23
is the covariance matrix of the random intercept
at level 3 attached with the random slopes of level-1 predictors across
level-3 unitssigma2
is the level-1 residual variancesclustermeancentered
is whether all variables are
centered at level-2 and level-3 meanslibrary(r2mlm)
r2mlm3_manual(data=dat1_1,
l1_covs=lv1_covs,
l2_covs=lv2_covs,
l3_covs=lv3_covs,
random_covs12 = NULL,
random_covs13 = "diffschooliqc",
random_covs23 = "diffprivate",
gamma_1 = coef(sumoutm20)[lv1_covs, "Estimate"],
gamma_2 = coef(sumoutm20)[lv2_covs, "Estimate"],
gamma_3 = coef(sumoutm20)[lv3_covs, "Estimate"],
Tau12 = ranefvar20[1, 1, drop=FALSE],
Tau13 = ranefvar20[2:3, 2:3, drop=FALSE],
Tau23 = ranefvar20[c(2,4), c(2,4), drop=FALSE],
sigma2=getME(m20, "sigma")^2,
clustermeancentered = TRUE)
## $R2s
## total l1 l2 l3
## f1 0.0037073076 0.0059868543 NA NA
## f2 0.2218461183 NA 0.595235164 NA
## f3 0.0050217908 NA NA 0.6234098
## v12 0.0000000000 0.0000000000 NA NA
## v13 0.0004451072 0.0007187944 NA NA
## v23 0.0028663064 NA 0.007690585 NA
## m2 0.1479908894 NA 0.397074251 NA
## m3 0.0030335704 NA NA 0.3765902
Use the final model with cross-level interactions to find the
proportion of variances at each level explained. Note that this is the
example where a predictor is not group-mean centered, which is
yearc
.
m24 <- lmer(test ~ 1 + yearc*public + (1|timeid) + (1 + yearc|schoolid),
data=dat2, REML=FALSE,
control = lmerControl(optimizer ="Nelder_Mead"))
Use the model.matrix
function to get the
data.frame
containing predictors used in the model.
dat2_1 <- model.matrix(m24)
Attach level-2 and level-3 ID in the resulting data frame.
varnames <- colnames(dat2_1)
dat2_1 <- data.frame(dat2_1, timeid = dat2$timeid,schoolid = dat2$schoolid)
colnames(dat2_1)[1:length(varnames)] <- varnames
Save the results of summary
function for future use.
sumoutm24 <- summary(m24)
Next, predictors at each level are listed. Note that this model does not have level-1 predictors.
lv2_covs <- c("yearc", "yearc:public")
lv3_covs <- c("public")
Next, the covariance matrix between random effects are extracted.
ranefvar24 <- as.matrix(Matrix::bdiag(VarCorr(m24)))
The r2mlm3_manual
function in the r2mlm
package is used. Most arguments of this function have explained above.
However, for the model without cluster mean centered, some arguments are
needed.
Tau2_noncmc
(replacing Tau12
) is the
covariance matrix of the random intercept at level 2 attached with the
random slopes of level-1 predictors across level-2 unitsTau3_noncmc
(replacing Tau13
and
Tau23
) is the covariance matrix of the random intercept at
level 3 attached with the random slopes of level-1 and level-2
predictors across level-3 unitsl2clusterID_noncmc
is the level-2 ID variable namel3clusterID_noncmc
is the level-3 ID variable namer2mlm3_manual(data=dat2_1,
l1_covs=NULL,
l2_covs=lv2_covs,
l3_covs=lv3_covs,
random_covs12 = NULL,
random_covs13 = NULL,
random_covs23 = "yearc",
gamma_1 = NULL,
gamma_2 = coef(sumoutm24)[lv2_covs, "Estimate"],
gamma_3 = coef(sumoutm24)[lv3_covs, "Estimate"],
Tau2_noncmc = ranefvar24[1, 1,drop=FALSE],
Tau3_noncmc = ranefvar24[2:3, 2:3, drop=FALSE],
sigma2=getME(m24, "sigma")^2,
l2clusterID_noncmc = "timeid",
l3clusterID_noncmc = "schoolid",
clustermeancentered = FALSE)
## $R2s
## total l1 l2 l3
## f1 0.0000000000 0 NA NA
## f2 0.0537897646 NA 0.3519469 NA
## f3 0.1756730513 NA NA 0.45459847
## v12 0.0000000000 0 NA NA
## v13 0.0000000000 0 NA NA
## v22 0.0000000000 NA 0.0000000 NA
## v23 0.0226818108 NA 0.1484073 NA
## v32 0.0000000000 NA NA 0.00000000
## v33 0.0002778279 NA NA 0.00071895
## m2 0.0763632960 NA 0.4996458 NA
## m3 0.2104847588 NA NA 0.54468258