Read the data set from Lecture 7.
dat4 <- read.table("lecture7ex1.csv", sep=",", header=TRUE)
Calculate group means of math achievement by the ave
function and get the group-mean-centered math achievement. The group
means of math achievement is centered at 50. The centering at 50 is not
implemented by I()
but by subtracting the original group
means and saving it as a new variable called aveach50
. The
reason of making a new variable because I()
is not suppored
in the r2mlm
package which will be used to calculate \(R^2\).
dat4$aveach <- ave(dat4$ach, dat4$schoolid)
dat4$diffach <- dat4$ach - dat4$aveach
dat4$aveach50 <- dat4$aveach - 50
First, let’s run the cross-level interaction model where the
within-school predictors are the deviation of math achievement within
schools, diffach
, and the cross-level interaction,
diffach:aveach50
. The between-school predictors are the
school means of math achievement centered at 50, aveach50
,
and whether a school is private, private
.
library(lme4)
out4m1 <- lmer(efficacy ~ 1 + diffach + aveach50 + private
+ diffach:aveach50
+ (1 + diffach|schoolid), data=dat4, REML=FALSE)
summary(out4m1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: efficacy ~ 1 + diffach + aveach50 + private + diffach:aveach50 +
## (1 + diffach | schoolid)
## Data: dat4
##
## AIC BIC logLik deviance df.resid
## 8888.7 8939.0 -4435.3 8870.7 1982
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1594 -0.6500 0.0071 0.6654 3.2921
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolid (Intercept) 1.53574 1.2392
## diffach 0.02968 0.1723 -0.22
## Residual 4.34903 2.0854
## Number of obs: 1991, groups: schoolid, 50
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 35.602009 0.325448 109.394
## diffach 0.061923 0.027110 2.284
## aveach50 0.048671 0.021855 2.227
## private -0.181827 0.420004 -0.433
## diffach:aveach50 0.009505 0.002548 3.730
##
## Correlation of Fixed Effects:
## (Intr) diffch avch50 privat
## diffach -0.127
## aveach50 -0.622 0.071
## private -0.794 0.000 0.525
## dffch:vch50 0.051 -0.397 -0.179 0.000
Run the r2mlm
function to get the proportions of
variance explained.
library(r2mlm)
r2mlm(out4m1)
## $Decompositions
## total within between
## fixed, within 0.17176369 0.2066938 NA
## fixed, between 0.02635712 NA 0.1559644
## slope variation 0.25530900 0.3072290 NA
## mean variation 0.14263735 NA 0.8440356
## sigma2 0.40393284 0.4860772 NA
##
## $R2s
## total within between
## f1 0.17176369 0.2066938 NA
## f2 0.02635712 NA 0.1559644
## v 0.25530900 0.3072290 NA
## m 0.14263735 NA 0.8440356
## f 0.19812081 NA NA
## fv 0.45342981 0.5139228 NA
## fvm 0.59606716 NA NA
Read the data set from Lecture 4.
dat5 <- read.table("lecture4ex2.csv", sep=",", header=TRUE)
First, the cross-level interaction model will be used to predict
interviews’ ratings. The interviewee-level predictors are interviewee’s
sex which male is the reference group, eesex, interviewee’s IQ centered
at 100, and the cross-level interaction between interviewee’s and
interviewer’s sex. The interviewer-level predictor is interviewer’s sex.
However, the r2mlm_long_manual
function in the
r2mlm
package is not easy to use as the r2mlm
function. Users need to provide many arguments and the variable names
cannot have :
, *
, or I()
. Thus,
the interaction or centering must be calcuated as new variables before
running the model. Thus, the centered IQ and the interaction are
calculated.
dat5$iqc <- dat5$iq - 100
dat5$int <- dat5$eesex * dat5$ersex
Next, analyze the cross-level interaction model.
out5 <- lmer(score ~ 1 + eesex + iqc + ersex + int
+ (1 + eesex|erid), data=dat5, REML=FALSE)
summary(out5)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: score ~ 1 + eesex + iqc + ersex + int + (1 + eesex | erid)
## Data: dat5
##
## AIC BIC logLik deviance df.resid
## 61641.0 61705.9 -30811.5 61623.0 9991
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7413 -0.6367 -0.0048 0.6410 3.4320
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## erid (Intercept) 20.1152 4.4850
## eesex 0.7979 0.8933 0.12
## Residual 21.7645 4.6652
## Number of obs: 10000, groups: erid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.538700 0.221217 341.469
## eesex -0.413234 0.137868 -2.997
## iqc 0.029418 0.003299 8.918
## ersex 0.009887 0.312847 0.032
## int 1.073648 0.194981 5.506
##
## Correlation of Fixed Effects:
## (Intr) eesex iqc ersex
## eesex -0.253
## iqc -0.003 0.001
## ersex -0.707 0.179 0.002
## int 0.179 -0.707 -0.008 -0.253
Run the r2mlm_long_manual
function to get the
proportions of variance explained. Note that, covs
is the
list of predictors with fixed effects. random_covs
is the
list of predictors that have random slopes. gammas
is the
point estimates of the fixed effect. Tau
is the covariance
matrix of the random effects. sigma2
is the level-1
residual variance.
sumout5 <- summary(out5)
r2mlm_long_manual(data=dat5,
covs=c("eesex", "iqc", "ersex", "int"),
random_covs=c("eesex"),
gammas=coef(sumout5)[-1, "Estimate"],
clusterID="erid",
Tau=as.matrix(Matrix::bdiag(VarCorr(out5))),
sigma2=getME(out5, "sigma")^2)
## $Decompositions
## total within between
## fixed slopes (within) 0.005822049 0.011301637 NA
## fixed slopes (between) 0.002170825 NA 0.004477321
## slope variation (within) 0.004626392 0.008980654 NA
## slope variation (between) 0.000000000 NA 0.000000000
## intercept variation (between) 0.482678196 NA 0.995522679
## residual (within) 0.504702537 0.979717710 NA
##
## $R2s
## total within between
## f1 0.005822049 0.011301637 NA
## f2 0.002170825 NA 0.004477321
## v1 0.004626392 0.008980654 NA
## v2 0.000000000 NA 0.000000000
## m 0.482678196 NA 0.995522679
## f 0.007992874 NA NA
## fv 0.012619266 0.020282290 0.004477321
## fvm 0.495297463 NA NA
Let’s check the increase in R squares when math achievement is added
to predict self-efficacy in addition to private
. First,
make the likelihood ratio test to compare the models with and without
math achievement.
out4m1 <- lmer(efficacy ~ 1 + diffach + aveach50 + private
+ diffach:aveach50
+ (1 + diffach|schoolid), data=dat4, REML=FALSE)
out4m2 <- lmer(efficacy ~ 1 + private + (1|schoolid), data=dat4, REML=FALSE)
anova(out4m2, out4m1)
## Data: dat4
## Models:
## out4m2: efficacy ~ 1 + private + (1 | schoolid)
## out4m1: efficacy ~ 1 + diffach + aveach50 + private + diffach:aveach50 + (1 + diffach | schoolid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out4m2 4 10096.3 10119 -5044.2 10088.3
## out4m1 9 8888.7 8939 -4435.3 8870.7 1217.6 5 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Use the r2mlm_comp
function to check the R squared
changes when math achievement is added.
r2mlm_comp(out4m2, out4m1)
## $`Model A R2s`
## total within between
## f1 0.000000000 0 NA
## f2 0.007708657 NA 0.04747128
## v 0.000000000 0 NA
## m 0.154677036 NA 0.95252872
## f 0.007708657 NA NA
## fv 0.007708657 0 NA
## fvm 0.162385693 NA NA
##
## $`Model B R2s`
## total within between
## f1 0.17176369 0.2066938 NA
## f2 0.02635712 NA 0.1559644
## v 0.25530900 0.3072290 NA
## m 0.14263735 NA 0.8440356
## f 0.19812081 NA NA
## fv 0.45342981 0.5139228 NA
## fvm 0.59606716 NA NA
##
## $`R2 differences, Model B - Model A`
## total within between
## f1 0.17176369 0.2066938 NA
## f2 0.01864846 NA 0.1084931
## v 0.25530900 0.3072290 NA
## m -0.01203969 NA -0.1084931
## f 0.19041216 NA NA
## fv 0.44572115 0.5139228 NA
## fvm 0.43368146 NA NA
Read the data set from Lecture 4.
dat1 <- read.table("lecture4ex1.csv", sep=",", header=TRUE)
To find standardized coefficients, all variables need to be
standardized before running multilevel models. First, the lower-level
variables are easily standardized by the scale
function.
dat1$zconsume <- scale(dat1$consume)
dat1$zlength <- scale(dat1$length)
However, the upper-level variables are not easy to standardized. Each row in a data frame represents the lower-level observation. Thus, each upper-level unit will have duplicated values. The duplicated values must be removed. Then, the means and standard deviations are calculated. Finally, the original scores (with duplicated values) are simply subtracted by the calculated means and divided by the obtained standard deviations.
volume <- dat1[!duplicated(dat1$groupid), "volume"]
mvolume <- mean(volume)
sdvolume <- sd(volume)
dat1$zvolume <- (dat1$volume - mvolume) / sd(volume)
Use the standard scores to find group means and group-mean-centered standard scores.
dat1$avezlength <- ave(dat1$zlength, dat1$groupid)
dat1$diffzlength <- dat1$zlength - dat1$avezlength
Run the multilevel models with standard scores.
out1z23 <- lmer(zconsume ~ 1 + diffzlength + avezlength + zvolume
+ diffzlength:zvolume + (1 + diffzlength|groupid),
data=dat1, REML=FALSE)
summary(out1z23)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## zconsume ~ 1 + diffzlength + avezlength + zvolume + diffzlength:zvolume +
## (1 + diffzlength | groupid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 963.2 1005.5 -472.6 945.2 802
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.14847 -0.65742 0.02343 0.65483 2.93474
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## groupid (Intercept) 0.319747 0.56546
## diffzlength 0.006452 0.08033 0.58
## Residual 0.126301 0.35539
## Number of obs: 811, groups: groupid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.003562 0.058226 -0.061
## diffzlength 0.626756 0.016704 37.521
## avezlength 0.797201 0.130465 6.110
## zvolume 0.232775 0.062356 3.733
## diffzlength:zvolume 0.038541 0.016110 2.392
##
## Correlation of Fixed Effects:
## (Intr) dffzln avzlng zvolum
## diffzlength 0.274
## avezlength 0.068 -0.001
## zvolume -0.037 -0.001 -0.354
## dffzlngth:z -0.003 -0.225 -0.023 0.277
To check whether the models with standardized and unstandardized variables are the same, log likelihood can be calculated. These models are compared:
out1z23
)out1bz23
)out1zb23
)out1b23
)As shown below, Model 1 and Model 2, which both have standardized outcomes, have the same log likelihood values. Model 3 and Model 4, which both have unstandardized outcomes, have the same log likelihood values.
dat1$lengthx <- dat1$length/10
dat1$avelengthx <- ave(dat1$lengthx, dat1$groupid)
dat1$difflengthx <- dat1$lengthx - dat1$avelengthx
dat1$avegold <- ave(dat1$goldcolor, dat1$groupid)
dat1$diffgold <- dat1$goldcolor - dat1$avegold
out1zb23 <- lmer(consume ~ 1 + diffzlength + avezlength + zvolume
+ diffzlength:zvolume + (1 + diffzlength|groupid),
data=dat1, REML=FALSE)
out1bz23 <- lmer(zconsume ~ 1 + difflengthx + avelengthx + volume
+ difflengthx:volume + (1 + difflengthx|groupid),
data=dat1, REML=FALSE)
out1b23 <- lmer(consume ~ 1 + difflengthx + avelengthx + volume
+ difflengthx:volume + (1 + difflengthx|groupid),
data=dat1, REML=FALSE)
logLik(out1z23)
## 'log Lik.' -472.61 (df=9)
logLik(out1bz23)
## 'log Lik.' -472.61 (df=9)
logLik(out1zb23)
## 'log Lik.' -2041.053 (df=9)
logLik(out1b23)
## 'log Lik.' -2041.053 (df=9)
Instead of standardized regression coefficients,
r2mlm_comp
can be used to find unique contributions of each
predictor. This concept is the same as part regression in regular
multiple regression coefficients.
out1b23_1 <- lmer(consume ~ 1 + avelengthx + volume
+ (1|groupid),
data=dat1, REML=FALSE)
out1b23_2 <- lmer(consume ~ 1 + difflengthx + volume
+ difflengthx:volume + (1 + difflengthx|groupid),
data=dat1, REML=FALSE)
out1b23_3 <- lmer(consume ~ 1 + difflengthx + avelengthx
+ difflengthx + (1 + difflengthx|groupid),
data=dat1, REML=FALSE)
r2mlm_comp(out1b23_1, out1b23, bargraph = FALSE)
## $`Model A R2s`
## total within between
## f1 0.0000000 0 NA
## f2 0.2288923 NA 0.4678728
## v 0.0000000 0 NA
## m 0.2603268 NA 0.5321272
## f 0.2288923 NA NA
## fv 0.2288923 0 NA
## fvm 0.4892191 NA NA
##
## $`Model B R2s`
## total within between
## f1 0.329200078 0.71831315 NA
## f2 0.228033130 NA 0.4209553
## v 0.005193286 0.01133173 NA
## m 0.313670822 NA 0.5790447
## f 0.557233208 NA NA
## fv 0.562426494 0.72964488 NA
## fvm 0.876097316 NA NA
##
## $`R2 differences, Model B - Model A`
## total within between
## f1 0.3292000781 0.71831315 NA
## f2 -0.0008592057 NA -0.04691757
## v 0.0051932859 0.01133173 NA
## m 0.0533440414 NA 0.04691757
## f 0.3283408725 NA NA
## fv 0.3335341584 0.72964488 NA
## fvm 0.3868781998 NA NA
r2mlm_comp(out1b23_2, out1b23, bargraph = FALSE)
## $`Model A R2s`
## total within between
## f1 0.318909555 0.71794802 NA
## f2 0.139199486 NA 0.250447
## v 0.005325128 0.01198824 NA
## m 0.416604632 NA 0.749553
## f 0.458109041 NA NA
## fv 0.463434169 0.72993626 NA
## fvm 0.880038801 NA NA
##
## $`Model B R2s`
## total within between
## f1 0.329200078 0.71831315 NA
## f2 0.228033130 NA 0.4209553
## v 0.005193286 0.01133173 NA
## m 0.313670822 NA 0.5790447
## f 0.557233208 NA NA
## fv 0.562426494 0.72964488 NA
## fvm 0.876097316 NA NA
##
## $`R2 differences, Model B - Model A`
## total within between
## f1 0.010290523 0.0003651276 NA
## f2 0.088833644 NA 0.1705082
## v -0.000131842 -0.0006565149 NA
## m -0.102933810 NA -0.1705082
## f 0.099124167 NA NA
## fv 0.098992325 -0.0002913873 NA
## fvm -0.003941485 NA NA
r2mlm_comp(out1b23_3, out1b23, bargraph = FALSE)
## $`Model A R2s`
## total within between
## f1 0.330962135 0.71150617 NA
## f2 0.163142469 NA 0.3050288
## v 0.006895454 0.01482393 NA
## m 0.371700444 NA 0.6949712
## f 0.494104605 NA NA
## fv 0.501000059 0.72633009 NA
## fvm 0.872700503 NA NA
##
## $`Model B R2s`
## total within between
## f1 0.329200078 0.71831315 NA
## f2 0.228033130 NA 0.4209553
## v 0.005193286 0.01133173 NA
## m 0.313670822 NA 0.5790447
## f 0.557233208 NA NA
## fv 0.562426494 0.72964488 NA
## fvm 0.876097316 NA NA
##
## $`R2 differences, Model B - Model A`
## total within between
## f1 -0.001762057 0.006806984 NA
## f2 0.064890661 NA 0.1159265
## v -0.001702168 -0.003492199 NA
## m -0.058029622 NA -0.1159265
## f 0.063128603 NA NA
## fv 0.061426435 0.003314785 NA
## fvm 0.003396813 NA NA