Read the new data set representing a longitudinal study which employees data are collected in four time points.
dat1 <- read.table("lecture11ex1.csv", sep=",", header=TRUE)
See descriptive statistics of all variables within the data set.
library(psych)
describe(dat1)
## vars n mean sd median trimmed mad min max range skew
## rowid 1 1600 800.50 462.02 800.5 800.50 593.04 1 1600 1599 0.00
## pid 2 1600 200.50 115.51 200.5 200.50 148.26 1 400 399 0.00
## jsat 3 1600 48.14 10.28 47.0 47.85 10.38 12 96 84 0.27
## time 4 1600 1.50 1.12 1.5 1.50 1.48 0 3 3 0.00
## married 5 1600 0.37 0.48 0.0 0.34 0.00 0 1 1 0.53
## age0 6 1600 28.38 6.14 27.0 27.85 5.93 18 50 32 0.82
## tenure0 7 1600 2.05 1.40 1.8 1.92 1.37 0 8 8 0.93
## fftj 8 1600 0.46 0.50 0.0 0.45 0.00 0 1 1 0.15
## officejob 9 1600 0.33 0.47 0.0 0.29 0.00 0 1 1 0.72
## male 10 1600 0.68 0.47 1.0 0.72 0.00 0 1 1 -0.77
## kurtosis se
## rowid -1.20 11.55
## pid -1.20 2.89
## jsat 0.47 0.26
## time -1.36 0.03
## married -1.72 0.01
## age0 0.62 0.15
## tenure0 0.85 0.04
## fftj -1.98 0.01
## officejob -1.48 0.01
## male -1.41 0.01
Let’s run the null model of job satisfaction to check the intraclass correlation.
library(lme4)
out1a <- lmer(jsat ~ 1 + (1|pid), data=dat1, REML=FALSE)
summary(out1a)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + (1 | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10919.4 10935.5 -5456.7 10913.4 1597
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9497 -0.5921 0.0149 0.5842 3.3077
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 76.51 8.747
## Residual 29.15 5.399
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 48.1356 0.4577 105.2
Use time
to predict job satisfaction without random
slopes.
out1b <- lmer(jsat ~ 1 + time + (1|pid), data=dat1, REML=FALSE)
summary(out1b)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + time + (1 | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10881.7 10903.2 -5436.9 10873.7 1596
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7741 -0.5993 0.0087 0.5622 3.5582
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 76.75 8.761
## Residual 28.20 5.310
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 49.2663 0.4912 100.308
## time -0.7538 0.1187 -6.348
##
## Correlation of Fixed Effects:
## (Intr)
## time -0.363
Next, add the random slope of time
in the model. That
is, every employee has different changes in job satisfaction. The
likelihood ratio test was significant so the random slope is
retained.
out1c <- lmer(jsat ~ 1 + time + (1 + time|pid), data=dat1, REML=FALSE)
anova(out1b, out1c)
## Data: dat1
## Models:
## out1b: jsat ~ 1 + time + (1 | pid)
## out1c: jsat ~ 1 + time + (1 + time | pid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out1b 4 10882 10903 -5436.9 10874
## out1c 6 10720 10753 -5354.2 10708 165.23 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the output of the random slope model.
summary(out1c)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + time + (1 + time | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10720.5 10752.8 -5354.2 10708.5 1594
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.52917 -0.52309 0.00172 0.53373 2.43188
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 57.34 7.572
## time 5.04 2.245 0.20
## Residual 19.80 4.450
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 49.2663 0.4219 116.772
## time -0.7537 0.1500 -5.025
##
## Correlation of Fixed Effects:
## (Intr)
## time -0.101
Let’s investigate individual changes in job satisfaction across three
years. The ggplot2
package is used to group each person
data. Then, the regression lines are plotted for each person. Note that
only every 10th person will be used in the plot so the plot does not
have too many regression lines.
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
dat1_2 <- dat1[dat1$pid%%10 == 0,]
ggplot(dat1_2, aes(x=time, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
Next, the X axis is changed to age
which is the sum of
the age at the beginning, age0
, and the time point,
time
.
dat1_2 <- dat1[dat1$pid%%3 == 0,]
dat1_2$age <- dat1_2$age0 + dat1_2$time
ggplot(dat1_2, aes(x=age, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
Finally, the X axis is changed to tenure
which is the
sum of the tenure at the beginning, tenure0
, and the time
point, time
.
dat1_2 <- dat1[dat1$pid%%5 == 0,]
dat1_2$tenure <- dat1_2$tenure0 + dat1_2$time
ggplot(dat1_2, aes(x=tenure, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
Let’s recenter the time
variable. Previously,
time == 0
represents the beginning of the study. The
time
variable is subtracted by 3 called timel
so timel == 0
represents the last time point (at Year 3).
Check the result of this centering in the random slope model.
dat1$timel <- dat1$time - 3
out1c1 <- lmer(jsat ~ 1 + timel + (1 + timel|pid), data=dat1, REML=FALSE)
summary(out1c1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + timel + (1 + timel | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10720.5 10752.8 -5354.2 10708.5 1594
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.52916 -0.52310 0.00171 0.53373 2.43187
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 123.04 11.092
## timel 5.04 2.245 0.74
## Residual 19.80 4.450
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.0050 0.5850 80.347
## timel -0.7538 0.1500 -5.025
##
## Correlation of Fixed Effects:
## (Intr)
## timel 0.697
Let’s add time-varying covariates in the model. In this example, whether an employee is married is only time-varying covariate. First, this variable is added without centering.
out1d <- lmer(jsat ~ 1 + time + married + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1d)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + time + married + (1 + time | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10721.8 10759.5 -5353.9 10707.8 1593
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.54359 -0.52100 0.00137 0.52261 2.42010
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 56.867 7.541
## time 5.046 2.246 0.19
## Residual 19.841 4.454
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 49.1621 0.4394 111.894
## time -0.7932 0.1577 -5.031
## married 0.4386 0.5351 0.820
##
## Correlation of Fixed Effects:
## (Intr) time
## time -0.007
## married -0.289 -0.305
As discussed in Lecture 7, if the level-1 variables are added in the
model without centering, the level-1 and level-2 effects are equally
constrained, which should not be assumed without testing. Thus,
married
is separated into employee means,
avemarried
and within-employee deviations,
diffmarried
. Then, both variables are put in the model in
addition to time
.
dat1$avemarried <- ave(dat1$married, dat1$pid)
dat1$diffmarried <- dat1$married - dat1$avemarried
out1e <- lmer(jsat ~ 1 + time + diffmarried + avemarried + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1e)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + time + diffmarried + avemarried + (1 + time | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10712 10755 -5348 10696 1592
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.56584 -0.51463 0.00135 0.52804 2.39684
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 55.988 7.482
## time 5.041 2.245 0.15
## Residual 19.770 4.446
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.9283 0.5593 85.698
## time -0.7004 0.1599 -4.380
## diffmarried -0.5932 0.6164 -0.962
## avemarried 3.3768 0.9840 3.432
##
## Correlation of Fixed Effects:
## (Intr) time dffmrr
## time -0.134
## diffmarried 0.116 -0.347
## avemarried -0.648 -0.017 0.050
Because diffmarried
is not significant, it is dropped
from the model. Only avemarried
is retained, which is time
invariant. Other time-invariant covariates are added in the model.
out1h <- lmer(jsat ~ 1 + time + avemarried + I(age0-25) + I(tenure0-5) + fftj + officejob + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1h)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + time + avemarried + I(age0 - 25) + I(tenure0 - 5) +
## fftj + officejob + (1 + time | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10629.2 10688.4 -5303.6 10607.2 1589
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.52034 -0.52056 0.01208 0.52926 2.40706
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 46.27 6.802
## time 5.04 2.245 -0.07
## Residual 19.80 4.450
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 46.38997 1.27225 36.463
## time -0.75375 0.15000 -5.025
## avemarried 2.15336 0.91933 2.342
## I(age0 - 25) 0.69220 0.06864 10.084
## I(tenure0 - 5) -0.55378 0.30031 -1.844
## fftj -2.74403 0.84507 -3.247
## officejob -1.89139 0.83488 -2.265
##
## Correlation of Fixed Effects:
## (Intr) time avmrrd I(0-25 I(0-5) fftj
## time -0.091
## avemarried -0.378 0.000
## I(age0-25) -0.233 0.000 -0.144
## I(tenur0-5) 0.824 0.000 -0.158 -0.228
## fftj -0.462 0.000 0.021 -0.267 -0.228
## officejob -0.223 0.000 0.084 0.062 0.120 0.182
To increase the interpretability of the regression coefficients, all time-invariate covariates are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals.
dat1a <- dat1[!duplicated(dat1$pid),]
apply(dat1a, 2, mean)
## rowid pid jsat time married age0
## 799.000000 200.500000 49.315000 0.000000 0.232500 28.375000
## tenure0 fftj officejob male timel avemarried
## 2.054317 0.462500 0.330000 0.680000 -3.000000 0.372500
## diffmarried
## -0.140000
Next, all time-invariate covariates are centered at the calculated
grand means. Also, all cross-level interactions are added to predict the
random slopes of time
.
out1n <- lmer(jsat ~ 1 + time + time*I(avemarried-0.3725) + time*I(age0-28.375) + time*I(tenure0-2.054) + time*I(fftj-0.4625) + time*I(officejob-0.33) + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1n)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: jsat ~ 1 + time + time * I(avemarried - 0.3725) + time * I(age0 -
## 28.375) + time * I(tenure0 - 2.054) + time * I(fftj - 0.4625) +
## time * I(officejob - 0.33) + (1 + time | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 10427.5 10513.5 -5197.7 10395.5 1584
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.57301 -0.54946 0.00844 0.54756 2.62571
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 44.043 6.636
## time 1.341 1.158 0.24
## Residual 19.801 4.450
## Number of obs: 1600, groups: pid, 400
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 49.266682 0.380471 129.489
## time -0.754080 0.115116 -6.551
## I(avemarried - 0.3725) 2.157200 0.945739 2.281
## I(age0 - 28.375) 0.584070 0.070616 8.271
## I(tenure0 - 2.054) -1.361880 0.308940 -4.408
## I(fftj - 0.4625) -2.558328 0.869350 -2.943
## I(officejob - 0.33) -1.752716 0.858865 -2.041
## time:I(avemarried - 0.3725) -0.004955 0.286145 -0.017
## time:I(age0 - 28.375) 0.139417 0.021366 6.525
## time:I(tenure0 - 2.054) 1.041940 0.093474 11.147
## time:I(fftj - 0.4625) -0.239439 0.263033 -0.910
## time:I(officejob - 0.33) -0.178807 0.259861 -0.688
##
## Correlation of Fixed Effects:
## (Intr) time I(-0.37 I(0-28 I(0-2. I(-0.4 I(-0.33 t:I(-0.37
## time -0.235
## I(v-0.3725) 0.000 0.000
## I(0-28.375) 0.000 0.000 -0.144
## I(t0-2.054) 0.000 0.000 -0.158 -0.228
## I(f-0.4625) 0.000 0.000 0.021 -0.267 -0.228
## I(ffc-0.33) 0.000 0.000 0.084 0.062 0.120 0.182
## t:I(-0.3725 0.000 0.000 -0.235 0.034 0.037 -0.005 -0.020
## t:I(0-28.37 0.000 0.000 0.034 -0.235 0.053 0.063 -0.015 -0.144
## t:I(0-2.054 0.000 0.000 0.037 0.053 -0.235 0.054 -0.028 -0.158
## t:I(-0.4625 0.000 0.000 -0.005 0.063 0.054 -0.235 -0.043 0.021
## tm:I(-0.33) 0.000 0.000 -0.020 -0.015 -0.028 -0.043 -0.235 0.084
## t:I(0-28 t:I(0-2. t:I(-0.4
## time
## I(v-0.3725)
## I(0-28.375)
## I(t0-2.054)
## I(f-0.4625)
## I(ffc-0.33)
## t:I(-0.3725
## t:I(0-28.37
## t:I(0-2.054 -0.228
## t:I(-0.4625 -0.267 -0.228
## tm:I(-0.33) 0.062 0.120 0.182
age0
and tenure0
significantly predicted
the random slopes of time
. Thus, both cross-level
interactions are probed by the interactions
package. Before
using the package, all variables must be centered by creating new
variables.
dat1$age0c <- dat1$age0-28.375
dat1$avemarriedc <- dat1$avemarried - 0.3725
dat1$tenure0c <- dat1$tenure0 - 2.054
dat1$fftjc <- dat1$fftj - 0.4625
dat1$officejobc <- dat1$officejob - 0.33
out1nn <- lmer(jsat ~ 1 + time + time*avemarriedc + time*age0c + time*tenure0c + time*fftjc + time*officejobc + (1 + time|pid), data=dat1, REML=FALSE)
Next, find values of time
, age0
, and
tenure0
for probing interactions. Don’t forget to delete
values from the grand means because all predictors are centered by
creating new variables.
age0cval <- c(25, 30, 35) - 28.375
tenure0cval <- c(0, 2, 4) - 2.054
timeval <- c(0, 1, 2, 3)
First, the simple slopes of time
at each level of
age0
is calculated.
library(interactions)
ss11 <- sim_slopes(model=out1nn, pred=time, modx=age0c, modx.values=age0cval)
ss11
## JOHNSON-NEYMAN INTERVAL
##
## When age0c is OUTSIDE the interval [3.48, 8.41], the slope of time is p <
## .05.
##
## Note: The range of observed values of age0c is [-10.38, 21.62]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of time when age0c = -3.375:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.22 0.14 -8.95 0.00
##
## Slope of time when age0c = 1.625:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.53 0.12 -4.35 0.00
##
## Slope of time when age0c = 6.625:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.17 0.18 0.92 0.36
Plot the interaction where X-axis is time
and each line
represents each value of age0
.
interact_plot(model=out1nn, pred=time, modx=age0c, modx.values=age0cval)
Second, the simple slopes of age0
at each level of
time
is calculated.
ss12 <- sim_slopes(model=out1nn, pred=age0c, modx=time, modx.values=timeval)
ss12
## JOHNSON-NEYMAN INTERVAL
##
## When time is OUTSIDE the interval [-6.61, -2.75], the slope of age0c is p <
## .05.
##
## Note: The range of observed values of time is [0.00, 3.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of age0c when time = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.58 0.07 8.21 0.00
##
## Slope of age0c when time = 1.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.72 0.07 10.43 0.00
##
## Slope of age0c when time = 2.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.86 0.07 11.66 0.00
##
## Slope of age0c when time = 3.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.00 0.08 11.91 0.00
Plot the interaction where X-axis is age0
and each line
represents each value of time
.
interact_plot(model=out1nn, pred=age0c, modx=time, modx.values=timeval)
Third, the simple slopes of time
at each level of
tenure0
is calculated.
ss13 <- sim_slopes(model=out1nn, pred=time, modx=tenure0c, modx.values=tenure0cval)
ss13
## JOHNSON-NEYMAN INTERVAL
##
## When tenure0c is OUTSIDE the interval [0.49, 1.00], the slope of time is p
## < .05.
##
## Note: The range of observed values of tenure0c is [-2.05, 5.95]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of time when tenure0c = -2.054:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -2.89 0.23 -12.83 0.00
##
## Slope of time when tenure0c = -0.054:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.81 0.12 -6.98 0.00
##
## Slope of time when tenure0c = 1.946:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.27 0.22 5.87 0.00
Plot the interaction where X-axis is time
and each line
represents each value of tenure0
.
interact_plot(model=out1nn, pred=time, modx=tenure0c, modx.values=tenure0cval)
Finally, the simple slopes of tenure0
at each level of
time
is calculated.
ss14 <- sim_slopes(model=out1nn, pred=tenure0c, modx=time, modx.values=timeval)
ss14
## JOHNSON-NEYMAN INTERVAL
##
## When time is OUTSIDE the interval [0.74, 1.91], the slope of tenure0c is p
## < .05.
##
## Note: The range of observed values of time is [0.00, 3.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of tenure0c when time = 0.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.36 0.31 -4.38 0.00
##
## Slope of tenure0c when time = 1.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.32 0.30 -1.05 0.29
##
## Slope of tenure0c when time = 2.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.72 0.32 2.23 0.03
##
## Slope of tenure0c when time = 3.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.76 0.37 4.79 0.00
Plot the interaction where X-axis is tenure0
and each
line represents each value of time
.
interact_plot(model=out1nn, pred=tenure0c, modx=time, modx.values=timeval)
Read the new data set representing a longitudinal study which undergraduate students data are collected in four school years.
dat2 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
See descriptive statistics of all variables within the data set.
describe(dat2)
## vars n mean sd median trimmed mad min max range skew
## rowid 1 3400 1700.50 981.64 1700.5 1700.50 1260.21 1 3400 3399 0.00
## pid 2 3400 425.50 245.41 425.5 425.50 315.05 1 850 849 0.00
## time 3 3400 2.50 1.12 2.5 2.50 1.48 1 4 3 0.00
## mem 4 3400 59.02 18.97 58.0 59.13 19.27 -1 113 114 -0.05
## stress 5 3400 50.27 13.63 50.0 50.26 13.34 3 99 96 0.02
## cohort 6 3400 0.50 0.50 1.0 0.50 0.00 0 1 1 0.00
## female 7 3400 0.50 0.50 0.0 0.49 0.00 0 1 1 0.02
## act 8 3400 50.22 13.12 50.5 50.29 14.08 9 87 78 -0.06
## ext 9 3400 49.53 9.73 49.0 49.47 8.90 23 82 59 0.10
## kurtosis se
## rowid -1.20 16.83
## pid -1.20 4.21
## time -1.36 0.02
## mem -0.32 0.33
## stress 0.10 0.23
## cohort -2.00 0.01
## female -2.00 0.01
## act -0.46 0.23
## ext 0.00 0.17
Let’s run the null model of senses of belonging to check the intraclass correlation.
out2a <- lmer(mem ~ 1 + (1|pid), data=dat2, REML=FALSE)
summary(out2a)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + (1 | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 27524.0 27542.4 -13759.0 27518.0 3397
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3921 -0.5104 -0.0082 0.5316 3.8758
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 253.5 15.92
## Residual 106.4 10.31
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 59.015 0.574 102.8
Use time
to predict senses of belonging without random
slopes. Note that time
in this example starts with 1 so
this variable is subtracted by 1 to make the intercept represent the
sense of belonging at freshman year.
dat2$timec <- dat2$time - 1
out2b <- lmer(mem ~ 1 + timec + (1|pid), data=dat2, REML=FALSE)
summary(out2b)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + (1 | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 27366.3 27390.8 -13679.1 27358.3 3396
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7910 -0.5418 -0.0133 0.5594 3.7069
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 255.11 15.972
## Residual 99.93 9.996
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 61.9679 0.6184 100.21
## timec -1.9686 0.1533 -12.84
##
## Correlation of Fixed Effects:
## (Intr)
## timec -0.372
As another method, calendar year is used as the predictor. Note that
this example has two cohorts: B.E. 2550 (2007)
(cohort == 1
) and B.E. 2551 (2008)
(cohort == 0
). If B.E. 2552 (2009) is centered at 0, then
four years in the B.E. 2550 cohort should be -2, -1, 0, 1 and those in
the B.E. 2551 cohort should be -1, 0, 1, 2.
dat2$calyear <- dat2$time - 3
dat2[dat2$cohort == 0, "calyear"] <- dat2[dat2$cohort == 0, "calyear"] + 1
out2b1 <- lmer(mem ~ 1 + calyear + (1|pid), data=dat2, REML=FALSE)
summary(out2b1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + calyear + (1 | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 27331.7 27356.3 -13661.9 27323.7 3396
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8101 -0.5480 -0.0108 0.5635 3.6866
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 243.51 15.605
## Residual 99.98 9.999
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 59.013 0.562 105.00
## calyear -2.149 0.152 -14.14
##
## Correlation of Fixed Effects:
## (Intr)
## calyear 0.000
Let’s keep using time
. Next, the curvilinear changes are
investigated but the changes were not significant.
out2c <- lmer(mem ~ 1 + timec + I(timec^2) + (1|pid), data=dat2, REML=FALSE)
summary(out2c)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + I(timec^2) + (1 | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 27367.4 27398.0 -13678.7 27357.4 3395
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8079 -0.5372 -0.0080 0.5689 3.6912
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 255.12 15.973
## Residual 99.89 9.994
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 62.1311 0.6417 96.822
## timec -2.4583 0.5366 -4.581
## I(timec^2) 0.1632 0.1714 0.952
##
## Correlation of Fixed Effects:
## (Intr) timec
## timec -0.358
## I(timec^2) 0.267 -0.958
Plot the nonsignificant curvilinear changes and see that the curvature is not strong.
mytime <- 0:3
mypred <- 62.1311 - 2.4583*mytime + 0.1632*mytime^2
plot(mytime, mypred, type="l")
Next, add the random slope of timec
in the model. That
is, every employee has different changes in job satisfaction. The
likelihood ratio test was significant so the random slope is
retained.
out2d <- lmer(mem ~ 1 + timec + (1 + timec|pid), data=dat2, REML=FALSE)
anova(out2b, out2d)
## Data: dat2
## Models:
## out2b: mem ~ 1 + timec + (1 | pid)
## out2d: mem ~ 1 + timec + (1 + timec | pid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out2b 4 27366 27391 -13679 27358
## out2d 6 26440 26477 -13214 26428 930.34 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the output of the random slope model.
summary(out2d)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + (1 + timec | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 26439.9 26476.7 -13214.0 26427.9 3394
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1630 -0.4818 -0.0089 0.5045 2.7987
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 223.11 14.937
## timec 34.41 5.866 -0.12
## Residual 42.57 6.525
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 61.9679 0.5455 113.61
## timec -1.9686 0.2247 -8.76
##
## Correlation of Fixed Effects:
## (Intr)
## timec -0.222
Let’s investigate individual changes in senses of belonging across four years for both cohorts.
dat2_2 <- dat2[dat2$pid%%5 == 0,]
dat2_2$cohort <- factor(dat2_2$cohort, labels=c("2551", "2550"))
ggplot(dat2_2, aes(x=time, y=mem, group=pid, colour=cohort)) + geom_smooth(method=lm, se=FALSE, linewidth=0.5)
## `geom_smooth()` using formula = 'y ~ x'
Next, the X axis is changed to calcyear
which is the
calendar year of each data point.
dat2_2 <- dat2[dat2$pid%%5 == 0,]
dat2_2$cohort <- factor(dat2_2$cohort, labels=c("2551", "2550"))
dat2_2$calyear <- dat2_2$calyear + 2552
ggplot(dat2_2, aes(x=calyear, y=mem, group=pid, colour=cohort)) + geom_smooth(method=lm, se=FALSE, linewidth=0.5)
## `geom_smooth()` using formula = 'y ~ x'
Let’s add time-varying covariates in the model. In this example,
perceived stress measured in each year is the only time-varying
covariate. Let’s make the grand means, avestress
, and
group-mean-centered variables, diffstress
.
dat2$avestress <- ave(dat2$stress, dat2$pid)
dat2$diffstress <- dat2$stress - dat2$avestress
out2e <- lmer(mem ~ 1 + timec + diffstress + avestress + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2e)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + avestress + (1 + timec | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 25552.9 25602.0 -12768.5 25536.9 3392
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.10411 -0.48077 0.00681 0.49606 2.61723
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 234.68 15.319
## timec 35.29 5.941 -0.15
## Residual 25.92 5.091
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 56.78964 2.57724 22.035
## timec -1.88457 0.21823 -8.636
## diffstress 0.40878 0.01212 33.730
## avestress 0.10050 0.05011 2.006
##
## Correlation of Fixed Effects:
## (Intr) timec dffstr
## timec -0.045
## diffstress -0.004 0.011
## avestress -0.977 0.000 0.003
Both stress variables were retained because they were significant. Next, time-invariant covariates are added. To increase the interpretability of the regression coefficients, all time-invariate covariates are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals.
dat2a <- dat2[!duplicated(dat2$pid),]
apply(dat2a, 2, mean)
## rowid pid time mem stress cohort
## 1699.0000000 425.5000000 1.0000000 62.0823529 50.4729412 0.5011765
## female act ext timec calyear avestress
## 0.4952941 50.2188235 49.5341176 0.0000000 -1.5011765 50.2685294
## diffstress
## 0.2044118
Next, all time-invariate covariates are centered at the calculated grand means and added to the model.
out2g <- lmer(mem ~ 1 + timec + diffstress + I(avestress - 50.473) + I(cohort - 0.501) + I(female - 0.495) + I(act - 50.219) + I(ext - 49.534) + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2g)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + I(avestress - 50.473) + I(cohort -
## 0.501) + I(female - 0.495) + I(act - 50.219) + I(ext - 49.534) +
## (1 + timec | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 25225.2 25298.8 -12600.6 25201.2 3388
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.13559 -0.48411 0.00767 0.48967 2.65651
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 226.73 15.057
## timec 35.30 5.941 -0.56
## Residual 25.91 5.091
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 61.86305 0.53681 115.243
## timec -1.88397 0.21824 -8.632
## diffstress 0.41171 0.01218 33.815
## I(avestress - 50.473) 0.11013 0.04117 2.675
## I(cohort - 0.501) 0.38350 1.08785 0.353
## I(female - 0.495) 1.73497 0.89133 1.946
## I(act - 50.219) 0.81816 0.04605 17.768
## I(ext - 49.534) -0.11612 0.05279 -2.200
##
## Correlation of Fixed Effects:
## (Intr) timec dffstr I(-50.4 I(-0.5 I(-0.4 I(-50.2
## timec -0.580
## diffstress -0.007 0.011
## I(v-50.473) 0.016 0.000 0.000
## I(ch-0.501) -0.001 0.000 0.001 -0.046
## I(fm-0.495) -0.001 0.000 0.000 0.001 0.128
## I(c-50.219) 0.001 0.000 0.000 0.035 -0.594 -0.176
## I(x-49.534) 0.000 0.000 0.000 -0.017 0.323 0.155 -0.520
Also, all cross-level interactions are added to predict the random
slopes of timec
.
out2h <- lmer(mem ~ 1 + timec + diffstress + timec*I(avestress - 50.473) + timec*I(cohort - 0.501) + timec*I(female - 0.495) + timec*I(act - 50.219) + timec*I(ext - 49.534) + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2h)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + timec * I(avestress - 50.473) +
## timec * I(cohort - 0.501) + timec * I(female - 0.495) + timec *
## I(act - 50.219) + timec * I(ext - 49.534) + (1 + timec | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 24869.8 24974.1 -12417.9 24835.8 3383
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0904 -0.4824 0.0017 0.4963 2.7450
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 197.91 14.068
## timec 21.15 4.598 -0.46
## Residual 25.92 5.091
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 61.861269 0.504268 122.675
## timec -1.882677 0.176048 -10.694
## diffstress 0.407743 0.011986 34.019
## I(avestress - 50.473) 0.100033 0.047473 2.107
## I(cohort - 0.501) 3.454553 1.254292 2.754
## I(female - 0.495) 2.666791 1.027672 2.595
## I(act - 50.219) 0.299656 0.053091 5.644
## I(ext - 49.534) 0.184658 0.060866 3.034
## timec:I(avestress - 50.473) 0.007077 0.016573 0.427
## timec:I(cohort - 0.501) -2.153463 0.437908 -4.918
## timec:I(female - 0.495) -0.653397 0.358756 -1.821
## timec:I(act - 50.219) 0.363560 0.018534 19.616
## timec:I(ext - 49.534) -0.210898 0.021248 -9.926
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
cohort
, act
, and ext
significantly predicted the random slopes of timec
. Thus,
these three cross-level interactions are probed by the
interactions
package. Before using the package, all
variables must be centered by creating new variables.
dat2$avestressc <- dat2$avestress - 50.473
dat2$cohortc <- dat2$cohort - 0.501
dat2$femalec <- dat2$female - 0.495
dat2$actc <- dat2$act - 50.219
dat2$extc <- dat2$ext - 49.534
out2hn <- lmer(mem ~ 1 + timec + diffstress + timec*avestressc + timec*cohortc + timec*femalec + timec*actc + timec*extc + (1 + timec|pid), data=dat2, REML=FALSE)
Next, find values of time
, cohort
,
act
, and ext
for probing interactions. Don’t
forget to delete values from the grand means because all predictors are
centered by creating new variables.
cohortcval <- c(0, 1) - 0.501
actcval <- c(40, 50, 60) - 50.219
extcval <- c(40, 50, 60) - 49.534
timecval <- c(0, 1, 2, 3)
First, the simple slopes of timec
at each level of
cohort
is calculated.
ss21 <- sim_slopes(model=out2hn, pred=timec, modx=cohortc, modx.values=cohortcval)
ss21
## JOHNSON-NEYMAN INTERVAL
##
## When cohortc is OUTSIDE the interval [-1.49, -0.59], the slope of time is p
## < .05.
##
## Note: The range of observed values of cohortc is [-0.50, 0.50]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of time when cohortc = -0.501:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.81 0.28 -2.85 0.00
##
## Slope of time when cohortc = 0.499:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -2.96 0.28 -10.51 0.00
Plot the interaction where X-axis is timec
and each line
represents each value of cohort
.
interact_plot(model=out2hn, pred=timec, modx=cohortc, modx.values=cohortcval)
Second, the simple slopes of cohort
at each level of
timec
is calculated.
ss22 <- sim_slopes(model=out2hn, pred=cohortc, modx=timec, modx.values=timecval)
ss22
## JOHNSON-NEYMAN INTERVAL
##
## When time is OUTSIDE the interval [1.55, 3.72], the slope of cohortc is p <
## .05.
##
## Note: The range of observed values of time is [1.00, 4.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of cohortc when time = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 5.61 1.53 3.68 0.00
##
## Slope of cohortc when time = 1.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 3.45 1.26 2.74 0.01
##
## Slope of cohortc when time = 2.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.30 1.11 1.17 0.24
##
## Slope of cohortc when time = 3.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.85 1.12 -0.76 0.45
Plot the interaction where X-axis is cohort
and each
line represents each value of timec
.
interact_plot(model=out2hn, pred=cohortc, modx=timec, modx.values=timecval)
Third, the simple slopes of timec
at each level of
act
is calculated.
ss23 <- sim_slopes(model=out2hn, pred=timec, modx=actc, modx.values=actcval)
ss23
## JOHNSON-NEYMAN INTERVAL
##
## When actc is OUTSIDE the interval [4.15, 6.32], the slope of time is p <
## .05.
##
## Note: The range of observed values of actc is [-41.22, 36.78]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of time when actc = -10.219:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -5.60 0.26 -21.58 0.00
##
## Slope of time when actc = -0.219:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.96 0.18 -11.12 0.00
##
## Slope of time when actc = 9.781:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.67 0.25 6.59 0.00
Plot the interaction where X-axis is timec
and each line
represents each value of act
.
interact_plot(model=out2hn, pred=timec, modx=actc, modx.values=actcval)
Fourth, the simple slopes of act
at each level of
timec
is calculated.
ss24 <- sim_slopes(model=out2hn, pred=actc, modx=timec, modx.values=timecval)
ss24
## JOHNSON-NEYMAN INTERVAL
##
## When time is OUTSIDE the interval [-0.18, 0.49], the slope of actc is p <
## .05.
##
## Note: The range of observed values of time is [1.00, 4.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of actc when time = 0.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.06 0.06 -0.99 0.32
##
## Slope of actc when time = 1.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.30 0.05 5.62 0.00
##
## Slope of actc when time = 2.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.66 0.05 14.15 0.00
##
## Slope of actc when time = 3.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.03 0.05 21.65 0.00
Plot the interaction where X-axis is act
and each line
represents each value of timec
.
interact_plot(model=out2hn, pred=actc, modx=timec, modx.values=timecval)
Fifth, the simple slopes of timec
at each level of
ext
is calculated.
ss25 <- sim_slopes(model=out2hn, pred=timec, modx=extc, modx.values=extcval)
ss25
## JOHNSON-NEYMAN INTERVAL
##
## When extc is OUTSIDE the interval [-11.78, -6.82], the slope of time is p <
## .05.
##
## Note: The range of observed values of extc is [-26.53, 32.47]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of time when extc = -9.534:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.13 0.27 0.47 0.64
##
## Slope of time when extc = 0.466:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.98 0.18 -11.21 0.00
##
## Slope of time when extc = 10.466:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -4.09 0.28 -14.38 0.00
Plot the interaction where X-axis is timec
and each line
represents each value of ext
.
interact_plot(model=out2hn, pred=timec, modx=extc, modx.values=extcval)
Finally, the simple slopes of ext
at each level of
timec
is calculated.
ss26 <- sim_slopes(model=out2hn, pred=extc, modx=timec, modx.values=timecval)
ss26
## JOHNSON-NEYMAN INTERVAL
##
## When time is OUTSIDE the interval [1.34, 2.37], the slope of extc is p <
## .05.
##
## Note: The range of observed values of time is [1.00, 4.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of extc when time = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.40 0.07 5.34 0.00
##
## Slope of extc when time = 1.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.18 0.06 3.02 0.00
##
## Slope of extc when time = 2.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.03 0.05 -0.49 0.63
##
## Slope of extc when time = 3.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.24 0.05 -4.36 0.00
Plot the interaction where X-axis is ext
and each line
represents each value of timec
.
interact_plot(model=out2hn, pred=extc, modx=timec, modx.values=timecval)