Intro to Multilevel Model: Lecture 11 Example Code

Growth Curve Model

Example 1: Changes in Job Satisfaction

Read the new data set representing a longitudinal study which employees data are collected in four time points.

dat1 <- read.table("lecture11ex1.csv", sep=",", header=TRUE)

See descriptive statistics of all variables within the data set.

library(psych)
describe(dat1)

##           vars    n   mean     sd median trimmed    mad min  max range  skew
## rowid        1 1600 800.50 462.02  800.5  800.50 593.04   1 1600  1599  0.00
## pid          2 1600 200.50 115.51  200.5  200.50 148.26   1  400   399  0.00
## jsat         3 1600  48.14  10.28   47.0   47.85  10.38  12   96    84  0.27
## time         4 1600   1.50   1.12    1.5    1.50   1.48   0    3     3  0.00
## married      5 1600   0.37   0.48    0.0    0.34   0.00   0    1     1  0.53
## age0         6 1600  28.38   6.14   27.0   27.85   5.93  18   50    32  0.82
## tenure0      7 1600   2.05   1.40    1.8    1.92   1.37   0    8     8  0.93
## fftj         8 1600   0.46   0.50    0.0    0.45   0.00   0    1     1  0.15
## officejob    9 1600   0.33   0.47    0.0    0.29   0.00   0    1     1  0.72
## male        10 1600   0.68   0.47    1.0    0.72   0.00   0    1     1 -0.77
##           kurtosis    se
## rowid        -1.20 11.55
## pid          -1.20  2.89
## jsat          0.47  0.26
## time         -1.36  0.03
## married      -1.72  0.01
## age0          0.62  0.15
## tenure0       0.85  0.04
## fftj         -1.98  0.01
## officejob    -1.48  0.01
## male         -1.41  0.01

Let’s run the null model of job satisfaction to check the intraclass correlation.

library(lme4)
out1a <- lmer(jsat ~ 1 + (1|pid), data=dat1, REML=FALSE)
summary(out1a)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + (1 | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10919.4  10935.5  -5456.7  10913.4     1597 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9497 -0.5921  0.0149  0.5842  3.3077 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 76.51    8.747   
##  Residual             29.15    5.399   
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  48.1356     0.4577   105.2

Use time to predict job satisfaction without random slopes.

out1b <- lmer(jsat ~ 1 + time + (1|pid), data=dat1, REML=FALSE)
summary(out1b)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + time + (1 | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10881.7  10903.2  -5436.9  10873.7     1596 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7741 -0.5993  0.0087  0.5622  3.5582 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 76.75    8.761   
##  Residual             28.20    5.310   
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  49.2663     0.4912 100.308
## time         -0.7538     0.1187  -6.348
## 
## Correlation of Fixed Effects:
##      (Intr)
## time -0.363

Next, add the random slope of time in the model. That is, every employee has different changes in job satisfaction. The likelihood ratio test was significant so the random slope is retained.

out1c <- lmer(jsat ~ 1 + time + (1 + time|pid), data=dat1, REML=FALSE)
anova(out1b, out1c)

## Data: dat1
## Models:
## out1b: jsat ~ 1 + time + (1 | pid)
## out1c: jsat ~ 1 + time + (1 + time | pid)
##       npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## out1b    4 10882 10903 -5436.9    10874                         
## out1c    6 10720 10753 -5354.2    10708 165.23  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the output of the random slope model.

summary(out1c)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + time + (1 + time | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10720.5  10752.8  -5354.2  10708.5     1594 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.52917 -0.52309  0.00172  0.53373  2.43188 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  pid      (Intercept) 57.34    7.572        
##           time         5.04    2.245    0.20
##  Residual             19.80    4.450        
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  49.2663     0.4219 116.772
## time         -0.7537     0.1500  -5.025
## 
## Correlation of Fixed Effects:
##      (Intr)
## time -0.101

Let’s investigate individual changes in job satisfaction across three years. The ggplot2 package is used to group each person data. Then, the regression lines are plotted for each person. Note that only every 10th person will be used in the plot so the plot does not have too many regression lines.

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

dat1_2 <- dat1[dat1$pid%%10 == 0,]
ggplot(dat1_2, aes(x=time, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)

## `geom_smooth()` using formula = 'y ~ x'

Next, the X axis is changed to age which is the sum of the age at the beginning, age0, and the time point, time.

dat1_2 <- dat1[dat1$pid%%3 == 0,]
dat1_2$age <- dat1_2$age0 + dat1_2$time
ggplot(dat1_2, aes(x=age, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)

## `geom_smooth()` using formula = 'y ~ x'

Finally, the X axis is changed to tenure which is the sum of the tenure at the beginning, tenure0, and the time point, time.

dat1_2 <- dat1[dat1$pid%%5 == 0,]
dat1_2$tenure <- dat1_2$tenure0 + dat1_2$time
ggplot(dat1_2, aes(x=tenure, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)

## `geom_smooth()` using formula = 'y ~ x'

Let’s recenter the time variable. Previously, time == 0 represents the beginning of the study. The time variable is subtracted by 3 called timel so timel == 0 represents the last time point (at Year 3). Check the result of this centering in the random slope model.

dat1$timel <- dat1$time - 3
out1c1 <- lmer(jsat ~ 1 + timel + (1 + timel|pid), data=dat1, REML=FALSE)
summary(out1c1)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + timel + (1 + timel | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10720.5  10752.8  -5354.2  10708.5     1594 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.52916 -0.52310  0.00171  0.53373  2.43187 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  pid      (Intercept) 123.04   11.092       
##           timel         5.04    2.245   0.74
##  Residual              19.80    4.450       
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  47.0050     0.5850  80.347
## timel        -0.7538     0.1500  -5.025
## 
## Correlation of Fixed Effects:
##       (Intr)
## timel 0.697

Let’s add time-varying covariates in the model. In this example, whether an employee is married is only time-varying covariate. First, this variable is added without centering.

out1d <- lmer(jsat ~ 1 + time + married + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1d)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + time + married + (1 + time | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10721.8  10759.5  -5353.9  10707.8     1593 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.54359 -0.52100  0.00137  0.52261  2.42010 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  pid      (Intercept) 56.867   7.541        
##           time         5.046   2.246    0.19
##  Residual             19.841   4.454        
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  49.1621     0.4394 111.894
## time         -0.7932     0.1577  -5.031
## married       0.4386     0.5351   0.820
## 
## Correlation of Fixed Effects:
##         (Intr) time  
## time    -0.007       
## married -0.289 -0.305

As discussed in Lecture 7, if the level-1 variables are added in the model without centering, the level-1 and level-2 effects are equally constrained, which should not be assumed without testing. Thus, married is separated into employee means, avemarried and within-employee deviations, diffmarried. Then, both variables are put in the model in addition to time.

dat1$avemarried <- ave(dat1$married, dat1$pid)
dat1$diffmarried <- dat1$married - dat1$avemarried
out1e <- lmer(jsat ~ 1 + time + diffmarried + avemarried + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1e)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + time + diffmarried + avemarried + (1 + time | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##    10712    10755    -5348    10696     1592 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.56584 -0.51463  0.00135  0.52804  2.39684 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  pid      (Intercept) 55.988   7.482        
##           time         5.041   2.245    0.15
##  Residual             19.770   4.446        
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  47.9283     0.5593  85.698
## time         -0.7004     0.1599  -4.380
## diffmarried  -0.5932     0.6164  -0.962
## avemarried    3.3768     0.9840   3.432
## 
## Correlation of Fixed Effects:
##             (Intr) time   dffmrr
## time        -0.134              
## diffmarried  0.116 -0.347       
## avemarried  -0.648 -0.017  0.050

Because diffmarried is not significant, it is dropped from the model. Only avemarried is retained, which is time invariant. Other time-invariant covariates are added in the model.

out1h <- lmer(jsat ~ 1 + time + avemarried + I(age0-25) + I(tenure0-5) + fftj + officejob + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1h)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + time + avemarried + I(age0 - 25) + I(tenure0 - 5) +  
##     fftj + officejob + (1 + time | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10629.2  10688.4  -5303.6  10607.2     1589 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.52034 -0.52056  0.01208  0.52926  2.40706 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pid      (Intercept) 46.27    6.802         
##           time         5.04    2.245    -0.07
##  Residual             19.80    4.450         
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)    46.38997    1.27225  36.463
## time           -0.75375    0.15000  -5.025
## avemarried      2.15336    0.91933   2.342
## I(age0 - 25)    0.69220    0.06864  10.084
## I(tenure0 - 5) -0.55378    0.30031  -1.844
## fftj           -2.74403    0.84507  -3.247
## officejob      -1.89139    0.83488  -2.265
## 
## Correlation of Fixed Effects:
##             (Intr) time   avmrrd I(0-25 I(0-5) fftj  
## time        -0.091                                   
## avemarried  -0.378  0.000                            
## I(age0-25)  -0.233  0.000 -0.144                     
## I(tenur0-5)  0.824  0.000 -0.158 -0.228              
## fftj        -0.462  0.000  0.021 -0.267 -0.228       
## officejob   -0.223  0.000  0.084  0.062  0.120  0.182

To increase the interpretability of the regression coefficients, all time-invariate covariates are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals.

dat1a <- dat1[!duplicated(dat1$pid),]
apply(dat1a, 2, mean)

##       rowid         pid        jsat        time     married        age0 
##  799.000000  200.500000   49.315000    0.000000    0.232500   28.375000 
##     tenure0        fftj   officejob        male       timel  avemarried 
##    2.054317    0.462500    0.330000    0.680000   -3.000000    0.372500 
## diffmarried 
##   -0.140000

Next, all time-invariate covariates are centered at the calculated grand means. Also, all cross-level interactions are added to predict the random slopes of time.

out1n <- lmer(jsat ~ 1 + time + time*I(avemarried-0.3725) + time*I(age0-28.375) + time*I(tenure0-2.054) + time*I(fftj-0.4625) + time*I(officejob-0.33) + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1n)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: jsat ~ 1 + time + time * I(avemarried - 0.3725) + time * I(age0 -  
##     28.375) + time * I(tenure0 - 2.054) + time * I(fftj - 0.4625) +  
##     time * I(officejob - 0.33) + (1 + time | pid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10427.5  10513.5  -5197.7  10395.5     1584 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.57301 -0.54946  0.00844  0.54756  2.62571 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  pid      (Intercept) 44.043   6.636        
##           time         1.341   1.158    0.24
##  Residual             19.801   4.450        
## Number of obs: 1600, groups:  pid, 400
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                 49.266682   0.380471 129.489
## time                        -0.754080   0.115116  -6.551
## I(avemarried - 0.3725)       2.157200   0.945739   2.281
## I(age0 - 28.375)             0.584070   0.070616   8.271
## I(tenure0 - 2.054)          -1.361880   0.308940  -4.408
## I(fftj - 0.4625)            -2.558328   0.869350  -2.943
## I(officejob - 0.33)         -1.752716   0.858865  -2.041
## time:I(avemarried - 0.3725) -0.004955   0.286145  -0.017
## time:I(age0 - 28.375)        0.139417   0.021366   6.525
## time:I(tenure0 - 2.054)      1.041940   0.093474  11.147
## time:I(fftj - 0.4625)       -0.239439   0.263033  -0.910
## time:I(officejob - 0.33)    -0.178807   0.259861  -0.688
## 
## Correlation of Fixed Effects:
##             (Intr) time   I(-0.37 I(0-28 I(0-2. I(-0.4 I(-0.33 t:I(-0.37
## time        -0.235                                                      
## I(v-0.3725)  0.000  0.000                                               
## I(0-28.375)  0.000  0.000 -0.144                                        
## I(t0-2.054)  0.000  0.000 -0.158  -0.228                                
## I(f-0.4625)  0.000  0.000  0.021  -0.267 -0.228                         
## I(ffc-0.33)  0.000  0.000  0.084   0.062  0.120  0.182                  
## t:I(-0.3725  0.000  0.000 -0.235   0.034  0.037 -0.005 -0.020           
## t:I(0-28.37  0.000  0.000  0.034  -0.235  0.053  0.063 -0.015  -0.144   
## t:I(0-2.054  0.000  0.000  0.037   0.053 -0.235  0.054 -0.028  -0.158   
## t:I(-0.4625  0.000  0.000 -0.005   0.063  0.054 -0.235 -0.043   0.021   
## tm:I(-0.33)  0.000  0.000 -0.020  -0.015 -0.028 -0.043 -0.235   0.084   
##             t:I(0-28 t:I(0-2. t:I(-0.4
## time                                  
## I(v-0.3725)                           
## I(0-28.375)                           
## I(t0-2.054)                           
## I(f-0.4625)                           
## I(ffc-0.33)                           
## t:I(-0.3725                           
## t:I(0-28.37                           
## t:I(0-2.054 -0.228                    
## t:I(-0.4625 -0.267   -0.228           
## tm:I(-0.33)  0.062    0.120    0.182

age0 and tenure0 significantly predicted the random slopes of time. Thus, both cross-level interactions are probed by the interactions package. Before using the package, all variables must be centered by creating new variables.

dat1$age0c <- dat1$age0-28.375
dat1$avemarriedc <- dat1$avemarried - 0.3725
dat1$tenure0c <- dat1$tenure0 - 2.054
dat1$fftjc <- dat1$fftj - 0.4625
dat1$officejobc <- dat1$officejob - 0.33
out1nn <- lmer(jsat ~ 1 + time + time*avemarriedc + time*age0c + time*tenure0c + time*fftjc + time*officejobc + (1 + time|pid), data=dat1, REML=FALSE)

Next, find values of time, age0, and tenure0 for probing interactions. Don’t forget to delete values from the grand means because all predictors are centered by creating new variables.

age0cval <- c(25, 30, 35) - 28.375
tenure0cval <- c(0, 2, 4) - 2.054
timeval <- c(0, 1, 2, 3)

First, the simple slopes of time at each level of age0 is calculated.

library(interactions)
ss11 <- sim_slopes(model=out1nn, pred=time, modx=age0c, modx.values=age0cval)

ss11

## JOHNSON-NEYMAN INTERVAL 
## 
## When age0c is OUTSIDE the interval [3.48, 8.41], the slope of time is p <
## .05.
## 
## Note: The range of observed values of age0c is [-10.38, 21.62]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of time when age0c = -3.375: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -1.22   0.14    -8.95   0.00
## 
## Slope of time when age0c =  1.625: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.53   0.12    -4.35   0.00
## 
## Slope of time when age0c =  6.625: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.17   0.18     0.92   0.36

Plot the interaction where X-axis is time and each line represents each value of age0.

interact_plot(model=out1nn, pred=time, modx=age0c, modx.values=age0cval)

Second, the simple slopes of age0 at each level of time is calculated.

ss12 <- sim_slopes(model=out1nn, pred=age0c, modx=time, modx.values=timeval)

ss12

## JOHNSON-NEYMAN INTERVAL 
## 
## When time is OUTSIDE the interval [-6.61, -2.75], the slope of age0c is p <
## .05.
## 
## Note: The range of observed values of time is [0.00, 3.00]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of age0c when time = 0.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.58   0.07     8.21   0.00
## 
## Slope of age0c when time = 1.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.72   0.07    10.43   0.00
## 
## Slope of age0c when time = 2.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.86   0.07    11.66   0.00
## 
## Slope of age0c when time = 3.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.00   0.08    11.91   0.00

Plot the interaction where X-axis is age0 and each line represents each value of time.

interact_plot(model=out1nn, pred=age0c, modx=time, modx.values=timeval)

Third, the simple slopes of time at each level of tenure0 is calculated.

ss13 <- sim_slopes(model=out1nn, pred=time, modx=tenure0c, modx.values=tenure0cval)

ss13

## JOHNSON-NEYMAN INTERVAL 
## 
## When tenure0c is OUTSIDE the interval [0.49, 1.00], the slope of time is p
## < .05.
## 
## Note: The range of observed values of tenure0c is [-2.05, 5.95]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of time when tenure0c = -2.054: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -2.89   0.23   -12.83   0.00
## 
## Slope of time when tenure0c = -0.054: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.81   0.12    -6.98   0.00
## 
## Slope of time when tenure0c =  1.946: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.27   0.22     5.87   0.00

Plot the interaction where X-axis is time and each line represents each value of tenure0.

interact_plot(model=out1nn, pred=time, modx=tenure0c, modx.values=tenure0cval)

Finally, the simple slopes of tenure0 at each level of time is calculated.

ss14 <- sim_slopes(model=out1nn, pred=tenure0c, modx=time, modx.values=timeval)

ss14

## JOHNSON-NEYMAN INTERVAL 
## 
## When time is OUTSIDE the interval [0.74, 1.91], the slope of tenure0c is p
## < .05.
## 
## Note: The range of observed values of time is [0.00, 3.00]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of tenure0c when time = 0.00: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -1.36   0.31    -4.38   0.00
## 
## Slope of tenure0c when time = 1.00: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.32   0.30    -1.05   0.29
## 
## Slope of tenure0c when time = 2.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.72   0.32     2.23   0.03
## 
## Slope of tenure0c when time = 3.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.76   0.37     4.79   0.00

Plot the interaction where X-axis is tenure0 and each line represents each value of time.

interact_plot(model=out1nn, pred=tenure0c, modx=time, modx.values=timeval)

Example 2: Changes in Senses of Belonging in Undergraduates

Read the new data set representing a longitudinal study which undergraduate students data are collected in four school years.

dat2 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)

See descriptive statistics of all variables within the data set.

describe(dat2)

##        vars    n    mean     sd median trimmed     mad min  max range  skew
## rowid     1 3400 1700.50 981.64 1700.5 1700.50 1260.21   1 3400  3399  0.00
## pid       2 3400  425.50 245.41  425.5  425.50  315.05   1  850   849  0.00
## time      3 3400    2.50   1.12    2.5    2.50    1.48   1    4     3  0.00
## mem       4 3400   59.02  18.97   58.0   59.13   19.27  -1  113   114 -0.05
## stress    5 3400   50.27  13.63   50.0   50.26   13.34   3   99    96  0.02
## cohort    6 3400    0.50   0.50    1.0    0.50    0.00   0    1     1  0.00
## female    7 3400    0.50   0.50    0.0    0.49    0.00   0    1     1  0.02
## act       8 3400   50.22  13.12   50.5   50.29   14.08   9   87    78 -0.06
## ext       9 3400   49.53   9.73   49.0   49.47    8.90  23   82    59  0.10
##        kurtosis    se
## rowid     -1.20 16.83
## pid       -1.20  4.21
## time      -1.36  0.02
## mem       -0.32  0.33
## stress     0.10  0.23
## cohort    -2.00  0.01
## female    -2.00  0.01
## act       -0.46  0.23
## ext        0.00  0.17

Let’s run the null model of senses of belonging to check the intraclass correlation.

out2a <- lmer(mem ~ 1 + (1|pid), data=dat2, REML=FALSE)
summary(out2a)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + (1 | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  27524.0  27542.4 -13759.0  27518.0     3397 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3921 -0.5104 -0.0082  0.5316  3.8758 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 253.5    15.92   
##  Residual             106.4    10.31   
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   59.015      0.574   102.8

Use time to predict senses of belonging without random slopes. Note that time in this example starts with 1 so this variable is subtracted by 1 to make the intercept represent the sense of belonging at freshman year.

dat2$timec <- dat2$time - 1
out2b <- lmer(mem ~ 1 + timec + (1|pid), data=dat2, REML=FALSE)
summary(out2b)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + timec + (1 | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  27366.3  27390.8 -13679.1  27358.3     3396 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7910 -0.5418 -0.0133  0.5594  3.7069 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 255.11   15.972  
##  Residual              99.93    9.996  
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  61.9679     0.6184  100.21
## timec        -1.9686     0.1533  -12.84
## 
## Correlation of Fixed Effects:
##       (Intr)
## timec -0.372

As another method, calendar year is used as the predictor. Note that this example has two cohorts: B.E. 2550 (2007) (cohort == 1) and B.E. 2551 (2008) (cohort == 0). If B.E. 2552 (2009) is centered at 0, then four years in the B.E. 2550 cohort should be -2, -1, 0, 1 and those in the B.E. 2551 cohort should be -1, 0, 1, 2.

dat2$calyear <- dat2$time - 3
dat2[dat2$cohort == 0, "calyear"] <- dat2[dat2$cohort == 0, "calyear"] + 1
out2b1 <- lmer(mem ~ 1 + calyear + (1|pid), data=dat2, REML=FALSE)
summary(out2b1)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + calyear + (1 | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  27331.7  27356.3 -13661.9  27323.7     3396 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8101 -0.5480 -0.0108  0.5635  3.6866 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 243.51   15.605  
##  Residual              99.98    9.999  
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   59.013      0.562  105.00
## calyear       -2.149      0.152  -14.14
## 
## Correlation of Fixed Effects:
##         (Intr)
## calyear 0.000

Let’s keep using time. Next, the curvilinear changes are investigated but the changes were not significant.

out2c <- lmer(mem ~ 1 + timec + I(timec^2) + (1|pid), data=dat2, REML=FALSE)
summary(out2c)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + timec + I(timec^2) + (1 | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  27367.4  27398.0 -13678.7  27357.4     3395 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8079 -0.5372 -0.0080  0.5689  3.6912 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pid      (Intercept) 255.12   15.973  
##  Residual              99.89    9.994  
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  62.1311     0.6417  96.822
## timec        -2.4583     0.5366  -4.581
## I(timec^2)    0.1632     0.1714   0.952
## 
## Correlation of Fixed Effects:
##            (Intr) timec 
## timec      -0.358       
## I(timec^2)  0.267 -0.958

Plot the nonsignificant curvilinear changes and see that the curvature is not strong.

mytime <- 0:3
mypred <- 62.1311 - 2.4583*mytime + 0.1632*mytime^2
plot(mytime, mypred, type="l")

Next, add the random slope of timec in the model. That is, every employee has different changes in job satisfaction. The likelihood ratio test was significant so the random slope is retained.

out2d <- lmer(mem ~ 1 + timec + (1 + timec|pid), data=dat2, REML=FALSE)
anova(out2b, out2d)

## Data: dat2
## Models:
## out2b: mem ~ 1 + timec + (1 | pid)
## out2d: mem ~ 1 + timec + (1 + timec | pid)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
## out2b    4 27366 27391 -13679    27358                         
## out2d    6 26440 26477 -13214    26428 930.34  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the output of the random slope model.

summary(out2d)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + timec + (1 + timec | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  26439.9  26476.7 -13214.0  26427.9     3394 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1630 -0.4818 -0.0089  0.5045  2.7987 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pid      (Intercept) 223.11   14.937        
##           timec        34.41    5.866   -0.12
##  Residual              42.57    6.525        
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  61.9679     0.5455  113.61
## timec        -1.9686     0.2247   -8.76
## 
## Correlation of Fixed Effects:
##       (Intr)
## timec -0.222

Let’s investigate individual changes in senses of belonging across four years for both cohorts.

dat2_2 <- dat2[dat2$pid%%5 == 0,]
dat2_2$cohort <- factor(dat2_2$cohort, labels=c("2551", "2550"))
ggplot(dat2_2, aes(x=time, y=mem, group=pid, colour=cohort)) + geom_smooth(method=lm, se=FALSE, linewidth=0.5)

## `geom_smooth()` using formula = 'y ~ x'

Next, the X axis is changed to calcyear which is the calendar year of each data point.

dat2_2 <- dat2[dat2$pid%%5 == 0,]
dat2_2$cohort <- factor(dat2_2$cohort, labels=c("2551", "2550"))
dat2_2$calyear <- dat2_2$calyear + 2552
ggplot(dat2_2, aes(x=calyear, y=mem, group=pid, colour=cohort)) + geom_smooth(method=lm, se=FALSE, linewidth=0.5)

## `geom_smooth()` using formula = 'y ~ x'

Let’s add time-varying covariates in the model. In this example, perceived stress measured in each year is the only time-varying covariate. Let’s make the grand means, avestress, and group-mean-centered variables, diffstress.

dat2$avestress <- ave(dat2$stress, dat2$pid)
dat2$diffstress <- dat2$stress - dat2$avestress
out2e <- lmer(mem ~ 1 + timec + diffstress + avestress + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2e)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + avestress + (1 + timec | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  25552.9  25602.0 -12768.5  25536.9     3392 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.10411 -0.48077  0.00681  0.49606  2.61723 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pid      (Intercept) 234.68   15.319        
##           timec        35.29    5.941   -0.15
##  Residual              25.92    5.091        
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 56.78964    2.57724  22.035
## timec       -1.88457    0.21823  -8.636
## diffstress   0.40878    0.01212  33.730
## avestress    0.10050    0.05011   2.006
## 
## Correlation of Fixed Effects:
##            (Intr) timec  dffstr
## timec      -0.045              
## diffstress -0.004  0.011       
## avestress  -0.977  0.000  0.003

Both stress variables were retained because they were significant. Next, time-invariant covariates are added. To increase the interpretability of the regression coefficients, all time-invariate covariates are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals.

dat2a <- dat2[!duplicated(dat2$pid),]
apply(dat2a, 2, mean)

##        rowid          pid         time          mem       stress       cohort 
## 1699.0000000  425.5000000    1.0000000   62.0823529   50.4729412    0.5011765 
##       female          act          ext        timec      calyear    avestress 
##    0.4952941   50.2188235   49.5341176    0.0000000   -1.5011765   50.2685294 
##   diffstress 
##    0.2044118

Next, all time-invariate covariates are centered at the calculated grand means and added to the model.

out2g <- lmer(mem ~ 1 + timec + diffstress + I(avestress - 50.473) + I(cohort - 0.501) + I(female - 0.495) + I(act - 50.219) + I(ext - 49.534) + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2g)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + I(avestress - 50.473) + I(cohort -  
##     0.501) + I(female - 0.495) + I(act - 50.219) + I(ext - 49.534) +  
##     (1 + timec | pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  25225.2  25298.8 -12600.6  25201.2     3388 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.13559 -0.48411  0.00767  0.48967  2.65651 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pid      (Intercept) 226.73   15.057        
##           timec        35.30    5.941   -0.56
##  Residual              25.91    5.091        
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           61.86305    0.53681 115.243
## timec                 -1.88397    0.21824  -8.632
## diffstress             0.41171    0.01218  33.815
## I(avestress - 50.473)  0.11013    0.04117   2.675
## I(cohort - 0.501)      0.38350    1.08785   0.353
## I(female - 0.495)      1.73497    0.89133   1.946
## I(act - 50.219)        0.81816    0.04605  17.768
## I(ext - 49.534)       -0.11612    0.05279  -2.200
## 
## Correlation of Fixed Effects:
##             (Intr) timec  dffstr I(-50.4 I(-0.5 I(-0.4 I(-50.2
## timec       -0.580                                            
## diffstress  -0.007  0.011                                     
## I(v-50.473)  0.016  0.000  0.000                              
## I(ch-0.501) -0.001  0.000  0.001 -0.046                       
## I(fm-0.495) -0.001  0.000  0.000  0.001   0.128               
## I(c-50.219)  0.001  0.000  0.000  0.035  -0.594 -0.176        
## I(x-49.534)  0.000  0.000  0.000 -0.017   0.323  0.155 -0.520

Also, all cross-level interactions are added to predict the random slopes of timec.

out2h <- lmer(mem ~ 1 + timec + diffstress + timec*I(avestress - 50.473) + timec*I(cohort - 0.501) + timec*I(female - 0.495) + timec*I(act - 50.219) + timec*I(ext - 49.534) + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2h)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + timec * I(avestress - 50.473) +  
##     timec * I(cohort - 0.501) + timec * I(female - 0.495) + timec *  
##     I(act - 50.219) + timec * I(ext - 49.534) + (1 + timec |      pid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  24869.8  24974.1 -12417.9  24835.8     3383 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0904 -0.4824  0.0017  0.4963  2.7450 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pid      (Intercept) 197.91   14.068        
##           timec        21.15    4.598   -0.46
##  Residual              25.92    5.091        
## Number of obs: 3400, groups:  pid, 850
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                 61.861269   0.504268 122.675
## timec                       -1.882677   0.176048 -10.694
## diffstress                   0.407743   0.011986  34.019
## I(avestress - 50.473)        0.100033   0.047473   2.107
## I(cohort - 0.501)            3.454553   1.254292   2.754
## I(female - 0.495)            2.666791   1.027672   2.595
## I(act - 50.219)              0.299656   0.053091   5.644
## I(ext - 49.534)              0.184658   0.060866   3.034
## timec:I(avestress - 50.473)  0.007077   0.016573   0.427
## timec:I(cohort - 0.501)     -2.153463   0.437908  -4.918
## timec:I(female - 0.495)     -0.653397   0.358756  -1.821
## timec:I(act - 50.219)        0.363560   0.018534  19.616
## timec:I(ext - 49.534)       -0.210898   0.021248  -9.926

## 
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

cohort, act, and ext significantly predicted the random slopes of timec. Thus, these three cross-level interactions are probed by the interactions package. Before using the package, all variables must be centered by creating new variables.

dat2$avestressc <- dat2$avestress - 50.473
dat2$cohortc <- dat2$cohort - 0.501
dat2$femalec <- dat2$female - 0.495
dat2$actc <- dat2$act - 50.219
dat2$extc <- dat2$ext - 49.534
out2hn <- lmer(mem ~ 1 + timec + diffstress + timec*avestressc + timec*cohortc + timec*femalec + timec*actc + timec*extc + (1 + timec|pid), data=dat2, REML=FALSE)

Next, find values of time, cohort, act, and ext for probing interactions. Don’t forget to delete values from the grand means because all predictors are centered by creating new variables.

cohortcval <- c(0, 1) - 0.501
actcval <- c(40, 50, 60) - 50.219
extcval <- c(40, 50, 60) - 49.534
timecval <- c(0, 1, 2, 3)

First, the simple slopes of timec at each level of cohort is calculated.

ss21 <- sim_slopes(model=out2hn, pred=timec, modx=cohortc, modx.values=cohortcval)

ss21

## JOHNSON-NEYMAN INTERVAL 
## 
## When cohortc is OUTSIDE the interval [-1.49, -0.59], the slope of time is p
## < .05.
## 
## Note: The range of observed values of cohortc is [-0.50, 0.50]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of time when cohortc = -0.501: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.81   0.28    -2.85   0.00
## 
## Slope of time when cohortc =  0.499: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -2.96   0.28   -10.51   0.00

Plot the interaction where X-axis is timec and each line represents each value of cohort.

interact_plot(model=out2hn, pred=timec, modx=cohortc, modx.values=cohortcval)

Second, the simple slopes of cohort at each level of timec is calculated.

ss22 <- sim_slopes(model=out2hn, pred=cohortc, modx=timec, modx.values=timecval)

ss22

## JOHNSON-NEYMAN INTERVAL 
## 
## When time is OUTSIDE the interval [1.55, 3.72], the slope of cohortc is p <
## .05.
## 
## Note: The range of observed values of time is [1.00, 4.00]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of cohortc when time = 0.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   5.61   1.53     3.68   0.00
## 
## Slope of cohortc when time = 1.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   3.45   1.26     2.74   0.01
## 
## Slope of cohortc when time = 2.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.30   1.11     1.17   0.24
## 
## Slope of cohortc when time = 3.00: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.85   1.12    -0.76   0.45

Plot the interaction where X-axis is cohort and each line represents each value of timec.

interact_plot(model=out2hn, pred=cohortc, modx=timec, modx.values=timecval)

Third, the simple slopes of timec at each level of act is calculated.

ss23 <- sim_slopes(model=out2hn, pred=timec, modx=actc, modx.values=actcval)

ss23

## JOHNSON-NEYMAN INTERVAL 
## 
## When actc is OUTSIDE the interval [4.15, 6.32], the slope of time is p <
## .05.
## 
## Note: The range of observed values of actc is [-41.22, 36.78]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of time when actc = -10.219: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -5.60   0.26   -21.58   0.00
## 
## Slope of time when actc =  -0.219: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -1.96   0.18   -11.12   0.00
## 
## Slope of time when actc =   9.781: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.67   0.25     6.59   0.00

Plot the interaction where X-axis is timec and each line represents each value of act.

interact_plot(model=out2hn, pred=timec, modx=actc, modx.values=actcval)

Fourth, the simple slopes of act at each level of timec is calculated.

ss24 <- sim_slopes(model=out2hn, pred=actc, modx=timec, modx.values=timecval)

ss24

## JOHNSON-NEYMAN INTERVAL 
## 
## When time is OUTSIDE the interval [-0.18, 0.49], the slope of actc is p <
## .05.
## 
## Note: The range of observed values of time is [1.00, 4.00]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of actc when time = 0.00: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.06   0.06    -0.99   0.32
## 
## Slope of actc when time = 1.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.30   0.05     5.62   0.00
## 
## Slope of actc when time = 2.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.66   0.05    14.15   0.00
## 
## Slope of actc when time = 3.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.03   0.05    21.65   0.00

Plot the interaction where X-axis is act and each line represents each value of timec.

interact_plot(model=out2hn, pred=actc, modx=timec, modx.values=timecval)

Fifth, the simple slopes of timec at each level of ext is calculated.

ss25 <- sim_slopes(model=out2hn, pred=timec, modx=extc, modx.values=extcval)

ss25

## JOHNSON-NEYMAN INTERVAL 
## 
## When extc is OUTSIDE the interval [-11.78, -6.82], the slope of time is p <
## .05.
## 
## Note: The range of observed values of extc is [-26.53, 32.47]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of time when extc = -9.534: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.13   0.27     0.47   0.64
## 
## Slope of time when extc =  0.466: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -1.98   0.18   -11.21   0.00
## 
## Slope of time when extc = 10.466: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -4.09   0.28   -14.38   0.00

Plot the interaction where X-axis is timec and each line represents each value of ext.

interact_plot(model=out2hn, pred=timec, modx=extc, modx.values=extcval)

Finally, the simple slopes of ext at each level of timec is calculated.

ss26 <- sim_slopes(model=out2hn, pred=extc, modx=timec, modx.values=timecval)

ss26

## JOHNSON-NEYMAN INTERVAL 
## 
## When time is OUTSIDE the interval [1.34, 2.37], the slope of extc is p <
## .05.
## 
## Note: The range of observed values of time is [1.00, 4.00]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of extc when time = 0.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.40   0.07     5.34   0.00
## 
## Slope of extc when time = 1.00: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.18   0.06     3.02   0.00
## 
## Slope of extc when time = 2.00: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.03   0.05    -0.49   0.63
## 
## Slope of extc when time = 3.00: 
## 
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.24   0.05    -4.36   0.00

Plot the interaction where X-axis is ext and each line represents each value of timec.

interact_plot(model=out2hn, pred=extc, modx=timec, modx.values=timecval)

Intro to Multilevel Model: Lecture 11 Example Code

Sunthud Pornprasertmanit

29/01/2024

Growth Curve Model

Example 1: Changes in Job Satisfaction

Example 2: Changes in Senses of Belonging in Undergraduates