Read the data set
dat1 <- read.table("lecture4ex1.csv", sep=",", header=TRUE)
Find the descriptive statistics of the variables in the data set
library(psych)
## Warning: package 'psych' was built under R version 4.2.3
describe(dat1)
## vars n mean sd median trimmed mad min max range
## id 1 811 406.00 234.26 406.00 406.00 300.97 1.00 811.00 810.00
## groupid 2 811 49.08 28.45 49.00 48.72 37.06 1.00 100.00 99.00
## consume 3 811 38.31 6.92 38.00 38.21 7.41 20.00 60.00 40.00
## goldcolor 4 811 0.51 0.50 1.00 0.52 0.00 0.00 1.00 1.00
## length 5 811 14.07 5.01 14.12 14.10 6.49 3.17 24.33 21.16
## volume 6 811 0.78 0.29 0.73 0.76 0.28 0.04 1.60 1.56
## area 7 811 1.33 0.70 1.09 1.26 0.56 0.13 3.29 3.15
## height 8 811 0.66 0.20 0.67 0.66 0.25 0.30 0.98 0.68
## nfish 9 811 8.99 2.65 9.00 8.89 2.97 2.00 15.00 13.00
## skew kurtosis se
## id 0.00 -1.20 8.23
## groupid 0.08 -1.17 1.00
## consume 0.14 -0.32 0.24
## goldcolor -0.05 -2.00 0.02
## length -0.05 -1.16 0.18
## volume 0.48 0.42 0.01
## area 0.85 -0.09 0.02
## height -0.07 -1.24 0.01
## nfish 0.27 -0.17 0.09
Check the group means by the aggregate
function using
FUN=mean
. To save spaces, only first ten group means are
shown.
outgroupmean <- aggregate(consume ~ groupid, data=dat1, FUN=mean)
head(outgroupmean, 10)
## groupid consume
## 1 1 33.00000
## 2 2 30.00000
## 3 3 45.50000
## 4 4 38.50000
## 5 5 35.41667
## 6 6 42.83333
## 7 7 43.12500
## 8 8 31.12500
## 9 9 38.66667
## 10 10 35.45455
Check the ranges of consume
within each group by the
aggregate
function using FUN=mean
.
outgrouprange <- aggregate(consume ~ groupid, data=dat1, FUN=range)
head(outgrouprange, 10)
## groupid consume.1 consume.2
## 1 1 29 38
## 2 2 23 37
## 3 3 39 51
## 4 4 35 44
## 5 5 27 41
## 6 6 35 50
## 7 7 35 50
## 8 8 26 35
## 9 9 33 43
## 10 10 28 43
stripchart
is the graph that show data points within
each group.
stripchart(consume ~ groupid, vertical = TRUE, data = dat1[dat1$groupid < 10,])
beeswarm
is the more beautiful version of
stripchart
.
library(beeswarm)
dat1_1 <- dat1[dat1$groupid < 10,]
beeswarm(consume ~ groupid, data=dat1_1, col=rainbow(9))
Two main packages in R are used in multilevel models:
lme4
and nlme
. Let’s use lme4
package first. The main function for analyzing multilevel regression is
lmer
. The formula is a little more complicated than regular
multiple regression. The predictors are 1 + (1|groupid)
.
The first term, 1
, means to estimate the intercept where
the second term, (1|groupid)
, means to estimate group
means. RMEL=FALSE
means to use full information maximum
likelihood.
library(lme4)
## Warning: package 'lme4' was built under R version 4.2.3
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.2.3
out1m0 <- lmer(consume ~ 1 + (1|groupid), data=dat1, REML=FALSE)
summary(out1m0)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: consume ~ 1 + (1 | groupid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 5120.1 5134.2 -2557.0 5114.1 808
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.52068 -0.71755 0.03949 0.68318 3.05112
##
## Random effects:
## Groups Name Variance Std.Dev.
## groupid (Intercept) 24.04 4.903
## Residual 24.63 4.963
## Number of obs: 811, groups: groupid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 38.1562 0.5238 72.84
Read and examine the data.
dat2 <-read.table("lecture4ex2.csv", sep=",", header=TRUE)
describe(dat2)
## vars n mean sd median trimmed mad min max range
## eeid 1 10000 5000.50 2886.90 5000.50 5000.50 3706.50 1.00 10000.00 9999.00
## erid 2 10000 500.50 288.69 500.50 500.50 370.65 1.00 1000.00 999.00
## score 3 10000 75.61 6.57 76.00 75.64 5.93 50.00 100.00 50.00
## eesex 4 10000 0.50 0.50 0.50 0.50 0.74 0.00 1.00 1.00
## ersex 5 10000 0.50 0.50 0.50 0.50 0.74 0.00 1.00 1.00
## iq 6 10000 100.20 14.88 100.28 100.17 14.88 50.82 157.57 106.76
## skew kurtosis se
## eeid 0.00 -1.20 28.87
## erid 0.00 -1.20 2.89
## score -0.05 0.07 0.07
## eesex 0.00 -2.00 0.01
## ersex 0.00 -2.00 0.01
## iq 0.03 -0.04 0.15
Run the null model.
out2m0 <- lmer(score ~ 1 + (1|erid), data=dat2, REML=FALSE)
summary(out2m0)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: score ~ 1 + (1 | erid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 61754.1 61775.8 -30874.1 61748.1 9997
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7662 -0.6329 0.0029 0.6438 3.4706
##
## Random effects:
## Groups Name Variance Std.Dev.
## erid (Intercept) 20.89 4.571
## Residual 22.26 4.718
## Number of obs: 10000, groups: erid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.6114 0.1521 497.3
Read and examine the data.
dat3 <-read.table("lecture4ex3.csv", sep=",", header=TRUE)
describe(dat3)
## vars n mean sd median trimmed mad min max range
## personid 1 2262 1131.50 653.13 1131.5 1131.50 838.41 1 2262 2261
## tableid 2 2262 249.73 144.15 251.0 250.11 183.84 1 500 499
## sat 3 2262 64.12 11.66 64.0 64.25 11.86 20 100 80
## age 4 2262 36.04 14.23 36.0 35.91 16.31 1 75 74
## female 5 2262 0.51 0.50 1.0 0.52 0.00 0 1 1
## outdoor 6 2262 0.50 0.50 1.0 0.50 0.00 0 1 1
## payperperson 7 2262 325.44 83.30 326.0 325.82 80.06 90 590 500
## numperson 8 2262 5.46 2.02 6.0 5.56 2.97 2 8 6
## skew kurtosis se
## personid 0.00 -1.20 13.73
## tableid -0.01 -1.20 3.03
## sat -0.10 -0.10 0.25
## age 0.11 -0.20 0.30
## female -0.05 -2.00 0.01
## outdoor -0.01 -2.00 0.01
## payperperson -0.01 0.03 1.75
## numperson -0.21 -1.27 0.04
Run the null model.
out3m0 <- lmer(sat ~ 1 + (1|tableid), data=dat3, REML=FALSE)
summary(out3m0)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: sat ~ 1 + (1 | tableid)
## Data: dat3
##
## AIC BIC logLik deviance df.resid
## 16952.7 16969.8 -8473.3 16946.7 2259
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2436 -0.6240 -0.0043 0.6077 2.8041
##
## Random effects:
## Groups Name Variance Std.Dev.
## tableid (Intercept) 60.34 7.768
## Residual 76.05 8.721
## Number of obs: 2262, groups: tableid, 500
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 64.2814 0.4005 160.5
In this goldfish example, two tank-level predictors are used to
predict goldfish’s’ food consumption. nfish
is the number
of fish within each tank and volume
is the tank volume.
Technically, the averaged fish consumption within each tank is predicted
by both tank-level predictor. Multilevel regression is used rather than
aggregated regression which does not account for the reliability of
group means. Note that 1
is not included as the predictor
in the formula but it is included automatically as the intercept is
shown in the output.
out1m1 <- lmer(consume ~ nfish + volume + (1|groupid), data=dat1, REML=FALSE)
summary(out1m1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: consume ~ nfish + volume + (1 | groupid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 5073.4 5096.9 -2531.7 5063.4 806
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.61728 -0.69324 0.03748 0.67548 3.03298
##
## Random effects:
## Groups Name Variance Std.Dev.
## groupid (Intercept) 13.20 3.633
## Residual 24.62 4.962
## Number of obs: 811, groups: groupid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 35.1096 1.3547 25.917
## nfish -1.2111 0.2381 -5.087
## volume 18.0981 2.2787 7.942
##
## Correlation of Fixed Effects:
## (Intr) nfish
## nfish -0.533
## volume -0.108 -0.760
Both nfish
and volume
can be centered at
their grand means to make the intercept of the model meaningful.
out1m1_1 <- lmer(consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) + (1|groupid), data=dat1, REML=FALSE)
summary(out1m1_1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) +
## (1 | groupid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 5073.4 5096.9 -2531.7 5063.4 806
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.61728 -0.69324 0.03748 0.67548 3.03298
##
## Random effects:
## Groups Name Variance Std.Dev.
## groupid (Intercept) 13.20 3.633
## Residual 24.62 4.962
## Number of obs: 811, groups: groupid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 38.3144 0.4202 91.187
## I(nfish - mean(nfish)) -1.2111 0.2381 -5.087
## I(volume - mean(volume)) 18.0981 2.2787 7.942
##
## Correlation of Fixed Effects:
## (Intr) I(-mn(n))
## I(-mn(nfs)) 0.163
## I(-mn(vlm)) 0.000 -0.760
To get the exact p of the t values (assuming normal distribution or
df = \(\infty\)), the t value must be
extracted from the output. Thus, the output from the
summary
function is saved.
summary1m1_1 <- summary(out1m1_1)
coef1m1_1 <- coef(summary1m1_1)
coef1m1_1
## Estimate Std. Error t value
## (Intercept) 38.314427 0.4201743 91.186985
## I(nfish - mean(nfish)) -1.211056 0.2380587 -5.087214
## I(volume - mean(volume)) 18.098092 2.2786698 7.942393
Next, extract only the t value
column.
tvalue1m1_1 <- coef1m1_1[,"t value"]
tvalue1m1_1
## (Intercept) I(nfish - mean(nfish)) I(volume - mean(volume))
## 91.186985 -5.087214 7.942393
Finally, calculate the p values using the pnorm
function. The result is multiplied by 2 to get the two-tailed p
values
pnorm(abs(tvalue1m1_1), lower.tail=FALSE) * 2
## (Intercept) I(nfish - mean(nfish)) I(volume - mean(volume))
## 0.00000e+00 3.63361e-07 1.98317e-15
The rating scores are predicted by the interviewers’ sex.
out2m1 <- lmer(score ~ 1 + ersex + (1|erid), data=dat2, REML=FALSE)
summary(out2m1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: score ~ 1 + ersex + (1 | erid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 61752.9 61781.7 -30872.4 61744.9 9996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7604 -0.6335 0.0010 0.6456 3.4647
##
## Random effects:
## Groups Name Variance Std.Dev.
## erid (Intercept) 20.82 4.563
## Residual 22.26 4.718
## Number of obs: 10000, groups: erid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.3376 0.2147 350.915
## ersex 0.5476 0.3036 1.804
##
## Correlation of Fixed Effects:
## (Intr)
## ersex -0.707
The customer satisfaction scores are predicted by the number of persons in each table and whether the table is outdoor.
out3m1 <- lmer(sat ~ 1 + I(numperson - 4) + outdoor + (1|tableid), data=dat3, REML=FALSE)
summary(out3m1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: sat ~ 1 + I(numperson - 4) + outdoor + (1 | tableid)
## Data: dat3
##
## AIC BIC logLik deviance df.resid
## 16943.7 16972.4 -8466.9 16933.7 2257
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2689 -0.6341 -0.0019 0.6189 2.7590
##
## Random effects:
## Groups Name Variance Std.Dev.
## tableid (Intercept) 58.24 7.632
## Residual 76.06 8.721
## Number of obs: 2262, groups: tableid, 500
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 63.1131 0.5768 109.420
## I(numperson - 4) -0.2375 0.1920 -1.237
## outdoor 2.6954 0.7904 3.410
##
## Correlation of Fixed Effects:
## (Intr) I(n-4)
## I(nmprsn-4) -0.247
## outdoor -0.683 -0.008
In this goldfish example, two goldfish-level predictors are used to
predict goldfish’s’ food consumption. length
is the length
of each goldfish and goldcolor
is whether a goldfish is
gold or not. Multilevel regression is used rather than disaggregated
regression which may inflate type I errors. Note that the length is
centered at 15 cm.
out1m2 <- lmer(consume ~ I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)
summary(out1m2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: consume ~ I(length - 15) + goldcolor + (1 | groupid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 4127.7 4151.2 -2058.8 4117.7 806
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2524 -0.6184 0.0284 0.6508 3.1677
##
## Random effects:
## Groups Name Variance Std.Dev.
## groupid (Intercept) 18.644 4.318
## Residual 6.363 2.522
## Number of obs: 811, groups: groupid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 39.34350 0.45266 86.916
## I(length - 15) 0.88283 0.01943 45.442
## goldcolor -0.46446 0.18697 -2.484
##
## Correlation of Fixed Effects:
## (Intr) I(-15)
## I(lngth-15) 0.040
## goldcolor -0.211 0.027
The rating scores are predicted by the interviewees’ sex and IQ. Note that the interviewee’s IQ is centered at 100.
out2m2 <- lmer(score ~ 1 + eesex + I(iq - 100) + (1|erid), data=dat2, REML=FALSE)
summary(out2m2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: score ~ 1 + eesex + I(iq - 100) + (1 | erid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 61676.1 61712.2 -30833.1 61666.1 9995
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7323 -0.6356 -0.0036 0.6447 3.3714
##
## Random effects:
## Groups Name Variance Std.Dev.
## erid (Intercept) 20.86 4.567
## Residual 22.07 4.697
## Number of obs: 10000, groups: erid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.543619 0.158973 475.199
## eesex 0.123540 0.093952 1.315
## I(iq - 100) 0.029659 0.003307 8.970
##
## Correlation of Fixed Effects:
## (Intr) eesex
## eesex -0.295
## I(iq - 100) -0.002 -0.007
The customer satisfaction scores are predicted by the customers’ sex and age. Note that age is centered at 40.
out3m2 <- lmer(sat ~ 1 + I(age - 40) + female + (1|tableid), data=dat3, REML=FALSE)
summary(out3m2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: sat ~ 1 + I(age - 40) + female + (1 | tableid)
## Data: dat3
##
## AIC BIC logLik deviance df.resid
## 16553.0 16581.6 -8271.5 16543.0 2257
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.95679 -0.59088 -0.00305 0.59996 2.94782
##
## Random effects:
## Groups Name Variance Std.Dev.
## tableid (Intercept) 68.43 8.272
## Residual 59.69 7.726
## Number of obs: 2262, groups: tableid, 500
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 66.44867 0.45496 146.055
## I(age - 40) 0.26085 0.01242 20.994
## female -2.14516 0.36228 -5.921
##
## Correlation of Fixed Effects:
## (Intr) I(-40)
## I(age - 40) 0.140
## female -0.417 -0.073
Put both two goldfish-level predictors and two tank-level predicts in the model.
out1m3 <- lmer(consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) + I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)
summary(out1m3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) +
## I(length - 15) + goldcolor + (1 | groupid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 4087.8 4120.7 -2036.9 4073.8 804
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2895 -0.6440 0.0324 0.6377 3.1624
##
## Random effects:
## Groups Name Variance Std.Dev.
## groupid (Intercept) 11.768 3.430
## Residual 6.358 2.522
## Number of obs: 811, groups: groupid, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 39.37890 0.38422 102.491
## I(nfish - mean(nfish)) -1.11583 0.20651 -5.403
## I(volume - mean(volume)) 14.76808 1.99877 7.389
## I(length - 15) 0.87926 0.01939 45.346
## goldcolor -0.47843 0.18663 -2.563
##
## Correlation of Fixed Effects:
## (Intr) I(-mn(n)) I(-mn(v)) I(-15)
## I(-mn(nfs)) 0.182
## I(-mn(vlm)) -0.001 -0.760
## I(lngth-15) 0.040 0.016 -0.042
## goldcolor -0.248 0.005 -0.005 0.027
Put interviewers’ sex, interviewees’ sex, and interviewees’ IQ in the same model.
out2m3 <- lmer(score ~ 1 + ersex + eesex + I(iq - 100) + (1|erid), data=dat2, REML=FALSE)
summary(out2m3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: score ~ 1 + ersex + eesex + I(iq - 100) + (1 | erid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 61674.9 61718.2 -30831.4 61662.9 9994
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7265 -0.6355 -0.0029 0.6452 3.3655
##
## Random effects:
## Groups Name Variance Std.Dev.
## erid (Intercept) 20.78 4.559
## Residual 22.07 4.697
## Number of obs: 10000, groups: erid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.270267 0.219518 342.888
## ersex 0.546704 0.303254 1.803
## eesex 0.123540 0.093952 1.315
## I(iq - 100) 0.029658 0.003307 8.969
##
## Correlation of Fixed Effects:
## (Intr) ersex eesex
## ersex -0.691
## eesex -0.214 0.000
## I(iq - 100) -0.001 0.000 -0.007
In addition to put all customer-level and table-level predictors, the
age averages within each table and the proportion of females within each
table are also used. Let’s start with the age variable. First, the age
averages within each group is calculated by the ave
function.
dat3$aveage <- ave(dat3$age, dat3$tableid)
Next, both age
and aveage
are used to
predict customer satisfaction. The regression coefficient of
age
represents the within-table effect. However, the
regression coefficient of aveage
represent the sum of
within-table and between-table effects. Note that both predictors are
centered at 40.
out3m3 <- lmer(sat ~ I(age - 40) + I(aveage - 40) + (1|tableid), data=dat3, REML=FALSE)
summary(out3m3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: sat ~ I(age - 40) + I(aveage - 40) + (1 | tableid)
## Data: dat3
##
## AIC BIC logLik deviance df.resid
## 16562.7 16591.3 -8276.4 16552.7 2257
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1638 -0.6006 -0.0048 0.6144 3.0681
##
## Random effects:
## Groups Name Variance Std.Dev.
## tableid (Intercept) 64.30 8.019
## Residual 60.82 7.799
## Number of obs: 2262, groups: tableid, 500
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 64.08614 0.47176 135.845
## I(age - 40) 0.26783 0.01276 20.992
## I(aveage - 40) -0.31818 0.06279 -5.068
##
## Correlation of Fixed Effects:
## (Intr) I(g-40)
## I(age - 40) 0.000
## I(aveag-40) 0.516 -0.203
To calculate the between table effects, the regression coefficients are extracted from the summary output.
sumout3m3 <- summary(out3m3)
coef3m3 <- coef(sumout3m3)[,"Estimate"]
coef3m3
## (Intercept) I(age - 40) I(aveage - 40)
## 64.0861426 0.2678348 -0.3181824
Calculate the between-table effect by summing the last two coefficients.
coef3m3[2] + coef3m3[3]
## I(age - 40)
## -0.05034763
Next, visualize the within-table and between-table age effects.
Because this data set has 500 tables. We will plot only 100 within-table
regression lines. So dat3_1
is created to extract only
table ID that is divisible by 5.
dat3_1 <- dat3[dat3$tableid%%5 == 0,]
The ggplot
function in the ggplot2
package
is used to create within-table regression lines. See that
aes
is used to make subsets of data for each table and
geom_smooth
is used to create regression lines within each
subset of data.
Next, to create the between-table regression line,
geom_abline
is used to overlap the current plot. Note that
btwintcept
is added by btwslope*40
because the
X-axis of the graph is the original-scaled age not the centered age at
40.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.3
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
out <- ggplot(dat3_1, aes(x=age, y=sat, group=tableid)) + geom_smooth(method=lm, se=FALSE)
btwslope <- coef3m3[2] + coef3m3[3]
btwintcept <- coef3m3[1]+(btwslope*40)
out + geom_abline(intercept=btwintcept, slope=btwslope, color="red")
## `geom_smooth()` using formula = 'y ~ x'
Similar to customers’ age, repeat the same process to
ave
to get the within- and between-table effects.
dat3$avefemale <- ave(dat3$female, dat3$tableid)
out3m4 <- lmer(sat ~ female + I(avefemale - 0.5) + (1|tableid), data=dat3, REML=FALSE)
summary(out3m4)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: sat ~ female + I(avefemale - 0.5) + (1 | tableid)
## Data: dat3
##
## AIC BIC logLik deviance df.resid
## 16940.3 16968.9 -8465.1 16930.3 2257
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.15838 -0.63360 -0.00586 0.61192 2.75216
##
## Random effects:
## Groups Name Variance Std.Dev.
## tableid (Intercept) 60.12 7.753
## Residual 75.45 8.686
## Number of obs: 2262, groups: tableid, 500
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 65.0817 0.4513 144.209
## female -1.5293 0.4175 -3.663
## I(avefemale - 0.5) -1.1253 1.5879 -0.709
##
## Correlation of Fixed Effects:
## (Intr) female
## female -0.463
## I(vfml-0.5) 0.078 -0.263
Extract the regression coefficients.
sumout3m4 <- summary(out3m4)
coef3m4 <- coef(sumout3m4)[,"Estimate"]
coef3m4
## (Intercept) female I(avefemale - 0.5)
## 65.081716 -1.529278 -1.125336
Visualize the within-table (blue) and between-table (red) effects of gender on customer satisfaction.
out <- ggplot(dat3_1, aes(x=female, y=sat, group=tableid)) + geom_smooth(method=lm, se=FALSE)
btwslope <- coef3m4[2] + coef3m4[3]
btwintcept <- coef3m4[1]+(btwslope*0.5)
out + geom_abline(intercept=btwintcept, slope=btwslope, color="red")
## `geom_smooth()` using formula = 'y ~ x'
Put all predictors in the model.
out3m5 <- lmer(sat ~ female + I(age - 40) + I(avefemale - 0.5) + I(aveage - 40) + I(numperson - 4) + outdoor + (1|tableid), data=dat3, REML=FALSE)
summary(out3m5)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: sat ~ female + I(age - 40) + I(avefemale - 0.5) + I(aveage -
## 40) + I(numperson - 4) + outdoor + (1 | tableid)
## Data: dat3
##
## AIC BIC logLik deviance df.resid
## 16522.6 16574.1 -8252.3 16504.6 2253
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.07246 -0.60940 0.00108 0.60756 2.92122
##
## Random effects:
## Groups Name Variance Std.Dev.
## tableid (Intercept) 62.19 7.886
## Residual 59.70 7.726
## Number of obs: 2262, groups: tableid, 500
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 64.11409 0.64109 100.008
## female -2.13297 0.37243 -5.727
## I(age - 40) 0.27329 0.01268 21.560
## I(avefemale - 0.5) -0.57699 1.55230 -0.372
## I(aveage - 40) -0.30041 0.06216 -4.832
## I(numperson - 4) -0.23366 0.19197 -1.217
## outdoor 2.66961 0.79303 3.366
##
## Correlation of Fixed Effects:
## (Intr) female I(g-40) I(-0.5 I(v-40) I(n-4)
## female -0.290
## I(age - 40) 0.022 -0.075
## I(vfml-0.5) 0.032 -0.240 0.018
## I(aveag-40) 0.324 0.015 -0.204 -0.025
## I(nmprsn-4) -0.220 0.000 0.000 0.007 -0.034
## outdoor -0.581 0.000 0.000 -0.005 0.089 -0.011