Read the data set
ex1 <- read.table("lecture3ex1.csv", sep=",", header=TRUE)
Find the descriptive statistics of the variables in the data set
library(psych)
## Warning: package 'psych' was built under R version 4.2.3
describe(ex1)
## vars n mean sd median trimmed mad min max range skew
## tlleader 1 60 48.90 10.96 51.03 49.61 9.63 17.31 70.75 53.44 -0.70
## security 2 60 51.65 10.22 52.10 51.87 9.78 28.29 76.30 48.01 -0.13
## jobsat 3 60 51.21 15.74 50.99 50.93 17.83 20.00 80.00 60.00 0.06
## kurtosis se
## tlleader 0.40 1.41
## security -0.26 1.32
## jobsat -0.89 2.03
Test whether each correlation in a correlation matrix is significantly different from 0.
corr.test(ex1)
## Call:corr.test(x = ex1)
## Correlation matrix
## tlleader security jobsat
## tlleader 1.00 0.09 -0.14
## security 0.09 1.00 0.57
## jobsat -0.14 0.57 1.00
## Sample Size
## [1] 60
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## tlleader security jobsat
## tlleader 0.00 0.58 0.58
## security 0.51 0.00 0.00
## jobsat 0.29 0.00 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
Run a multiple regression where jobsat is the outcome and tlleader and security are the predictors. Save the output of the multiple regression as out1 and use summary to see the output.
out1 <- lm(jobsat ~ tlleader + security, data=ex1)
summary(out1)
##
## Call:
## lm(formula = jobsat ~ tlleader + security, data = ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.683 -8.821 2.048 7.479 26.441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.8111 10.9031 1.634 0.1079
## tlleader -0.2724 0.1527 -1.785 0.0797 .
## security 0.9045 0.1637 5.527 8.45e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.8 on 57 degrees of freedom
## Multiple R-squared: 0.3613, Adjusted R-squared: 0.3389
## F-statistic: 16.12 on 2 and 57 DF, p-value: 2.82e-06
tlleader*security represents the effect of each predictor alone plus the product of both variables. That is, in R, an interaction can be specified with * in the lm formula.
out1a <- lm(jobsat ~ tlleader*security, data=ex1)
summary(out1a)
##
## Call:
## lm(formula = jobsat ~ tlleader * security, data = ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.9307 -8.8247 0.3722 7.6131 18.6684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -152.11000 30.80450 -4.938 7.48e-06 ***
## tlleader 3.25737 0.62576 5.205 2.86e-06 ***
## security 4.19433 0.58680 7.148 1.97e-09 ***
## tlleader:security -0.06808 0.01184 -5.751 3.84e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.24 on 56 degrees of freedom
## Multiple R-squared: 0.5985, Adjusted R-squared: 0.577
## F-statistic: 27.83 on 3 and 56 DF, p-value: 3.788e-11
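Equivalently, the * operator can be expanded by hand. This sketch (the object name out1a_explicit is mine) fits the same model with the first-order terms and the interaction listed separately; its coefficients should match those of out1a.

```r
# * is shorthand for the main effects plus the interaction,
# so these two formulas fit the same model.
out1a_explicit <- lm(jobsat ~ tlleader + security + tlleader:security, data=ex1)
all.equal(coef(out1a), coef(out1a_explicit), check.attributes = FALSE)
```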
You will see that tlleader:security is the interaction effect, which is significant here.
The anova function compares the models with and without the interaction.
anova(out1, out1a)
## Analysis of Variance Table
##
## Model 1: jobsat ~ tlleader + security
## Model 2: jobsat ~ tlleader * security
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 9335.4
## 2 56 5868.8 1 3466.7 33.079 3.843e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The intercept is not meaningful here because tlleader = 0 and security = 0 are not meaningful values. Thus, the predictors are centered at their means.
out1_1 <- lm(jobsat ~ I(tlleader - mean(tlleader)) + I(security - mean(security)), data=ex1)
summary(out1_1)
##
## Call:
## lm(formula = jobsat ~ I(tlleader - mean(tlleader)) + I(security -
## mean(security)), data = ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.683 -8.821 2.048 7.479 26.441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.2075 1.6522 30.994 < 2e-16 ***
## I(tlleader - mean(tlleader)) -0.2724 0.1527 -1.785 0.0797 .
## I(security - mean(security)) 0.9045 0.1637 5.527 8.45e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.8 on 57 degrees of freedom
## Multiple R-squared: 0.3613, Adjusted R-squared: 0.3389
## F-statistic: 16.12 on 2 and 57 DF, p-value: 2.82e-06
out1a_1 <- lm(jobsat ~ I(tlleader - mean(tlleader))*I(security - mean(security)), data=ex1)
summary(out1a_1)
##
## Call:
## lm(formula = jobsat ~ I(tlleader - mean(tlleader)) * I(security -
## mean(security)), data = ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.9307 -8.8247 0.3722 7.6131 18.6684
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 51.86394 1.32653
## I(tlleader - mean(tlleader)) -0.25935 0.12215
## I(security - mean(security)) 0.86493 0.13109
## I(tlleader - mean(tlleader)):I(security - mean(security)) -0.06808 0.01184
## t value Pr(>|t|)
## (Intercept) 39.097 < 2e-16 ***
## I(tlleader - mean(tlleader)) -2.123 0.0382 *
## I(security - mean(security)) 6.598 1.59e-08 ***
## I(tlleader - mean(tlleader)):I(security - mean(security)) -5.751 3.84e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.24 on 56 degrees of freedom
## Multiple R-squared: 0.5985, Adjusted R-squared: 0.577
## F-statistic: 27.83 on 3 and 56 DF, p-value: 3.788e-11
You can see that the anova results for the centered and uncentered models are the same.
anova(out1_1, out1a_1)
## Analysis of Variance Table
##
## Model 1: jobsat ~ I(tlleader - mean(tlleader)) + I(security - mean(security))
## Model 2: jobsat ~ I(tlleader - mean(tlleader)) * I(security - mean(security))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 9335.4
## 2 56 5868.8 1 3466.7 33.079 3.843e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
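As a side note, centering can also be written with scale(x, scale = FALSE), which subtracts the mean without dividing by the standard deviation. This sketch (the object name out1a_1b is mine) should reproduce the coefficients of out1a_1:

```r
# scale(..., scale = FALSE) mean-centers a variable,
# so this fits the same model as the I(x - mean(x)) version.
out1a_1b <- lm(jobsat ~ I(scale(tlleader, scale=FALSE)) * I(scale(security, scale=FALSE)),
               data=ex1)
```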
To get standardized estimates, the betaDelta package cannot be used here. Instead, each predictor must be standardized (with the scale function) before the product term is computed and the regression is run. The I and scale functions are combined to obtain the standardized coefficients. Note that the resulting test statistics are not correct; use only the point estimates of the standardized coefficients.
out1_2 <- lm(I(scale(jobsat)) ~ I(scale(tlleader)) + I(scale(security)), data=ex1)
summary(out1_2)
##
## Call:
## lm(formula = I(scale(jobsat)) ~ I(scale(tlleader)) + I(scale(security)),
## data = ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5212 -0.5604 0.1301 0.4752 1.6799
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.091e-17 1.050e-01 0.000 1.0000
## I(scale(tlleader)) -1.896e-01 1.063e-01 -1.785 0.0797 .
## I(scale(security)) 5.873e-01 1.063e-01 5.527 8.45e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8131 on 57 degrees of freedom
## Multiple R-squared: 0.3613, Adjusted R-squared: 0.3389
## F-statistic: 16.12 on 2 and 57 DF, p-value: 2.82e-06
out1a_2 <- lm(I(scale(jobsat)) ~ I(scale(tlleader))*I(scale(security)), data=ex1)
summary(out1a_2)
##
## Call:
## lm(formula = I(scale(jobsat)) ~ I(scale(tlleader)) * I(scale(security)),
## data = ex1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.45683 -0.56065 0.02365 0.48368 1.18605
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04170 0.08428 0.495 0.6227
## I(scale(tlleader)) -0.18051 0.08501 -2.123 0.0382 *
## I(scale(security)) 0.56160 0.08512 6.598 1.59e-08 ***
## I(scale(tlleader)):I(scale(security)) -0.48428 0.08420 -5.751 3.84e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6504 on 56 degrees of freedom
## Multiple R-squared: 0.5985, Adjusted R-squared: 0.577
## F-statistic: 27.83 on 3 and 56 DF, p-value: 3.788e-11
The interactions package is used for probing interactions. First, use interact_plot to visualize the interaction effect.
library(interactions)
## Warning: package 'interactions' was built under R version 4.2.3
interact_plot(out1a, pred="tlleader", modx="security")
The lines representing each value of the moderator can be modified with modx.values. plot.points=TRUE adds the data points to the plot.
interact_plot(out1a, pred="tlleader", modx="security", modx.values = c(30, 40, 50, 60, 70), plot.points=TRUE)
You can change many arguments. Please check the help page.
interact_plot(out1a, pred="tlleader", modx="security", interval=TRUE, int.width=.9,
x.label = "Transformational Leadership", y.label = "Job Satisfaction",
main.title = NULL, legend.main = "Job Security",
colors = "seagreen")
Simple slopes analysis tests whether the effect of the focal predictor is significant at each level of the moderator.
sim_slopes(out1a, pred="tlleader", modx="security")
## JOHNSON-NEYMAN INTERVAL
##
## When security is OUTSIDE the interval [43.14, 51.44], the slope of tlleader
## is p < .05.
##
## Note: The range of observed values of security is [28.29, 76.30]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of tlleader when security = 41.43465 (- 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.44 0.17 2.52 0.01
##
## Slope of tlleader when security = 51.65472 (Mean):
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.26 0.12 -2.12 0.04
##
## Slope of tlleader when security = 61.87480 (+ 1 SD):
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.96 0.17 -5.61 0.00
Get and test the conditional intercepts.
sim_slopes(out1a, pred="tlleader", modx="security", cond.int=TRUE)
## JOHNSON-NEYMAN INTERVAL
##
## When security is OUTSIDE the interval [43.14, 51.44], the slope of tlleader
## is p < .05.
##
## Note: The range of observed values of security is [28.29, 76.30]
##
## SIMPLE SLOPES ANALYSIS
##
## When security = 41.43465 (- 1 SD):
##
## Est. S.E. t val. p
## --------------------------- ------- ------ -------- ------
## Slope of tlleader 0.44 0.17 2.52 0.01
## Conditional intercept 43.02 1.89 22.77 0.00
##
## When security = 51.65472 (Mean):
##
## Est. S.E. t val. p
## --------------------------- ------- ------ -------- ------
## Slope of tlleader -0.26 0.12 -2.12 0.04
## Conditional intercept 51.86 1.33 39.10 0.00
##
## When security = 61.87480 (+ 1 SD):
##
## Est. S.E. t val. p
## --------------------------- ------- ------ -------- ------
## Slope of tlleader -0.96 0.17 -5.61 0.00
## Conditional intercept 60.70 1.88 32.27 0.00
Plot the confidence intervals of simple slopes at each level of the moderator.
ss <- sim_slopes(out1a, pred="tlleader", modx="security", modx.values = seq(30, 75, 5))
plot(ss)
## Registered S3 methods overwritten by 'broom':
## method from
## tidy.glht jtools
## tidy.summary.glht jtools
## Loading required namespace: broom.mixed
Check the Johnson-Neyman plot.
johnson_neyman(out1a, pred="tlleader", modx="security", alpha = .05)
## JOHNSON-NEYMAN INTERVAL
##
## When security is OUTSIDE the interval [43.14, 51.44], the slope of tlleader
## is p < .05.
##
## Note: The range of observed values of security is [28.29, 76.30]
Read the data set
ex2 <- read.table("lecture3ex2.csv", sep=",", header=TRUE)
Find the descriptive statistics of the variables in the data set
describe(ex2)
## vars n mean sd median trimmed mad min max range skew
## ses 1 100 4.96 2.10 5.24 5.01 2.18 0.00 9.67 9.67 -0.19
## age 2 100 41.66 9.09 42.27 41.84 8.01 17.90 64.49 46.59 -0.17
## guarlike 3 100 50.77 10.81 52.08 51.33 9.16 20.00 76.16 56.16 -0.49
## partlike 4 100 64.72 9.71 65.46 65.09 9.74 30.84 80.00 49.16 -0.59
## kurtosis se
## ses -0.51 0.21
## age -0.10 0.91
## guarlike 0.29 1.08
## partlike 0.26 0.97
Test whether each correlation in a correlation matrix is significantly different from 0.
corr.test(ex2)
## Call:corr.test(x = ex2)
## Correlation matrix
## ses age guarlike partlike
## ses 1.00 -0.06 0.63 0.47
## age -0.06 1.00 -0.09 0.14
## guarlike 0.63 -0.09 1.00 0.60
## partlike 0.47 0.14 0.60 1.00
## Sample Size
## [1] 100
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## ses age guarlike partlike
## ses 0.00 0.72 0.00 0.00
## age 0.56 0.00 0.72 0.46
## guarlike 0.00 0.36 0.00 0.00
## partlike 0.00 0.15 0.00 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
Run the multiple regression without interactions.
out2 <- lm(partlike ~ age + ses + guarlike, data=ex2)
summary(out2)
##
## Call:
## lm(formula = partlike ~ age + ses + guarlike, data = ex2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.5429 -4.7012 -0.3549 5.8684 15.6513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.49941 5.32750 5.349 5.98e-07 ***
## age 0.21546 0.08404 2.564 0.0119 *
## ses 0.70153 0.46733 1.501 0.1366
## guarlike 0.46814 0.09105 5.142 1.44e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.564 on 96 degrees of freedom
## Multiple R-squared: 0.4121, Adjusted R-squared: 0.3937
## F-statistic: 22.43 on 3 and 96 DF, p-value: 4.345e-11
Run the multiple regression with the interaction between age and ses.
out2a <- lm(partlike ~ age*ses + guarlike, data=ex2)
summary(out2a)
##
## Call:
## lm(formula = partlike ~ age * ses + guarlike, data = ex2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.3513 -5.8078 0.0796 4.8115 14.7311
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.90367 9.82356 6.098 2.3e-08 ***
## age -0.52191 0.21361 -2.443 0.016399 *
## ses -5.25065 1.66125 -3.161 0.002112 **
## guarlike 0.45095 0.08565 5.265 8.7e-07 ***
## age:ses 0.14456 0.03891 3.715 0.000343 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.105 on 95 degrees of freedom
## Multiple R-squared: 0.4866, Adjusted R-squared: 0.465
## F-statistic: 22.51 on 4 and 95 DF, p-value: 4.239e-13
The anova function compares the models with and without the interaction.
anova(out2, out2a)
## Analysis of Variance Table
##
## Model 1: partlike ~ age + ses + guarlike
## Model 2: partlike ~ age * ses + guarlike
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 96 5492.7
## 2 95 4796.0 1 696.74 13.801 0.000343 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the simple slopes of ses at each level of age.
sim_slopes(out2a, pred="ses", modx="age")
## JOHNSON-NEYMAN INTERVAL
##
## When age is OUTSIDE the interval [26.38, 42.38], the slope of ses is p <
## .05.
##
## Note: The range of observed values of age is [17.90, 64.49]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of ses when age = 32.57988 (- 1 SD):
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.54 0.55 -0.98 0.33
##
## Slope of ses when age = 41.66498 (Mean):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.77 0.44 1.76 0.08
##
## Slope of ses when age = 50.75007 (+ 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 2.09 0.58 3.62 0.00
Plot the interaction between ses and age.
interact_plot(out2a, pred="ses", modx="age")
Read the data set
ex3 <- read.table("lecture3ex3.csv", sep=",", header=TRUE)
Find the descriptive statistics of the variables in the data set
describe(ex3)
## vars n mean sd median trimmed mad min max range skew
## latemins 1 150 3.31 2.51 3.00 3.04 2.97 0.00 12.00 12.00 1.05
## country* 2 150 2.00 0.82 2.00 2.00 1.48 1.00 3.00 2.00 0.00
## angry 3 150 36.18 9.48 34.92 35.37 8.76 19.68 73.74 54.05 0.88
## kurtosis se
## latemins 1.12 0.21
## country* -1.52 0.07
## angry 1.20 0.77
Find the frequency table of country.
table(ex3[,"country"])
##
## chinese english thai
## 50 50 50
Test whether each correlation in a correlation matrix (only between continuous variables) is significantly different from 0.
corr.test(ex3[,c("latemins", "angry")])
## Call:corr.test(x = ex3[, c("latemins", "angry")])
## Correlation matrix
## latemins angry
## latemins 1.00 0.53
## angry 0.53 1.00
## Sample Size
## [1] 150
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## latemins angry
## latemins 0 0
## angry 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
Convert country to a factor and set the reference group to thai.
ex3[,"country"] <- factor(ex3[,"country"])
ex3[,"country"] <- relevel(ex3[,"country"], ref="thai")
Run the multiple regression without interactions.
out3 <- lm(angry ~ latemins + country, data=ex3)
summary(out3)
##
## Call:
## lm(formula = angry ~ latemins + country, data = ex3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.0516 -4.0123 -0.2607 4.2589 18.5488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.3328 1.2353 21.318 < 2e-16 ***
## latemins 1.9676 0.2266 8.685 6.98e-15 ***
## countrychinese 0.8071 1.3907 0.580 0.563
## countryenglish 9.1790 1.3907 6.600 7.10e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.953 on 146 degrees of freedom
## Multiple R-squared: 0.4733, Adjusted R-squared: 0.4625
## F-statistic: 43.73 on 3 and 146 DF, p-value: < 2.2e-16
Run the multiple regression with the interaction between latemins and country. You will see that the dummy variables are multiplied by the continuous variable automatically.
out3a <- lm(angry ~ latemins*country, data=ex3)
summary(out3a)
##
## Call:
## lm(formula = angry ~ latemins * country, data = ex3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.6988 -4.5530 0.1675 3.7784 16.0432
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.5190 1.5485 19.063 < 2e-16 ***
## latemins 1.0021 0.3826 2.619 0.00976 **
## countrychinese -0.2551 2.0873 -0.122 0.90290
## countryenglish 0.2512 2.1779 0.115 0.90833
## latemins:countrychinese 0.3140 0.5051 0.622 0.53516
## latemins:countryenglish 2.6642 0.5304 5.023 1.48e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.339 on 144 degrees of freedom
## Multiple R-squared: 0.5682, Adjusted R-squared: 0.5532
## F-statistic: 37.9 on 5 and 144 DF, p-value: < 2.2e-16
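To see the dummy coding that lm builds behind the scenes, you can inspect the model matrix. This is only an illustrative sketch of the automatic coding:

```r
# With thai as the reference group, lm creates countrychinese and
# countryenglish dummy columns and multiplies each by latemins.
head(model.matrix(angry ~ latemins*country, data=ex3))
```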
The anova function compares the models with and without the interaction.
anova(out3, out3a)
## Analysis of Variance Table
##
## Model 1: angry ~ latemins + country
## Model 2: angry ~ latemins * country
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 7058.5
## 2 144 5786.4 2 1272.1 15.828 6.113e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the simple slopes of latemins in each country.
sim_slopes(out3a, pred="latemins", modx="country")
## Warning: Johnson-Neyman intervals are not available for factor moderators.
## SIMPLE SLOPES ANALYSIS
##
## Slope of latemins when country = english:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 3.67 0.37 9.98 0.00
##
## Slope of latemins when country = chinese:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.32 0.33 3.99 0.00
##
## Slope of latemins when country = thai:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.00 0.38 2.62 0.01
Plot the interaction between latemins and country.
interact_plot(out3a, pred="latemins", modx="country")
The interactions package can test the simple slopes of a continuous variable at each level of a categorical variable, but it cannot examine whether the differences between countries are significant at a given number of late minutes. You need to adjust the lm formula to test them.
First, check whether the countries differ at latemins = 0. One model keeps country in the lm formula to allow the countries to differ, and another model drops country from the formula, which constrains the differences between countries to zero. Then use anova to test the differences.
Note that latemins:country is the interaction effect alone, whereas latemins*country includes the first-order terms and their interaction; that is, latemins, country, and latemins:country.
out3a <- lm(angry ~ latemins*country, data=ex3)
out3ax <- lm(angry ~ latemins + latemins:country, data=ex3)
anova(out3ax, out3a)
## Analysis of Variance Table
##
## Model 1: angry ~ latemins + latemins:country
## Model 2: angry ~ latemins * country
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 5788.8
## 2 144 5786.4 2 2.4004 0.0299 0.9706
Next, test the differences between countries at three minutes late. Use centering to test the differences.
out3a3 <- lm(angry ~ I(latemins - 3)*country, data=ex3)
out3a3x <- lm(angry ~ I(latemins - 3) + I(latemins - 3):country, data=ex3)
anova(out3a3x, out3a3)
## Analysis of Variance Table
##
## Model 1: angry ~ I(latemins - 3) + I(latemins - 3):country
## Model 2: angry ~ I(latemins - 3) * country
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 7836.4
## 2 144 5786.4 2 2050 25.508 3.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, test the differences between countries at six minutes late. Use centering to test the differences.
out3a6 <- lm(angry ~ I(latemins - 6)*country, data=ex3)
out3a6x <- lm(angry ~ I(latemins - 6) + I(latemins - 6):country, data=ex3)
anova(out3a6x, out3a6)
## Analysis of Variance Table
##
## Model 1: angry ~ I(latemins - 6) + I(latemins - 6):country
## Model 2: angry ~ I(latemins - 6) * country
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 9467.6
## 2 144 5786.4 2 3681.2 45.805 4.019e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, test the differences between countries at nine minutes late. Use centering to test the differences.
out3a9 <- lm(angry ~ I(latemins - 9)*country, data=ex3)
out3a9x <- lm(angry ~ I(latemins - 9) + I(latemins - 9):country, data=ex3)
anova(out3a9x, out3a9)
## Analysis of Variance Table
##
## Model 1: angry ~ I(latemins - 9) + I(latemins - 9):country
## Model 2: angry ~ I(latemins - 9) * country
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 8589.9
## 2 144 5786.4 2 2803.5 34.884 4.429e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, test the differences between countries at twelve minutes late. Use centering to test the differences.
out3a12 <- lm(angry ~ I(latemins - 12)*country, data=ex3)
out3a12x <- lm(angry ~ I(latemins - 12) + I(latemins - 12):country, data=ex3)
anova(out3a12x, out3a12)
## Analysis of Variance Table
##
## Model 1: angry ~ I(latemins - 12) + I(latemins - 12):country
## Model 2: angry ~ I(latemins - 12) * country
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 8107.5
## 2 144 5786.4 2 2321.1 28.882 2.841e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Read the data set
ex4 <- read.table("lecture3ex4.csv", sep=",", header=TRUE)
Find the descriptive statistics of the variables in the data set
describe(ex4)
## vars n mean sd median trimmed mad min max range skew
## fearprob 1 50 0.59 0.28 0.61 0.6 0.32 0.01 0.99 0.98 -0.36
## cigarette 2 50 14.26 6.94 15.00 14.2 5.93 0.00 36.00 36.00 0.23
## kurtosis se
## fearprob -0.95 0.04
## cigarette 0.56 0.98
Test whether each correlation in a correlation matrix (only between continuous variables) is significantly different from 0.
corr.test(ex4)
## Call:corr.test(x = ex4)
## Correlation matrix
## fearprob cigarette
## fearprob 1.00 -0.59
## cigarette -0.59 1.00
## Sample Size
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## fearprob cigarette
## fearprob 0 0
## cigarette 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
Run the regression without the second-order term.
out4 <- lm(cigarette ~ fearprob, data=ex4)
summary(out4)
##
## Call:
## lm(formula = cigarette ~ fearprob, data = ex4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.0786 -3.2576 -0.7166 2.7475 13.2565
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.897 1.898 12.067 3.82e-16 ***
## fearprob -14.699 2.926 -5.024 7.44e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.679 on 48 degrees of freedom
## Multiple R-squared: 0.3446, Adjusted R-squared: 0.3309
## F-statistic: 25.24 on 1 and 48 DF, p-value: 7.436e-06
Run the multiple regression with the second-order term.
out4a <- lm(cigarette ~ fearprob + I(fearprob^2), data=ex4)
summary(out4a)
##
## Call:
## lm(formula = cigarette ~ fearprob + I(fearprob^2), data = ex4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.6048 -3.8607 0.4359 3.3665 9.0118
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.920 2.645 11.311 5.18e-15 ***
## fearprob -50.608 10.650 -4.752 1.94e-05 ***
## I(fearprob^2) 33.467 9.617 3.480 0.00109 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.117 on 47 degrees of freedom
## Multiple R-squared: 0.4789, Adjusted R-squared: 0.4567
## F-statistic: 21.6 on 2 and 47 DF, p-value: 2.228e-07
The anova function compares the models with and without the second-order term.
anova(out4, out4a)
## Analysis of Variance Table
##
## Model 1: cigarette ~ fearprob
## Model 2: cigarette ~ fearprob + I(fearprob^2)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 48 1547.8
## 2 47 1230.7 1 317.12 12.111 0.001093 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot the curvilinear relationship.
with(ex4, plot(fearprob, cigarette, xlab="Probability of the Fear Advertisement", ylab="Number of Cigarettes Smoked"))
myorder <- order(ex4[,"fearprob"])
yhat <- predict(out4a)
lines(ex4[myorder,"fearprob"], yhat[myorder])
Probe the instantaneous rate of change when the probability of the fear advertisement is 20%.
out4a2 <- lm(cigarette ~ I(fearprob - 0.2) + I((fearprob - 0.2)^2), data=ex4)
summary(out4a2)
##
## Call:
## lm(formula = cigarette ~ I(fearprob - 0.2) + I((fearprob - 0.2)^2),
## data = ex4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.6048 -3.8607 0.4359 3.3665 9.0118
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.138 1.297 16.293 < 2e-16 ***
## I(fearprob - 0.2) -37.221 6.988 -5.326 2.77e-06 ***
## I((fearprob - 0.2)^2) 33.467 9.617 3.480 0.00109 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.117 on 47 degrees of freedom
## Multiple R-squared: 0.4789, Adjusted R-squared: 0.4567
## F-statistic: 21.6 on 2 and 47 DF, p-value: 2.228e-07
Probe the instantaneous rate of change when the probability of the fear advertisement is 70%.
out4a7 <- lm(cigarette ~ I(fearprob - 0.7) + I((fearprob - 0.7)^2), data=ex4)
summary(out4a7)
##
## Call:
## lm(formula = cigarette ~ I(fearprob - 0.7) + I((fearprob - 0.7)^2),
## data = ex4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.6048 -3.8607 0.4359 3.3665 9.0118
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.8939 0.9241 11.788 1.22e-15 ***
## I(fearprob - 0.7) -3.7540 4.1040 -0.915 0.36501
## I((fearprob - 0.7)^2) 33.4669 9.6167 3.480 0.00109 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.117 on 47 degrees of freedom
## Multiple R-squared: 0.4789, Adjusted R-squared: 0.4567
## F-statistic: 21.6 on 2 and 47 DF, p-value: 2.228e-07
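The centered refits can be checked against calculus: for the quadratic b0 + b1*x + b2*x^2, the instantaneous rate of change at x is b1 + 2*b2*x. A quick sketch using the coefficients of the uncentered model out4a:

```r
# Derivative of the fitted quadratic: d/dx = b1 + 2*b2*x
b <- coef(out4a)
b["fearprob"] + 2 * b["I(fearprob^2)"] * 0.2  # about -37.22, matching out4a2
b["fearprob"] + 2 * b["I(fearprob^2)"] * 0.7  # about -3.75, matching out4a7
```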
Examine the assumptions by applying the plot function to the output.
plot(out3)
Use the car package, which provides useful functions for checking assumptions. First, check for outliers.
library(car)
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.3
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
outlierTest(out3)
## No Studentized residuals with Bonferroni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferroni p
## 108 2.826432 0.005372 0.80579
The Q-Q plot can be used to check the normality of residuals and to spot outliers.
qqPlot(out3)
## [1] 108 110
The leverage plots help identify influential cases.
leveragePlots(out3)