Categorical (nominal) variables can be coded for multiple regression analyses in several
ways, three of which we will examine here: (1) dummy coding, (2) effect coding, and
(3) orthogonal contrast coding. The multiple regression approach to ANOVA is a
complex topic, so for purposes of introduction I will limit our discussion to cases
where all values of the categorical variable (all the groups) have the
same number of subjects (n). We will then explore how the multiple
regression analysis is run and interpreted for each type of coding.
When we add a
set of coded categorical variables to an equation which already has an interval
predictor, we are essentially doing ANCOVA where the interval predictor is a
covariate. For this reason, I have included ANCOVA output alongside the coding
methods below to show how the two analyses relate.
In coding nominal variables, the
first thing to do is to determine the number of categories the nominal variable
has. For our purposes, we will assume we have a nominal variable for zip code,
and we will assume our total sample has 10 people from each of 4 zip codes
(n=10, N=40).
For all types
of categorical coding, the number of categorical variables needed for the
regression analysis is the number of categories or groups minus 1. Since we
have 4 categories in our variable, we will need 3 recoded regression variables
to represent the one nominal variable of zip code. For each participant, we
have a score of some type which is the subject of our analysis (the criterion
variable), and we have a rating of some type which is an interval-level predictor
variable.
The data for this example is
shown below:
| Participant # | Zip Code | Label | Rating | Score |
| 1 | 10023 | 1 | 32 | 11 |
| 2 | 10023 | 1 | 51 | 14 |
| 3 | 10023 | 1 | 33 | 13 |
| 4 | 10023 | 1 | 39 | 13 |
| 5 | 10023 | 1 | 29 | 8 |
| 6 | 10023 | 1 | 23 | 8 |
| 7 | 10023 | 1 | 48 | 13 |
| 8 | 10023 | 1 | 77 | 20 |
| 9 | 10023 | 1 | 21 | 13 |
| 10 | 10023 | 1 | 42 | 13 |
| 11 | 43229 | 2 | 47 | 22 |
| 12 | 43229 | 2 | 35 | 19 |
| 13 | 43229 | 2 | 78 | 28 |
| 14 | 43229 | 2 | 22 | 20 |
| 15 | 43229 | 2 | 42 | 21 |
| 16 | 43229 | 2 | 28 | 17 |
| 17 | 43229 | 2 | 33 | 18 |
| 18 | 43229 | 2 | 77 | 26 |
| 19 | 43229 | 2 | 48 | 26 |
| 20 | 43229 | 2 | 63 | 25 |
| 21 | 82673 | 3 | 52 | 12 |
| 22 | 82673 | 3 | 38 | 9 |
| 23 | 82673 | 3 | 85 | 17 |
| 24 | 82673 | 3 | 44 | 11 |
| 25 | 82673 | 3 | 45 | 12 |
| 26 | 82673 | 3 | 53 | 17 |
| 27 | 82673 | 3 | 50 | 11 |
| 28 | 82673 | 3 | 15 | 11 |
| 29 | 82673 | 3 | 63 | 14 |
| 30 | 82673 | 3 | 41 | 9 |
| 31 | 75428 | 4 | 55 | 13 |
| 32 | 75428 | 4 | 56 | 17 |
| 33 | 75428 | 4 | 28 | 8 |
| 34 | 75428 | 4 | 34 | 12 |
| 35 | 75428 | 4 | 26 | 10 |
| 36 | 75428 | 4 | 28 | 12 |
| 37 | 75428 | 4 | 26 | 9 |
| 38 | 75428 | 4 | 60 | 17 |
| 39 | 75428 | 4 | 69 | 18 |
| 40 | 75428 | 4 | 40 | 9 |
Dummy Coding
1. Coding
In dummy coding for our data, we have k = 4 categories for zip code, so we need
k - 1 = 3 dummy variables. We will name these variables D1, D2, and D3. Each of the
four zip code categories except one is assigned to one of the 3 dummy variables.
It makes no difference how this assignment is made, other than the choice of the
reference category. The unassigned category is the reference category, and the
multiple regression results are easiest to interpret as comparisons of each of
the other three categories with the reference category. Each dummy variable
is then coded "1" for the cases in its assigned category and "0" for all other
cases. In our case, I have chosen Commerce (zip code 75428, label 4) as the
reference category, so that our MR results will allow us to compare
the other zip codes to our own zip code. The dummy variable coding for this
analysis is shown below:
| Participant # | Zip Code | Label | Rating | Score | Dummy Variables | | |
| | | | | | D1 | D2 | D3 |
| 1 | 10023 | 1 | 32 | 11 | 1 | 0 | 0 |
| 2 | 10023 | 1 | 51 | 14 | 1 | 0 | 0 |
| 3 | 10023 | 1 | 33 | 13 | 1 | 0 | 0 |
| 4 | 10023 | 1 | 39 | 13 | 1 | 0 | 0 |
| 5 | 10023 | 1 | 29 | 8 | 1 | 0 | 0 |
| 6 | 10023 | 1 | 23 | 8 | 1 | 0 | 0 |
| 7 | 10023 | 1 | 48 | 13 | 1 | 0 | 0 |
| 8 | 10023 | 1 | 77 | 20 | 1 | 0 | 0 |
| 9 | 10023 | 1 | 21 | 13 | 1 | 0 | 0 |
| 10 | 10023 | 1 | 42 | 13 | 1 | 0 | 0 |
| 11 | 43229 | 2 | 47 | 22 | 0 | 1 | 0 |
| 12 | 43229 | 2 | 35 | 19 | 0 | 1 | 0 |
| 13 | 43229 | 2 | 78 | 28 | 0 | 1 | 0 |
| 14 | 43229 | 2 | 22 | 20 | 0 | 1 | 0 |
| 15 | 43229 | 2 | 42 | 21 | 0 | 1 | 0 |
| 16 | 43229 | 2 | 28 | 17 | 0 | 1 | 0 |
| 17 | 43229 | 2 | 33 | 18 | 0 | 1 | 0 |
| 18 | 43229 | 2 | 77 | 26 | 0 | 1 | 0 |
| 19 | 43229 | 2 | 48 | 26 | 0 | 1 | 0 |
| 20 | 43229 | 2 | 63 | 25 | 0 | 1 | 0 |
| 21 | 82673 | 3 | 52 | 12 | 0 | 0 | 1 |
| 22 | 82673 | 3 | 38 | 9 | 0 | 0 | 1 |
| 23 | 82673 | 3 | 85 | 17 | 0 | 0 | 1 |
| 24 | 82673 | 3 | 44 | 11 | 0 | 0 | 1 |
| 25 | 82673 | 3 | 45 | 12 | 0 | 0 | 1 |
| 26 | 82673 | 3 | 53 | 17 | 0 | 0 | 1 |
| 27 | 82673 | 3 | 50 | 11 | 0 | 0 | 1 |
| 28 | 82673 | 3 | 15 | 11 | 0 | 0 | 1 |
| 29 | 82673 | 3 | 63 | 14 | 0 | 0 | 1 |
| 30 | 82673 | 3 | 41 | 9 | 0 | 0 | 1 |
| 31 | 75428 | 4 | 55 | 13 | 0 | 0 | 0 |
| 32 | 75428 | 4 | 56 | 17 | 0 | 0 | 0 |
| 33 | 75428 | 4 | 28 | 8 | 0 | 0 | 0 |
| 34 | 75428 | 4 | 34 | 12 | 0 | 0 | 0 |
| 35 | 75428 | 4 | 26 | 10 | 0 | 0 | 0 |
| 36 | 75428 | 4 | 28 | 12 | 0 | 0 | 0 |
| 37 | 75428 | 4 | 26 | 9 | 0 | 0 | 0 |
| 38 | 75428 | 4 | 60 | 17 | 0 | 0 | 0 |
| 39 | 75428 | 4 | 69 | 18 | 0 | 0 | 0 |
| 40 | 75428 | 4 | 40 | 9 | 0 | 0 | 0 |
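The coding rule described above is mechanical, so it can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS workflow; the function name `dummy_code` and its arguments are hypothetical names chosen for this sketch.

```python
def dummy_code(labels, reference):
    """Return k-1 dummy values (0/1) per case; the reference group is all zeros."""
    # Every group except the reference gets its own dummy variable.
    groups = [g for g in sorted(set(labels)) if g != reference]
    return [[1 if lab == g else 0 for g in groups] for lab in labels]


# Labels 1-4 with 10 participants each; label 4 (Commerce) is the reference.
labels = [g for g in (1, 2, 3, 4) for _ in range(10)]
codes = dummy_code(labels, reference=4)

print(codes[0])   # participant 1 (label 1): D1, D2, D3
print(codes[10])  # participant 11 (label 2)
print(codes[39])  # participant 40 (label 4, the reference group)
```

Choosing a different `reference` value changes only which group receives all zeros, which is the one assignment decision that matters for interpretation.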
2. Analysis
The analysis is carried out as a sequential regression. The predictors are entered
in two separate blocks in the SPSS Multiple Linear Regression menu. The
"dependent" variable is SCORE. The RATING variable is placed in the
first block, then all three dummy variables (D1, D2, and D3) are placed in the
second block. Also, be sure to request "R-square Change" and
Descriptives. The output is shown below.
Regression
Notes
| Output Created | | 15-OCT-2006 03:28:37 |
| Comments | | |
| Input | Filter | <none> |
| | Weight | <none> |
| | Split File | <none> |
| | N of Rows in Working Data File | 78 |
| Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |
| | Cases Used | Statistics are based on cases with no missing values for any variable used. |
| Syntax | | REGRESSION /DESCRIPTIVES MEAN STDDEV CORR SIG N /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA CHANGE /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT score /METHOD=ENTER rating /METHOD=ENTER D1 D2 D3 . |
| Resources | Elapsed Time | 0:00:00.03 |
| | Memory Required | 2396 bytes |
| | Additional Memory Required for Residual Plots | 0 bytes |
[DataSet0]
Descriptive Statistics
| | Mean | Std. Deviation | N |
| score | 14.9000 | 5.41034 | 40 |
| rating | 44.4000 | 17.35423 | 40 |
| D1 | .2500 | .43853 | 40 |
| D2 | .2500 | .43853 | 40 |
| D3 | .2500 | .43853 | 40 |
Correlations
| | | score | rating | D1 | D2 | D3 |
| Pearson Correlation | score | 1.000 | .571 | -.249 | .789 | -.281 |
| | rating | .571 | 1.000 | -.165 | .098 | .142 |
| | D1 | -.249 | -.165 | 1.000 | -.333 | -.333 |
| | D2 | .789 | .098 | -.333 | 1.000 | -.333 |
| | D3 | -.281 | .142 | -.333 | -.333 | 1.000 |
| Sig. (1-tailed) | score | . | .000 | .061 | .000 | .040 |
| | rating | .000 | . | .154 | .274 | .192 |
| | D1 | .061 | .154 | . | .018 | .018 |
| | D2 | .000 | .274 | .018 | . | .018 |
| | D3 | .040 | .192 | .018 | .018 | . |
| N | score | 40 | 40 | 40 | 40 | 40 |
| | rating | 40 | 40 | 40 | 40 | 40 |
| | D1 | 40 | 40 | 40 | 40 | 40 |
| | D2 | 40 | 40 | 40 | 40 | 40 |
| | D3 | 40 | 40 | 40 | 40 | 40 |
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | rating(a) | . | Enter |
| 2 | D2, D1, D3(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: score
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Change Statistics | | | | |
| | | | | | R Square Change | F Change | df1 | df2 | Sig. F Change |
| 1 | .571(a) | .326 | .309 | 4.49894 | .326 | 18.402 | 1 | 38 | .000 |
| 2 | .940(b) | .883 | .870 | 1.95367 | .557 | 55.504 | 3 | 35 | .000 |
a Predictors: (Constant), rating
b Predictors: (Constant), rating, D2, D1, D3
ANOVA(c)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 372.462 | 1 | 372.462 | 18.402 | .000(a) |
| | Residual | 769.138 | 38 | 20.240 | | |
| | Total | 1141.600 | 39 | | | |
| 2 | Regression | 1008.011 | 4 | 252.003 | 66.024 | .000(b) |
| | Residual | 133.589 | 35 | 3.817 | | |
| | Total | 1141.600 | 39 | | | |
a Predictors: (Constant), rating
b Predictors: (Constant), rating, D2, D1, D3
c Dependent Variable: score
Coefficients(a)
| Model | | Unstandardized Coefficients | | Standardized Coefficients | t | Sig. |
| | | B | Std. Error | Beta | | |
| 1 | (Constant) | 6.993 | 1.976 | | 3.540 | .001 |
| | rating | .178 | .042 | .571 | 4.290 | .000 |
| 2 | (Constant) | 5.627 | .994 | | 5.659 | .000 |
| | rating | .163 | .018 | .522 | 8.821 | .000 |
| | D1 | .540 | .875 | .044 | .617 | .541 |
| | D2 | 8.869 | .879 | .719 | 10.093 | .000 |
| | D3 | -1.242 | .882 | -.101 | -1.409 | .168 |
a Dependent Variable: score
Excluded Variables(b)
| Model | | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics |
| | | | | | | Tolerance |
| 1 | D1 | -.159(a) | -1.181 | .245 | -.191 | .973 |
| | D2 | .740(a) | 12.375 | .000 | .897 | .990 |
| | D3 | -.369(a) | -3.025 | .005 | -.445 | .980 |
a Predictors in the Model: (Constant), rating
b Dependent Variable: score
Univariate Analysis of Variance
(ANCOVA with RATING as a covariate -- included for illustration of adjusted means)
Notes
| Output Created | | 15-OCT-2006 01:19:00 |
| Comments | | |
| Input | Filter | <none> |
| | Weight | <none> |
| | Split File | <none> |
| | N of Rows in Working Data File | 78 |
| Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |
| | Cases Used | Statistics are based on all cases with valid data for all variables in the model. |
| Syntax | | UNIANOVA score BY label WITH rating /METHOD = SSTYPE(3) /INTERCEPT = INCLUDE /EMMEANS = TABLES(label) WITH(rating=MEAN) /CRITERIA = ALPHA(.05) /DESIGN = rating label . |
| Resources | Elapsed Time | 0:00:00.03 |
Between-Subjects Factors
| | | N |
| label | 1.00 | 10 |
| | 2.00 | 10 |
| | 3.00 | 10 |
| | 4.00 | 10 |
Tests of Between-Subjects Effects
Dependent Variable: score
| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 1008.011(a) | 4 | 252.003 | 66.024 | .000 |
| Intercept | 292.471 | 1 | 292.471 | 76.627 | .000 |
| rating | 297.011 | 1 | 297.011 | 77.816 | .000 |
| label | 635.549 | 3 | 211.850 | 55.504 | .000 |
| Error | 133.589 | 35 | 3.817 | | |
| Total | 10022.000 | 40 | | | |
| Corrected Total | 1141.600 | 39 | | | |
a R Squared = .883 (Adjusted R Squared = .870)
Estimated Marginal Means
label
Dependent Variable: score
| label | Mean | Std. Error | 95% Confidence Interval | |
| | | | Lower Bound | Upper Bound |
| 1.00 | 13.398(a) | .624 | 12.130 | 14.666 |
| 2.00 | 21.728(a) | .620 | 20.469 | 22.987 |
| 3.00 | 11.616(a) | .623 | 10.352 | 12.880 |
| 4.00 | 12.858(a) | .619 | 11.601 | 14.115 |
a Covariates appearing in the model are evaluated at the following values: rating = 44.4000.
The significant R² change which results from
adding the block of dummy variables to the equation can be interpreted to mean
that zip code adds significantly to the prediction of SCORE. The
dummy variable D2 has a significant coefficient. To me, that means that knowing
whether you are in Commerce versus zip 43229 (label 2) adds significantly to
the prediction. Notice that the unstandardized regression coefficient for D2 is
8.869. Now look at the difference between the estimated marginal means (given in
the illustrative ANCOVA output) for the group with label 2 (the group coded by
dummy variable D2) and Commerce (label 4): 21.728 - 12.858 = 8.870, which matches
the coefficient within rounding. In general, the unstandardized
regression coefficient for each dummy variable is the difference between the
estimated (adjusted) ANCOVA means of the dummy group and the reference group,
controlling for RATING.
None of the other comparisons
with Commerce (75428, label 4) is significant. From looking at the adjusted means,
it is clear why: the means for labels 1 and 3 are close to the Commerce mean.
Note that this method of coding only allows us to determine whether
zip code as a whole adds to prediction; beyond that, it only allows us
to see whether each zip code differs from Commerce (75428, label 4).
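The link between the dummy coefficients and the adjusted means can be checked outside SPSS. The following is a minimal sketch in plain Python, assuming the 40 cases from the data table above; the helper `ols` (a small normal-equations solver) and all variable names are hypothetical and are not part of the SPSS procedure.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def mean(values):
    return sum(values) / len(values)


rating = [32, 51, 33, 39, 29, 23, 48, 77, 21, 42,
          47, 35, 78, 22, 42, 28, 33, 77, 48, 63,
          52, 38, 85, 44, 45, 53, 50, 15, 63, 41,
          55, 56, 28, 34, 26, 28, 26, 60, 69, 40]
score = [11, 14, 13, 13, 8, 8, 13, 20, 13, 13,
         22, 19, 28, 20, 21, 17, 18, 26, 26, 25,
         12, 9, 17, 11, 12, 17, 11, 11, 14, 9,
         13, 17, 8, 12, 10, 12, 9, 17, 18, 9]
label = [g for g in (1, 2, 3, 4) for _ in range(10)]

# Model 2: constant, RATING, and dummies D1-D3 (label 4 is the reference).
X = [[1.0, r, float(g == 1), float(g == 2), float(g == 3)]
     for r, g in zip(rating, label)]
b0, b_rating, b_d1, b_d2, b_d3 = ols(X, score)

# Adjusted mean: group mean of SCORE moved to the grand mean of RATING.
grand_rating = mean(rating)
adjusted = {}
for g in (1, 2, 3, 4):
    xs = [r for r, lab in zip(rating, label) if lab == g]
    ys = [s for s, lab in zip(score, label) if lab == g]
    adjusted[g] = mean(ys) - b_rating * (mean(xs) - grand_rating)

for g, b in zip((1, 2, 3), (b_d1, b_d2, b_d3)):
    diff = adjusted[g] - adjusted[4]
    print(f"D{g}: B = {b:.3f}, adjusted mean difference = {diff:.3f}")
```

Each printed coefficient should agree with the corresponding difference in adjusted means, which is the relationship described in the text.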
Effect Coding
1. Coding
The
process for effect coding is the same as for dummy coding, except that the last
category is coded -1 on every code variable instead of 0. With equal group sizes,
this produces a situation where the unstandardized
regression coefficient is the difference between the mean criterion score for
the coded group and the overall mean criterion score, controlling for the
variance which RATING contributes to prediction. An alternative way of looking
at this is that the unstandardized regression coefficient is the adjusted mean
for the group minus the grand mean in an ANCOVA where
RATING is a covariate. It also allows for further examination of differences
between groups (which is conceptualized as differences between regression
coefficient values for the effect code variables).
| Participant # | Zip Code | Label | Rating | Score | Effect Code Variables | | |
| | | | | | E1 | E2 | E3 |
| 1 | 10023 | 1 | 32 | 11 | 1 | 0 | 0 |
| 2 | 10023 | 1 | 51 | 14 | 1 | 0 | 0 |
| 3 | 10023 | 1 | 33 | 13 | 1 | 0 | 0 |
| 4 | 10023 | 1 | 39 | 13 | 1 | 0 | 0 |
| 5 | 10023 | 1 | 29 | 8 | 1 | 0 | 0 |
| 6 | 10023 | 1 | 23 | 8 | 1 | 0 | 0 |
| 7 | 10023 | 1 | 48 | 13 | 1 | 0 | 0 |
| 8 | 10023 | 1 | 77 | 20 | 1 | 0 | 0 |
| 9 | 10023 | 1 | 21 | 13 | 1 | 0 | 0 |
| 10 | 10023 | 1 | 42 | 13 | 1 | 0 | 0 |
| 11 | 43229 | 2 | 47 | 22 | 0 | 1 | 0 |
| 12 | 43229 | 2 | 35 | 19 | 0 | 1 | 0 |
| 13 | 43229 | 2 | 78 | 28 | 0 | 1 | 0 |
| 14 | 43229 | 2 | 22 | 20 | 0 | 1 | 0 |
| 15 | 43229 | 2 | 42 | 21 | 0 | 1 | 0 |
| 16 | 43229 | 2 | 28 | 17 | 0 | 1 | 0 |
| 17 | 43229 | 2 | 33 | 18 | 0 | 1 | 0 |
| 18 | 43229 | 2 | 77 | 26 | 0 | 1 | 0 |
| 19 | 43229 | 2 | 48 | 26 | 0 | 1 | 0 |
| 20 | 43229 | 2 | 63 | 25 | 0 | 1 | 0 |
| 21 | 82673 | 3 | 52 | 12 | 0 | 0 | 1 |
| 22 | 82673 | 3 | 38 | 9 | 0 | 0 | 1 |
| 23 | 82673 | 3 | 85 | 17 | 0 | 0 | 1 |
| 24 | 82673 | 3 | 44 | 11 | 0 | 0 | 1 |
| 25 | 82673 | 3 | 45 | 12 | 0 | 0 | 1 |
| 26 | 82673 | 3 | 53 | 17 | 0 | 0 | 1 |
| 27 | 82673 | 3 | 50 | 11 | 0 | 0 | 1 |
| 28 | 82673 | 3 | 15 | 11 | 0 | 0 | 1 |
| 29 | 82673 | 3 | 63 | 14 | 0 | 0 | 1 |
| 30 | 82673 | 3 | 41 | 9 | 0 | 0 | 1 |
| 31 | 75428 | 4 | 55 | 13 | -1 | -1 | -1 |
| 32 | 75428 | 4 | 56 | 17 | -1 | -1 | -1 |
| 33 | 75428 | 4 | 28 | 8 | -1 | -1 | -1 |
| 34 | 75428 | 4 | 34 | 12 | -1 | -1 | -1 |
| 35 | 75428 | 4 | 26 | 10 | -1 | -1 | -1 |
| 36 | 75428 | 4 | 28 | 12 | -1 | -1 | -1 |
| 37 | 75428 | 4 | 26 | 9 | -1 | -1 | -1 |
| 38 | 75428 | 4 | 60 | 17 | -1 | -1 | -1 |
| 39 | 75428 | 4 | 69 | 18 | -1 | -1 | -1 |
| 40 | 75428 | 4 | 40 | 9 | -1 | -1 | -1 |
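The effect codes in the table above follow a simple rule, which can be sketched in Python as an illustration. The function name `effect_code` and its arguments are hypothetical names chosen for this sketch; it is not part of the SPSS analysis.

```python
def effect_code(labels, last):
    """Return k-1 effect codes per case; the `last` group is -1 on every variable."""
    groups = [g for g in sorted(set(labels)) if g != last]
    return [[-1] * len(groups) if lab == last
            else [1 if lab == g else 0 for g in groups]
            for lab in labels]


# Labels 1-4 with 10 participants each; label 4 is the category coded -1.
labels = [g for g in (1, 2, 3, 4) for _ in range(10)]
codes = effect_code(labels, last=4)

print(codes[0])   # participant 1 (label 1): E1, E2, E3
print(codes[39])  # participant 40 (label 4)

# With equal group sizes each effect variable sums to zero, which matches
# the means of .0000 for E1-E3 in the Descriptive Statistics output.
column_sums = [sum(row[j] for row in codes) for j in range(3)]
print(column_sums)
```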
2. Analysis
The
analysis is carried out in the same way as with dummy coding. First, RATING is
entered as the continuous predictor, and then all effect code variables (E1,
E2, & E3) are added as a block in a sequential test. The results from SPSS
are shown below.
Regression
Notes
| Output Created | | 15-OCT-2006 01:06:02 |
| Comments | | |
| Input | Filter | <none> |
| | Weight | <none> |
| | Split File | <none> |
| | N of Rows in Working Data File | 78 |
| Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |
| | Cases Used | Statistics are based on cases with no missing values for any variable used. |
| Syntax | | REGRESSION /DESCRIPTIVES MEAN STDDEV CORR SIG N /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA CHANGE /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT score /METHOD=ENTER rating /METHOD=ENTER E1 E2 E3 . |
| Resources | Elapsed Time | 0:00:00.05 |
| | Memory Required | 2396 bytes |
| | Additional Memory Required for Residual Plots | 0 bytes |
Descriptive Statistics
| | Mean | Std. Deviation | N |
| score | 14.9000 | 5.41034 | 40 |
| rating | 44.4000 | 17.35423 | 40 |
| E1 | .0000 | .71611 | 40 |
| E2 | .0000 | .71611 | 40 |
| E3 | .0000 | .71611 | 40 |
Correlations
| | | score | rating | E1 | E2 | E3 |
| Pearson Correlation | score | 1.000 | .571 | .007 | .642 | -.013 |
| | rating | .571 | 1.000 | -.056 | .105 | .132 |
| | E1 | .007 | -.056 | 1.000 | .500 | .500 |
| | E2 | .642 | .105 | .500 | 1.000 | .500 |
| | E3 | -.013 | .132 | .500 | .500 | 1.000 |
| Sig. (1-tailed) | score | . | .000 | .484 | .000 | .468 |
| | rating | .000 | . | .366 | .259 | .208 |
| | E1 | .484 | .366 | . | .001 | .001 |
| | E2 | .000 | .259 | .001 | . | .001 |
| | E3 | .468 | .208 | .001 | .001 | . |
| N | score | 40 | 40 | 40 | 40 | 40 |
| | rating | 40 | 40 | 40 | 40 | 40 |
| | E1 | 40 | 40 | 40 | 40 | 40 |
| | E2 | 40 | 40 | 40 | 40 | 40 |
| | E3 | 40 | 40 | 40 | 40 | 40 |
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | rating(a) | . | Enter |
| 2 | E1, E2, E3(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: score
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Change Statistics | | | | |
| | | | | | R Square Change | F Change | df1 | df2 | Sig. F Change |
| 1 | .571(a) | .326 | .309 | 4.49894 | .326 | 18.402 | 1 | 38 | .000 |
| 2 | .940(b) | .883 | .870 | 1.95367 | .557 | 55.504 | 3 | 35 | .000 |
a Predictors: (Constant), rating
b Predictors: (Constant), rating, E1, E2, E3
ANOVA(c)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 372.462 | 1 | 372.462 | 18.402 | .000(a) |
| | Residual | 769.138 | 38 | 20.240 | | |
| | Total | 1141.600 | 39 | | | |
| 2 | Regression | 1008.011 | 4 | 252.003 | 66.024 | .000(b) |
| | Residual | 133.589 | 35 | 3.817 | | |
| | Total | 1141.600 | 39 | | | |
a Predictors: (Constant), rating
b Predictors: (Constant), rating, E1, E2, E3
c Dependent Variable: score
Coefficients(a)
| Model | | Unstandardized Coefficients | | Standardized Coefficients | t | Sig. |
| | | B | Std. Error | Beta | | |
| 1 | (Constant) | 6.993 | 1.976 | | 3.540 | .001 |
| | rating | .178 | .042 | .571 | 4.290 | .000 |
| 2 | (Constant) | 7.669 | .876 | | 8.754 | .000 |
| | rating | .163 | .018 | .522 | 8.821 | .000 |
| | E1 | -1.502 | .543 | -.199 | -2.768 | .009 |
| | E2 | 6.828 | .538 | .904 | 12.698 | .000 |
| | E3 | -3.284 | .541 | -.435 | -6.075 | .000 |
a Dependent Variable: score
Excluded Variables(b)
| Model | | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics |
| | | | | | | Tolerance |
| 1 | E1 | .039(a) | .286 | .777 | .047 | .997 |
| | E2 | .588(a) | 6.182 | .000 | .713 | .989 |
| | E3 | -.090(a) | -.667 | .509 | -.109 | .983 |
a Predictors in the Model: (Constant), rating
b Dependent Variable: score
Univariate Analysis of Variance
(ANCOVA with RATING as a covariate -- included for illustration of adjusted means)
Notes
| Output Created | | 15-OCT-2006 01:19:00 |
| Comments | | |
| Input | Filter | <none> |
| | Weight | <none> |
| | Split File | <none> |
| | N of Rows in Working Data File | 78 |
| Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |
| | Cases Used | Statistics are based on all cases with valid data for all variables in the model. |
| Syntax | | UNIANOVA score BY label WITH rating /METHOD = SSTYPE(3) /INTERCEPT = INCLUDE /EMMEANS = TABLES(label) WITH(rating=MEAN) /CRITERIA = ALPHA(.05) /DESIGN = rating label . |
| Resources | Elapsed Time | 0:00:00.03 |
Between-Subjects Factors
| | | N |
| label | 1.00 | 10 |
| | 2.00 | 10 |
| | 3.00 | 10 |
| | 4.00 | 10 |
Tests of Between-Subjects Effects
Dependent Variable: score
| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 1008.011(a) | 4 | 252.003 | 66.024 | .000 |
| Intercept | 292.471 | 1 | 292.471 | 76.627 | .000 |
| rating | 297.011 | 1 | 297.011 | 77.816 | .000 |
| label | 635.549 | 3 | 211.850 | 55.504 | .000 |
| Error | 133.589 | 35 | 3.817 | | |
| Total | 10022.000 | 40 | | | |
| Corrected Total | 1141.600 | 39 | | | |
a R Squared = .883 (Adjusted R Squared = .870)
Estimated Marginal Means
label
Dependent Variable: score
| label | Mean | Std. Error | 95% Confidence Interval | |
| | | | Lower Bound | Upper Bound |
| 1.00 | 13.398(a) | .624 | 12.130 | 14.666 |
| 2.00 | 21.728(a) | .620 | 20.469 | 22.987 |
| 3.00 | 11.616(a) | .623 | 10.352 | 12.880 |
| 4.00 | 12.858(a) | .619 | 11.601 | 14.115 |
a Covariates appearing in the model are evaluated at the following values: rating = 44.4000.
Note the
adjusted marginal means in the last table of the ANCOVA output. Now, go back
and jot down the grand mean for SCORE, as well as the unstandardized regression
coefficients for E1, E2, and E3 from the regression output. Those values are
noted below:
Grand Mean for SCORE = 14.9
B_E1 = -1.502
B_E2 = 6.828
B_E3 = -3.284
Note that we can get the adjusted
means for the zip code groups, as verified in the ANCOVA output above, by adding
the B values from the regression to the grand mean for SCORE:
ZIP1 = 14.9 - 1.502 = 13.40
ZIP2 = 14.9 + 6.828 = 21.73
ZIP3 = 14.9 - 3.284 = 11.62
Since (ZIP1 + ZIP2 + ZIP3 + ZIP4)/4 = 14.9, we know that
ZIP4 = (4)14.9 - ZIP1 - ZIP2 - ZIP3
ZIP4 = 59.6 - 13.40 - 21.73 - 11.62 = 12.85
(The ANCOVA output gives 12.858 for ZIP4; the small difference is due to rounding
the other three means to two decimals.)
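The arithmetic above can be reproduced in a few lines of Python. This is only a sketch that uses the grand mean and B values reported in the regression output; the variable names are hypothetical.

```python
# Values copied from the SPSS output above.
grand_mean = 14.9
b = {"ZIP1": -1.502, "ZIP2": 6.828, "ZIP3": -3.284}

# Adjusted mean for each coded group = grand mean + its effect coefficient.
adjusted = {zip_group: grand_mean + coef for zip_group, coef in b.items()}

# With equal n the four adjusted means average to the grand mean,
# so the mean for the group coded -1 follows from the other three.
adjusted["ZIP4"] = 4 * grand_mean - sum(adjusted.values())

for zip_group, value in adjusted.items():
    print(f"{zip_group}: {value:.3f}")
```

Carrying three decimals throughout gives 12.858 for ZIP4, which agrees with the estimated marginal mean in the ANCOVA output.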
The following equation can now be used to compare the four
groups for significant differences:

Here the means in the numerator are the adjusted means for the
ZIP groups computed above, "a" is the number of groups (4 in our case),
MS'wg is the residual mean square from regression Model 2,
Y-Y is the difference between the means for RATING in the two groups (the means
can easily be obtained using the MEANS procedure in SPSS*, as shown below), SSwg(y) is the residual
sum of squares for a regression of RATING on the effect variables (easily obtained by running
that regression at the same time), and the "c" values are contrast
coefficients as we have used for post-hoc comparisons before.
*Report: group means for rating
| label | Mean | N | Std. Deviation |
| 1.00 | 39.5000 | 10 | 16.46714 |
| 2.00 | 47.3000 | 10 | 19.63019 |
| 3.00 | 48.6000 | 10 | 17.94560 |
| 4.00 | 42.2000 | 10 | 16.29451 |
| Total | 44.4000 | 40 | 17.35423 |
You should
also note with this analysis that we get exactly the same R² change
when we enter the effect codes as we got in the first analysis when we entered the
dummy codes. The overall variance accounted for does not change when the
coding method changes.
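The equality of R² under the two codings can be checked directly. The sketch below, in plain Python, assumes the 40 cases from the data table and refits the models with a small hypothetical normal-equations helper (`ols`); it also recomputes the F change for the block of code variables from the residual sums of squares. None of these names come from SPSS.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def ss_residual(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))


rating = [32, 51, 33, 39, 29, 23, 48, 77, 21, 42,
          47, 35, 78, 22, 42, 28, 33, 77, 48, 63,
          52, 38, 85, 44, 45, 53, 50, 15, 63, 41,
          55, 56, 28, 34, 26, 28, 26, 60, 69, 40]
score = [11, 14, 13, 13, 8, 8, 13, 20, 13, 13,
         22, 19, 28, 20, 21, 17, 18, 26, 26, 25,
         12, 9, 17, 11, 12, 17, 11, 11, 14, 9,
         13, 17, 8, 12, 10, 12, 9, 17, 18, 9]
label = [g for g in (1, 2, 3, 4) for _ in range(10)]

# Design matrices: RATING only, RATING + dummy codes, RATING + effect codes.
reduced = [[1.0, r] for r in rating]
dummy = [[1.0, r] + [float(g == k) for k in (1, 2, 3)]
         for r, g in zip(rating, label)]
effect = [[1.0, r] + ([-1.0] * 3 if g == 4 else [float(g == k) for k in (1, 2, 3)])
          for r, g in zip(rating, label)]

grand = sum(score) / len(score)
ss_total = sum((s - grand) ** 2 for s in score)
ss_reduced = ss_residual(reduced, score)
ss_full = ss_residual(dummy, score)

r2 = {"dummy": 1 - ss_full / ss_total,
      "effect": 1 - ss_residual(effect, score) / ss_total}
r2_change = r2["dummy"] - (1 - ss_reduced / ss_total)
# F change for the block: 3 added predictors, 40 - 5 = 35 residual df.
f_change = ((ss_reduced - ss_full) / 3) / (ss_full / 35)

print(f"R2 dummy = {r2['dummy']:.3f}, R2 effect = {r2['effect']:.3f}")
print(f"R2 change = {r2_change:.3f}, F change = {f_change:.3f}")
```

The two codings span the same set of predictors, so the fitted values, and therefore R² and the F change, are the same; only the individual coefficients differ.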
For groups
that are significantly different, the final task is to create a separate
regression equation for each group or each set of homogeneous groups.
Orthogonal Contrast Coding
1. Coding
In orthogonal
contrast coding, we create contrasts with the k-1 regression variables which
satisfy the orthogonality criterion below,
$$\sum_{j=1}^{k} c_{1j}\,c_{2j} = 0$$
where the c values are the contrast coefficients of the two contrasts being
compared and the j values are groups. Two contrasts are considered orthogonal
if the sum of the products of their contrast coefficients across all groups is
zero. Orthogonal contrasts are statistically independent contrasts, meaning they
provide unique information not dependent on other comparisons. Orthogonal
contrasts fit naturally with our system of assigning values to k-1 variables when we
have k groups, because any set of mutually orthogonal contrasts contains at most
k-1 comparisons.
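The criterion is easy to check by machine. Here is a small Python sketch; the function name `is_orthogonal` and the example coefficients (a Helmert-style set for four groups) are illustrative assumptions and are not taken from the analysis in this handout.

```python
from itertools import combinations


def is_orthogonal(c1, c2):
    """True when the products of paired contrast coefficients sum to zero."""
    return sum(a * b for a, b in zip(c1, c2)) == 0


# Hypothetical Helmert-style contrasts for k = 4 groups (k - 1 = 3 contrasts).
contrasts = {
    "C1": (1, -1, 0, 0),   # group 1 vs. group 2
    "C2": (1, 1, -2, 0),   # groups 1 and 2 vs. group 3
    "C3": (1, 1, 1, -3),   # groups 1, 2, and 3 vs. group 4
}

pairs = {(p, q): is_orthogonal(contrasts[p], contrasts[q])
         for p, q in combinations(contrasts, 2)}
for pair, ok in pairs.items():
    print(pair, "orthogonal" if ok else "not orthogonal")

# Two contrasts that share a group in the same role are not orthogonal.
overlapping = is_orthogonal((1, -1, 0, 0), (1, 0, -1, 0))
print(overlapping)
```

Integer coefficients keep the comparison with zero exact; with fractional coefficients a small tolerance would be a safer test.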
Let's consider
our effect coding contrasts to determine whether all possible pairs of the three
contrasts we used are orthogonal. We can compare E1 & E2, E1 & E3, and
E2 & E3. The contrast coefficients used in effect coding are given below:
|
|
E1
|
E2
|
E3
|
|
ZIP1
|
1
|
0
|
0
|
|
ZIP2
|
0
|
1
|
0
|
|
| |