Regression with Two Independent Variables

Objectives

Write a raw score regression equation with 2 IVs in it.

What is the difference in interpretation of b weights in simple regression vs. multiple regression?

Describe R-square in two different ways, that is, using two distinct formulas. Explain the formulas.

What happens to b weights if we add new variables to the regression equation that are highly correlated with ones already in the equation?

Why do we report beta weights (standardized b weights)?

Write a regression equation with beta weights in it.

What are the three factors that influence the standard error of the b weight?

How is it possible to have a significant R-square and non-significant b weights?


The Regression Line

With one independent variable, we may write the regression equation as:

$$Y = a + bX + e$$

where Y is an observed score on the dependent variable, a is the intercept, b is the slope, X is the observed score on the independent variable, and e is an error or residual.

We can extend this to any number of independent variables:

$$Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k + e \qquad (3.1)$$

Note that we have k independent variables and a slope for each. We still have one error and one intercept. Again we want to choose the estimates of a and b so as to minimize the sum of squared errors of prediction. The prediction equation is:

$$Y' = a + b_1X_1 + b_2X_2 + \dots + b_kX_k \qquad (3.2)$$

Finding the values of b is tricky for k>2 independent variables, and will be developed after some matrix algebra. It's simpler for k=2 IVs, which we will discuss here.

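For the one variable case, the calculation of b and a was:

$$b = \frac{\sum xy}{\sum x^2}, \qquad a = \bar{Y} - b\bar{X}$$

where lowercase x and y are deviation scores (each score minus its mean).

For the two variable case:

$$b_1 = \frac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$

and

$$b_2 = \frac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$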

At this point, you should notice that all the terms from the one variable case appear in the two variable case. In the two variable case, the other X variable also appears in the equation. For example, X2 appears in the equation for b1. Note that terms corresponding to the variance of both X variables occur in the slopes. Also note that a term corresponding to the covariance of X1 and X2 (sum of deviation cross-products) also appears in the formula for the slope.

The equation for a with two independent variables is:

$$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2$$

This equation is a straightforward generalization of the case for one independent variable.

A Numerical Example

Suppose we want to predict job performance of Chevy mechanics based on mechanical aptitude test scores and scores from a personality test that measures conscientiousness.

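| | Y (Job Perf) | X1 (Mech Apt) | X2 (Consc) | X1*Y | X2*Y | X1*X2 |
|---|---|---|---|---|---|---|
| | 1 | 40 | 25 | 40 | 25 | 1000 |
| | 2 | 45 | 20 | 90 | 40 | 900 |
| | 1 | 38 | 30 | 38 | 30 | 1140 |
| | 3 | 50 | 30 | 150 | 90 | 1500 |
| | 2 | 48 | 28 | 96 | 56 | 1344 |
| | 3 | 55 | 30 | 165 | 90 | 1650 |
| | 3 | 53 | 34 | 159 | 102 | 1802 |
| | 4 | 55 | 36 | 220 | 144 | 1980 |
| | 4 | 58 | 32 | 232 | 128 | 1856 |
| | 3 | 40 | 34 | 120 | 102 | 1360 |
| | 5 | 55 | 38 | 275 | 190 | 2090 |
| | 3 | 48 | 28 | 144 | 84 | 1344 |
| | 3 | 45 | 30 | 135 | 90 | 1350 |
| | 2 | 55 | 36 | 110 | 72 | 1980 |
| | 4 | 60 | 34 | 240 | 136 | 2040 |
| | 5 | 60 | 38 | 300 | 190 | 2280 |
| | 5 | 60 | 42 | 300 | 210 | 2520 |
| | 5 | 65 | 38 | 325 | 190 | 2470 |
| | 4 | 50 | 34 | 200 | 136 | 1700 |
| | 3 | 58 | 38 | 174 | 114 | 2204 |
| Sum | 65 | 1038 | 655 | 3513 | 2219 | 34510 |
| N | 20 | 20 | 20 | 20 | 20 | 20 |
| M | 3.25 | 51.9 | 32.75 | 175.65 | 110.95 | 1725.5 |
| SD | 1.25 | 7.58 | 5.24 | 84.33 | 54.73 | 474.60 |
| USS | 29.75 | 1091.8 | 521.75 | | | |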

We can collect the data into a matrix like this:

| | Y | X1 | X2 |
|---|---|---|---|
| Y | 29.75 | 139.5 | 90.25 |
| X1 | 0.77 | 1091.8 | 515.5 |
| X2 | 0.72 | 0.68 | 521.75 |

 

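The numbers in the table above correspond to the following sums of squares (on the diagonal), cross products (above the diagonal), and correlations (below the diagonal), where lowercase letters are deviation scores:

| | Y | X1 | X2 |
|---|---|---|---|
| Y | Σy² | Σx1y | Σx2y |
| X1 | ry1 | Σx1² | Σx1x2 |
| X2 | ry2 | r12 | Σx2² |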

 

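We can now compute the regression coefficients:

$$b_1 = \frac{(521.75)(139.5) - (515.5)(90.25)}{(1091.8)(521.75) - (515.5)^2} = \frac{26260.25}{303906.40} = .0864$$

$$b_2 = \frac{(1091.8)(90.25) - (515.5)(139.5)}{(1091.8)(521.75) - (515.5)^2} = \frac{26622.70}{303906.40} = .0876$$

To find the intercept, we have:

$$a = 3.25 - (.0864)(51.9) - (.0876)(32.75) = -4.10$$

Therefore, our regression equation is:

Y' = -4.10 + .09X1 + .09X2 or

Job Perf' = -4.10 + .09MechApt + .09Conscientiousness.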
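As a check on the hand calculations, here is a minimal Python sketch that recomputes the slopes and intercept from the raw scores. It assumes the usual two-predictor least squares formulas written in terms of deviation sums of squares and cross products; the helper names (`mean`, `sum_cross`) are mine and not part of the original notes.

```python
# Raw scores for the 20 mechanics in the example.
Y  = [1, 2, 1, 3, 2, 3, 3, 4, 4, 3, 5, 3, 3, 2, 4, 5, 5, 5, 4, 3]
X1 = [40, 45, 38, 50, 48, 55, 53, 55, 58, 40,
      55, 48, 45, 55, 60, 60, 60, 65, 50, 58]
X2 = [25, 20, 30, 30, 28, 30, 34, 36, 32, 34,
      38, 28, 30, 36, 34, 38, 42, 38, 34, 38]

def mean(v):
    return sum(v) / len(v)

def sum_cross(u, v):
    """Sum of deviation cross products; sum_cross(u, u) is a sum of squares."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

ss_x1, ss_x2 = sum_cross(X1, X1), sum_cross(X2, X2)
sp_x1y, sp_x2y = sum_cross(X1, Y), sum_cross(X2, Y)
sp_x1x2 = sum_cross(X1, X2)

# Both slopes share the same denominator.
denom = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (ss_x2 * sp_x1y - sp_x1x2 * sp_x2y) / denom
b2 = (ss_x1 * sp_x2y - sp_x1x2 * sp_x1y) / denom
a = mean(Y) - b1 * mean(X1) - b2 * mean(X2)

print(round(b1, 4), round(b2, 4), round(a, 2))
```

The printed slopes carry four decimals; rounding them to two gives the .09 and .09 in the equation above.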

Visual Representations of the Regression

We have 3 variables, so we have 3 scatterplots that show their relations.

 

Because we have computed the regression equation, we can also view a plot of Y' vs. Y, or actual vs. predicted Y.

We can (sort of) view the plot in 3D space, where the two predictors are the X and Y axes, and the Z axis is the criterion, thus:

This graph doesn't show it very well, but the regression problem can be thought of as a sort of response surface problem. What is the expected height (Z) at each value of X and Y? The linear regression solution to this problem in this dimensionality is a plane.

R-square (R²)

Just as in simple regression, the dependent variable is thought of as a linear part and an error. In multiple regression, the linear part has more than one X variable associated with it. When we do multiple regression, we can compute the proportion of variance due to regression. This proportion is called R-square. We use a capital R to show that it's a multiple R instead of a single variable r. We can also compute the correlation between Y and Y' and square that. If we do, we will also find R-square.

 

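| | Y | X1 | X2 | Y' | Resid |
|---|---|---|---|---|---|
| | 1 | 40 | 25 | 1.54 | -0.54 |
| | 2 | 45 | 20 | 1.54 | 0.46 |
| | 1 | 38 | 30 | 1.81 | -0.81 |
| | 3 | 50 | 30 | 2.84 | 0.16 |
| | 2 | 48 | 28 | 2.50 | -0.50 |
| | 3 | 55 | 30 | 3.28 | -0.28 |
| | 3 | 53 | 34 | 3.45 | -0.45 |
| | 4 | 55 | 36 | 3.80 | 0.20 |
| | 4 | 58 | 32 | 3.71 | 0.29 |
| | 3 | 40 | 34 | 2.33 | 0.67 |
| | 5 | 55 | 38 | 3.98 | 1.02 |
| | 3 | 48 | 28 | 2.50 | 0.50 |
| | 3 | 45 | 30 | 2.41 | 0.59 |
| | 2 | 55 | 36 | 3.80 | -1.80 |
| | 4 | 60 | 34 | 4.06 | -0.06 |
| | 5 | 60 | 38 | 4.41 | 0.59 |
| | 5 | 60 | 42 | 4.76 | 0.24 |
| | 5 | 65 | 38 | 4.84 | 0.16 |
| | 4 | 50 | 34 | 3.19 | 0.80 |
| | 3 | 58 | 38 | 4.24 | -1.24 |
| M | 3.25 | 51.9 | 32.75 | 3.25 | 0 |
| V | 1.57 | 57.46 | 27.46 | 1.05 | 0.52 |
| USS | 29.83 | | | 19.95 | 9.88 |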

 

The mean of Y is 3.25 and so is the mean of Y'. The mean of the residuals is 0. The variance of Y is 1.57. The variance of Y' is 1.05, and the variance of the residuals is .52. Together, the variance of regression (Y') and the variance of error (e) add up to the variance of Y (1.57 = 1.05+.52). R-square is 1.05/1.57 or .67. If we compute the correlation between Y and Y' we find that R=.82, which when squared is also an R-square of .67. (Recall the scatterplot of Y and Y'). R-square is the proportion of variance in Y due to the multiple regression.
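The two routes to R-square described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the weights come from the ordinary two-predictor least squares formulas; the function and variable names are hypothetical choices of mine.

```python
Y  = [1, 2, 1, 3, 2, 3, 3, 4, 4, 3, 5, 3, 3, 2, 4, 5, 5, 5, 4, 3]
X1 = [40, 45, 38, 50, 48, 55, 53, 55, 58, 40,
      55, 48, 45, 55, 60, 60, 60, 65, 50, 58]
X2 = [25, 20, 30, 30, 28, 30, 34, 36, 32, 34,
      38, 28, 30, 36, 34, 38, 42, 38, 34, 38]

def mean(v):
    return sum(v) / len(v)

def sum_cross(u, v):
    """Sum of deviation cross products; sum_cross(u, u) is a sum of squares."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

def fit_two_predictors(y, x1, x2):
    """Return (a, b1, b2) for y regressed on x1 and x2 by least squares."""
    ss1, ss2, sp12 = sum_cross(x1, x1), sum_cross(x2, x2), sum_cross(x1, x2)
    sp1y, sp2y = sum_cross(x1, y), sum_cross(x2, y)
    denom = ss1 * ss2 - sp12 ** 2
    b1 = (ss2 * sp1y - sp12 * sp2y) / denom
    b2 = (ss1 * sp2y - sp12 * sp1y) / denom
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2

a, b1, b2 = fit_two_predictors(Y, X1, X2)
predicted = [a + b1 * u + b2 * v for u, v in zip(X1, X2)]
residuals = [y - p for y, p in zip(Y, predicted)]

ss_total = sum_cross(Y, Y)
ss_regression = sum_cross(predicted, predicted)
ss_residual = sum(e ** 2 for e in residuals)

# Way 1: proportion of the variance in Y that is due to regression.
r2_from_variance = ss_regression / ss_total

# Way 2: squared correlation between Y and Y'.
r_y_yhat = sum_cross(Y, predicted) / (ss_total * ss_regression) ** 0.5
r2_from_correlation = r_y_yhat ** 2

print(round(r2_from_variance, 2), round(r2_from_correlation, 2))
```

Because the regression and residual sums of squares add up to the total, the two versions agree to within floating point error.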

Testing the Significance of R²

You have already seen this once, but here it is again in a new context:

$$F = \frac{R^2/k}{(1-R^2)/(N-k-1)}$$

which is distributed as F with k and (N-k-1) degrees of freedom when the null is true. Now R² is for the multiple correlation rather than the simple correlation that we saw last time. For our most recent example, we have 2 independent variables, an R² of .67, and 20 people, so

$$F = \frac{.67/2}{(1-.67)/(20-2-1)} = \frac{.335}{.01941} = 17.26$$

p < .01. (Fcrit for p < .01 is about 6.)

Because SStot = SSreg + SSres, we can compute an equivalent F using sums of squares and associated df:

$$F = \frac{SS_{reg}/k}{SS_{res}/(N-k-1)} = \frac{19.95/2}{9.88/17} = \frac{9.975}{.5812} = 17.16$$

which agrees with our earlier result within rounding error.
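Both versions of the F ratio are easy to sketch as small Python functions. The function names are hypothetical, and the inputs are the rounded values quoted in the text, so the two results differ slightly in the second decimal.

```python
def f_from_r2(r2, n, k):
    """F for the overall regression, computed from R-square."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def f_from_ss(ss_reg, ss_res, n, k):
    """The same F, computed from regression and residual sums of squares."""
    return (ss_reg / k) / (ss_res / (n - k - 1))

f_r2 = f_from_r2(0.67, n=20, k=2)
f_ss = f_from_ss(19.95, 9.88, n=20, k=2)

print(round(f_r2, 2), round(f_ss, 2))
```

Either value is well above the critical F of about 6 for 2 and 17 degrees of freedom at the .01 level.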

 

Relative Importance of the Independent Variables

In simple regression, we have one IV that accounts for a proportion of variance in Y. The influence of this variable (how important it is in predicting or explaining Y) is described by r². If r² is 1.0, we know that the DV can be predicted perfectly from the IV; all of the variance in the DV is accounted for. If r² is 0, we know that there is no linear association; the IV is not important in predicting or explaining Y. With 2 or more IVs, we also get a total R². This R² tells us how much variance in Y is accounted for by the set of IVs, that is, the importance of the linear combination of IVs (b1X1+b2X2+...+bkXk).

Often we would like to know the importance of each of the IVs in predicting or explaining Y. In our example, we know that mechanical aptitude and conscientiousness together predict about 2/3 of the variance in job performance ratings. But how important are mech apt and consc in relation to each other? Correlation and regression provide answers to this question. Unfortunately, the answers do not always agree. It is important to understand why they sometimes agree and sometimes disagree. You must understand this potential disagreement to make appropriate interpretations of regression weights.

I am going to introduce Venn diagrams first to describe what happens. You should know that Venn diagrams are not an accurate representation of how regression actually works. Venn diagrams can mislead you in your reasoning. However, most people find them much easier to grasp than the related equations, so here goes. We are going to predict Y from 2 independent variables, X1 and X2. Let's suppose that both X1 and X2 are correlated with Y, but X1 and X2 are not correlated with each other. Our diagram might look like Figure 5.1:

Figure 5.1

Figure 5.2

 

In Figure 5.1, we have three circles, one for each variable. Each circle represents the variance of the variable. The size of the (squared) correlation between two variables is indicated by the overlap in circles. Recall that the squared correlation is the proportion of shared variance between two variables. In Figure 5.1, X1 and X2 are not correlated. This is indicated by the lack of overlap in the two variables. We can compute the correlation between each X variable and Y. These correlations and their squares will indicate the relative importance of the independent variables. Figure 5.1 might correspond to a correlation matrix like this:

| R | Y | X1 | X2 |
|---|---|---|---|
| Y | 1 | | |
| X1 | .50 | 1 | |
| X2 | .60 | .00 | 1 |

 

In the case that X1 and X2 are uncorrelated, we can estimate the shared variance between the two X variables and Y by summing the squared correlations. In our example, the shared variance would be .50² + .60² = .25 + .36 = .61. This turns out to be 61 percent shared variance, and if we calculated a regression equation, we would find that R² was .61. (The calculations will be more fully developed later. For now, concentrate on the figures.) If X1 and X2 are uncorrelated, then they don't share any variance with each other. If they do share variance with Y, then whatever variance is shared with Y must be unique to that X because the X variables don't overlap.

 

On the other hand, it is usually the case that the X variables are correlated and do share some variance, as shown in Figure 5.2, where X1 and X2 overlap somewhat. Note that X1 and X2 overlap both with each other and with Y. There is a section where X1 and X2 overlap with each other but not with Y (labeled 'shared X' in Figure 5.2). There are sections where each overlaps with Y but not with the other X (labeled 'UY:X1' and 'UY:X2'). The portion on the left is the part of Y that is accounted for uniquely by X1 (UY:X1). The similar portion on the right is the part of Y accounted for uniquely by X2 (UY:X2). The last overlapping part shows that part of Y that is accounted for by both of the X variables ('shared Y').

 

Just as in Figure 5.1, we could compute the correlations between each X and Y. For X1, the correlation would include the areas UY:X1 and shared Y. For X2, the correlation would contain UY:X2 and shared Y. Note that shared Y would be counted twice, once for each X variable. We could also compute a regression equation and then compute R² based on that equation. If we did, we would find that R² corresponds to UY:X1 plus UY:X2 plus shared Y. Note that R² due to regression of Y on both X variables at once will give us the proper variance accounted for, with shared Y only being counted once.

Now we want to assign or divide up R² to the appropriate X variables in accordance with their importance. We can do this a couple of ways. Any way we do this, we will assign the unique part of Y to the appropriate X (UY:X1 goes to X1, UY:X2 goes to X2). But what to do with shared Y? The most common solution to this problem is to ignore it. If we assign regression sums of squares according to the magnitudes of the b weights, we will be assigning sums of squares to the unique portions only. The shared portion will be assigned to the overall R², but not to any of the variables that share it. (There are other ways that divvy up the shared part. We'll visit them later.)

In multiple regression, we are typically interested in predicting or explaining all the variance in Y. To do this, we need independent variables that are correlated with Y, but not with the other X variables. It's hard to find such variables, however. It is more typical to find new X variables that are correlated with old X variables and shared Y instead of unique Y. The desired vs. typical state of affairs in multiple regression can be illustrated with another Venn diagram:

Desired State (Fig 5.3)

Typical State (Fig 5.4)

Notice that in Figure 5.3, the desired state of affairs, each X variable is minimally correlated with the other X variables, but is substantially correlated with Y. In such a case, R² will be large, and the influence of each X will be unambiguous. The typical state of affairs is shown in Figure 5.4. Note how variable X3 is substantially correlated with Y, but also with X1 and X2. This means that X3 contributes nothing new or unique to the prediction of Y. It also muddies the interpretation of the importance of the X variables as it is difficult to assign shared variance in Y to any X.

Standardized & Unstandardized Weights (b vs. β)

Each X variable will have associated with it one slope or regression weight. Each weight is interpreted to mean the unit change in Y given a unit change in X, so the slope can tell us something about the importance of the X variables. (Strictly speaking, the statement about the interpretation of the slope isn't true without mentioning the other X variables. But it's close enough until we get to partial correlations.) Variables with large b weights ought to be the more important ones, because Y changes more rapidly for them than for the others. The problem with unstandardized or raw score b weights in this regard is that they have different units of measurement, and thus different standard deviations and different meanings. If we measured X = height in feet rather than X = height in inches, the b weight for feet would be 12 times larger than the b for inches (12 inches in a foot; in both cases we interpret b as the unit change in Y when X changes 1 unit). So when we measure different X variables in different units, part of the size of b is attributable to units rather than importance per se.

So what we can do is to standardize all the variables (both X and Y, each X in turn). If we do that, then the importance of the X variables will be readily apparent by the size of the weights; all will be interpreted as the number of standard deviations that Y changes when each X changes 1 standard deviation. The standardized slopes are called beta (β) weights. This is an extremely poor choice of symbols, because we have already used β to mean the population value of b (don't blame me; this is part of the literature). From here out, β will refer to standardized b weights, that is, to estimates of parameters, unless otherwise noted.
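The feet versus inches point can be illustrated with a small Python sketch. The heights and weights below are made-up numbers used only for illustration, and the helper names are hypothetical; the sketch uses simple regression so that the standardized slope is just b times the ratio of standard deviations.

```python
def mean(v):
    return sum(v) / len(v)

def sum_cross(u, v):
    """Sum of deviation cross products; sum_cross(u, u) is a sum of squares."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

def slope(x, y):
    """Raw score b weight for y regressed on a single x."""
    return sum_cross(x, y) / sum_cross(x, x)

def sd(v):
    return (sum_cross(v, v) / (len(v) - 1)) ** 0.5

# Hypothetical data: height in inches and weight in pounds.
height_in = [60, 62, 65, 68, 70, 72, 75]
weight_lb = [115, 120, 140, 155, 160, 180, 200]
height_ft = [h / 12 for h in height_in]

b_inches = slope(height_in, weight_lb)
b_feet = slope(height_ft, weight_lb)

# Standardized slope: b rescaled by the ratio of standard deviations.
beta_inches = b_inches * sd(height_in) / sd(weight_lb)
beta_feet = b_feet * sd(height_ft) / sd(weight_lb)

print(round(b_feet / b_inches, 2), round(beta_inches - beta_feet, 6))
```

The raw slopes differ by a factor of 12 even though the data are the same, while the standardized slope does not change with the unit of measurement.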

Regression Equations with β weights

Because we are using standardized scores, we are back into the z-score situation. As you recall from the comparison of correlation and regression:

$$z_{Y'} = r\,z_X$$

But β means a b weight when X and Y are in standard scores, so for the simple regression case, r = β, and we have:

$$z_{Y'} = \beta\,z_X$$

The earlier formulas I gave for b were composed of sums of squares and cross products. But with z scores, we will be dealing with standardized sums of squares and cross products. A standardized averaged sum of squares is 1 (the variance of any set of z scores) and a standardized averaged sum of cross products is a correlation coefficient (r). Bottom line on this is we can estimate beta weights (βs) using a correlation matrix. With simple regression, as you have already seen, r = β. With two independent variables,

$$\beta_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2}$$

and

$$\beta_2 = \frac{r_{y2} - r_{y1}\,r_{12}}{1 - r_{12}^2}$$

where ry1 is the correlation of Y with X1, ry2 is the correlation of Y with X2, and r12 is the correlation of X1 with X2. Note that the two formulas are nearly identical; the exception is the ordering of the first two symbols in the numerator.

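Our correlation matrix looks like this:

| | Y | X1 | X2 |
|---|---|---|---|
| Y | 1 | | |
| X1 | 0.77 | 1 | |
| X2 | 0.72 | 0.68 | 1 |

so that

$$\beta_1 = \frac{.77 - (.72)(.68)}{1 - .68^2} = \frac{.2804}{.5376} = .52$$

$$\beta_2 = \frac{.72 - (.77)(.68)}{1 - .68^2} = \frac{.1964}{.5376} = .37$$

Note that there is a surprisingly large difference in beta weights given the magnitude of correlations.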

Let's look at this for a minute, first at the equation for β1. The numerator says that β1 is the correlation (of X1 and Y) minus the correlation (of X2 and Y) times the correlation (of X1 and X2). The denominator says boost the numerator a bit depending on the size of the correlation between X1 and X2. Suppose r12 is zero. Then ry2 times r12 is zero, and the numerator is ry1. The denominator is 1, so the result is ry1, the simple correlation between X1 and Y. If the correlation between X1 and X2 is zero, the beta weight is the simple correlation. On the other hand, if the correlation between X1 and X2 is 1.0, the beta is undefined, because we would be dividing by zero. So our life is less complicated if the correlation between the X variables is zero.

Suppose that r12 is somewhere between 0 and 1. Then we will be in the situation depicted in Figure 5.2, where all three circles overlap. The beta weight for X1 (β1) will be essentially that part of the picture labeled UY:X1. We start with ry1, which has both UY:X1 and shared Y in it. (When r12 is zero, we stop here, because we don't have to worry about the shared part.) We subtract ry2 times r12, which means subtracting only that part of ry2 that corresponds to the shared part of X. But the shared part of X contains both shared X and shared Y, so we will take out too much. To correct for this, we divide by 1 - r²12 to boost β1 back up to where it should be.
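The behavior just described can be sketched as a small Python function. The name `beta_weights` is hypothetical, and the inputs are the two-decimal correlations from the matrix above, so the results match the hand calculations only after rounding.

```python
def beta_weights(ry1, ry2, r12):
    """Standardized weights for two predictors, from three correlations."""
    denom = 1 - r12 ** 2
    if denom == 0:
        # Perfectly correlated predictors: the weights are undefined.
        raise ValueError("r12 of +1 or -1 makes the beta weights undefined")
    beta1 = (ry1 - ry2 * r12) / denom
    beta2 = (ry2 - ry1 * r12) / denom
    return beta1, beta2

# Job performance example: ry1 = .77, ry2 = .72, r12 = .68.
beta1, beta2 = beta_weights(0.77, 0.72, 0.68)
print(round(beta1, 2), round(beta2, 2))

# Uncorrelated predictors: each beta equals the simple correlation.
print(beta_weights(0.50, 0.60, 0.0))
```

With r12 = 0 the function simply returns the two simple correlations, and with r12 = 1 it refuses to divide by zero, which mirrors the two limiting cases in the text.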

Calculating R²

As I already mentioned, one way to compute R² is to compute the correlation between Y and Y', and square that. There are some other ways to calculate R², however, and these are important for a conceptual understanding of what is happening in multiple regression. If the independent variables are uncorrelated, then

$$R^2 = r_{y1}^2 + r_{y2}^2$$

This says that R², the proportion of variance in the dependent variable accounted for by both the independent variables, is equal to the sum of the squared correlations of the independent variables with Y. This is only true when the IVs are orthogonal.

[Review Venn diagrams, Figure 5.1.]

In our example, R² is .67. The correlations are ry1 = .77 and ry2 = .72. If we square and add, we get .77² + .72² = .5929 + .5184 = 1.11, which is clearly too large a value for R².

If the IVs are correlated, then we have some shared X and possibly shared Y as well, and we have to take that into account. Two general formulas can be used to calculate R² when the IVs are correlated.

$$R^2 = \beta_1 r_{y1} + \beta_2 r_{y2}$$

This says to multiply the standardized slope (beta weight) by the correlation for each independent variable and add to calculate R². What this does is to include both the correlation (which will overestimate the total R² because of shared Y) and the beta weight (which underestimates R² because it only includes the unique Y and discounts the shared Y). Appropriately combined, they yield the correct R². Note that when r12 is zero, then β1 = ry1 and β2 = ry2, so that (β1)(ry1) = r²y1 and we have the earlier formula where R² is the sum of the squared correlations between the Xs and Y. For our example, the relevant numbers are (.52)(.77) + (.37)(.72) = .40 + .27 = .67, which agrees with our earlier value of R².

A second formula using only correlation coefficients is

$$R^2 = \frac{r_{y1}^2 + r_{y2}^2 - 2\,r_{y1}\,r_{y2}\,r_{12}}{1 - r_{12}^2}$$

This formula says that R² is the sum of the squared correlations between the Xs and Y adjusted for the shared X and shared Y. Note that the term on the right in the numerator and the variable in the denominator both contain r12, which is the correlation between X1 and X2. Note that this equation also simplifies to the simple sum of the squared correlations when r12 = 0, that is, when the IVs are orthogonal. For our example, we have

$$R^2 = \frac{.77^2 + .72^2 - 2(.77)(.72)(.68)}{1 - .68^2} = \frac{1.1113 - .7540}{.5376} = \frac{.3573}{.5376} = .66$$

which is the same as our earlier value within rounding error.
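The two formulas for R-square with correlated predictors can be compared in a short Python sketch. The function names are hypothetical. Note that the sketch feeds unrounded beta weights into the first formula, so both routes give about .66 from the two-decimal correlations, whereas the hand calculation with betas rounded to .52 and .37 gives .67.

```python
def beta_weights(ry1, ry2, r12):
    """Standardized weights for two predictors, from three correlations."""
    denom = 1 - r12 ** 2
    return (ry1 - ry2 * r12) / denom, (ry2 - ry1 * r12) / denom

def r2_from_betas(ry1, ry2, r12):
    """R-square as the sum of beta times correlation for each predictor."""
    beta1, beta2 = beta_weights(ry1, ry2, r12)
    return beta1 * ry1 + beta2 * ry2

def r2_from_correlations(ry1, ry2, r12):
    """R-square from the three correlations alone."""
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

r2_a = r2_from_betas(0.77, 0.72, 0.68)
r2_b = r2_from_correlations(0.77, 0.72, 0.68)

# Orthogonal predictors: R-square is the plain sum of squared correlations.
r2_orthogonal = r2_from_correlations(0.50, 0.60, 0.0)

# Ignoring the correlation between predictors overshoots badly.
naive_sum = 0.77 ** 2 + 0.72 ** 2

print(round(r2_a, 3), round(r2_b, 3), round(r2_orthogonal, 2), round(naive_sum, 2))
```

The two formulas are algebraically the same quantity, so any difference between them in practice comes from rounding the inputs.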

 

Tests of Regression Coefficients

Each regression coefficient is a slope estimate. With more than one independent variable, the slopes refer to the expected change in Y when X changes 1 unit, CONTROLLING FOR THE OTHER X VARIABLES. That is, b1 is the change in Y given a unit change in X1 while holding X2 constant, and b2 is the change in Y given a unit change in X2 while holding X1 constant. We will develop this more formally after we introduce partial correlation.

For now, consider Figure 5.2 and what happens if we hold one X constant. The amount of change in Y due to X1 while holding X2 constant is a function of the unique contribution of X1. If X1 overlaps considerably with X2, then the change in Y due to X1 while holding X2 constant will be small.

 

The standard error of the b weight for the two variable problem:

$$s_{b_1} = \sqrt{\frac{s^2_{y.12}}{\sum x_1^2\,(1 - r_{12}^2)}}$$

where s²y.12 is the variance of estimate (the variance of the residuals). We use the standard error of the b weight in testing t for significance. (Is the regression weight zero in the population? Is the regression weight equal to some other value in the population?) The standard error of the b weight depends upon three things. The variance of estimate tells us about how far the points fall from the regression line (the average squared distance). Large errors in prediction mean a larger standard error. The sum of squares of the IV also matters. The larger the sum of squares (variance) of X, the smaller the standard error. Restriction of range not only reduces the size of the correlation, but also increases the standard error of the b weight. The correlation between the independent variables also matters. The larger the correlation, the larger the standard error of the b weight. So to find significant b weights, we want to minimize the correlation between the predictors, maximize the variance of the predictors, and minimize the errors of prediction.

The variance of estimate is

$$s^2_{y.12} = \frac{SS_{res}}{N-k-1}$$

and the test of the b weight is a t-test with N-k-1 degrees of freedom.

In our example, the sum of squared errors is 9.79, and the df are 20-2-1 or 17. Therefore, our variance of estimate is .575871 or .58 after rounding. Our standard errors are:

$$s_{b_1} = \sqrt{\frac{.575871}{1091.8\,(1 - .68^2)}} = .0313$$

and sb2 = .0455, which follows from calculations that are identical except for the value of the sum of squares for X2 instead of X1.

 

To test the b weights for significance, we compute a t statistic

$$t = \frac{b}{s_b}$$

In our case t = .0864/.0313 or 2.75. If we compare this to the t distribution with 17 df, we find that it is significant (from a lookup function, we find that p = .0137, which is less than .05).

For b2, we compute t = .0876/.0455 = 1.926, which has a p value of .0710, which is not significant. Note that the correlation ry2 is .72, which is highly significant (p < .01) but b2 is not significant.
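The standard errors and t statistics can be reproduced with a minimal Python sketch that starts from the sums of squares and cross products in the matrix given earlier. The variable names are mine, and the critical value 2.110 is the two-tailed .05 value for 17 df taken from a standard t table rather than computed. Because the sketch carries unrounded intermediate values, the standard error for b1 can differ from the hand calculation in the last digit.

```python
# Sums of squares and cross products from the example (deviation scores).
ss_y, ss_x1, ss_x2 = 29.75, 1091.8, 521.75
sp_x1y, sp_x2y, sp_x1x2 = 139.5, 90.25, 515.5
n, k = 20, 2

denom = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (ss_x2 * sp_x1y - sp_x1x2 * sp_x2y) / denom
b2 = (ss_x1 * sp_x2y - sp_x1x2 * sp_x1y) / denom

# Residual sum of squares and variance of estimate.
ss_res = ss_y - (b1 * sp_x1y + b2 * sp_x2y)
var_estimate = ss_res / (n - k - 1)

# Squared correlation between the two predictors.
r12_squared = sp_x1x2 ** 2 / (ss_x1 * ss_x2)

se_b1 = (var_estimate / (ss_x1 * (1 - r12_squared))) ** 0.5
se_b2 = (var_estimate / (ss_x2 * (1 - r12_squared))) ** 0.5

t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

T_CRIT_05 = 2.110  # two-tailed .05 critical t for 17 df, from a t table

print(round(t_b1, 2), t_b1 > T_CRIT_05)
print(round(t_b2, 2), t_b2 > T_CRIT_05)
```

The comparison against the tabled critical value gives the same verdicts as the p values in the text: b1 is significant at .05 and b2 is not.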

Tests of R² vs. Tests of b

Because the b weights are slopes for the unique parts of Y and because correlations among the independent variables increase the standard errors of the b weights, it is possible to have a large, significant R², but at the same time to have nonsignificant b weights (as with b2 in our example). Consider Figure 5.4, where there are many IVs accounting for essentially the same variance in Y. Although R² will be fairly large, when we hold the other X variables constant to test for b, there will be little change in Y for a given X, and it will be difficult to find a significant b weight.

It is also possible to find a significant b weight without a significant R². This can happen when we have lots of independent variables (usually more than 2), all or most of which have rather low correlations with Y. If one of these variables has a large correlation with Y, R² may not be significant because with such a large number of IVs we would expect to see as large an R² just by chance. If R² is not significant, you should usually avoid interpreting b weights that are significant. In such cases, it is likely that the significant b weight is a type I error.

Testing Incremental R²

We can test the change in R² that occurs when we add a new variable to a regression equation. We can start with 1 variable and compute an R² (or r²) for that variable. We can then add a second variable and compute R² with both variables in it. The second R² will always be equal to or greater than the first R². If it is greater, we can ask whether it is significantly greater. To do so, we compute

$$F = \frac{(R_L^2 - R_S^2)/(k_L - k_S)}{(1 - R_L^2)/(N - k_L - 1)}$$

where R²L is the larger R² (with more predictors), R²S is the smaller R² (with fewer predictors), kL is the number of predictors in the larger equation and kS is the number of predictors in the smaller equation. When the null is true, the result is distributed as F with degrees of freedom equal to (kL - kS) and (N - kL - 1). In our example, we know that R²y.12 = .67 (from earlier calculations) and also that ry1 = .77 and ry2 = .72, so r²y1 = .59 and r²y2 = .52. Now we can see whether adding either X1 or X2 to the equation containing the other increases R² to a significant extent. To see if X1 adds variance we start with X2 in the equation:

$$F = \frac{(.67 - .52)/(2 - 1)}{(1 - .67)/(20 - 2 - 1)} = \frac{.15}{.01941} = 7.73$$

Our critical value of F(1,17) is 4.45, so our F for the increment of X1 over X2 is significant.

For the increment of X2 over X1, we have

$$F = \frac{(.67 - .59)/(2 - 1)}{(1 - .67)/(20 - 2 - 1)} = \frac{.08}{.01941} = 4.12$$

Our critical value of F has not changed, so the increment to R² by X2 is not (quite) significant.
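The incremental test can be sketched as a single Python function. The name `f_change` and its argument names are hypothetical, the inputs are the rounded R-square values used in the text, and the critical value 4.45 is the tabled F(1,17) at .05 quoted above rather than something the code computes.

```python
def f_change(r2_large, r2_small, k_large, k_small, n):
    """F for the increase in R-square when predictors are added."""
    numerator = (r2_large - r2_small) / (k_large - k_small)
    denominator = (1 - r2_large) / (n - k_large - 1)
    return numerator / denominator

F_CRIT_1_17 = 4.45  # tabled critical F(1, 17) at the .05 level

# Increment of X1 over X2, then of X2 over X1.
f_x1_over_x2 = f_change(0.67, 0.52, k_large=2, k_small=1, n=20)
f_x2_over_x1 = f_change(0.67, 0.59, k_large=2, k_small=1, n=20)

print(round(f_x1_over_x2, 2), f_x1_over_x2 > F_CRIT_1_17)
print(round(f_x2_over_x1, 2), f_x2_over_x1 > F_CRIT_1_17)
```

With one predictor added, this F is the square of the t for that predictor's b weight, apart from rounding, which is why the two tests lead to the same conclusions here.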