Chapter 4 -- Introducing a Control Variable (Multivariate Analysis)





COWI: Chapter 4
Last Modified 15 August 1998

Human behavior is usually too complicated to be studied with only two variables.
Often we will want to consider sets of three or more variables, which is called
multivariate analysis. We will want to do so when we have discovered a
relationship between two variables and want to find out 1) if this relationship
might be due to some other factor, 2) how or why these variables are related,
or 3) if the relationship is the same for different types of individuals.

In each situation, we identify a third variable that we want to consider. This
is called the control variable or the test variable. (Although it is possible
to use several control variables simultaneously, we will limit ourselves to one
control variable at a time.) To introduce a third variable, we identify the
control variable and separate the cases in our sample by the categories of the
control variable. For example, if the control variable is age divided into two
categories, younger and older, we would separate the cases into two groups.
One group would consist of individuals who are younger and the other group
would be those who are older. We would then obtain the crosstabulation of
the independent and dependent variables for each of these age groups. Since
there are two categories in this control variable, we obtain two partial
tables, each containing part of the original sample. (If there were three
categories in our control variable, for example, young, middle aged, and old,
we would have three partial tables.)
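
The separation of cases into partial tables can be sketched in a few lines of
Python. This is only an illustration: the six cases below are invented, and the
function name partial_tables is a hypothetical helper, not something taken from
a statistical package.

```python
from collections import Counter

# Hypothetical cases: (age group, sex, voting preference). Invented for illustration.
cases = [
    ("older", "male", "willing"),
    ("older", "male", "not willing"),
    ("older", "female", "willing"),
    ("younger", "male", "not willing"),
    ("younger", "female", "willing"),
    ("younger", "female", "willing"),
]

def partial_tables(cases):
    """Return one crosstab (counts keyed by (sex, preference)) per control category."""
    tables = {}
    for control, independent, dependent in cases:
        # Each case is counted only in the partial table for its own age group.
        tables.setdefault(control, Counter())[(independent, dependent)] += 1
    return tables

tables = partial_tables(cases)
for age_group, table in tables.items():
    print(age_group, dict(table))
```

Each case lands in exactly one partial table, so the partial tables together
contain the whole sample.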

The process of
using a control variable in the analysis is called elaboration and
was developed at Columbia University by Paul Lazarsfeld and his associates.
There are several different types of outcomes to the elaboration process.
We will discuss each briefly.

Table 3.3 showed that females were more likely than males to say they were
willing to vote for a woman. Let's introduce a control variable and see what
happens. In this example we are going to use age as the control variable.

Table 4.1 is
the three-variable table with voting preference as the dependent variable,
sex as the independent variable, and age as the control variable. When we
look at the older respondents (the left-hand partial table), we discover that
this partial table is very similar to the original two-variable table (Table
3.3). The same is true for the younger respondents (the right-hand partial
table). Each partial table is very similar to the original two-variable table.
This is often referred to as replication because the partial tables
repeat the original two-variable table (see Babbie 1997: 393-396). It is not
necessary that they be identical; just that each partial table be basically
the same as the original two-variable table. Our conclusion is that age is
not affecting the relationship between sex and voting preference. In other
words, the difference between males and females in voting preference is not
due to age.


Table 4.1 -- Voting Preference by Sex Controlling for Age

                                          Older                     Younger
Voting Preference                   Male  Female   Total       Male  Female   Total
Willing to Vote for a Woman         43.8    56.1    49.0       44.2    55.8    52.9
Not Willing to Vote for a Woman     56.2    43.9    51.0       55.8    44.2    47.1
Total                              100.0   100.0   100.0      100.0   100.0   100.0
(N)                                (240)   (180)   (420)      (120)   (360)   (480)
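
One way to compare partial tables is to compute the female-minus-male
percentage difference within each one. The sketch below does this for the
"willing" row of Table 4.1; the function name percentage_differences is a
hypothetical helper introduced only for this illustration.

```python
# Percent willing to vote for a woman, copied from Table 4.1.
table_4_1 = {
    "older": {"male": 43.8, "female": 56.1},
    "younger": {"male": 44.2, "female": 55.8},
}

def percentage_differences(partials):
    """Female minus male percentage willing, for each partial table."""
    return {
        group: round(columns["female"] - columns["male"], 1)
        for group, columns in partials.items()
    }

diffs = percentage_differences(table_4_1)
print(diffs)
```

The two differences come out close to each other (about 12 percentage points
in each age group) and in the same direction. When they also resemble the
difference in the original two-variable table, that is the replication pattern.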

Since
this is a hypothetical example, imagine a different outcome. Suppose we introduce
age as a control variable and instead of getting Table 4.1, we get Table 4.2.
How do these two tables differ? In Table 4.2, the percentage difference between
males and females has disappeared in both of the partial tables. This is called
explanation because the control variable, age, has explained away the
original relationship between sex and voting preference. (We often say that
the relationship between the two variables is spurious, not genuine.)
When age is held constant, the difference between males and females disappears.
The difference in the relationship does not have to disappear entirely, only
be reduced substantially in each of the partial tables. This can only occur
when there is a relationship between the control variable (age) and each of
the other two variables (sex and voting preference).
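
To see why a relationship can appear in the two-variable table and vanish in
the partial tables, it may help to pool the partial tables back together. The
sketch below uses the percentages and case counts reported in Table 4.2. The
cell counts are approximations recovered by rounding (percentage times N), and
the function name collapse is a hypothetical helper.

```python
# (percent willing, number of cases) for each column of Table 4.2.
partials = {
    "older": {"male": (32.9, 240), "female": (33.9, 180)},
    "younger": {"male": (65.8, 120), "female": (66.9, 360)},
}

def collapse(partials):
    """Pool the partial tables and return percent willing by sex."""
    willing = {"male": 0, "female": 0}
    total = {"male": 0, "female": 0}
    for columns in partials.values():
        for sex, (pct, n) in columns.items():
            willing[sex] += round(pct * n / 100)  # approximate cell count
            total[sex] += n
    return {sex: round(100 * willing[sex] / total[sex], 1) for sex in total}

pooled = collapse(partials)
print(pooled)
```

Within each age group males and females differ by about one percentage point,
but the pooled figures differ by about twelve points. The gap arises because
the older group is mostly male and less willing, while the younger group is
mostly female and more willing: age is related to both sex and voting
preference.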

Table 4.2 -- Voting Preference by Sex Controlling for Age

                                          Older                     Younger
Voting Preference                   Male  Female   Total       Male  Female   Total
Willing to Vote for a Woman         32.9    33.9    33.3       65.8    66.9    66.7
Not Willing to Vote for a Woman     67.1    66.1    66.7       34.2    33.1    33.3
Total                              100.0   100.0   100.0      100.0   100.0   100.0
(N)                                (240)   (180)   (420)      (120)   (360)   (480)

Next, we are interested in how or why the two variables are related. Suppose
females are more likely than males to vote for a woman and that this difference
cannot be explained away by age or by any other variable we have considered. We
need to think about why there might be such a difference in the preferences of
males and females. Perhaps females are more often liberal than males,
and liberals are more likely to say they would vote for a woman. So we introduce
liberalism/conservatism as a control variable in our analysis. If females are
more likely to support a woman because they are more liberal, then the difference
between the preferences of men and women should disappear or be substantially
reduced when liberalism/conservatism is held constant. This process is called
interpretation because we are interpreting how one variable is related
to another variable. Table 4.3 shows what we would expect to find if females
supported the woman because they were more liberal. Notice that in both partial
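tables, the difference in the percentages between men and women has disappeared.
(It need not disappear entirely, only be substantially reduced in each of the
partial tables.) The pattern is the same as in explanation; the two outcomes
differ in where the control variable falls. In interpretation it intervenes
between the independent and dependent variables, while in explanation it comes
before both.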


Table 4.3 -- Voting Preference by Sex Controlling for Liberalism/Conservatism

                                      Conservative                  Liberal
Voting Preference                   Male  Female   Total       Male  Female   Total
Willing to Vote for a Woman         32.9    33.9    33.3       65.8    66.9    66.7
Not Willing to Vote for a Woman     67.1    66.1    66.7       34.2    33.1    33.3
Total                              100.0   100.0   100.0      100.0   100.0   100.0
(N)                                (240)   (180)   (420)      (120)   (360)   (480)

Finally,
let's focus on the third of the situations outlined at the beginning of this
section--whether the relationship is the same for different types of individuals.
Perhaps the relationship between sex and voter preference varies with other
characteristics of the individuals. Maybe among whites, females are more likely
to prefer women candidates than the males are, but among blacks, there is little
difference between males and females in terms of voter preference. This is the
outcome shown in Table 4.4. This process is called specification because
it specifies the conditions under which the relationship between sex and voter
preference varies.

In the earlier
section on bivariate analysis, we discussed the use of chi square. Remember
that chi square is a test of independence used to determine if there is a
relationship between two variables. Chi square is used in multivariate analysis
the same way it is in bivariate analysis. There will be a separate value of
chi square for each partial table in the multivariate analysis. You should
keep a number of warnings in mind. Chi square assumes that the expected frequencies
for each cell are five or larger. As long as 80% of these expected frequencies
are five or larger and no single expected frequency is very small, we don't
have to worry. However, the expected frequencies often drop below five when
the number of cases in a column or row gets too small. If this should occur,
you will have to either recode (i.e., combine columns or rows) or eliminate
a column or row from the table.

Table 4.4 -- Voting Preference by Sex Controlling for Race

                                          White                African American
Voting Preference                   Male  Female   Total       Male  Female   Total
Willing to Vote for a Woman         42.9    56.5    51.2       50.0    50.0    50.0
Not Willing to Vote for a Woman     57.1    43.5    48.8       50.0    50.0    50.0
Total                              100.0   100.0   100.0      100.0   100.0   100.0
(N)                                (310)   (490)   (800)       (50)    (50)   (100)
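
The chi square calculation for each partial table can be sketched as follows,
using Table 4.4. This is a minimal illustration under stated assumptions: the
cell counts are approximations recovered by rounding the table's percentages
against its case counts, and the function names chi_square and expected_ok are
hypothetical helpers. The check covers only the "80% of expected frequencies
are five or larger" part of the rule, since the chapter does not say how small
"very small" is.

```python
def chi_square(observed):
    """Pearson chi square and expected frequencies for a table of counts.

    `observed` is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected

def expected_ok(expected, minimum=5, share=0.8):
    """True if at least `share` of the expected frequencies reach `minimum`."""
    cells = [e for row in expected for e in row]
    return sum(e >= minimum for e in cells) / len(cells) >= share

# Rows: willing, not willing. Columns: male, female. Counts are approximate.
partials = {
    "White": [[133, 277], [177, 213]],
    "African American": [[25, 25], [25, 25]],
}
results = {}
for group, observed in partials.items():
    statistic, expected = chi_square(observed)
    results[group] = (statistic, expected_ok(expected))
    print(group, round(statistic, 2), results[group][1])
```

Each partial table gets its own chi square value. Here the value for the White
partial table is well above zero, while the value for the African American
partial table is exactly zero because every observed count equals its expected
count.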

Another
point to keep in mind is that chi square is affected by the number of cases
in the table. With a lot of cases it is easy to reject the null hypothesis of
no relationship. With a few cases, it can be quite hard to reject the null hypothesis.
Also, consider the percentages within the table. Look for patterns. Do not rely
on any single piece of information. Look at the whole picture.
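
The effect of the number of cases can be seen by holding the percentages fixed
and multiplying every cell by ten. The sketch below uses invented counts and
the standard shortcut formula for a 2x2 table (without a continuity
correction), which is an assumption of this example rather than something
given in the chapter.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 44% versus 56% willing in both tables.
small = chi_square_2x2(22, 28, 28, 22)       # 100 cases
large = chi_square_2x2(220, 280, 280, 220)   # same percentages, 1,000 cases
print(small, large)
```

The percentages are identical in the two tables, yet chi square is ten times
larger with ten times the cases. This is why the percentages should be read
alongside the test.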

We have concentrated on crosstabulation and chi square. There are other types
of statistical analysis, such as regression and log-linear analysis. Once you
have mastered crosstabulation and chi square, look at some of these other
types of analysis.



REFERENCES
AND SUGGESTED READING

Methods of
Social Research

  • Riley, Matilda
    White. 1963. Sociological Research I: A Case Approach. New York:
    Harcourt, Brace and World.

Survey Research
and Sampling

  • Babbie, Earl
    R. 1990. Survey Research Methods (2nd Ed.). Belmont, CA:
    Wadsworth.
  • Babbie, Earl
    R. 1997. The Practice of Social Research (8th Ed.). Belmont,
    CA: Wadsworth.

Statistical Analysis

  • Knoke, David, and George W. Bohrnstedt. 1991. Basic Social Statistics.
    Itasca, IL: Peacock.
  • Riley, Matilda
    White. 1963. Sociological Research II: Exercises and Manual.
    New York: Harcourt, Brace and World.
  • Norusis,
    Marija J. 1997. SPSS 7.5 Guide to Data Analysis. Upper Saddle River,
    New Jersey: Prentice Hall.

Elaboration and
Causal Analysis

  • Hirschi, Travis, and Hanan C. Selvin. 1967. Delinquency Research--An
    Appraisal of Analytic Methods. New York: Free Press.
  • Rosenberg,
    Morris. 1968. The Logic of Survey Analysis. New York: Basic Books.

Data Sources

  • The Field
    Institute. 1985. California Field Poll Study, July, 1985. Machine-readable
    codebook.
  • The Field
    Institute. 1991. California Field Poll Study, September, 1991.
    Machine-readable codebook.
  • The Field
    Institute. 1995. California Field Poll Study, February, 1995. Machine-readable
    codebook.