Chapter 3 -- Introducing a Control Variable (Multivariate Analysis)

Last Modified 15 August 1998
Human behavior is usually too complicated to be studied with only two variables. Often we will want to consider three or more variables at once, which is called multivariate analysis. We turn to a third variable when we have discovered a relationship between two variables and want to find out 1) whether this relationship might be due to some other factor, 2) how or why these variables are related, or 3) whether the relationship is the same for different types of individuals.

In each situation, we identify a third variable that we want to consider. This is called the control or test variable. (Although it is possible to use several control variables simultaneously, we will limit ourselves to one control variable at a time.) To introduce a third variable, we separate the cases in our sample by the categories of the control variable. For example, if the control variable is age divided into two categories, younger and older, we would separate the cases into two groups: one consisting of the younger individuals and the other of the older individuals. We would then obtain the crosstabulation of the independent and dependent variables for each of these age groups. Since there are two categories in this control variable, we obtain two partial tables, each containing part of the original sample. (If there were three categories in our control variable, for example, young, middle aged, and old, we would have three partial tables.)
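
The splitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (control, independent, dependent), the function name partial_tables, and the six sample cases are hypothetical and do not come from the survey data used in this chapter.

```python
from collections import Counter, defaultdict

def partial_tables(cases):
    """Group cases by the control variable and count each
    (independent, dependent) combination within every group."""
    tables = defaultdict(Counter)
    for control, independent, dependent in cases:
        tables[control][(independent, dependent)] += 1
    return dict(tables)

# Hypothetical cases: (age group, sex, willing to vote for a woman)
cases = [
    ("older", "male", "no"),
    ("older", "female", "yes"),
    ("older", "male", "yes"),
    ("younger", "female", "yes"),
    ("younger", "male", "no"),
    ("younger", "female", "no"),
]

tables = partial_tables(cases)
for age_group, counts in tables.items():
    # One partial table (as cell counts) per category of the control variable
    print(age_group, dict(counts))
```

Each key of the result corresponds to one partial table, so a control variable with three categories would simply produce three entries.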

The process of using a control variable in the analysis is called elaboration and was developed at Columbia University by Paul Lazarsfeld and his associates. There are several different types of outcomes to the elaboration process. We will discuss each briefly.

Table 2.3 showed that females were more likely than males to say they were willing to vote for a woman. Let's introduce a control variable and see what happens. In this example we are going to use age as the control variable.

Table 3.1 is the three-variable table with voting preference as the dependent variable, sex as the independent variable, and age as the control variable. When we look at the older respondents (the left-hand partial table), we discover that this partial table is very similar to the original two-variable table (Table 2.3). The same is true for the younger respondents (the right-hand partial table). This is often referred to as replication because the partial tables repeat the original two-variable table (see Babbie 1997: 393-396). The partial tables need not be identical to the original; each only has to be basically the same as the original two-variable table. Our conclusion is that age is not affecting the relationship between sex and voting preference. In other words, the difference between males and females in voting preference is not due to age.

Table 3.1 -- Voting Preference by Sex Controlling for Age

                                            Older                     Younger
Voting Preference                     Male   Female    Total     Male   Female    Total
Willing to Vote for a Woman           43.8     56.1     49.0     44.2     55.8     52.9
Not Willing to Vote for a Woman       56.2     43.9     51.0     55.8     44.2     47.1
Total                                100.0    100.0    100.0    100.0    100.0    100.0
(N)                                  (240)    (180)    (420)    (120)    (360)    (480)

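
One way to check for replication is to compute the percentage difference between females and males in each partial table of Table 3.1. The sketch below is illustrative only: the dictionary layout and the function name percentage_difference are assumptions, and because Table 2.3 is not reproduced in this chapter, the partial tables are compared only with each other.

```python
# Percent "willing to vote for a woman", copied from Table 3.1.
table_3_1 = {
    "older": {"male": 43.8, "female": 56.1},
    "younger": {"male": 44.2, "female": 55.8},
}

def percentage_difference(partial):
    """Female percentage minus male percentage, in percentage points."""
    return round(partial["female"] - partial["male"], 1)

differences = {group: percentage_difference(p) for group, p in table_3_1.items()}
print(differences)  # {'older': 12.3, 'younger': 11.6}
```

Both partial tables show a gap of roughly 12 points in the same direction, which is the kind of pattern the text calls replication.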
Since this is a hypothetical example, imagine a different outcome. Suppose we introduce age as a control variable and instead of getting Table 3.1, we get Table 3.2. How do these two tables differ? In Table 3.2, the percentage difference between males and females has all but disappeared in both of the partial tables. This is called explanation because the control variable, age, has explained away the original relationship between sex and voting preference. (We often say that the relationship between the two variables is spurious, not genuine.) When age is held constant, the difference between males and females disappears. The difference does not have to disappear entirely; it only has to be reduced substantially in each of the partial tables. This can only occur when there is a relationship between the control variable (age) and each of the other two variables (sex and voting preference).
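
To see how a relationship can vanish within age groups yet appear when the groups are combined, the partial tables of Table 3.2 can be pooled back together. The following is a rough sketch under assumptions: the data layout and the function name collapsed_percent are hypothetical, and the pooled percentages are computed from the rounded figures in Table 3.2, so they are approximate.

```python
# (percent willing, column N) copied from Table 3.2.
table_3_2 = {
    "older": {"male": (32.9, 240), "female": (33.9, 180)},
    "younger": {"male": (65.8, 120), "female": (66.9, 360)},
}

def collapsed_percent(table, sex):
    """Percent willing for one sex after pooling the age groups,
    weighting each partial-table percentage by its column N."""
    willing = sum(table[g][sex][0] / 100 * table[g][sex][1] for g in table)
    total = sum(table[g][sex][1] for g in table)
    return round(100 * willing / total, 1)

for sex in ("male", "female"):
    print(sex, collapsed_percent(table_3_2, sex))
# male 43.9
# female 55.9
```

Within each age group the two sexes differ by about one point, but the pooled figures differ by about 12 points. The gap arises because the younger group is both mostly female (360 of 480) and far more willing to vote for a woman, which is the relationship between the control variable and each of the other two variables that the text describes.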

Table 3.2 -- Voting Preference by Sex Controlling for Age

                                            Older                     Younger
Voting Preference                   Male % Female %  Total %   Male % Female %  Total %
Willing to Vote for a Woman           32.9     33.9     33.3     65.8     66.9     66.7
Not Willing to Vote for a Woman       67.1     66.1     66.7     34.2     33.1     33.3
Total                                100.0    100.0    100.0    100.0    100.0    100.0
(N)                                  (240)    (180)    (420)    (120)    (360)    (480)

Next, we are interested in how or why the two variables are related. Suppose females are more likely than males to vote for a woman and that this difference cannot be explained away by age or by any other variable we have considered. We need to think about why there might be such a difference in the preferences of males and females. Perhaps females are more often liberal than males, and liberals are more likely to say they would vote for a woman. So we introduce liberalism/conservatism as a control variable in our analysis. If females are more likely to support a woman because they are more liberal, then the difference between the preferences of men and women should disappear or be substantially reduced when liberalism/conservatism is held constant. This process is called interpretation because we are interpreting how one variable is related to another variable. Table 3.3 shows what we would expect to find if females supported the woman because they were more liberal. Notice that in both partial tables, the percentage difference between men and women has all but disappeared. (It need not disappear entirely; it only has to be substantially reduced in each of the partial tables.)

Table 3.3 -- Voting Preference by Sex Controlling for Liberalism/Conservatism

                                        Conservative                  Liberal
Voting Preference                   Male % Female %  Total %   Male % Female %  Total %
Willing to Vote for a Woman           32.9     33.9     33.3     65.8     66.9     66.7
Not Willing to Vote for a Woman       67.1     66.1     66.7     34.2     33.1     33.3
Total                                100.0    100.0    100.0    100.0    100.0    100.0
(N)                                  (240)    (180)    (420)    (120)    (360)    (480)

Finally, let's focus on the third of the situations outlined at the beginning of this section--whether the relationship is the same for different types of individuals. Perhaps the relationship between sex and voting preference varies with other characteristics of the individuals. Maybe among whites, females are more likely than males to prefer women candidates, but among African Americans, there is little difference between males and females in voting preference. This is the outcome shown in Table 3.4. This process is called specification because it specifies the conditions under which the relationship between sex and voting preference varies.
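
The outcomes discussed in this section can be summarized as a rough decision rule on the percentage differences (female minus male) in the partial tables. The sketch below is a simplification and rests on assumptions that are not in the text: the function name classify_outcome, the 3-point tolerance, and the value of about 12 points for the original two-variable difference (Table 2.3 is not reproduced here; 12 is roughly what pooling the partial tables gives) are all hypothetical choices made for illustration.

```python
def classify_outcome(original_diff, partial_diffs, tolerance=3.0):
    """Rough label for an elaboration outcome.

    tolerance is an arbitrary cutoff (in percentage points) chosen for
    this sketch; it is not a rule from the text.
    """
    near_original = all(abs(d - original_diff) <= tolerance for d in partial_diffs)
    near_zero = all(abs(d) <= tolerance for d in partial_diffs)
    if near_original:
        return "replication"
    if near_zero:
        # The numbers alone cannot separate these two outcomes.
        return "explanation or interpretation"
    # Everything else is treated as specification, which is a simplification.
    return "specification"

ORIGINAL_DIFF = 12.0  # assumed size of the original difference

print(classify_outcome(ORIGINAL_DIFF, [12.3, 11.6]))  # Table 3.1: replication
print(classify_outcome(ORIGINAL_DIFF, [1.0, 1.1]))    # Tables 3.2, 3.3: explanation or interpretation
print(classify_outcome(ORIGINAL_DIFF, [13.6, 0.0]))   # Table 3.4: specification
```

Note that Tables 3.2 and 3.3 contain the same figures, so no numerical rule can tell explanation from interpretation. That choice depends on our reasoning about how the control variable is related to the other two variables.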

In the earlier section on bivariate analysis, we discussed the use of chi square. Remember that chi square is a test of independence used to determine if there is a relationship between two variables. Chi square is used in multivariate analysis the same way it is in bivariate analysis. There will be a separate value of chi square for each partial table in the multivariate analysis. You should keep a number of warnings in mind. Chi square assumes that the expected frequencies for each cell are five or larger. As long as 80% of these expected frequencies are five or larger and no single expected frequency is very small, we don't have to worry. However, the expected frequencies often drop below five when the number of cases in a column or row gets too small. If this should occur, you will have to either recode (i.e., combine columns or rows) or eliminate a column or row from the table.
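
As an illustration of computing a separate chi square for each partial table, the sketch below works through the two partial tables of Table 3.4 and applies the 80% rule of thumb for expected frequencies. Several things here are assumptions: the function names are hypothetical, and the cell counts were reconstructed by multiplying the published percentages by the column N and rounding, so they only approximate the underlying counts.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi square: sum of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def expected_counts_ok(observed, minimum=5, share=0.8):
    """Rule of thumb from the text: at least 80% of expected counts are 5 or more.
    Whether any single expected count is 'very small' is left to the reader."""
    cells = [e for row in expected_frequencies(observed) for e in row]
    return sum(e >= minimum for e in cells) / len(cells) >= share

# Rows: willing, not willing. Columns: male, female.
# Counts are approximations reconstructed from Table 3.4 (percent * N, rounded).
partials = {
    "White": [[133, 277], [177, 213]],
    "African American": [[25, 25], [25, 25]],
}

for group, observed in partials.items():
    print(group, round(chi_square(observed), 2), expected_counts_ok(observed))
```

With these reconstructed counts, chi square is about 14.1 for whites and 0 for African Americans. For a table with two rows and two columns (one degree of freedom), the critical value at the .05 level is 3.84, so only the partial table for whites would lead us to reject the null hypothesis of no relationship.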

Table 3.4 -- Voting Preference by Sex Controlling for Race

                                            White                African American
Voting Preference                   Male % Female %  Total %   Male % Female %  Total %
Willing to Vote for a Woman           42.9     56.5     51.2     50.0     50.0     50.0
Not Willing to Vote for a Woman       57.1     43.5     48.8     50.0     50.0     50.0
Total                                100.0    100.0    100.0    100.0    100.0    100.0
(N)                                  (310)    (490)    (800)     (50)     (50)    (100)

Another point to keep in mind is that chi square is affected by the number of cases in the table. With many cases, even a small percentage difference can lead us to reject the null hypothesis of no relationship. With few cases, it can be quite hard to reject the null hypothesis even when the percentage difference is sizable. So also consider the percentages within the table. Look for patterns. Do not rely on any single piece of information. Look at the whole picture.
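
The effect of the number of cases can be seen by keeping the percentages fixed and changing only the sample size. The example below is a sketch with made-up numbers: the table of 44% of males and 56% of females willing is hypothetical, and the function name chi_square_2x2 is an assumption. It uses the standard shortcut formula for a table with two rows and two columns, without a continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square for the table [[a, b], [c, d]]:
    N * (ad - bc)^2 / (row totals * column totals)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table: willing (male, female), then not willing (male, female).
small = (44, 56, 56, 44)                     # 100 males, 100 females
large = tuple(10 * cell for cell in small)   # same percentages, ten times the cases

print(round(chi_square_2x2(*small), 2))  # 2.88
print(round(chi_square_2x2(*large), 2))  # 28.8
```

The percentage difference is 12 points in both tables, yet only the larger table exceeds 3.84, the .05 critical value for one degree of freedom. Multiplying every cell by ten multiplies chi square by ten.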

We have concentrated on crosstabulation and chi square. There are other types of statistical analysis, such as regression and log-linear analysis. Once you have mastered crosstabulation and chi square, look at some of these other types of analysis.


REFERENCES AND SUGGESTED READING

Methods of Social Research

  • Riley, Matilda White. 1963. Sociological Research I: A Case Approach. New York: Harcourt, Brace and World.
Survey Research and Sampling
  • Babbie, Earl R. 1990. Survey Research Methods (2nd Ed.). Belmont, CA: Wadsworth.
  • Babbie, Earl R. 1997. The Practice of Social Research (8th Ed.). Belmont, CA: Wadsworth.
Statistical Analysis
  • Knoke, David, and George W. Bohrnstedt. 1991. Basic Social Statistics. Itasca, IL: Peacock.
  • Riley, Matilda White. 1963. Sociological Research II: Exercises and Manual. New York: Harcourt, Brace and World.
  • Norusis, Marija J. 1997. SPSS 7.5 Guide to Data Analysis. Upper Saddle River, NJ: Prentice Hall.
Elaboration and Causal Analysis
  • Hirschi, Travis, and Hanan C. Selvin. 1967. Delinquency Research: An Appraisal of Analytic Methods. New York: Free Press.
  • Rosenberg, Morris. 1968. The Logic of Survey Analysis. New York: Basic Books.
Data Sources
  • The Field Institute. 1985. California Field Poll Study, July, 1985. Machine-readable codebook.
  • The Field Institute. 1991. California Field Poll Study, September, 1991. Machine-readable codebook.
  • The Field Institute. 1995. California Field Poll Study, February, 1995. Machine-readable codebook.