Chapter 4: Testing hypotheses with t-statistics

Last modified 14 August 1998

In this
chapter we examine two procedures for testing whether an observed difference
in averages (means) is statistically significant. The first case looks at the
difference between unemployment rates for blacks and whites. In this example,
we ask whether the observed differences are large enough and systematic enough
to give us a high degree of confidence that unemployment affects the black population
more severely than whites. This form of inquiry is an attempt to rule out the
possibility that the observed differences are solely a reflection of random
variation in unemployment rates for both groups.

The second case
asks a general question about the US's experience with supply side economics
during the 1980s. Supply side policy makers and journalists made extravagant
claims about the positive effects of supply side economics. In particular,
they argued that deep tax cuts and extensive deregulation would improve incentives
for working, investing, and saving. It is well known and widely accepted that
higher savings and investment rates are associated with faster growth in real
GDP and productivity. In their most extreme form, supply side arguments held
that the tax cuts would help to shrink the federal deficit. This flawed reasoning
was based on a serious overestimate of the growth stimulus provided by tax
cuts and deregulation. In simple terms, supply siders argued that when government
revenue becomes a smaller percentage of GDP, the economy grows so much that
revenue is actually larger in dollar terms.

In order to examine
these issues, we must conceptualize the economy as a process which generates
many different outcomes. The outcomes are the measured values of the variables
in the dataset. The measured values, however, are not entirely determined
by the systematic operation of our economic system. There are also random
factors that play a role, as well as a certain (unknown) amount of measurement
error. The value of every variable in the data set is a result of all three
of these factors: the systematic processes of the economy, random and uncontrollable
factors external to the economy, and measurement error.

Recognition of
randomness and measurement error complicates the simple act of comparing
variables. For example, we would like to compare black and white unemployment
rates in order to determine the average difference. We have already calculated
averages for both races, and black rates are higher. The problem, however,
is that we cannot say for certain if there is a systematic component to the
difference given that the higher unemployment rates for blacks could be due
to a couple of years of random events or a couple of years of measurement
errors. Hypothesis tests for a difference in means enable us to test this
possibility. As you might imagine, the procedure depends on both the average
unemployment rates, and the amount of variation they exhibit over time.

We may also want
to compare the values of a single variable measured at different points in
time. For example, the 1980s look different from the 1970s. Deficits were
higher, inflation was lower, real rates of interest were higher, and so forth.
Once again, however, the differences may not be large enough to rule out the
possibility that they are due to measurement error or random, non-repeated
processes. What we really want to know is whether these differences are systematic
enough to give us a high degree of confidence that they cannot be fully explained
by the normal amount of variation which is always occurring.

    4-1
    Paired means: Differences in black and white unemployment rates

In the following,
we are trying to determine if the observed difference in black and white unemployment
rates is large enough and persistent enough so that we can rule out the possibility
that the "true" underlying difference is zero. Formally, let μ_B represent
the true average rate of unemployment for blacks, and μ_W the rate
for whites. Our hypothesis is μ_B = μ_W, or alternatively,
μ_B - μ_W = 0. If we rule this out, then it must be the
case that μ_B ≠ μ_W, which we will designate our alternative
hypothesis. Formally we call these the null and alternative hypotheses, where
the term "null" conveys the idea of no difference. Symbolically, they can be
written:

H0: μ_B = μ_W,

H1: μ_B ≠ μ_W,

where H0
is the symbol for the null hypothesis and H1 is the symbol for the alternative hypothesis.

In fact, however,
we never observe the true averages. Instead, we have sample averages which
are based on the available data for a group of years. The sample averages
are subject to measurement error and random variation due to unique events
in particular years. They also reflect the systematic and persistent
factors that determine unemployment rates for each group. The relationship
between the sample average and the true average is:


Sample average = x-bar = μ ± (t-statistic)(standard error of the sample average),

where the standard
error of the sample average is the standard deviation of the unemployment
rate (σ_ur) divided by the square root of the sample size (√n).
The t-statistic is the relevant value of a Student's t distribution for
n-1 degrees of freedom, and (usually) .025 in each tail. (See a statistics
text for a complete treatment.)
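
The relationship above can be checked by hand. The following is a minimal Python sketch, not part of SPSS, that computes the standard error and the resulting range around a sample average. The function names are our own, the figures are those reported for bm20u in Table 5, and the value 2.064 is assumed from a standard t table for 24 degrees of freedom with .025 in each tail.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample average: standard deviation / sqrt(n)."""
    return sd / math.sqrt(n)

def interval_for_true_mean(sample_mean, sd, n, t_crit):
    """Return the range sample_mean +/- t_crit * (standard error)."""
    half_width = t_crit * standard_error(sd, n)
    return sample_mean - half_width, sample_mean + half_width

# Mean, standard deviation, and sample size for bm20u as reported in Table 5.
BM20U_MEAN, BM20U_SD, N_YEARS = 11.3147, 2.894, 25
# Assumption: tabulated t value for 24 df with .025 in each tail.
T_CRIT_24DF = 2.064

se = standard_error(BM20U_SD, N_YEARS)
low, high = interval_for_true_mean(BM20U_MEAN, BM20U_SD, N_YEARS, T_CRIT_24DF)
print(f"SE = {se:.3f}, interval = ({low:.2f}, {high:.2f})")
```

The standard error computed this way should agree with the SE of Mean that SPSS reports for bm20u.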

The procedure
for carrying out this test in SPSS is straightforward. We will test three
pairs of unemployment rates, those for black and white men, women, and teens.

    1. Select
      Statistics from the menu bar, choose Compare Means, and Paired Samples
      t test;
    2. Highlight
      bm20u in the variable list box (this clicks it into the Current Selections
      box);
    3. Highlight
      wm20u in the variable list box, and click the arrow to put them into the
      Paired Variables list box;
    4. Do the
      same for bw20u and ww20u;
    5. Do the
      same for btu and wtu;
    6. Click OK.
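
For readers who want to see what the Paired Samples procedure calculates, here is a minimal Python sketch of the same arithmetic. It is an illustration under our own assumptions, not SPSS code: the function name paired_t is hypothetical, and the two short series are made-up numbers rather than values from the data set.

```python
import math
import statistics

def paired_t(first, second):
    """Return (mean difference, t-statistic, degrees of freedom) for paired data."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Work with the year-by-year differences, not the two series separately.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev is the sample standard deviation (divides by n-1).
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / se_diff, n - 1

# Made-up rates for four years, NOT values from the data set.
group_a = [10.0, 12.0, 11.0, 13.0]
group_b = [5.0, 6.0, 5.0, 7.0]
mean_diff, t_value, df = paired_t(group_a, group_b)
print(mean_diff, t_value, df)
```

The test is carried out on the differences within each pair, which is why the two variables must have the same number of observations.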

The SPSS output
for black and white men is in Table 5. SPSS prints two tables for each
pair of variables. In the upper part of the table, it prints a set of descriptive
statistics, including means, standard deviations, and standard error of the
estimate of the mean (SE Mean). The latter measures how far the sample mean is
likely to lie from the "true" population mean, given that this is a sample based on 25 observations.
Between the descriptive statistics for bm20u and wm20u, SPSS prints the number
of observations (25), the correlation coefficient (0.949--see Chapter 5), and
a test statistic to determine if bm20u and wm20u are significantly correlated.


Table 5

T-tests for Paired Samples

Variable   Number of pairs   Corr    2-tail Sig   Mean      SD      SE of Mean
BM20U                                             11.3147   2.894   0.579
           25                0.949   0.000
WM20U                                              4.9840   1.263   0.253

Paired differences

Mean     SD      SE of Mean   t-value   df   2-tail Sig
6.3307   1.742   0.348        18.17     24   0.000


In the second part
of the table, SPSS puts the results of the test H0: μ_B
= μ_W. This is the most important information, and the point at which
interpretation of results becomes important. The average difference is 6.3307;
the t-statistic for the test is 18.17. The 2-tail Sig is the probability of
a t-statistic which is 18.17, or larger, in absolute value. To three decimal
places, this probability is zero. Another way to look at the t-statistic is
as the value of the mean difference (6.3307) when it is transferred to a t-distribution
scale, that is, divided by its standard error, under the assumption that the null
hypothesis is true (no difference in the "true" population means). Since a
t-statistic this large has essentially zero probability, we can conclude
that there is also essentially zero probability of getting a sample difference of 6.3307
when the true difference is zero. Hence, we reject the null hypothesis.
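
The figures in Table 5 can be reproduced from one another. The short Python sketch below, offered as a check and not as SPSS output, recomputes the t-statistic from the mean and standard deviation of the paired differences. It also recomputes the standard deviation of the differences from the two separate standard deviations and the correlation, using the standard formula for the variance of a difference of two correlated variables. The variable names are our own.

```python
import math

# Summary figures copied from Table 5.
N_PAIRS = 25
SD_BLACK, SD_WHITE, CORR = 2.894, 1.263, 0.949
MEAN_DIFF, SD_DIFF = 6.3307, 1.742

# SD of a difference of two correlated series:
# sqrt(s1^2 + s2^2 - 2 * r * s1 * s2).
sd_diff_check = math.sqrt(
    SD_BLACK**2 + SD_WHITE**2 - 2 * CORR * SD_BLACK * SD_WHITE
)

# t-statistic: mean difference divided by its standard error.
se_diff = SD_DIFF / math.sqrt(N_PAIRS)
t_value = MEAN_DIFF / se_diff
df = N_PAIRS - 1

print(f"SD of differences (recomputed): {sd_diff_check:.3f}")
print(f"SE = {se_diff:.3f}, t = {t_value:.2f}, df = {df}")
```

Note how the high correlation between the two series shrinks the standard deviation of the differences well below that of bm20u alone, which is part of why the t-statistic is so large.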

What about women?
Is the difference between black and white women significant (i.e. significantly
different from zero)? What about teens? In general, should we reject the idea
that the underlying "true" rates are the same? How confident can you be about
this?

    4-2
    Independent samples: Supply side and the 1980s economy

Proponents of supply
side economics appeared on the scene in the late 1970s, at a time when the traditional
Keynesian consensus was in disarray. Growth had fallen in the 1970s, inflation
had continued to creep up, unemployment rates were consistently higher than
they had been in the 1960s, and Keynesian policy prescriptions seemed to hold
little promise for improving the situation. Compounding these macroeconomic
problems were several microeconomic ones. The US automobile industry experienced
some of its worst years ever and the onslaught of more fuel efficient and reliable
Japanese imports began to swamp Detroit. The US steel industry, consumer electronics,
machine tools, and a number of other traditional manufacturing strengths also
experienced their first real challenge in domestic markets. Some of these industries
disappeared from the US altogether (consumer electronics) while others were
forced to make painful choices in order to restructure over a period of years
(steel).

Given the turmoil
in domestic markets and the macroeconomy, it is not surprising that radical
alternatives to mainstream economic analysis suddenly began to appear. Supply
side economics was the most successful of these radical views. Its proponents managed
to win the support of an extremely popular president and were blessed (or
cursed) with the opportunity to enact major parts of their program.

During the 1970s,
mainstream conservative economists began to examine the macroeconomic effects
of taxes and regulations. They came up with a number of widely accepted and
credible empirical studies which showed that various taxes and business regulations
had become obstacles to economic growth. The conclusion of many of their studies
was that if these disincentives to work and invest were addressed, then there
would probably be modest improvements in the overall rate of economic growth.
In no way did this body of work support the idea that the much higher rates
of growth of the 1950s and 1960s would return; rather it showed a potential
for relatively modest increases in economic growth.

In the hands
of the supply siders, conservative ideas about taxes and regulation were turned
into a panacea for every economic problem, including inflation, budget deficits,
trade deficits, productivity growth, GDP growth, loss of manufacturing, low
savings and investment, and so on. The key promise they made, however, was
that with a cut in taxes, saving and investment rates would rise. They argued
that when people were allowed to keep a larger piece of future income, they
would work, save, and invest more. The rise in work effort, savings and investment
would raise the rate of growth of GDP and productivity (output per hour worked).

In 1981, President
Reagan took office on the promise that he would enact many of the supply side
proposals. The cornerstone of his policy was an across the board income tax
cut. Legislation was quickly passed cutting everyone's income tax rates by 5%
in 1981, 10% in 1982, and 10% in 1983. In addition, he continued the trend
that was begun under his predecessor, President Carter, of deregulating various
sectors of the economy.

We will examine
a number of variables to see if there is any evidence to support the supply
siders' claims. In Chapter 3 we created the variable "is," the share of investment
in GDP. According to the proponents of supply side economics, this variable
should have increased in the 1980s. Similarly, the variable psp, personal
savings as a share of disposable personal income, should have risen. The growth
rates of productivity (prod1 and/or prod2) and GDP should have risen and the
size of the average deficit should have shrunk.

In each case,
we can test for the predicted effects by testing the hypothesis that the mean
value (is, psp, GDP growth, productivity growth, deficit as a share of GDP)
for the 1970s is different from the 1980s mean. The steps to do this first require
the computation of the variables not already in the data set:

    1. Select
      Transform from the menu bar, then choose Compute . . .;
    2. If you
      have not already done so, create new variables for the growth
      rate of GDP, deficits/GDP, the growth rate of productivity, and
      investment/GDP.
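
The new variables are simple transformations of series already in the data set. As a rough illustration only, the Python sketch below shows one common way to compute them. It assumes that a growth rate means the percent change from the previous year and that a share is a simple ratio; the function names and the three-year series are hypothetical and do not come from the data set.

```python
def growth_rates(levels):
    """Percent change from one observation to the next; the first year has none."""
    rates = [None]
    for previous, current in zip(levels, levels[1:]):
        rates.append(100.0 * (current - previous) / previous)
    return rates

def shares(numerators, denominators):
    """Ratio of one series to another, for example investment/GDP."""
    return [num / den for num, den in zip(numerators, denominators)]

# Made-up levels for three years, NOT values from the data set.
gdp = [100.0, 110.0, 121.0]
investment = [20.0, 22.0, 24.2]
print(growth_rates(gdp))
print(shares(investment, gdp))
```

Because a growth rate needs the previous year's value, the first observation in each growth series is missing.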

Use the recode function
to create a marker for the 1970s and 1980s (if you did not do this in the last
chapter).

    1. Select
      Transform from the menu bar, then Recode, and Into Different Variable;
    2. Highlight
      year in the variable list and use the arrow to move it into the Numeric
      Variable -> Output box;
    3. Type sside
      in the Output Variable box and click Change;
    4. Click Old
      and New Values;
    5. In the
      Old Value box, click the Range button and put 1971 and 1980 in the two
      boxes;
    6. In the
      New Value box type 1 and click Add;
    7. Go back
      to the Range boxes and type 1981 and 1990;
    8. In the
      New Value box type 2 and click Add;
    9. Click Continue
      and then click OK.
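
The recode steps above amount to a simple rule that maps each year to a group. A minimal Python sketch of that rule follows. The function name recode_sside is our own, and we assume that years outside both ranges should be left without a group so that they drop out of the comparison.

```python
def recode_sside(year):
    """Return 1 for 1971-1980, 2 for 1981-1990, None for any other year."""
    if 1971 <= year <= 1980:
        return 1
    if 1981 <= year <= 1990:
        return 2
    # Assumption: years outside both ranges get no group (treated as missing).
    return None

years = [1970, 1971, 1980, 1981, 1990, 1991]
sside = [recode_sside(y) for y in years]
print(sside)  # [None, 1, 1, 2, 2, None]
```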

Test the hypothesis
for each variable,


H0: μ_70s = μ_80s,

H1: μ_70s ≠ μ_80s,

using the Independent
Samples t test:

    1. Select
      Statistics from the menu bar, choose Compare Means, then Independent Samples
      T-Test;
    2. Highlight
      psp and click the arrow to put it into the Test Variable(s) box;
    3. Do the
      same for the other variables (investment share, rate of growth of GDP
      and productivity, deficits as a share of GDP);
    4. Highlight
      sside and click the arrow to put it into the Grouping Variable box, then
      click Define Groups . . .;
    5. In Group
      1, type 1 and in Group 2, type 2;
    6. Click Continue,
      then OK;

SPSS will perform
t-tests on each variable, comparing the mean value for the 1970s to the mean
for the 1980s. For each variable, there are two tables, one with the means and
standard deviations, and the second with the t value for the tests. Note that
SPSS also automatically performs a test to see if the variances are the same
during the two periods (Levene's test) and calculates separate t values for
each case (equal variances, unequal variances). If the variances are the same,
then the procedure pools all the data from both periods to calculate a pooled
variance. This makes the t-test slightly more powerful if it is valid to pool
the data.
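
To make the two cases concrete, the Python sketch below computes both t values for two independent samples. It is an illustration under stated assumptions, not a reproduction of SPSS: the function name is hypothetical, the data are made up, the unequal-variances degrees of freedom are assumed to follow the Welch-Satterthwaite approximation, and Levene's test itself is not implemented.

```python
import math
import statistics

def independent_t(group1, group2):
    """Return {'pooled': (t, df), 'unequal': (t, df)} for two independent samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)

    # Equal-variances case: pool the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Unequal-variances case: separate variances, approximate df
    # (assumed here to be the Welch-Satterthwaite formula).
    a, b = v1 / n1, v2 / n2
    t_unequal = (m1 - m2) / math.sqrt(a + b)
    df_unequal = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"pooled": (t_pooled, df_pooled), "unequal": (t_unequal, df_unequal)}

# Made-up values, NOT from the data set.
seventies = [1.0, 2.0, 3.0]
eighties = [2.0, 4.0, 6.0]
result = independent_t(seventies, eighties)
print(result)
```

When the two groups are the same size, as here, the two t values coincide and only the degrees of freedom differ. The pooled version keeps more degrees of freedom, which is the sense in which it is slightly more powerful.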

What can you
conclude? Did the growth rate of real GDP increase? Did any of the variables
perform as predicted by supply side politicians? Why do you suppose supply
side theory is ignored by mainstream economists?

    4-3
    Sources
    Krugman, Paul.
    Peddling Prosperity: Economic Sense and Nonsense in the Age of Diminished
    Expectations.
    New York: WW Norton. 1994.

    Krugman is
    a leading American economist who has written an in-depth critique of supply
    side economics that is accessible to non-economists.