*1998; Last Modified 14 August 1998*

Regression, like correlation, does not determine causation. Its strength is that, unlike correlation, it measures the parameters of the association. That is, correlation can show that disposable income and household consumption move together, but regression measures the amount by which consumption will increase when disposable income rises by a dollar. Because regression goes beyond correlation to a measurement of the size of the effect of one variable on another, it is the favorite statistical technique in empirical economics.

In the example of regression just cited there is an implicit assumption about causation, even though neither regression nor correlation can prove it. Economists assume that changes in disposable income cause changes in consumption (although we allow that the reverse is true as well, at least in the aggregate). In the regression procedure, it is assumed that one variable is dependent (consumption) and the other is independent (disposable income). This assumption tells us about the researcher’s intuition, or the theory in use, but it cannot be validated or invalidated with the regression procedure. To repeat, whatever we think we know about causation must come from theories and histories and other pieces of information besides regression.

Chart 10 in Chapter 5 plotted the pairs of values for consumption and disposable income for each year, 1929-1996. In the economics literature, this is known as the *consumption function*. Economists theorize that consumption is largely determined by disposable income (after-tax income). Algebraically, we can write the consumption function in general functional notation:

$C = f(Y^d)$,

where $C$ is consumption and $Y^d$ is disposable income. The plot of c and dp1 in Chart 10 revealed that the relationship was linear, so we can convert the general functional notation into a specific, linear functional form:

$C = c_0 + c_1 Y^d$.

In this form, the consumption function is a straight line, with intercept $c_0$ and slope $c_1$. In economics, the intercept, $c_0$, is called *autonomous consumption* since it is independent of (autonomous from) disposable income. The slope, $c_1$, measures the rate of change in consumption given a change in $Y^d$. For example, if $Y^d$ increases by one dollar, then $C$ changes by $c_1 \times 1 = c_1$ dollars.

Let $\Delta$ stand for the change in a variable, so $\Delta Y^d$ is read as "the change in disposable income." Then, if $Y^d$ changes by $\Delta Y^d$, the change in $C$ ($\Delta C$) is $c_1 \Delta Y^d$:

$C = c_0 + c_1 Y^d$, and $\Delta C = c_1 \Delta Y^d$,

so that

$c_1 = \Delta C / \Delta Y^d$ = the *marginal propensity to consume* = MPC.

Autonomous consumption and the marginal propensity to consume are the *parameters* of the linear consumption function. Mathematically, they are the intercept and slope of a line that describes the relationship between disposable income and consumption.
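To make the algebra concrete, here is a minimal Python sketch of the deterministic linear consumption function. The parameter values (autonomous consumption of 100 and an MPC of 0.9) and the function name `consumption` are hypothetical, picked only for illustration; they are not estimates from the data.

```python
def consumption(yd, c0, c1):
    """Deterministic linear consumption function: C = c0 + c1 * Yd."""
    return c0 + c1 * yd


# Hypothetical parameter values, chosen only for illustration.
C0 = 100.0   # autonomous consumption (the intercept)
C1 = 0.9     # marginal propensity to consume (the slope)

yd = 5000.0
delta_yd = 1.0   # disposable income rises by one dollar

# The change in C equals c1 times the change in Yd.
delta_c = consumption(yd + delta_yd, C0, C1) - consumption(yd, C0, C1)
autonomous = consumption(0.0, C0, C1)

print(f"Change in C for a one-dollar rise in Yd: {delta_c:.2f}")
print(f"Consumption when Yd is zero (autonomous): {autonomous:.2f}")
```

Whatever level of disposable income we start from, the change in consumption is the same, which is exactly what a constant slope means.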

Regression analysis is an exercise in estimating the values of $c_0$ and $c_1$, but before we do regression, we have to take into account one more element of every regression model.

The linear consumption function, $C = c_0 + c_1 Y^d$, is a *deterministic model*. It allows no room for variation away from the relationship. Once $Y^d$ is known, $C$ is completely determined (given the parameters $c_0$ and $c_1$). In fact, the consumption function describes a tendency, not a mathematically fixed relationship. The relationship between consumption and disposable income is *probabilistic*, or to say the same thing with a 5 dollar word, it is a *stochastic relationship*. Stochastic relationships are not fixed like deterministic relationships; there is always margin for variation away from the general tendency. Therefore, if the consumption function describes the deterministic relationship, we need to add a term to let the actual behavior of consumption in the actual economy deviate from the value predicted by disposable income:

$C = c_0 + c_1 Y^d + e$,

where $e$ is a *random error term*. On average, $e$ is zero, so $C = c_0 + c_1 Y^d$, but in any given year, $C$ could be more than predicted ($e > 0$), or less than predicted ($e < 0$). Graphically, the inclusion of a random error term allows for the possibility that the scatter points of the consumption function do not fall on a straight line.

With the regression procedure in SPSS we can compute the values of the parameters $c_0$ and $c_1$.

- Select Statistics from the menu bar, choose Regression, then select Linear...;
- Highlight c in the variable list, and click the arrow to put it into the Dependent box;
- Highlight dp1 in the variable list, and click the arrow to put it into the Independent(s) box;
- Click OK.

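For readers who want to see what the menu steps above compute, here is a rough Python sketch of a one-variable least-squares fit. It is not the SPSS routine, and it does not use the chapter's dataset: the data are simulated from the stochastic model $C = c_0 + c_1 Y^d + e$ with assumed parameter values, and the helper name `ols_fit` is my own.

```python
import random


def ols_fit(x, y):
    """Least-squares intercept and slope for a one-variable regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Simulated data, NOT the chapter's dataset: 68 "years" generated from
# C = c0 + c1*Yd + e with assumed (hypothetical) parameter values.
random.seed(0)
TRUE_C0, TRUE_C1, ERROR_SD = -12.0, 0.92, 27.0
dp1 = [100.0 * (i + 1) for i in range(68)]   # independent variable
c = [TRUE_C0 + TRUE_C1 * yd + random.gauss(0.0, ERROR_SD) for yd in dp1]

c0_hat, c1_hat = ols_fit(dp1, c)
print(f"estimated c0 = {c0_hat:.3f}, estimated c1 = {c1_hat:.4f}")
```

Because of the random error term, the estimates land near the assumed parameters but not exactly on them. That gap is what the standard errors in a regression printout are meant to measure.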
The SPSS results are in *Table 6*, where I have divided them into 3 parts. In each part, only a few numbers matter for now: Part 1 has one (Adj. R Square = 0.99964), Part 2 has none, and Part 3 has several. One of the keys to using SPSS, or any statistical package, is to not become overwhelmed by the amount of output it generates; the trick is to know what you can ignore, at least initially. As you become more skilled, you will find uses for the things we are going to ignore for now.

Table 6. Elements of Regression Output

Part 1

| Statistic | Value |
|---|---|
| Multiple R | 0.99982 |
| R Square | 0.99965 |
| Adj. R Square | 0.99964 |
| Standard Error | 27.4352 |

Part 2: Analysis of Variance

| Source | DF | Sum of Squares | Mean Square |
|---|---|---|---|
| Regression | 1 | 141122061.6 | 141122061.6 |
| Residual | 66 | 49677.6 | 752.7 |

F = 187489.9, Signif F = 0.000

Part 3

| Variable | B | SE B | Beta | T | Sig T |
|---|---|---|---|---|---|
| DP1 | 0.918815 | 0.002122 | 0.99984 | 433.001 | 0.000 |
| (Constant) | -12.33177 | 4.243069 | | -2.906 | 0.0050 |
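As a consistency check on Table 6, the Part 1 statistics and the F value can be recomputed from the sums of squares in Part 2. The sketch below uses the standard textbook formulas: R Square is the regression sum of squares over the total, Adj. R Square rescales the unexplained share by (n - 1)/(n - k - 1), and Standard Error is the square root of the residual mean square. The variable names are my own, and the comparison with SPSS is only good to rounding.

```python
import math

# Values copied from Part 2 of Table 6.
ss_regression = 141122061.6
ss_residual = 49677.6
df_regression = 1     # k, the number of independent variables
df_residual = 66      # n - k - 1

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual   # n - 1, so n = 68 years

r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * df_total / df_residual
mean_sq_residual = ss_residual / df_residual
standard_error = math.sqrt(mean_sq_residual)
f_stat = (ss_regression / df_regression) / mean_sq_residual

print(f"Multiple R     = {multiple_r:.5f}")
print(f"R Square       = {r_square:.5f}")
print(f"Adj. R Square  = {adj_r_square:.5f}")
print(f"Standard Error = {standard_error:.4f}")
print(f"F              = {f_stat:.1f}")
```

The recomputed values agree with Table 6 up to rounding in the printed sums of squares, which is a useful reminder that the three parts of the output are different views of the same fit.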

Part 1 provides 4 measures of *goodness of fit*. These are statistics that tell how well the data fits the model. R Square and its adjustment, Adj. R Square, can be interpreted as the percentage of the variation in the dependent variable that is explained by the independent variable. In our model, $C$ is the dependent variable and $Y^d$ is independent, so movements in $Y^d$ explain nearly all (>99%) of the movement in $C$. There is no threshold for the R squared or adjusted R squared where they go from bad to good, but by any criterion, our model explains nearly all the variation in $C$.

Adjusted R square is an adjustment to R square (duh!) that takes into account the number of independent variables. Since we only have one, $Y^d$, the two are close in value. The adjusted R squared of 0.99964 looks too good to be true and it probably is; for various technical reasons, some statistical, some economic, it makes the model look better than it is. (Two reasons: autocorrelation, and nominal data.) It pays to be skeptical, even (especially) when things look great.

Part 2 provides a number of statistics that are grouped together under the subject of *analysis of variance*. Basically, Part 2 provides measures that break down the variation in $C$ and attribute the different parts to the deterministic part of the model ($c_0 + c_1 Y^d$) and the stochastic part ($e$). These are useful measures in more advanced routines, but they are unnecessary at this point.

Part 3 is the core of the output. Part 3 has the estimated values of $c_0$ (-12.33) and $c_1$ (0.9188). These are in the column labeled B. The next column, SE B, is the standard error of the estimates (0.002122 for $c_1$, and 4.243069 for $c_0$). These are measures of the precision of our estimates of $c_0$ and $c_1$. The smaller the standard errors, the more precise are our estimates. The column labeled Beta can be ignored, but the following column, T, has important information. T is the value of the t-statistic that is constructed to test the hypothesis that the "true" values of $c_0$ and $c_1$ are zero. Let the unobserved true values be symbolized with Greek letters, $\beta_0$ and $\beta_1$.

We want to test the hypotheses:

$H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$, and

$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$.

If we accept the second null, then it means that disposable income has no effect on consumption. Since this is one of the primary reasons for doing regression (i.e., to see if disposable income affects consumption, and if so, how much), every statistical package automatically turns out a t-statistic to test this hypothesis. The formula for the t-statistic is:

t-value = ($c_1$ - value in null hypothesis)/(standard error of the estimate) = (B - 0)/(SE B) = (0.9188 - 0)/(0.002122) = 433.

The last column of the SPSS printout in Part 3 is labeled Sig T. It is the probability, when the null hypothesis is true ($H_0: \beta_1 = 0$), of getting a t-statistic at least as far from zero as the one computed from the data in the dataset.
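The T column can be reproduced by hand from the B and SE B columns. The sketch below does that in Python for both coefficients. The tail probability it prints is only an approximation: it uses the normal distribution from the standard library (an assumption of mine, not SPSS's method), whereas SPSS uses the t distribution with 66 degrees of freedom, so the figure for the constant will not match the printed Sig T exactly.

```python
from statistics import NormalDist

# Coefficients and standard errors copied from the B and SE B columns.
estimates = {
    "DP1": (0.918815, 0.002122),
    "(Constant)": (-12.33177, 4.243069),
}


def t_value(b, se_b, null_value=0.0):
    """t-statistic for the null hypothesis that the true coefficient is null_value."""
    return (b - null_value) / se_b


def approx_two_sided_p(t):
    """Two-sided tail probability under a normal approximation (an assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


t_values = {name: t_value(b, se_b) for name, (b, se_b) in estimates.items()}
for name, t in t_values.items():
    print(f"{name}: T = {t:.3f}, approximate Sig T = {approx_two_sided_p(t):.4f}")
```

A rule of thumb that follows from this: with a sample of this size, a T value larger than about 2 in absolute value corresponds to a Sig T below 0.05.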

Since the probability (to four decimal places) of getting a sample value of $c_1$ equal to 0.9188, with a standard error of 0.002122, is 0 when the null hypothesis is true, we should reject the null hypothesis.

Let’s try another regression. (This section and the following borrow heavily from Blanchard, 1997.) Economists have long known that increases in the rate of growth of GDP enable more unemployed people to find jobs. It was not until the 1960s and the work of Arthur Okun that this general relationship between unemployment and GDP growth was estimated empirically. The question is simple and basic: If GDP falls by 1%, how much does the unemployment rate change? In order to answer this, we have to compute the change in the unemployment rate, ur - lag(ur).
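Outside SPSS, the lagged difference ur - lag(ur), and the percent change in real GDP that the regression also needs, might be computed along the following lines. This is only a sketch: the numbers are made up rather than taken from the dataset, the helper names `lag_difference` and `percent_change` are my own, and I am assuming the deflator is an index with a base of 100.

```python
def lag_difference(series):
    """x[t] - x[t-1]; the first entry is None because it has no lag."""
    return [None] + [curr - prev for prev, curr in zip(series, series[1:])]


def percent_change(series):
    """100 * (x[t] - x[t-1]) / x[t-1]; the first entry is None."""
    return [None] + [100.0 * (curr - prev) / prev
                     for prev, curr in zip(series, series[1:])]


# Made-up illustrative numbers, NOT values from the chapter's dataset.
ur = [5.0, 5.5, 7.0, 6.0]                 # unemployment rate, percent
gdp = [1000.0, 1080.0, 1100.0, 1210.0]    # nominal GDP
gdpdef = [100.0, 104.0, 110.0, 112.0]     # GDP deflator, assumed base = 100

# Step 1: deflate nominal GDP to get real GDP.
real_gdp = [100.0 * g / d for g, d in zip(gdp, gdpdef)]

# Step 2: build the two regression variables.
d_ur = lag_difference(ur)                  # change in the unemployment rate
real_gdp_growth = percent_change(real_gdp)

for change, growth in zip(d_ur[1:], real_gdp_growth[1:]):
    print(f"change in ur = {change:+.2f}, real GDP growth = {growth:+.2f}%")
```

The first observation has no lagged value, so one year is always lost when a change variable is built. SPSS marks that case as missing in the same way.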

We will call the change in the unemployment rate $\Delta ur$, where the Greek letter delta, $\Delta$, indicates change. We also have to compute the percent change in real GDP, which requires two steps (you may already have done this). First, use the GDP deflator (gdpdef) and GDP to compute real GDP. Second, compute the percentage change in real GDP, and give it a name. Then you are ready to run the regression.

The estimated equation is

$\Delta ur = 1.274321 - 0.361428 \times (\text{Percent change in real GDP})$.

The interpretation of this relationship is that, on average, each 1 percentage point increase in the rate of growth of real GDP lowers the unemployment rate by 0.36 percentage points relative to what it would otherwise have been. You should check the goodness of fit statistics, R square and adjusted R square, and the t-statistics for the slope and the intercept. Follow the procedure outlined for the consumption function.

The implication of Okun’s Law is that output must grow by about 3.5% per year (1.27/0.36) just to keep unemployment from rising. Why? First, the labor force grows about 1% a year (check this), so output has to grow at about the same speed to provide enough new jobs. Second, labor productivity (output per hour worked, prod1 in the dataset) grows at about 2.3 percent a year (check this), so even if no new jobs are created, output goes up 2.3 percent. Put these two forces together, and real GDP has to grow over 3 percent a year on average just to keep the unemployment rate from going up. Because of this relationship, many economists view "normal" economic growth as approximately 3-3.5%.

Okun’s Law has also been used to try to measure the costs of unemployment to the national economy. When unemployment holds constant ($\Delta ur = 0$), real GDP grows about 3.5%. Now solve for the percent change in real GDP if unemployment rises by 1 percentage point ($\Delta ur = 1$):

$1 = 1.2743 - 0.36142 \times (\text{Percent change in real GDP}) \Rightarrow \text{Percent change in real GDP} \approx 0.76$.

When GDP growth falls from 3.5% to about 0.76%, we lose about 2.75 percent of potential GDP. Given that our GDP is roughly 8,000 billion in nominal terms, a loss of 2.75 percent represents a loss of about \$220 billion (0.0275*8,000). In other words, each 1 percentage point increase in unemployment costs the US economy around \$220 billion in lost output.

The Phillips curve was one of the key economic discoveries of post-World War II macroeconomics. Recall that the curve showed a regular relationship between inflation and unemployment.
This seemed to give policymakers a set of inflation-unemployment tradeoffs they could choose. If inflation was too high, then use Keynesian policies to slow down the economy--unemployment would rise, but the amount was predictable and did not vary. If unemployment was too high, then do the opposite--inflation would rise, but again it was predictable and invariant.

To see the Phillips relationship that economists in the 1950s and 1960s worked with, we should omit the data from the 1930s and World War II. In addition, since the relationship broke down in the 1970s, we will work with data limited to 1948-1969. Algebraically, the relationship can be expressed as

$\pi_t = \beta_0 + \beta_1 u_t + e_t$,

where $\pi_t$ is inflation in year $t$, $\beta_0$ is the intercept of the regression line, $\beta_1$ is the slope parameter, which is expected to be negative, $u_t$ is the unemployment rate in year $t$, and $e_t$ is the random error term that measures deviations from the average relationship.

In the data set, the unemployment rate is variable ur, and the inflation rate is a computed variable that is the percentage change in the CPI. We calculated this in several earlier exercises.

- Select Data from the menu bar, then Select Cases...;
- Click the button for Based on time or case range, then click Range;
- In the boxes type 1948 and 1969;
- Click OK.

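The Select Cases steps above amount to filtering the observations by year. A minimal Python sketch of the same idea follows, with made-up numbers standing in for the dataset and a hypothetical helper named `select_cases`. One design point is worth noticing: inflation is computed on the full series before the years are selected, so the first selected year still has a lagged CPI to work from.

```python
def select_cases(rows, first_year, last_year):
    """Keep rows whose year lies in [first_year, last_year], like Select Cases."""
    return [row for row in rows if first_year <= row["year"] <= last_year]


# Made-up numbers for illustration only, NOT the chapter's dataset.
years = list(range(1966, 1973))
cpi = [100.0, 103.0, 107.0, 112.0, 118.0, 123.0, 127.0]
ur = [4.0, 4.2, 3.9, 3.7, 5.0, 6.0, 5.5]

# Inflation is the percentage change in the CPI, computed before selecting.
rows = []
for i, year in enumerate(years):
    inflation = None if i == 0 else 100.0 * (cpi[i] - cpi[i - 1]) / cpi[i - 1]
    rows.append({"year": year, "ur": ur[i], "inflation": inflation})

# Select 1948-1969 and drop the year that has no lagged CPI.
sample = [row for row in select_cases(rows, 1948, 1969)
          if row["inflation"] is not None]

for row in sample:
    print(f'{row["year"]}: ur = {row["ur"]:.1f}, inflation = {row["inflation"]:.2f}')
```

Changing the two year arguments is all it takes to switch the sample to a different period.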
Now run the regression using your inflation variable as the dependent variable and ur for the independent variable. You should get

$\pi_t = 6.917 - 0.987 u_t + e_t$.

This is the relationship that broke down during the 1970s. To see this, change Select Cases to the years 1970 to 1996 and re-run the regression. Look at the R squared. Does $u_t$ explain anything about inflation? Is the sign on $u_t$ what you expected (i.e., is your estimate of $\beta_1$ negative)? Is it significantly different from zero? That is, do you accept or reject the null hypothesis $H_0: \beta_1 = 0$?

Needless to say, most economists were puzzled by this. As early as the mid-1970s it was apparent that the Phillips relation no longer worked. What could have gone wrong? The answer was waiting in the wings in the form of an earlier prediction made by Milton Friedman. Friedman had argued that as soon as people changed their expectations about inflation, the Phillips curve would break down. Friedman’s point was that inflation partly depended on what people expected it to be.
If everyone thought it was going to be high, then workers would demand wage increases, and businesses would expect higher costs, so they would raise their prices. The net result would be inflation--in part because everyone expected it and acted to protect themselves by raising their wage demands and their prices.

Until the late 1960s, prices seemed to have no trend; they were about as likely to fall as they were to rise. Consequently, it made sense to expect zero inflation since that was close to the long term average. In the late 1960s and early 1970s, this changed. Inflation was ratcheted up by a combination of events--the Vietnam War, domestic spending for the War on Poverty, bad harvests in the early 1970s, and, in 1973, the first oil crisis. Households and businesses began to expect that inflation would not be zero, facts bore out the correctness of this view, and the inflation rate rose. Friedman’s arguments led economists to the "expectations augmented" Phillips curve, which is just the old Phillips curve with another variable, expected inflation, on the right hand side:

$\pi_t = \beta_0 + \pi^e + \beta_1 u_t + e_t$,

where $\pi^e$ is the expected rate of inflation. The old Phillips curve is a variety of this one in which $\pi^e$ is zero. Here, $\pi^e$ is expected to be positive, so that for a given unemployment rate, inflation is higher by that amount.

The obvious question is whether or not this can be measured. That is, how do we know (measure) the expected rate of inflation? Friedman’s answer was to point out that most of us use the recent past to form our expectations about the future. For example, will it be hot or cold today? When my kids ask me that in the morning, I always tell them that it will be just like yesterday. (Of course, I could look it up in the weather section of the morning paper, and sometimes I do if there is reason to believe the weather might be changing. Looking it up--seeking additional information--is the rational thing to do and conforms to the economic idea of *rational expectations*. It is forward looking and incorporates all readily available information that is not too costly to obtain. Friedman’s idea--today is like yesterday--is called *adaptive expectations*.)

For the sake of simplicity, we assume that our expectation of inflation today is that it will be like last period’s rate. Algebraically,

$\pi^e = \pi_{t-1}$,

where $\pi_{t-1}$ is the inflation rate in year $t-1$ (i.e., last year if this is year $t$). Using this notation, we can re-write the expectations augmented Phillips curve as

$\pi_t = \beta_0 + \pi_{t-1} + \beta_1 u_t + e_t$,

or, moving the expected inflation term to the left:

$\pi_t - \pi_{t-1} = \beta_0 + \beta_1 u_t + e_t$,

which we can easily estimate for 1970 to 1996. After selecting the years 1970 to 1996, and computing a new variable $\pi_t - \pi_{t-1}$, re-run the regression.
You should get

$\pi_t - \pi_{t-1} = 7.078 - 1.085 u_t + e_t$.

Notice the similarity to the regression for 1948 to 1969.

This regression has many uses in policy making. For example, it implies that if unemployment is too low, the left hand side will be positive and inflation will be accelerating ($\pi_t > \pi_{t-1}$). Economists have a special fondness for the rate of unemployment that keeps inflation from rising. Note that this is not the same thing as zero inflation. The unemployment rate that prevails when $\pi_t - \pi_{t-1}$ equals 0 is known as (get ready!) the *non-accelerating inflation rate of unemployment*, or the NAIRU. A prettier but misleading name for it is the *natural rate of unemployment*.

What is the natural rate? Set the left hand side of the above equation equal to zero, and solve:

$0 = 7.078 - 1.085 u_t$,

or $u_t = 6.5$. Anything less, and inflation is supposed to increase; anything more and it decreases. Unemployment is currently less than 5%, so you can guess why the Federal Reserve and inflation hawks are nervous. Inflation should be ratcheting up, but it is not. We don’t know why, and the debate rages on among economists. It is clear, however, that the natural rate of unemployment, or the NAIRU, changes over time. It seems to have fallen in the 1990s, but no one can say how low. The data is not loud and clear enough for us to be certain.
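The NAIRU calculation in this section is short enough to check in a few lines of Python. The sketch below takes the coefficients from the 1970 to 1996 regression reported in the text (7.078 and -1.085); the function names are hypothetical, and the 5% and 8% unemployment rates are just example inputs.

```python
def nairu(intercept, slope):
    """Unemployment rate u at which intercept + slope * u equals zero."""
    return -intercept / slope


# Coefficients from the 1970-1996 regression reported in the text.
B0, B1 = 7.078, -1.085


def predicted_inflation_change(u):
    """Predicted change in inflation at unemployment rate u, ignoring the error term."""
    return B0 + B1 * u


u_star = nairu(B0, B1)
print(f"NAIRU = {u_star:.2f}%")

# Below the NAIRU inflation is predicted to rise; above it, to fall.
for u in (5.0, u_star, 8.0):
    change = predicted_inflation_change(u)
    print(f"u = {u:.2f}% -> predicted change in inflation = {change:+.2f} points")
```

Re-estimating the regression on a different span of years and feeding the new coefficients to `nairu` is one simple way to see how the estimate moves over time.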