Chapter Two: Creating a Data File
This section explains how to set up a file with new data. After finishing this chapter, you should be able to create an SPSS data file that includes 1) the data and 2) labels indicating what the data represent. Also, if you don’t have complete data for a case, for example if someone didn’t answer a question or chose two answers to a question, you will be able to mark the response as missing so it will be excluded from the analysis. To illustrate this process, we will use a shortened version of the questionnaire used by the General Social Survey (GSS) conducted by the National Opinion Research Center (NORC). For this example, our students wanted to see if their opinions on social issues were similar to those of the national sample. More details can be found in the General Social Survey codebook (Davis, Smith, and Marsden, 2001).
The students knew they were not a representative sample, even of college students, but this questionnaire is an interesting way to learn how to create a new data file. They decided to use the following questions:
· What is your age?
· Are you male or female?
· What is your religious preference?
· Generally speaking, in politics do you consider yourself as conservative, liberal, or middle of the road?
· What kind of marriage do you think is the more satisfying way of life: one where the husband provides for the family and the wife takes care of the house and children, or one where both the husband and wife have jobs and both take care of the house and children?
· Do you think it should be possible for a pregnant woman to obtain a legal abortion:
    If there is a strong chance of a serious defect in the baby? [ABDEFECT]
    If she is married and does not want any more children? [ABNOMORE]
    If the woman's own health is seriously endangered by pregnancy? [ABHLTH]
    If the family has a very low income and cannot afford any more children? [ABPOOR]
    If she became pregnant as a result of rape? [ABRAPE]
    If she is not married and does not want to marry the man? [ABSINGLE]
    If the woman wants it for any reason? [ABANY]
Basic Steps in Creating a Data File
There are a few things that always need to be done to create a data file. It is best to start your data file with some careful planning.
1. First we will want to assign each respondent an identification number, not so individuals can be identified, but so we can keep track of each case when we go back to check the accuracy of the data entry. For each question (variable), we need a variable name that is simple but expresses something about the variable. SPSS 11 limits variable names to eight characters or fewer, and each name must start with a letter. Variable names can contain letters and numbers but not spaces, and only a few special characters are permitted, so don’t use any odd symbols. AGE and SEX would be easy variable names for the first two questions. For the questions on abortion, we decided to use the first three characters of the variable names used by the General Social Survey (shown in brackets after each question). We used MG for the preferred type of marriage and called political orientation C-L. Each variable name can also be given an extended variable label that gives more detail, and labels can contain spaces or special characters. For example, C-L could have the variable label Conservative-Liberal.
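The naming rules in step 1 can be checked mechanically. The short Python sketch below is a hypothetical helper, not part of SPSS, and it encodes our reading of the SPSS 11 rules: at most eight characters, a letter first, then only letters, digits, or the characters . _ $ # @, no trailing period, and none of the reserved keywords (ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH). Under these assumed rules a hyphen is not an accepted character, so a name such as C-L would be rejected; CL or C_L avoids the problem.

```python
import re

# Hypothetical helper: an approximation of the SPSS 11 variable-naming rules.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT", "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, or one of . _ $ # @
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._$#@]*")


def is_valid_spss11_name(name: str) -> bool:
    """Return True if name follows the (assumed) SPSS 11 naming rules."""
    if len(name) > 8:                  # eight characters or fewer
        return False
    if name.upper() in RESERVED:       # reserved keywords are not allowed
        return False
    if name.endswith("."):             # a name may not end with a period
        return False
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["ID", "AGE", "RELIG", "ABDEFECT", "C-L", "POLITICAL"]:
    print(candidate, is_valid_spss11_name(candidate))
```

With these assumptions, ID, AGE, RELIG, and ABDEFECT pass, while C-L (hyphen) and POLITICAL (nine characters) fail.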
2. After we have given each variable a name and label, we give each possible response to the question a numeric code, usually the number corresponding to the order of the answers, and a value label that says what the code means. (We could use another system, but this is the easiest because SPSS works best with numeric codes to represent the data.) For example, SEX could use 1 for male and 2 for female; C-L could use 1 for conservative, 2 for liberal, and 3 for middle of the road. These codes would be given value labels such as Male, Female, Conservative, Liberal, and Middle of the Road.
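The distinction in step 2 between a stored code and its value label can be pictured as a lookup table. The Python sketch below only illustrates the idea; it is not how SPSS stores labels internally, and the names VALUE_LABELS and label_for are hypothetical. The data stay numeric, and the labels are used only when results are displayed.

```python
# Hypothetical representation of the value labels from step 2:
# each variable maps its numeric codes to descriptive labels.
VALUE_LABELS = {
    "SEX": {1: "Male", 2: "Female"},
    "C-L": {1: "Conservative", 2: "Liberal", 3: "Middle of the Road"},
}


def label_for(variable: str, code: int) -> str:
    """Return the value label for a code, or the code as text if it has no label."""
    return VALUE_LABELS.get(variable, {}).get(code, str(code))


responses = [1, 2, 2, 1]
print([label_for("SEX", c) for c in responses])  # ['Male', 'Female', 'Female', 'Male']
```

Unlabeled codes, such as a missing-value code of 9, simply fall back to the number itself.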
3. Sometimes respondents do not answer a question, give more than one answer, or do something else that makes their answers unusable. In our example, respondent #2 marked both “yes” and “no” on the last question, respondent #3 wrote in “none” on question 4, and respondent #13 didn’t answer the marriage question. We can assign these answers a missing-value code so they don’t distort the analysis. Often 9 is used to indicate missing data, or 99 if the variable has two-digit values. (Note that this would cause problems in the analysis if 9 or 99 were real codes, for example, if there were 9 possible responses to a question or if the ages included some ninety-nine-year-olds. So think carefully before you choose numbers for missing values.)
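Declaring 9 as a missing value matters because any calculation that treats it as a real answer is pulled toward it. The Python sketch below mimics what a discrete missing value does: cases with that code are set aside before the statistic is computed. The helper names are hypothetical and the five codes are made up for illustration.

```python
def valid_values(values, missing_codes):
    """Return only the values that are not declared missing (hypothetical helper)."""
    return [v for v in values if v not in missing_codes]


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up preferred-marriage codes: 1 = traditional, 2 = shared, 9 = no answer.
mg = [1, 2, 2, 9, 1]

print(average(mg))                       # 3.0, pulled upward by the 9
print(average(valid_values(mg, {9})))    # 1.5, computed on the four real answers
```

The same filter with {99} would work for age, which is exactly why 99 is a poor choice if any respondent could really be 99.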
It is a good idea to plan all this carefully. It is often useful to put the data in a matrix like Table 2.1 before entering them into the SPSS Data Editor.
Table 2.1. Sample Data Set: Questionnaire Responses

| id | age | sex | rel | c-l | mg | abd | abn | abh | abp | abr | abs | aba |
|----|-----|-----|-----|-----|----|-----|-----|-----|-----|-----|-----|-----|
| 01 | 20 | 1 | 4 | 2 | 2 | 2 | 2 | 1 | 3 | 1 | 2 | 2 |
| 02 | 24 | 2 | 5 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| 03 | 21 | 2 | 2 | 9 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 04 | 24 | 2 | 5 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 05 | 26 | 2 | 4 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 06 | 28 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 |
| 07 | 23 | 1 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 2 |
| 08 | 22 | 2 | 4 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 09 | 22 | 1 | 5 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 10 | 22 | 2 | 4 | 4 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 11 | 23 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | 2 | 3 |
| 12 | 24 | 2 | 2 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
| 13 | 51 | 2 | 1 | 2 | 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 14 | 22 | 2 | 2 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 15 | 21 | 2 | 4 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 16 | 37 | 1 | 1 | 3 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 |
| 17 | 22 | 2 | 4 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |
| 18 | 22 | 2 | 3 | 3 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 |
| 19 | 22 | 2 | 4 | 3 | 2 | 3 | 2 | 1 | 2 | 1 | 1 | 1 |
| 20 | 30 | 2 | 5 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 21 | 25 | 2 | 5 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 22 | 23 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 23 | 21 | 1 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 |
Getting Started in SPSS
To create the data file in SPSS, open SPSS (probably by clicking on the SPSS icon on the desktop). If it asks, “What would you like to do?”, choose “Type in data” and click OK (see Figure 2-1). This opens a matrix similar to a spreadsheet such as Excel.
Figure 2-1
In our example, the rows will be the cases, i.e., the respondents, and the columns will be the variables, i.e., the questions. So the upper-left cell will contain the identification number for the first case, and the cells to its right will contain data about that case.
The Data Editor has tabs in the lower left, “Data View” and “Variable View”, that let you work with your data in two ways. It probably opened in Data View; if not, click the “Data View” tab at the bottom left of the SPSS screen. Notice that it looks like a spreadsheet (see Figure 2-2).

Entering Variable and Value Names and Labels
1. You’ll be using the first column for the respondents’ ID numbers, so type “1” into the first cell (don’t type the quotation marks, just the number). If you click the “Variable View” tab, you can assign variable and value names (Figure 2-3).
2. Do that now by clicking “Variable View”, double-clicking “var00001” in the top-left cell, typing in ID, and pressing Enter. Go back to “Data View”, and you’ll see that the first column is now titled “id”. Note that SPSS 11 changes variable names to lower case. In this text, to differentiate variable names from other terms, we use all capital letters for variable names.
3. The second column will be age, so change to the “Variable View” tab at the bottom left of the SPSS screen, type age in row 2 under Name, and tab over to Missing. Click the little gray button in that cell to open the “Missing Values” dialog box, click “Discrete missing values”, type in 99, and click “OK” (Figure 2-4). (Now, if someone does not give his or her age, we’ll code it 99 and hope no one is really 99 years old. If you want to, you can change back to Data View to see that the column is now headed age.)
Figure 2-4
4. The third variable is sex, so type that in the third row under Name. Since we’re going to use the code 1 for males and 2 for females, we’re going to need value labels. Tab over to the cell under “Values” and click the little gray button to open the “Value Labels” dialog box. Type 1 in the “Value” space and Male in the “Value Label” space, and click “Add”. Then type 2 in the “Value” space and Female in the “Value Label” space (Figure 2-5), click “Add”, and click “OK”.
Figure 2-5
Now SPSS knows that 1 and 2 in the sex variable are really male and female, respectively. For missing values, click the little gray button in the Missing cell to open the dialog box, click “Discrete missing values”, type in 9, and click “OK”. (By now you’ve noticed that SPSS uses lower-case letters for variable names. If you want your results in another form, e.g., all caps or the first letter capitalized, use the “variable label” or “value label” box to take care of that. Type the label exactly the way you want your tables labeled when you do the analysis; often this is like a title, with the first letter of each important word capitalized.)
5. The next variable is religion; we’re going to name it RELIG. Notice that it has five possibilities: Protestant, Catholic, Jewish, other, and no religion. Go ahead and work out the variable label, value labels, and missing values just as you did above. You can refer to the “Codebook for Student Questionnaire” at the end of this chapter. So far, your data file should look like Figure 2-6.
Figure 2-6
6. Continue entering the variables for the rest of the data set. Some people, especially those who are used to working with spreadsheets, like to enter all the data in Data View before they set up the variable names, etc., so you’ll have to figure out what works best for you. It is very important to save your work as you go along, so do that now. Click the “Save” icon or use “Save” under “File”, and give your data file a sensible name. Notice that SPSS automatically adds “.sav” to the end of the file name.
7. Enter the codes for each variable in “Data View”. Then check the accuracy of your data file by scanning down each column, looking for codes that would be impossible. For example, sex can have only three possibilities, since male is 1, female is 2, and missing information is 9, so a 5 would be a mistake. The best check is to have one person read the codes aloud while another checks the entries in the data file.
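The column-by-column scan in step 7 can also be automated if the data are available outside SPSS. The Python sketch below is hypothetical, not an SPSS feature: it assumes each case is a dictionary keyed by the lower-case names from Table 2.1 and that the allowed codes come from the codebook at the end of this chapter (with 9 as missing). It supplements, rather than replaces, the read-aloud check.

```python
# Allowed codes per variable, taken from the chapter's codebook (9 = missing).
ALLOWED = {
    "sex": {1, 2, 9},
    "rel": {1, 2, 3, 4, 5, 9},
    "c-l": {1, 2, 3, 9},
    "mg": {1, 2, 9},
}


def find_impossible_codes(rows, allowed):
    """Return (id, variable, value) for every entry outside its allowed codes."""
    problems = []
    for row in rows:
        for variable, codes in allowed.items():
            value = row[variable]
            if value not in codes:
                problems.append((row["id"], variable, value))
    return problems


# A few cases copied from Table 2.1.
rows = [
    {"id": 8, "sex": 2, "rel": 4, "c-l": 3, "mg": 1},
    {"id": 9, "sex": 1, "rel": 5, "c-l": 2, "mg": 2},
    {"id": 10, "sex": 2, "rel": 4, "c-l": 4, "mg": 2},
]
print(find_impossible_codes(rows, ALLOWED))  # [(10, 'c-l', 4)]
```

Run on these cases, the check flags respondent 10, whose c-l entry of 4 is not one of the codebook’s codes, so that questionnaire is worth looking at again.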
Chapter Two Exercises
At California State University, Fresno, the Friendly Visitors Service hires college students to do in-home care for elderly people so they can remain independent and stay in their homes as long as possible. The students do cleaning, yard work, shopping, etc. The staff begins by interviewing clients in their homes and assessing their need for services. The following information is used to match the seniors with the students who want employment:
· Age: Age at last birthday
· Sex: Male or Female
· Lives alone: Yes or No
· Low income: Yes = Eligible for Supplemental Security Income (SSI)
· Need for assistance with the activities of daily living (ADL): Bathing, Dressing, Toileting, Transferring in/out of bed, Eating
· Total number of ADLs needing help
· Need for assistance with the instrumental activities of daily living (IADL): Using telephone, Shopping, Preparing food, Light housework, Heavy housework, Finances
· Total number of IADLs needing help
To keep track of the needs of potential clients, the program could create a data file and use it in SPSS. Data from one month’s new applications are presented in Table 2.2. For this example, we’ll just use the count of the number of activities for which the seniors need help, but note that the data file could also include the yes/no responses for each of the activities of daily living.
Exercise Idea for Instructors to Set Up:
Sometimes a university will be willing to provide raw data on the students enrolled on your campus by age and sex. If so, it is interesting to get the data for the most recent year and for five or ten years ago, so students can enter them in an SPSS data file and use it to learn how to do a variety of statistics with SPSS.
Table 2.2. Sample Data Set: Friendly Visitor Service Clients

| id | age | sex | alone | low income | # ADL | # IADL |
|-----|-----|-----|-------|------------|-------|--------|
| 001 | 74 | M | N | N | 0 | 4 |
| 002 | 66 | M | N | N | 4 | 6 |
| 003 | 81 | M | N | N | 2 | 5 |
| 004 | 76 | F | N | N | 0 | 4 |
| 005 | 74 | M | N | N | 1 | 5 |
| 006 | 69 | F | N | Y | 0 | 4 |
| 007 | 79 | F | Y | N | 0 | 4 |
| 008 | 80 | M | N | Y | 3 | 6 |
| 009 | 89 | M | N | N | 3 | 5 |
| 010 | 60 | F | Y | N | 2 | 6 |
| 011 | 88 | F | Y | N | 0 | 3 |
| 012 | 82 | F | Y | N | 2 | 4 |
| 013 | 79 | F | Y | N | 1 | 4 |
| 014 | 77 | M | N | N | 3 | 6 |
| 015 | 62 | M | Y | N | 1 | 4 |
| 016 | 83 | M | N | N | 4 | 6 |
| 017 | 80 | F | Y | N | 0 | 2 |
| 018 | 85 | F | N | N | 1 | 4 |
| 019 | 66 | F | Y | N | 1 | 3 |
| 020 | 84 | M | N | N | 4 | 6 |
| 021 | 74 | F | N | N | 4 | 4 |
| 022 | 74 | M | N | N | 0 | 2 |
| 023 | 74 | F | Y | N | 0 | 5 |
| 024 | 92 | M | N | N | 3 | 6 |
| 025 | 66 | F | N | N | 2 | 6 |
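Table 2.2 records sex, living alone, and low income as letters, but SPSS works best with numeric codes. The Python sketch below recodes one row before entry. The function and dictionary names are hypothetical, and the particular codes (1 = Male, 2 = Female; 1 = Yes, 2 = No; 9 = missing) are assumptions chosen to match the style of the student codebook, not codes given for this exercise.

```python
# Assumed coding scheme for the letter codes in Table 2.2 (9 = missing).
SEX_CODES = {"M": 1, "F": 2}
YES_NO_CODES = {"Y": 1, "N": 2}


def recode_client(record):
    """Convert one Table 2.2 row of letters into numeric codes."""
    client_id, age, sex, alone, low_income, n_adl, n_iadl = record
    return (
        int(client_id),                    # "006" -> 6
        age,
        SEX_CODES.get(sex, 9),             # unexpected letters become missing
        YES_NO_CODES.get(alone, 9),
        YES_NO_CODES.get(low_income, 9),
        n_adl,
        n_iadl,
    )


print(recode_client(("006", 69, "F", "N", "Y", 0, 4)))  # (6, 69, 2, 2, 1, 0, 4)
```

In SPSS itself you would record the same scheme as value labels, so the output still reads Male, Female, Yes, and No.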
Student Survey Questionnaire
What is your age? ________
Are you ___ male or ___ female?
What is your religious preference?
___ Protestant ___ Catholic ___ Jewish ___ Some other religion ___ No religion
Generally speaking, in politics, do you consider yourself as
___ conservative, ___ liberal, or ___ middle of the road?
What kind of marriage do you think is the more satisfying way of life?
___ One where the husband provides for the family and the wife takes care of the house and children
___ One where both the husband and wife have jobs and both take care of the house and children
Do you think it should be possible for a pregnant woman to obtain a legal abortion:
    If there is a strong chance of serious defect in the baby? ___ Yes ___ No ___ Don’t Know
    If she is married and does not want any more children? ___ Yes ___ No ___ Don’t Know
    If the woman's own health is seriously endangered by pregnancy? ___ Yes ___ No ___ Don’t Know
    If the family has a very low income and cannot afford any more children? ___ Yes ___ No ___ Don’t Know
    If she became pregnant as a result of rape? ___ Yes ___ No ___ Don’t Know
    If she is not married and does not want to marry the man? ___ Yes ___ No ___ Don’t Know
    If the woman wants it for any reason? ___ Yes ___ No ___ Don’t Know
Codebook for Student Questionnaire

| Missing Values | 9 or 99 |
|----------------|---------|
| Age | Age at last birthday |
| Sex | 1 = male, 2 = female |
| Religious Preference | 1 = Protestant, 2 = Catholic, 3 = Jewish, 4 = Other, 5 = No religion |
| Political | 1 = Conservative, 2 = Liberal, 3 = Middle of the road |
| Preferred Marriage | 1 = Traditional, 2 = Shared |
| Abortion if Birth Defect | 1 = Yes, 2 = No, 3 = Don’t Know |
| Abortion if No More Children | 1 = Yes, 2 = No, 3 = Don’t Know |
| Abortion if Health Risk | 1 = Yes, 2 = No, 3 = Don’t Know |
| Abortion if Poor | 1 = Yes, 2 = No, 3 = Don’t Know |
| Abortion if Rape | 1 = Yes, 2 = No, 3 = Don’t Know |
| Abortion if Not Married | 1 = Yes, 2 = No, 3 = Don’t Know |
| Abortion for Any Reason | 1 = Yes, 2 = No, 3 = Don’t Know |
References
Davis, James A., Tom W. Smith, and Peter Marsden. 2001. General Social Surveys: 1972-2000. Chicago: National Opinion Research Center.