Chapter 2: Creating a Data File


This chapter explains how to set up a file with new data. After finishing this chapter, you should be able to create an SPSS data file that includes 1) the data and 2) labels indicating what the data are about. Also, if you don’t have complete data for a case (for example, if someone didn’t answer a question or chose two answers to a question), you will be able to mark the value as missing so it will be excluded from the analysis. To illustrate this process, we will use a shortened version of the questionnaire used by the General Social Survey (GSS) conducted by the National Opinion Research Center (NORC). For this example, our students wanted to see if their opinions on social issues were similar to those of the national sample. More details can be found in the General Social Survey codebook (Davis, Smith, and Marsden, 2001).

The students knew they were not a representative sample, even of college students, but this questionnaire is an interesting way to learn how to create a new data file. They decided to use the following questions:

·What is your age?

·Are you male or female?

·What is your religious preference?

·Generally speaking, in politics do you consider yourself as conservative, liberal, middle of the road?

·What kind of marriage do you think is the more satisfying way of life: one where the husband provides for the family and the wife takes care of the house and children, or one where both the husband and wife have jobs and both take care of the house and children?

·Do you think it should be possible for a pregnant woman to obtain a legal abortion:

If there is a strong chance of a serious defect in the baby? [ABDEFECT]

If she is married and does not want any more children? [ABNOMORE]

If the woman's own health is seriously endangered by pregnancy? [ABHLTH]

If the family has a very low income and cannot afford any more children? [ABPOOR]

If she became pregnant as a result of rape? [ABRAPE]

If she is not married and does not want to marry the man? [ABSINGLE]

If the woman wants it for any reason? [ABANY]

Basic Steps in Creating a Data File

There are a few things that always need to be done to create a data file. It is best to start your data file with some careful planning.

1. First we will want to assign each respondent an identification number, not so individuals can be identified, but so we can keep track of each case when we go back to check the accuracy of the data entry. For each question (variable), we need a variable name that is simple but expresses something about the variable. SPSS limits variable names to eight characters or fewer, starting with a letter. Variable names can contain numbers or letters but not spaces, and only a few special characters are permitted, so don’t use any odd symbols. AGE and SEX would be easy variable names for the first two questions. For the questions on abortion, we decided to use the first three characters of the variable names used by the General Social Survey (in brackets after each question). We used MG for the preferred type of marriage and called political orientation C-L. Each variable name can be given an extended variable label that gives more detail, and labels can use spaces or special characters. For example, C-L could have a variable label that said Conservative-Liberal.

2. After we have given each variable a name and label, we give each possible response to the question a numeric code, often the number corresponding to the order of the answers. (We could use another system, but this is the easiest because SPSS works best with numeric codes to represent the data.) For example, SEX could use 1 for male and 2 for female; C-L could use 1 for conservative, 2 for liberal, and 3 for middle of the road. Each code would then be given a value label such as Male, Female, Conservative, Liberal, or Middle of the Road.

3. Sometimes respondents do not answer a question, give more than one answer, or do something else that makes their answers unusable. In our example, respondent #2 marked both yes and no on the last question, respondent #3 wrote in “none” on question 4, and respondent #13 didn’t answer the marriage question. We can assign these responses missing value codes so they don’t distort the analysis. Often 9 is used to indicate missing data, or 99 if it is a two-digit value. (Note that this would cause problems in the analysis if 9 or 99 were real codes, for example, if there were 9 possible responses to a question or if age included some ninety-nine-year-olds. So think carefully before you choose numbers for missing values.)
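The plan in steps 1 to 3 can also be written down as a small data structure before opening SPSS. The following is a minimal sketch in Python, not part of SPSS itself. The dictionary layout and the helper names `is_valid_name` and `missing_code_is_safe` are assumptions made for illustration, and the name check covers only the three rules stated above (eight characters or fewer, starts with a letter, no spaces).

```python
# Hypothetical planning sketch: variable names, variable labels,
# value labels, and missing-value codes for three questionnaire items.
PLAN = {
    "AGE": {"label": "Age at last birthday", "values": {}, "missing": 99},
    "SEX": {"label": "Sex", "values": {1: "Male", 2: "Female"}, "missing": 9},
    "C-L": {
        "label": "Conservative-Liberal",
        "values": {1: "Conservative", 2: "Liberal", 3: "Middle of the Road"},
        "missing": 9,
    },
}


def is_valid_name(name):
    """Check only the rules given in step 1 (not SPSS's full rule set)."""
    return 0 < len(name) <= 8 and name[0].isalpha() and " " not in name


def missing_code_is_safe(entry):
    """A missing-value code must not collide with a real response code."""
    return entry["missing"] not in entry["values"]


for name, entry in PLAN.items():
    print(name, is_valid_name(name), missing_code_is_safe(entry))
```

Writing the plan this way makes it easy to spot a missing-value code that doubles as a real answer, which is the problem step 3 warns about.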

It is a good idea to plan all this carefully. It is often useful to put the data in a matrix like Table 2.1 before entering it into the SPSS Data Editor.

Table 2.1. Sample Data Set: Questionnaire Responses 

id  age  sex  rel  c-l  mg   abd  abn  abh  abp  abr  abs  aba
01  20   1    4    2    2    2    2    1    3    1    2    2
02  24   2    5    2    2    1    1    1    1    1    1    9
03  21   2    2    9    2    2    2    2    2    2    2    2
04  24   2    5    3    2    1    1    1    1    1    1    1
05  26   2    4    2    2    1    1    1    1    1    1    1
06  28   2    2    2    2    2    2    1    2    1    2    2
07  23   1    1    2    2    1    2    1    1    1    2    2
08  22   2    4    3    1    1    1    1    1    1    1
09  22   1    5    2    2    1    1    1    1    1    1    1
10  22   2    4    4    2    1    1    1    1    1    1    1
11  23   1    2    2    1    2    2    1    2    1    2    3
12  24   2    2    3    2    1    1    1    1    1    1    2
13  51   2    1    2    9    1    1    1    1    1    1    1
14  22   2    2    3    2    1    1    1    1    1    1    1
15  21   2    4    3    2    1    1    1    1    1    1    1
16  37   1    1    3    2    1    2    1    2    1    2    2
17  22   2    4    2    2    1    1    1    1    1    2    2
18  22   2    3    3    2    1    2    1    2    1    2    2
19  22   2    4    3    2    3    2    1    2    1    1    1
20  30   2    5    2    2    1    1    1    1    1    1    1
21  25   2    5    2    2    1    1    1    1    1    1    1
22  23   1    2    2    2    1    1    1    1    1    1    1
23  21   1    1    2    1    1    1    2    1    2    1
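
To see how the missing-value codes in Table 2.1 keep unusable answers out of an analysis, here is a minimal Python sketch (an illustration outside SPSS, not something SPSS requires). It uses four rows copied from the table (respondents 01, 02, 03, and 13). The names `COLUMNS`, `ROWS`, `MISSING`, and `valid_values` are hypothetical.

```python
# Column order follows Table 2.1.
COLUMNS = ["id", "age", "sex", "rel", "c-l", "mg",
           "abd", "abn", "abh", "abp", "abr", "abs", "aba"]

# Four rows copied from Table 2.1 (respondents 01, 02, 03, and 13).
ROWS = [
    [1, 20, 1, 4, 2, 2, 2, 2, 1, 3, 1, 2, 2],
    [2, 24, 2, 5, 2, 2, 1, 1, 1, 1, 1, 1, 9],
    [3, 21, 2, 2, 9, 2, 2, 2, 2, 2, 2, 2, 2],
    [13, 51, 2, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1],
]

# Missing-value codes: 99 for age, 9 for every other question.
MISSING = {name: 9 for name in COLUMNS if name != "id"}
MISSING["age"] = 99


def valid_values(column):
    """Return the values in one column, skipping the missing code."""
    index = COLUMNS.index(column)
    return [row[index] for row in ROWS if row[index] != MISSING.get(column)]


print(valid_values("aba"))  # respondent 02's double answer is left out
print(valid_values("c-l"))  # respondent 03's written-in answer is left out
print(valid_values("mg"))   # respondent 13's blank answer is left out
```

SPSS does the equivalent filtering itself once a code has been declared as a missing value, so this sketch only illustrates the idea.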

Getting Started in SPSS

To create the data file in SPSS, open SPSS (probably by clicking on the SPSS icon on the desktop). If it asks, “What would you like to do?”, choose “Type in data” and click OK (see Figure 2-1). This opens a matrix similar to a spreadsheet such as Excel.


Figure 2-1
In our example, the rows will be the cases, i.e., the respondents, and the columns will be the variables, i.e., the questions. So, the upper-left cell will contain the identification number for the first case, and the cells to the right will contain data about that case. The Data Editor has tabs in the lower left that let you work with your data in two ways. It probably opened in Data View; if not, click the “Data View” tab at the bottom left of the SPSS screen. Notice that it looks like a spreadsheet (see Figure 2-2).

Entering Variable and Value Names and Labels

1. You’ll be using the first column for the respondents’ ID numbers, so type “1” into the first cell (don’t type the quotation marks, just the number). If you click the “Variable View” tab, you can assign variable names and value labels (see Figure 2-3).

2. Do that now by clicking “Variable View”, double-clicking “var00001” at the top of the left column, typing in ID as we have done, and pressing “Enter”. Go back to “Data View”, and you’ll see that the first column is now titled “id”. Note that SPSS 11 changes variable names to lower case. In this text, to differentiate variable names from other terms, we use all capital letters for a variable name.

3. The second column will be age, so change to “Variable View” using the tab at the bottom left of the SPSS screen, and in row 2, type age under Name and tab over to Missing. Click the little gray box to open the “Missing Values” dialog box, click “Discrete missing values”, type in 99, and click “OK” (see Figure 2-4). (Now, if someone does not give his/her age, we’ll code it 99 and hope no one is really 99 years old. If you want to, you can change back to Data View to see that the column is now headed age.)


Figure 2-4
4. The third variable is sex, so type that in the third row under Name. Since we’re going to use the code 1 for males and 2 for females, we’re going to need value labels. Tab over to the cell under “values” and click the little gray box to get the “Value Labels” dialog box. Type a 1 in the “value” space and Male in the “value label” space and click “Add”. Then type a 2 in the “value” space and Female in the “value label” space (see Figure 2-5), click “Add” again, and click “OK”.

Figure 2-5
Now, SPSS knows that 1 and 2 in the sex variable are really male and female respectively. For missing values, tab over to Missing, click the little gray box to open the dialog box, click “Discrete missing values”, type in 9, and click “OK”. (By now you’ve noticed that SPSS uses lower-case letters for variable and value names. If you want your results in another form, e.g., all caps or the first letter capitalized, use the “variable label” or “value label” box to take care of that. Type it exactly the way you want your table labeled when you do the analysis; often this is like a title with the first letter of each important word capitalized.)

5. The next variable is religion, which we’re going to name RELIG. Notice that it has five possibilities: Protestant, Catholic, Jewish, other, and no religion. Go ahead and work out the variable label, values and value labels, and missing values just as you did above. You can refer to the “Codebook for Student Questionnaire” located at the end of this chapter. So far, your data file should look like Figure 2-6.


Figure 2-6

6. Continue entering the variables for the rest of the data set. Some people, especially those who are used to working with spreadsheets, like to enter all the data in Data View before they set up the variable names, etc., so you’ll have to figure out what works best for you. It is very important to save your work as you go along, so do that now. Click the “Save” icon or use “Save” under “File”, and give your data file a sensible name. Notice that SPSS automatically adds “.sav” at the end of the file name.

7. Enter the codes for each variable in “Data View”. Then check the accuracy of your data file by scanning down each column, looking for codes that would be impossible. For example, sex can have only three possibilities, since male is 1, female is 2, and missing information is 9, so a 5 would be a mistake. The best check is to have one person read the codes while another checks the entries in the data file.
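
The column-by-column check in step 7 can be sketched in Python as a supplement to reading the codes aloud. This is an illustration only, not an SPSS feature. The function name `find_impossible_codes` and the sample column are hypothetical; the allowed codes for sex (1, 2, and the missing code 9) come from the example above.

```python
def find_impossible_codes(column, allowed):
    """Return (row_number, code) pairs for entries not in the allowed set.

    Row numbers start at 1 to match the rows in Data View.
    """
    return [(row, code)
            for row, code in enumerate(column, start=1)
            if code not in allowed]


# Allowed codes for sex: 1 = male, 2 = female, 9 = missing.
SEX_ALLOWED = {1, 2, 9}

# Hypothetical entries with one typing mistake (the 5 in row 4).
sex_column = [1, 2, 2, 5, 1, 9]

for row, code in find_impossible_codes(sex_column, SEX_ALLOWED):
    print(f"Row {row}: code {code} is not possible for sex")
```

A check like this catches only codes that are impossible; a 1 typed where a 2 belongs still has to be found by comparing the entries with the questionnaires.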

Chapter Two Exercises 

At California State University, Fresno, the Friendly Visitors Service hires college students to do in-home care for elderly people so they can remain independent and stay in their homes as long as possible. The students do cleaning, yard work, shopping, etc. The staff begins by interviewing clients in their homes and assessing their need for services. The following information is used to match the seniors with the students who want employment:

·Age: Age at last birthday

·Sex: Male or Female

·Lives alone: Yes or No

·Low income: Yes = Eligible for Supplemental Security Income (SSI)

·Need for assistance with the activities of daily living (ADL): Bathing, Dressing, Toileting, Transferring in/out of bed, Eating

·Total number of ADLs needing help

·Need for assistance with the instrumental activities of daily living (IADL): Using telephone, Shopping, Preparing food, Light housework, Heavy housework, Finances

·Total number of IADLs needing help

To keep track of the needs of potential clients, the program could create a data file and use it in SPSS. (Data from one month’s new applications are presented in Table 2.2.) For this example, we’ll just use the count of the number of activities for which the seniors need help, but note that the file could include the yes/no responses for each of the activities of daily living.

Exercise Idea for Instructors to Set Up:

Sometimes a university will be willing to provide raw data on the students enrolled on your campus by age and sex. If so, it is interesting to get the data for the most recent year and for five or ten years ago, so students can enter it in an SPSS data file and use it to learn how to do a variety of statistics with SPSS.

Table 2.2. Sample Data Set: Friendly Visitor Service Clients
 

id   age  sex  alone  low income  # ADL  # IADL
001  74   M    N      N           0      4
002  66   M    N      N           4      6
003  81   M    N      N           2      5
004  76   F    N      N           0      4
005  74   M    N      N           1      5
006  69   F    N      Y           0      4
007  79   F    Y      N           0      4
008  80   M    N      Y           3      6
009  89   M    N      N           3      5
010  60   F    Y      N           2      6
011  88   F    Y      N           0      3
012  82   F    Y      N           2      4
013  79   F    Y      N           1      4
014  77   M    N      N           3      6
015  62   M    Y      N           1      4
016  83   M    N      N           4      6
017  80   F    Y      N           0      2
018  85   F    N      N           1      4
019  66   F    Y      N           1      3
020  84   M    N      N           4      6
021  74   F    N      N           4      4
022  74   M    N      N           0      2
023  74   F    Y      N           0      5
024  92   M    N      N           3      6
025  66   F    N      N           2      6
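
Because SPSS works best with numeric codes, the letters in Table 2.2 would normally be turned into numbers before entry. A minimal Python sketch of that recoding follows. The particular codes (1 = male, 2 = female, 1 = yes, 2 = no) are an assumption borrowed from the student questionnaire codebook, not something the exercise prescribes, and the names `SEX_CODES`, `YES_NO_CODES`, and `recode_row` are hypothetical.

```python
SEX_CODES = {"M": 1, "F": 2}      # assumed coding, as in the student codebook
YES_NO_CODES = {"Y": 1, "N": 2}   # assumed coding: 1 = yes, 2 = no


def recode_row(row):
    """Convert one Table 2.2 row to numeric codes.

    row = (id, age, sex, alone, low_income, n_adl, n_iadl)
    """
    case_id, age, sex, alone, low_income, n_adl, n_iadl = row
    return (int(case_id), age, SEX_CODES[sex], YES_NO_CODES[alone],
            YES_NO_CODES[low_income], n_adl, n_iadl)


# First two clients from Table 2.2.
clients = [
    ("001", 74, "M", "N", "N", 0, 4),
    ("002", 66, "M", "N", "N", 4, 6),
]

for client in clients:
    print(recode_row(client))
```

Whatever codes are chosen, they should be recorded as value labels so that the tables SPSS produces show Male, Female, Yes, and No rather than bare numbers.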


 


 
Student Survey Questionnaire

What is your age? ________

Are you ____ male or ___ female?

What is your religious preference?

___ Protestant ___Catholic ___ Jewish ___ Some other religion ___No religion

Generally speaking, in politics, do you consider yourself as

___conservative, ___ liberal, __ middle of the road, or

What kind of marriage do you think is the more satisfying way of life?

___ One where the husband provides for the family and the wife takes care of the house and children

___ One where both the husband and wife have jobs and both take care of the house and children

Do you think it should be possible for a pregnant woman to obtain a legal abortion:

If there is a strong chance of serious defect in the baby? ___ Yes ___ No ___ Don't Know

If she is married and does not want any more children? ___ Yes ___ No ___ Don't Know

If the woman's own health is seriously endangered by pregnancy? ___ Yes ___ No ___ Don't Know

If the family has a very low income and cannot afford any more children? ___ Yes ___ No ___ Don't Know

If she became pregnant as a result of rape? ___ Yes ___ No ___ Don't Know

If she is not married and does not want to marry the man? ___ Yes ___ No ___ Don't Know

If the woman wants it for any reason? ___ Yes ___ No ___ Don't Know



 

Codebook for Student Questionnaire

Missing Values                  9 or 99
Age                             Age at last birthday
Sex                             1 = Male, 2 = Female
Religious Preference            1 = Protestant, 2 = Catholic, 3 = Jewish, 4 = Other, 5 = No religion
Political                       1 = Conservative, 2 = Liberal, 3 = Middle of the road
Preferred Marriage              1 = Traditional, 2 = Shared
Abortion if Birth Defect        1 = Yes, 2 = No, 3 = Don't Know
Abortion if No More Children    1 = Yes, 2 = No, 3 = Don't Know
Abortion if Health Risk         1 = Yes, 2 = No, 3 = Don't Know
Abortion if Poor                1 = Yes, 2 = No, 3 = Don't Know
Abortion if Rape                1 = Yes, 2 = No, 3 = Don't Know
Abortion if Not Married         1 = Yes, 2 = No, 3 = Don't Know
Abortion For Any Reason         1 = Yes, 2 = No, 3 = Don't Know

References

Davis, James A., Tom W. Smith, and Peter Marsden. 2001. General Social Surveys: 1972-2000. Chicago: National Opinion Research Center.



A copy of the questionnaire is included at the end of this chapter.
“[ABDEFECT]” is the variable name for this question in the General Social Survey.
You can also set up a data file in a spreadsheet like Excel and read the data file into SPSS.