Chapter 2 -- Describing Populations

Last Modified 19 August 1998

People often want very basic information about housing and population in specific areas like cities or counties. They want to know the number of children within a community, the level of poverty, the kinds of employment that people are engaged in, or the size and age of housing. Political representation and revenue sharing are allocated based on numbers of persons, and the amount of government spending is often based on the numbers of persons with a given characteristic.

Just acquiring the desired information is often not sufficient. To understand the meaning of the data, the values should be compared to those of a place of similar size or of a larger summary area such as an entire city, county, state, region, or the United States. This comparison helps one understand whether the acquired data values are greatly different from those of a much larger population. For example, data for the city of San Francisco could be compared to corresponding values for other cities in California or for the State as a whole, while values for California could be compared to those of other states or to national averages.

Furthermore, demographers frequently extract the same information for earlier censuses. In this way they get a sense of whether the current values represent increases or decreases from previous decades.

A. Some Basic Population Data Describing a City

As an example for this exercise we will arbitrarily pick the city of Glendale, California. It has a census place code of 30000.


Table 1. Ethnic Populations in Glendale, Los Angeles, and California, 1990

                                Glendale  Los Angeles City   California
Area in Square Miles               30.61             469.3      155,973
Total Population                 180,038         3,485,398   29,760,021
Males                             86,606         1,750,055   14,897,627
Females                           93,432         1,735,343   14,862,394
Non-Hispanic Whites              114,765         1,299,604   17,029,126
Blacks                             2,334           487,674    2,208,801
American Indians                     629            15,641      242,164
Hispanics                         37,731         1,391,411    7,687,938
Asians and Pacific Islanders      25,453           341,807    2,845,659
Persons / Family                    3.22              3.48         3.32
Pop Density per Square Mile      5,881.5           7,426.2        190.8
% Male                              48.1              50.2         50.1
% Non-Hispanic White                63.7              37.3         57.2
% Black                              1.3              14.0          7.4
% American Indian                    0.3               0.4          0.8
% Hispanic                          21.0              39.9         25.8
% Asian and Pacific Islander        14.1               9.8          9.6

 

Glendale is a city of about 30 square miles located just northeast of downtown Los Angeles. Its 1990 population was just over 180,000.

 Density - The population density of the city seems high compared to California as a whole, but the state contains large, unsettled areas while most cities do not. Glendale does contain some large areas of mountainous open space, so its population density is less than that of neighboring Los Angeles. Density values exceeding 20,000 persons per square mile are found in some neighborhoods of large cities. Density computed this way assumes the population is spread evenly over the sampling area, but this is rarely the case.

 Family Size - The usual indicator of household size is the average number of persons per household, but the average number of persons per family is also sometimes used. In Glendale the number of persons per family (3.22) is slightly lower than for the state (3.32). This may be a result of an older population, more singles, or the larger white population, a group that tends to have smaller families.

 Sex - There are fewer males than females in Glendale, and the proportion of males (48.1%) is lower than for California as a whole (50.1%). This may be another indicator of an older population in the city, since the number of females tends to exceed the number of males in older age groups. Age data could be extracted to confirm this.

 Ethnicity - Non-Hispanic whites are the largest group within the city. Expressed as a percentage, non-Hispanic whites constituted about 64% of the population while Hispanics and Asians accounted for 21% and 14% respectively. Compared to the State, Glendale has higher percentages of both whites and Asians and a substantially lower percentage of blacks. If more detailed race data had been used, the relatively larger Korean and Filipino communities within Glendale would have been evident within the Asian and Pacific Islander category.
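The derived rows of Table 1 (density and the percentage rows) can be recomputed from the counts. The following is a minimal sketch in Python; the function and field names are illustrative assumptions, and only the male and Hispanic counts are carried over to keep it short. Because the published areas are rounded, a density recomputed this way can differ slightly from the value printed in the table.

```python
# Counts and areas transcribed from Table 1 (1990 census).
places = {
    "Glendale": {"area_sq_mi": 30.61, "total": 180_038,
                 "males": 86_606, "hispanic": 37_731},
    "Los Angeles City": {"area_sq_mi": 469.3, "total": 3_485_398,
                         "males": 1_750_055, "hispanic": 1_391_411},
    "California": {"area_sq_mi": 155_973, "total": 29_760_021,
                   "males": 14_897_627, "hispanic": 7_687_938},
}


def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage with one decimal."""
    return round(100 * part / whole, 1)


def density(population, area_sq_mi):
    """Persons per square mile, with one decimal."""
    return round(population / area_sq_mi, 1)


for name, p in places.items():
    print(
        name,
        density(p["total"], p["area_sq_mi"]),
        percent(p["males"], p["total"]),
        percent(p["hispanic"], p["total"]),
    )
```

Expressing each count as a share of its own total is what makes a city of 180,000 comparable to a state of nearly 30 million.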

 

B. Examining a Characteristic in All Cities - Ranking Places

Often one wants to see how cities rank according to a given characteristic. Once the ranking is done, those cities that have very high or very low values can be examined in more detail to see if reasons can be determined for their position in the ranking.

 Densely Populated Places

In the example below, cities have been ranked by population density. State names, area, and total population have been included.

Table 2. Highest and Lowest Density Cities in the U.S., 1990

State  Area in Sq. Mi.  City                Total Population  Density (Pop/Sq. Mi.)
NJ                1.27  Union City                    58,012                 45,822
NJ                1.02  West New York                 38,125                 37,502
NJ                1.27  Hoboken                       33,397                 26,243
CA                1.17  Maywood                       27,850                 23,900
NY              308.95  New York                   7,322,564                 23,702
NJ                0.96  Cliffside Park                20,393                 21,153
CA                1.10  Cudahy                        22,817                 20,728
NJ                2.88  Irvington                     59,774                 20,725
CA                0.74  Walnut Park                   14,722                 20,026
CA                1.15  Lennox                        22,757                 19,785
CA                1.88  West Hollywood                36,118                 19,228
NJ                3.92  East Orange                   73,552                 18,743
NJ                3.10  Passaic                       58,041                 18,707

CA               54.10  Twentynine Palms              11,821                    218
OK               79.64  El Reno                       15,414                    194
NH               61.73  Berlin                        11,824                    192
MO               97.18  Fort Leonard Wood             15,863                    163
FL               74.78  North Port                    11,973                    160
ME               75.75  Presque Isle                  10,550                    139
AK             1697.65  Anchorage                    226,338                    133
VA              400.08  Suffolk                       52,141                    130
MN              181.68  Hibbing                       18,046                     99
MT              716.18  Butte-Silver Bow              33,336                     47
MT              736.94  Anaconda                      10,278                     14
AK             2593.56  Juneau                        26,751                     10

 

Cities with the highest densities are usually found in California or New Jersey. With the exception of New York City, these cities tend to be small in area with fairly small populations. All are part of major urban areas. They probably contain many apartment units for people who work in nearby Los Angeles and New York.

Examination of the low density cities indicates a problem with calculating density for cities. There is no guarantee that the corporate limit of a city encompasses populated areas. For political reasons the boundary may have been extended far beyond the settled portion of the city. The areas surrounding Juneau and Anchorage, Alaska are such cases.
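The ranking behind Table 2 amounts to dividing population by area and sorting. Here is a minimal sketch using six rows transcribed from the table; the function name and tuple layout are assumptions made for illustration. Densities recomputed from the rounded areas differ slightly from the published column, but the order is the same for these rows.

```python
# (state, city, area in square miles, 1990 population) transcribed from Table 2.
cities = [
    ("AK", "Juneau", 2593.56, 26_751),
    ("NJ", "Union City", 1.27, 58_012),
    ("NY", "New York", 308.95, 7_322_564),
    ("CA", "Maywood", 1.17, 27_850),
    ("AK", "Anchorage", 1697.65, 226_338),
    ("NJ", "Hoboken", 1.27, 33_397),
]


def rank_by_density(rows):
    """Append persons per square mile to each row and sort from most to least dense."""
    with_density = [(st, city, area, pop, pop / area) for st, city, area, pop in rows]
    return sorted(with_density, key=lambda r: r[4], reverse=True)


ranked = rank_by_density(cities)
for st, city, area, pop, dens in ranked:
    print(f"{st} {city:<12} {area:>9.2f} {pop:>10,} {dens:>8,.0f}")
```

Juneau and Anchorage fall to the bottom of the list because of their very large incorporated areas, which is the boundary problem described above.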

Ethnic Composition - An Example

The question on race in the U.S. Census is separate from the question on Hispanic origin. People can indicate a particular race such as white, black, American Indian, any of several Asian groups, or other. Then they may indicate if they are or are not of Spanish/Hispanic origin, such as Mexican, Cuban, or Puerto Rican. Many Hispanics indicate their race as white, yet whites are commonly seen as distinct from Hispanics. Thus, tabulations based on the total reported white race are complicated by two distinctly different groups. To compensate for this it is usually better to use the non-Hispanic white category when tabulating data for "whites." This removes those persons of white race who indicated that they also were of Hispanic origin.
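The adjustment is a subtraction: persons of white race minus persons of white race who also reported Hispanic origin. The sketch below applies it to the Los Angeles figures reported in Table 3b; the function names are illustrative assumptions. The result agrees with the non-Hispanic white count for Los Angeles City in Table 1.

```python
def non_hispanic_white(white_race_total, hispanic_white):
    """Persons of white race who did not report Hispanic origin."""
    return white_race_total - hispanic_white


def hispanic_share_of_white(white_race_total, hispanic_white):
    """Percent of the white race population that is also Hispanic (one decimal)."""
    return round(100 * hispanic_white / white_race_total, 1)


# Los Angeles, 1990 (Table 3b).
la_white_race = 1_841_182
la_hispanic_white = 541_578

print(non_hispanic_white(la_white_race, la_hispanic_white))
print(hispanic_share_of_white(la_white_race, la_hispanic_white))
```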

In the following table, the percent of the total white population that is also Hispanic has been calculated and ranked. For the entire United States, 5.8% of the white population is white Hispanic. Table 3a below shows the 10 cities with the highest percentages of white race population that are also Hispanic. Most of these cities are on the U.S.-Mexican border, but there are strongly Hispanic places within the Los Angeles and Miami metropolitan areas.


Table 3a. Highest Ten United States Cities with the Hispanic White Population as a Percent of Total White Population, 1990

State  City                      White Race       Hispanic Persons  % of White Race Persons
                                 Population  Indicating White Race  that are Hispanic White
       United States            199,686,070             11,557,774                      5.8
CA     Calexico                      12,628                 12,212                     96.7
CA     Florence-Graham               11,676                 11,219                     96.1
TX     Socorro                       18,071                 17,123                     94.8
CA     East Los Angeles              53,330                 49,833                     93.4
TX     Eagle Pass                    11,696                 10,855                     92.8
TX     Laredo                        87,048                 80,224                     92.2
FL     Sweetwater                    10,857                  9,967                     91.8
CA     Coachella                      5,329                  4,783                     89.8
AZ     Nogales                       13,642                  1,232                     89.7
TX     Mercedes                      10,208                  9,102                     89.2

 

  Table 3b provides this data for cities of 500,000 or more persons. Large cities exhibit a great range in percentage of Hispanics within the entire white population. Obviously the error of using the entire white race population as an indicator for non-Hispanic whites is much more serious in the first half of the list, but only in a few places like Baltimore and Indianapolis are Hispanics so few in number that the entire white race population is a good indicator of the white population as usually understood.


Table 3b. Hispanic White Population as a Percent of White Population, United States Cities Over 500,000 Persons, 1990

State  City                      White Race       Hispanic Persons  % of White Race Persons
                                 Population  Indicating White Race  that are Hispanic White
TX     El Paso                      396,122                260,120                     65.7
TX     San Antonio                  676,082                336,967                     49.8
CA     Los Angeles                1,841,182                541,578                     29.4
TX     Houston                      859,069                196,427                     22.9
CA     San Jose                     491,280                103,533                     21.1
NY     New York                   3,827,088                663,963                     17.3
IL     Chicago                    1,263,524                207,476                     16.4
TX     Dallas                       556,760                 76,780                     13.8
CA     San Francisco                387,783                 50,665                     13.1
CA     San Diego                    745,406                 93,671                     12.6
AZ     Phoenix                      803,332                 97,640                     12.2
DC     Washington                   179,667                 13,536                      7.5
MA     Boston                       360,875                 22,141                      6.1
LA     New Orleans                  173,554                  9,028                      5.2
MI     Detroit                      222,316                 10,038                      4.5
WI     Milwaukee                    398,033                 16,316                      4.1
OH     Cleveland                    250,234                  8,682                      3.5
PA     Philadelphia                 848,586                 22,747                      2.7
FL     Jacksonville                 456,529                 10,007                      2.2
WA     Seattle                      388,858                  8,435                      2.2
MD     Baltimore                    287,753                  3,566                      1.2
IN     Indianapolis                 554,423                  4,374                      0.8
TN     Memphis                      268,600                  2,110                      0.8
OH     Columbus                     471,025                  3,673                      0.8

 

Table 3c lists the cities in which the highest percentages of Hispanics indicated their race as white. The last column represents an intriguing phenomenon. Although 51.7% of Hispanics in the entire United States indicated their race as white, Hispanics in these places identified their race as white at much higher percentages. Because many of these places have too few Hispanics to form local Hispanic communities, it seems likely that these Hispanics are highly acculturated and assimilated into the general white population.

Table 3c. Hispanic White Population as a Percent of Hispanic Population, United States Cities Over 10,000 Persons, 1990

State  City                          Hispanic       Hispanic Persons     % of Hispanic Origin
                                   Population  Indicating White Race  that are Hispanic White
       United States               22,354,059             11,557,774                     51.7
TX     Pecos                            8,769                  8,689                     99.1
WV     Moundsville                        121                    117                     96.7
FL     Kings Point                         60                     58                     96.7
MI     Grosse Pointe Farms                 72                     69                     95.8
OH     Ironton                             23                     22                     95.7
FL     Aventura                         1,067                  1,017                     95.3
FL     Coral Gables                    16,778                 15,989                     95.3
NJ     Holiday City-Berkeley               99                     94                     94.9
AL     Albertville                         77                     73                     94.8
OH     Tallmadge                           77                     73                     94.8
NY     Hamburg                             76                     72                     94.7
OH     Norton                              52                     49                     94.2
NJ     Hanover Twp.                       266                    250                     94.0
AZ     Sun City West                       33                     31                     93.9
NY     Massapequa Park                    336                    315                     93.8
AL     Alabaster                           80                     75                     93.8
FL     Hamptons at Boca Raton             319                    299                     93.7
OH     North Canton                        95                     89                     93.7
FL     Olympia Heights                 29,922                 27,984                     93.5
FL     Westchester                     24,554                 22,924                     93.4

C. Mapping a Distribution

Whenever one analyzes a large number of census observations, it is often very helpful to also produce a map of the data to see if there are any spatial patterns that may not be apparent in a table. Maps reveal spatial qualities that are rarely evident in statistical tabulations. A researcher may notice that certain places seem to occur near one another when values are sorted in a table, but maps provide this information in detail and at a glance. For example, one can see from Table 2 that many of the densely populated cities are in California and New Jersey, but the table doesn't indicate if these cities are clustered together or linked to certain geographical features such as industrial areas, central cities, or agricultural areas.

 Census Geography

To produce maps one needs either a file of the boundaries of the geographic units or a single point for the centroid (spatial center) of the unit. Fortunately, the Census Bureau provides a centroid value in latitude and longitude terms for each of its geographic units. It also publishes the area of these units that can be used to calculate the density of a variable within the unit. The actual boundaries can be obtained in several ways: by using software that will generate them from the street segments in a census TIGER file, by purchasing them from one of several data vendors, or by downloading them (often for free) over the Internet. Usually boundary files provided by data vendors are better in quality than those from other sources. In addition, many geographic information systems (GIS) software packages include boundaries in their sample data for nations, states, counties and ZIP codes.

The size of a statistical area used for analysis can be significant. It is important to realize that the results of analyses are applicable to only the selected units - not to individual people or to units of different sizes. Larger areas mask some of the variability found between smaller areas. Within a state, for example, county rates might vary greatly from the rate reported for the state, and possibly significant differences could be masked if only the state average is used.
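The masking effect can be seen with a few lines of arithmetic. The sketch below uses three made-up counties (the names and counts are hypothetical, not census data) and compares each county's rate with the rate for the three combined.

```python
# Hypothetical counties: (name, persons with the characteristic, total persons).
# These numbers are invented for illustration and are not census data.
counties = [
    ("County A", 2_000, 10_000),
    ("County B", 30_000, 600_000),
    ("County C", 9_000, 30_000),
]


def rate(part, whole):
    """Percentage of `whole` made up by `part`."""
    return 100 * part / whole


county_rates = {name: rate(part, total) for name, part, total in counties}
state_rate = rate(
    sum(part for _, part, _ in counties),
    sum(total for _, _, total in counties),
)

print(county_rates)
print(round(state_rate, 1))
```

In this invented case the county rates run from 5% to 30%, while the combined rate sits near that of the most populous county, so the two small high-rate counties disappear from view.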

The Census Bureau reports data for blocks, block groups, tracts, counties, states, divisions, and regions. In addition, tabulations are made for places, Congressional districts, metropolitan and rural areas, and various administrative units such as Indian reservations. A block contains about 100 persons, a block group about 1000, and a census tract about 4000. However, there may be a considerable range in these values. In Los Angeles County, census tracts range in population from 0 to over 35,000 persons. The average size is around 5500 persons.

For local area analysis, tracts have long been the preferred areal unit, while at the regional or national level, counties have been used. Within a local area, block-level statistics are occasionally used to compare neighborhoods, but tabulations of data from the sample questions are unavailable for blocks, and so analysis possibilities are more limited.

 Mapping Counts and Percents

Examining patterns of counts of population on maps reveals only part of a picture. Such maps indicate where there are more or fewer people, but they may not indicate differences in the relative concentration of one ethnic group compared to another. For example, mapping the number of Hispanics indicates where the numbers are, but one also would expect to find more Hispanics where there are more people. Thus, mapping counts of population components yields maps that are often very much alike. It is usually more valuable to additionally map the percentage of the total population that is Hispanic to reveal where the group is proportionately more concentrated.

Mapping a group by density (i.e., dividing the count by the area of the statistical unit) may also be helpful since it adjusts the population count for the varying areas of the statistical units. Without this adjustment, larger statistical areas generally contain larger numbers of a population simply because of their size.
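The difference between counts and percentages shows up as soon as places are ranked both ways. The sketch below uses five rows from Table 3b; note that the denominator there is the white race population rather than the total population, so it illustrates the principle rather than the exact percentage described above. The variable names are assumptions made for illustration.

```python
# (city, white race population, Hispanic persons indicating white race), Table 3b.
cities = [
    ("El Paso", 396_122, 260_120),
    ("San Antonio", 676_082, 336_967),
    ("Los Angeles", 1_841_182, 541_578),
    ("New York", 3_827_088, 663_963),
    ("Columbus", 471_025, 3_673),
]

# Rank by the raw count of Hispanic whites.
by_count = [
    name for name, _, _ in sorted(cities, key=lambda r: r[2], reverse=True)
]

# Rank by Hispanic whites as a share of the white race population.
by_percent = [
    name for name, _, _ in sorted(cities, key=lambda r: r[2] / r[1], reverse=True)
]

print(by_count)
print(by_percent)
```

New York leads on counts because it is by far the largest city, while El Paso leads on percentage; a map of counts alone would largely reproduce a map of city size.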

Although a very large number of mapping styles are possible for portraying statistical information, in practice only a few are used. This is especially true when using computer software, which typically presents few mapping options.

 Choropleth Maps

The most common census mapping product is probably the choropleth map. Here the statistical areas are shaded in relation to the data values. The technique is very common with census data because values are reported for statistical units. The values for the areal units are sorted and divided into four to eight classes. Each class is assigned a progressively darker or brighter tone such that a visual order is apparent that approximates increasing magnitude of the values. This would seem a straightforward relationship, but many people assign colors to categories in an almost random way. An alternative approach is to use a diverging (two-hue) color scheme in which each hue progressively darkens as values depart from an average or selected base value. For example, one might shade counties whose employment falls below the state average in progressively darker browns and counties above the state average in progressively darker greens.

A real challenge in choropleth mapping is to decide on an appropriate number of classes and on a method for selecting class breaks. There is no simple answer to this problem. As a rule of thumb, the method proposed by George Jenks (the default method in ArcView, where it is currently misnamed "natural breaks") is preferable to others. This method seeks to minimize the variation among values within each class. In many situations, especially when a number of maps are to be compared, quantile breaks are appropriate. An alternative method occasionally used is to compute the mean of the distribution and to create class breaks based on standard deviation values about the mean.
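The three classing approaches can be sketched in a few functions. This is an illustration under stated assumptions, not ArcView's implementation: the function names are invented, the Jenks-style routine is a plain dynamic program that minimizes the total within-class sum of squared deviations, and the sample values are made up. It assumes there are at least as many values as classes.

```python
import statistics


def quantile_breaks(values, k):
    """Upper class limits that put (nearly) equal numbers of values in k classes."""
    data = sorted(values)
    n = len(data)
    return [data[(n * c) // k - 1] for c in range(1, k + 1)]


def std_dev_breaks(values, steps=(-1, 0, 1)):
    """Class breaks at the mean plus the given multiples of the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + s * sd for s in steps]


def jenks_classes(values, k):
    """Split the sorted values into k contiguous classes with the smallest
    total within-class sum of squared deviations (dynamic programming)."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice in constant time.
    total, total_sq = [0.0], [0.0]
    for v in data:
        total.append(total[-1] + v)
        total_sq.append(total_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = total[j] - total[i]
        return (total_sq[j] - total_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: best total for the first j values split into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the recorded class starts to recover the classes.
    classes, j = [], n
    for c in range(k, 0, -1):
        i = start[c][j]
        classes.append(data[i:j])
        j = i
    return classes[::-1]


sample = [1, 2, 3, 10, 11, 12, 50, 51]  # made-up values, not census data
print(quantile_breaks(sample, 4))
print(std_dev_breaks(sample))
print(jenks_classes(sample, 3))
```

On the sample values the Jenks-style routine separates the three obvious clusters, whereas quantile breaks split them to keep class sizes equal. That trade-off is why quantiles suit comparisons across several maps while Jenks-style breaks suit a single distribution.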

On choropleth maps data should be expressed as a ratio, index, percentage, or density. Such maps are not appropriate for showing counts of people. This is because large areas tend to appear in higher classes not because of any characteristic, but because larger areas encompass a greater portion of a population distribution. Obviously Texas will have more people than Oklahoma because it covers more area.

Another concern with the difference in size of areal units on choropleth maps is that larger areas will visually dominate the map, and many of these are rural areas with small populations. Such large areas call undue attention to themselves on the map. On the other hand, large populations occur in very small areas such as the boroughs of New York or Washington, D.C., and might not be noticed by map readers. An inset map can be helpful in drawing attention to some of these smaller areas if they are not discernible on a map of a large area such as the entire United States.

Graduated Symbol Maps

A second method often found in census mapping is graduated symbols. With this approach the area of a circle or square is made proportional to the value of an attribute. Graduated symbols may be used for point features such as cities and may represent counts of things. A frequent problem with this technique is that the range of values far exceeds the range that can be effectively presented on the map. Thus, it may be necessary to set a lower limit to be displayed. Values below the threshold are either not shown or are assigned a standard symbol. An alternative strategy available in some programs is to define a set of groups and then assign a single symbol size to all values falling within the range of a given group. This method, referred to as "range-graded symbols" is somewhat like the classification scheme used for the choropleth map.
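Because the area of the symbol, not its radius, is proportional to the value, the radius grows with the square root of the value. The sketch below shows both proportional and range-graded sizing, using three city populations from Table 2. The function names, class limits, and radii are assumptions chosen for illustration.

```python
import math
from bisect import bisect_left


def proportional_radius(value, max_value, max_radius):
    """Radius of a circle whose AREA is proportional to `value`.

    The largest value is drawn with radius `max_radius`.
    """
    return max_radius * math.sqrt(value / max_value)


def range_graded_radius(value, upper_bounds, radii):
    """One fixed radius for every value that falls in the same class.

    `upper_bounds` are ascending, inclusive upper class limits; values above
    the last limit get the largest radius.
    """
    index = bisect_left(upper_bounds, value)
    return radii[min(index, len(radii) - 1)]


# 1990 populations from Table 2.
populations = {"New York": 7_322_564, "Anchorage": 226_338, "Juneau": 26_751}
largest = max(populations.values())

# Hypothetical class limits and symbol radii (in map points).
bounds = [50_000, 500_000, 10_000_000]
radii = [4, 10, 24]

for city, pop in populations.items():
    print(
        city,
        round(proportional_radius(pop, largest, 40), 1),
        range_graded_radius(pop, bounds, radii),
    )
```

With strictly proportional sizing, a city the size of Juneau gets a symbol only a few points across when New York is drawn at 40, which is the range problem noted above. Range grading trades that precision for legibility.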

 Dot Maps

A third method is the dot map, a technique that requires the assignment of a given number of individuals to a dot. The dot is then located to represent the approximate location of a group of individuals. When done manually, additional maps and aerial photographs may be used to help determine the appropriate dot placement. The dot map also permits the overlay of multiple distributions on the same map by using dots of different shapes or colors.

Unfortunately, computer programs can only locate the dots randomly within a statistical area. The patterns only begin to become meaningful when statistical areas as shown on the map are very small. In other cases, the look of the distribution can be improved by moving the map to a graphic arts program where dots can be moved individually away from unpopulated areas within the statistical units.
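Random placement of this kind can be sketched as rejection sampling: draw points in the bounding box of the statistical area and keep those that fall inside its boundary. The code below is one possible approach, not the method of any particular mapping package. The polygon, the population, and the persons-per-dot value are all hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count boundary crossings to the right of the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, population, persons_per_dot, rng):
    """Place one dot per `persons_per_dot` persons at random inside the polygon."""
    n_dots = round(population / persons_per_dot)
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):  # reject points outside the unit
            dots.append((x, y))
    return dots


# Hypothetical L-shaped tract with 4,000 persons, one dot per 500 persons.
tract = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
dots = random_dots(tract, population=4_000, persons_per_dot=500,
                   rng=random.Random(0))
print(len(dots))
```

Nothing in this procedure knows where people actually live within the unit, which is why the resulting pattern is only meaningful when the units are small.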

 Mapping with ArcView

The California State University currently has a site license for ESRI software that includes a mapping/GIS package called ArcView. This package, or any other, can be used to produce choropleth, graduated symbol, and dot maps from census data. Appendix M gives an explanation of the use of ArcView.

 Mapping Cities with High Population Density

In the example below, population densities used for Table 2 have been mapped. To focus on the most densely populated cities, the frequency distribution of cities was divided equally into 20 classes. The top class was split in two. Thus, only the top 5 percent of the cities are presented. Note that most of the cities cluster around New York and Los Angeles. To better examine these areas, look at the two large-scale inset maps. (Two new Views were produced at large scale and added to the Layout Window.)
