How Vocabulary Tests Get It Wrong

October 9, 2014

As is now well known, scores on “intelligence” tests rose strongly over the last few generations worldwide; this is the “Flynn Effect.” One striking anomaly, however, appears in American data: students’ slumping scores on academic achievement tests like the SAT. Reports of the decline, which began in the 1960s, sparked a lot of concern and hand-wringing. A similar decline is evident among adult respondents to the General Social Survey. The GSS gives interviewees a 10-item, multiple-choice vocabulary test. (Practically speaking, vocabulary tests yield pretty much the same results as intelligence tests.) Over 40 years of the survey, a pattern emerged: scores rose from the generations born around 1900 to the generations born around 1950 and then dropped afterwards. Are recently born cohorts dumber, or at least less literate, than their parents and grandparents?

A new study presented to the American Sociological Association in August by Shawn Dorius (Iowa State), Duane Alwin (Penn State), and Juliana Pacheco (U. of Iowa) tested a hunch several researchers have had about the generational pattern in the GSS vocabulary test: that words have histories.

Word Fashions

Dorius and colleagues argue that two generational changes explain the rise and drop in vocabulary scores. One is increasing schooling, especially the rapid rise from Americans born c. 1900 to those born c. 1950. The other is that words go in and out of fashion. They argue that usage of the words that GSS researchers put into the test has waxed and waned over the years, that people are more likely to recognize words that were common when they were growing up, and that this explains the trends in accuracy.


To track how common words were, they turned to Google’s Ngram program. (I have discussed this tool and its frequent misuse before, here and here, but Dorius et al. do a great job with it.) Google has scanned millions of books published over centuries, allowing users to see how often a word appeared in American books in any particular year, as a proportion of all the words used. They took the ten test words, most of which became relatively less common over the century, and also the words that appeared in the answers respondents were given to choose from. (The actual words used are kept confidential by the GSS.) They statistically distinguished “basic” words from “advanced” vocabulary words. Each of over 20,000 respondents in the cumulative GSS survey, 1974 to 2012, got a score for how common the words were in the years between the respondent’s birth and the year he or she turned fifteen.
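To make the exposure score concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' procedure: the function name `exposure_score`, the two placeholder words, and their frequencies (expressed as occurrences per million words) are all invented, since the real test words are confidential and the paper's exact scoring is not described here. The sketch simply averages each word's yearly frequency from the birth year through the year the respondent turned fifteen, then averages across words.

```python
from statistics import mean


def exposure_score(word_freq, birth_year, window=15):
    """Average frequency of the test words over a respondent's early years.

    word_freq maps each word to a {year: frequency} dict. The window runs
    from the birth year through the year the respondent turned `window`.
    Returns None when no frequency data falls inside the window.
    """
    years = range(birth_year, birth_year + window + 1)
    per_word = []
    for by_year in word_freq.values():
        observed = [by_year[y] for y in years if y in by_year]
        if observed:
            per_word.append(mean(observed))
    return mean(per_word) if per_word else None


# Hypothetical data: one word that fades steadily over the century and one
# whose usage stays flat. Real values would come from the Ngram counts.
toy_freq = {
    "fading_word": {y: 10.0 - 0.1 * (y - 1900) for y in range(1900, 2001)},
    "steady_word": {y: 2.0 for y in range(1900, 2001)},
}

early_cohort = exposure_score(toy_freq, 1900)
later_cohort = exposure_score(toy_freq, 1950)
print(early_cohort > later_cohort)  # the later cohort met the words less often
```

In this toy setup the respondent born in 1950 gets a lower score than the one born in 1900 only because one of the words had become rarer by the time he or she was growing up, which is the mechanism the study proposes.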

Dorius and colleagues found that, other things being equal, the rise in test scores from the earliest cohorts to the mid-century cohorts is largely explained by the schooling those cohorts got. And importantly, the decline in test scores from those mid-century cohorts to the latest ones can be explained by the declining use of certain words, especially “advanced” ones. Once both factors are taken into account, there is little difference among generations in vocabulary scores.

One lesson here is for students of historical change: their measures also change. Unlike chem lab test tubes or thermometers, many social research tools are in flux. Not only do words come and go, which is one reason why reading, say, Jane Austen (not to mention Shakespeare) can be so difficult; they also sometimes change in meaning. (A character’s “gay friends” in a 1914 novel are not the same sort as a character’s “gay friends” in a 2014 novel.)

Another lesson, as the authors argue, is that if we assume their discovery about the GSS vocabulary test also applies to achievement tests, which is plausible, then the American anomaly regarding the Flynn Effect seems to go away. Shawn Dorius kindly provided the summary figure below. The solid line with dots shows what the vocabulary scores are after adjusting for early exposure to the words (as well as adjusting for age and education) for each birth cohort (the X-axis). The dashed line simply summarizes the gross trend: the later born, the higher the adjusted score. The Flynn Effect emerges.

