Zombie Meditations

Are Sociologists More Employable Than a Pet Rock? A Case Study on Tim Wise


A 2006 article from Tim Wise, What Kind of Card is Race? The Absurdity (and Consistency) of White Denial, lays out the well-known “anti-racist” author’s central thesis: white people are in a self-serving state of denial about the overwhelming extent to which rampant, systematic racism is still responsible for violently holding non-white members of society down (and therefore, by corollary, artificially propping white people up), because it would threaten white people’s self-image to accept the self-abasing truth that their relative successes are largely the result of racism rather than any genuine hard work or achievement. A consistent underlying current of Wise’s rhetoric is that we can’t move forward on these problems so long as white people are too smug to admit that they are as severe as Wise tells us they are, and Wise apparently sees it as an important part of his mission to knock this self-confidence down a few pegs in order to pave the way for a more sombre accounting of the piteous state of affairs.

The article mentions a few different studies. But I’ve found it to be a significant rule of thumb that if I ever see a long list of studies plastered together to support a point, I’m probably going to find something surprising if I spend any significant amount of time digging into any particular one of them. I’ve only done that with one of the studies on this list, so that’s the one I want to talk about now. He writes: “That bringing up racism (even with copious documentation) is far from an effective ‘card’ to play in order to garner sympathy, is evidenced by the way in which few people even become aware of the studies confirming its existence. How many Americans do you figure have even heard, for example, (…) that persons with ‘white sounding names,’ according to a massive national study, are fifty percent more likely to be called back for a job interview than those with ‘black sounding names,’ even when all other credentials are the same?”

The study he refers to is Marianne Bertrand and Sendhil Mullainathan’s 2004 “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment in Labor Market Discrimination.” The importance of the study in shaping perceptions can’t be stressed enough: a search on the Web of Science database for the topic “race” in the area “sociology” and the domain “social sciences” shows that the top 100 studies average roughly 600 citations each. Are Emily and Greg More Employable than Lakisha and Jamal? has been cited a whopping 1,555 times.

What the study claimed to have found is staggering. Quoting the National Bureau of Economic Research’s summary: “In total, the authors responded to more than 1,300 employment ads in the sales, administrative support, clerical, and customer services job categories, sending out nearly 5,000 resumes. The ads covered a large spectrum of job quality, from cashier work at retail establishments and clerical work in a mailroom to office and sales management positions. The results indicate large racial differences in callback rates to a phone line with a voice mailbox attached and a message recorded by someone of the appropriate race and gender.” The study “indicates that a white name yields as many more callbacks as an additional eight years of experience.” An article at Salon reporting on the study said that “[We] have a national religion… [and] it’s Denialism. … [R]acism and white privilege dominate American society. … This truth is everywhere. … You can see it in a 2004 MIT study showing that job-seekers with ‘white names receive 50 percent more callbacks for interviews’ than job seekers with comparable résumés and African American–sounding names.”

First, a nearly universal impression (which the authors of the study and articles summarizing it have apparently done little to correct) is that the study actually kept the resumes identical, except for race. Wise repeats the assumption almost directly when he says that “all other credentials” in this study (besides race) were “the same.” But this simply isn’t true. The study explains in its section IIA on page 994 of the AER (page 5 of the study) that it “begin(s) with resumes posted on two job search Web sites as the basis for [its] artificial resumes. ( . . . ) During this process, we classify the resumes within each detailed occupational category into two groups: high and low quality. In judging resume quality, we use criteria such as labor market experience, career profile, existence of gaps in employment, and skills listed. Such a classification is admittedly subjective . . . to further reinforce the quality gap between the two sets of resumes, we add to each high-quality resume a subset of the following features: summer or while-at-school employment experience, volunteering experience, extra computer skills, certification degrees, foreign language skills, honors, or some military experience.”

Nowhere in the study is there any actual indication that the resumes were kept identical except for the racial connotation of the applicant’s name, and yet this claim was repeated widely throughout the media. Indeed, the fact that each “high quality” resume was given its own subset of the quality-boosting features already proves that they were not, in fact, identical. Further, in section IIC on page 996 of the AER (page 7 of the study), we read: “For each ad, we use the bank of resumes to sample four resumes (two high-quality and two low-quality) that fit the job description as closely as possible. In some cases, we slightly alter the resumes to improve the quality of the match, such as by adding the knowledge of a specific software program.” Yet again, we see that different resumes appear in fact to have had different qualifications. And it continues: “The final resumes are formatted, with fonts, layout, and cover letter style chosen at random.” We will come back to this later; it turns out to be potentially quite relevant once we identify the study’s other flaws.

Next, we have to ask how disparate the callback rates between candidates actually were. When summaries state that the study “responded to more than 1,300 employment ads . . . sending out nearly 5,000 resumes,” this seems to imply that we’re dealing with very large differences in the number of callbacks. However, as it turns out, that just isn’t the case. The study explains on p.7 that it uses both male and female names for sales jobs, but uses female names “nearly exclusively” for administrative and clerical jobs in order to ensure higher overall callback rates. In total, of 5000 resumes, 1124 were male. Of this total, 9 racially distinct names per race were then used to create 575 white and 549 black resumes. If we assume these names were divided equally amongst the total for each race, this means about 62 resumes were sent out for each name. The title of the study compares “Greg” to “Jamal,” and the study tells us that the former received a 7.8% callback rate while the latter received a 6.6% callback rate. (I’ll return to the question of the selection of names for the title of the study momentarily.) If Greg received 7.8% callbacks out of 62 attempts, this means he received 5 actual calls. Meanwhile, if Jamal received 6.6% callbacks out of 62 attempts, this means he received 4 actual calls. The actual difference between them? One call. Yet, in the statistical framing often applied to the study, that one call actually represents “almost a 20% difference” (I’ll elaborate more on the way percent increase is calculated later as well).

The situation with Emily and Lakisha is only somewhat improved. Again, of 5000 resumes, 3876 were female. If we assume the 9 female names chosen for each race were distributed equally between 1938 white and 1938 black names, this means each name was sent out about 215 times. Emily’s 7.9% callback rate would thus translate into 17 actual calls; Lakisha’s 5.5% callback rate into 12 actual calls. Now it may look like we’re finding that the larger our sample size is, the more clearly we find the purported effect—the sample size just happens to be larger for the female applicants. But here is where it becomes relevant to look at the choice by the authors of the study of which names and associated percentages to use for the title of the study; it turns out that the general findings are simply nowhere near as dramatic as the isolated examples that the selective choice of individual names would imply.
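For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rate-to-calls conversion used above. The per-name resume counts (62 and 215) are my equal-split estimates, not figures the study reports, and the helper name `implied_calls` is just an illustrative choice.

```python
# Per-name resume counts are equal-split estimates from the text,
# NOT figures reported by the study itself.
MALE_RESUMES_PER_NAME = 62
FEMALE_RESUMES_PER_NAME = 215

# Callback rates (in percent) as quoted in the text.
CALLBACK_RATES = {
    "Greg": (7.8, MALE_RESUMES_PER_NAME),
    "Jamal": (6.6, MALE_RESUMES_PER_NAME),
    "Emily": (7.9, FEMALE_RESUMES_PER_NAME),
    "Lakisha": (5.5, FEMALE_RESUMES_PER_NAME),
}


def implied_calls(rate_pct: float, resumes_sent: int) -> int:
    """Convert a callback rate in percent into a whole number of calls."""
    return round(rate_pct / 100 * resumes_sent)


calls = {
    name: implied_calls(rate, sent)
    for name, (rate, sent) in CALLBACK_RATES.items()
}

for name, count in calls.items():
    print(f"{name}: about {count} calls")
```

Under those assumptions the Greg and Jamal gap comes to a single call, and the Emily and Lakisha gap to five.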

In the chart on page 19, the percentage of callbacks received by each name is recorded. And if we look closely enough, we notice something startling: if we average four of the white female names (Emily, Anne, Jill, and Allison) together, we get a callback rate of 8.5%. If we average four of the black female names (Latoya, Kenya, Latonya, and Ebony) together, we get a callback rate of 8.95%, a 5.3% larger callback rate for the African-American applicants (even though, again, this is actually only a difference of 18 total white versus 19 total black callbacks). If we include Laurie and Tanisha, this only shifts to 8.7% versus 8.3%, a difference of 19 total white versus 18 total black callbacks. Why should this be? Why this overwhelming equality across more than half of the sample? Why should Kristen receive an overwhelming advantage over Anne: 13.1% (or 28/215 calls) to Anne’s 8.3% (or 18/215 calls)? And why should Brad receive such an advantage over Todd: 15.9% (or 10/63 calls) to Todd’s 5.9% (or 4/63 calls)? The difference in the callback rate within races is as large as most of the between-race differences. Why would employers prefer Jamal to Todd? Why would rampantly racist employers like Emily less than Kenya (which is the name of a black-majority country)?! And why would they discriminate against Aisha but not against Ebony, a name that literally means “black”?!

Why would Jermaine (9.6) and Leroy (9.4), who together have an average 9.5% callback rate, beat Todd (5.9), Neil (6.6), Geoffrey (6.8), Brett (6.8), Brendan (7.7), Greg (7.8), and Matthew (9.0), who together have an average 7.2% callback rate, by 2.3 percentage points? Pay close attention to the often visually misleading way that percent increases are calculated in studies like these. The actual difference between 9.5% and 7.2% is 2.3 percentage points. But 2.3 is 32% of 7.2; so the percentage increase from 7.2 to 9.5 isn’t 2.3%, even though that is the actual difference between them; it’s 32%. If the baseline risk of developing skin cancer is 0.005%, and one month of daily tanning bed use increases that risk by 50%, that sounds like a lot, enough to scare most people away from considering it. But what that actually means is that a mere 0.0025% (50% of 0.005) is added to the original, baseline risk of 0.005%, to arrive at a new risk of 0.0075%, or in other words, an extra 2 or 3 people per 100,000 who use a tanning bed daily for a month. In this case, the 32% advantage for the top two “black” names over the bottom seven “white” names simply represents an average of about 6 calls for each of these “black” names and an average of 4.5 calls for each of these “white” names, or in other words, one additional call for every 42 attempts.
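The distinction between a gap in percentage points and a relative percent increase is easy to verify. The following is a small sketch using only the callback rates quoted above; the function names are illustrative, and the tanning-bed figures are the same hypothetical numbers from the paragraph, not real risk data.

```python
from statistics import mean

# Callback rates (percent) quoted in the text.
top_black_male = {"Jermaine": 9.6, "Leroy": 9.4}
bottom_white_male = {
    "Todd": 5.9, "Neil": 6.6, "Geoffrey": 6.8, "Brett": 6.8,
    "Brendan": 7.7, "Greg": 7.8, "Matthew": 9.0,
}


def point_gap(a_pct: float, b_pct: float) -> float:
    """Absolute difference, measured in percentage points."""
    return a_pct - b_pct


def relative_increase_pct(a_pct: float, b_pct: float) -> float:
    """How much larger a is than b, expressed as a percent of b."""
    return (a_pct - b_pct) / b_pct * 100


black_avg = mean(top_black_male.values())
white_avg = mean(bottom_white_male.values())

gap_points = point_gap(black_avg, white_avg)                 # about 2.3 points
gap_relative = relative_increase_pct(black_avg, white_avg)   # roughly 31 to 32 percent

# Hypothetical tanning-bed illustration from the text.
baseline_risk_pct = 0.005
new_risk_pct = baseline_risk_pct * (1 + 50 / 100)
extra_cases_per_100k = (new_risk_pct - baseline_risk_pct) / 100 * 100_000

print(f"{gap_points:.1f} points, {gap_relative:.0f}% relative increase")
print(f"extra cases per 100,000: {extra_cases_per_100k:.1f}")
```

The same two or three percentage points can be reported as a roughly one-third increase, which is why it matters which of the two numbers a summary chooses to quote.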

Bertrand summarizes the meaning of the study when she writes that: “Applicants with white names need to send about 10 resumes to get one callback. Applicants with black names need to send about 15 resumes to achieve the same result.” In fact, however, what the study actually found was that black applicants named Jermaine or Leroy apparently need to send about 10, while white applicants named Todd need to send about 17. White applicants named Neil, Geoffrey, or Brett need to send about 15. Black applicants named Kenya need to send about 11.5. And so on.
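The “resumes per callback” framing is just the reciprocal of the callback rate. A quick sketch, using only the per-name rates quoted earlier in this post (the helper name `resumes_per_callback` is my own):

```python
def resumes_per_callback(rate_pct: float) -> float:
    """Expected number of resumes sent per callback at a given rate (percent)."""
    return 100 / rate_pct


# Callback rates (percent) quoted earlier in the text.
rates = {
    "Jermaine/Leroy (average)": 9.5,
    "Todd": 5.9,
    "Neil": 6.6,
    "Geoffrey": 6.8,
    "Brett": 6.8,
}

per_callback = {name: round(resumes_per_callback(r), 1) for name, r in rates.items()}

for name, n in per_callback.items():
    print(f"{name}: about {n} resumes per callback")
```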

In any case, the authors of the study themselves actually acknowledge that this is a problem for their thesis. On pp.19–20, they write: “there is significant variation in callback rates by name. Of course, chance alone could produce such variation….” And this finally returns us full circle to the opening point: nowhere do the authors actually state that they did in fact send out identical resumes with only the names changed; and their description lends itself perfectly well to the interpretation that they chose existing resumes, altered them at their discretion, and then applied either a white or black name to that one particular resume before tossing it into either a high-quality or low-quality pile (rinse and repeat for all applicants). If so, this would perfectly well explain why Brad would perform a full 10 percentage points better than Todd (a callback rate roughly 2.7 times as high), while Jermaine performs a full 6.6 points better than Rasheed, Kristen performs a full 5.2 points better than Emily, and Ebony performs a full 7.4 points better than Aisha: they all had different resumes. Otherwise, what could explain the shameless supremacy of Kristen and Ebony?

Thus, the actual implication of this finding would in fact be exactly the opposite of what it was universally taken to have proved: how an applicant presented themselves—whether by way of fonts, layouts, and cover styles or by way of qualifications—actually had a far more significant impact on their likelihood of being called back in response to a job application than the racial connotations of their name. Only that, or else all the findings of the study being no more than the simple result of chance alone, could explain why the variation in callback rates between names within the two racial categories was so much greater than the variation between the two categories taken as a whole.
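To get a rough feel for how much name-to-name variation “chance alone” could produce at these sample sizes, one can compute a binomial standard error. This is only a sketch under loud assumptions: it treats every resume as an independent trial with the same underlying callback probability, and it uses my equal-split estimates of 62 and 215 resumes per name, neither of which the study reports.

```python
import math


def binomial_standard_error_points(rate_pct: float, resumes_sent: int) -> float:
    """Standard error of an observed callback rate, in percentage points.

    Assumes independent resumes with one shared callback probability,
    which is a simplification and not something the study establishes.
    """
    p = rate_pct / 100
    return math.sqrt(p * (1 - p) / resumes_sent) * 100


# Rates are quoted in the text; per-name counts are equal-split estimates.
se_male = binomial_standard_error_points(7.8, 62)     # e.g. Greg
se_female = binomial_standard_error_points(7.9, 215)  # e.g. Emily

print(f"male name, n=62: +/- {se_male:.1f} points")
print(f"female name, n=215: +/- {se_female:.1f} points")
```

Under those assumptions, a single male name’s observed rate carries a standard error of several percentage points, which is the same order of magnitude as many of the gaps between individual names discussed above.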

_______ ~.::[༒]::.~ _______

Some of these problems were explained to Tim Wise by A. R. Ward in a debate that took place between the two of them. The section in which Ward summarized these criticisms was published around February of 2011. In January of 2012, Ward published the entire transcript of the debate on his website, writing: “After 5 rounds of back–and–forth I’ve decided to publish the debate for all to read. I’m still waiting for him to respond to my final entry (it’s been 9 months), and I’ll post his response if I get it.” On February 10, Ward updated the posting to link to Wise’s summary and state that Wise’s final reply was supposed to be on the way. Yet, as of May 2015, there is not a trace that Wise has ever returned.

In fact, the summary that Wise did upload on February 9 included only the first two rounds of the debate—leaving out the part in which Wise faced specific criticisms he has never once responded to—and yet Wise accuses Ward of “perhaps needing attention, [since he] decided to go ahead and publish an incredibly partial, truncated excerpt from the debate on his site….” Wise claims that he wants the reader to “see each completed round as it currently stands, rather than just snippets intended to make one debater seem particularly absurd and the other especially bright,” and goes on to promise that “Upon finishing up my final statement, I will post his closing and then mine, for a fully completed debate.” However, Wise is in fact the one truncating the most trenchant criticisms against his own claims out of his summary of the debate—and now more than two years later, there is no indication to be found that he has ever provided the promised response.

And yet as recently as March of 2014, this study was still at the top of just five references Wise chose to employ in one of his major public speeches, where presumably he would want to restrict his choices to only the most powerful pieces of evidence in order to pack as much quality into a limited quantity of time as possible. Having been made directly aware of the depths of the problems with this study, Wise has not seen fit to address them in any detail anywhere, despite having pledged to. Yet he still sees fit to quote this study as one of the most compelling pieces of evidence of just how overwhelmingly systematic the influence of racism is in employment in the United States today, and his rhetoric has not shifted enough to suggest even a hint of awareness that someone might not consider it so obviously damning after all.

_______ ~.::[༒]::.~ _______

Do we have any other evidence suggesting what the impact of distinctly black names on employment prospects might be? As a matter of fact, we do. The Causes and Consequences of Distinctively Black Names was co-authored by Steven Levitt, the white economist of Freakonomics fame, along with Roland Fryer, a black economist who, after an abusive childhood involving one parent’s abandonment and the other’s physical abuse, became (to quote Freakonomics itself) “[a] full fledged gangster by his teens” but later, in 1998, graduated magna cum laude from the University of Texas at Arlington while holding down a full time job. In 2008, at the age of 30, he became the youngest African-American to ever receive tenure at Harvard. He also maintains an office at the W. E. B. Du Bois Institute.

What were the findings of this study? In brief: “(…) We find … no negative relationship between having a distinctively Black name and later life outcomes….” The data set for this study was, without question, overwhelmingly more comprehensive than that conjured by the Mullainathan study — The Causes and Consequences of Distinctively Black Names looked at birth certificate information for every single child born in California since 1961, covering more than 16 million births — that is, births of real living people, not conjured hypothetical ones. Steven Levitt, in an article for Slate, explains: “how much does your name really matter? Over the years, a series of studies have tried to measure how people perceive different names. Typically, a researcher would send two identical (and fake) résumés, one with a traditionally white name and the other with an immigrant or minority–sounding name, to potential employers. The “white” résumés have always gleaned more job interviews. Such studies are tantalizing but severely limited, since they offer no real–world follow–up or analysis beyond the résumé stunt.

 The California names data, however, afford a more robust opportunity. By subjecting this data to the economist’s favorite magic trick—a statistical wonder known as regression analysis—it’s possible to tease out the effect of any one factor (in this case, a person’s first name) on her future education, income, and health.” And with these advantages, the study found “no relationship between how Black one’s name is and life outcomes….” [1] That is the finding of the single most overwhelmingly large study of the impact of the racial connotation of a person’s name on their chance of being hired conducted on measurements of real people instead of extrapolation from fictional “résumé stunts.”

Notably, this study has been cited only 265 times, in comparison to the Mullainathan study’s 1555.

That, for the record, is “an 83% reduction” in the citation rate.

_______ ~.::[༒]::.~ _______

In the time since this article was originally written, a new study has appeared (“Race and gender effects on employer interest in job applicants: new evidence from a resume field experiment”) which addresses the debates raised by these two studies directly. Specifically, they performed a “callback” study like Mullainathan’s, but when they did so this time, they corrected for the fact identified by Fryer and Levitt that distinctively black first names like “Precious” and “Tyrone” are correlated with poverty (with blacks of the same socioeconomic status, with or without these names, having perfectly equivalent life outcomes). And they did this by assigning the job applicants in their study distinctively black last names like Washington or Jefferson (up to 90% of people with these last names in the United States are black), and ambiguous first names used by both black and white Americans alike (Chloe, Ryan).

They found “little evidence of systematic employer preferences for applicants from particular race and gender groups.” They write: “Fryer and Levitt (2004) show that after taking into account the socioeconomic correlates of distinctively African-American sounding names, the large effect of these names on employer responses attenuates. Our findings provide evidence consistent with this point using newer, experimental data.” In other words, even if employers discriminate against “Precious Henderson” or “Tyrone Williams” because of the socioeconomic and cultural background their name indicates, it appears that they do not discriminate against “Morgan Jackson” or “Jordan Jenkins”.

_______ ~.::[༒]::.~ _______

[1] Yes; this study does leave open the possibility that race could be a significant variable even once other correlates are subtracted from it, since the study simply didn’t address this. But my purpose here is purely to address the Mullainathan study. And the Levitt & Fryer study most certainly does that. This post is not supposed to be an analysis of anything other than what I have clearly said it is supposed to be an analysis of: the soundness of this study, Wise’s integrity in handling criticism pertaining specifically to that particular study, and the ease with which such feeble data as this study actually contained can become unquestioningly transformed into so much more than it actually is throughout the media without anyone stopping to notice the obvious.

The Levitt & Fryer study does, however, strongly suggest one thing that changes the analysis of the question of the impact of race on employment prospects when all else is held equal: to whatever extent employers in the real world discriminate against black applicants, they must discriminate about exactly as much against black applicants named “James” as they do against black applicants named “Jermaine” (or else about exactly as little). That can offer us a meaningful empirical basis for asking whether it’s more likely that employers discriminate against “James” significantly more, or “Jermaine” significantly less, on the basis of race (as opposed to any other number of factors that might correlate with race as well as one’s name) than we would have expected. In fact, since the Koedel et al. study referred to above investigated discrimination against black applicants with names like “Chloe Washington” (a simple Google search reveals several images of women named Chloe Washington, and all of them are black; the same goes for Ryan Washington, with the exception of a white reporter named Ryan at the Washington Post), we already have the answer to that question.