Bias in the language of justice

This is what we learned in three days.

Isaac Goldstein
4 min read · Jul 7, 2021

Authors: Abigail Atchison, Isaac Goldstein, Andrew Quirk

After every court case in the United States, judges write judicial opinions: documents that explain the decision reached and provide an analysis of the relevant laws and precedents. In a legal system based on upholding precedent, judicial opinions and the bias within them shape the application of law to our daily lives. Over three days, our team used computational linguistics to examine the bias in these documents.

We leveraged a popular Machine Learning technique called word embeddings¹, which measures the semantic similarity between any given pair of words. This approach assigns each word in a set of documents a vector of numbers that represents its meaning, inferred from the words surrounding it. From there, one can reveal how closely two words are associated by comparing the “distance” between their vectors: the “closer” two words are mathematically, the more similar they are semantically. For more information on how this works, we recommend reading the paper¹ in the footnotes.
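The distance comparison at the heart of this method is usually cosine similarity between word vectors. Here is a minimal sketch in Python with NumPy; the three-dimensional vectors below are invented for illustration (real embeddings such as word2vec typically have 100–300 dimensions learned from text):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: values near 1.0
    mean the words appear in similar contexts; near 0.0, unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors for illustration only, not from any trained model.
vectors = {
    "judge":  np.array([0.9, 0.1, 0.3]),
    "court":  np.array([0.8, 0.2, 0.4]),
    "banana": np.array([0.1, 0.9, 0.0]),
}

print(cosine_similarity(vectors["judge"], vectors["court"]))   # relatively high
print(cosine_similarity(vectors["judge"], vectors["banana"]))  # relatively low
```

In this toy setup, “judge” and “court” point in nearly the same direction, so their similarity is high, while “judge” and “banana” diverge.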

Using this method, we examined a set of judicial opinions from the Illinois Court of Appeals to quantify how those justices are writing about race² — what we found was shocking.

Figure I. Semantic similarity between the words ‘white’ and ‘black’ and six other words that appear in the judicial opinions we examined.

In these legal documents, the words person, human, and innocent are more strongly associated with the word white, while the words dangerous, violent, and emotional are more strongly associated with the word black. The words used to describe an individual’s race have fundamentally different associations.
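A comparison like the one in Figure I can be expressed as a difference of similarities: how much closer a word sits to ‘white’ than to ‘black’ in the embedding space. Below is a minimal sketch using made-up two-dimensional vectors; the values are illustrative stand-ins, not our trained model:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented 2-D vectors purely for illustration; the real analysis uses
# embeddings trained on the judicial-opinion corpus itself.
emb = {
    "white":     np.array([1.0, 0.0]),
    "black":     np.array([0.0, 1.0]),
    "person":    np.array([0.9, 0.3]),
    "dangerous": np.array([0.2, 0.8]),
}

def relative_association(word, emb):
    """Positive: `word` sits closer to 'white' in the embedding space;
    negative: closer to 'black'."""
    return cosine(emb[word], emb["white"]) - cosine(emb[word], emb["black"])

print(relative_association("person", emb))     # positive in this toy setup
print(relative_association("dangerous", emb))  # negative in this toy setup
```

A signed score like this makes the asymmetry in Figure I directly comparable across words.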

These initial observations prompted us to ask whether the identity of the author also shapes the bias in judicial texts. In short, we sought to investigate: does the race of the justice authoring an opinion affect the implicit associations in their writing? The answer is yes.

We split our original corpus of documents into two separate sets: opinions written by white justices and opinions written by Black justices. We then selected the 50 words most closely associated with race in each of these collections. Below we have displayed a subset of those words.

Table I. Selected words and their similarity to the word race in opinions authored by white justices. Order is the word’s rank by closeness to race among all words in the corpus; Similarity is the closeness metric calculated between each word and race.
Table II. Selected words and their similarity to the word race in opinions authored by Black justices. Order and Similarity hold the same meaning as in Table I (above).
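Rankings like those in Tables I and II can be produced by sorting an entire vocabulary by its similarity to ‘race’. Here is a minimal sketch, assuming word vectors are available as a Python dict; the toy vectors and vocabulary below are invented for illustration, and in practice one such ranking would be computed per sub-corpus:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_to(target, emb, k=50):
    """Rank every other word in the vocabulary by similarity to `target`,
    mirroring the Order and Similarity columns in the tables."""
    sims = [(w, cosine(emb[w], emb[target])) for w in emb if w != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]

# Toy vectors for illustration; real ones would come from a model trained
# separately on white-authored and Black-authored opinions.
emb = {
    "race":      np.array([1.0, 0.2]),
    "color":     np.array([0.9, 0.3]),
    "ethnicity": np.array([0.8, 0.1]),
    "weather":   np.array([0.1, 0.9]),
}

for word, sim in nearest_to("race", emb, k=3):
    print(word, round(sim, 3))
```

Running `nearest_to` on each sub-corpus’s embeddings and comparing the two ranked lists is one way to surface the divergence the tables show.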

There seems to be a distinct difference in the words white versus Black justices associate with the concept of race. Why does this matter?

“[Opinions are] the function most emphasized among law students, law teachers and members of the bar, particularly as they study opinions in an attempt to ascertain ‘what the law is’ on one point or another”

In Some Observations Concerning Judicial Opinions³, Robert A. Leflar asserts, “In a legal system built on stare decisis, the law-announcing function of opinions as precedents is constantly thought about. It is the function most emphasized among law students, law teachers and members of the bar, particularly as they study opinions in an attempt to ascertain ‘what the law is’ on one point or another.” In a system that values precedent, biased language in judicial opinions is not merely preserved as an artifact but amplified across ongoing applications of those opinions.

The idea that racial bias permeates the American justice system is not novel. However, we believe that the ability to quantify this bias, in the aversive way that it can appear, is a key step along the path of change. Our exploration over three days only scratches the surface of how Machine Learning may reveal racism in the language of justice.

1 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

2 For the purpose of this analysis we look only at “white” and “black,” despite the multi-dimensional and nuanced realities of race.

3 Leflar, Robert A. “Some Observations Concerning Judicial Opinions.” Columbia Law Review 61.5 (1961): 810–820.
