Data Feminism

1. The Power Chapter

Published on: Jul 27, 2020
License: Creative Commons Attribution 4.0 International License (CC-BY 4.0)

Principle: Examine Power

Data feminism begins by analyzing how power operates in the world.

When tennis star Serena Williams disappeared from Instagram in early September 2017, her six million followers assumed they knew what had happened. Several months earlier, in March of that year, Williams had accidentally announced her pregnancy to the world via a bathing suit selfie and a caption that was hard to misinterpret: “20 weeks.” Now, they thought, her baby had finally arrived.

But then they waited, and waited some more. Two weeks later, Williams finally reappeared, announcing the birth of her daughter and inviting her followers to watch a video that welcomed Alexis Olympia Ohanian Jr. to the world.1 The video was a montage of baby bump pics interspersed with clips of a pregnant Williams playing tennis and having cute conversations with her husband, Reddit cofounder Alexis Ohanian, and then, finally, the shot that her fans had been waiting for: the first clip of baby Olympia. Williams narrates: “So we’re leaving the hospital,” she explains. “It’s been a long time. We had a lot of complications. But look who we got!” The scene fades to white, and the video ends with a set of stats: Olympia’s date of birth, birth weight, and number of grand slam titles: 1. (Williams, as it turned out, was already eight weeks pregnant when she won the Australian Open earlier that year.)

Williams’s Instagram followers were, for the most part, enchanted. But soon, the enthusiastic congratulations were superseded by a very different conversation. A number of her followers—many of them Black women like Williams herself—fixated on the comment she’d made as she was heading home from the hospital with her baby girl. Those “complications” that Williams experienced—other women had had them too. In Williams’s case, the complications had been life-threatening, and her self-advocacy in the hospital played a major role in her survival.

On Williams’s Instagram feed, dozens of women began posting their own experiences of childbirth gone horribly wrong. A few months later, Williams returned to social media—Facebook, this time—to continue the conversation (figure 1.1). Citing a 2017 statement from the US Centers for Disease Control and Prevention (CDC), Williams wrote that “Black women are over 3 times more likely than white women to die from pregnancy- or childbirth-related causes.”2

These disparities were already well-known to Black-women-led reproductive justice groups like SisterSong, the Black Mamas Matter Alliance, and Raising Our Sisters Everywhere (ROSE), some of whom had been working on the maternal health crisis for decades. Williams helped to shine a national spotlight on them. The mainstream media had also recently begun to pay more attention to the crisis. A few months earlier, Nina Martin of the investigative journalism outfit ProPublica, working with Renee Montagne of NPR, had reported on the same phenomenon.3 “Nothing Protects Black Women from Dying in Pregnancy and Childbirth,” the headline read. In addition to the study cited by Williams, Martin and Montagne cited a second study from 2016, which showed that neither education nor income level—the factors usually invoked when attempting to account for healthcare outcomes that diverge along racial lines—impacted the fates of Black women giving birth.4 On the contrary, the data showed that Black women with college degrees suffered more severe complications of pregnancy and childbirth than white women without high school diplomas.

A screenshot of a facebook post from Serena Williams on January 15, 2018, with the following caption:

“I didn’t expect that sharing our family’s story of Olympia’s birth and all of complications after giving birth would start such an outpouring of discussion from women — especially black women — who have faced similar complications and women whose problems go unaddressed. 

These aren’t just stories: according to the CDC, (Center for Disease Control) black women are over 3 times more likely than White women to die from pregnancy- or childbirth-related causes. We have a lot of work to do as a nation and I hope my story can inspire a conversation that gets us to close this gap.

Let me be clear: EVERY mother, regardless of race, or background deserves to have a healthy pregnancy and childbirth. I personally want all women of all colors to have the best experience they can have. My personal experience was not great but it was MY experience and I'm happy it happened to me. It made me stronger and it made me appreciate women -- both women with and without kids -- even more. We are powerful!!! 

I want to thank all of you who have opened up through online comments and other platforms to tell your story. I encourage you to continue to tell those stories. This helps. We can help others. Our voices are our power.”

Figure 1.1: A Facebook post by Serena Williams responding to her Instagram followers who had shared their stories of pregnancy and childbirth-related complications with her. Image from Serena Williams, January 15, 2018. Credit: Serena Williams/Facebook.

So what were these complications, more precisely? And how many women had actually died as a result? Nobody was counting. A 2014 United Nations report, coauthored by SisterSong, described the state of data collection on maternal mortality in the United States as “particularly weak.”5 The situation hadn’t improved in 2017, when ProPublica began its reporting. In 2018, USA Today investigated these racial disparities and found an even more fundamental problem: there was still no national system for tracking complications sustained in pregnancy and childbirth, even though similar systems had long been in place for tracking any number of other health issues, such as teen pregnancy, hip replacements, or heart attacks.6 They also found that there was still no reporting mechanism for ensuring that hospitals follow national safety standards, as is required for both hip surgery and cardiac care. “Our maternal data is embarrassing,” stated Stacie Geller, a professor of obstetrics and gynecology at the University of Illinois, when asked for comment. The chief of the CDC’s Maternal and Infant Health branch, William Callaghan, makes clear the significance of this “embarrassing” data: “What we choose to measure is a statement of what we value in health,” he explains.7 We might edit his statement to add that it’s a measure of who we value in health, too.8

Why did it take the near-death of an international sports superstar for the media to begin paying attention to an issue that less famous Black women had been experiencing and organizing around for decades? Why did it take reporting by the predominantly white mainstream press for US cities and states to begin collecting data on the issue?9 Why are those data still not viewed as big enough, statistically significant enough, or of high enough quality for those cities and states, and other public institutions, to justify taking action? And why didn’t those institutions just #believeblackwomen in the first place?10

The answers to these questions are directly connected to larger issues of power and privilege. Williams recognized as much when asked by Glamour magazine about the fact that she had to demand that her medical team perform additional tests in order to diagnose her own postnatal complications—and because she was Serena Williams, twenty-three-time grand slam champion, they complied.11 “If I wasn’t who I am, it could have been me,” she told Glamour, referring to the fact that the privilege she experienced as a tennis star intersected with the oppression she experienced as a Black woman, enabling her to avoid becoming a statistic herself. As Williams asserted, “that’s not fair.”12

Needless to say, Williams is right. It’s absolutely not fair. So how do we mitigate this unfairness? We begin by examining systems of power and how they intersect—like how the influences of racism, sexism, and celebrity came together first to send Williams into a medical crisis and then, thankfully, to keep her alive. The complexity of these intersections is the reason that examine power is the first principle of data feminism, and the focus of this chapter. Examining power means naming and explaining the forces of oppression that are so baked into our daily lives—and into our datasets, our databases, and our algorithms—that we often don’t even see them. Seeing oppression is especially hard for those of us who occupy positions of privilege. But once we identify these forces and begin to understand how they exert their potent force, then many of the additional principles of data feminism—like challenging power (chapter 2), embracing emotion (chapter 3), and making labor visible (chapter 7)—become easier to undertake.

Power and the Matrix of Domination

But first, what do we mean by power? We use the term power to describe the current configuration of structural privilege and structural oppression, in which some groups experience unearned advantages—because various systems have been designed by people like them and work for people like them—and other groups experience systematic disadvantages—because those same systems were not designed by them or with people like them in mind. These mechanisms are complicated, and there are “few pure victims and oppressors,” notes influential sociologist Patricia Hill Collins. In her landmark text, Black Feminist Thought, first published in 1990, Collins proposes the concept of the matrix of domination to explain how systems of power are configured and experienced.13 It consists of four domains: the structural, the disciplinary, the hegemonic, and the interpersonal. Her emphasis is on the intersection of gender and race, but she makes clear that other dimensions of identity (sexuality, geography, ability, etc.) also result in unjust oppression, or unearned privilege, that becomes apparent across the same four domains.

The structural domain is the arena of laws and policies, along with the schools and institutions that implement them. This domain organizes and codifies oppression. Take, for example, the history of voting rights in the United States. The US Constitution did not originally specify who was authorized to vote, so various states had different policies that reflected their local politics. Most had to do with owning property, which, conveniently, only men could do. But with the passage of the Fourteenth Amendment in 1868, which granted the rights of US citizenship to those who had been enslaved, the nature of those rights—including voting—had to be spelled out at the national level for the first time. More specifically, voting was defined as a right reserved for “male citizens.” This is a clear instance of codified oppression in the structural domain.

Table 1.1: The four domains of the matrix of domination14

Structural domain: Organizes oppression (laws and policies).

Disciplinary domain: Administers and manages oppression; implements and enforces laws and policies.

Hegemonic domain: Circulates oppressive ideas (culture and media).

Interpersonal domain: Individual experiences of oppression.

It would take until the passage of the Nineteenth Amendment in 1920 for most (but not all) women to be granted the right to vote.15 Even still, many state voting laws continued to include literacy tests, residency requirements, and other ways to indirectly exclude people who were not property-owning white men. These restrictions persist today, in the form of practices like dropping names from voter rolls, requiring photo IDs, and limiting early voting—the burdens of which are felt disproportionately by low-income people, people of color, and others who lack the time or resources to jump through these additional bureaucratic hoops.16 This is the disciplinary domain that Collins names: the domain that administers and manages oppression through bureaucracy and hierarchy, rather than through laws that explicitly encode inequality on the basis of someone’s identity.17

Neither of these domains would be possible without the hegemonic domain, which deals with the realm of culture, media, and ideas. Discriminatory policies and practices in voting can only be enacted in a world that already circulates oppressive ideas about, for example, who counts as a citizen in the first place. Consider an anti-suffragist pamphlet from the 1910s that proclaims, “You do not need a ballot to clean out your sink spout.”18 Pamphlets like these, designed to be literally passed from hand to hand, reinforced preexisting societal views about the place of women in society. Today, we have animated GIFs instead of paper pamphlets, but the hegemonic function is the same: to consolidate ideas about who is entitled to exercise power and who is not.

The final part of the matrix of domination is the interpersonal domain, which influences the everyday experience of individuals in the world. How would you feel if you were a woman who read that pamphlet, for example? Would it have more or less of an impact if a male family member gave it to you? Or, for a more recent example, how would you feel if you took time off from your hourly job to go cast your vote, only to discover when you got there that your name had been purged from the official voting roll or that there was a line so long that it would require that you miss half a day’s pay, or stand for hours in the cold, or ... the list could go on. These are examples of how it feels to know that systems of power are not on your side and, at times, are actively seeking to take away the small amount of power that you do possess.19

The matrix of domination works to uphold the undue privilege of dominant groups while unfairly oppressing minoritized groups. What does this mean? Beginning in this chapter and continuing throughout the book, we use the term minoritized to describe groups of people who are positioned in opposition to a more powerful social group. While the term minority describes a social group composed of fewer people, minoritized indicates that a social group is actively devalued and oppressed by a dominant group, one that holds more economic, social, and political power. With respect to gender, for example, men constitute the dominant group, while all other genders constitute minoritized groups. This remains true even though women constitute a majority of the world’s population. Sexism is the term that names this form of oppression. In relation to race, white people constitute the dominant group (racism); in relation to class, wealthy and educated people constitute the dominant group (classism); and so on.20

Using the concept of the matrix of domination and the distinction between dominant and minoritized groups, we can begin to examine how power unfolds in and around data. This often means asking uncomfortable questions: who is doing the work of data science (and who is not)? Whose goals are prioritized in data science (and whose are not)? And who benefits from data science (and who is either overlooked or actively harmed)?21 These questions are uncomfortable because they unmask the inconvenient truth that there are groups of people who are disproportionately benefitting from data science, and there are groups of people who are disproportionately harmed. Asking these who questions allows us, as data scientists ourselves, to start to see how privilege is baked into our data practices and our data products.22

Data Science by Whom?

It is important to acknowledge the elephant in the server room: the demographics of data science (and related occupations like software engineering and artificial intelligence research) do not represent the population as a whole. According to the most recent data from the US Bureau of Labor Statistics, released in 2018, only 26 percent of those in “computer and mathematical occupations” are women.23 Of those women, only 12 percent are Black or Latinx, even though Black and Latinx women make up 22.5 percent of the US population.24 A report by the research group AI Now about the diversity crisis in artificial intelligence notes that women comprise only 15 percent of AI research staff at Facebook and 10 percent at Google.25 These numbers are probably not a surprise. The more surprising thing is that those numbers are getting worse, not better. According to a research report published by the American Association of University Women in 2015, women computer science graduates in the United States peaked in the mid-1980s at 37 percent, and we have seen a steady decline in the years since then to 26 percent today (figure 1.2).26 As “data analysts” (low-status number crunchers) have become rebranded as “data scientists” (high-status researchers), women are being pushed out in order to make room for more highly valued and more highly compensated men.27

A graphical representation of the proportion of men and women awarded computer science (CS) degrees in the U.S. from 1970 to 2010. The horizontal axis lists the years from 1970 to 2010 in five-year increments, and the vertical axis shows the percentage. The title of the graph reads “Computer Science, The Man Factory.”

In the graph, a line shows the percentage of men who were awarded CS degrees. Below this line, the graph is shaded grey, representing the proportion of men; above the line, it is shaded light purple, representing the proportion of women. The ratio starts at around 85% men / 15% women in 1970, then the share of women increases to 63% men / 37% women in 1984 (at this point, a caption reads “Women received 37% of CS degrees in 1984, the closest we have come to gender parity”), and then that share decreases back to around 80% men / 20% women in 2010. Throughout the entire timeline, the number of men awarded CS degrees is disproportionately larger than the number of women.

Figure 1.2: Computer science has always been dominated by men and the situation is worsening (even while many other scientific and technical fields have made significant strides toward gender parity). Women awarded bachelor’s degrees in computer science in the United States peaked in the mid-1980s at 37 percent, and we have seen a steady increase in the ratio of men to women in the years since then. This particular report treated gender as a binary, so there was no data about nonbinary people. Source: Data from the National Center for Education Statistics, as reported in Christianne Corbett and Catherine Hill, Solving the Equation: The Variables for Women’s Success in Engineering and Computing (Washington, DC: American Association of University Women, 2015). Credit: Graphic by Catherine D’Ignazio.

Gender is not the only line along which disparities appear in the higher education pipeline. The same report noted specific underrepresentation for Native American women, multiracial women, white women, and all Black and Latinx people. So is it really a surprise that each day brings a new example of data science being used to disempower and oppress minoritized groups? In 2018, it was revealed that Amazon had been developing an algorithm to screen its first-round job applicants. But because the model had been trained on the resumes of prior applicants, who were predominantly male, it developed an even stronger preference for male applicants. It downgraded resumes that included the word women, as well as those of graduates of women’s colleges. Ultimately, Amazon had to cancel the project.28 This example reinforces the work of Safiya Umoja Noble, whose book, Algorithms of Oppression, has shown how both gender and racial biases are encoded into some of the most pervasive data-driven systems—including Google search, which boasts over five billion unique web searches per day. Noble describes how, as recently as 2016, comparable searches for “three Black teenagers” and “three white teenagers” turned up wildly different representations of those teens. The former returned mugshots, while the latter returned wholesome stock photography.29
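The Amazon case illustrates a general mechanism: a model trained on historically skewed outcomes will learn gendered proxies even when gender itself is never an explicit input. A minimal sketch in Python, using entirely invented resume data and a deliberately simplistic word-scoring model (not Amazon’s actual system, whose details were never made public), shows how this happens:

```python
from collections import Counter

# Invented training data: sets of resume words paired with a past hiring
# outcome (1 = hired, 0 = rejected). The historical outcomes happen to
# skew against resumes mentioning "womens" (e.g., "women's chess club").
history = [
    ({"captain", "chess", "club"}, 1),
    ({"chess", "club", "president"}, 1),
    ({"womens", "chess", "club"}, 0),
    ({"varsity", "soccer"}, 1),
    ({"womens", "soccer"}, 0),
    ({"debate", "team"}, 1),
    ({"womens", "debate", "team"}, 0),
]

# Count how often each word appears among hired vs. rejected applicants.
hired, rejected = Counter(), Counter()
for words, outcome in history:
    (hired if outcome else rejected).update(words)

def score(words):
    # Net evidence per word; a higher score favors the applicant.
    return sum(hired[w] - rejected[w] for w in words)

# Two otherwise-identical resumes diverge on a single word: the model has
# learned "womens" as a proxy for rejection, without gender ever being a field.
print(score({"chess", "club"}))            # prints 2
print(score({"womens", "chess", "club"}))  # prints -1
```

The point of the toy model is not its (trivial) scoring rule but the data: because the training labels encode past discrimination, any model that fits them faithfully reproduces that discrimination.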

The problems of gender and racial bias in our information systems are complex, but some of their key causes are plain as day: the data that shape them, and the models designed to put those data to use, are created by small groups of people and then scaled up to users around the globe. But those small groups are not at all representative of the globe as a whole, nor even of a single city in the United States. When data teams are primarily composed of people from dominant groups, those perspectives come to exert outsized influence on the decisions being made—to the exclusion of other identities and perspectives. This is not usually intentional; it comes from the ignorance of being on top. We describe this deficiency as a privilege hazard.

How does this come to pass? Let’s take a minute to imagine what life is like for someone who epitomizes the dominant group in data science: a straight, white, cisgender man with formal technical credentials who lives in the United States. When he looks for a home or applies for a credit card, people are eager for his business. People smile when he holds his girlfriend’s hand in public. His body doesn’t change due to childbirth or breastfeeding, so he does not need to think about workplace accommodations. He presents his Social Security number on job applications as a formality, but it never prevents his application from being processed or brings him unwanted attention. The ease with which he traverses the world is invisible to him because it has been designed for people just like him. He does not think about how life might be different for everyone else. In fact, it is difficult for him to imagine that at all.

This is the privilege hazard: the phenomenon that makes those who occupy the most privileged positions among us—those with good educations, respected credentials, and professional accolades—so poorly equipped to recognize instances of oppression in the world.30 They lack what Anita Gurumurthy, executive director of IT for Change, has called “the empiricism of lived experience.”31 And this lack of lived experience—this evidence of how things truly are—profoundly limits their ability to foresee and prevent harm, to identify existing problems in the world, and to imagine possible solutions.

The privilege hazard occurs at the level of the individual—in the interpersonal domain of the matrix of domination—but it is much more harmful in aggregate because it reaches the hegemonic, disciplinary, and structural domains as well. So it matters deeply that data science and artificial intelligence are dominated by elite white men: it means there is a collective privilege hazard so great that it would be a profound surprise if they could actually identify instances of bias prior to unleashing them onto the world. Social scientist Kate Crawford has advanced the idea that the biggest threat from artificial intelligence systems is not that they will become smarter than humans, but rather that they will hard-code sexism, racism, and other forms of discrimination into the digital infrastructure of our societies.32

What’s more, the same cis het white men responsible for designing those systems lack the ability to detect harms and biases in their systems once they’ve been released into the world.33 In the case of the “three teenagers” Google searches, for example, it was a young Black teenager who pointed out the problem and a Black scholar who wrote about it. The burden consistently falls upon those more intimately familiar with the privilege hazard—in data science as in life—to call out the creators of those systems for their limitations.

For example, Joy Buolamwini, a Ghanaian-American graduate student at MIT, was working on a class project using facial-analysis software.34 But there was a problem—the software couldn’t “see” Buolamwini’s dark-skinned face (where “seeing” means that it detected a face in the image, like when a phone camera draws a square around a person’s face in the frame). It had no problem seeing her lighter-skinned collaborators. She tried drawing a face on her hand and putting it in front of the camera; it detected that. Finally, Buolamwini put on a white mask, essentially going in “whiteface” (figure 1.3).35 The system detected the mask’s facial features perfectly.

Digging deeper into the code and benchmarking data behind these systems, Buolamwini discovered that the dataset on which many facial-recognition algorithms are tested contains 78 percent male faces and 84 percent white faces. When she did an intersectional breakdown of another test dataset—looking at gender and skin type together—only 4 percent of the faces in that dataset were dark-skinned women. In their evaluation of three commercial systems, Buolamwini and computer scientist Timnit Gebru showed that darker-skinned women were up to forty-four times more likely to be misclassified than lighter-skinned males.36 It’s no wonder that the software failed to detect Buolamwini’s face: both the training data and the benchmarking data relegate women of color to a tiny fraction of the overall dataset.37
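The key analytical move here is disaggregation: breaking a dataset down by more than one attribute at a time, rather than by gender or skin type alone. A short Python sketch, using invented numbers chosen to echo the proportions described above (not the actual benchmark data), shows why single-axis summaries can hide the intersectional gap:

```python
from collections import Counter

# Hypothetical benchmark labels: (gender, skin type) for each face.
# The counts are invented for illustration.
faces = (
    [("male", "lighter")] * 60 +
    [("male", "darker")] * 20 +
    [("female", "lighter")] * 16 +
    [("female", "darker")] * 4
)
total = len(faces)

# Single-axis summaries look less alarming...
by_gender = Counter(g for g, _ in faces)   # female: 20% of faces
by_skin = Counter(s for _, s in faces)     # darker: 24% of faces

# ...than the intersectional breakdown.
by_both = Counter(faces)
for group, count in sorted(by_both.items()):
    print(f"{group}: {100 * count / total:.0f}%")
# ('female', 'darker') comes out at just 4% of the dataset,
# far below what either single-axis summary suggests.
```

The same disaggregation applies to error rates: an overall accuracy number, or even an accuracy-by-gender number, can mask a system that fails overwhelmingly on one intersectional subgroup.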

Photograph of Joy Buolamwini, a Black woman, in front of a laptop, wearing a white theater mask.

Figure 1.3: Joy Buolamwini found that she had to put on a white mask for the facial detection program to “see” her face. Buolamwini is now founder of the Algorithmic Justice League. Credit: Courtesy of Joy Buolamwini.

This is the privilege hazard in action—that no coder, tester, or user of the software had previously identified such a problem or even thought to look. Buolamwini’s work has been widely covered by the national media (by the New York Times, by CNN, by the Economist, by Bloomberg BusinessWeek, and others) in articles that typically contain a hint of shock.38 This is a testament to the social, political, and technical importance of the work, as well as to how those in positions of power—not just in the field of data science, but in the mainstream media, in elected government, and at the heads of corporations—are so often surprised to learn that their “intelligent technologies” are not so intelligent after all. (They need to read data journalist Meredith Broussard’s book Artificial Unintelligence).39 For another example, think back to the introduction of this book, where we quoted Shetterly as reporting that Christine Darden’s white male manager was “shocked at the disparity” between the promotion rates of men and women. We can speculate that Darden herself wasn’t shocked, just as Buolamwini and Gebru likely were not entirely shocked at the outcome of their study either. When sexism, racism, and other forms of oppression are publicly unmasked, it is almost never surprising to those who experience them.

For people in positions of power and privilege, issues of race and gender and class and ability—to name only a few—are OPP: other people’s problems. Author and antiracist educator Robin DiAngelo describes instances like the “shock” of Darden’s boss or the surprise in the media coverage of Buolamwini’s various projects as a symptom of the “racial innocence” of white people.40 In other words, those who occupy positions of privilege in society are able to remain innocent of that privilege. Race becomes something that only people of color have. Gender becomes something that only women and nonbinary people have. Sexual orientation becomes something that all people except heterosexual people have. And so on. A personal anecdote might help illustrate this point. When we published the first draft of this book online, Catherine told a colleague about it. His earnestly enthusiastic response was, “Oh great! I’ll show it to my female graduate students!” To which Catherine rejoined, “You might want to show it to your other students, too.”

If things were different—if the 79 percent of engineers at Google who are male were specifically trained in structural oppression before building their data systems (as social workers are before they undertake social work)—then their overrepresentation might be very slightly less of a problem.41 But in the meantime, the onus falls on the individuals who already feel the adverse effects of those systems of power to prove, over and over again, that racism and sexism exist—in datasets, in data systems, and in data science, as in everywhere else.

Buolamwini and Gebru identified how pale and male faces are overrepresented in facial detection training data. Could we fix this problem simply by diversifying the dataset? One solution would appear to be straightforward: create a more representative set of training and benchmarking data for facial detection models. In fact, tech companies are starting to do exactly this. In January 2019, IBM released a database of one million faces called Diversity in Faces (DiF).42 In another example, journalist Amy Hawkins details how CloudWalk, a startup in China in need of more images of faces of people of African descent, signed a deal with the Zimbabwean government for it to provide the images the company was lacking.43 In return for sharing its data, Zimbabwe will receive a national facial database and “smart” surveillance infrastructure that it can install in airports, railways, and bus stations.

It might sound like an even exchange, but Zimbabwe has a dismal record on human rights. Making things worse, CloudWalk provides facial recognition technologies to the Chinese police—a conflict of interest so great that the global nonprofit Human Rights Watch voiced its concern about the deal.44 Face harvesting is happening in the US as well. Researchers Os Keyes, Nikki Stevens and Jacqueline Wernimont have shown how immigrants, abused children, and dead people are some of the groups whose faces have been used to train software—without their consent.45 So is a diverse database of faces really a good idea? Voicing his concerns in response to the announcement of Buolamwini and Gebru’s 2018 study on Twitter, an Indigenous Marine veteran shot back, “I hope facial recognition software has a problem identifying my face too. That’d come in handy when the police come rolling around with their facial recognition truck at peaceful demonstrations of dissent, cataloging all dissenters for ‘safety and security.’”46

Better detection of faces of color cannot be characterized as an unqualified good. More often than not, it is enlisted in the service of increased oppression, greater surveillance, and targeted violence. Buolamwini understands these potential harms and has developed an approach that works across all four domains of the matrix of domination to address the underlying issues of power that are playing out in facial analysis technology. Buolamwini and Gebru first quantified the disparities in the dataset—a technical audit, which falls in the disciplinary domain of the matrix of domination. Then, Buolamwini went on to launch the Algorithmic Justice League, an organization that works to highlight and intervene in instances of algorithmic bias. On behalf of the AJL, Buolamwini has produced viral poetry projects and given TED talks—taking action in the hegemonic domain, the realm of culture and ideas. She has advised on legislation and professional standards for the field of computer vision and called for a moratorium on facial analysis in policing in the national media and before Congress.47 These are actions operating in the structural domain of the matrix of domination—the realm of law and policy. Throughout these efforts, the AJL works with students and researchers to help guide and shape their own work—the interpersonal domain. Taken together, Buolamwini’s various initiatives demonstrate how any “solution” to bias in algorithms and datasets must tackle more than technical limitations. In addition, they present a compelling model for the data scientist as public intellectual—one who, yes, works on technical audits and fixes, but also on cultural, legal, and political efforts.

While equitable representation—in datasets and in the data science workforce—is important, it remains window dressing if we don’t also transform the institutions that produce and reproduce biased outcomes in the first place. As doctoral student Arrianna Planey, quoting Robert M. Young, states, “A racist society will give you a racist science.”48 We cannot filter out the downstream effects of sexism and racism without also addressing their root causes.

Data Science for Whom?

One of the downstream effects of the privilege hazard—the risks incurred when people from dominant groups create most of our data products—is not only that datasets are biased or unrepresentative, but that some datasets never get collected at all. Mimi Onuoha—an artist, designer, and educator—has long been asking who questions about data science. Her project The Library of Missing Datasets (figure 1.4) is a list of datasets that one might expect to already exist in the world, because they would help address pressing social issues, but that in reality have never been created. The project exists as a website and as an art object. The latter consists of a file cabinet filled with folders labeled with phrases like “People excluded from public housing because of criminal records,” “Mobility for older adults with physical disabilities or cognitive impairments,” and “Total number of local and state police departments using stingray phone trackers (IMSI-catchers).” Visitors can tab through the folders and remove any particular folder of interest, only to reveal that it is empty. They all are. The datasets that should be there are “missing.”

Photograph of a Black woman's hands sifting through a white file cabinet of empty folders from The Library of Missing Datasets. Each folder is labeled with a dataset for which data doesn’t currently exist.

Figure 1.4: The Library of Missing Datasets, by Mimi Onuoha (2016) is a list of datasets that are not collected because of bias, lack of social and political will, and structural disregard. Courtesy of Mimi Onuoha. Photo by Brandon Schulman.

By compiling a list of the datasets that are missing from our “otherwise data-saturated” world, Onuoha explains, “we find cultural and colloquial hints of what is deemed important” and what is not. “Spots that we’ve left blank reveal our hidden social biases and indifferences,” she continues. And by calling attention to these datasets as “missing,” she also calls attention to how the matrix of domination encodes these “social biases and indifferences” across all levels of society.49 Along similar lines, foundations like Data2X and books like Invisible Women have advanced the idea of a systematic “gender data gap,” which results from the fact that the majority of research data in scientific studies comes from men’s bodies. The downstream effects of the gender data gap range from annoying—cell phones slightly too large for women’s hands, for example—to fatal. Until recently, crash test dummies were designed in the size and shape of men, an oversight that meant that women had a 47 percent higher chance of injury in a car crash than men.50

The who question in this case is: Who benefits from data science and who is overlooked? Examining those gaps can sometimes mean calling out missing datasets, as Onuoha does; characterizing them, as Invisible Women does; and advocating for filling them, as Data2X does. At other times, it can mean collecting the missing data yourself. Lacking comprehensive data about women who die in childbirth, for example, ProPublica turned to crowdsourcing to learn the names of the estimated seven hundred to nine hundred US women who died in 2016.51 As of 2019, it had identified only 140. Or, for another example: in 1998, youth living in Roxbury—a neighborhood known as “the heart of Black culture in Boston”52—were sick and tired of inhaling polluted air. They led a march demanding clean air and better data collection, which led to the creation of the AirBeat community monitoring project.53

Scholars have proposed various names for these instances of ground-up data collection, including counterdata or agonistic data collection, data activism, statactivism, and citizen science (when in the service of environmental justice).54 Whatever it’s called, it’s been going on for a long time. In 1895, civil rights activist and pioneering data journalist Ida B. Wells assembled a set of statistics on the epidemic of lynching that was sweeping the United States.55 She accompanied her data with a meticulous exposé of the fraudulent claims made by white people—typically, that a rape, theft, or assault of some kind had occurred (which it hadn’t in most cases) and that lynching was a justified response. Today, an organization named after Wells—the Ida B. Wells Society for Investigative Reporting—continues her mission by training up a new generation of journalists of color in the skills of data collection and analysis.56

A counterdata initiative in the spirit of Wells is taking place just south of the US border, in Mexico, where a single woman is compiling a comprehensive dataset on femicides—gender-related killings of women and girls.57 María Salguero, who also goes by the name Princesa, has logged more than five thousand cases of femicide since 2016.58 Her work provides the most accessible information on the subject for journalists, activists, and victims’ families seeking justice.

The issue of femicide in Mexico rose to global visibility in the mid-2000s with widespread media coverage about the deaths of poor and working-class women in Ciudad Juárez. A border town, Juárez is the site of more than three hundred maquiladoras: factories that employ women to assemble goods and electronics, often for low wages and in substandard working conditions. Between 1993 and 2005, nearly four hundred of these women were murdered, with around a third of those murders exhibiting signs of exceptional brutality or sexual violence. Convictions were made in only three of those deaths. In response, a number of activist groups like Ni Una Más (Not One More) and Nuestras Hijas de Regreso a Casa (Our Daughters Back Home) were formed, largely motivated by mothers demanding justice for their daughters, often at great personal risk to themselves.59

These groups succeeded in gaining the attention of the Mexican government, which established a Special Commission on Femicide. But despite the commission and the fourteen volumes of information about femicide that it produced, and despite a 2009 ruling against the Mexican state by the Inter-American Human Rights Court, and despite a United Nations Symposium on Femicide in 2012, and despite the fact that sixteen Latin American countries have now passed laws defining femicide—despite all of this, deaths in Juárez have continued to rise.60 In 2009 a report pointed out that one of the reasons that the issue had yet to be sufficiently addressed was the lack of data.61 Needless to say, the problem remains.

How might we explain the missing data around femicides in relation to the four domains of power that constitute Collins’s matrix of domination? As is true in so many cases of data collected (or not) about women and other minoritized groups, the collection environment is compromised by imbalances of power.

The most grave and urgent manifestation of the matrix of domination is within the interpersonal domain, in which cis and trans women become the victims of violence and murder at the hands of men. Although law and policy (the structural domain) have recognized the crime of femicide, no specific policies have been implemented to ensure adequate information collection, either by federal agencies or local authorities. Thus the disciplinary domain, in which law and policy are enacted, is characterized by a deferral of responsibility, a failure to investigate, and victim blaming. This persists in a somewhat recursive fashion because there are no consequences imposed within the structural domain. For example, the Special Commission’s definition of femicide as a “crime of the state” speaks volumes to how the government of Mexico is deeply complicit through inattention and indifference.62

Of course, this inaction would not have been tolerated without the assistance of the hegemonic domain—the realm of media and culture—which presents men as strong and women as subservient, men as public and women as private, trans people as deviating from “essential” norms, and nonbinary people as nonexistent altogether. Indeed, government agencies have used their public platforms to blame victims. Following the femicide of twenty-two-year-old Mexican student Lesvy Osorio in 2017, researcher Maria Rodriguez-Dominguez documented how the Public Prosecutor’s Office of Mexico City shared on social media that the victim was an alcoholic and drug user who had been living out of wedlock with her boyfriend.63 This led to justified public backlash, and to the hashtag #SiMeMatan (If they kill me), which prompted sarcastic tweets such as “#SiMeMatan it’s because I liked to go out at night and drink a lot of beer.”64

It is into this data collection environment, characterized by extremely asymmetrical power relations, that María Salguero has inserted her femicides map. Salguero manually plots a pin on the map for every femicide that she collects through media reports or through crowdsourced contributions (figure 1.5a). One of her goals is to “show that these victims [each] had a name and that they had a life,” and so Salguero logs as many details as she can about each death. These include name, age, relationship with the perpetrator, mode and place of death, and whether the victim was transgender, as well as the full content of the news report that served as the source. Figure 1.5b shows a detailed view for a single report from an unidentified transfemicide, including the date, time, location, and media article about the killing. It can take Salguero three to four hours a day to do this unpaid work. She takes occasional breaks to preserve her mental health, and she typically has a backlog of a month’s worth of femicides to add to the map.
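The structured fields Salguero logs for each case can be imagined as a simple data record. The sketch below is purely illustrative: the field names and example values are our own invention, not Salguero's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# A purely illustrative sketch of the kind of record described above;
# these field names are hypothetical, not Salguero's actual schema.
@dataclass
class FemicideRecord:
    name: str                        # or "Identidad Reservada" when withheld
    age: Optional[int]               # None when unknown
    relationship_to_perpetrator: Optional[str]
    mode_of_death: str
    place: str                       # also plotted as a pin on the map
    transgender: bool
    source_report: str               # full text of the news report

# Example modeled loosely on the anonymous transfemicide in figure 1.5b
case = FemicideRecord(
    name="Identidad Reservada",
    age=None,
    relationship_to_perpetrator=None,
    mode_of_death="unknown",
    place="Ciudad Juárez, Chihuahua",
    transgender=True,
    source_report="Juárez, Chih.- ...",
)
```

Even this toy structure makes visible why the work is slow: each field must be read out of an unstructured news report by hand, one case at a time.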

Although media reportage and crowdsourcing are imperfect ways of collecting data, this particular map, created and maintained by a single person, fills a vacuum created by her national government. The map has been used to help find missing women, and Salguero herself has testified before Mexico’s Congress about the scope of the problem. Salguero is not affiliated with an activist group, but she makes her data available to activist groups for their efforts. Parents of victims have called her to give their thanks for making their daughters visible, and Salguero affirms this function as well: “This map seeks to make visible the sites where they are killing us, to find patterns, to bolster arguments about the problem, to georeference aid, to promote prevention and try to avoid femicides.”

A map of Mexico with colored markers to represent locations where femicides have occurred. The color of the marker corresponds to the year in which the femicide occurred: red for 2016, purple for 2017, and light blue for 2018. There is an immense concentration of femicides near southern Mexico, and they become less concentrated further away.
A zoomed-in version of the femicide map over Ciudad Juárez, a Mexican city just south of El Paso. A purple marker (representing a femicide of a trans woman from 2017) is selected, and a description box to the right of the map contains information about the attack, including its date and time, its location, and a brief description. The description box reads the following:

Nombre (Incident Title)
#Transfeminicidio Identidad Reservada

Fecha (Date)

Lugar (Place)
Pedro Meneses Hoyos, Ciudad Juárez, Chihuahua, 32730 México

Hechos (Description)
Juárez, Chih.- An individual who apparently belonged to the LGBT community was found dead at night in a housing development in the southeast of the city, police agencies reported.
The body of the man, dressed as a woman and in an advanced state of decomposition, was found at the bottom of a stormwater retention well.
The deceased had a plastic bag over his head, although personnel from the state Attorney General’s Office said they could find no external signs of violence.
Family members arrived at the scene and identified him as Hilario Lopez Ruiz; no further information about him was provided.
The body was sent to the Forensic Medical Service, where the legally required autopsy will be performed to determine the actual cause of death.


Figure 1.5: María Salguero’s map of femicides in Mexico (2016–present). (a) Map extent showing the whole country. (b) A detailed view of Ciudad Juárez with a focus on a single report of an anonymous transfemicide. Salguero crowdsources points on the map based on reports in the press and reports from citizens to her. Courtesy of María Salguero. Credit: María Salguero.

It is important to make clear that the example of missing data about femicides in Mexico is not an isolated case, either in terms of subject matter or geographic location. The phenomenon of missing data is a regular and expected outcome in all societies characterized by unequal power relations, in which a gendered, racialized order is maintained through willful disregard, deferral of responsibility, and organized neglect for data and statistics about those minoritized bodies who do not hold power. So too are examples of individuals and communities using strategies like Salguero’s to fill in the gaps left by these missing datasets—in the United States as around the world.65 If “quantification is representation,” as data journalist Jonathan Stray asserts, then this offers one way to hold those in power accountable. Collecting counterdata demonstrates how data science can be enlisted on behalf of individuals and communities that need more power on their side.66

Data Science with Whose Interests and Goals?

Far too often, the problem is not that data about minoritized groups are missing but the reverse: the databases and data systems of powerful institutions are built on the excessive surveillance of minoritized groups. This results in women, people of color, and poor people, among others, being overrepresented in the data that these systems are premised upon. In Automating Inequality, for example, Virginia Eubanks tells the story of the Allegheny County Office of Children, Youth, and Families in western Pennsylvania, which employs an algorithmic model to predict the risk of child abuse in any particular home.67 The goal of the model is to remove children from potentially abusive households before abuse occurs; this would appear to be a very worthy goal. As Eubanks shows, however, inequities result. For wealthier parents, who can more easily access private health care and mental health services, there is simply not that much data to pull into the model. For poor parents, who more often rely on public resources, the system scoops up records from child welfare services, drug and alcohol treatment programs, mental health services, Medicaid histories, and more. Because there are far more data about poor parents, they are oversampled in the model, and so their children are overtargeted as being at risk for child abuse—a risk that results in children being removed from their families and homes. Eubanks argues that the model “confuse[s] parenting while poor with poor parenting.”

This model, like many, was designed under two flawed assumptions: (1) that more data is always better and (2) that the data are a neutral input. In practice, the reality is quite different. The higher the proportion of poor parents in the database, and the more complete their data profiles, the more likely the model is to find fault with poor parents. And data are never neutral; they are always the biased output of unequal social, historical, and economic conditions: this is the matrix of domination once again.68 Governments can and do use biased data to marshal the power of the matrix of domination in ways that amplify its effects on the least powerful in society. In this case, the model becomes a way to administer and manage classism in the disciplinary domain—with the consequence that poor parents’ attempts to access resources and improve their lives, when compiled as data, become the very data that remove their children from their care.
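The dynamic Eubanks describes can be made concrete with a toy calculation. This is not the Allegheny model; it is a deliberately naive sketch, with invented numbers, showing how a score that simply counts visible records penalizes families whose lives are more fully captured in public databases.

```python
# Toy illustration (not the Allegheny model): a naive "risk" score that
# counts every record the system can see across public agencies.
def naive_risk_score(records_by_agency: dict) -> int:
    return sum(records_by_agency.values())

# Two families with identical circumstances; only data visibility differs.
# The poor family's use of public services generates records everywhere;
# the wealthy family's private providers report nothing to the county.
poor_family = {"child_welfare": 2, "public_mental_health": 3, "medicaid": 5}
wealthy_family = {"child_welfare": 2}

print(naive_risk_score(poor_family))     # 10
print(naive_risk_score(wealthy_family))  # 2
```

The same underlying behavior yields a fivefold difference in score, driven entirely by which records exist: oversampling, not parenting, is what the model measures.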

So this raises our next who question: Whose goals are prioritized in data science (and whose are not)? In this case, the state of Pennsylvania prioritized its bureaucratic goal of efficiency, which is an oft-cited reason for coming up with a technical solution to a social and political dilemma. Viewed from the perspective of the state, there were simply not enough employees to handle all of the potential child abuse cases, so it needed a mechanism for efficiently deploying limited staff—or so the reasoning goes. This is what Eubanks has described as a scarcity bias: the idea that there are not enough resources for everyone so we should think small and allow technology to fill the gaps. Such thinking, and the technological “solutions” that result, often meet the goals of their creators—in this case, the Allegheny County Office of Children, Youth, and Families—but not the goals of the children and families that it purports to serve.

Corporations also place their own goals ahead of those of the people their products purport to serve, supported by their outsize wealth and the power that comes with it. For example, in 2012, the New York Times published an explosive article by Charles Duhigg, “How Companies Learn Your Secrets,”69 which soon became the stuff of legend in data and privacy circles. Duhigg describes how Andrew Pole, a data scientist working at Target, was approached by men from the marketing department who asked, “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?”70 He proceeded to synthesize customers’ purchasing histories with the timeline of those purchases to give each customer a so-called pregnancy prediction score (figure 1.6).71 Evidently, pregnancy is the second major life event, after leaving for college, that determines whether a casual shopper will become a customer for life.

Target turned around and put Pole’s pregnancy detection model into action in an automated system that sent discount coupons to possibly pregnant customers. Win-win—or so the company thought, until a Minneapolis teenager’s dad saw the coupons for baby clothes that she was getting in the mail and marched into his local Target to read the manager the riot act. Why was his daughter getting coupons for pregnant women when she was only a teen?!

It turned out that the young woman was indeed pregnant. Pole’s model informed Target before the teenager informed her family. By analyzing the purchase dates of approximately twenty-five common products, such as unscented lotion and large bags of cotton balls, the model found a set of purchase patterns that were highly correlated with pregnancy status and expected due date. But the win-win quickly became a lose-lose, as Target lost the trust of its customers in a PR disaster and the Minneapolis teenager lost far worse: her control over information related to her own body and her health.
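A minimal sketch of the kind of scoring Duhigg describes might weight purchases of products whose timing correlates with pregnancy. The product names below come from the story; the weights are invented for illustration and bear no relation to Pole's actual model.

```python
# Hypothetical weights for a handful of the roughly twenty-five products
# the model used; all numbers here are invented for illustration.
PREGNANCY_WEIGHTS = {
    "unscented lotion": 0.4,
    "cotton balls (large bag)": 0.3,
    "calcium supplements": 0.2,
}

def pregnancy_prediction_score(purchase_history):
    """Sum the weights of any scoring products in a purchase history."""
    return sum(PREGNANCY_WEIGHTS.get(item, 0.0) for item in purchase_history)

score = pregnancy_prediction_score(
    ["unscented lotion", "cotton balls (large bag)", "bread"]
)
print(round(score, 1))  # 0.7
```

Even this crude version shows why the scheme works at scale: no single purchase is revealing, but a basket of them, timed together, is.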

A screenshot from statistician Andrew Pole's presentation at Predictive Analytics World about Target's pregnancy detection model in October 2010. 
The PowerPoint slide reads the following: 
Acquire and convert prenatal mothers before they have their baby 

Develop a model to predict if a woman is likely to be pregnant with child

Data for analysis
Date of purchase and sales of key baby items in store or online, baby registrant, browse for baby products online, guest age, and children

Identified 30% more guests to contact with profitable acquisition mailer

Figure 1.6: Screenshot from a video of statistician Andrew Pole’s presentation at Predictive Analytics World about Target’s pregnancy detection model in October 2010, titled “How Target Gets the Most out of Its Guest Data to Improve Marketing ROI.” He discusses the model at 47:50. Image by Andrew Pole for Predictive Analytics World. Source: Andrew Pole, “How Target Gets the Most out of Its Guest Data to Improve Marketing ROI,” filmed October 2010 at Predictive Analytics World, video, 47:50.

This story has been told many times: first by Pole, the statistician; then by Duhigg, the New York Times journalist; then by many other commentators on personal privacy and corporate overreach. But it is not only a story about privacy: it is also a story about gender injustice—about how corporations approach data relating to women’s bodies and lives, and about how corporations approach data relating to minoritized populations more generally. Whose goals are prioritized in this case? The corporation’s, of course. For Target, the primary motivation was maximizing profit, and quarterly financial reports to the board are the measurement of success. Whose goals are not prioritized? The teenager’s and those of every other pregnant woman out there.

How did we get to the point where data science is used almost exclusively in the service of profit (for a few), surveillance (of the minoritized), and efficiency (amidst scarcity)? It’s worth stepping back to make an observation about the organization of the data economy: data are expensive and resource-intensive, so only already powerful institutions—corporations, governments, and elite research universities—have the means to work with them at scale. These resource requirements result in data science that serves the primary goals of the institutions themselves. We can think of these goals as the three Ss: science (universities), surveillance (governments), and selling (corporations). This is not a normative judgment (e.g., “all science is bad”) but rather an observation about the organization of resources. If science, surveillance, and selling are the main goals that data are serving, because that’s who has the money, then what other goals and purposes are going underserved?

Let’s take “the cloud” as an example. As server farms have taken the place of paper archives, storing data has come to require large physical spaces. A project by the Center for Land Use Interpretation (CLUI) makes this last point plain (figure 1.7). In 2014, CLUI set out to map and photograph data centers around the United States, often in those seemingly empty in-between areas we now call exurbs. In so doing, it called attention to “a new kind of physical information architecture” sprawling across the United States: “windowless boxes, often with distinct design features such as an appliqué of surface graphics or a functional brutalism, surrounded by cooling systems.” The environmental impacts of the cloud—in the form of electricity and air conditioning—are enormous. A 2017 Greenpeace report estimated that the global IT sector, which is largely US-based, accounted for around 7 percent of the world’s energy use—more than the energy use of some of the largest countries in the world, including Russia, Brazil, and Japan.72 Unless that energy comes from renewable sources (which the Greenpeace report shows that it does not), the cloud has a significant accelerating impact on global climate change.

So the cloud is not light and it is not airy. And the cloud is not cheap. The cost of constructing Facebook’s newest data center in Los Lunas, New Mexico, is expected to reach $1 billion.73 The electrical cost of that center alone is estimated at $31 million per year.74 These numbers return us to the question about financial resources: Who has the money to invest in centers like these? Only powerful corporations like Facebook and Target, along with wealthy governments and elite universities, have the resources to collect, store, maintain, analyze, and mobilize the largest amounts of data. Next, who is in charge of these well-resourced institutions? Disproportionately men, even more disproportionately white men, and even more than that, disproportionately rich white men. Want the data on that? Google’s board of directors is 82 percent white men. Facebook’s board is 78 percent male and 89 percent white. The 2018 US Congress was 79 percent male—actually a better percentage than in previous years—with a median net worth five times that of the average American household.75 These are the people who experience the most privilege within the matrix of domination, and they are also the people who benefit the most from the current status quo.76

Photograph of the side-view of a data center in North Bergen, NJ under cloudy skies and in front of a row of bushes. The data center is a 3-story white building with orange stripes and blue tinted windows.

Photograph of a data center in Dalles, OR during a bright, sunny day with a few clouds. The data center is in a fairly rural area, with an abandoned construction site to the left of it, large green hilly mountains behind it, and telephone lines running along the side of the building.
Photograph of a data center in Ashburn, VA during a sunny day with clear skies and in front of a field of grass. The data center is split into several buildings, all with a light yellow color.
Photograph of a data center in Lockport, NY on a bright, cloudy day, in front of an empty road. The data center is a 4-story white building with cyan tinted windows and has a gated fence surrounding the back of the building.

Figure 1.7: Photographs from Networked Nation: The Landscape of the Internet in America, an exhibition by the Center for Land Use Interpretation staged in 2013. The photos show four data centers located in North Bergen, NJ; Dalles, OR; Ashburn, VA; and Lockport, NY (counterclockwise from top right). They show how the “cloud” is housed in remote locations and office parks around the country. Images by the Center for Land Use Interpretation. Source: Networked Nation: The Landscape of the Internet in America, exhibit, 2013, Center for Land Use Interpretation.

In the past decade or so, many of these men at the top have described data as “the new oil.”77 It’s a metaphor that resonates uncannily well—even more than they likely intended. The idea of data as some sort of untapped natural resource clearly points to the potential of data for power and profit once they are processed and refined, but it also helps highlight the exploitative dimensions of extracting data from their source—people—as well as their ecological cost. Just as the original oil barons were able to use their riches to wield outsized power in the world (think of John D. Rockefeller, J. Paul Getty, or, more recently, the Koch brothers), so too do the Targets of the world use their corporate gain to consolidate control over their customers. But unlike crude oil, which is extracted from the earth and then sold to people, data are both extracted from people and sold back to them—in the form of coupons like the one the Minneapolis teen received in the mail, or far worse.78

This extractive system creates a profound asymmetry between who is collecting, storing, and analyzing data, and whose data are collected, stored, and analyzed.79 The goals that drive this process are those of the corporations, governments, and well-resourced universities that are dominated by elite white men. And those goals are neither neutral nor democratic—in the sense of having undergone any kind of participatory, public process. On the contrary, focusing on those three Ss—science, surveillance, and selling—to the exclusion of other possible objectives results in significant oversights with life-altering consequences. Consider the Target example as the flip side of the missing data on maternal health outcomes. Put crudely, there is no profit to be made collecting data on the women who are dying in childbirth, but there is significant profit in knowing whether women are pregnant.

How might we prioritize different goals and different people in data science? How might data scientists undertake a feminist analysis of power in order to tackle bias at its source? Kimberly Seals Allers, a birth justice advocate and author, is on a mission to do exactly that in relation to maternal and infant care in the United States. She followed the Serena Williams story with great interest and watched as Congress passed the Preventing Maternal Deaths Act of 2018. This bill funded the creation of maternal health review committees in every state and, for the first time, uniform and comprehensive data collection at the federal level. But even as more data have begun to be collected about maternal mortality, Seals Allers has remained frustrated by the public conversation: “The statistics that are rightfully creating awareness around the Black maternal mortality crisis are also contributing to this gloom and doom deficit narrative. White people are like, ‘how can we save Black women?’ And that’s not the solution that we need the data to produce.”80

A flow chart (as a series of screenshots) which shows the sign-up process and the app platform for Irth, a mobile app which helps brown and Black mothers find prenatal, birthing, postpartum and pediatric reviews of care. The screenshots of the sign-up process show the app asking users to input key identifying details such as race, ethnicity, self-identity, relationship status, etc. There is also an optional page where users can include information such as their religion and education level. The screenshots of the app’s platform showcase the main features of the app, including viewing reviews for specific doctors and nurses, as well as writing reviews based on the user’s personal experiences.

Figure 1.8: Irth is a mobile app and web platform focused on removing bias from birth (including prenatal, birth, and postpartum health care). Users post intersectional reviews of the care they received from individual nurses and doctors, as well as whole practices and hospitals. When parents-to-be are searching for providers, they can consult Irth to see what kind of care people like them received in the hands of specific caregivers. Wireframes from Irth’s first prototype are shown here. Images by Kimberly Seals Allers and the Irth team, 2019.

Seals Allers—and her fifteen-year-old son, Michael—are working on their own data-driven contribution to the maternal and infant health conversation: a platform and app called Irth—from birth, but with the b for bias removed (figure 1.8). One of the major contributing factors to poor birth outcomes, as well as maternal and infant mortality, is biased care. Hospitals, clinics, and caregivers routinely disregard Black women’s expressions of pain and wishes for treatment.81 As we saw, Serena Williams’s own story almost ended in this way, despite the fact that she is an international tennis star. To combat this, Irth operates like an intersectional Yelp for birth experiences. Users post ratings and reviews of their prenatal, postpartum, and birth experiences at specific hospitals and in the hands of specific caregivers. Their reviews include important details like their race, religion, sexuality, and gender identity, as well as whether they felt that those identities were respected in the care that they received. The app also has a taxonomy of bias and asks users to tick boxes to indicate whether and how they may have experienced different types of bias. Irth allows parents who are seeking care to search for a review from someone like them—from a racial, ethnic, socioeconomic, and/or gender perspective—to see how they experienced a certain doctor or hospital.

Seals Allers’s vision is that Irth will be both a public information platform, for individuals to find better care, and an accountability tool, to hold hospitals and providers responsible for systemic bias. Ultimately, she would like to present aggregated stories and data analyses from the platform to hospital networks to push for change grounded in women’s and parents’ lived experiences. “We keep telling the story of maternal mortality from the grave,” she says. “We have to start preventing those deaths by sharing the stories of people who actually lived.”82

Irth illustrates the fact that “doing good with data” requires being deeply attuned to the things that fall outside the dataset—and in particular to how datasets, and the data science they enable, too often reflect the structures of power of the world they draw from. In a world defined by unequal power relations, which shape both social norms and laws about how data are used and how data science is applied, it remains imperative to consider who gets to do the “good” and who, conversely, gets someone else’s “good” done to them.

Examine Power

Data feminism begins by examining how power operates in the world today. This consists of asking who questions about data science: Who does the work (and who is pushed out)? Who benefits (and who is neglected or harmed)? Whose priorities get turned into products (and whose are overlooked)? These questions are relevant at the level of individuals and organizations, and are absolutely essential at the level of society. The current answer to most of these questions is “people from dominant groups,” which has resulted in a privilege hazard so acute that it explains the near-daily revelations about another sexist or racist data product or algorithm. The matrix of domination helps us to understand how the privilege hazard—the result of unequal distributions of power—plays out in different domains. Ultimately, the goal of examining power is not only to understand it, but also to be able to challenge and change it. In the next chapter, we explore several approaches for challenging power with data science.


  1. Serena Williams, “Meet Alexis Olympia Ohanian Jr. You have to check out link in bio for her amazing journey. Also check out my IG stories 😍😍❤❤,” September 13, 2017,

  2. See Serena Williams, Facebook, January 15, 2018,

  3. Nina Martin and Renee Montagne, “Nothing Protects Black Women from Dying in Pregnancy and Childbirth,” ProPublica, December 7, 2017,

  4. See New York City Department of Health and Mental Hygiene, Severe Maternal Morbidity in New York City, 2008–2012 (New York, 2016),

  5. SisterSong, National Latina Institute for Reproductive Health, and Center for Reproductive Rights, Reproductive Injustice: Racial and Gender Discrimination in U.S. Health Care (New York: Center for Reproductive Rights, 2014),

  6. USA Today’s ongoing reporting on maternal mortality can be found at

  7. Robin Fields and Joe Sexton, “How Many American Women Die from Causes Related to Pregnancy or Childbirth? No One Knows,” ProPublica, October 23, 2017,

  8. Studies that try to infer maternal mortality through other indicators, like hospital records, have consistently shown that the United States is one of the few countries in the world where maternal morbidity is increasing for all races, and increasing even more steeply for Black and brown women. For example, between 2000 and 2014, the CDC reported a 26.6 percent increase in the maternal mortality ratio in the United States. A 2018 report, Trends and Disparities in Delivery Hospitalizations Involving Severe Maternal Morbidity, 2006–2015, showed that life-threatening complications increased for all races and ethnicities during that time. In 2015, dying in the hospital was three times more likely for Black mothers than for white mothers. There was no change in the disparity between white mothers and Black mothers during the period that the data covered.

  9. According to the 2018 Newsroom Diversity Survey led by researcher Meredith Clark for the American Society of News Editors, ProPublica’s leadership is 89 percent white, with no Black people in leadership positions, and USA Today’s leadership is 85 percent white. See

  10. As we wrote this chapter, people were tweeting #believeblackwomen to grieve the death of a young Black woman named Lashonda Hazard who was pregnant and experiencing severe pain. She died at Women and Infants Hospital in Rhode Island after posting on Facebook that medical staff weren’t listening to her. In response, a community organization named Sista Fire RI wrote an open letter to the hospital calling for an end to what they characterized as a pattern of racialized gender violence: “In a state that does not put Black women or women of color first, we believe and trust Black women.” See, accessed May 11, 2019.

  11. Lindsay Schallon, “Serena Williams on the Pressure of Motherhood: ‘I’m Not Always Going to Win,’” Glamour, April 27, 2018,

  12. Schallon, “Serena Williams on the Pressure of Motherhood.”

  13. Patricia Hill Collins, Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment (New York: Routledge, 2008), 21.

  14. Chart based on concepts introduced by Patricia Hill Collins in Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment.

  15. Even then, Native Americans of all genders were still legally excluded from voting—at least for another few years—since they had yet to be granted US citizenship. The Fourteenth Amendment explicitly excluded Native Americans from US citizenship—another instance of oppression being codified in the structural domain of the matrix of domination. In 1924, the passage of the Indian Citizenship Act granted joint US citizenship to all Native Americans, clearing the path for enfranchisement. But it would take until 1962 for the last US state (New Mexico) to change its laws so that all Native Americans could vote. Even then, obstacles abounded; the 1965 Voting Rights Act offered additional legal language to contest disenfranchisement, but that act is in the process of being dismantled by the Supreme Court (as of 2013, with Shelby County v. Holder), which threatens many of its protections. On the subject of voting rights in the United States, it’s also worth pointing out that Puerto Rico did not have universal suffrage until 1935, and like other US territories, still does not have voting power in the US Congress or representation in the electoral college.

  16. Other disenfranchisement methods devised over the years have included undue wait times for registering to vote, having to pay a tax to vote, or having to take a test about the Constitution. Well through the passage of the Voting Rights Act of 1965, Black and brown people seeking to vote faced threats of bodily harm. Note that the history of voter suppression perpetrated by white people on people of color is not over. One need only consider the 2018 gubernatorial election in Georgia, in which Brian Kemp, secretary of state and a white man, presided over his own gubernatorial race against Stacey Abrams, a Black woman. In his capacity as secretary of state, his actions included purging voter rolls and putting fifty-three thousand voter registrations on hold, 70 percent of which were for voters of color. Long lines and technical problems plagued the election day efforts, and the NAACP and ACLU sued the state of Georgia for voting irregularities. In short, voter suppression—enacted in the disciplinary domain of the matrix of domination—is alive and well. See German Lopez, “Voter Suppression Really May Have Made the Difference for Republicans in Georgia,” Vox, November 7, 2018,

  17. Note that the disciplinary domain does not just have to do with government power and policy, but also with corporate, private, and institutional policies. A particular company prohibiting its workers from leaving early to vote or penalizing those who distribute information about voting on the factory floor is an example of the disciplinary domain.

  18. Eleanor Barkhorn, “‘Vote No on Women’s Suffrage’: Bizarre Reasons for Not Letting Women Vote,” Atlantic, November 6, 2012,

  19. This is a point that Collins underscores: “Oppression is not simply understood in the mind—it is felt in the body in myriad ways,” she writes (Black Feminist Thought, 293).

  20. For further explanation of why minoritized makes more sense to use than minority, see I. E. Smith, “Minority vs. Minoritized: Why the Noun Just Doesn’t Cut It,” Odyssey, September 2, 2016; and Yasmin Gunaratnam, Researching Race and Ethnicity: Methods, Knowledge and Power (London: Sage, 2003).

  21. This role often entails what Sara Ahmed has described as being a “feminist killjoy.” As she writes in the first post on her blog, you might be a feminist killjoy if you “have ruined the atmosphere by turning up or speaking up” or “have a body that reminds people of histories they find disturbing” or “are angry because that’s a sensible response to what is wrong.” The feminist killjoy exposes racism and sexism, but “for those who do not have a sense of the racism or sexism you are talking about, to bring them up is to bring them into existence.” In the process of exposing the problem, the feminist killjoy herself becomes a problem. She is “causing trouble” or getting in the way of the happiness of others by bringing up the issue. For example, a personal killjoy moment from the book-writing process happened when Catherine shared the topic of the book with a former professor, who responded that she should stay focused on data literacy and not become one of those “grumpy feminists” who were uncomfortable with their sexuality and sought to make problems for people. For the record, Catherine is not grumpy, feels confident in her sexuality, and is working on the killjoy skills of making more feminist problems for people. Read more about how to navigate being or becoming a feminist killjoy on Ahmed’s blog, or see Sara Ahmed, Living a Feminist Life (Durham, NC: Duke University Press, 2017).

  22. Feminist methods involve continually asking who questions, as AI researcher Michael Muller has observed: By whom, for whom, who benefits, who is harmed, who speaks, who is silenced. Muller articulated what some of the who questions are for human-computer interaction in his essay “Feminism Asks the ‘Who’ Questions in HCI,” Interacting with Computers 23, no. 5 (2011): 447–449, and in this book we articulate what some of the who questions are for data science.

  23. “Bureau of Labor Statistics Data Viewer,” US Bureau of Labor Statistics, 2019, accessed April 10, 2019,

  24. “Data Brief: Women and Girls of Color in Computing,” Women of Color in Computing Collaborative, 2018, accessed April 10, 2019,

  25. Sarah Myers West, Meredith Whittaker, and Kate Crawford, “Discriminating Systems: Gender, Race and Power in AI,” AI Now Institute, 2019,

  26. Christianne Corbett and Catherine Hill, Solving the Equation: The Variables for Women’s Success in Engineering and Computing, American Association of University Women (Washington, DC: 2015). For comparison, the share of women among computer science graduates today (26 percent) is the same as it was in 1974, and in subfields like machine learning the proportion of women is far lower. As per the points made in this chapter, even knowing the exact extent of the disparity is challenging. According to a 2014 Mother Jones report about diversity in Silicon Valley, tech firms convinced the US Labor Department to treat their demographics as a trade secret and didn’t divulge any data until after they were sued by Mike Swift of the San Jose Mercury News. See Josh Harkinson, “Silicon Valley Firms Are Even Whiter and More Male Than You Thought,” Mother Jones, May 29, 2014. There are analyses that have obtained the data in other ways. For example, a gender analysis by data scientists at LinkedIn has shown that tech teams at tech companies have far less gender parity than tech teams in other industries, including healthcare, education, and government. See Sohan Murthy, “Measuring Gender Diversity with Data from LinkedIn,” LinkedIn (blog), June 17, 2015.

  27. See Nadya A. Fouad, “Leaning in, but Getting Pushed Back (and Out),” presentation at the American Psychological Association, August 2014,

  28. In the case of a different resume-screening tool (not the one developed by Amazon), it was found that the most predictive factors of job performance success were whether someone was named “Jared” and whether they had played lacrosse. We might laugh at the absurdity of such random and specific details, but note how much they tell us about the group characteristics of who is getting hired: Jared is a mostly men’s name and a mostly white name, and lacrosse—in spite of its Native American origins—is an expensive and predominantly elite, white sport. On biased job algorithms, see Dave Gershgorn, “Companies Are on the Hook if Their Hiring Algorithms Are Biased,” Quartz, October 22, 2018; and Rachel Kraus, “Amazon’s Sexist AI Has a Deeper Problem than Code,” Mashable, October 10, 2018. On the origins of lacrosse, see Anthony Aveni, “The Indian Origins of Lacrosse,” Colonial Williamsburg Journal (Winter 2010).

  29. Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York: NYU Press, 2018), 80–81.

  30. Feminist legal scholar Martha R. Mahoney summarizes the effect of this privilege hazard with respect to race: “A crucial part of the privilege of a dominant group is the ability to see itself as normal and neutral. This quality of being ‘normal’ makes whiteness and the racial specificity of our own lives invisible as air to white people, while it is visible or offensively obvious to people defined outside the circle of whiteness.” The passage appears in “Whiteness and Women, In Practice and Theory: A Response to Catharine MacKinnon,” Yale Journal of Law & Feminism 5, no. 2 (1993): 217–251.

  31. From Anita Gurumurthy’s keynote address at Data Justice 2018, Cardiff University.

  32. Kate Crawford, “Artificial Intelligence’s White Guy Problem,” New York Times, June 26, 2016,

  33. Cis het is shorthand for cisgender heterosexual. These are two dominant group identities: a person is cisgender when their gender identity matches the sex that they were assigned at birth, and a person is heterosexual when they are sexually attracted to people of the opposite sex.

  34. Facial analysis software is used to detect faces in a larger image, such as when some digital cameras create outlines around any faces that they detect in a frame. Facial recognition is when the software both detects a face and then matches that face against a database to cross-reference the face with personal information, such as name, demographics, criminal history, and so on.

  35. Blackface refers to the racist practice of predominantly non-Black performers painting their faces to signal their caricatured representation of Black people. The tradition has a long history, and has directly contributed to the spread of racist stereotypes about Black people. On its history, see Eric Lott, Love and Theft: Blackface Minstrelsy and the American Working Class (New York: Oxford University Press, 1993). On some contemporary manifestations, see Lauren Michele Jackson, “We Need to Talk about Digital Blackface in Reaction GIFs,” Teen Vogue, August 2, 2017,

  36. See Joy Buolamwini and Timnit Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” Proceedings of Machine Learning Research 81 (2018): 1–15,

  37. Training automated systems involves using training data to teach the model how to classify things. For example, in the case of Buolamwini’s work, the training data would consist of images with and without faces, and the model would be trained to detect whether or not there is a face in each image and, if so, to identify the specific location of the face. Once the model is trained, it is evaluated using another dataset—called a test dataset—to determine whether the model works only on the training data or whether it is likely to perform well with new data. Finally, once the model has been tested, it is evaluated again with what’s called a benchmarking dataset. Benchmarking data consist of an agreed-upon standard dataset that makes it possible to compare different models—so a researcher could say something like, “The facial detection model from X university performed at 90 percent accuracy, whereas the model from Y corporation performed at 87 percent accuracy.”
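  For readers who want to see the distinction between the three kinds of datasets made concrete, the workflow in this note can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the “model” is just a single learned brightness threshold, and all of the example data are invented, not drawn from any real facial detection system.

```python
# Toy sketch of the training / test / benchmark workflow described in note 37.
# All data and the trivial threshold "model" here are invented for illustration.

def train(data):
    # "Learn" a brightness threshold that separates face (1) from non-face (0)
    # examples, placed midway between the two classes in the training data.
    faces = [x for x, label in data if label == 1]
    others = [x for x, label in data if label == 0]
    return (min(faces) + max(others)) / 2

def accuracy(threshold, data):
    # Fraction of examples the threshold classifies correctly.
    correct = sum((x >= threshold) == (label == 1) for x, label in data)
    return correct / len(data)

training_data  = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]  # used to fit the model
test_data      = [(0.85, 1), (0.15, 0)]  # held out: does the model generalize?
benchmark_data = [(0.7, 1), (0.3, 0)]    # shared standard for comparing models

model = train(training_data)
print(accuracy(model, test_data), accuracy(model, benchmark_data))  # → 1.0 1.0
```

  The key point is that the test and benchmark sets play different roles: the test set checks that the model did not simply memorize its training data, while the benchmark set lets different teams report comparable numbers, which is exactly why a skewed benchmark (as in the Gender Shades audit) can hide disparate performance.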

  38. For a full list of media outlets that have written about Buolamwini’s work, see

  39. In Artificial Unintelligence (Cambridge, MA: MIT Press, 2018), data journalist and professor Meredith Broussard outlines the concept of technochauvinism: the belief that the technological solution to a problem is the right one. She argues that artificial intelligence is often not the most efficient, nor most effective, nor even a remotely adequate solution to a given problem at hand.

  40. In her book White Fragility, DiAngelo goes further, demonstrating how racial innocence can be viewed as a deliberate social strategy for maintaining power and dominance in society. In Racial Innocence, literary scholar Robin Bernstein explores its historical roots. See White Fragility: Why It’s So Hard for White People to Talk about Racism (London: Penguin Books, 2019); and Racial Innocence: Performing American Childhood from Slavery to Civil Rights (New York: NYU Press, 2011).

  41. Danielle Brown, “Google Diversity Annual Report 2018,” Google, 2018, accessed April 10, 2019,; and Catherine D’Ignazio, “How Might Ethical Data Principles Borrow from Social Work?,” Medium, September 2, 2018,

  42. The paper about DiF states, “For face recognition to perform as desired—to be both accurate and fair—training data must provide sufficient balance and coverage.” See Michele Merler, Nalini Ratha, Rogerio Feris, and John R. Smith, “Diversity in Faces,” IBM Research, 2019,

  43. Amy Hawkins, “Beijing’s Big Brother Tech Needs African Faces,” Foreign Policy, July 24, 2018,

  44. Hawkins, “Beijing’s Big Brother Tech.”

  45. See Os Keyes, Nikki Stevens, and Jacqueline Wernimont, “The Government Is Using the Most Vulnerable People to Test Facial Recognition Software,” Slate, March 17, 2019,

  46. @ShovelRemi, “I hope facial recognition software has a problem identifying my face too. That’d come in handy when the police come rolling around with their facial recognition truck at peaceful demonstrations of dissent, cataloging all dissenter for ‘safety and security,’” Twitter, February 12, 2018, 7:58 p.m.,

  47. “Research shows facial analysis technology is susceptible to bias and even if accurate can be used in ways that breach civil liberties. Without bans on harmful use cases, regulation, and public oversight, this technology can be readily weaponized, employed in secret government surveillance, and abused in law enforcement,” Buolamwini warns. In early 2019, the AJL collaborated with the Center on Privacy & Technology at Georgetown Law to launch the Safe Face Pledge, a set of four ethical commitments that businesses and governments make when using facial analysis technology. As of this writing, many AI companies and prominent researchers have signed the pledge. Notably, Amazon, which sells its Rekognition technology to police departments around the country, has not signed and has actively attacked Buolamwini’s research. In response, top AI researchers have come to her defense and have called on Amazon to stop selling Rekognition to police departments. See Matt O’Brien, “Face Recognition Researcher Fights Amazon over Biased AI,” Associated Press, April 3, 2019. For other references, see Joy Buolamwini, “AI Ain’t I a Woman?,” YouTube video, 3:32, June 2018; Joy Buolamwini, “How I’m Fighting Bias in Algorithms,” filmed November 2016 in Boston, TED video, 8:34; Federal Trade Commission, “Hearings on Competition and Consumer Protection in the 21st Century,” event agenda, November 13–14, 2018; and Soledad O’Brien and Joy Buolamwini, “Artificial Intelligence Is Biased: She’s Working to Fix It,” Matter of Fact, September 8, 2018,

  48. Arrianna Planey, “Devalued Lives, & Premature Death: Intervening at the Axes of Social ‘Difference,’” Arrianna Planey’s Blog, March 29, 2019,

  49. Mimi Onuoha, “On Missing Data Sets,” GitHub, January 25, 2018,

  50. Mayra Buvinic, Rebecca Furst-Nichols, and Gayatri Koolwal, Mapping Gender Data Gaps (New York: Data2X, 2014); and Caroline Criado Perez, Invisible Women: Exposing Data Bias in a World Designed for Men (New York: Random House, 2019).

  51. See Adriana Gallardo, “How We Collected Nearly 5,000 Stories of Maternal Harm,” ProPublica, March 20, 2018,

  52. See

  53. Penn Loh, Jodi Sugerman-Brozan, Standrick Wiggins, David Noiles, and Cecelia Archibald, “From Asthma to AirBeat: Community-Driven Monitoring of Fine Particles and Black Carbon in Roxbury, Massachusetts,” Environmental Health Perspectives 110 (April 2002): 297–301.

  54. On counter-data, see Morgan Currie, Britt S. Paris, Irene Pasquetto, and Jennifer Pierre, “The Conundrum of Police Officer-Involved Homicides: Counter-Data in Los Angeles County,” Big Data & Society 3, no. 2 (2016): 1–14. On data activism, see Stefania Milan and Lonneke Van Der Velden, “The Alternative Epistemologies of Data Activism,” Digital Culture & Society 2, no. 2 (2016): 57–74. On statactivism, see the introduction to the special issue of Partecipazione e conflitto: The Open Journal of Sociopolitical Studies on the topic, edited by Isabelle Bruno, Emmanuel Didier, and Tommaso Vitale: “Statactivism: Forms of Action between Disclosure and Affirmation,” Partecipazione e conflitto: The Open Journal of Sociopolitical Studies 7, no. 2 (2014): 198–220. There is a large body of literature on citizen science; a good starting point is Sara Ann Wylie, Kirk Jalbert, Shannon Dosemagen, and Matt Ratto, “Institutions for Civic Technoscience: How Critical Making Is Transforming Environmental Research,” Information Society 30, no. 2 (2014): 116–126.

  55. See Ida B. Wells, “A Red Record: Tabulated Statistics and Alleged Causes of Lynchings in the United States, 1892-1893-1894: Respectfully Submitted to the Nineteenth Century Civilization in ‘the Land of the Free and the Home of the Brave,’” New York Public Library Digital Collections, accessed July 24, 2019,

  56. See the About and Data Institute pages of the Ida B. Wells Society website. Since 2016, the Ida B. Wells Society has partnered with ProPublica to offer a two-week data science institute for both journalism students and working reporters. See, accessed August 8, 2019.

  57. Femicide is a term first used publicly by feminist writer and activist Diana Russell in 1976 while testifying before the first International Tribunal on Crimes Against Women. Her goal was to situate the murders of women in a context of unequal gender relations. In this context, men use violence to systematically dominate and exert power over women. And the research bears this out. While male victims of homicide are more likely to have been killed by strangers, a 2009 report published by the World Health Organization and partners notes a “universal finding in all regions” that women are far more likely to have been murdered by someone they know. Femicide includes a range of gender-related crimes, including intimate and interpersonal violence, political violence, gang activity, and female infanticide. Such deaths are often depicted as isolated incidents and treated as such by authorities, but those who study femicides characterize them as a pattern of underrecognized and underaddressed systemic violence. See World Health Organization, Strengthening Understanding of Femicide: Using Research to Galvanize Action and Accountability (Washington, DC: Program for Appropriate Technology in Health [PATH], InterCambios, Medical Research Council of South Africa [MRC], and World Health Organization [WHO], 2009), 110.

  58. See Maria Salguero’s map.

  59. Indeed, Marisela Escobedo Ortiz, the mother of one such victim, was herself shot at point-blank range and killed while demonstrating in front of the Governor’s Palace in Chihuahua in 2010.

  60. The toll now stands at more than 1,500. Three hundred women were killed in Juárez in 2011 alone, and only a tiny fraction of those cases have been investigated. The problem extends beyond Ciudad Juárez and the state of Chihuahua to other states, including Chiapas and Veracruz.

  61. Strengthening Understanding of Femicide states that “instances of missing, incorrect, or incomplete data mean that femicide is significantly underreported in every region.” See World Health Organization, Strengthening Understanding of Femicide, 4.

  62. After three years of investigating, the commission, chaired by politician Marcela Lagarde, found that femicide was indeed occurring and that the Mexican government was systematically failing to protect women and girls from being killed. Lagarde suggested that femicide be considered “a crime of the state which tolerates the murders of women and neither vigorously investigates the crimes nor holds the killers accountable.” See World Health Organization, Strengthening Understanding of Femicide, 11.

  63. See Maria Rodriguez-Dominguez, “Femicide and Victim Blaming in Mexico,” Council on Hemispheric Affairs, October 2, 2017,

  64. Mara Miranda (@MaraMiranda25), “#SiMeMatan es porque me gustaba salir de noche y tomar mucha cerveza ... [#IfTheyKillMe, it’s because I liked to go out at night and drink a lot of beer ...],” Twitter, May 5, 2017, 11:17 a.m. For an in-depth study of the hashtag and its use in social and political organizing, see Elizabeth Losh, Hashtag (New York: Bloomsbury, 2019).

  65. Missing data is not a new problem; the fields of critical cartography and critical GIS have long considered the phenomenon of missing data. Contemporary examples of missing data and counterdata collection include “The Missing and Murdered Indigenous Women Database,” created by doctoral student Annita Lucchesi, which tracks Indigenous women who are killed or disappear under suspicious circumstances in the United States and Canada. Jonathan Gray, Danny Lämmerhirt, and Liliana Bounegru also wrote a report that includes case studies of citizen involvement in collecting data on drones, police killings, water supplies, and pollution. See “Changing What Counts: How Can Citizen-Generated and Civil Society Data Be Used as an Advocacy Tool to Change Official Data Collection?,” 2016. Environmental health and justice is an area in which communities are out front collecting data when agencies refuse or neglect to do so. The MappingBack Network provides mapping capacity and support to Indigenous communities fighting extractive industries, and Sara Wylie, cofounder of Public Lab, works with communities impacted by fracking to measure hydrogen sulfide using low-cost DIY sensors. See Sara Wylie, Elisabeth Wilder, Lourdes Vera, Deborah Thomas, and Megan McLaughlin, “Materializing Exposure: Developing an Indexical Method to Visualize Health Hazards Related to Fossil Fuel Extraction,” Engaging Science, Technology, and Society 3 (2017): 426–463. Indigenous cartographers Margaret Wickens Pearce and Renee Pualani Louis describe cartographic techniques for recuperating Indigenous perspectives and epistemologies (often absent or misrepresented) into GIS maps. See Margaret Pearce and Renee Louis, “Mapping Indigenous Depth of Place,” American Indian Culture and Research Journal 32, no. 3 (2008): 107–126.
All that said, participatory data collection efforts have their own silences, as Heather Ford and Judy Wajcman show in their study of the “missing women” of Wikipedia: “‘Anyone Can Edit,’ Not Everyone Does: Wikipedia’s Infrastructure and the Gender Gap,” Social Studies of Science 47, no. 4 (2017): 511–527.

  66. Jonathan Stray, The Curious Journalist’s Guide to Data (New York: Columbia Journalism School, 2016).

  67. Virginia Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (New York: St. Martin’s Press, 2018).

  68. “The data are not neutral” is a recurring theme of data feminism. This doesn’t mean that data are never useful, just that they are never neutral representations of some sort of essential truth. Examining and understanding the asymmetries of power in the data collection environment (that lead to inequities in the dataset itself) is one of the key responsibilities of the feminist data scientist.

  69. Charles Duhigg, “How Companies Learn Your Secrets,” New York Times, February 19, 2012.

  70. Duhigg, “How Companies Learn Your Secrets.”

  71. The Target “pregnancy prediction score” was more detection than actual prediction because by the time the products were purchased, the customer was likely already pregnant.

  72. “Clicking Clean,” Greenpeace, May 2015, accessed April 10, 2019,

  73. Joshua S. Hill, “Facebook Los Lunas Data Center Boosted by 100 Megawatts of Solar,” CleanTechnica, October 23, 2018,

  74. Marie C. Baca, “It’s Official: Facebook Breaks Ground in New Mexico Next Month,” Albuquerque Journal, September 15, 2016,

  75. On the percentage, see “Women in U.S. Congress 2018,” Rutgers Eagleton Institute of Politics, December 13, 2018, On the wealth, see David Hawkings, “Wealth of Congress: Richer than Ever, but Mostly at the Very Top,” Roll Call, February 27, 2018,

  76. A good visual exploration of the whiteness and the maleness of power across domains can be seen in a photographic data visualization: Haeyoun Park, Josh Keller, and Josh Williams, “The Faces of American Power, Nearly as White as the Oscar Nominees,” New York Times, February 26, 2016,

  77. See “The World’s Most Valuable Resource Is No Longer Oil, but Data,” Economist, May 6, 2017. For a list of these CEOs, see Michael Haupt, “‘Data Is the New Oil’—A Ludicrous Proposition,” Medium, May 2, 2016. If you want to hear many people in a row say the phrase, check out the supercut by Neil Perry, “MyDataMyDollars_2018,” YouTube, January 28, 2019,

  78. For example, once advertising giants like Facebook and Google have your gender, they can turn around and use it against you. In 2018, Facebook was accused of gender discrimination because it permitted employers to show job ads only to men. Part of this hinges on corporations’ reluctance to take responsibility for any of the content that passes through their platforms: Is Facebook discriminating against women? Or merely letting its customers use Facebook data to discriminate? The news article about the suit says that “Facebook said that it was still reviewing the ads but that it generally did not take down job ads that exclude a gender.” See Noam Scheiber, “Facebook Accused of Allowing Bias Against Women in Job Ads,” New York Times, September 18, 2018. In another example, computer scientists scraped YouTube videos by transgender users and used their images (without consent) to try to train an algorithm to recognize transgender faces. People found their images included in scientific research papers about the technology when they had never granted permission. Because of cissexism, this kind of unethical practice poses severe risk of harm to transgender users in the form of discrimination and violence. See James Vincent, “Transgender YouTubers Had Their Videos Grabbed to Train Facial Recognition Software,” Verge, August 22, 2017. For an example from the government sector with even more severe ethical implications, see Keyes, Stevens, and Wernimont, “The Government Is Using the Most Vulnerable People.”

  79. In their widely cited paper “Critical Questions for Big Data,” danah boyd and Kate Crawford outlined the challenges of unequal access to big data, noting that the current configuration (in which corporations own and control massive stores of data about people) creates an imbalance of power in which there are “Big Data rich” and “Big Data poor.” boyd and Crawford, “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon,” Information, Communication & Society 15, no. 5 (2012): 662–679. Media scholar Seeta Peña Gangadharan has detailed how contemporary data profiling disproportionately impacts the poor, communities of color, migrants, and Indigenous groups. See Seeta Gangadharan, “Digital Inclusion and Data Profiling,” First Monday 17, no. 5 (April 13, 2012). Social scientist Zeynep Tufekci warns that corporations have emerged as “power brokers” with outsized potential to influence politics and publics precisely because of their exclusive data ownership. See Zeynep Tufekci, “Engineering the Public: Big Data, Surveillance and Computational Politics,” First Monday 19, no. 7 (July 2, 2014). And in advancing the idea of Black data to refer to the intersection of informatics and Black queer life, Shaka McGlotten states, “How can citizens challenge state and corporate power when those powers demand we accede to total surveillance, while also criminalizing dissent?” See Shaka McGlotten, “Black Data,” in No Tea, No Shade: New Writings in Black Queer Studies (Durham, NC: Duke University Press, 2016), 262–286.

  80. Indeed, four prominent Black maternal health scholars and leaders wrote an essay titled “An Inconvenient Truth: You Have No Answer That Black Women Don’t Already Possess.” They assert that we should use this moment of increased attention to uplift the work that Black women are already doing, including the “support of Black women in paid, leadership and research roles.” Karen A. Scott, Stephanie R. M. Bray, Ifeyinwa Asiodu, and Monica R. McLemore, “An Inconvenient Truth: You Have No Answer That Black Women Don’t Already Possess,” Black Women Birthing Justice, October 31, 2018.

  81. See Jochen Profit, Jeffrey B. Gould, Mihoko Bennett, Benjamin A. Goldstein, David Draper, Ciaran S. Phibbs, and Henry C. Lee, “Racial/Ethnic Disparity in NICU Quality of Care Delivery,” Pediatrics 140, no. 3 (2017): e20170918; as well as Kelly M. Hoffman, Sophie Trawalter, Jordan R. Axt, and M. Norman Oliver, “Racial Bias in Pain Assessment and Treatment Recommendations, and False Beliefs about Biological Differences between Blacks and Whites,” Proceedings of the National Academy of Sciences 113, no. 16 (April 4, 2016): 4296–4301,

  82. Kimberly Seals Allers, interview by Catherine D’Ignazio, February 26, 2019.