Last year, some smart people asked me to write a review essay about data in the history of the social sciences. I tried. I did. But it turned out I didn’t have it in me. Such an undertaking requires the writer to float high above the terrain, describing its contours. Yet what interests me most is thinking about how to crouch close to the ground. So, what follows is an essay about a stance for thinking about data and the way we know what we know about modern societies.
I begin with a supposition: that the majority of social numbers in modern societies result from the agency of states and the responses of “the people” (in its various manifestations), and descend more or less directly from political action or the exercise of power. State offices and company skyscrapers are frequently marginal when we think of sites of social science, as are streets claimed by protestors. Yet we learn a great deal about where social numbers come from when we situate ourselves in those types of places and see what we can see.
In that spirit, I begin with a story about social numbers far from any seminar room, but exemplary of how social data systems work. In September 1956, the St. Regis Paper Company had recently completed a slew of acquisitions of other companies and would in the coming year make three more significant additions. The United States’ Federal Trade Commission had been charged by Congress with keeping an eye on corporate expansion. St. Regis Paper’s action caught the attention of someone in the FTC, who opened an initial inquiry into possible anti-trust violations. FTC staff called on St. Regis Paper to cooperate, asking the company to hand over data pertaining to its business and especially its purchases of other corporations. The company delayed.
By 1959, the FTC’s sniffing around had graduated to the status of a formal investigation, which meant that the FTC now demanded the previously requested data and further ordered the delivery of more data on even more acquisitions made by the still-expanding St. Regis Paper. But, again, the company delayed. Lawsuits followed, culminating in a 1961 decision of the U.S. Supreme Court.
Among the data that St. Regis Paper refused to disclose were copies of reports that the company had earlier made to the U.S. Census Bureau. The reports themselves were protected by the confidentiality assurances in Title 13, the census law. But what was the status of the company’s own copy of that report, which it had retained at the behest of the bureau? (Each form bore the label: “Keep this copy for your files.”) The Supreme Court decided in favor of the trust-busting regulators and insisted the company should turn over its copies.
The Census Bureau and then Congress responded in alarm. By October 1962 a new law assured the confidentiality of such file copies, seemingly placing the needs of the statistical service over other enforcement priorities. Or did it? When advocating for legislative action, the Secretary of Commerce (who oversaw the Census Bureau) made a fascinating case that these statistical data were not fit for use in regulatory actions. Numbers generated for aggregate tables not only shouldn’t be used to judge individuals–they couldn’t be.
It’s worth considering this moment of controversy for the ways it shakes loose unstable assumptions about how it is that numbers and data systems really work in modern societies.
A substantial part of the problem was timing: the bureau wanted to generate annual statistics, and to have time to tabulate all its inputs, it needed company figures before those companies were able to reckon out their own precise figures. Another problem had to do with definitions: the Census Bureau had its own uniform accounting systems, but they were often not the same as those employed by companies (requiring anyone who filled out the forms for the bureau to make some quick and dirty translations) or by other federal enforcement agencies. Finally, there was the problem of authority: companies relied on lawyers and accountants to prepare official financial statements, which would go out with executive approval. Bureau submissions, by contrast, were made by lower-level staffers and the bureau couldn’t afford to wait for each to get official approval. Official statistics, in other words, were built on estimates, compiled by underlings, and altogether unreliable in any individual case. When all mashed together, however, they generated official facts that proved broadly useful: to the government, in business, and to social scientists.
The story of the secret file copies lays bare the haphazard magnificence of modern data systems. Thrown together with more or less care in all their jury-rigged splendor, they present powerful pictures of the world. Statistics may be a form of “thin description,” as Ted Porter has argued. Still, “The drive for thinness, while often highly technical, is dense with human meaning and leads into unfathomable depths.” Thin descriptions of the world, like the census forms, answer the questions asked by those who seek to know (and usually manage) how society works.
They posit a reality and become, in their own way, real.
SOME LIMITS BUILT IN
Data systems generate models of society and when they take hold, it becomes difficult to see any difference between the model and what it represents. For that reason, it pays to change our situation, to try to find a way to see the system from the outside—or at least from the edges, from places that explicitly question the assumptions at the heart of the model.
The story of the birth of statistics and modern data systems usually starts in seventeenth-century England, where a learned haberdasher (named John Graunt) wove reports of plague deaths into a grand picture of the course of disease. Yet this way of telling the story already obscures how much such data systems were being engineered to obscure. In her 2021 book, Reckoning with Slavery, Jennifer L. Morgan shifts the focus back further in time, to the sixteenth century, and away from Europe to Africa, and she asks readers to see the world through the eyes of women, rather than men. These shifts each pose their own difficulties, precisely because the archives we have most ready access to today were constructed to make such a history seem, in Michel-Rolph Trouillot’s phrasing, “unthinkable” and thus difficult to document.
Morgan has persevered, though, revealing how early modern colonial projects set definitions, accounting standards, and frames for thinking with that persist to this day. Some of the most important intellectual work in making modern facts possible involved writing off African practices and institutions. To take one crucial example, European writers asserted the failure of Africans to possess money or understand value, a market failure made manifest by the supposed insufficiency of the cowrie shell as currency. And yet, Europeans still somehow managed to use supposedly non-functioning markets and definitely-not-money money to purchase enslaved people and fuel fledgling empires. Curious, that. (I’ve written more about this book here.)
At the same time, Europeans expanded dramatically their practices for putting people into ledgers, according to a fundamental distinction. Africans appeared in account books as people who could be bought and sold. Other people, who would come to be known as “white,” understood themselves as special precisely because they could not be sold. They defined their families according to the love and care they showed their kin and the inviolability of their domestic ties, an inviolability cast in stark relief by the way the families of the enslaved could be ripped apart and by the enslavers’ denial (against all evidence) of even the possibility of maternal affection among the enslaved. Morgan argues, convincingly, that when the historian tries to stand in the place of enslaved women, the true limits of modern statistics stand out and we can see again more clearly how what has so long been called simply “rationality” along with dominant systems of accounting rested on the false premises of an emergent anti-Black racial capitalism.^
Back in England, the printed tables of statistical innovators presented productive illusions of order and mastery, over plague in the case of Graunt, or over transatlantic colonial projects as depicted by Virginia Company pamphlets. In Numbered Lives, Jacqueline Wernimont draws out the emotional and rhetorical power hidden behind the supposedly unfeeling numbers of mortality bills and Graunt’s fold-out summary table of casualties or the tables of sums printed to fund the settlement of Virginia. Those who turned to tables as a new, modern form of data visualization did so in a way that called attention to and sought the attention of white men, especially those able to fight or willing to spend. “The aesthetic rationalism of Atlantic colonial accounts helped procolonialists render the often-violent and dangerous settler life as beautiful and controlled,” writes Wernimont, and that “arithmetic sublime” further allowed company promoters to transmute unruly adventures abroad into a neatly quantifiable opportunity for investment. From their founding era, modern western data systems built a reality, piecemeal, in a hodge-podge, in an expression of the value of some lives over those of others.
One result of proliferating thin description has been an enormous outpouring of feeling and ensuing action: just think about what entire nations have done in the twentieth century for the sake of GDP. That is one of the insights driving Michelle Murphy’s investigations in The Economization of Life. For Benedict Anderson, modern nationalism became possible through print capitalism, which allowed for the widespread construction of an “imagined community.” Imagined, but still very real. The proof of that reality for Anderson was that people—so many people—fought and died in the name of nations. As Murphy points out, measures of national income, when tied to models of population and economic growth, inspired mass sterilization across the world. Computer simulations taught postcolonial leaders to see money value in “averted births” and economic dashboards gave material form to “quantification and social science methods to calibrate and then exploit the differential worth of human life for the sake of the macrological figure of ‘economy.’” In the end, and down to this day, asks Murphy: “What has not been done for GDP?”
Sustained examinations of data systems from their edges expose the vast social spaces and perspectives that they leave out.^^ At the same time, critical inquiries into the history of data highlight the way such systems have shaped lives and societies, often noting the way that access, opportunity, and surveillance play out differently according to race, class, gender, religion, sexuality, or other relevant categories of difference. These avenues of inquiry can give the wrong impression and can falsely amplify the capacities of a data system to do what it claims to do, unless the system’s promises are themselves treated with a healthy skepticism.
ALWAYS JUST GETTING BY
In a 2022 podcast episode from the US-based New York Times, the popular journalist and commentator Ezra Klein sought support from the historian and public intellectual Adam Tooze. Wasn’t the world of today simply too complex to be understood by statistics? Hadn’t something fundamental shifted, leaving an unbridgeable gap between reality and our capacities to represent societies and economies? “Some of these systems,” said Klein, “are now more complicated than human minds can fully grasp.”
Tooze rejected the premise. It seemed to Tooze that Klein was just giving up by asserting a fundamental and unconquerable unknowability that must therefore limit the actions of governments and the judgments of pundits. He suggested Klein’s position shared important features of a form of critique popularized by Friedrich Hayek in the middle of the twentieth century: in Hayek’s judgment, central state planners ran on hubris, making policy on unsupportable knowledge claims, while robbing authority from the one system that really could process the complexity of the social world, the price-driven market economy.^^^^ Klein demurred: “I don’t think I’m as fatalistic as where you could take that.”
Klein’s approach seems to me a likely outcome in societies that tout data-driven objectivity (or “evidence-based policymaking”) without building a robust theory and history of how objective data have been made and have worked in the world. His is one of a handful of common means by which a brittle faith in data can fracture, crumbling in the face of the evidence of error, bias, or uncertainty. And the beneficiaries of that analytical fragility seem likely to be those who already had the most power, money, and data.^^^^^
This exchange among notable opinion-makers in an elite American milieu suggests a possibly surprising reason for the critical approach this essay has so far employed: those who use and rely on social scientific facts should understand that they have always been built from incomplete materials, and yet have often still served important purposes. It is not the case, as Klein suggested, that social or economic systems are only now too complex for our minds to grasp. They always have been.
The interesting thing is to understand how governments, businesses, scholars, and activists have nonetheless built bodies of data from which they could act in an always-too-complicated world. In a 2008 essay titled “Trouble with Numbers,” Tooze already pointed in this general direction, acknowledging the landmark scholarship of those who called into question the simple objectivity of statistics, and then calling for more work to tell the stories behind particular numbers that made a difference in the world. He wrote: “We should overcome our inhibitions and move from a generalized history of statistics as a form of governmental knowledge to a history of the construction and use of particular facts.”
Or, casting the net a bit more broadly: we should be (and are) writing histories of entire data processes, the circumstances that produced them, and their consequences in making social or economic systems tangible and manipulable.^^^^^^
MORE THAN NUMBERS, OR, A PARADE OF AVALANCHES
The history of any particular number or fact tells one very important part of the story of how a data system responds to and remakes the world. A history centered on printed or digitized final figures offers one “cut” the historian can make, to borrow an approach from the STS scholar Karen Barad. Barad’s work draws its inspiration from Niels Bohr’s quantum interpretations, where Bohr’s theory of complementarity argued that the incapacity to measure both momentum and position precisely (described by Heisenberg as an epistemological problem) was really a fundamental issue of experimental design: the measuring instruments that could see momentum were different from those that could see position. In Barad’s expansive interpretation, Bohr’s theory offers a way of understanding all scientific work, wherein the tools used for measuring are part of the system that generates any particular phenomenon. We can understand the world precisely, but only by acknowledging the ways our tools of measurement are actively constituting that world.
Instruments for generating statistics or social scientific facts are often very large, extending throughout a society being measured. To understand society, we should look closely at the ways the entire measurement instrument generates different forms or stages of data, from the moment of design to the relative chaos of collection to the centralized control exerted by data cleaners and on to a cacophony of post-publication interpretations. (I apply this model to the US census in Democracy’s Data.)
A statistical study generates aggregates, published as numbers, charts, or maps. It also simultaneously creates new ways of defining individuals or describing relations among individuals. This was one of the crucial contributions of the recently departed philosopher Ian Hacking to the history of quantification. In a 1982 essay, “Biopower and the Avalanche of Printed Numbers,” Hacking built on and expanded the theories of Michel Foucault to explain why historians identified an efflorescence of statistical theory in the middle of the nineteenth century.
He began his answer by pointing to an often overlooked precondition: before the math and methods came a looming mountain of data. To explain where all that data came from Hacking identified a kind of spiraling process: first, bureaucracies get worried (about disorder, revolution, you name it) and set up commissions that in turn produce reports filled with numbers. From those numbers, analysts discover “laws” and look for more data, inspiring new bureaucracies to manage in accordance with those, and those bureaucracies generate more reports and more data and soon enough new laws and categories and worries. Then comes, you guessed it, more data. In an even more famous 2006 essay called “Making Up People,” Hacking emphasized the ways that this statistical cycle could create new kinds of people. But well over 20 years earlier, Hacking’s theory of the avalanche already encompassed the fact that counting entailed the mass production of individual identities alongside social facts.
Hacking began his story with a great numerical avalanche in 1820s Europe. Such a periodization fits best when one is trying to explain, as Hacking was, European statistical theories closely related to human social and psychological behavior. One context Hacking offered was the debates among his contemporaries over what should and should not be listed in the Diagnostic and Statistical Manual.
But we would be better off imagining the history of data and social science as a series of avalanches, and they began well before the nineteenth century.
There were the sixteenth century’s plague-counts and colonial outpourings, as I discussed in the context of Morgan and Wernimont’s books. William Deringer in Calculated Values identified an avalanche of numbers (printed in bickering political pamphlets) brought on by eighteenth-century parliamentary outsiders trying to corral the ruling party in Britain.
And well after Hacking’s historical window the avalanches have kept on coming, taking on many different forms. Eli Cook, for instance, has argued that the sorts of “moral statistics” that Hacking considered were merely one cascade of numbers following on earlier American efforts at quantifying household production and preceding a turn back toward economic concerns, this time fixated on cotton and capital. The era of international fairs, following London’s Crystal Palace exhibition, elicited its own floods of official numbers and colorful charts, driven too by what Autumn Womack calls a “survey spirit” emanating from settlement houses in poor neighborhoods. National income accounting, which we already touched on with Murphy, was made possible in part by the rise of corporations, which generated the centralized paperwork that could make such calculations possible, argues Daniel Hirschman. Growing state capacity also made new kinds of calculations possible, as Emmanuel Didier shows in his reconstruction of agricultural and employment figures as put together by the New Deal administration. The United Nations spurred a mid-twentieth-century program to expand censuses across the world, as Emily Merchant reveals. Arunabh Ghosh’s account of one of the largest (though Soviet-inspired) censuses, in China, points to another grand and very complicated rush of numbers. Amid these massive state-led operations, the peculiar project that Rebecca Lemov called “the Database of Dreams” stands out still for its ambitions, if not for its accomplishments.
Elizabeth Popp Berman’s Thinking Like an Economist recounts a budgetary avalanche. Belt-tightening brought on by the war in Vietnam inspired a planning and budgeting system. It had mixed effects in terms of trimming costs or guaranteeing effective policy, but it succeeded magnificently in spreading offices equipped to make economic calculations throughout the government and the private sector. This empire of economists might well be said to have reached its maturity at precisely the moment that a historian of science published a book trying to explain why so many institutions had appeared to cede judgment to quantification. Porter’s Trust in Numbers offered a classic and persuasive argument that weak bureaucrats in democratic governments were particularly likely to turn to mechanical decision criteria. Yet there’s a different explanation for why Porter asked the question in the first place, and for why some AI decision systems continue to be so alluring: it begins with an avalanche of budgetary numbers.
Thank you to Jamie Cohen-Cole for inspiring this essay and to Joanna Radin for early conversations that shaped it.
Some notes…
^For a continued investigation of the ways enslaved lives were manifested in ledgers, and evidence of the sophisticated data systems built for slavery, see Caitlin Rosenthal, Accounting for Slavery: Masters and Management (Cambridge, MA: Harvard University Press, 2018). Daina Ramey Berry resists the reduction of values to the singular perspective of enslavers in Daina Ramey Berry, The Price for Their Pound of Flesh: The Value of the Enslaved, from Womb to Grave, in the Building of a Nation (Boston: Beacon Press, 2017).
^^Of course, limits are necessary to the knowledge making process. The important question for the historian and the activist alike is to understand the limits of an existing system with precision and to investigate how those limits came to be.
^^^For a fascinating parallel case, consider the complex social and technological negotiations that went into getting London stock prices out on ticker tape machines. See John Handel, “The Material Politics of Finance: The Ticker Tape and the London Stock Exchange, 1860s-1890s,” Enterprise and Society 23, no. 3 (2022): 857-887.
^^^^On Hayek’s approach and the multifarious, transnational threads that formed in mid-century to critique planning, see Angus Burgin, The Great Persuasion: Reinventing Free Markets since the Depression (Cambridge, MA: Harvard University Press, 2012).
^^^^^See Naomi Oreskes and Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (New York: Bloomsbury Press, 2010); Robert N. Proctor and Londa Schiebinger (eds.), Agnotology: The Making and Unmaking of Ignorance (Stanford, CA: Stanford University Press, 2008).
^^^^^^This might sound a bit like a theory of “performativity,” and to an extent it is. But it takes the performativity observed in recent financial markets to be merely a species of how statistics and data systems generally work: Donald MacKenzie and Yuval Millo, “Constructing a Market, Performing Theory: the Historical Sociology of a Financial Derivatives Exchange,” American Journal of Sociology 109:1(July, 2003): 107-145.