Are there hidden cycles in literary production in the 18th and 19th centuries? Have literary historians missed, for example, telltale repetitions of certain literary forms? In "Graphs, Maps, Trees" (2003), Franco Moretti raises this question while calling for a new kind of research in literary history. This research would take account of the thousands of surviving literary works neglected by current literary history (works now accessible thanks to the digitization of library collections). Rather than focusing on the one or two hundred novels that regularly circulate in research and classrooms, the history of the novel would include the full expanse of literary production: tens of thousands of titles. Inspiration and precedents for this approach are found in social history and in the sociology of literature.
"Graphs" also includes a demonstration of what this research might look like. Moretti examines the history of the arrival and disappearance of "novelistic genres" in Britain between 1740 and 1900. Examples of these genres include historical, gothic, and epistolary novels, as well as less familiar categories, such as silver fork novels and Newgate novels. Over the 160-year period, Moretti identifies 44 genres. And he detects an unexpected pattern: genres seem to arrive and depart in clusters.
I'm interested in reconstructing Moretti's list of 44 novelistic genres. (For those interested in the details of Moretti's argument, the best place to start is Cosma Shalizi's essay.) At first glance this task looks simple, since Moretti provides a list of sources for the genres and their periodizations in an appendix. After looking closely, however, I ran into a number of obstacles. Working through them provided a taste of the methodological challenges in store if we take Moretti's research agenda seriously—as I think we should.
44 British Novelistic Genres
To assemble the list, Moretti and Brad Pasanek surveyed over a hundred expert studies on 18th and 19th century British novels. When experts disagreed about the precise periodization of a genre, Moretti "opted for the periodization arising out of the more convincing morphological argument" (18).1 They visualized their combined list of periodizations in a figure (reproduced here as figure 1).
Table 1: Sample of British Novelistic Genres, 1740-1900
In the data Moretti detects an unexpected pattern: genres seem to appear and disappear in clusters. He writes:
Forty-four genres over 160 years; but instead of finding one new genre every four years or so, as a random distribution would have it, over two thirds of them cluster in just thirty years, divided in six major bursts of creativity: the late 1760s, early 1790s, late 1820s, 1850, early 1870s, and mid-late 1880s. And the genres also tend to disappear in clusters: with the exception of the turbulence of 1790-1810, a rather regular changing of the guard takes place, where half a dozen genres quickly leave the scene, as many move in, and then remain in place for twenty-five years or so. (GMT, 18)
To make the point even clearer, I've used the dates mentioned to highlight these six "bursts of creativity" (figure 2).
Let's leave to one side two questions: how much credit to give this eyeball detection of clusters, and what might explain the clustering (if it stands up to scrutiny). Here I want to focus on the task of reconstructing the 44 specific periodizations, e.g. 1789-1805 for the Jacobin novel or 1814-1848 for historical fiction. I'll also bracket the question of defining "novelistic genre," not because it isn't a vital question but rather because I think the simplicity of Moretti's dataset is one of its virtues. We should care about the reconstructability of the periodizations for two reasons. First, any corrections or adjustments might well diminish the observed clustering of the genres, weakening important claims in "Graphs" (see pp. 17-30). Second, any difficulty in reconstructing the data is off-putting to interested students and researchers. Those who, having read through "Graphs," want to pursue the project further will likely turn to the expert studies cited. Differences between Moretti's periodizations and those of the genre experts he cites risk creating confusion. For a nascent research field within literary studies, this would be regrettable.
And the periodizations are indeed difficult to reconstruct. The primary challenge comes from discrepancies between Moretti's periodizations and those of the experts he cites. For example, Adburgham (1983) gives 1814-1840 as the period for the silver fork novels and Kelly (1976) gives 1780-1805 for the Jacobin novels, but "Graphs" reports 1825-1842 and 1789-1805, respectively. I trust Moretti has good reasons for reporting different periodizations. I can imagine excellent reasons for modestly different periodizations for all of the genres on the list. For example, two experts may disagree on whether a specific novel published in the early days of a category counts as a member of the category or as some kind of related (but distinct) precursor.2 In any case, the differences between Moretti's periodizations and those of the experts cited in the appendix make a quick reconstruction of the list of 44 genre periodizations used in "Graphs" impossible.
Two additional concerns also lurk in the background. First, although Moretti provides the list of expert studies he used and tells us how he settled on these particular studies (he opted for experts making the best "morphological argument" for their periodization), he doesn't divulge what the relevant morphological arguments were or even which experts disagreed (with one exception: Gallagher over Cazamian for the industrial novels). Without additional research, we have no way of knowing, for example, whether the balance of scholars writing on the regional novel argues for a periodization that "Graphs" rejects. This situation is unfortunate. Moretti tells us that one of quantitative research's virtues is that it provides data that are "ideally independent of interpretations" (9), yet the entire dataset depends on his interpretation of which expert periodizations make the better morphological argument.
The second concern is the possibility, also discussed by Shalizi, that experts themselves may consciously or unconsciously adjust the start and end dates of periodizations towards "focal dates," such as convenient or historically significant years (years ending a decade, or 1789, 1848, etc.). Such adjustments could easily increase the perception of clustering. Let's look at possible end-of-decade adjustments, e.g. the courtship novels with a periodization of 1740 to 1820. We can get a sense of the prevalence of such adjustments by examining the frequency of the final digit of the ending year of the 44 periodizations. (The digit '0' corresponds to years like 1820, '1' to years like 1841, and so forth.) I've plotted these frequencies in figure 3. There is no reason why any ending digit should appear more often than any other (unless some truly extraordinary forces are acting on novelistic production), so the ten digits should occur with roughly equal frequency, each with an expected count of about 4 or 5 (44 / 10 final digits = 4.4). Rather than 4 or 5 occurrences of the digit '0', however, there are 10. Given these assumptions, this counts as unusual.3
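A minimal Python sketch of the final-digit check described above (the post's own figures were made in R; this is not that code). The hypothetical helper `final_digit_counts` tallies ending digits, and the hypothetical `binomial_upper_tail` computes the chance that one particular digit turns up 10 or more times among 44 endings, assuming each digit is equally likely. The demonstration end years are only the five periodizations quoted in this post, not Moretti's full list of 44.

```python
from collections import Counter
from math import comb


def final_digit_counts(end_years):
    """Tally the last digit of each ending year: 1820 -> 0, 1841 -> 1, and so on."""
    return Counter(year % 10 for year in end_years)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Ending years of the five periodizations quoted in this post
# (courtship, gothic, Jacobin, historical, silver fork); not the full 44.
quoted_end_years = [1820, 1820, 1805, 1848, 1842]
print(final_digit_counts(quoted_end_years))

n_periods = 44
expected_per_digit = n_periods / 10  # 4.4 occurrences of each digit on average
tail = binomial_upper_tail(10, n_periods, 1 / 10)  # chance a given digit appears 10+ times
odds_against = (1 - tail) / tail
print(f"expected {expected_per_digit:.1f}, P(X >= 10) = {tail:.4f}, "
      f"odds about {odds_against:.1f} to 1 against")
```

With n = 44 and p = 1/10 the tail probability comes out near 0.0103, i.e. odds of roughly 96.5 to 1 against, consistent with the 97-to-1 figure in the accompanying note once rounded.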
An Alternative Periodization
One in four genres ending on a decade boundary seems too neat. I'm convinced that some periodizations have been adjusted. Motivated by these concerns about bias in the 44 periodizations, I've been considering a method that allows anyone to reconstruct a genre's period given only the list of novels in that category.
Consider characterizing a novelistic genre's period by the interval which includes the genre's peak publication year(s) as well as 95% of its novels. Call it the genre's 95%-period. This approach allows anyone to calculate a periodization while preserving the intuition that a novelistic genre's period is the time when its morphology is in wide circulation. Of course those using this method would still have to trust experts for the relevant lists of novels, but this seems preferable to trusting experts for the periodization tout court. Preferable because it is easier for an individual to assess whether or not a specific novel exhibits a certain morphology than to come up with a periodization for an entire category. None of the genres are so mysterious that an interested student with some knowledge of British history and a handful of exemplars could not evaluate the case for the inclusion of a novel in a particular category.4
What do these 95%-periods look like? Let's consider the gothic novels and the silver fork novels. We have two comprehensive bibliographies listing 394 gothic novels (Lévy 1968) and 99 published silver fork novels (Adburgham 1983). Constructing the 95% intervals is straightforward (figures 4 and 5).
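The 95%-period as defined above does not say which interval to pick when several contain 95% of the titles, so the Python sketch below makes an explicit assumption: it returns the shortest run of years that includes every peak (modal) year and at least 95% of the novels, breaking ties in favor of the earliest start. The function name and the toy publication years are hypothetical; they are not drawn from Lévy's or Adburgham's bibliographies.

```python
from collections import Counter


def ninety_five_period(years, coverage_pct=95):
    """Shortest (start, end) span of publication years that contains every peak
    year and at least coverage_pct% of the titles.

    Assumptions of this sketch: 'shortest' span, ties broken by earliest start.
    """
    counts = Counter(years)
    needed = (coverage_pct * len(years) + 99) // 100  # integer ceiling of 95% of n
    peak = max(counts.values())
    peak_years = [y for y, c in counts.items() if c == peak]
    first_peak, last_peak = min(peak_years), max(peak_years)

    best = None
    for start in range(min(years), first_peak + 1):
        for end in range(last_peak, max(years) + 1):
            covered = sum(c for y, c in counts.items() if start <= y <= end)
            if covered >= needed:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break  # a later end year would only lengthen this candidate
    return best


# Hypothetical toy bibliography: 20 titles with one early and one late outlier.
toy_years = [1790] + [1800] * 5 + [1801] * 6 + [1802] * 5 + [1803] * 2 + [1820]
print(ninety_five_period(toy_years))
```

Here 19 of the 20 titles must be covered, so the lone 1820 title can fall outside and the sketch returns (1790, 1803). A different tie-breaking rule (for example, a central interval trimming 2.5% from each tail) could give a different answer on the same list.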
For the gothic novels, the 95% period is 1789-1821. This is surprisingly close to 1790-1820, the period Moretti uses, citing Garside's "The English Novel in the Romantic Era". For the silver fork novels, the 95% period is 1822 to 1842. Compare this with Adburgham's 1814-1840 and Moretti's 1825-1842.
Do these two adjustments in the periodizations of silver fork and gothic novels make the clustering more or less pronounced? Shalizi uses the average inter-arrival period to assess clustering. An inter-arrival period is simply the number of years that pass between one genre arrival and the next, and the average inter-arrival period offers a convenient summary of how clustered a sequence of arrivals is.5 While the two minor adjustments for the silver fork and gothic novels do yield different inter-arrival periods, the average does not change, which is no great surprise given how minor the changes are. By this measure, the two revised periodizations neither increase nor decrease the clustering. The revisions do, however, knock off one of the more suspicious periodizations: 1790-1820 for gothic novels. The new periodization, 1788-1821, is an improvement. And anyone can use this method to reproduce the periodization.
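The average inter-arrival period can be sketched in a few lines of Python (a hypothetical helper, not Shalizi's code). The example arrival years are chosen only to reproduce the inter-arrival periods 0, 0, 1, 1, 1, 1, 0, 0, 0 listed in the note on inter-arrival periods, plus that note's one-genre-per-year counterfactual.

```python
def mean_interarrival(arrival_years):
    """Average number of years between consecutive genre arrivals."""
    years = sorted(arrival_years)
    if len(years) < 2:
        raise ValueError("need at least two arrivals")
    gaps = [later - earlier for earlier, later in zip(years, years[1:])]
    return sum(gaps) / len(gaps)


# Arrival years reproducing the gaps 0, 0, 1, 1, 1, 1, 0, 0, 0 from the note.
clustered = [1846, 1846, 1846, 1847, 1848, 1849, 1850, 1850, 1850, 1850]
# The note's counterfactual: one new genre every year from 1841 through 1850.
steady = list(range(1841, 1851))

print(mean_interarrival(clustered))  # 4/9, about 0.44
print(mean_interarrival(steady))     # 1.0
```

Because consecutive gaps telescope, the average always equals (last arrival - first arrival) / (number of arrivals - 1). Moving an arrival that is neither the earliest nor the latest changes individual gaps but cannot change their mean; the gothic and silver fork arrivals both fall between the 1740 courtship start and the late-century arrivals.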
Ideally, I would like to recalculate the periodizations for all 44 of the genres with this 95% method. It would be fascinating to see if the clustering remained. Unfortunately, few expert studies have comprehensive bibliographies like those of Adburgham and Lévy. Such an undertaking is not impossible, but it would take time. For anyone interested in such a project, there are considerable resources available, including the superb bibliographic databases of Garside and Bassett.
I found this exercise worthwhile. Needless to say, browsing expert studies covering 160 years of novelistic production tends to increase one's comfort with the idea of a literary longue durée. There's also something to be said for wrestling with data that comes in the form of periodizations. A list of periods can be viewed as describing a process consisting of a sequence of arrivals and departures (or births and deaths). This kind of data occurs often in historical research.
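To make the arrival/departure view concrete, here is a Python sketch that treats each periodization as a +1 event in its start year and a -1 event the year after it ends, then counts the genres in circulation each year. It assumes a genre is still in circulation during its final year, and the `Period` type and function names are hypothetical. The input uses only the five periodizations quoted in this post (courtship 1740-1820, Jacobin 1789-1805, gothic 1790-1820, historical 1814-1848, silver fork 1825-1842).

```python
from collections import Counter
from typing import NamedTuple


class Period(NamedTuple):
    genre: str
    start: int  # year the genre arrives
    end: int    # last year the genre is counted as in circulation


def arrivals_and_departures(periods):
    """Split periodizations into sorted arrival years and sorted departure years."""
    return sorted(p.start for p in periods), sorted(p.end for p in periods)


def active_by_year(periods):
    """Genres in circulation per year: +1 at each arrival, -1 the year after each end."""
    delta = Counter()
    for p in periods:
        delta[p.start] += 1
        delta[p.end + 1] -= 1
    counts, running = {}, 0
    for year in range(min(p.start for p in periods), max(p.end for p in periods) + 1):
        running += delta[year]  # Counter returns 0 for years with no events
        counts[year] = running
    return counts


periods = [
    Period("courtship", 1740, 1820),
    Period("Jacobin", 1789, 1805),
    Period("gothic", 1790, 1820),
    Period("historical", 1814, 1848),
    Period("silver fork", 1825, 1842),
]
arrivals, departures = arrivals_and_departures(periods)
active = active_by_year(periods)
print(arrivals, departures, active[1815])
```

For example, `active[1815]` is 3 (courtship, gothic, historical). With all 44 periodizations, the same arrival list could be fed to an inter-arrival calculation.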
Some have questioned the purpose of this exercise. The stakes may seem awfully low given the amount of time required to sort and sift even this modest amount of data. I hang around a skeptical bunch of humanities graduate students and professors, many of whom are unimpressed by the methods of the social and natural sciences, fields where reconstruction of data or results is generally held to be a virtue. They point out, rightly, that contemporary scientific practice all too often doesn't involve or encourage reproducing results. Checking someone's data isn't a requirement for collaboration or for working on a research project.
I've settled on the following justification for my desire to be able to reconstruct data and results in quantitative literary history: any research program benefits from some "worked examples" that are reproducible in detail. Reproducing a specific result is one way for others to learn about the object and methods of a research program. Such results may be modest. In the natural sciences, consider the experiments done in first-year undergraduate chemistry laboratory classes. In literary studies, I think some of the canonical exercises of "close reading" in a poetry class serve a similar purpose. One might also consider a student's experience of "reconstructing" the reading of Plato's Phaedrus described by reader-response criticism.6 Having these detailed exemplars is pedagogically important, as Kuhn (1970) observes. Since quantitative sociology of literature is a nascent field, it stands to benefit from having detailed, reproducible examples that illustrate the methods and objects of the research program. If the dataset and analysis in "Graphs, Maps, Trees" could be improved and somehow combined with the corrections that Shalizi offers, it could become such an exemplar.
Adburgham, Alison. 1983. Silver Fork Society. London: Constable.
Bassett, Troy J. 2011. “At the Circulating Library.” URL http://www.victorianresearch.org/atcl.
English, James F. 2010. “Everywhere and Nowhere: The Sociology of Literature After ‘the Sociology of Literature.’” New Literary History, 41(2): v-xxiii. doi:10.1353/nlh.2010.0005.
Garside, P. D., J. E. Belanger, and S. A. Ragaz. 2004. “British Fiction, 1800-1829: A Database of Production, Circulation & Reception.” URL http://www.british-fiction.cf.ac.uk/.
Goodwin, Jonathan, and John Holbo, eds. 2011. Reading Graphs, Maps, and Trees: Responses to Franco Moretti. Parlor Press. URL http://www.parlorpress.com/moretti.
Kuhn, Thomas. 1970. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Moretti, Franco. 2003. Graphs, Maps, Trees: Abstract Models for Literary History—1. New Left Review 24: 67-93.
———. 2005. Graphs, Maps, Trees. London: Verso.
R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
Shalizi, Cosma. 2006. “Graphs, Trees, Materialism, Fishing.” URL http://www.thevalve.org/go/valve/article/graphs_trees_materialism_fishing/ (accessed 2009-09-08).
Tuchman, Gaye and Nina E. Fortin. 1984. “Fame and Misfortune: Edging Women Out of the Great Literary Tradition.” The American Journal of Sociology, 90(1): 72-96.
Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. New York: Springer. URL http://had.co.nz/ggplot2/book.
Data and R code
Moretti's periodizations and the bibliographies of the silver fork and gothic novels are available in machine-readable format at the following address: https://github.com/ariddell/datasets. The R code for the figures is also available.
What morphology might define a "novelistic genre" is an important question. Moretti says they are "morphological arrangements that last in time, but always for some time" (17). But what kind of morphological arrangements? On what basis are the Jacobin or the Chartist novels considered the same kind of population as the historical novels? These questions need to be addressed at some point. ↩
I'm sympathetic to the idea that even well-informed readers might differ on a genre's start and end dates. People may agree on the general features of a category while differing on specifics such as the inclusion of specific novels or authors in a category. This could explain why Adburgham considers novels published between 1814 and 1824, including O'Donnell by Lady Morgan (1814) and Sayings and Doings by Theodore Hook (1824), relevant for the periodization whereas Moretti does not. Having some familiarity with the silver fork novels, I suspect a later year like 1825 was chosen as the starting date because it better reflects the period during which the genre became a "market category," one marketed as such by the novels' main publisher, Henry Colburn. ↩
With these assumptions, we expect the frequency of each digit to behave like a random variable with a binomial distribution (parameters n = 44 and p = 1/10). In this case, the odds of observing a frequency of 10 or more are 97 to 1 against. In R one can calculate this probability with `1 - pbinom(9, size = 44, prob = 1/10)`. ↩
Here is Adburgham's description of Thomas Lister's Granby (1826), a novel having the "essential facets" of the silver fork novel: "[T]here are some politics, some gambling scenes and a duel; there are dazzling balls in the London season, and country-house parties in the winter; the characters include a dandy, a toad-eater, a scheming high-society villain, a pair of lovers ill-starred until towards the end of the third volume. There are social climbers clambering towards Almack's, provincial belles at a race meeting ball in Doncaster Assembly Rooms; there is satire at the expense of the middle class and the rich roturiers. But above all, there are semi-flirtatious drawing-room conversations and dinner-table repartee." (92-3) ↩
Smaller inter-arrival periods are associated with clustering. For example, the period 1840-1850 witnesses the arrival of 10 novelistic genres: 1846 (3 genres), 1847, 1848, 1849, and 1850 (4 genres). The 9 inter-arrival periods are 0, 0, 1, 1, 1, 1, 0, 0, 0 years, so the average inter-arrival period is 4/9, or approximately 0.44. Had there been a genre appearing every year starting in 1841, the inter-arrival periods would all be 1 year, yielding an average of 9/9 = 1. ↩
Of all the ways to communicate the idea that the boundaries of "literature" are porous and that many objects can be productively understood as a "text." Consider, for example, Leo Spitzer's brilliant reading of a poster advertising Sunkist oranges. ↩