Data mining as literary criticism

December 18, 2009 by

At THATCamp, I will be displaying Distant Readings I (text visualization, 2009), an installation that explores the aesthetics of data mining. “Distant reading” is the term coined by the literary critic Franco Moretti to mean the opposite of “close reading,” which is the tightly focused analysis of the elements of a single text. By “distant reading,” Moretti means identifying, and visually depicting, larger patterns within a text as well as patterns across a large group of texts.

“Distant reading” is a kind of data mining, if we understand the latter term to mean extracting patterns from data. For my installation, I will use Wordle to “read” a number of classic texts as word clouds; in arranging the words of a text according to their frequency of occurrence, we will be reading in another manner. The installation will juxtapose several such data-mined texts so that we can see, at a distance, patterns in these texts that we otherwise could not see or read, and so extract new levels of meaning from them by reading in a distant fashion.
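For readers curious about the counting that sits behind a word cloud, here is a minimal Python sketch of the idea. It is not Wordle's actual code: the function name `word_frequencies` and the tiny stop-word list are assumptions made purely for illustration, and a real tool would likely use a much longer list and a more careful tokenizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch; it is not the list Wordle uses).
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "of", "to", "in", "it", "was", "is", "that"}
)


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word occurs, ignoring case and stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


sample = "It was the best of times, it was the worst of times"
freqs = word_frequencies(sample)

# The most frequent words are the ones a word cloud would draw largest.
for word, count in freqs.most_common(3):
    print(word, count)
```

Running the same function over several texts and setting the top words side by side is, roughly, the juxtaposition the installation performs visually.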

I will ask all readers/viewers of the installation: what patterns do you see in the data?  Is this reading?  I will be eager for your feedback after you have read the installation.

8 Responses to Data mining as literary criticism

  1. jamesdcalder on December 18, 2009 at 6:25 pm

    Very cool! This is an interesting combination of literary theory, art, and digital humanities. I’d also like to learn more about Franco Moretti. When did he write, and do you think he would have seen something like this as fitting into his theoretical framework?

  2. brooke on December 18, 2009 at 8:44 pm

    hi dave! it will be interesting to see what you came up with this time!

  3. David Staley on December 18, 2009 at 9:44 pm

    Hi Brooke,

    Hey, that’s right, you are a thoughtful Wordle user as well! Hope I’m able to match your standards…


  4. Erin Bell on December 22, 2009 at 11:36 am

    Hi David, this sounds pretty cool. I actually had a similar idea for using Wordles to analyze content across each of the national and regional THATCamp websites to see what common/unique themes emerge. I’m a big fan of Wordle and think it’s a great format for visualizing textual patterns.

    I have not read any of Moretti’s writing (though I plan on it now), but from what I can tell, his early work in this area was met with some gasps of disbelief by literary academics who disapproved of this scientific/quantitative approach to the classics. To my mind (not being a literature expert), this kind of analysis seems like a breakthrough that opens up a whole new area of inquiry in a field that, in my estimation, seems to have been more or less static over centuries (not that ideas and approaches in the “traditional” study of literature haven’t changed, only that this seems hugely different).

    Considering some of the academic backlash Moretti received (along with popular praise), I think we can begin to think about Digital Humanities as a field that is kind of “in between”. (I’m arbitrarily associating Moretti with Digital Humanities because I think it’s instructive). To what extent does pursuing this kind of work put you at odds with the “traditional” view of the humanities? How does Digital Humanities — largely still a self-proclaimed, rather than accredited field — fill in a gap within the humanities where things like visualization, textual analysis, and other kinds of quantitative and technical approaches may be met with indifference, skepticism, or hostility? Does Digital Humanities have a broader audience than traditional scholarship or just a more dispersed one? Is it a scholarly audience or public or both?

    This comment got a bit out of hand and is based on very limited information about Moretti. Let that serve as a disclaimer to any bad ideas on my part, but also an indication of your excellent choice of topic!

    Looking forward to another great installation. Let Jim or me know how you would like to set this up (equipment, placement, etc.).

  5. Lewis Ulman on January 1, 2010 at 9:57 am

    Hi, Dave. Can’t wait to hear more! In my electronic textual editing course this winter, my grad students and I will be working with Laura Mandell (Miami U of Ohio — see her session for THATCamp) on text visualizations. Care to join in the fun?

  6. Boone Gorges on January 12, 2010 at 1:45 pm

    Sounds like a cool idea for a session. Like some of the other commenters, I’m a fan of Wordle. I’m a bit skeptical about using it as a jumping off point for critical analysis of texts, as I think that word frequency is somewhat of a superficial metric for finding or imposing deeper meaning on something. But I would love to hear arguments to the contrary. It’d be especially interesting to talk about the extent to which these kinds of visualization strategies can and cannot be applied to texts from different domains (fiction, blog posts, essays, etc).

  7. skuceyeski on January 12, 2010 at 6:14 pm

    Hi Dave! Looking forward to seeing you again and hearing all about this project. I think it has real applications in some of our TAH grants, although it may really blow some of our teachers’ minds!

  8. laura mandell on January 12, 2010 at 10:21 pm


    This is a great panel proposal.

    Also, I have used both wordle and juXta on the text of Frankenstein as a way of combining close reading with distant reading.

    I put up a panel about my poetry visualizations, but maybe we could combine?

    Best, Laura
