December 16, 2009

Mining Ancient Texts

Researchers will create a database to track topics in a million primary and secondary Greco-Roman documents

By Suzanne McInroy

A Tufts classicist will lead a team of international researchers in exploring how scholars in the humanities can use data analysis to track topics about the Greco-Roman world as they appear in a million documents, spanning thousands of years. Gregory Crane, a professor and chair of classics in the School of Arts and Sciences, will be joined by researchers from three other universities on the project, which will be funded by one of eight recently awarded Digging into Data Challenge grants.

Using existing collections of Greek and Latin texts, the researchers will create a system that uses data mining (computerized analysis that searches large amounts of data) to detect patterns and other connections in the primary and secondary sources. Researchers will be able to link different editions and translations of classical texts to identify differences between them as well as possible errors in translation. In addition, they plan to create a database that records when Greek and Latin authors or particular works are mentioned, quoted or alluded to in other sources spanning more than 2,000 years.

“This Digging into Data work will pave the way for undergraduate research,” says Crane, the Winnick Family Chair of Technology and Entrepreneurship at Tufts. “Students have access to tons of materials, but this project will provide tools that will enable a new generation of undergraduate research in the classics and the humanities.”

“This database will allow young scholars to locate patterns in primary and secondary sources and then add their own critical thinking to go beyond the visualizations to determine what the patterns mean,” he says.

Crane is also the developer and editor-in-chief of the Perseus Project, a separate electronic database on Archaic and Classical Greece that was designed to expand the ways in which ancient Greek literature, history, art and archaeology can be studied.

Other researchers involved in the Digging into Data project include John Darlington from Imperial College in London, Bruce Robertson from Mount Allison University in Canada, and David A. Smith and David Mimno from the University of Massachusetts, Amherst.

The Digging into Data Challenge is an international grant competition sponsored by four leading research agencies in three countries: the Joint Information Systems Committee in the United Kingdom; the National Endowment for the Humanities and the National Science Foundation in the United States; and the Social Sciences and Humanities Research Council in Canada.

Applicants were asked to answer the question, “What do you do with a million books?” The grant recipients were announced on December 3. The eight winning teams were selected from applications submitted by 22 scholars and scientists from the U.S., Canada and the United Kingdom. Each team includes researchers from at least two of the participating countries. With their awards, the teams will demonstrate how data mining and data analysis tools now used in the sciences can improve scholarship in the humanities and social sciences. Total funding from all four agencies across all eight projects is approximately $2 million; the Tufts team will receive $300,000.

Suzanne McInroy can be reached at
