Foundational monographs & books
- Moretti, F. (2013). Distant Reading. Verso.
  Seminal short book collecting Moretti’s essays that popularized “distant reading” and network/graph approaches to literary history.
- Jockers, M. L. (2013). Macroanalysis: Digital Methods and Literary History. University of Illinois Press.
  Practical, method-focused treatment of large-scale literary computing and macroanalysis.
- Ramsay, S. (2011). Reading Machines: Toward an Algorithmic Criticism. University of Illinois Press.
  Reflexive, methodological argument for integrating computational procedures into literary interpretation.
- Manovich, L. (2020). Cultural Analytics. MIT Press.
  Presents cultural analytics methods, especially for image and other visual corpora, that overlap strongly with computational comparative work.
- Underwood, T. (2019). Distant Horizons: Digital Evidence and Literary Change. University of Chicago Press.
  Synthesis of interpretive argument and method; for further methodology and case studies, see his articles listed below and his public scholarship.
Influential papers
Aiden, E. L., & Michel, J.-B. (2013). Uncharted: Big Data as a Lens on Human Culture. Riverhead Books.
Algee-Hewitt, M., et al. (2016). Canon/archive: Large-scale dynamics in the literary field.
Baillot, A., et al. (2018). Scholarly digital editions and computational reuse.
Bamman, D., Underwood, T., & Smith, N. A. (2014). A Bayesian mixed effects model of literary character. In Proceedings of ACL 2014.
Bizzoni, Y., & Lappin, S. (2018). Modeling literary characters using distributional semantics.
Bizzoni, Y., et al. (2014). Character interaction modeling in European theatre.
Bode, K. (2012). Reading by Numbers: Recalibrating the Literary Field. Anthem Press.
Bode, K., & Dixon, R. (2009). Resourceful reading.
Borek, L., et al. (2016). Open science and FAIR data in European literary corpora.
Börner, I., et al. (2020). Linked open data and the DraCor API.
Brody, S., & Lapata, M. (2008). Unsupervised aspect modeling for literary texts.
Burnard, L., et al. (2020). ELTeC Encoding Guidelines (Level 0).
Casanova, P. (1999). La République mondiale des lettres. Seuil.
CLS INFRA Consortium. (2021–2025). Computational Literary Studies Infrastructure. European Union Horizon Programme.
COST Action CA16204. (2017–2022). Distant Reading for European Literary History.
Craig, H., & Kinney, A. (Eds.). (2009). Shakespeare, Computers, and the Mystery of Authorship. Cambridge University Press.
Damrosch, D. (2003). What Is World Literature? Princeton University Press.
DARIAH-EU. (2014–). Digital Research Infrastructure for the Arts and Humanities.
Digital Humanities Quarterly. (Various issues).
Digital Scholarship in the Humanities. (Various issues).
Eder, M. (2016). Rolling stylometry. Digital Scholarship in the Humanities, 31(3), 457–469.
Eder, M., Rybicki, J., & Kestemont, M. (2016). Stylometry with R: A package for computational text analysis. The R Journal, 8(1), 107–121.
Edmond, J., et al. (2020). Research infrastructures and digital humanities in Europe.
ELTeC Consortium. (2020–). European Literary Text Collection.
Evert, S., et al. (2017). Understanding topic models in literary research.
Fiormonte, D., et al. (2015). The Digital Humanist: A Critical Inquiry.
Fischer, F., et al. (2017). Le drame comme réseau de relations: L’analyse de réseau appliquée à l’histoire du théâtre. Revue d’historiographie du théâtre.
Fischer, F., et al. (2019). Programmable corpora: Introducing DraCor. In Proceedings of DH 2019.
Fischer, F., & Börner, I. (2021). Linked data modeling in DraCor.
Fokkens, A., et al. (2014). Computational modeling of narrative perspective.
Gerlach, M., & Font-Clos, F. (2019). A standardized Project Gutenberg corpus for quantitative literary studies.
Gold, M. K. (Ed.). (2012). Debates in the Digital Humanities. University of Minnesota Press.
Heuser, R., & Le-Khac, L. (2012). A quantitative literary history of 19th-century fiction.
Hoover, D. L. (2004). Testing Burrows’s Delta. Literary and Linguistic Computing.
Hoover, D. L. (2007). Corpus stylistics and authorship attribution.
Jacobs, A. M. (2015). Towards a neurocognitive poetics model.
Jannidis, F., et al. (2015). Improving Burrows’s Delta. Digital Humanities Quarterly.
Jannidis, F., Kohle, H., & Rehbein, M. (Eds.). (2017). Digital Humanities: Eine Einführung.
Juola, P. (2013). Authorship attribution. Foundations and Trends in Information Retrieval.
Kestemont, M. (2014). Function words in authorship attribution: From black magic to theory? In CLFL Workshop Proceedings.
Kestemont, M., et al. (2018). Cross-domain authorship attribution.
Koppel, M., Schler, J., & Argamon, S. (2009). Computational methods in authorship attribution.
Kuhn, J., et al. (2018). Computational linguistics and literary corpora.
Labatut, V., & Bost, X. (2019). Extraction and analysis of fictional character networks.
Liu, A. (2013). Where is cultural criticism in the digital humanities? In Debates in the Digital Humanities.
Luyckx, K., & Daelemans, W. (2011). Authorship attribution and verification.
Michel, J.-B., et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014), 176–182.
Moretti, F. (2000). Conjectures on world literature. New Left Review, 1, 54–68.
Odebrecht, C., et al. (2019). ELTeC: Building a European literary corpus. DH 2019.
Odijk, J. (2016). CLARIN infrastructure and literary research.
Patras, R., et al. (2022). Named entity linking in ELTeC corpora. In LDL Workshop (ACL).
Piper, A. (2015). Novel deviance and genre change.
Piper, A. (2018). Enumerations: Data and Literary Study. University of Chicago Press.
Plecháč, P. (2018). Czech verse stylometry.
Reiter, N. (2015). Automated analysis of plot structure.
Reiter, N., et al. (2017). Computational approaches to narrative modeling.
Rybicki, J. (2012). The great mystery of the (almost) invisible translator. Digital Humanities Quarterly.
Sahle, P. (2016). Digitale Editionswissenschaft.
Schöch, C. (2013). Topic modeling genre in French drama.
Schöch, C., & Eder, M. (Eds.). (2020–2022). The Distant Reading Compendium.
Schöch, C., et al. (2021). Creating the European Literary Text Collection (ELTeC): Challenges and perspectives. Modern Languages Open.
Schreibman, S., Siemens, R., & Unsworth, J. (Eds.). (2004/2016). A Companion to Digital Humanities.
Skorinkin, D., et al. (2021). Network analysis of European dramatic traditions.
Sprugnoli, R., & Tonelli, S. (2019). Event extraction in historical texts.
Tangherlini, T. (2013). Big data and folklore studies.
Terras, M., Nyhan, J., & Vanhoutte, E. (Eds.). (2013). Defining Digital Humanities.
Trilcke, P. (2013). Netzwerkanalyse dramatischer Texte.
Trilcke, P., et al. (2015). Social network analysis in dramatic texts. Digital Humanities Quarterly.
Underwood, T. (2016). The life cycles of genres. Journal of Cultural Analytics.
van Cranenburgh, A. (2018). Genre classification in Dutch novels.
van Zundert, J. (2012). If you build it, will we come? Large-scale digital infrastructures in DH.
Major projects, corpora & infrastructures (essential for comparative + computational work)
- ELTeC (European Literary Text Collection): balanced, comparable national subcorpora (TEI) built specifically for comparative, multilingual distant-reading research, with curated collections across many European languages.
- DraCor (The Drama Corpora Project): open infrastructure and API for >4,000 TEI-encoded dramatic texts (antiquity to the 20th century); emphasizes “programmable corpora.”
- PoeTree: open infrastructure for computational analysis of poems from several traditions.
- Eighteenth Century Collections Online (ECCO / ECCO-TCP): large archive of 18th-century English-language printed works; used in long-span literary-historical corpora.