Tools and bibliography

Links to tools

Corpus Linguistics MOOC run by Lancaster University (next session in September)

GraphColl – free downloadable tool for creating visual representations of collocational networks from corpora

KWords (V. Cvrček and P. Vondřička) – a web-based keyword analysis tool for Czech and English.

SYN2015 – a representative corpus of written Czech. The Institute of the Czech National Corpus, Charles University in Prague.

Laurence Anthony’s website – free downloads of software for corpus analysis and text/file manipulation, including AntConc, AntPConc, FireAnt and ProtAnt


Paper Machines

Sketch Engine – online tool for analysing large corpora including clever ways of looking at grammatical relationships within collocates

Straka, M. and J. Straková. MorphoDiTa: Morphological Dictionary and Tagger, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague.

WordSmith – the classic corpus analysis software by Mike Scott

Bibliography


Baker, P. and McEnery, T. (eds) (2015) Corpora and Discourse Studies: Integrating Discourse and Corpora. London: Palgrave.

Baker, P. (2014) Using Corpora to Analyse Gender. London: Bloomsbury.

Baker, P., Gabrielatos, C. and McEnery, T. (2013) Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press. Cambridge: Cambridge University Press.

Baker, P. (2010) Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press.

Baker, P. (ed.) (2009) Contemporary Corpus Linguistics. London: Continuum.

Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.

Goldstone, Andrew and Ted Underwood. "What can topic models of _PMLA_ teach us about the history of literary scholarship?" The Stone and the Shell.

Graham, Shawn, Scott Weingart, and Ian Milligan. "Getting Started with Topic Modeling and MALLET." Programming Historian.

Hunston, S. (2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Journal of Digital Humanities. (2012) 2(1) (the entire issue is devoted to topic modeling)

McEnery, T., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies.

Partington, A., Duguid, A. and Taylor, C. (2013) Patterns and Meanings in Discourse: Theory and Practice in Corpus-Assisted Discourse Studies (CADS). Amsterdam: John Benjamins.

Rhody, Lisa M. "Some Assembly Required: Understanding and Interpreting Topics in LDA Models of Figurative Language." Lisa Marie Rhody.

Stubbs, M. (1996) Text and Corpus Analysis. Oxford: Blackwell.

Underwood, Ted. "Topic modeling made just simple enough." The Stone and the Shell.

Weingart, Scott. "Topic Modeling for Humanists: A Guided Tour." The scottbot irregular.