A Vector Space Model for Koine Greek Lexicography: An Exploration in Linguistic Categorisation
Despite its long and well-documented history, Koine Greek lexicography has been slow to adopt techniques for lexical analysis that are truly grounded in modern linguistic theory and method. While the publication of Louw and Nida’s Greek-English Lexicon (1988) is often hailed as a linguistic breakthrough in this regard, promising a reassessment of Koine Greek in light of lexical field theory and componential analysis, major theoretical and methodological issues seriously undercut this lexicon’s claims to linguistic rigour. A number of recent advances in distributional semantics and Natural Language Processing (NLP) present promising new directions for lexicographical tasks. This thesis makes use of one such NLP tool, the vector space model Word2Vec (Mikolov et al., 2013). Word2Vec is an unsupervised learning algorithm that assigns vectors to word tokens based on the distributional profile of each token within a corpus. Model outputs are represented in vector space, and a cosine similarity metric can be used to compute similarity between words. This effectively operationalises Zellig Harris’ (1954) distributional hypothesis: the notion that words appearing in similar contexts will have similar meanings. I seek to demonstrate the utility of Word2Vec for Koine Greek lexicography, specifically for issues relating to linguistic categorisation. I show that categorisation based on corpus data cannot be intuited through a process of logical taxonomic delineation. Instead, vector space modelling shows how categorisation reflects prototypical encyclopaedic knowledge. Since Koine Greek is a dead language, with methods of introspection and elicitation unavailable to the lexicographer, vector space modelling offers a uniquely empirical basis for researching Koine Greek categorisation.
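The cosine similarity comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's own code: the three-dimensional vectors below are invented purely for illustration (real Word2Vec vectors are learned from a corpus and typically have hundreds of dimensions), and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, made up for this example only.
# They are NOT outputs of a trained model.
toy_vectors = {
    "ἄρτος": [0.9, 0.1, 0.2],   # 'bread'
    "οἶνος": [0.8, 0.2, 0.3],   # 'wine'
    "πλοῖον": [0.1, 0.9, 0.4],  # 'boat'
}


def most_similar(word, vectors):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("ἄρτος", toy_vectors):
        print(f"{other}\t{score:.3f}")
```

With these invented vectors, the word whose vector points in nearly the same direction as the query ranks first. In a trained model, such a ranking is what would reflect shared distributional contexts.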
Advisor: Overall, Simon; Marcar, Katie
Degree Name: Master of Arts
Degree Discipline: English and Linguistics
Publisher: University of Otago
Keywords: Koine Greek; lexicography; distributional semantics; categorisation; Word2Vec
Research Type: Thesis