A Vector Space Model for Koine Greek Lexicography: An Exploration in Linguistic Categorisation
Graduate Thesis/Dissertation   Open access

Nicholas List
Master of Arts - MA, University of Otago
University of Otago
2021
Handle:
https://hdl.handle.net/10523/10941

Abstract

Keywords: Koine Greek, lexicography, distributional semantics, categorisation, Word2Vec
Despite its long and well-documented history, Koine Greek lexicography has been slow to adopt techniques for lexical analysis that are truly grounded in modern linguistic theory and method. While the publication of Louw and Nida’s Greek-English Lexicon (1988) is often hailed as a linguistic breakthrough in this regard, promising a reassessment of Koine Greek in light of lexical field theory and componential analysis, major theoretical and methodological issues seriously undercut this lexicon’s claims to linguistic rigor. A number of recent advances in distributional semantics and Natural Language Processing (NLP) present promising new directions for lexicographical tasks. This thesis makes use of one such NLP tool, the vector space model Word2Vec (Mikolov et al., 2013). Word2Vec is an unsupervised learning algorithm that assigns vectors to word tokens based on the distributional profile of each token within a corpus. Model outputs are represented in vector space, and a cosine similarity metric can be used to compute similarity between words. This effectively operationalises Zellig Harris’ (1954) distributional hypothesis: the notion that words appearing in similar contexts will have similar meanings. I seek to demonstrate the utility of Word2Vec for Koine Greek lexicography, specifically for issues relating to linguistic categorisation. I show how categorisation based on corpus data cannot be intuited through a process of logical taxonomic delineation. Instead, vector space modelling shows how categorisation reflects prototypical encyclopaedic knowledge. Since Koine Greek is a dead language, with methods of introspection and elicitation unavailable to the lexicographer, vector space modelling offers a uniquely empirical basis for researching Koine Greek categorisation.
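The cosine similarity metric mentioned in the abstract can be illustrated with a minimal sketch. This is not code from the thesis: the function names, the placeholder labels (`word_a`, `word_b`, `word_c`), and the three-dimensional vectors are all assumptions invented for illustration, whereas a trained Word2Vec model would supply vectors with many more dimensions learned from a corpus.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(target, vectors):
    """Rank every other word by its cosine similarity to `target`, highest first."""
    scores = [
        (word, cosine_similarity(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, NOT taken from any trained model or from the thesis.
toy_vectors = {
    "word_a": [1.0, 2.0, 0.0],
    "word_b": [2.0, 4.0, 0.0],  # points in the same direction as word_a
    "word_c": [0.0, 0.0, 3.0],  # orthogonal to word_a
}

ranking = rank_by_similarity("word_a", toy_vectors)
```

In this toy setup `word_b` ranks above `word_c` for the target `word_a`, because cosine similarity depends only on the direction of the vectors and not on their length. Under the distributional hypothesis, words with similar contextual profiles would be expected to end up with similarly oriented vectors.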
