Usenet newsgroups’ profile analysis utilising standard and non-standard statistical methods
Sallis, Philip; Kassabova, Diana
The paper explores building profiles of newsgroups from a corpus of Usenet e-mail messages, employing some standard statistical techniques as well as fuzzy clustering methods. A large set of data from a number of newsgroups has been analysed to elicit some text attributes, such as number of words, length of sentences and other stylistic characteristics. Readability scores have also been obtained by using recognised assessment methods. These text attributes were used for building the newsgroups’ profiles. Three newsgroups, each with a similar number of messages, were selected from the processed sample for the analysis of two types of one-dimensional profiles, one by length of texts and the second by readability scores. Those profiles are compared with corresponding profiles of the whole sample and also with those of a group of frequent participants in the newsgroups. Fuzzy clustering is used for creating two-dimensional profiles of the same groups. An attempt is made to identify the newsgroups by defining centres of data clusters. It is contended that this approach to newsgroup profile analysis could facilitate a better understanding of computer-mediated communication (CMC) on the Usenet, which is a growing medium of informal business and personal correspondence.
Publisher: University of Otago
Series number: 97/11
Research Type: Discussion Paper
Please note that this is a searchable PDF derived via optical character recognition (OCR) from the original source document. As the OCR process is never 100% perfect, there may be some discrepancies between the document image and the underlying text.