Extracting Features and Sentiment from Text Posts and Comments Relating to Polycystic Ovary Syndrome

Rebecca H.K. Emanuel; Paul D. Docherty; Helen Lunt; Rebecca E. Campbell; Knut Moeller

doi:10.1016/j.ifacol.2024.11.005

Journal article

Open access

Peer reviewed

IFAC-PapersOnLine, Vol.58(24), pp.19-24

2024

DOI: https://doi.org/10.1016/j.ifacol.2024.11.005

Handle: https://hdl.handle.net/10523/43764

Keywords: artificial intelligence; convolutional neural networks; machine learning; polycystic ovary syndrome; social media research; text classification; women's health

Abstract

The PCOS subreddit is a cache of posts and comments detailing people's experiences with polycystic ovary syndrome (PCOS). This paper details an ensemble machine learning approach to extract feature and sentiment information relating to PCOS from the subreddit. Ensemble classifiers, which utilized CNNs, keyword searches, and Bayesian theory, were created. Outputs from the individual components of each ensemble classifier were weighted by their specificities or sensitivities on the testing dataset and summed. Thresholds were calculated using probability theory to decide how high an output needed to be for a feature to be deemed present in the input text. The machine learning output labels were randomly sampled for each feature to calculate precision. Overall, most features of interest could be identified with suitably high precision. Over 100 different features were identified among the users, leading to hundreds of thousands of feature labels in the user dataset. Sentiment classification CNNs were also created and typically performed with high accuracy on the testing datasets. A complete dataset of approximately 100,000 PCOS subreddit users, the list of features they presented with, and the sentiments they expressed, was created. This large and detailed dataset has significant clinical potential.
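
The weighting, thresholding, and sampled-precision steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact weighting rule or the threshold derivation, so the sketch assumes binary component votes, a plain weighted sum, and a threshold supplied by the caller. All function names, weights, and example values are hypothetical.

```python
import random


def ensemble_score(votes, weights):
    """Weighted sum of component outputs.

    Assumption: each vote is 0 or 1, and each weight is that component's
    specificity or sensitivity measured on a testing dataset.
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    return sum(v * w for v, w in zip(votes, weights))


def feature_present(votes, weights, threshold):
    """Deem the feature present when the weighted sum reaches the threshold.

    Assumption: the threshold is passed in; the paper derives its thresholds
    from probability theory, which is not reproduced here.
    """
    return ensemble_score(votes, weights) >= threshold


def sampled_precision(labelled_items, is_correct, sample_size, seed=0):
    """Estimate precision from a random sample of positively labelled items.

    `is_correct` stands in for a manual check of each sampled label.
    """
    rng = random.Random(seed)
    sample = rng.sample(labelled_items, min(sample_size, len(labelled_items)))
    if not sample:
        return 0.0
    return sum(1 for item in sample if is_correct(item)) / len(sample)


# Hypothetical example: a CNN, a keyword search, and a Bayesian component.
votes = [1, 0, 1]
weights = [0.9, 0.8, 0.7]
print(feature_present(votes, weights, threshold=1.5))
```

Weighting each component by its measured performance lets a more reliable component contribute more to the decision than a noisier one, while the sampled check gives a precision estimate without reviewing every label.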

Details

Record Identifier: 9926653716501891
Title: Extracting Features and Sentiment from Text Posts and Comments Relating to Polycystic Ovary Syndrome
Creators: Rebecca H.K. Emanuel; Paul D. Docherty; Helen Lunt; Rebecca E. Campbell; Knut Moeller
Publication Details: IFAC-PapersOnLine, Vol.58(24), pp.19-24
Academic Unit: Medicine (UOC); Physiology
Publisher: Elsevier Ltd
Date published ; e-published: 2024
Copyright: Copyright © 2024 The Author(s). This work was first published in IFAC-PapersOnLine (Elsevier). This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial use, distribution and reproduction in any medium, provided the original work is properly attributed to the creator(s) and the source, is not altered, transformed, or built upon in any way, and a link to the Creative Commons license is provided.
Language: English
Resource Type ; Subtype: Journal article