Trauma and Stressor-Related Disorders and Disasters
Examining Thematic Elements and Sentiment Differences in Trauma-Oriented Subreddits Using Machine Learning
Katherine Wislocki, B.A.
Graduate Student
University of California, Irvine
Irvine, California
Alyson Zalta, Ph.D.
Associate Professor
University of California, Irvine
Irvine, California
Previous work has illustrated the use of machine learning to analyze sentiment and prevalent themes in online communities dedicated to mental health phenomena. This research has demonstrated distinct textual and thematic features in unstructured text related to psychological wellbeing and overall health. Less work has focused on communities of individuals exposed to traumatic events. The diagnostic similarity between PTSD and complex PTSD (C-PTSD), and the conceptual debate surrounding the two, have led to research focused on differentiating between these conditions. Identifying differences in communication within online communities dedicated to each condition can provide further insight into their distinguishing features and mechanisms. This is particularly relevant during COVID-19, as limited research has compared the impact of the pandemic on individuals with PTSD and individuals with C-PTSD. A total of 149,679 Reddit posts from two trauma-oriented subreddits, r/ptsd (N = 27,407) and r/CPTSD (N = 122,272), posted from November 2018 to January 2022, were collected, processed, and assessed. Themes were identified using k-means clustering. Similar qualitative themes emerged across subreddits and time periods, relating to interpersonal relationships, trauma, communication, general pathology, contextual descriptions, coping, and symptom information.
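The abstract names k-means clustering but does not state the text representation, the number of clusters, or the preprocessing used. The following is a minimal sketch under assumed choices: bag-of-words vectors scaled to unit length, Lloyd's algorithm written out with the standard library, and each theme read off from its centroid's highest-weighted terms. All function names and the toy posts are hypothetical and are not drawn from the study; at this scale a library implementation (for example scikit-learn's `KMeans` over TF-IDF or embedding features) would be the more plausible choice.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    # A real pipeline would also strip punctuation, drop stop words, etc.
    return text.lower().split()

def build_vectors(docs):
    """Turn each document into an L2-normalized bag-of-words vector."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(tokenize(d)).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors, vocab

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(centroid, vocab, n=3):
    """Highest-weighted vocabulary terms in a centroid, used to label a theme."""
    ranked = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]

# Toy posts for illustration only; these are not data from the study.
posts = [
    "therapist coping skills",
    "therapist coping skills help",
    "flashback nightmare trauma",
    "flashback nightmare trauma again",
]
vectors, vocab = build_vectors(posts)
labels, centroids = kmeans(vectors, k=2)
for j in sorted(set(labels)):
    print(j, top_terms(centroids[j], vocab))
```

Because k-means depends on its initialization, the seed is fixed here; with real data one would typically try several initializations and compare criteria such as inertia or silhouette scores when choosing k, then interpret each cluster qualitatively from its top terms and representative posts.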
Sentiment was assessed using rule-based and automated models. To account for potential changes due to the COVID-19 pandemic, differences in compound sentiment scores and in sentiment classifications within and across subreddits were examined using independent t-tests and chi-square tests, respectively. Given the large sample sizes, effect sizes are prioritized in the analysis. Within each subreddit, differences across time periods were negligible to small for both the rule-based (Hedges' g = .006–.05) and automated models (Cramér's V = .02–.03), suggesting that subreddit sentiment changed minimally as a function of COVID-19. Between subreddits, significant differences in the magnitude and frequency of negative sentiment were found both before (p < .0001; Hedges' g = .13–.27, Cramér's V = .09) and during (p < .0001; Hedges' g = .08–.21, Cramér's V = .04) COVID-19, with effect sizes varying by model. Initial findings suggest that sentiment differs minimally between r/ptsd and r/CPTSD, with the models showing slight, varying increases in both the magnitude of negative sentiment and the frequency of posts classified as negative in r/ptsd compared to r/CPTSD. Implications and future directions will be discussed.
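The abstract reports Hedges' g for the t-test comparisons of compound scores and Cramér's V for the chi-square comparisons of sentiment classes, but does not show how they were computed. The sketch below applies the standard textbook formulas using only the Python standard library; the function names are hypothetical and the inputs are toy numbers, not study data. It omits p-values, which would require a t or chi-square distribution (for example from SciPy).

```python
import math
from statistics import mean, variance

def hedges_g(a, b):
    """Bias-corrected standardized mean difference between two independent samples."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation from the two sample variances (n - 1 denominators)
    pooled_sd = math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / pooled_sd  # Cohen's d
    # Small-sample correction factor J; approaches 1 as the samples grow
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction

def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V: chi-square scaled by sample size and the smaller table dimension."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))

# Toy inputs for illustration only; these are not data from the study.
g = hedges_g([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
v = cramers_v([[10, 20], [20, 10]])
print(round(g, 3), round(v, 3))  # -0.571 0.333
```

For a comparison like the one described, each row of the table would be a subreddit (or time period) and each column a sentiment class such as negative, neutral, and positive; with two rows, min(r, c) - 1 = 1. With tens of thousands of posts per group, J is essentially 1 and even tiny mean differences reach p < .0001, which is why the effect sizes are the more informative quantity here.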