Brooke McGinley¹, Gabriela Rabasa², Tuhina Neogi³, David Felson² and Michael LaValley⁴, ¹Boston University School of Public Health, Boston, MA, ²Boston University, Boston, MA, ³Boston University School of Medicine, Boston, MA, ⁴Boston University School of Public Health, Arlington, MA
Background/Purpose: Knee replacement surgery (KR) is common in the US, with an estimated lifetime risk from age 25 of 7% for males and 9.5% for females, and it accounts for substantial health care costs. Knee osteoarthritis is increasingly considered to consist of different phenotypes that often lead to the same end-stage outcome, KR. To explore how these phenotypes might be defined, we used an unsupervised machine learning approach to cluster KR patients into subgroups.
Methods: We used data from the Osteoarthritis Initiative (OAI) for analysis. We identified all OAI participants undergoing any KR during the 9 years of follow-up and extracted data on demographic and clinical characteristics from the exam before their first KR. Data collected included age, sex, race and ethnicity, weight, height, WOMAC scales, PASE score, Kellgren & Lawrence grade, education, CESD depression scale, insurance status, Heberden's nodes, history of knee injury or surgery, NSAID use, opioid use, and Charlson comorbidity index. Principal components analysis (PCA) was used to identify dimensions, based on combinations of characteristics, that account for most of the variance in the data. K-means clustering was used to assemble KR patients into the most homogeneous subgroups possible. The clustering was evaluated based on the sizes and clinical meaningfulness of the resulting subgroups.
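The two analysis steps described in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the abstract does not state which software was used, how categorical variables were encoded, or how k-means was initialized. The sketch assumes all variables are numeric and standardized, and it uses a deterministic farthest-first initialization purely for reproducibility. The function names `standardize`, `pca`, and `kmeans` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X - mu) / sd

def pca(X, n_components):
    """PCA via SVD of the standardized data.

    Returns the component scores and the share of variance
    explained by each retained component.
    """
    Z = standardize(X)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var_ratio = (S ** 2) / np.sum(S ** 2)
    scores = Z @ Vt[:n_components].T
    return scores, var_ratio[:n_components]

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        # distance from every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for the patient-by-characteristic matrix (NOT OAI data):
# two groups of 50 "patients" measured on 6 numeric characteristics.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 6)),
    rng.normal(5.0, 1.0, size=(50, 6)),
])

scores, var_ratio = pca(X, n_components=3)
labels, centers = kmeans(scores, k=2)
print("variance explained by retained components:", var_ratio.sum())
print("cluster sizes:", np.bincount(labels))
```

Clustering on the retained component scores rather than on the raw variables is one common design choice; the abstract does not say which of the two the authors did.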
Results: We identified 414 KR subjects. PCA indicated that up to 14 dimensions could be used, capturing 72% of the variance. The first dimension (Dim1) loaded on anthropometric characteristics (current weight, self-reported maximum and minimum weights, height, sex, and body mass index (BMI)), Dim2 on WOMAC scales (disability, pain, and stiffness), and Dim3 on Kellgren & Lawrence grade, NSAID use, and opioid use. For the k-means clustering, average silhouette width suggested between 2 and 6 clusters, although the gap statistic suggested as many as 12. We selected the 4-cluster result for presentation because it maintained at least 25 subjects per group and resulted in interpretable clusters. Table 1 shows key characteristics by cluster membership for the 4-cluster result. Cluster 1 is mainly older women with low BMI and Heberden's nodes; cluster 2 is mostly men with greater average weight, height, and physical activity than the other clusters; cluster 3 is a smaller cluster with a mixture of sexes, a history of knee injury or surgery, and low use of opioids; cluster 4 is the most racially diverse, consisting of relatively younger women with high BMI, pain, disability, and depression scores.
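Choosing the number of clusters by average silhouette width, as mentioned in the Results, can be illustrated with the sketch below. It is a hedged example on synthetic data, not a reproduction of the reported analysis: the helper names `kmeans` and `mean_silhouette` are hypothetical, the k-means initialization is an assumed farthest-first scheme, and the gap statistic is not implemented here. For each point, the silhouette is (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def mean_silhouette(X, labels):
    """Average silhouette width over all points (Euclidean distance)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ids = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: silhouette is 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # excludes the point itself
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic data with three well-separated groups (NOT OAI data).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 1.0, size=(30, 2)) for c in blob_centers])

# Average silhouette width for each candidate number of clusters.
widths = {k: mean_silhouette(X, kmeans(X, k)[0]) for k in range(2, 7)}
best_k = max(widths, key=widths.get)
for k, w in widths.items():
    print(k, round(w, 3))
print("k with the largest average silhouette width:", best_k)
```

On real clinical data the silhouette curve is often much flatter than in this toy example, which is consistent with the range of 2 to 6 clusters reported above and with the use of cluster size and interpretability as additional selection criteria.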
Conclusion: We found clear clusters along dimensions of sex, age, history of knee injury or surgery, and race, with differences in activity, pain, disability, and depression among clusters. These clusters will be further validated and refined using KR subjects from the Multicenter Osteoarthritis Study (MOST).
Table 1. Cells in the table give cluster-specific means for continuous variables and percentages for categorical variables.
The 4 clusters plotted against principal components dimension 1 (X-axis) and dimension 2 (Y-axis). Clusters 1, 2, and 3 show good separation in these dimensions.
The 4 clusters plotted against principal components dimension 1 (X-axis) and dimension 3 (Y-axis). Cluster 4 shows good separation from the other clusters using dimension 3.
Disclosures: B. McGinley, None; G. Rabasa, None; T. Neogi, Novartis, Pfizer/Lilly, Regeneron; D. Felson, None; M. LaValley, None.