Symposia
Research Methods and Statistics
Katherine Schaumberg, Ph.D. (she/her/hers)
Assistant Professor
University of Wisconsin-Madison
Madison, Wisconsin
Multi-item measures are common across social and psychological research, including self-report scales, diagnostic interviews, and neuropsychological testing. These measures require substantial phenotypic data cleaning and aggregation, such as recoding, renaming, managing missing data, creating mean and sum scores, and applying algorithmic scoring procedures. Analysts often report that a majority of their time is spent on data cleaning, yet open sharing of data cleaning code and procedures remains uncommon. Because many labs and researchers across the field rely on similar measures, documented, reproducible scoring of multi-item assessments can facilitate open science (OS) and aid collaboration. Organized scoring procedures allow others to reproduce derived variables easily and provide traceable metadata for those variables. The current presentation will discuss Scorekeeper, a recently developed R package available on GitHub, which aims to (1) make documented, reproducible scoring approachable, (2) harmonize coding practices across analysts, and (3) simplify data cleaning. Scorekeeper is designed for those with little coding experience beyond basic familiarity with R. I will discuss the implementation of Scorekeeper to manage data from various sources, including epidemiological data, multi-modal assessment data, and longitudinal clinical data. I will review the uptake and implementation of Scorekeeper in an undergraduate-focused lab context, offer an example of how Scorekeeper can be used to develop a useful and shareable data dictionary, and discuss how code developed with Scorekeeper can be implemented across the field. The presentation will include a brief live vignette developing a reproducible scoring algorithm and will introduce the Scorekeeper package as a way for researchers to implement OS practices and improve the readability of their datasets while reducing, rather than expanding, time spent on data cleaning.
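To make the scoring steps mentioned above concrete, the following is a minimal R sketch of a documented, reproducible workflow: recoding a missing-value code, reverse-scoring an item, and deriving a sum score. It is written in plain dplyr for illustration only and does not show the Scorekeeper interface; all variable names (item_1 through item_4, the -999 missing code, the 1-4 response scale) are hypothetical.

```r
library(dplyr)

# Hypothetical raw responses to a four-item measure
raw <- tibble::tibble(
  id     = 1:3,
  item_1 = c(2, 3, -999),   # -999 = missing-value code in this example
  item_2 = c(1, 4, 2),
  item_3 = c(4, -999, 1),   # reverse-scored item on a 1-4 scale
  item_4 = c(3, 2, 2)
)

scored <- raw |>
  # Step 1: recode the missing-value code to NA
  mutate(across(starts_with("item_"), ~ na_if(.x, -999))) |>
  # Step 2: reverse-score item_3 (1-4 scale: reversed value = 5 - original)
  mutate(item_3 = 5 - item_3) |>
  # Step 3: derive a sum score only when all items are present
  mutate(total = if_else(
    rowSums(is.na(across(starts_with("item_")))) == 0,
    rowSums(across(starts_with("item_"))),
    NA_real_
  ))

scored
```

Keeping each step explicit and commented in this way is what makes the derived variables traceable; a package such as Scorekeeper aims to standardize and document steps like these so they can be shared and reused across analysts.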