Symposia
Research Methods and Statistics
Katherine Schaumberg, Ph.D. (she/her/hers)
Assistant Professor
University of Wisconsin-Madison
Madison, Wisconsin
Multi-item measures are common across social and psychological research, including self-report scales, diagnostic interviews, and neuropsychological testing. These measures require substantial phenotypic data cleaning and aggregation, such as recoding, renaming, managing missing data, creating mean and sum scores, and applying algorithmic scoring procedures. Analysts often report that a majority of their time is spent on data cleaning, yet open sharing of data cleaning code and procedures remains uncommon. Because many labs and researchers across the field rely on similar measures, documented, reproducible scoring of multi-item assessments can facilitate open science (OS) and aid collaboration. Organized scoring procedures allow others to reproduce derived variables easily and provide traceable metadata for those variables. The current presentation will discuss Scorekeeper, a recently developed R package available on GitHub, which aims to (1) make documented, reproducible scoring approachable, (2) harmonize coding practices across analysts, and (3) simplify data cleaning. Scorekeeper is designed for those with little coding experience beyond basic familiarity with R. I will discuss the implementation of Scorekeeper to manage data from various sources, including epidemiological data, multi-modal assessment data, and longitudinal clinical data. I will review the uptake and implementation of Scorekeeper in an undergraduate-focused lab context, offer an example of how Scorekeeper can be used to develop a useful and shareable data dictionary, and discuss how code developed with Scorekeeper can be implemented across the field. The presentation will include a brief live vignette developing a reproducible scoring algorithm and will introduce the Scorekeeper package as a way for researchers to implement OS practices and improve the readability of their datasets while reducing, rather than expanding, time spent on data cleaning.
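To make the scoring steps mentioned above concrete, the following is a minimal R sketch of a documented, reproducible workflow: recoding a missing-value code, reverse-scoring an item, and deriving a sum score. It is written in plain dplyr for illustration only and does not show the Scorekeeper interface; all variable names (item_1 through item_4, the -999 missing code, the 1-4 response scale) are hypothetical.

```r
library(dplyr)

# Hypothetical raw responses to a four-item measure
raw <- tibble::tibble(
  id     = 1:3,
  item_1 = c(2, 3, -999),   # -999 = missing-value code in this example
  item_2 = c(1, 4, 2),
  item_3 = c(4, -999, 1),   # reverse-scored item on a 1-4 scale
  item_4 = c(3, 2, 2)
)

scored <- raw |>
  # Step 1: recode the missing-value code to NA
  mutate(across(starts_with("item_"), ~ na_if(.x, -999))) |>
  # Step 2: reverse-score item_3 (1-4 scale: reversed value = 5 - original)
  mutate(item_3 = 5 - item_3) |>
  # Step 3: derive a sum score only when all items are present
  mutate(total = if_else(
    rowSums(is.na(across(starts_with("item_")))) == 0,
    rowSums(across(starts_with("item_"))),
    NA_real_
  ))

scored
```

Keeping each step explicit and commented in this way is what makes the derived variables traceable; a package such as Scorekeeper aims to standardize and document steps like these so they can be shared and reused across analysts.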