Despite how much fun manual data entry can be (joking!), automation is more effective and efficient for recurring data analyses, especially for catching minor inconsistencies, bulk editing, and adjusting formatting. This raises the question: how can institutions automate and standardize the way they clean data? One solution is to harness the power of OpenRefine, an open-source application for data cleanup and transformation (a.k.a. “data wrangling”). OpenRefine handles tabular data much like Excel but offers additional, more robust capabilities; we will discuss the benefits of using OpenRefine over Excel for common data wrangling needs. UNCW cataloging librarians exported circulation data for a data-driven collection development project aimed at understanding how the library’s collections are being used and at meeting specific institutional budget and audience needs. Using this (messy!) data as an example, UNCW’s Research Data Librarian demonstrates OpenRefine and suggests how it can be applied to your own projects.
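To give a sense of what this kind of automation replaces, here is a minimal sketch, in Python with pandas, of a few cleanup steps comparable to common OpenRefine operations (trimming whitespace, normalizing case, coercing numbers, removing duplicates). The column names and file name are hypothetical stand-ins, not the actual UNCW circulation export, and the code is an illustration rather than the session's method.

```python
# Hypothetical sketch of recurring cleanup steps, expressed in pandas for comparison.
# Column names ("Title", "Checkouts") and the file name are illustrative only.
import pandas as pd


def clean_circulation_export(path: str) -> pd.DataFrame:
    df = pd.read_csv(path, dtype=str)

    # Trim stray whitespace and collapse internal runs of spaces
    # (comparable to OpenRefine's trim/collapse whitespace transforms).
    text_cols = df.select_dtypes(include="object").columns
    for col in text_cols:
        df[col] = (
            df[col]
            .str.strip()
            .str.replace(r"\s+", " ", regex=True)
        )

    # Normalize casing so near-duplicate values line up
    # (roughly what OpenRefine's text facets and clustering help surface).
    df["Title"] = df["Title"].str.title()

    # Coerce checkout counts to numbers; unparseable values become NaN for review.
    df["Checkouts"] = pd.to_numeric(df["Checkouts"], errors="coerce")

    # Drop exact duplicate rows left over from a merged export.
    return df.drop_duplicates()


# Example usage with a hypothetical file name:
# cleaned = clean_circulation_export("circulation_export.csv")
# cleaned.to_csv("circulation_cleaned.csv", index=False)
```

In OpenRefine, the equivalent steps are applied through the interface, and the project's operation history can be exported and reapplied to the next export, which is what makes the tool well suited to recurring cleanup tasks.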
Learning Objectives:
Navigate key features within OpenRefine to complete basic cleaning tasks
Identify opportunities to automate data-cleaning workflows, especially for library e-resources and collection data
Recognize when to use Excel versus OpenRefine for data cleaning and analysis, and how to move between the tools
Standardize data cleaning processes for recurring tasks