Toward a predictive model for anticipated brownfields clean-up costs: Insights from 25 years of ACRES data
Thursday, August 5, 2021
Caitlin M. Augustin, Emily Yelverton and Mallory Sheff, DataKind, Shannon Loomis and Danielle Getsinger, Community Lattice
Presenting Author(s)
Caitlin M. Augustin
DataKind
Background/Question/Methods
All communities face complex social and environmental challenges. Rehabilitating underutilized properties can be an opportunity both to clean up environmental contamination and to stimulate community economic growth. However, the potentially high cost of cleanup on these properties (known as brownfields) can deter investment. We believe that insights that could demystify these costs and encourage investment are contained within little-examined public databases. In this project, we examined one such database, the Assessment, Cleanup, and Redevelopment Exchange System (ACRES), for indicators of brownfields cleanup cost. Drawing on 25 years of brownfields data, we derived insights foundational to building an open-source, open-access analytics approach to brownfields remediation costing. After pre-processing and exploring the data, we conducted feature engineering on the ACRES dataset. We prototyped several modeling approaches, including linear regression, logistic regression, and random forests, to isolate the variables most important to the cost of a brownfield cleanup. We applied Natural Language Processing (NLP) classification techniques to free-text fields to derive more detailed information about the category of property (school, hospital, railway, etc.) undergoing cleanup. Lastly, we used clustering techniques to produce groupings of similar cleanup sites.
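As an illustration only, deriving a property category from a free-text field can be sketched with a simple keyword matcher. This is a deliberately simplified stand-in, not the NLP classification approach used in the project, which the abstract does not specify. The category names echo the examples above (school, hospital, railway); the keyword lists, the function names, and the sample descriptions are all assumptions made for this sketch.

```python
import re
from collections import Counter

# Hypothetical keyword lists. The categories come from the abstract's examples;
# the keywords themselves are assumptions, not taken from ACRES.
CATEGORY_KEYWORDS = {
    "school": {"school", "elementary", "academy"},
    "hospital": {"hospital", "clinic", "infirmary"},
    "railway": {"railway", "railroad", "railyard", "depot"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_property(description):
    """Return the category with the most keyword hits, or 'unknown' if none match."""
    token_counts = Counter(tokenize(description))
    scores = {
        category: sum(token_counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    # Made-up descriptions, used only to exercise the function.
    for text in [
        "Former Lincoln Elementary School building",
        "Abandoned railroad depot and switching yard",
        "Vacant lot",
    ]:
        print(text, "->", classify_property(text))
```

A trained text classifier would replace the fixed keyword lists with weights learned from labeled descriptions, but the input and output have the same shape: free text in, one property category out.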
Results/Conclusions
We identified four key insights using data science approaches. First, although all properties were classified as brownfields, we found that assessment results, and which sites required cleanup, differed by state. Second, we found that both previous use (e.g., industrial vs. commercial) and planned future use (e.g., residential vs. nonresidential) may affect the cost of cleanup. Third, we observed that sites with contaminants found in sediments, soil, groundwater, and surface water are, on average, costlier than sites with contaminants found in building materials. Last, we found that specific contaminants (e.g., pesticides, other metals) and economic indicators (e.g., percent of population below the poverty line, vacant housing, median income) may be predictive of cleanup cost. Critically, this project yielded information about brownfields projects based on known characteristics, so that community developers can better estimate the time and cost of revitalization efforts. In the near term, we will use this information to support community-based organizations in leveraging public incentives (such as federal grants) to address environmental issues and fulfill their vision for revitalization. In the long term, this dataset investigation, preparation, and modeling work is a crucial first step toward building a generalizable and accessible tool for community brownfields development.
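The third insight rests on comparing average cleanup cost across the media in which contaminants were found. A minimal sketch of that comparison follows. The record layout, the field names `media` and `cost`, and every number are made-up placeholders chosen for this sketch; none of them are ACRES values or project results.

```python
from collections import defaultdict
from statistics import mean

# Made-up placeholder records, NOT ACRES data. Field names are hypothetical.
sample_sites = [
    {"media": "soil", "cost": 100.0},
    {"media": "soil", "cost": 300.0},
    {"media": "groundwater", "cost": 400.0},
    {"media": "building_materials", "cost": 50.0},
    {"media": "building_materials", "cost": 150.0},
]


def mean_cost_by_media(records):
    """Group records by contaminated medium and return the mean cost per group."""
    costs = defaultdict(list)
    for record in records:
        costs[record["media"]].append(record["cost"])
    return {media: mean(values) for media, values in costs.items()}


if __name__ == "__main__":
    for media, average in sorted(mean_cost_by_media(sample_sites).items()):
        print(f"{media}: {average:.2f}")
```

A simple group mean like this ignores confounders such as site size or state, which is one reason a regression or random forest model is useful alongside it.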