970460 - Using Surface Observations to Select Features and Predict Extreme Ozone in Texas via Generalized Additive Modeling, Synthetic Minority Oversampling Technique (SMOTE), and a Tail Dependence Optimization Method
We present three methods for predicting high-ozone events from surface observations of ozone and meteorology (2005–2019) in the El Paso and Houston-Galveston-Brazoria regions of Texas. The first method is a standard Generalized Additive Model (GAM) approach, which can outperform standard linear modeling approaches but fails to adequately predict the highest-ozone days because lower-ozone days dominate the time series. The second method combines the GAM approach with the Synthetic Minority Oversampling Technique (SMOTE) to balance the high- and low-ozone days, which generally improves the prediction of high-ozone days at the cost of a slight degradation in the prediction of low-ozone days. The third method is a tail dependence optimization technique that selects features from the meteorological covariates and optimizes a model that uses these features to predict ozone in the high tail. This method uses data transformation techniques and cross-validation to identify the optimal set of meteorological covariates for predicting the highest ozone levels. We also test the selected features within the GAM and SMOTE frameworks. While no single set of meteorological covariates best predicts high-ozone events, many covariate sets show comparable predictive ability. These feature selection methods and statistical techniques can offer insight into the meteorological drivers of high-ozone behavior and can be used to predict and characterize high-ozone events. In addition, they can identify regional differences in ozone extremes and in the structural relationships between ozone and meteorology.
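To illustrate the balancing step in the second method, the sketch below shows the core idea of SMOTE: new minority-class points are created by interpolating between an existing minority point and one of its nearest minority neighbours. It is a minimal illustration, not the implementation used in this work. The function name `smote`, the toy covariates (temperature and wind speed), and the choice of `k` are all assumptions made for the example, and the GAM fitting that would follow is omitted.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a randomly chosen
    minority point and one of its k nearest minority neighbours.

    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Assumed toy data: (daily maximum temperature in deg C, wind speed in m/s).
high_ozone_days = [(35.0, 1.2), (36.5, 0.9), (34.2, 1.5), (37.1, 1.0)]
low_ozone_days = [(25.0 + 0.5 * i, 3.0 + 0.1 * i) for i in range(12)]

# Oversample the high-ozone class until both classes have the same size.
new_points = smote(high_ozone_days, len(low_ozone_days) - len(high_ozone_days), k=3)
balanced_high = high_ozone_days + new_points
```

Because every synthetic point lies on a line segment between two observed high-ozone days, the oversampled class stays within the range of the observed covariates. That keeps the added points plausible, but it also means oversampling adds no information about conditions more extreme than those already observed.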