– Senior Researcher, KI Research Institute, Kfar Malal, Israel
Background: Large datasets of adverse drug effects (ADEs) typically attain their size by relying on automatic or semi-automatic generation methods, which often compromises the precision of the generated data. Ideally, experts would curate such datasets, but expert curation becomes increasingly costly and time-consuming as the dataset grows. Alternatively, laypersons with internet access may search for corroboration of listed ADEs, using resources such as Wikipedia. Importantly, Wikipedia integrates knowledge from numerous online resources, including commercial ones such as drugs.com.
Objectives: Here we suggest automating this hypothetical curation process – that is, augmenting the automatic extraction of ADEs with an automatic curation step based on the knowledge accumulated in Wikipedia.
Methods: To curate a dataset of ADEs, we suggest retrieving the Wikipedia page associated with each drug, and checking whether its listed ADEs appear in the sections describing adverse effects. We further label ADEs as common or rare according to their frequency as indicated in the drug’s Wikipedia page. Drug indications, typically described in the opening paragraph of the page, can be similarly identified, and then filtered out.
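The curation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the section titles, the frequency cue words, the `lead` key, and the function names are all assumptions made for illustration, and retrieving and parsing the Wikipedia page is left out (the page is passed in as a mapping from section title to text). Treating any listed term that appears in the opening paragraph as an indication is likewise a simplifying assumption.

```python
import re

# Hypothetical section titles and frequency cue words (illustrative assumptions).
ADVERSE_HEADINGS = ("adverse effects", "side effects")
COMMON_CUES = ("common", "frequent")
RARE_CUES = ("rare", "uncommon", "infrequent")


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(term, text):
    """Case-insensitive whole-word (or whole-phrase) match of term in text."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def frequency_label(sentence):
    """Label a sentence by the first matching cue group, checking rare cues first."""
    lowered = sentence.lower()
    if any(re.search(r"\b" + cue, lowered) for cue in RARE_CUES):
        return "rare"
    if any(re.search(r"\b" + cue, lowered) for cue in COMMON_CUES):
        return "common"
    return "unspecified"


def curate(listed_ades, sections):
    """Keep listed ADEs corroborated by the adverse-effects sections.

    sections maps section title to text; the opening paragraph is assumed
    to be stored under the hypothetical key "lead".
    """
    lead = sections.get("lead", "")
    adverse_text = " ".join(
        text for title, text in sections.items()
        if title.lower() in ADVERSE_HEADINGS
    )
    sentences = split_sentences(adverse_text)
    curated = {}
    for ade in listed_ades:
        if mentions(ade, lead):
            continue  # assumed to be an indication, so it is filtered out
        hits = [s for s in sentences if mentions(ade, s)]
        if hits:
            curated[ade] = frequency_label(hits[0])
    return curated


# Made-up page content for a fictional drug.
example_sections = {
    "lead": "Exampledrug is a medication used to treat hypertension.",
    "Adverse effects": (
        "Common side effects include headache and nausea. "
        "Rarely, liver failure may occur."
    ),
}
example_listed = ["headache", "hypertension", "liver failure", "hair loss"]
print(curate(example_listed, example_sections))
```

In this sketch, a sentence that mixes common and rare cues is labeled rare for every term it mentions, so a practical version would need finer-grained parsing, along with synonym handling for ADE terms.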
Results: We use the suggested method to curate two large ADE datasets (SIDER and OFFSIDES) and compare the “distilled” results to two small, handcrafted ADE datasets, which we consider as the “gold standard”. We show that the distilled datasets have a much higher precision relative to their originating ones, suggesting that the corroborations automatically found in Wikipedia indeed tend to highlight ADEs of interest.
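One way such a precision comparison could be computed is sketched below. The exact evaluation protocol is not given here, so this is only an assumption: precision is taken as the fraction of a dataset's (drug, ADE) pairs that also appear in the gold standard, restricted to drugs the gold standard covers. The function name and the toy pairs are hypothetical and are not taken from SIDER, OFFSIDES, or the reported results.

```python
def precision(candidate_pairs, gold_pairs):
    """Fraction of candidate (drug, ADE) pairs found in the gold standard.

    Only candidates whose drug appears in the gold standard are scored,
    since pairs for uncovered drugs cannot be judged (an assumption).
    """
    gold = set(gold_pairs)
    gold_drugs = {drug for drug, _ in gold}
    scored = {pair for pair in set(candidate_pairs) if pair[0] in gold_drugs}
    if not scored:
        return 0.0
    return len(scored & gold) / len(scored)


# Made-up toy data, for illustration only.
gold = {("druga", "nausea"), ("druga", "headache"), ("drugb", "rash")}
original = {
    ("druga", "nausea"), ("druga", "headache"), ("druga", "fever"),
    ("drugb", "rash"), ("drugb", "cough"), ("drugc", "nausea"),
}
distilled = {("druga", "nausea"), ("drugb", "rash"), ("drugb", "cough")}

print(precision(original, gold), precision(distilled, gold))
```

On this toy data the distilled set scores higher than the original one, but it also loses a gold pair, which illustrates the trade-off: filtering can raise precision at some cost in recall.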
Conclusions: Algorithms that aim to infer drug-ADE relations – for example, from electronic health records (EHRs) – should at the very least be able to identify the “clear-cut” cases. Moreover, in many contexts it is unrealistic to expect them to correctly identify rare ADEs, and evaluating them on such cases may lead to inaccurate conclusions about their quality. The high-precision benchmark constructed herein may therefore be a valuable resource for the evaluation of such algorithms. Methodologically, the tasks of identifying the adverse effects of drugs and of vaccines from EHRs are essentially the same, and so this benchmark could be useful in calibrating and evaluating algorithms designed with both goals in mind. A similar approach may be applied to related tasks, such as identifying drug contraindications.