Accelerating Statistical Research in Drug Safety

Title: Accelerating Statistical Research in Drug Safety
Publication Type: Journal Article
Year of Publication: 2006
Journal: Pharmacoepidemiology and Drug Safety
Subsidiary Authors: Bilker, W., V. Gogolak, D. Goldsmith, M. Hauben, G. Herrera, A. Hochberg, S. Jolley, M. Kulldorff, D. Madigan, R. Nelson, A. Shapiro, and G. Shmueli
Type of Article: Letter to Editor

While discussions continue about the precise role of data mining in drug safety, there can be little doubt that data mining algorithms have established themselves as a component of the pharmacovigilance “tool-kit”. The drug safety community should facilitate research into the development and testing of novel data mining and statistical algorithms that match the importance and complexity of the pharmacovigilance challenge. However, significant barriers currently impede the progress of research and development in this area. Below we identify these barriers and suggest a course of action.

At a meeting of the DIMACS Working Group on Adverse Event/Disease Reporting, Surveillance and Analysis in February 2006, we observed that the rate of progress in pharmacovigilance data mining research has been surprisingly slow. The literature generally focuses on a small number of closely related and relatively simplistic algorithms. Hauben et al. (2005) and others have discussed the strengths and limitations of these algorithms. Future generations of algorithms will acknowledge the true multivariate and temporal nature of adverse event databases, incorporate drug information from the Structured Product Label, better reflect the clinical relevance of various drug-safety signals, and appropriately accommodate the uncertainty that derives from data quality issues.

The data mining and statistical research communities revel in challenges such as these. However, notwithstanding the attention drug safety has attracted in recent years, the data mining research community and relevant funding agencies have, for the most part, shunned the area, not for lack of interest but because of logistical hurdles. We contend that this lack of progress stems largely from the absence of a publicly available, appropriately reviewed, and cleaned data source that can be used for research on drug-safety data mining algorithms. Data mining research requires access to data, and the availability of public data sets can strongly encourage research, as in the case of the National Cancer Institute’s sharing of microarray data.

The FDA does make a version of the Adverse Event Reporting System (AERS) data available via the web. However, two serious flaws render this “FOI” version essentially useless for methodological research. First, the FOI version does not include the original adverse event narratives. Without the narrative data, the validity of the adverse event codes cannot be assessed, and it is recognized that in many instances the international guidelines for coding practices have not been closely followed. Second, the drug identifiers in the FOI version include generic names, brand names, dose levels, and ingredient names in myriad combinations, with and without misspellings. In fact, the drug dictionary contains over 330,000 verbatim terms; the actual number of unique drugs is closer to 10,000.

The Working Group seeks solutions to these two issues. Concerning the adverse event narratives, legitimate privacy concerns require that the narratives be redacted for personal identifiers. The reports, however, are inherently structured to ‘blind’ the patient and reporter. Only ‘leakage’ need be addressed. The group believes that the effort required to redact the data has been overstated, and that means exist to do a credible job on past data and to start screening reports immediately as they come in. Similarly, with the drug identifiers, a significant one-time effort is required to deal with historical data, but minimal effort is required going forward.

The lateness of the reports is also a major concern. For example, the data after July 1, 2005 had still not been released as of March 1, 2006. This extraordinary delay, while not impeding data mining research, has significant public health consequences.

In short, we seek a timely and clean version of AERS, complete with adverse event narratives (redacted of proprietary information and personal identifiers), and we want to make this available to data mining and statistical methods researchers at no charge. We recommend that a joint government, academic, and private sector group be formed to address these issues and to recommend solutions to the FDA. Opening the door to a wider research community will not only result in improved methodology but, as a consequence, could also result in substantive findings that benefit the public.

Hauben, M., Madigan, D., Gerrits, C., and van Puijenbroek, E. (2005). The role of data mining in pharmacovigilance. Expert Opinion on Drug Safety, 4(5), 929-948.

