DIMACS Workshop on Algorithmic Medical Decision Making: Bridging data sources for drug safety monitoring

May 5, 2011
The Cancer Institute of New Jersey (CINJ)
195 Little Albany Street
New Brunswick, NJ 08903

Ching-Hua Chen-Ritzo, IBM, chenritzo at us.ibm.com
Jianying Hu, IBM Research, jyhu at us.ibm.com
David Madigan, Columbia University, davidbmadigan at gmail.com
Guna Rajagopal, Cancer Inst. of NJ, rajagogu at umdnj.edu
Presented under the auspices of the Special Focus on Algorithmic Decision Theory.


Andrew Bate, Senior Director, Analytics Team Lead, Epidemiology, Pfizer

Title: The need for, and use of, multiple data streams in drug safety surveillance

Traditional safety surveillance for marketed products has focused on analysis of spontaneous reports for signal detection, and on observational databases and de novo studies for hypothesis testing of signals. Analyses have primarily been performed in individual data sets. With the increased availability of observational data sources, and the capability to link them, there is growing attention on analyzing multiple observational data sets and even multiple data streams for safety surveillance. Quantitative signal detection on specific data sets of spontaneous reports has been done for many years, but challenges to its optimal use remain, and it is fertile ground for ongoing research.
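One common form of quantitative signal detection on spontaneous reports is disproportionality analysis. The sketch below computes the proportional reporting ratio (PRR) from a 2x2 table of report counts, with an approximate 95% confidence interval on the log scale. PRR is offered only as a familiar example of the genre; the abstract does not say which measure is used, and the function name and counts are illustrative assumptions.

```python
import math

def prr_with_ci(a, b, c, d, z=1.96):
    """Proportional reporting ratio with an approximate log-scale CI.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    # Share of the drug's reports naming the event, relative to the same share for all other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR) under the usual large-sample approximation.
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se)
    upper = math.exp(math.log(prr) + z * se)
    return prr, lower, upper

prr, lo, hi = prr_with_ci(20, 80, 100, 9900)
print(round(prr, 2), round(lo, 2), round(hi, 2))
```

With these invented counts, 20% of the drug's reports mention the event versus 1% for all other drugs, giving a PRR of 20. A disproportionate ratio flags a signal for review; it does not by itself establish a causal association.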

Surveillance in observational databases is an emerging research field. Results will be presented that illustrate the opportunities and research challenges that remain in the surveillance of individual observational data sets, whether claims databases or EMR databases. The concept of multiple data set analysis is more mature for hypothesis testing studies, but even in this area there are challenges, and the analysis of multiple data streams is even more problematic. Therefore, while analysis across different data streams is routine, it is currently done qualitatively. To illustrate this, a case study of an analysis of sildenafil and cardiovascular outcomes is presented, with data from spontaneous reports, RCTs, and observational studies.

In conclusion, while it is essential to glean as much information as possible from all possible sources as part of a holistic strategy to optimize safety surveillance, the explicit and implicit heterogeneity present in the different data sources makes the application of quantitative approaches across data streams challenging and fraught with pitfalls. The potential benefits of being able to apply quantitative approaches readily and reliably suggest that research in this area needs to be actively pursued.

A. Lawrence Gould, Senior Director, Inv. Research, Merck

Title: Detecting Potential Safety Issues in Large Clinical or Observational Trials by Bayesian Screening when Event Counts Arise from Poisson Distributions

Patients in large clinical trials and in studies employing large observational databases report many different adverse events, most of which will not have been anticipated at the outset. Conventional hypothesis testing of between-group differences for each adverse event can be problematic: lack of significance does not mean lack of risk, the tests usually are not adjusted for multiplicity, and the data determine which hypotheses are tested. This paper describes a Bayesian screening approach that does not test hypotheses, is self-adjusting for multiplicity, provides a direct assessment of the likelihood of no material drug-event association, and quantifies the strength of the observed association. The approach directly incorporates clinical judgment by having the criteria for treatment association determined by the investigator(s). Diagnostic properties can be evaluated analytically. Application of the method to findings from a vaccine trial yields results similar to those found by methods using a false discovery rate argument and using a hierarchical Bayes approach.
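A minimal sketch of Bayesian screening for a single adverse event with Poisson counts, assuming a simple conjugate setup that is not necessarily the one in the talk. Conditional on the total count, the treated-arm count is binomial with share p = R*e_t/(R*e_t + e_c), where R is the treated-to-control rate ratio and e_t, e_c are exposure times. A Beta(a, b) prior on p gives a Beta posterior, so the posterior probability that R exceeds an investigator-chosen threshold r0 has a closed form. The function names, the uniform prior default, and r0 are assumptions for illustration; the multiplicity-adjusting, multi-event parts of the method are not shown.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b,
    using the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def prob_rate_ratio_exceeds(x_t, e_t, x_c, e_c, r0, a=1, b=1):
    """Posterior P(lambda_T / lambda_C > r0) given Poisson counts x_t, x_c
    observed over exposures e_t, e_c, with an (assumed) Beta(a, b) prior on
    the conditional treated share p. Requires integer a, b for the exact CDF."""
    # R > r0 is equivalent to p > cutoff, since p increases with R.
    cutoff = r0 * e_t / (r0 * e_t + e_c)
    # Posterior of p is Beta(a + x_t, b + x_c).
    return 1.0 - beta_cdf_int(cutoff, a + x_t, b + x_c)

print(prob_rate_ratio_exceeds(10, 1.0, 2, 1.0, r0=1.0))
```

For example, 10 events in the treated arm and 2 in the control arm with equal exposure give a posterior probability of about 0.99 that R > 1; its complement is a direct measure of the probability of no material association at threshold r0.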

Rick Lawrence, Machine Learning Group, IBM Research, Yorktown Heights, NY

Title: Watson, DeepQA, Wikipedia, Twitter (and Drug Safety)

This talk will describe various applications of machine learning to extract insight from social media such as blogs and Twitter. We will briefly cover algorithms to detect emerging topics across a subset of the blogosphere, as well as new prediction-based techniques to determine influence in blog and Twitter networks. We discuss ideas around using auxiliary sources of information (e.g. Wikipedia, DeepQA systems) to provide useful insight to supervised machine-learning algorithms, and discuss applications to document classification and filtering of streaming textual data such as Twitter. We conclude with a quick overview of the Watson (Jeopardy!) system, and how such a system might interact with structured data mining algorithms to detect potential unexpected drug interactions.
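As a rough illustration of emerging-topic detection, the sketch below flags terms whose document frequency in a recent window is much higher than in a baseline window. This smoothed-lift heuristic is a generic stand-in, not the algorithm presented in the talk; whitespace tokenization, the thresholds `min_count` and `ratio`, and the function name are assumptions.

```python
from collections import Counter

def emerging_terms(baseline_docs, recent_docs, min_count=3, ratio=3.0):
    """Return (term, lift) pairs for terms that are much more frequent
    in recent_docs than in baseline_docs, highest lift first."""
    # Document frequency: count each term at most once per document.
    base = Counter(w for d in baseline_docs for w in set(d.lower().split()))
    recent = Counter(w for d in recent_docs for w in set(d.lower().split()))
    nb, nr = len(baseline_docs), len(recent_docs)
    flagged = []
    for term, c in recent.items():
        if c < min_count:
            continue  # ignore terms too rare to trust
        # Add-one smoothing so unseen baseline terms do not divide by zero.
        lift = ((c + 1) / (nr + 2)) / ((base[term] + 1) / (nb + 2))
        if lift >= ratio:
            flagged.append((term, lift))
    return sorted(flagged, key=lambda tl: -tl[1])

baseline = ["new phone today", "great game tonight", "coffee time", "phone battery dead"]
recent = ["drug x gave me a rash", "rash again after drug x", "weird rash", "coffee time"]
print(emerging_terms(baseline, recent))
```

Here only "rash" clears both the minimum count and the lift threshold. A real system would also need stop-word handling, phrase detection, and a principled burst model rather than fixed thresholds.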

Jeremy Rassen, Assistant Professor of Medicine, Harvard Medical School

Title: Confounding, Cohorts, and Computers: New Approaches to Propensity Score Matching

Matching in cohort studies has traditionally been limited to heuristics because of limited computing power and a lack of advanced algorithms. In this talk, I will present new approaches to propensity score matching, a technique for creating sets of patients who are exchangeable within each set. These approaches employ true nearest-neighbor matching for two important cases: n:1 matched cohorts and 1:1:1 matched cohorts. n:1 matched cohorts allow optimal use of data when the numbers of treated and untreated patients differ widely, while 1:1:1 matched cohorts allow head-to-head analysis of three treatment alternatives. Use of these techniques can improve validity and precision in non-randomized drug safety and comparative effectiveness research.
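A minimal sketch of n:1 propensity score matching, assuming the scores have already been estimated (for example by logistic regression on baseline covariates). It uses greedy nearest-neighbor matching without replacement inside a caliper, a simpler stand-in for the true nearest-neighbor algorithms described in the talk; the function name, the caliper value, and the dictionary input format are illustrative assumptions.

```python
def match_n_to_1(treated, controls, n=2, caliper=0.05):
    """Greedy n:1 nearest-neighbor matching on propensity scores.

    treated, controls: dicts mapping patient id -> propensity score.
    Returns {treated_id: [control_id, ...]} for treated patients that found
    n controls within the caliper; the rest are left unmatched.
    """
    available = dict(controls)  # controls not yet used (matching without replacement)
    matches = {}
    # Process treated patients in a fixed order so results are reproducible.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        # The n remaining controls closest in propensity score.
        nearest = sorted(available, key=lambda c: abs(available[c] - t_ps))[:n]
        if len(nearest) == n and all(abs(available[c] - t_ps) <= caliper for c in nearest):
            matches[t_id] = nearest
            for c in nearest:
                del available[c]
    return matches

treated = {"t1": 0.30, "t2": 0.70}
controls = {"c1": 0.29, "c2": 0.32, "c3": 0.69, "c4": 0.72, "c5": 0.10}
print(match_n_to_1(treated, controls))
```

Greedy matching depends on the order in which treated patients are processed, and an early match can use up a control that would have suited a later patient better. This is the kind of heuristic limitation that true nearest-neighbor and optimal matching algorithms are designed to avoid.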

Sheila Weiss Smith, Center for Drug Safety, University of Maryland School of Pharmacy

Title: A holistic safety approach means getting the right answer faster: What do we need to get there?

There are pharmacovigilance experts and pharmacoepidemiology experts; the former are occupied with the intake, evaluation, and monitoring of adverse effects, and the latter with designing and conducting research studies to confirm or refute associations between a drug and an adverse event. For the most part, these two groups of professionals have worked with different data and with different techniques. The drive to improve the drug safety system, and particularly to address some of the key limitations of spontaneous reporting systems, has led to the FDA's Sentinel Initiative and the Observational Medical Outcomes Partnership (OMOP). However, epidemiological studies also have their own set of limitations and may produce incorrect results if not well designed and executed. Doing it right, even with the advent of large databases, involves significant time and resources. Here I outline a holistic safety approach and discuss how we can, by integrating techniques and methods from both fields, get to the right answer faster and more efficiently.

Paul Stang, Senior Director of Epidemiology, Johnson & Johnson

Title: How do we tell folks what we are up to? Insights and lessons about risk, benefit, perceptions and the ultimate consumer

We all do wonderful research, play with complex statistics and large databases... but how do we make decisions with it? How does it fit into the 'evidence' base? Do the users/consumers of the output understand the underpinnings well enough to interpret and communicate it effectively? This brief presentation will review the critical considerations of these research endeavors and how 'risk' information plays out once it leaves our computers. I will say a little about data sources for risk/benefit, the psychology of perceptions of data, and some ideas to think about moving forward.

Marcello Trovati, Léa Deleris, Carlo Spaccasassi, and Brian White, IBM Research, Ireland

Title: Automatically Constructing Risk Networks using Text-mining Techniques: Framework and Application to Drug Safety Monitoring

Bayesian Belief Networks (BBNs) are graphical models that capture both dependence and independence relationships among random variables. They are often used as a modelling framework for risk management and medical decisions. The construction of a BBN can be done either through data, or when unavailable, through literature review or expert elicitation. While the first approach can be automated, the others require a significant amount of manual work which makes them impractical on a large scale.
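To make the dependence structure concrete, here is a minimal sketch of a three-node network, Drug -> AdverseEvent <- Condition, with hypothetical probabilities chosen only for illustration. The joint distribution factorizes as P(D) P(C) P(A | D, C), which encodes that drug exposure and the underlying condition are independent a priori; inference is by brute-force enumeration, which is fine for a toy network but does not scale to large ones.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_DRUG = {True: 0.2, False: 0.8}   # P(D)
P_COND = {True: 0.1, False: 0.9}   # P(C)
P_AE = {                           # P(A = True | D, C)
    (False, False): 0.01,
    (False, True): 0.05,
    (True, False): 0.10,
    (True, True): 0.30,
}

def joint(drug, cond, ae):
    """P(D = drug, C = cond, A = ae) from the network factorization."""
    p_ae = P_AE[(drug, cond)]
    return P_DRUG[drug] * P_COND[cond] * (p_ae if ae else 1 - p_ae)

def posterior_drug_given_ae():
    """P(D = True | A = True), summing the condition out by enumeration."""
    num = sum(joint(True, c, True) for c in (True, False))
    den = sum(joint(d, c, True) for d, c in product((True, False), repeat=2))
    return num / den

print(round(posterior_drug_given_ae(), 4))
```

With these made-up numbers, observing the adverse event raises the probability of drug exposure from 0.2 to about 0.68.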

Our research focuses on leveraging natural language processing (NLP) techniques to build BBNs from texts, focusing on the medical domain, with particular emphasis on drug safety monitoring and profiling issues. An essential aspect of our approach to learning BBNs from text is the accurate identification of dependence (and independence) relations between concepts. Our objective in the present research is to discover such relations from text sources, in particular those contained in specialised databases such as VAERS, DrugBank and SIDER, but also possibly from scientific papers, web pages and blogs. As well as simply discovering the relations, we attempt to use the text to determine the associated probabilities. Finally, our approach addresses the management of conflicting information.
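A deliberately naive sketch of extracting candidate drug-event relations and crude probabilities from text, assuming single-word lexicon entries and sentence-level co-mention; it is not the NLP pipeline described in the talk. For each drug it estimates the fraction of sentences mentioning that drug that also mention a given event. The lexicons, the example sentences, and the function name are hypothetical.

```python
import re
from collections import Counter

def cooccurrence_estimates(sentences, drugs, events):
    """Estimate P(event mentioned | drug mentioned) per (drug, event) pair
    from sentence-level co-mention. drugs and events are lowercase,
    single-word lexicon entries (a simplifying assumption)."""
    drug_counts = Counter()
    pair_counts = Counter()
    for s in sentences:
        tokens = set(re.findall(r"[a-z0-9]+", s.lower()))
        found_drugs = [d for d in drugs if d in tokens]
        found_events = [e for e in events if e in tokens]
        for d in found_drugs:
            drug_counts[d] += 1
            for e in found_events:
                pair_counts[(d, e)] += 1
    return {pair: n / drug_counts[pair[0]] for pair, n in pair_counts.items()}

sentences = [
    "Warfarin may cause bleeding.",
    "Bleeding was reported with warfarin and aspirin.",
    "Warfarin dosing requires monitoring.",
]
print(cooccurrence_estimates(sentences, ["warfarin", "aspirin"], ["bleeding"]))
```

Co-mention is evidence of neither causation nor a genuine dependence relation: negation ("no bleeding was observed"), hedging, and conflicting sources all need handling before such frequencies could serve as BBN probabilities.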

Document last modified on April 14, 2011.