DIMACS Workshop on Combinatorial Group Testing

May 17 - 19, 2006
DIMACS Center, CoRE Building, Rutgers University

Organizers:
Ding-zhu Du, University of Texas, dzdu@utdallas.edu
Frank Hwang, Chiatong University, fhwang@math.nctu.edu.tw
Presented under the auspices of the Special Focus on Computational and Mathematical Epidemiology.

Abstracts:

Andan Chen, National Chiao Tung University, Taiwan

Title: Group testing on complexes

Pooling designs are used in screening experiments in molecular biology. In some applications, the property to be screened is defined on subsets of items, instead of on individual items. Such a model is usually referred to as the complex model. In this talk we give some constructions of matrices for solving such a problem on the complex model where experimental errors are allowed.


Dechang Chen *, Uniformed Services University of the Health Sciences

Title: Group testing in the development of an expanded cancer staging system

Based on the availability of large cancer patient datasets, there is need for new and expanded TNM (Tumor, Lymph Node, Metastasis) that can incorporate additional prognostic factors. In this talk we show how the idea of group testing can be used to add new prognostic factors including molecular markers while preserving the TNM staging system. Our approach starts with survival and evaluates the association of all potential prognostic factors individually and in various combinations. A demonstration is given for lung cancer.

*Contributions by K. Xing, D. E. Henson, and X. Cheng


Bhaskar DasGupta, University of Illinois at Chicago

Title: "Grouped" String Barcoding and related problems

String barcoding is a technique in computational biology that has applications in several areas such as database compression and fast database search for DNA sequences, DNA microarray designs for efficient virus identification in which the immobilized DNA sequences at an array element are from a set of barcodes and designing minimal oligonucleotide probes needed for analyzing populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Several such related problems can be combined into the so-called "Test Set" problem's framework. A natural theoretical generalization of these problems involves using a minimum number of "groups of distinguishers" to barcode the sequences. In simple terms, our results show that with unrestricted group size, the problem is as hard as the graph coloring problem and thus not approximable unless P=NP.


Dan Hirschberg, University of California, Irvine

Title: Improved Adaptive Group Testing Algorithms

In group-testing problems, we are asked to identify all the defective items in a set of items when we can test arbitrary subsets of items for "contamination." In a standard group-testing problem, the result of a test is binary-- the tested subset either contains defective items or it does not. In the versions we study here, the result of each test is non-binary. For example, it may indicate whether the number of defective items contained in the tested subset is zero, one, or at least two (i.e., the results are 0, 1, or 2+).

We give adaptive group testing algorithms that are provably more efficient than previous group testing algorithms (even for generalized response models). We show how our algorithms can be applied to resolve conflicts on a multiple access channel and to solve a dead sensor diagnosis problem. We also give lower bounds for generalized group testing.


Yingshu Li, Georgia State University

Title: Protein-Protein Interaction and Group Testing in Bipartite Graphs

The interactions between bait proteins and prey proteins are often critical in many biological processes, such as the formation of macromolecular complexes and the transduction of signals in biological pathways. Thus, identifying all protein-protein interactions is an important task in those processes, which can be formulated as a group testing problem in bipartite graphs. We take the advantages of the characteristics of bipartite graphs and present in this talk two non-adaptive algorithms for this problem. Furthermore, we illustrate a generalization of the solution in a more general case.


Tony Macula, SUNY

Title: Group Testing to Annihilate Complexes

Group testing (aka pooling) is a widely used biotechnical technique to identify a relatively few number objects from an ensemble that individually produce an observable result. This paper gives a group testing method that identifies portions of cohorts of objects from an ensemble that collectively have an observable function. One reason for doing this is that these cohorts may collectively produce an undesirable result or collectively have a detrimental function. Thus, if a just a portion such cohorts could be identified and subsequently eliminated or neutralized, then their undesirable function could be disrupted or destroyed. This paper addresses the development and analysis of a probabilistic group testing method that lead to the identification of portions of cohorts or complexes that collectively produce an observable result.


Hung Q. Ngo, State University of New York at Buffalo

Title: On an error tolerance pooling design and a finite-field hyperplane arrangement problem

We investigate the pooling design using finite vector spaces with the containment method. This leads to two interesting hyperplane arrangement problems on finite fields.


Vyacheslav Rykov and Vladimir Ufimtsev, University of Nebraska at Omaha and
Antony Macula, State University of New York College at Geneseo

Title: New Results in Superimposed Coding Theory with Application to the Two-Stage Group Testing Algorithm and Multiple Access OR Channel

A group testing is concerned with finding a small number of defective units among a large population, efficiently. We will consider a two-stage group testing algorithm that is applicable to DNA library screening. The efficiency of two-stage group testing is determined by the expected number of tests required to establish the random number of positive clones in a DNA library, We will present a new upper bound on the expected number of tests required to establish the positive clones.

As an application to multiple access OR channel we will consider a system containing a large amount of terminal stations and a multiple access OR channel connecting the terminal stations to the central station, as in the slotted ALOHA system. Our new results in Superimposed Coding Theory provide the possibility of improving the known bounds on the multiple access OR channel capacity.


Alexander Schliep, Max Planck Institute for Molecular Genetics, Germany

Title: Group testing DNA Microarrays for detection of biological agents

The reliable identification of presence or absence of biological agents (targets), such as viruses or bacteria, is crucial for many applications from health care to biodiversity. If genomic sequences of targets are known, hybridization reactions between oligonucleotide probes and targets performed on suitable DNA microarrays will allow to infer presence or absence from the observed pattern of hybridization. Targets, for example all known strains of HIV, are often closely related and finding unique probes becomes impossible.

We employ a group testing approach, where groups of targets are defined by the same, non-unique, oligonucleotide probe binding to them. In combination with decoding techniques from statistical group testing, this allows to detect known targets with great success. Note, that here the group testing design cannot be chosen arbitrarily. It is selected from all the sets defined by the non-unique probes.

Of great relevance, however, is the problem of identifying the presence of previously unknown targets or of targets that evolve rapidly. We present the first approach to decode hybridization experiments using non-unique probes when targets are related by a phylogenetic tree. By use of a Bayesian framework and a Markov chain Monte Carlo approach we are able to identify over 95% of known targets and assign up to 70% of unknown targets to their correct clade in hybridization simulations on biological and simulated data.

This is joint work with Sven Rahmann (Bielefeld University) and David Torney (Los Alamos National Laboratory)


My T. Thai, University of Texas - Dallas

Title: Dealing with Inhibitors and Error-Tolerant Separable Matrix

Pooling designs are used in DNA library screening to efficiently distinguish positive clones from negative clones. One of the challenges of pooling designs is to design decoding algorithms for determining whether a clone is positive based on the test outcomes and a binary matrix representing the pools. This problem becomes more difficult in practice due to experimental errors and the presence of inhibitors in a sample whose effect is to neutralize positives. This talk will present a 1-round decoding algorithm identifying all positive clones with the persence of inhibitors and experimental errors using the separable matrix.


Nicolas Thierry-Mieg, CNRS and LSR-IMAG laboratory, Grenoble, France

Title: A new smart-pooling strategy for high-throughput screening: the Shifted Transversal Design

In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplication of the individual experiments, thereby correcting for experimental noise. The main difficulty consists in designing the pools in a manner that is both efficient and robust: few pools should be necessary to correct the errors and identify the positives, yet the experiment should not be too vulnerable to biological shakiness. For example, some information should still be obtained even if there are slightly more positives or errors than expected. This is known as the group testing problem, or pooling problem.

In this talk, I will first present a new non-adaptive combinatorial pooling design: the "shifted transversal design" (STD). It relies on arithmetics, and rests on two intuitive ideas: minimizing the co-occurrence of objects, and constructing pools of constant-sized intersections. I will prove that it allows unambiguous decoding of noisy experimental observations. This design is highly flexible, and can be tailored to function robustly in a wide range of experimental settings (i.e., numbers of objects, fractions of positives, and expected error-rates). I will then compare STD, in terms of efficiency, to several previously described non-adaptive combinatorial pooling designs.

The method is currently being validated by field-testing in the context of yeast-two-hybrid interactome mapping, in collaboration with Marc Vidal's lab at the Dana Farber Cancer Institute. In the second part of this talk, I will briefly present the pilot experiment that we performed and discuss the preliminary results.

The experimental validation is joint work with Marc Vidal, David Hill, Jean-Francois Rual and coworkers (Dana Farber Cancer Institute, Boston).

Thierry-Mieg N. A new pooling strategy for high-throughput screening: the Shifted Transversal Design. BMC Bioinformatics. 2006 Jan 19;7:28. Thierry-Mieg N. Pooling in systems biology becomes smart. Nat Methods. 2006 Mar;3(3):161-2.


Previous: Program
Workshop Index
DIMACS Homepage
Contacting the Center
Document last modified on May 30, 2006.