National Center of Theoretical Science, National Chiao Tung University, Hsinchu, Taiwan

**Organizers:****Frank Hwang**, National Chiao Tung University, fhwang@math.nctu.edu.tw**Fred Roberts**, DIMACS, Rutgers University, froberts@dimacs.rutgers.edu

Tim Ting Chen, University of Southern CaliforniaTitle: Inferring Domain-Domain Interactions from Protein-Protein Interactions

Dietmar Cieslik, University of Greifswald

- The duality of distance and similarity.
- Several combinatorial properties of phylogenetic spaces, and its finite languages, the sequence spaces. Particularly, Phylogenetic spaces are discrete metric spaces; that means: Each bounded subset is a finite one. Hence, an SMT exists for any finite set of given points (words) in the space, and it can be found by a suitable search-algorithm.
- Steiner's Problem is very hard in computational sense. It is at
least
*ΝΡ*-hard. But in specific cases it can be solved with simple and well-known methods. - If we do not allow Steiner points, that means, we only connect certain pairs of the given points, we get a tree which is called a Minimum Spanning Tree (MST). The determination of an MST is simple. Consequently, we are interested in the greatest lower bound for the ratio between the lengths of these both trees: which is called the Steiner ratio. We look for estimates and exact values for the Steiner ratio in phylogenetic and sequence spaces. In any metric space the Steiner ratio is at least ½. Phylogenetic spaces achieves this ratio.

**Miklós Csürös**, Université de Montréal &
Aleksandar Milosavljevic, Baylor College of Medicine

**Ding-Zhu Du**, University of Minnesota

Title: Some New Constructions for Pooling Designs

The non-adaptive and error-tolerance pooling design has gained an important application in DNA library screening. We propose two new classes of this type of pooling designs. These designs induce a construction of a binary code with minimum Hamming distance of at least 2d+2.

This work is joint with Hung Q. Ngo.

**Dannie Durand**, Carnegie Mellon

Title: Validating gene clusters

Large scale gene duplication, the duplication of whole genomes and subchromosomal regions, is a major force driving the evolution of genetic functional innovation. Whole genome duplications are widely believed to have played an important role in the evolution of the maize, yeast and vertebrate genomes. Two or more linked clusters of similar genes found in distinct regions on the same genome are often presented as evidence of large scale duplication. However, as the gene order and the gene complement of duplicated regions diverge progressively due to insertions, deletions and rearrangements, it becomes increasingly difficult to determine whether observed similarities in local genomic structure are indeed remnants of common ancestral gene order, or are merely coincidences. In this talk, I present combinatorial and graph theoretic approaches to validating gene clusters in comparative genomics.

**Frank Hwang**, National Chiao Tong University

Title: Random pooling designs with various structures

Typically, there are two ways to construct pooling designs, one through using some mathematical structure, and the other through randomness. Recently, several constructions which combine both the structure and randomness elements have been proposed. These include the random k-set design, the distinct random k-set design, the random k-size design and several designs by Macula. We will first update the best formulas to compute the success probabilities of identifying a positive clone with the first three models.

For the basic 1-stage Macula's design with parameters (n,k,d), Macula gave an approximate algorithm to identify positive clones from this design, and an analysis of the success probability. For d=2, which is the focus of Macula's analysis, we will propose a new method which provably identifies more positive clones, but the analysis is much more difficult. So far, we have exact result only when the number of positive clones is at most 3.

Macula also gave a 2-stage design and an approximate formula with 2 summation signs, and a further approximation in explicit form. We will gave an exact formula also with 2 summation signs, and an approximation in explicit form, which is more accurate than Macula's explicit approximation. The purpose of the approximation is to facilitate the finding of optimal design parameters.

Finally, we will provide some numerical comparisons between the various random models, and between Macula's analysis and our new analysis.

**Esther Lamken**, Caltech
Title: Two applications of designs in biology
In this talk, I will describe two applications of combinatorial designs in
biology. The first application uses ideas from finite geometry and design
theory to help design pooling experiments. The second application extends
work of Sengupta and Tompa. I will describe how ideas from design theory
and cryptography can be used to deal with the problem of multiple step
failure in the manufacture of DNA microarrays at Affymetrix.

**Wen-Hsiung Li**, Department of Ecology & Evolution, University of Chicago

Title: Methods for Aligning Long Genomic Sequences

Many eukaryotic genomes will soon be completely or near completely sequenced. An extremely challenging problem is how to align genomic sequences, which is usually the first step in comparative analysis. Alignment of genomic sequences is much more difficult than alignment of protein sequences or DNA sequences of protein coding regions because (1) the large demand of computer time and memory space, (2) the sequences often have been scrambled by insertion of transosable elements, translocation, and inversion, and (3) the existence of highly divergent regions. Some methods for aligning genomic sequences will be presented.

**Alan C.H. Ling, Charles J. Colbourn, and Martin Tompa; University of Vermont**

Title: Construction of Optimal Quality Control for Oligo Arrays

Oligo arrays are important experimental tools for the high throughput measurement of gene expression levels. During production of oligo arrays, it is important to identify any faulty manufacturing step. We describe a practical algorithm for the construction of optimal quality control designs that identify any faulty manufacturing step. The algorithm uses hillclimbing, a search technique from combinatorial optimization. We also present the results of using this algorithm on all practical quality control design sizes.

**Peng-An Chen and F. K. Hwang**, Department of Applied mathematics,
National Chiao-Tung University, Hsinchu, Taiwan 30050 R.O.C.

Title: On a Conjecture of Erdos-Frankl-Furedi on Nonadaptive Group Testing

d-disjunct matrices become the dominating tool in studying nonadaptive
group testing, which has attracted a lot of attention due to its
recent application to pooling designs in DNA mapping. By testing the
items one by one, clearly we can identify n items in n tests. Huang
and Hwang(1999) raised the fundamental question what is the maximum n
such that a d-disjunct matrix has at least n rows? Namely, let N(d)
denote this maximum n for given d. Then for all *n* £ *N(d)* the
individual testing algorithm is the best. Erdos, Frankl and
Furedi(1985) studied the same problem in their study of extremal
set theory. They conjuctured that *N(d)* ³ *(d+1) ^{2}* and claimed they can
prove for

**Anthony J. Macula**, SUNY Geneseo

Let [t] represent a finite population with t elements. Suppose we have an unknown d-family of k-subsets C of [t]. We refer to C as the set of positive k-complexes. In the group testing for complexes problem, C must be identified by performing 0, 1 tests on subsets or pools of [t]. A pool is said to be positive if it completely contains a complex; otherwise the pool is said to be negative. In classical group testing, each member of C is a singleton. In this talk we discuss adaptive and nonadaptive algorithms that identify the positive complexes. We also discuss the problem in the presence of testing errors. Group testing for complexes has been applied to DNA physical mapping and has potential application to data mining.

**Fred Roberts**, Rutgers University - Piscataway, NJ, USA

Title: Consensus List Colorings of Graphs and Physical Mapping of DNA

Abstract: In graph coloring, one assigns a color to each vertex of a graph so that neighboring vertices get different colors. We shall talk about a consensus problem relating to graph coloring and discuss the applicability of the ideas to the DNA physical mapping problem. In many applications of graph coloring, one gathers data about the acceptable colors at each vertex. A list coloring is a graph coloring so that the color assigned to each vertex belongs to the list of acceptable colors associated with that vertex. We consider the situation where a list coloring cannot be found. If the data contained in the lists associated with each vertex are made available to individuals associated with the vertices, it is possible that the individuals can modify their lists through trades or exchanges until the group of individuals reaches a set of lists for which a list coloring exists. We describe several models under which such a consensus set of lists might be attained. In the physical mapping application, the lists consist of the sets of possible copies of a target DNA molecule from which a given clone was obtained.

**Lusheng Wang**, City University of Hong Kong
Title: Approximation Algorithms for Distinguishing Substring Selection Problem
Consider two sets of strings, *B* (bad genes) and *G* (good genes), as well
as two integers *d _{b}* and

**Chih-wen Weng** and **Tayuan Huang**, National Chiao Tung University
Title: Constructions of Non-adaptive Pooling Designs
We show how to construct non-adaptive pooling designs from a ranked
atomic (meet) semi-lattice. All these pooling designs are *e*-error
detecting for some e which can be chosen to be very large compared to
*d*, the maximal number of defective items(or positives). Eight new
classes of non-adaptive pooling designs are given, which are related
to the Hamming matroid, the attenuated space, and six classical polar
spaces. The general construction of ranked atomic semi-lattices from
known ones are discussed.

**Jer-Shyan Wu**, Department of Computer Science & Information Engineering,
Chung Hua University, Hsinchu, TAIWAN
Title: Efficient Algorithms for Weighted Genome Rearrangements
Given two different genomes with the same numerical labels, the goal
of genome rearrangements is to find the minimum number of reversal
operations that leads from one genome to the other. Consider the
biological reality, each reversal has different evolutionary time and
probability, i.e., each weight is distinct not identical. In this
paper, We discuss the simple weighted model : each reversal weight is
portional to its length, and try to propose some efficient algorithms
to solve this probelm.