Dr. Gurvich has four problems to work on; students selected to work with Dr. Gurvich may work on one or more of them.
I. Let us consider the game of Nim. There are n piles of k1, k2, ..., kn stones. Two players move alternately. A move consists of taking any number of stones from any, but only from ONE, pile. In the standard version of Nim the player who takes the last stone wins; in the so-called misere version, that player loses. In 1902, exactly one hundred years ago, Bouton published in the Annals of Mathematics his paper ``Nim, a game with a complete mathematical theory,'' in which he gave an elegant solution of the game. Rather surprisingly, the winning strategies are ``almost'' the same for standard and misere Nim. More precisely, there are only 2n positions where the optimal moves differ, while in the rest of the (potentially infinite) set of positions the optimal moves in Nim and misere Nim always coincide. We conjecture that this is not a paradox but rather a general property of games with perfect information, and we suggest verifying this conjecture for different Nim-like and other types of games.
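Bouton's criterion is easy to state in code: a position is losing for the player to move iff the XOR (the ``nim-sum'') of the pile sizes is zero, and the misere strategy deviates only when a move would leave every pile of size at most one. A minimal sketch in Python (the function name and interface are my own, not part of the project description):

```python
from functools import reduce
from operator import xor

def nim_move(piles, misere=False):
    """Return (pile_index, new_size) for a winning move, or None if the
    position is lost.  Normal play: move to nim-sum zero (Bouton 1902).
    Misere play: same, except when all resulting piles have size <= 1,
    in which case leave an ODD number of 1-piles."""
    for i, p in enumerate(piles):
        for new in range(p):  # every legal reduction of pile i
            cand = piles[:i] + [new] + piles[i + 1:]
            if misere and all(q <= 1 for q in cand):
                good = sum(cand) % 2 == 1   # odd number of 1-piles
            else:
                good = reduce(xor, cand, 0) == 0
            if good:
                return i, new
    return None  # nim-sum is zero: every move loses against best play
```

From [3, 4, 5] both versions recommend the same move, while from [2, 1, 1] they diverge, illustrating the near-coincidence of the two strategies described above.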
II. It is known that, given two m x n matrices A and B, there exist m+n potentials p1, ..., pm; q1, ..., qn such that
III. Difference graphs. Given n finite sets S1, ..., Sn, let us define a graph G with n vertices v1, ..., vn in which two vertices vi and vj are connected by an edge IFF both differences Si \ Sj and Sj \ Si have at least k elements each. EVERY graph can be realized in this way if k is allowed to be arbitrarily large. But is this still true if k is bounded by a constant? Even for k = 2 no counter-example is known. If k = 1, then only co-occurrence graphs can be realized.
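The definition translates directly into a short program for experimenting with small set systems (names are illustrative):

```python
from itertools import combinations

def difference_graph(sets, k):
    """Return the edge set {(i, j)} of the difference graph: vertices i, j
    are adjacent iff |S_i \\ S_j| >= k and |S_j \\ S_i| >= k."""
    edges = set()
    for i, j in combinations(range(len(sets)), 2):
        if len(sets[i] - sets[j]) >= k and len(sets[j] - sets[i]) >= k:
            edges.add((i, j))
    return edges
```

For example, pairwise disjoint two-element sets realize a complete graph with k = 2, while nested sets never produce an edge.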
IV. In 1912 Zermelo proved that games with perfect information (e.g., Chess) can be solved in pure strategies, that is, without randomizing. This theorem generalizes to the case of n players as follows: in 1951 Nash introduced his famous concept of equilibrium, and in 1953 Kuhn proved that every finite n-person game with perfect information has at least one Nash equilibrium in pure strategies.
Yet positional games, even with perfect information, may have cycles; that is, the same position can appear twice. A strategy is called STATIONARY if in every position the move may depend only on that position and not on the preceding moves. Do Zermelo's and Kuhn's theorems generalize to this case? Is it true that such games with perfect information can be solved in PURE STATIONARY strategies? In general, it is NOT true. Yet if the local cost function is non-negative for each player and each move, or, even weaker, if for every player and every directed cycle the sum of the local costs is non-negative, then the question is open. Moreover, a positive answer is known for some special cases.
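For finite acyclic game trees, the Zermelo/Kuhn argument is constructive: backward induction selects a pure move at every node. A toy sketch, using a tuple encoding of the tree that is my own invention (a leaf is a payoff vector, an internal node is (name, mover, {move: subtree})):

```python
def solve(node):
    """Backward induction on a finite perfect-information game tree.
    Returns (payoff_vector, strategy), where strategy maps each decision
    node's name to the pure move chosen there."""
    if not isinstance(node[-1], dict):
        return node, {}                      # leaf: payoffs, no choices
    name, mover, children = node
    best_move, best_pay, strategy = None, None, {}
    for move, child in children.items():
        pay, sub = solve(child)
        strategy.update(sub)
        if best_pay is None or pay[mover] > best_pay[mover]:
            best_move, best_pay = move, pay  # mover maximizes own payoff
    strategy[name] = best_move
    return best_pay, strategy
```

The resulting pure strategy profile is a Nash equilibrium of the tree game; the open questions above concern what survives of this argument when the position graph has cycles and only stationary strategies are allowed.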
Project title: Text Categorization
Text categorization is the automated assigning of documents to predefined content-based categories. Applications of text categorization include controlled vocabulary indexing (in both traditional bibliographic databases and newer Web-based directories), content filtering of email and Web pages, subsetting information feeds, and data mining on records with both textual and nontextual fields. This project will involve developing and evaluating novel Bayesian statistical methods for text categorization and applying them to very large collections of documents. I am especially interested in algorithms that deal with one document at a time rather than large batches of documents.
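As one illustration of the one-document-at-a-time setting, here is a minimal incremental multinomial naive Bayes classifier with Laplace smoothing. The class and method names are invented for this sketch, and it is a textbook baseline, not the novel Bayesian method the project will develop:

```python
from collections import defaultdict
from math import log

class IncrementalNaiveBayes:
    """Multinomial naive Bayes that learns from one labeled document
    at a time; no batch over the collection is ever needed."""

    def __init__(self):
        self.doc_counts = defaultdict(int)                    # docs per category
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.total_words = defaultdict(int)                   # tokens per category
        self.vocab = set()
        self.n_docs = 0

    def learn(self, words, category):
        """Update sufficient statistics with a single tokenized document."""
        self.n_docs += 1
        self.doc_counts[category] += 1
        for w in words:
            self.word_counts[category][w] += 1
            self.total_words[category] += 1
            self.vocab.add(w)

    def classify(self, words):
        """Return the category maximizing the smoothed log-posterior."""
        def score(c):
            s = log(self.doc_counts[c] / self.n_docs)         # log prior
            for w in words:                                    # add-one smoothing
                s += log((self.word_counts[c][w] + 1)
                         / (self.total_words[c] + len(self.vocab)))
            return s
        return max(self.doc_counts, key=score)
```

Because only counts are stored, the model can also be updated online as new labeled documents arrive in a feed.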
Prerequisites for this work are basic proficiency in computer programming as well as an undergraduate course in probability and statistics.
Project title: String similarity methods
String edit distance, the minimum number of character insertions or deletions necessary to transform one string into another, has been studied for over 30 years [1]. Recently there has been renewed interest in questions beyond simply calculating edit distance, thanks to its applications in analyzing biological sequences, intercepted messages, and advanced (web) searching applications. A number of new questions present themselves: what about approximating these distances, allowing substring moves, or computing a distance via communication or with small space usage? Can string edit distance, or something like it, be used to cluster sequences, search collections based on string similarity, and so on? There are many interesting questions in this area, and on many of them it will be practical to make progress.
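With insertions and deletions as the only operations, the distance satisfies the classical dynamic-programming recurrence and equals len(a) + len(b) - 2*LCS(a, b). A small sketch in linear space (function name mine):

```python
def indel_distance(a, b):
    """Minimum number of character insertions and deletions turning a
    into b.  Classical O(len(a)*len(b)) time, O(len(b)) space DP."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))        # row 0: turn "" into b[:j] with j insertions
    for i in range(1, m + 1):
        cur = [i]                    # column 0: delete all of a[:i]
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur.append(prev[j - 1])            # matching characters are free
            else:
                cur.append(1 + min(prev[j],        # delete a[i-1]
                                   cur[j - 1]))    # insert b[j-1]
        prev = cur
    return prev[n]
```

The questions above ask, for instance, whether this quadratic-time exact computation can be replaced by fast approximations or small-space streaming estimates.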
A course in Algorithms and Data Structures is very helpful. Information Theory, or knowledge of Compression would be useful but not vital.
[1] See, for example, "Algorithms on Strings, Trees and Sequences" by Dan Gusfield for more information.
Project Title: Dualization of Posets
Project title: Protein-induced DNA Looping
Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multi-site cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such long-range protein-protein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base-pairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressor-operator assembly in Escherichia coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years.
Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.
[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705-707.
[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199-223.
[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287-313.
[4] Olson, W. K., Gorin, A. A., Lu, X.-J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequence-dependent Deformability Deduced from Protein-DNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 11163-11168.
[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Three-dimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751-759.
[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Self-contact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 3967-3980.
[7] Muller-Hill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.
Project title: Experimental Yet Rigorous Mathematics
The research program of Doron Zeilberger (see http://www.math.rutgers.edu/~zeilberg/) consists in using computer algebra to find, experimentally, rigorous proofs of open problems in mathematics, especially in combinatorics.
A good knowledge of Maple and programming experience (in Maple or other languages) are preferred.
Project Title: Monitoring methods for topic drift in message streams
Consider the stream of emails one receives. Hot topics appear, and topics that were once hot disappear over time. How do we develop the mathematics needed to track the topics of current interest, design algorithms for analyzing this process, and build systems to monitor the email stream? These issues are relevant to many applications, including homeland security.
All aspects of this problem need to be understood: Markov modelling methods in mathematics, streaming algorithms for tracking topics, and usable software for the problem are some examples.
I am looking for REU students who can address any of these aspects of the problem.
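One standard building block for tracking hot topics over a stream is a small-space frequent-items summary, for example the classical Misra-Gries algorithm sketched below. This is only an illustration of the streaming flavor of the problem, not the project's intended method:

```python
def misra_gries(stream, k):
    """Summarize a stream using at most k-1 counters.  Every item that
    occurs more than len(stream)/k times is guaranteed to survive as a
    candidate (some lighter items may survive as well)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1                 # adopt a new candidate
        else:
            for y in list(counters):        # decrement all, evict zeros
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters
```

Running such a summary over a sliding window of email tokens gives a crude but memory-efficient picture of which topics are currently hot.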
Consider the following problem. We are given a necklace with N
beads, where each bead is colored one of k colors. We want to cut up
the necklace into as few pieces as possible, in such a way that for
each color i, the number of beads of color i in the odd
piece(s) is the same as the number of beads of color i in
the even piece(s). For example, if every bead in the necklace is the
same color (k = 1), then all we need to do is cut the necklace
into only 2 pieces of the same size. On one hand, it has been shown
by Goldberg and West that there is always such a way of cutting the
necklace that uses only k+1 pieces. On the other hand, for large
k, there is no efficient algorithm for actually finding how to cut
the necklace, even when a much larger number of cuts, say k log N,
is allowed; all we know how to do now is check many of the
possibilities, which takes too long. Thus, the open problem is to
come up with an efficient algorithm for finding a way of cutting
the necklace that uses only a 'small' number of pieces.
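For small instances the ``check many possibilities'' approach can be written directly. The brute force below interprets the odd/even pieces as the two shares into which the pieces are distributed (an assumption on my part), and its exponential running time is exactly the obstacle the open problem asks to overcome:

```python
from itertools import combinations, product

def min_pieces(necklace, colors):
    """Smallest number p of contiguous pieces such that the pieces can be
    split into two shares, each containing exactly half of every color.
    Exhaustive search: feasible only for short necklaces."""
    n = len(necklace)
    assert all(necklace.count(c) % 2 == 0 for c in colors)
    half = {c: necklace.count(c) // 2 for c in colors}
    for p in range(1, n + 1):
        for cuts in combinations(range(1, n), p - 1):   # cut positions
            bounds = [0, *cuts, n]
            pieces = [necklace[bounds[i]:bounds[i + 1]] for i in range(p)]
            for assign in product([0, 1], repeat=p):    # share of each piece
                share = "".join(pc for pc, a in zip(pieces, assign) if a == 0)
                if all(share.count(c) == half[c] for c in colors):
                    return p
    return n
```

On tiny examples one can watch the Goldberg-West bound of k+1 pieces being achieved, e.g. the 2-colored necklace "aabb" needs 3 pieces.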
Project Title: External Memory Graph Algorithms
Description: When a graph does not fit in RAM, many of the classical
algorithms break down. Operations that we take for granted, like
graph traversal, become wretchedly slow when faced with the I/O
bottleneck. Efficient external memory algorithms have been devised for
connectivity and minimum spanning trees. However, no satisfactory I/O
results exist for such fundamental problems as BFS and DFS. We would
like to proceed with our investigations of these and related graph
problems, such as strongly connected components, shortest-path trees, and
quasi-clique computations.
On the experimental side, we want to apply the developed algorithms
to the DIMACS and ACM members graphs and to some segments of the
internet repository.
Requirements: Students interested in joining this research should have
a background in Algorithms and Data Structures. Some proficiency in
C/C++ and Java is preferable.
References:
[1] J. Abello and J. Vitter (eds.), External Memory Algorithms, Vol. 50 of the AMS-DIMACS Series, 1999.
Project Title: Computational Geometry
Description: One interesting and growing set of problems has emerged in
the area of geometric graphs. From the complexity point of view, some
of these problems are known only to be in PSPACE. The type of
question we are addressing asks for the existence of a geometric
configuration that satisfies some a priori combinatorial condition. An
example of this is the problem of Visibility Graph Recognition, which is
not even known to be in NP or to be NP-hard. We want to tackle this and
similar realizability questions.
Requirements: Some knowledge of and taste for computational geometry and
combinatorics are good starting points for this research.
References: ``Visibility Graphs of Staircase Polygons and the Weak Bruhat Order,'' Discrete and Computational Geometry, Vol. 14, No. 3, 1995, pp. 331-358.
Project Title: Visualization and Graph Mining
Description: A variety of massive data sets have an underlying graph
structure. Examples are the Web, the Internet, and call detail
graphs. They have low diameter, are sparse, and obey a power law. We
are just beginning to understand how these characteristics can be
exploited to drive a data mining engine in charge of extracting useful
information from the data. A central question of interest is how to
represent, navigate, and visualize a graph with millions of nodes and
edges. The main problem to be solved is termed the Screen
Bottleneck. A good set of techniques for attacking this question is based
on Graph Hierarchy Trees. We want to further develop these
techniques.
A collection of exploratory algorithms will be applied to a variety of
DIMACS and ACM member graphs and to some segments of the internet
repository.
Requirements: Students interested in joining this research should have
a background in Algorithms and Data Structures. Knowledge of C/C++ and
Java is preferable. Some graphics expertise is recommended but not
necessary.
Project Title: Constructions and performance of binary codes from bipartite graphs
Recent progress in the construction of good binary code families
is associated with codes from bipartite graphs that are decodable by simple
iterative procedures. This project is concerned with bipartite-graph
codes for which convergence of the decoding algorithms is proved by relying
on the expanding properties of the underlying graph. Several steps in
the theoretical understanding of these codes were made in [1], [2], [3] and
subsequent papers.
The purpose of this project is to construct and establish the performance
of several specific examples of expander codes. In the first stage
the student will study and implement in software constructions
of expander graphs. Completing this stage will result in a good
understanding of expander graphs and codes, both of which are at the
forefront of research in theoretical CS.
In the second stage the student will simulate the error probability of several
codes from the family described and compare their performance with
another popular code family, the so-called low-density parity-check codes
decoded by a message-passing iterative algorithm, for which the error
probability of decoding is known and appears in the literature.
The second stage, depending on several factors, may result in a research
publication.
Prerequisites: Good programming skills. Some prior exposure to graph
theory, algebra, and coding theory would be beneficial, but can be
obtained through reading in the first part of the project.
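As a warm-up for the simulation stage, a toy bit-flipping decoder in the style of Sipser and Spielman [1] can be written in a few lines: each right vertex of the bipartite graph is a parity check, and any bit lying in a strict majority of unsatisfied checks is flipped. The convergence guarantee requires good expansion of the underlying graph; the interface below is my own sketch, not a reference implementation:

```python
def flip_decode(H, word, max_iters=100):
    """Bit-flipping decoder for the code with parity-check matrix H
    (a list of 0/1 rows; row r requires the XOR of word over its support
    to be 0).  Returns the corrected codeword, or None if decoding fails."""
    word = list(word)
    n = len(word)
    for _ in range(max_iters):
        unsat = {r for r in range(len(H))
                 if sum(word[j] for j in range(n) if H[r][j]) % 2}
        if not unsat:
            return word                        # all checks satisfied
        flipped = False
        for j in range(n):
            checks = [r for r in range(len(H)) if H[r][j]]
            bad = sum(r in unsat for r in checks)
            if 2 * bad > len(checks):          # majority of checks unhappy
                word[j] ^= 1
                flipped = True
                break
        if not flipped:
            return None                        # decoder is stuck
    return None
```

Plugging in parity-check matrices of expander graphs and counting decoding failures over a binary symmetric channel gives exactly the kind of error-probability curves the second stage asks for.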
References:
[1] M. Sipser and D. Spielman, ``Expander Codes,'' IEEE Trans. Inform. Theory, vol. 42, pp. 1710-1722, 1996.
[2] G. Zemor, ``On Expander Codes,'' ibid., vol. 47, pp. 835-837, 2001.
[3] A. Barg and G. Zemor, ``Error Exponents of Expander Codes,'' ibid., vol. 48, pp. 1725-1729, 2002.
Project #: DIMACS2003-06
Mentor: Doron Zeilberger, zeilberg@math.rutgers.edu, Math Dept.
Project #: DIMACS2003-07
Mentor: S. Muthu Muthukrishnan, muthu@cs.rutgers.edu, Dept. of Computer Science
Project #: DIMACS2003-08
Mentor: Michael Capalbo, mcapalbo@dimacs.rutgers.edu, DIMACS postdoc
Project #: DIMACS2003-09
Mentor: James Abello, abello@dimacs.rutgers.edu, DIMACS
Project #: DIMACS2003-10
Mentor: James Abello, abello@dimacs.rutgers.edu, DIMACS
Project #: DIMACS2003-11
Mentor: James Abello, abello@dimacs.rutgers.edu, DIMACS
Project #: DIMACS2003-12
Mentor: Alexander Barg, abarg@dimacs.rutgers.edu, DIMACS
Document last modified on December 9, 2002.