We investigate the problem of determining whether the edges of a given hypergraph form a partition of its vertex set. Binary decision diagrams (BDDs) are a computational model (or data structure) for representing Boolean functions. We show that this problem for $k$-uniform hypergraphs cannot be solved by a polynomial-size $m$-BDD for $m = \Theta(\log k)$, i.e., that any polynomial-size BDD solving this problem must query some of the edges at least $\log k$ times. We also show that, somewhat surprisingly, a randomized $1$-BDD of polynomial size suffices to solve the problem.
The current Linux kernel paging code uses an LRU algorithm to decide which pages to evict from main memory. LRU and many similar algorithms ignore the cost of eviction, selecting pages based only on their access patterns. Aggressive write-back is an optimization to LRU whereby pages located near other pages being evicted are also written back. This opportunistic write-back has low cost due to the locality of the data on disk, and should yield a significant performance improvement.
We will give a brief review of the ideas presented in our first
presentation. Then, we will quickly move to a description of our
experiments, and an analysis of the results of our work.
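As a rough illustration of the aggressive write-back idea described above (a minimal sketch, not the kernel code; the LRUCache class and the NEARBY window below are hypothetical), the following simulation writes back, on eviction of a dirty page, any dirty resident pages whose on-disk locations fall within the window:

from collections import OrderedDict

# Minimal LRU cache with opportunistic ("aggressive") write-back.
# Page numbers double as disk locations; NEARBY is a hypothetical window.
NEARBY = 8

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page -> dirty flag, kept in LRU order

    def access(self, page, write=False):
        dirty = self.pages.pop(page, False) or write
        self.pages[page] = dirty          # move to most-recently-used end
        if len(self.pages) > self.capacity:
            self._evict()

    def _evict(self):
        victim, dirty = self.pages.popitem(last=False)   # least recently used
        writes = []
        if dirty:
            writes.append(victim)
            # Opportunistically write back dirty pages near the victim on disk.
            for p, d in self.pages.items():
                if d and abs(p - victim) <= NEARBY:
                    writes.append(p)
                    self.pages[p] = False    # now clean, but stays resident
        if writes:
            print("write back:", sorted(writes))

cache = LRUCache(capacity=4)
for p in [10, 11, 12, 13, 10, 14]:
    cache.access(p, write=True)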
The goal of this research was to construct mathematical models that
incorporate
HIV mutation data. These models involve systems of coupled nonlinear
autonomous differential equations. Our five-dimensional model focuses on
the
interactions among uninfected CD4+ cells, two HIV strains (a mutant and the wild-type variety), and T-lymphocytes infected with one (but not both) of the viral strains. Furthermore, we consider the implications of mutation between the single mutant strain and the wild-type
virus. Steady
states are analyzed for variations in mutation rate, virus/T-cell
presence, and
drug interactions. The current model is derived from models proposed by
DeLeenheer and Smith, Nowak and May, and others.
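For concreteness, a five-dimensional two-strain system in the spirit of the Nowak-May models can be written as follows; the variables and parameters here ($T$, $I_w$, $I_m$, $V_w$, $V_m$, $\lambda$, $\beta$, $\mu$, $\delta$, $p$, $c$) are illustrative and need not match the exact formulation of our model:

\begin{align*}
\dot T   &= \lambda - d\,T - \beta_w T V_w - \beta_m T V_m,\\
\dot I_w &= (1-\mu)\,\beta_w T V_w - \delta I_w,\\
\dot I_m &= \mu\,\beta_w T V_w + \beta_m T V_m - \delta I_m,\\
\dot V_w &= p_w I_w - c V_w,\\
\dot V_m &= p_m I_m - c V_m,
\end{align*}

where $T$ denotes uninfected CD4+ cells, $I_w$ and $I_m$ cells infected by the wild-type and mutant strains, $V_w$ and $V_m$ free virus of each strain, and $\mu$ the mutation rate from wild type to mutant.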
Computed Tomography (CT) is a non-destructive method for reconstructing the internal structure of an object and is the basis of many medical imaging modalities. This is done by sending X-rays through the body and measuring how much they are absorbed, or attenuated. These data, referred to as ray sums, are modeled as line integrals along the paths of the photons; the reconstruction problem is to find the attenuation at each point given this set of line integrals. In practice, Fourier analysis is used to "back-project", or reconstruct, the image of attenuation characteristics.
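In the standard parallel-beam formulation (the notation here is ours), the ray sum is the Radon transform of the attenuation map $f$,
\[ p(\theta, s) = \int_{\mathbb{R}^2} f(x)\,\delta(x\cdot\theta - s)\,dx, \]
and filtered back-projection recovers $f$ by smearing the filtered projections back along the rays,
\[ f(x) = \int_0^{\pi} \bigl(h * p(\theta(\phi),\cdot)\bigr)(x\cdot\theta(\phi))\,d\phi, \]
where $\theta(\phi) = (\cos\phi, \sin\phi)$ is the ray direction, $s$ is the signed distance of the ray from the origin, the convolution is in $s$, and $h$ is the ramp filter.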
Central to tomography and medical imaging is the model of the source/object/detector relationship; reconstruction algorithms depend on descriptions of the relative positions, orientations, and motions of these components. Traditionally, Euclidean geometry is used to model these systems. However, this method becomes cumbersome for more complex imaging geometries.
It is a widely held belief that there is really a continuum of medical imaging systems, i.e., that the reconstruction algorithm changes continuously with the imaging geometry. Our aim is to model this continuum with a unifying
theory describing the imaging geometry based on Geometric Algebra. This
new approach will enable us to model the geometry and thereby explain
how images are reconstructed for an arbitrary medical imaging system.
Moreover, we hope this new method will be notationally clearer and allow
us to more succinctly describe reconstruction algorithms.
rtPCR (real-time PCR) is a very powerful tool to measure gene expression
in a cell. Quantification techniques used in rtPCR have the underlying
assumption that the efficiency of the system is constant. Data analysis
has shown, however, that the efficiency of PCR is NOT constant, but is
instead a function of PCR cycle number. In order to get more accurate
quantification results from rtPCR, an understanding of how efficiency
behaves with respect to cycle number is required. To this end, we have
developed a mathematical model that predicts the efficiency of PCR at each
cycle. This model will allow us to predict optimal experimental conditions
for rtPCR experiments and will allow us to develop more reliable
techniques to quantify gene expression from rtPCR data.
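As a toy illustration of a cycle-dependent efficiency (the sigmoidal form and all parameter values below are hypothetical stand-ins, not our fitted model), one can iterate the per-cycle recursion $N_{c+1} = N_c\,(1 + E(c))$:

import math

def efficiency(c, c_mid=28.0, width=3.0):
    """Hypothetical sigmoidal decline of efficiency with cycle number c."""
    return 1.0 / (1.0 + math.exp((c - c_mid) / width))

def simulate_pcr(n0=1e4, cycles=40):
    """Iterate N_{c+1} = N_c * (1 + E(c)) and record (cycle, efficiency, copies)."""
    n, history = n0, []
    for c in range(1, cycles + 1):
        e = efficiency(c)
        n *= 1.0 + e
        history.append((c, e, n))
    return history

for c, e, n in simulate_pcr()[::5]:
    print(f"cycle {c:2d}  efficiency {e:.3f}  copies {n:.3e}")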
Consider the following setting: you are given n balls, and each ball is colored with one of three colors. You are unable to distinguish the colors, but in each step, you may pick two balls and ask the oracle whether the two balls are of the same color (and the oracle always gives you the correct answer). Our aim was to determine the smallest number of steps needed to solve the following two problems:
1) The plurality problem: find a ball which is colored by the most common
color.
2) The partition problem: split all the balls into three groups according
to their colors.
We found an optimal deterministic algorithm for the partition problem, as
well as a probabilistic algorithm for the same problem, whose expected
number of steps is optimal. We also derived some bounds for the expected
number of steps of a probabilistic algorithm for the plurality problem.
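For comparison, a straightforward baseline for the partition problem (not the optimal algorithm referred to above) compares each new ball against one representative of each existing group; with three colors, membership in the last group can sometimes be inferred without a query. The oracle below is simulated from hypothetical hidden colors.

def partition(balls, same_color, num_colors=3):
    """Group balls by color using only same-color oracle queries (naive baseline)."""
    groups = []            # each group is a list of balls of one color
    queries = 0
    for b in balls:
        placed = False
        for i, g in enumerate(groups):
            # If all colors already have a group and every other group has been
            # ruled out, the last group needs no query.
            if i == len(groups) - 1 and len(groups) == num_colors:
                g.append(b)
                placed = True
                break
            queries += 1
            if same_color(b, g[0]):
                g.append(b)
                placed = True
                break
        if not placed:
            groups.append([b])
    return groups, queries

# Hypothetical hidden coloring used only to simulate the oracle.
hidden = {0: 'r', 1: 'g', 2: 'r', 3: 'b', 4: 'g', 5: 'b', 6: 'r'}
groups, q = partition(list(hidden), lambda x, y: hidden[x] == hidden[y])
print(groups, "queries:", q)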
Consider a complete graph G whose edges are colored with three colors. Is it possible to find three maximal cliques whose intersection is empty? Many subgraphs must be resolved for this to be possible. We consider whether all subgraphs can be resolved if Delta is contained in G.
We will discuss quivers and their representations, and the light these
shed on the McKay correspondence. Other topics include the Z3 quiver and
applications to string theory, and topics in Lie theory and differential
forms if time permits.
Recently there has been a lot of interest in using small samples to
approximate very large distributions. As technology has advanced, we are
faced with massive data streams, especially from the internet, e.g. email
data and IP traffic logs, from which we wish to glean useful
information. Basically, we have lists of numbers streaming by, so
phenomenally huge that storing all of them is out of the question, yet for
which we would like to know estimates for the median and other such
quantiles. Algorithms, such as that of Greenwald and Khanna, address those
issues. This project examines various algorithms given assumptions about
the distribution of data.
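For a sense of the setting, here is a much cruder baseline than Greenwald-Khanna: keep a fixed-size uniform reservoir sample of the stream and report the sample median. The stream and sample size below are illustrative only.

import random

def reservoir_median(stream, k=1000, seed=0):
    """Estimate the median of a one-pass stream from a uniform sample of size k.

    Plain reservoir sampling, used here as a baseline; the Greenwald-Khanna
    summary gives deterministic rank guarantees instead.
    """
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)
        else:
            j = rng.randint(0, i)     # keep x with probability k/(i+1)
            if j < k:
                sample[j] = x
    sample.sort()
    return sample[len(sample) // 2]

# Example: a synthetic stream too large to store in full.
stream = (x * x % 100003 for x in range(10**6))
print("estimated median:", reservoir_median(stream))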
In mathematics and computer science, the research problems encountered are often combinatorial in nature. These problems often contain restrictions that allow certain configurations but not others. I examined two such problems.
The first problem I investigated involves maximizing the size of a restricted subset: given U = {1, 2, ..., n}, determine the maximum size of a subset S of U subject to a certain restriction. The solution is (3/4)n. We then ask what happens if the subset is restricted further.
The other problem I examined was nicknamed "The Chair Problem." This problem involves a two-player game played at a round table with n seats. Player A sits in the first chair and declares that she wishes to move j seats. Player B then instructs player A to move j seats clockwise or counterclockwise. Player A wishes to sit in as many different seats as possible, and player B wishes to confine player A to as few seats as possible.
I examined two cases of this problem. In case 1, n = 2^k (where k = 1, 2, 3, ...). Case 2 involves n = 3k. Case 1 is advantageous to player A, who can indeed sit in all n seats, and case 2 is advantageous to player B, who can confine player A to (2/3)n seats.
It is possible to prove a variety of combinatorial identities in a completely algorithmic fashion. Major advances in our understanding of this topic led to the development of the WZ method, which finds and utilizes a certain rational function R (different for every identity) to prove existing combinatorial identities. However, obtaining the function R with the existing procedure zeil is infeasible for many complicated identities due to limitations in computer speed and memory. We are interested in developing new methods that will be more efficient in terms of speed and memory usage. We hope that these methods will enable us to obtain WZ certificate functions in cases where the zeil algorithm fails.
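For a small standard example (far easier than the identities where zeil struggles), take $\sum_k \binom{n}{k} = 2^n$, normalized as $F(n,k) = \binom{n}{k}/2^n$. A WZ certificate is the rational function
\[ R(n,k) = -\frac{k}{2\,(n-k+1)}, \]
and with $G(n,k) = R(n,k)\,F(n,k) = -\binom{n}{k-1}/2^{n+1}$ one checks directly that
\[ F(n+1,k) - F(n,k) = G(n,k+1) - G(n,k). \]
Summing over all $k$ makes the right side telescope to zero, so $\sum_k F(n,k)$ is independent of $n$; evaluating at $n = 0$ gives the identity.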
We want to obtain rigorous bounds for the Stokes constants in the first two Painlevé equations. The method produces a quickly converging sequence
for these constants.
The physical situation we are examining is a particle in a quantum
mechanical system (such as an atom) which is free to move on the positive
real axis. The region to the left of zero is taken to represent a physical
barrier (such as a wall) which is impossible to cross. The particle is
subject to a periodic external force given by
$\Omega\,\delta(x-a)\sin(\omega t)$ and a purely spatially dependent potential given by $V\,\delta(x-a)$, where $V$, $\omega$, $a$, and $\Omega$ are parameters. Our objective is
to determine the long term behavior of such a system; specifically, we
would like to determine when ionization (the phenomenon where a particle
eventually leaves a system) occurs, and how this depends on the parameters
of the system.
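In units with $\hbar = 2m = 1$ (our convention here), the system is the time-dependent Schrödinger equation on the half line with a Dirichlet wall at the origin,
\[ i\,\partial_t \psi(x,t) = -\partial_x^2 \psi(x,t) + \bigl(V + \Omega \sin(\omega t)\bigr)\,\delta(x-a)\,\psi(x,t), \qquad x > 0, \quad \psi(0,t) = 0, \]
and ionization corresponds to $\int_0^L |\psi(x,t)|^2\,dx \to 0$ as $t \to \infty$ for every fixed $L > 0$.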
As computer simulation of physical phenomena becomes more commonplace, the degree of precision available in the calculations begins to limit the depth of research. While it is true that, with sufficient resources, we can allocate memory for arbitrary-precision arithmetic, it would be far superior to change the algorithms themselves to eliminate the effect of rounding on the results.
In this specific scenario, a model of Bose-Einstein Condensation
is producing unexpected results. Is this an accurate representation of
the system, or do these anomalies stem from computational error? And in
the latter case, what is the best way to remove said error?
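One generic mitigation, shown here only as an illustration of algorithm-level rounding control (it is not necessarily the remedy for the condensate code), is compensated summation, which carries a running correction for the low-order bits lost in each floating-point addition.

def kahan_sum(values):
    """Compensated (Kahan) summation: tracks the rounding error of each add."""
    total = 0.0
    compensation = 0.0          # running estimate of lost low-order bits
    for v in values:
        y = v - compensation
        t = total + y           # low-order bits of y may be lost here
        compensation = (t - total) - y
        total = t
    return total

# Many tiny terms added to one large term expose the difference.
values = [1.0e8] + [1.0e-8] * 10**6
print("naive:", sum(values) - 1.0e8)       # loses the small terms entirely
print("kahan:", kahan_sum(values) - 1.0e8) # close to the true 1e-2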
The structure of DNA and the packing of its molecules have been primarily
studied through X-ray diffraction patterns of DNA crystals grown in the
lab. A crystal is a solid with a repeating arrangement of molecules, and
there is a mathematical notation used to describe the 230 possible
patterns or space groups. Some of the possible space groups occur more
often than others. Much of the structural information found worldwide on
DNA is standardized and stored on the Nucleic Acid Database (NDB), which
is based here at Rutgers. I have helped revise a program that recreates
sufficient atomic coordinates of crystals from more limited information
found on the NDB in order to then find distances between molecules in the
crystal. The program involves simple linear algebra. It will later
include other characterizations of the intermolecular contacts, such as
angles between molecular axes. DNA in living cells will hopefully turn
out to pack similarly.
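A minimal sketch of the linear algebra involved (illustrative only; the unit cell, symmetry operator, and coordinates below are made up rather than read from an NDB entry): apply a space-group operation in fractional coordinates, convert to Cartesian coordinates with the cell matrix, and measure distances between the original molecule and its symmetry mate.

import numpy as np

# Hypothetical orthorhombic unit cell (Angstroms); a real run would take
# cell parameters and symmetry operators from the NDB entry.
cell = np.diag([40.0, 45.0, 60.0])        # fractional -> Cartesian matrix

# One symmetry operation (rotation R, translation t) on fractional coordinates:
# a two-fold screw axis along z, sending x, y, z to -x, -y, z + 1/2.
R = np.array([[-1, 0, 0],
              [ 0,-1, 0],
              [ 0, 0, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.5])

# A few fractional atomic coordinates of one molecule (made-up values).
frac = np.array([[0.10, 0.20, 0.05],
                 [0.12, 0.22, 0.08],
                 [0.15, 0.25, 0.10]])

mate_frac = (frac @ R.T + t) % 1.0        # symmetry-related copy, wrapped into the cell
xyz, mate_xyz = frac @ cell.T, mate_frac @ cell.T

# Distance from every atom of the original to every atom of the copy.
d = np.linalg.norm(xyz[:, None, :] - mate_xyz[None, :, :], axis=-1)
print("closest intermolecular contact: %.2f A" % d.min())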
A common task in Bioinformatics is the search of sequence databases for relevant information. In protein sequence databases, searching is further complicated by both the increased amount of data and the complexity of sequence similarity. Protein similarity is not simply a matter of character matching, but rather is determined by a matrix of scores assigned to every match and mismatch.
One strategy to increase search speed is to map sequences into a binary space in which the Hamming distance between strings is comparable to the similarity of the original sequences. Within binary Hamming spaces, statistically proven sampling methods can be used for fast, reliably sensitive searches. However, mapping the protein alphabet to a binary Hamming space often introduces a certain level of distortion.
We have employed Linear Programming techniques to model and study
the nature of these mapping schemes. Specifically, we have found the
theoretical minimum distortion achievable for several biological scoring
matrices, as well as corresponding encoding weights. We have also
analyzed the use of these encoding weights to intelligently generate
pseudo-random encoding schemes that approach the theoretical minimum
distortions.
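To make the notion of distortion concrete, here is a small sketch with a made-up four-letter alphabet, score matrix, and binary encoding standing in for a real substitution matrix and our LP-derived encodings: it converts similarity scores into a distance-like quantity and measures how far the ratio of Hamming distance to that quantity varies across letter pairs.

from itertools import combinations

# Toy alphabet and symmetric similarity scores (stand-ins for BLOSUM-style entries).
scores = {('A','A'): 4, ('C','C'): 9, ('D','D'): 6, ('E','E'): 5,
          ('A','C'): 0, ('A','D'): -2, ('A','E'): -1,
          ('C','D'): -3, ('C','E'): -4, ('D','E'): 2}

def dissimilarity(x, y):
    """Turn similarity scores into a distance-like quantity."""
    s = lambda a, b: scores[(a, b)] if (a, b) in scores else scores[(b, a)]
    return s(x, x) + s(y, y) - 2 * s(x, y)

# A hypothetical binary encoding of each letter.
code = {'A': '0000', 'C': '1110', 'D': '0011', 'E': '0111'}

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# Multiplicative distortion: spread of the Hamming/dissimilarity ratio over all pairs.
ratios = [hamming(code[x], code[y]) / dissimilarity(x, y)
          for x, y in combinations(code, 2)]
print("distortion (max ratio / min ratio): %.2f" % (max(ratios) / min(ratios)))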
Squid is a scalable peer-to-peer infrastructure for performing flexible keyword-based queries with guaranteed results. I implemented a filesharing application on top of the Squid infrastructure and modified Squid itself to take advantage of application-specific assumptions that enable certain components of Squid to be optimized.
Specifically, I replaced the Hilbert space-filling curve, which has better performance for arbitrary applications, with a modified Z-curve customized to the search data being indexed. I then compared the
performance of Squid using the Hilbert curve and the Z-curve in base 26
(letters only) and base 36 (letters and numbers).
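As a rough sketch of the base-26 digit interleaving behind a Z-curve over two keywords (padding, alphabet handling, and key length here are simplified relative to the actual modification to Squid):

import string

ALPHABET = string.ascii_lowercase          # base-26 "digits"; keys assumed alphabetic

def z_index(key1, key2, width=8):
    """Map a pair of keywords to a 1-D index by interleaving base-26 digits.

    Simplified sketch of a Z-curve in base 26; the real Squid modification
    handles variable-length keys and its own alphabet.
    """
    def digits(word):
        word = (word.lower() + 'a' * width)[:width]      # pad/truncate
        return [ALPHABET.index(c) for c in word]
    index = 0
    for d1, d2 in zip(digits(key1), digits(key2)):
        index = index * 26 * 26 + d1 * 26 + d2           # interleave digit pairs
    return index

print(z_index("music", "jazz"))
print(z_index("music", "jaza"))   # keys sharing prefixes fall in the same high-order block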
Questions from Statistics have inspired fundamental research in Computer Science. One example involves the notion of data depth: depth defined via a given set of data points. Many applications, mostly from Statistics, require an analogous notion of the depth of points in d dimensions. Several interesting generalizations of one-dimensional depth have been suggested, and they pose a variety of challenges to computer science. On the algorithmic side, it is important to understand the complexity of computations involving depth and to have efficient algorithms for these tasks. My project deals with two notions of depth: Tukey depth and simplicial depth. I will attempt to improve upon the computational cost of the simplicial median, or to find a more efficient way to find a deep point.
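For reference, simplicial depth in the plane can be computed directly from its definition, counting the triangles spanned by data points that contain the query point, in O(n^3) time; the project is concerned with doing better than such brute force. The points below are illustrative.

from itertools import combinations

def sign(o, a, b):
    """Orientation of (o, a, b): > 0 counter-clockwise, < 0 clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    s1, s2, s3 = sign(a, b, p), sign(b, c, p), sign(c, a, p)
    has_neg = (s1 < 0) or (s2 < 0) or (s3 < 0)
    has_pos = (s1 > 0) or (s2 > 0) or (s3 > 0)
    return not (has_neg and has_pos)          # boundary counts as inside

def simplicial_depth(p, points):
    """O(n^3) brute force: number of triangles over the data containing p."""
    return sum(in_triangle(p, a, b, c) for a, b, c in combinations(points, 3))

pts = [(0, 0), (4, 0), (0, 4), (4, 4), (1, 1), (3, 2)]
print(simplicial_depth((2, 2), pts))   # depth of a central query point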