We investigate the problem of determining whether the edges of a given hypergraph form a partition of its vertex set. Binary decision diagrams (BDDs) are a computational model (or data structure) for representing Boolean functions. We show that this problem for $k$-uniform hypergraphs cannot be solved by a polynomial-size $m$-BDD for $m = \Theta(\log k)$, i.e., that any polynomial-size BDD solving this problem must query some of the edges at least $\log k$ times. We also show that, somewhat surprisingly, a randomized $1$-BDD of polynomial size suffices to solve the problem.
The current Linux kernel paging code uses an LRU algorithm to decide which pages to evict from main memory. LRU and many similar algorithms ignore the cost of eviction, selecting pages based only on their access patterns. Aggressive write-back is an optimization to LRU whereby pages located near other pages being evicted are also written back. This opportunistic write-back has low cost due to the locality of the data on disk, and should yield a significant performance improvement.
We will give a brief review of the ideas presented in our first
presentation. Then, we will quickly move to a description of our
experiments, and an analysis of the results of our work.
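As a rough illustration of the aggressive write-back idea described above (a minimal sketch, not the kernel code; the LRUCache class and the NEARBY window below are hypothetical), the following simulation writes back, on eviction of a dirty page, any dirty resident pages whose on-disk locations fall within the window:

from collections import OrderedDict

# Minimal LRU cache with opportunistic ("aggressive") write-back.
# Page numbers double as disk locations; NEARBY is a hypothetical window.
NEARBY = 8

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page -> dirty flag, kept in LRU order

    def access(self, page, write=False):
        dirty = self.pages.pop(page, False) or write
        self.pages[page] = dirty          # move to most-recently-used end
        if len(self.pages) > self.capacity:
            self._evict()

    def _evict(self):
        victim, dirty = self.pages.popitem(last=False)   # least recently used
        writes = []
        if dirty:
            writes.append(victim)
            # Opportunistically write back dirty pages near the victim on disk.
            for p, d in self.pages.items():
                if d and abs(p - victim) <= NEARBY:
                    writes.append(p)
                    self.pages[p] = False    # now clean, but stays resident
        if writes:
            print("write back:", sorted(writes))

cache = LRUCache(capacity=4)
for p in [10, 11, 12, 13, 10, 14]:
    cache.access(p, write=True)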
The goal of this research was to construct mathematical models that
incorporate
HIV mutation data. These models involve systems of coupled nonlinear
autonomous differential equations. Our five-dimensional model focuses on
the
interactions among uninfected CD4+ cells, two HIV strains (a mutant and the wild-type variety), and T-lymphocytes infected with one (but not both) of the viral strains. Furthermore, we consider the implications of mutation between the single mutant strain and the wild-type
virus. Steady
states are analyzed for variations in mutation rate, virus/T-cell
presence, and
drug interactions. The current model is derived from models proposed by
DeLeenheer and Smith, Nowak and May, and others.
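For concreteness, a five-dimensional two-strain system in the spirit of the Nowak-May models can be written as follows; the variables and parameters here ($T$, $I_w$, $I_m$, $V_w$, $V_m$, $\lambda$, $\beta$, $\mu$, $\delta$, $p$, $c$) are illustrative and need not match the exact formulation of our model:

\begin{align*}
\dot T   &= \lambda - d\,T - \beta_w T V_w - \beta_m T V_m,\\
\dot I_w &= (1-\mu)\,\beta_w T V_w - \delta I_w,\\
\dot I_m &= \mu\,\beta_w T V_w + \beta_m T V_m - \delta I_m,\\
\dot V_w &= p_w I_w - c V_w,\\
\dot V_m &= p_m I_m - c V_m,
\end{align*}

where $T$ denotes uninfected CD4+ cells, $I_w$ and $I_m$ cells infected by the wild-type and mutant strains, $V_w$ and $V_m$ free virus of each strain, and $\mu$ the mutation rate from wild type to mutant.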
Computed Tomography (CT) is a non-destructive method for reconstructing the internal structure of an object and is the basis of many medical imaging modalities. This is done by sending X-rays through the body and measuring how much they are absorbed, or attenuated. These data, referred to as ray sums, are modeled as line integrals along the paths of the photons; the reconstruction problem is to find the attenuation at each point given this set of line integrals. In practice, Fourier analysis is used to "back-project", or reconstruct, the image of attenuation characteristics.
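In the standard parallel-beam formulation (the notation here is ours), the ray sum is the Radon transform of the attenuation map $f$,
\[ p(\theta, s) = \int_{\mathbb{R}^2} f(x)\,\delta(x\cdot\theta - s)\,dx, \]
and filtered back-projection recovers $f$ by smearing the filtered projections back along the rays,
\[ f(x) = \int_0^{\pi} \bigl(h * p(\theta(\phi),\cdot)\bigr)(x\cdot\theta(\phi))\,d\phi, \]
where $\theta(\phi) = (\cos\phi, \sin\phi)$ is the ray direction, $s$ is the signed distance of the ray from the origin, the convolution is in $s$, and $h$ is the ramp filter.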
Central to tomography and medical imaging is the model of the source/object/detector relationship; reconstruction algorithms depend on descriptions of the relative positions, orientations, and motions of these components. Traditionally, Euclidean geometry is used to model these systems. However, this method becomes cumbersome for more complex imaging geometries.
It is a widely held belief that there is really a continuum of medical imaging systems, i.e., that the reconstruction algorithm changes continuously with the imaging geometry. Our aim is to model this continuum with a unifying
theory describing the imaging geometry based on Geometric Algebra. This
new approach will enable us to model the geometry and thereby explain
how images are reconstructed for an arbitrary medical imaging system.
Moreover, we hope this new method will be notationally clearer and allow
us to more succinctly describe reconstruction algorithms.
rtPCR (real-time PCR) is a very powerful tool to measure gene expression
in a cell. Quantification techniques used in rtPCR have the underlying
assumption that the efficiency of the system is constant. Data analysis
has shown, however, that the efficiency of PCR is NOT constant, but is
instead a function of PCR cycle number. In order to get more accurate
quantification results from rtPCR, an understanding of how efficiency
behaves with respect to cycle number is required. To this end, we have
developed a mathematical model that predicts the efficiency of PCR at each
cycle. This model will allow us to predict optimal experimental conditions
for rtPCR experiments and will allow us to develop more reliable
techniques to quantify gene expression from rtPCR data.
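As a toy illustration of a cycle-dependent efficiency (the sigmoidal form and all parameter values below are hypothetical stand-ins, not our fitted model), one can iterate the per-cycle recursion $N_{c+1} = N_c\,(1 + E(c))$:

import math

def efficiency(c, c_mid=28.0, width=3.0):
    """Hypothetical sigmoidal decline of efficiency with cycle number c."""
    return 1.0 / (1.0 + math.exp((c - c_mid) / width))

def simulate_pcr(n0=1e4, cycles=40):
    """Iterate N_{c+1} = N_c * (1 + E(c)) and record (cycle, efficiency, copies)."""
    n, history = n0, []
    for c in range(1, cycles + 1):
        e = efficiency(c)
        n *= 1.0 + e
        history.append((c, e, n))
    return history

for c, e, n in simulate_pcr()[::5]:
    print(f"cycle {c:2d}  efficiency {e:.3f}  copies {n:.3e}")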
Consider the following setting: you are given n balls, and each ball is colored with one of three colors. You are unable to distinguish the colors, but in each step, you may pick two balls and ask the oracle whether the two balls are of the same color (and the oracle always gives you the correct answer). Our aim was to determine the smallest number of steps needed to solve the following two problems:
1) The plurality problem: find a ball which is colored by the most common
color.
2) The partition problem: split all the balls into three groups according
to their colors.
We found an optimal deterministic algorithm for the partition problem, as
well as a probabilistic algorithm for the same problem, whose expected
number of steps is optimal. We also derived some bounds for the expected
number of steps of a probabilistic algorithm for the plurality problem.
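For comparison, a straightforward baseline for the partition problem (not the optimal algorithm referred to above) compares each new ball against one representative of each existing group; with three colors, membership in the last group can sometimes be inferred without a query. The oracle below is simulated from hypothetical hidden colors.

def partition(balls, same_color, num_colors=3):
    """Group balls by color using only same-color oracle queries (naive baseline)."""
    groups = []            # each group is a list of balls of one color
    queries = 0
    for b in balls:
        placed = False
        for i, g in enumerate(groups):
            # If all colors already have a group and every other group has been
            # ruled out, the last group needs no query.
            if i == len(groups) - 1 and len(groups) == num_colors:
                g.append(b)
                placed = True
                break
            queries += 1
            if same_color(b, g[0]):
                g.append(b)
                placed = True
                break
        if not placed:
            groups.append([b])
    return groups, queries

# Hypothetical hidden coloring used only to simulate the oracle.
hidden = {0: 'r', 1: 'g', 2: 'r', 3: 'b', 4: 'g', 5: 'b', 6: 'r'}
groups, q = partition(list(hidden), lambda x, y: hidden[x] == hidden[y])
print(groups, "queries:", q)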
Consider a complete graph G whose edges are colored with three colors. Is it possible to find three maximal cliques whose intersection is empty? Many subgraphs must be resolved for this to be possible. We consider whether all subgraphs can be resolved if Delta is contained in G.
We will discuss quivers and their representations, and the light these
shed on the McKay correspondence. Other topics include the Z3 quiver and
applications to string theory, and topics in Lie theory and differential
forms if time permits.
Recently there has been a lot of interest in using small samples to
approximate very large distributions. As technology has advanced, we are
faced with massive data streams, especially from the internet, e.g. email
data and IP traffic logs, from which we wish to glean useful
information. Basically, we have lists of numbers streaming by, so
phenomenally huge that storing all of them is out of the question, yet for
which we would like to know estimates for the median and other such
quantiles. Algorithms, such as that of Greenwald and Khanna, address those
issues. This project examines various algorithms given assumptions about
the distribution of data.
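For a sense of the setting, here is a much cruder baseline than Greenwald-Khanna: keep a fixed-size uniform reservoir sample of the stream and report the sample median. The stream and sample size below are illustrative only.

import random

def reservoir_median(stream, k=1000, seed=0):
    """Estimate the median of a one-pass stream from a uniform sample of size k.

    Plain reservoir sampling, used here as a baseline; the Greenwald-Khanna
    summary gives deterministic rank guarantees instead.
    """
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)
        else:
            j = rng.randint(0, i)     # keep x with probability k/(i+1)
            if j < k:
                sample[j] = x
    sample.sort()
    return sample[len(sample) // 2]

# Example: a synthetic stream too large to store in full.
stream = (x * x % 100003 for x in range(10**6))
print("estimated median:", reservoir_median(stream))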
In mathematics and computer science, the research problems encountered are often combinatorial in nature. These problems often contain restrictions that allow certain configurations but not others. I examined two such problems.
The first problem I investigated involves maximizing the size of a restricted subset: given U = {1, 2, ..., n}, determine the maximum size of a subset S of U subject to a certain restriction. The solution is (3/4)n. We then ask what happens if the subset is restricted further.
The other problem I examined was nicknamed "The Chair Problem." This problem involves a two-player game played at a round table with n seats. Player A sits in the first chair and declares that she wishes to move j seats. Player B then instructs player A to move j seats clockwise or counterclockwise. Player A wishes to sit in as many different seats as possible, and player B wishes to confine player A to as few seats as possible.
I examined two cases of this problem. In case 1, n = 2^k (where k = 1, 2, 3, ...). Case 2 involves n = 3k. Case 1 is advantageous to player A, who can indeed sit in all n seats, and case 2 is advantageous to player B, who can confine player A to (2/3)n seats.
It is possible to prove a variety of combinatorial identities in a completely algorithmic fashion. Major advances in our understanding of this topic led to the development of the WZ method, which finds and utilizes a certain rational function R (different for every identity) to prove existing combinatorial identities. However, obtaining the function R with the existing procedure zeil is infeasible for many complicated identities due to limitations in computer speed and memory. We are interested in developing new methods that will be more efficient in terms of speed and memory usage. We hope that these methods will enable us to obtain WZ certificate functions in cases where the zeil algorithm fails.
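For a small standard example (far easier than the identities where zeil struggles), take $\sum_k \binom{n}{k} = 2^n$, normalized as $F(n,k) = \binom{n}{k}/2^n$. A WZ certificate is the rational function
\[ R(n,k) = -\frac{k}{2\,(n-k+1)}, \]
and with $G(n,k) = R(n,k)\,F(n,k) = -\binom{n}{k-1}/2^{n+1}$ one checks directly that
\[ F(n+1,k) - F(n,k) = G(n,k+1) - G(n,k). \]
Summing over all $k$ makes the right side telescope to zero, so $\sum_k F(n,k)$ is independent of $n$; evaluating at $n = 0$ gives the identity.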
We want to obtain rigorous bounds for the Stokes constants in the first two Painlevé equations. The method produces a quickly converging sequence
for these constants.
The physical situation we are examining is a particle in a quantum
mechanical system (such as an atom) which is free to move on the positive
real axis. The region to the left of zero is taken to represent a physical
barrier (such as a wall) which is impossible to cross. The particle is
subject to a periodic external force given by
$\Omega\,\delta(x-a)\sin(\omega t)$ and a purely spatially dependent potential given by $V\,\delta(x-a)$, where $V$, $\omega$, $a$, and $\Omega$ are parameters. Our objective is
to determine the long term behavior of such a system; specifically, we
would like to determine when ionization (the phenomenon where a particle
eventually leaves a system) occurs, and how this depends on the parameters
of the system.
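In units with $\hbar = 2m = 1$ (our convention here), the system is the time-dependent Schrödinger equation on the half line with a Dirichlet wall at the origin,
\[ i\,\partial_t \psi(x,t) = -\partial_x^2 \psi(x,t) + \bigl(V + \Omega \sin(\omega t)\bigr)\,\delta(x-a)\,\psi(x,t), \qquad x > 0, \quad \psi(0,t) = 0, \]
and ionization corresponds to $\int_0^L |\psi(x,t)|^2\,dx \to 0$ as $t \to \infty$ for every fixed $L > 0$.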
As computer simulation of physical phenomena becomes more commonplace, the degree of precision available in the calculations begins to limit the depth of research. While it is true that, with sufficient resources, we can allocate memory for arbitrary-precision arithmetic, it would be far superior to change the algorithms themselves to eliminate the effect of rounding on the results.
In this specific scenario, a model of Bose-Einstein Condensation
is producing unexpected results. Is this an accurate representation of
the system, or do these anomalies stem from computational error? And in
the latter case, what is the best way to remove said error?
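One generic mitigation, shown here only as an illustration of algorithm-level rounding control (it is not necessarily the remedy for the condensate code), is compensated summation, which carries a running correction for the low-order bits lost in each floating-point addition.

def kahan_sum(values):
    """Compensated (Kahan) summation: tracks the rounding error of each add."""
    total = 0.0
    compensation = 0.0          # running estimate of lost low-order bits
    for v in values:
        y = v - compensation
        t = total + y           # low-order bits of y may be lost here
        compensation = (t - total) - y
        total = t
    return total

# Many tiny terms added to one large term expose the difference.
values = [1.0e8] + [1.0e-8] * 10**6
print("naive:", sum(values) - 1.0e8)       # loses the small terms entirely
print("kahan:", kahan_sum(values) - 1.0e8) # close to the true 1e-2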
The structure of DNA and the packing of its molecules have been primarily
studied through X-ray diffraction patterns of DNA crystals grown in the
lab. A crystal is a solid with a repeating arrangement of molecules, and
there is a mathematical notation used to describe the 230 possible
patterns or space groups. Some of the possible space groups occur more
often than others. Much of the structural information found worldwide on
DNA is standardized and stored on the Nucleic Acid Database (NDB), which
is based here at Rutgers. I have helped revise a program that recreates
sufficient atomic coordinates of crystals from more limited information
found on the NDB in order to then find distances between molecules in the
crystal. The program involves simple linear algebra. It will later
include other characterizations of the intermolecular contacts, such as
angles between molecular axes. DNA in living cells will hopefully turn
out to pack similarly.
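A minimal sketch of the linear algebra involved (illustrative only; the unit cell, symmetry operator, and coordinates below are made up rather than read from an NDB entry): apply a space-group operation in fractional coordinates, convert to Cartesian coordinates with the cell matrix, and measure distances between the original molecule and its symmetry mate.

import numpy as np

# Hypothetical orthorhombic unit cell (Angstroms); a real run would take
# cell parameters and symmetry operators from the NDB entry.
cell = np.diag([40.0, 45.0, 60.0])        # fractional -> Cartesian matrix

# One symmetry operation (rotation R, translation t) on fractional coordinates:
# a two-fold screw axis along z, sending x, y, z to -x, -y, z + 1/2.
R = np.array([[-1, 0, 0],
              [ 0,-1, 0],
              [ 0, 0, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.5])

# A few fractional atomic coordinates of one molecule (made-up values).
frac = np.array([[0.10, 0.20, 0.05],
                 [0.12, 0.22, 0.08],
                 [0.15, 0.25, 0.10]])

mate_frac = (frac @ R.T + t) % 1.0        # symmetry-related copy, wrapped into the cell
xyz, mate_xyz = frac @ cell.T, mate_frac @ cell.T

# Distance from every atom of the original to every atom of the copy.
d = np.linalg.norm(xyz[:, None, :] - mate_xyz[None, :, :], axis=-1)
print("closest intermolecular contact: %.2f A" % d.min())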
A common task in Bioinformatics is the search of sequence databases for relevant information. In protein sequence databases, searching is further complicated by both the increased amount of data and the complexity of sequence similarity. Protein similarity is not simply a matter of character matching, but rather is determined by a matrix of scores assigned to every match and mismatch.
One strategy to increase search speed is to map sequences into a binary space in which the Hamming distance between strings is comparable to the similarity of the original sequences. Within binary Hamming spaces, statistically proven sampling methods can be used for fast, reliably sensitive searches. However, mapping the protein alphabet to a binary Hamming space often introduces a certain level of distortion.
We have employed Linear Programming techniques to model and study
the nature of these mapping schemes. Specifically, we have found the
theoretical minimum distortion achievable for several biological scoring
matrices, as well as corresponding encoding weights. We have also
analyzed the use of these encoding weights to intelligently generate
pseudo-random encoding schemes that approach the theoretical minimum
distortions.
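To make the notion of distortion concrete, here is a small sketch with a made-up four-letter alphabet, score matrix, and binary encoding standing in for a real substitution matrix and our LP-derived encodings: it converts similarity scores into a distance-like quantity and measures how far the ratio of Hamming distance to that quantity varies across letter pairs.

from itertools import combinations

# Toy alphabet and symmetric similarity scores (stand-ins for BLOSUM-style entries).
scores = {('A','A'): 4, ('C','C'): 9, ('D','D'): 6, ('E','E'): 5,
          ('A','C'): 0, ('A','D'): -2, ('A','E'): -1,
          ('C','D'): -3, ('C','E'): -4, ('D','E'): 2}

def dissimilarity(x, y):
    """Turn similarity scores into a distance-like quantity."""
    s = lambda a, b: scores[(a, b)] if (a, b) in scores else scores[(b, a)]
    return s(x, x) + s(y, y) - 2 * s(x, y)

# A hypothetical binary encoding of each letter.
code = {'A': '0000', 'C': '1110', 'D': '0011', 'E': '0111'}

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# Multiplicative distortion: spread of the Hamming/dissimilarity ratio over all pairs.
ratios = [hamming(code[x], code[y]) / dissimilarity(x, y)
          for x, y in combinations(code, 2)]
print("distortion (max ratio / min ratio): %.2f" % (max(ratios) / min(ratios)))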
Squid is a scalable peer-to-peer infrastructure for performing flexible keyword-based queries with guaranteed results. I implemented a filesharing application on top of the Squid infrastructure and modified Squid itself to take advantage of application-specific assumptions that enable certain components of Squid to be optimized.
Specifically, I replaced the Hilbert space-filling curve, which has better performance for arbitrary applications, with a modified Z-curve customized to the search data being indexed. I then compared the
performance of Squid using the Hilbert curve and the Z-curve in base 26
(letters only) and base 36 (letters and numbers).
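As a rough sketch of the base-26 digit interleaving behind a Z-curve over two keywords (padding, alphabet handling, and key length here are simplified relative to the actual modification to Squid):

import string

ALPHABET = string.ascii_lowercase          # base-26 "digits"; keys assumed alphabetic

def z_index(key1, key2, width=8):
    """Map a pair of keywords to a 1-D index by interleaving base-26 digits.

    Simplified sketch of a Z-curve in base 26; the real Squid modification
    handles variable-length keys and its own alphabet.
    """
    def digits(word):
        word = (word.lower() + 'a' * width)[:width]      # pad/truncate
        return [ALPHABET.index(c) for c in word]
    index = 0
    for d1, d2 in zip(digits(key1), digits(key2)):
        index = index * 26 * 26 + d1 * 26 + d2           # interleave digit pairs
    return index

print(z_index("music", "jazz"))
print(z_index("music", "jaza"))   # keys sharing prefixes fall in the same high-order block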
Questions from Statistics have inspired fundamental research in Computer Science. One example involves the notion of data depth: depth defined via a given set of data points. Many applications, mostly from Statistics, require an analogous notion of the depth of points in d dimensions. Several interesting generalizations of one-dimensional depth have been suggested, and they pose a variety of challenges to computer science. On the algorithmic side, it is important to understand the complexity of computations involving depth and to have efficient algorithms for these tasks. My project deals with two notions of depth: Tukey depth and simplicial depth. I will attempt to improve upon the computational cost of the simplicial median, or to find a more efficient way to find a deep point.
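For reference, simplicial depth in the plane can be computed directly from its definition, counting the triangles spanned by data points that contain the query point, in O(n^3) time; the project is concerned with doing better than such brute force. The points below are illustrative.

from itertools import combinations

def sign(o, a, b):
    """Orientation of (o, a, b): > 0 counter-clockwise, < 0 clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    s1, s2, s3 = sign(a, b, p), sign(b, c, p), sign(c, a, p)
    has_neg = (s1 < 0) or (s2 < 0) or (s3 < 0)
    has_pos = (s1 > 0) or (s2 > 0) or (s3 > 0)
    return not (has_neg and has_pos)          # boundary counts as inside

def simplicial_depth(p, points):
    """O(n^3) brute force: number of triangles over the data containing p."""
    return sum(in_triangle(p, a, b, c) for a, b, c in combinations(points, 3))

pts = [(0, 0), (4, 0), (0, 4), (4, 4), (1, 1), (3, 2)]
print(simplicial_depth((2, 2), pts))   # depth of a central query point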