The Curry-Howard Isomorphism is a fundamental result in computer
science that establishes a close correspondence between types (in
programming) and propositions (in mathematical logic), whereby
expressions of a given type correspond to proofs of the associated
proposition. Under this correspondence, inductive proofs correspond to
recursive programs, and proofs by contradiction correspond to programs
that include backtracking.
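The correspondence can be sketched in Python's type-hint notation (the language is illustrative; the abstract names none): a function of type A → B plays the role of a proof of the implication "A implies B", and a pair plays the role of a proof of a conjunction.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

# A value of type Callable[[A], B] plays the role of a proof of "A implies B".
def compose(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """Transitivity of implication: from A -> B and B -> C, derive A -> C."""
    return lambda a: g(f(a))

# A pair of type tuple[A, B] plays the role of a proof of "A and B".
def fst(p: "tuple[A, B]") -> A:
    """Conjunction elimination: from "A and B", infer A."""
    return p[0]
```

Here `compose` is the program whose very existence is the proof: exhibiting a well-typed term of a type demonstrates the corresponding proposition.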
In order to solve Hilbert's 19th problem, a crucial result is needed
concerning the regularity of a solution to a particular differential equation.
Under very weak assumptions on the coefficients of this equation, Ennio De
Giorgi was able to show, using geometric arguments, that the derivative of the
solution is Hölder continuous. From this result, everything falls into
place.
The limiting free boundary problem is an important problem in applied
mathematics. We consider the one-dimensional profile for systems with
fully nonlinear elliptic operators and describe a process of
homogenization for fully nonlinear elliptic equations.
To every topological space with a group action, a construction called
equivariant cohomology assigns a ring. This ring captures the
intersection theory of the space while taking the group's action into account.
For the special case of the Lagrangian Grassmannian, this ring is determined by
the structure constants with respect to a basis of Schubert classes. In this
case, certain positivity conditions are known for the structure constants,
which are actually polynomials in several variables, but no formula for
computing them is known which explicitly exhibits these conditions. We present
a formula for a special class of structure constants which manifestly satisfies
the known positivity conditions.
There are certain modules for affine Lie algebras which happen to be
modules for a vertex operator algebra as well. We examined how the module
structure changes when the vertex operator is modified by a certain
function called delta, in the case where the modules are the
"standard" (or "highest weight integrable") irreducible modules for an
affine Lie algebra. It is well known that applying delta to elements of
the coroot lattice yields a module isomorphic to the module with which
we started. We examined the more general case, using elements of the
coweight lattice (an important set containing the coroot lattice as a
subset) with delta, and determined what module structure delta gives.
Researchers frequently use goodness of fit statistics from factor
analysis applied to categorical data despite model assumptions
requiring multivariate normal manifest variables and continuous
latent variables. When these model assumptions are violated,
hypothesis testing regarding the number of latent factors may
produce erroneous type I and type II error rates. In this paper,
we compare type I and type II error rates generated by factor
analysis applied to categorical manifest variables with error rates
produced by latent class analysis applied to the same data. In
general, factor analysis applied to categorical data produces
correct type I error rates and has roughly the same power as latent
class analysis.
A massive data stream is a sequence of data so long that it cannot be stored
completely in memory; as a result, algorithms on data streams must be not only
time efficient but space efficient as well. According to
"Estimating entropy and entropy norm on data streams" by Chakrabarti, Ba,
and Muthukrishnan, "Sublinear space is mandatory and polylogarithmic space is
often the goal."
Chakrabarti et al. present a space-efficient algorithm for estimating the
entropy of a data stream in the paper "A near-optimal algorithm for computing
the entropy of a stream". We implemented three versions of this algorithm. In
this talk, we will present the results of controlled experiments with these
implementations.
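For contrast with the sublinear-space goal, here is the naive linear-space baseline that a streaming entropy algorithm approximates: a single pass over the stream that counts symbol frequencies and computes the empirical Shannon entropy. This is a sketch for orientation, not one of the three implementations mentioned above.

```python
import math
from collections import Counter

def stream_entropy(stream):
    """Exact empirical (Shannon) entropy of a stream, in bits.

    This uses space linear in the number of distinct symbols; the point of
    the streaming algorithms discussed above is to approximate this value
    in sublinear (ideally polylogarithmic) space.
    """
    counts = Counter()
    n = 0
    for item in stream:        # one pass over the stream
        counts[item] += 1
        n += 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, a stream with two equally frequent symbols has entropy 1 bit, and one with four has entropy 2 bits.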
The completion of any type of special mission requires selecting the
best team for the job. Here, we examine the best way to
simultaneously compose such teams from a limited population. In doing
so, each team must meet certain competency and resiliency
requirements, as dictated by the mission. Our aim is to create an
efficient algorithm to assign people to teams such that these
requirements are met, while minimizing the number of people assigned
to teams.
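The flavor of the problem can be illustrated with a greedy, set-cover-style heuristic; the skill sets and the formulation below are hypothetical stand-ins, since the actual competency and resiliency requirements, and the algorithm under development, may differ. Greedy set cover gives only a logarithmic-factor approximation, not an optimal team.

```python
def build_team(population, required_skills):
    """Greedy sketch: repeatedly pick the person covering the most
    still-missing skills until every required competency is met.

    `population` maps each person's name to their set of skills
    (hypothetical representation for illustration).
    """
    remaining = set(required_skills)
    people = dict(population)
    team = []
    while remaining:
        # Person covering the largest number of unmet requirements.
        best = max(people, key=lambda p: len(people[p] & remaining), default=None)
        if best is None or not (people[best] & remaining):
            raise ValueError("requirements cannot be met by this population")
        team.append(best)
        remaining -= people.pop(best)
    return team
```

Minimizing the number of people across several teams simultaneously, with resiliency constraints, is harder than this single-team sketch suggests.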
We are building two different models of the spread of a respiratory disease at
CRAF, which is the New Jersey State intake prison. Johanna Tam is using SIR
modeling techniques and dividing the population into 16 subgroups, while Anne
Eaton is using network theory and analyzing the spread at the individual
level. Both models follow the same daily schedule and the same population. We
hope to compare the models and analyze our data.
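A minimal discrete-time SIR step looks like the following; this is a generic sketch with made-up parameter values, omitting the 16 subgroups and the daily schedule that the actual CRAF models include.

```python
def sir_step(s, i, r, beta, gamma, n):
    """One day of a basic discrete-time SIR model.

    beta is the transmission rate, gamma the recovery rate; both values
    used below are illustrative, not estimates for CRAF.
    """
    new_infections = beta * s * i / n   # susceptibles meeting infecteds
    new_recoveries = gamma * i          # infecteds who recover today
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

def simulate(days, s, i, r, beta=0.3, gamma=0.1):
    n = s + i + r  # total population is conserved
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma, n)
    return s, i, r
```

The network-theoretic model replaces the averaged `beta * s * i / n` term with transmission along individual contact edges.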
Sick children have a multi-faceted impact on society, adversely affecting schools, businesses, and the income of parents. The entities with the most responsibility and power to limit the spread of illness among children are schools and parents. However, parents' and schools' interests are neither completely aligned nor completely divergent: this gives rise to a non-zero-sum game. A sequence of games in matrix form where the parties have imperfect information leads to poor outcomes for both; on the other hand, games in tree form are sometimes unreasonably large and tend to give rise to outcomes that are impractical from a policy perspective.
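The non-zero-sum structure can be illustrated with a toy 2×2 matrix game; every payoff number below is invented for illustration and does not come from the study.

```python
# Hypothetical payoffs: rows are the school's actions (strict / lenient
# sick policy), columns are the parent's (keep child home / send to school).
# Each cell holds (school payoff, parent payoff); the cell sums differ,
# so the game is not zero-sum (or even constant-sum).
payoffs = {
    ("strict",  "home"): (3, 1),
    ("strict",  "send"): (1, 0),
    ("lenient", "home"): (2, 2),
    ("lenient", "send"): (0, 3),
}

def is_nash(school_action, parent_action):
    """True if neither party gains by deviating unilaterally."""
    s_pay, p_pay = payoffs[(school_action, parent_action)]
    other_s = "lenient" if school_action == "strict" else "strict"
    other_p = "send" if parent_action == "home" else "home"
    return (payoffs[(other_s, parent_action)][0] <= s_pay and
            payoffs[(school_action, other_p)][1] <= p_pay)
```

With these illustrative numbers, ("strict", "home") is a Nash equilibrium while ("lenient", "send") is not; under imperfect information, play can nevertheless settle on poor outcomes for both sides.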
A 6-critical graph is an inclusion-minimal graph which is not 5-colorable. We
describe a complete list of nine 6-critical graphs for the graphs embeddable in
the Klein bottle, together with their nineteen nonisomorphic embeddings. This
gives us a polynomial algorithm for deciding 5-colorability of graphs embedded
in the Klein bottle.
An input of the st-reachability problem is a directed graph G together with a pair of vertices s and t. The goal is to determine if s and t are connected by a directed path in G.
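For concreteness, st-reachability is easy to decide with breadth-first search; the catch, discussed below, is that BFS uses space linear in the graph, while the interesting question is what can be done in logarithmic space.

```python
from collections import deque

def st_reachable(graph, s, t):
    """Decide st-reachability by BFS on a directed graph given as an
    adjacency dict. The visited set and queue use linear space, which is
    exactly what a logspace algorithm is not allowed."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False
```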
L (logspace) is the class of decision problems solvable by a Turing machine restricted to use an amount of memory logarithmic in the size of the input, n. (The input itself is not counted as part of the memory.) The class NL is a non-deterministic version of L.
It is known that st-reachability is a complete problem for NL, whereas the undirected version lies in L. The complexity of the planar st-reachability problem, where the input is a planar graph, is an open question.
We present a logspace reduction from st-reachability on a surface of fixed
genus to st-reachability in the plane. Previously, such a reduction was known
only for the torus.
This project explores the sequence dependence of the structural properties of
DNA at lengths of a few hundred base pairs. A dimeric base-pair step can be
modeled by two planar slabs; its configuration in three dimensional space
can then be determined by rotational and translational parameters known as
twist, tilt, roll, shift, slide, and rise. With step parameter data obtained
from the Nucleic Acid Database, empirically-derived potential functions
have been created to gauge the likelihood of a particular step type conforming
to a set of the six parameters. My work is mainly concerned with applying
clustering analysis to the step parameter data, devising new functions that
more accurately assess the data distribution, and testing the functions with
nucleosome threading, i.e., determining the affinity of a sequence for binding
to a nucleosome.
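The clustering step can be sketched with a plain k-means loop. This is illustrative only: it is shown on 2-D points, though the same code handles 6-dimensional step-parameter vectors, and the actual analysis may well use a different clustering method.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on tuples of floats (e.g. (twist, tilt, roll,
    shift, slide, rise) vectors). Illustrative sketch, not the method
    used in the project."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Move each center to the mean of its cluster.
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl
                   else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters
```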
Two important directions for data mining research are distributed data
mining and privacy-preserving data mining. Potentially useful data is
often distributed among multiple parties and it is not always feasible to
aggregate all the data in one site before it is mined. Privacy issues are
one major barrier to aggregation and unrestricted sharing of data. For
these reasons, it is useful to have efficient techniques for mining data
in a distributed environment while preserving private information.
Current research has focused on data which is horizontally or vertically
fragmented. However, in many real-world situations, data comes from
heterogeneous sources and is neither horizontally nor vertically
distributed. In this talk, we will present an overview of some
privacy-preserving data mining techniques and we will discuss the
challenges of privacy-preserving mining of hybrid fragmented data.
Traditionally, author identification focuses on assigning a document to
a single author with a high level of confidence. Using existing
techniques and software as a starting point, we attempt to determine
whether two documents share a common author without considering the
identity of that author. Initial results show high accuracy using both
raw feature vectors and existing author identification models. Analyses
using these existing models appear to be particularly effective at
identifying pairs of documents with a common author.
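A minimal version of the raw-feature-vector comparison might look like the following; the word-frequency features and the threshold are invented stand-ins for the richer stylometric features and calibrated models the study uses.

```python
import math
from collections import Counter

def feature_vector(text):
    """Crude stylometric features: relative word frequencies."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def same_author(doc_a, doc_b, threshold=0.8):
    """Guess common authorship when the vectors are similar enough;
    the threshold is illustrative, not calibrated."""
    return cosine(feature_vector(doc_a), feature_vector(doc_b)) >= threshold
```

Note that the decision never identifies who the author is, only whether the two documents plausibly share one.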
Epilepsy is the second most common brain disorder after stroke and
currently afflicts at least 2 million Americans. The most common symptom
of epilepsy is spontaneous recurrent seizures. There has been recent
research indicating promising signs of seizure predictability. The
objective of this study is to determine whether the
correlations/synchronizations among different brain areas during normal
and pre-seizure periods are significantly different. This difference
should be demonstrated through a higher correlation among brain areas
during the period closer to a seizure.
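The basic quantity involved is a pairwise correlation between recordings from two brain areas. A plain Pearson correlation, as a sketch of one such synchronization measure (the study may use other measures as well), can be computed as:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length signal windows,
    e.g. recordings from two brain areas over the same time interval."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Comparing the distribution of such values over normal windows against pre-seizure windows is then a standard two-sample testing problem.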
Short-interfering RNA, or siRNA, has much potential in the area of gene
silencing. The current problem lies in determining which strings of siRNA are
effective in suppressing gene expression and which strings are not: Given a
string of siRNA of twenty nucleotides, there are 4^20 potential
strings of siRNA, only a few of which may actually be useful. We created an
algorithm based on a weighted Levenshtein distance such that the distance from
functional siRNAs to other functional siRNAs is less than the distance from
functional to nonfunctional siRNAs.
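A weighted Levenshtein distance can be sketched as follows; the substitution-cost table is a hypothetical stand-in, since the abstract does not specify the actual weighting scheme.

```python
def weighted_levenshtein(a, b, sub_cost):
    """Weighted edit distance between nucleotide strings by dynamic
    programming. `sub_cost` maps a pair of nucleotides to a substitution
    cost (default 1.0); insertions and deletions cost 1. The weights
    here are illustrative, not the ones from the abstract."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                       # delete the first i characters
    for j in range(1, n + 1):
        d[0][j] = j                       # insert the first j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (0.0 if a[i - 1] == b[j - 1]
                    else sub_cost.get((a[i - 1], b[j - 1]), 1.0))
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # (weighted) substitution
    return d[m][n]
```

Choosing the weights so that functional-to-functional distances stay small while functional-to-nonfunctional distances grow is exactly the tuning problem the algorithm addresses.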
For a given reaction network there exists a corresponding graph called the
species reaction graph. This graph contains two sets of nodes: reaction
nodes and species nodes. In this talk we will
present some conjectures involving the requirements for bistability in
reaction networks as related to the algebraic properties of its species
reaction graph and the labeling of its reaction nodes.
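The bipartite structure of the species reaction graph can be built directly from a reaction list; this sketch uses a hypothetical input format and omits the edge labels that the conjectures depend on.

```python
def species_reaction_graph(reactions):
    """Build the bipartite species reaction graph: one node per species,
    one per reaction, and an edge whenever a species participates in a
    reaction. `reactions` maps a reaction name to the list of species
    involved (hypothetical format; stoichiometry/labels omitted)."""
    species = set()
    edges = set()
    for name, participants in reactions.items():
        for s in participants:
            species.add(s)
            edges.add((s, name))          # species node -- reaction node
    return species, set(reactions), edges
```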