2010 Proposed REU Project Descriptions

Proposed projects for the 2010 DIMACS and DIMACS/DIMATIA REU Programs

Please revisit this page frequently, additional projects will be posted through January.

Project #: DDD2010-01

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS

Project Title: Anomaly Detection

This project addresses the problem of finding persistent patterns in evolving networks. A central question is the characterization of patterns that can be used as the basis to detect anomalous activities in time-evolving networks.

Requirements: Background in Statistics and SQL is desired. Participants are expected to develop scripts to statistically analyze a variety of data sets.

Reference:
http://www.cs.cmu.edu/~christos/TALKS/SIGMOD-07-tutorial/

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-02

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS

Project Title: DIMACS Graph Mining

In this project we will build a variety of multi-graphs that can be extracted from this data set. We will then use our current set of graph analysis tools to parse, navigate, visualize and synthesize the findings. One central challenge is to devise methods that are privacy preserving.

Requirements: Some Knowledge of SQL, fundamental graph algorithms and C/C++ programming is a plus. However, there are several analysis tasks that can be performed using our plug in graph visualization system and that require only minimal amount of programming.

Reference:
[AvHK06] J. Abello, F. van Ham, and N. Krishnan, "Ask-GraphView-: A Large Scale Graph Visualization System", IEEE Transactions in Visualization and Computer Graphics, 12(5): 669-676 (2006).

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-03

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS

Project Title: Name That Cluster

Given a user query, search engines generally return a very sizeable collection of possible answers. Clustering has been proposed as a tool to partition the possible answer set into more manageable subsets of related results. There is no current agreement on the preferred mode of presentation of these clusters. Currently, most search engines display the set of results on almost a pure textual form. However, relatively recently we have witnessed some timid attempts to use some graphical representations. This study is a first step to elucidate when and why text appears to outperform graphics for certain fundamental clustering related tasks. To this end we designed three interfaces to display flat clusters of user queries. The interfaces are enhanced with mechanisms by which users provide feedback about the relevance of a cluster for a pre-specified input query. Subsequently, users are asked to provide a name for a given cluster that best describes the cluster contents. In this project, we will analyze the results obtained from a web user study that is planned to run from Feb to May 2008.

Requirements: Basic Statistics, reading clustering related papers and preparing summary of findings.

Reference:
[ASGT07] J. Abello, J, Schulz, H, Gaudin, B, and Tominski, C (2007). Name That Cluster - Text vs. Graphics, IEEE InfoVis Conference, Sacramento, November 2007.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-04

Mentors: Ming Xie, mxie at stat.rutgers.edu, Department of Statistics and E.A. Elsayed, elsayed at rci.rutgers.edu, Department of Electrical Engineering

Project Title: Optimum Strategies of Container Inspection at Port-of-Entry

Containers arriving at ports may contain undesirable contents such as drugs, radioactive material or chemical and biochemical agents. Inspection of a fraction of these containers (randomly or based on some risk factor) might lead to an improved security system. Clearly, there is a probability that a container with undesirable contents might enter into the system without detection or a "clean" container might be subjected to a thorough inspection.... The objective of this project is determine the optimum inspection strategy that minimizes the occurrence of the above probabilities. Approaches such as the development of adaptive threshold levels of the inspection sensors based on the assigned risk factor might prove to an efficient inspection strategy. The research investigates new and efficient inspection strategies.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-05

Mentor: Hui Xiong, xionghui at gmail.com, MSIS

Project Title: Cross-Channel Financial Fraud Detection with Data Mining Techniques

Recent years have witnessed increased interests in financial fraud detection and prevention. This is driven by the ever-worsening financial crisis and an increased awareness of the importance of financial risk management. Indeed, the wide availability of fine-grained financial data enables unprecedented opportunities to change the computing paradigm for financial fraud detection and prevention. However, as these financial data become more detailed and multi-dimensional, it becomes ever more difficult for analysts to sift through the data even though it may contain valuable information. Data Mining holds great promise to address this challenge by providing efficient techniques to uncover useful information hidden in the large data repositories. In this project, we will investigate the unique features that distinguish data mining techniques from traditional analytic techniques for fraud detection and prevention. Also, as a pilot feasibility study, we will be focused on cross-channel fraud detection, in which a fraudster can gain access to necessary information in one channel and exploits this knowledge to commit fraud through a different channel. Cross-channel fraud is difficult to identify, track and resolve since the activities of the fraudster are usually similar to the routine business activities.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-06

Mentor: Paul Kantor, kantor at scils.rutgers.edu , School of Communication, Information and Library Studies

Project Title: Game Theoretic Aspects of Homeland Security

The effort to protect our country form the threat of smuggled weapons is a non-zero sum game, in which the opponent's estimates of the value of a success are different from our own estimates of the cost of failure. The problem is known under the name "Inspector Game". The summer project will examine this problem in the context of an ongoing research program on the problem of optimal detection of nuclear threats at borders, and within the country.

* You must be enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-07

Mentor: Endre Boros, boros at rutcor.rutgers.edu, RUTCOR

Project Title: Equivalent Quadratic Pseudo-Boolean Functions

Unconstrained quadratic binary optimization is a widely used technique for numerous practical problems, and is also known to be a hard optimization problem. We know that every real-valued set-function on an n-set has a unique multilinear polynomial representation (called a pseudo-Boolean function). Quadratic pseudo-Boolean functions also have many other representations, one of which is a nonnegative combination of 2-variable parity functions. This form can be equivalently viewed as a weighted signed graph. Every spanning tree in this signed graph gives rise to a different pseudo-Boolean function, all of which, however, share the same value-set (same range). This implies that they have the same minimum (maximum) values, and we can use any one of these functions to determine the minimum of the originally given input function. Let us call two pseudo-Boolean functions equivalent if they have the same value-set. The project will explore the equivalent family of functions corresponding to the different spanning trees under the weighted signed graph representation. We plan to study several questions, in particular how the graph structure determines the availability of alternative quadratic multilinear polynomials, and how different equivalent representations change the set of local minima. Of course, the global minimum value remains the same, as noted above, but the different functions may have different local minima.

Prerequisites: An introductory course in O.R. or linear programming would be helpful, but not necessary.

Reference:
E. Boros, K. Elbassioni, V. Gurvich, L. Khachiyan, and K. Makino, On the Complexity of Some Enumeration Problems for Matroids," SIAM J. on Discrete Mathematics, 19 (2005), 966-984.

E. Boros, V. Gurvich, Perfect Graphs Are Kernel Solvable," Discrete Mathematics, 159 (1996), 35-55.

E. Boros, V. Gurvich, L. Khachiyan, and K. Makino, On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices," Annals of Mathematics and Artificial Inteligence, 39 (2003) 211-221.

E. Boros, V. Gurvich, and R. Meshulam, Difference Graphs," Discrete Mathematics, 276 (2004), 59-64.

E. Boros, I. Lari, and B. Simeone, Block Linear Majorants in Quadratic 0-1 Optimization," Discrete Applied Mathematics, 145 (2004), 52-71.

E. Boros, and V. Menkov, Exact and Approximate Discrete Optimization Algorithms for Finding Useful Disjunctions of Categorical Predicates in Data Analysis," Discrete Applied Mathematics, 144 (2004), 43-58.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-08

Mentor: Naftaly Minsky, minsky at rutgers.edu, Computer Science

Project Title: Decentralized Internet Communities

This project will explore forming communities of agents, operating over the Internet, that interact with each other subject to a common law in the absence of a centralized server. Such decentralized communities constitute multi-agent systems that can be used for social networking, as well as for general computational tasks. Communities of this kind can be more scalable and more secure than traditional centralized Internet communities; they are also more flexible and more "democratic" than communities like Facebook, because the law of a given community is selected by its own members. Students will design decentralized communities of their choice - from social networking to open commercial markets.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-09

Mentor: Bahman Kalantari, kalantar at cs.rutgers.edu, Computer Science

Project Title: Exploring Polynomiography and Its Algorithmic and Mathematical Applications

Polynomiography is a mathematically-inspired computer medium based on algorithmic visualizations of one of the most basic and fundamental tasks in sciences and math: solving a polynomial equation. Polynomiography serves as a powerful medium for creativity, discovery, and learning, with numerous applications in education, math, science, art and design. Interested students will select a particular problem and explore related theoretical and/or practical applications of polynomiography. In particular, some specific problems will be described from computational geometry, numerical analysis, discrete math, education, or algorithmic mathematical art.

Reference:
B. Kalantari, I. Kalantari, and F. Andreev, "Animation of Mathematical Concepts Using Polynomiography," Proceedings of the International Conference on Computer Graphics and Interactive Techniques ACM SIGGRAPH 2004 Educators program, 2004.

B. Kalantari, "Polynomiography and Applications in Art, Education, and Science," Computers & Graphic 28:3, (2004) pp. 417-430.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-10

Mentor: Midge Cozzens, midge6930 at comcast.net, DIMACS

Project Title: A Game Theory Approach to Cascading Behavior in Networks

This project will study the rapid flow of information in large social networks. This is important in "viral marketing" and comes out of early work done on the spread of epidemics and social network theory. A graph is built with nodes representing individuals in a population and an edge between two nodes if they have some form of communication. Each node has a choice of two behaviors, an old behavior and a new behavior, and an incentive (payoff) for matching behaviors. Each node plays the coordinated game with each neighbor and the payoff for a node is the sum of the individual game payoffs. A sample problem is to determine the k most influential people in the network, an NP-hard problem. We seek variations on the model, such as weighting the edges with other payoffs (e.g., with marketing strategies of price reductions) to better determine the most influential players (often those to start marketing to).

References:
E. Adar, L. Zhang, A. Adamic, and R. M. Lukose, "Implicit structure and the dynamics of blogspace," Workshop on Weblogging Ecosystem, 2004.

D. Kempe, J. Kleinberg, and E. Tardos, "Maximizing the spread of influence in a social network". Proc. of the 9th International Conference on Knowledge Discovery and Data Mining, pp. 137-146. 2003.

J. Leskovec, L. Adamic, and B. Huberman, "The Dynamics of Viral Marketing," Proc. 7th ACM Converence on Electronic Commerce, 2006.

N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, (eds.) "Cascading Behavior in Networks: Algorithmic and Economic Issues", Algorithmic Game Theory, Cambridge: Cambridge University Press, 2007 pp. 613-632.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-11

Mentor: Alexander Schliep, schliep at cs.rutgers.edu, BioMaps Institute for Quantitative Biology

Project Title: Algorithm animation for bioinformatics algorithms

Gato (http://gato.sf.net), an open source software system written in Python, animates graph algorithms to show their dynamic behavior. A wide range of bioinformatics algorithms - such as aligning two sequences to assess their similarity, assembling complete genomes from fragments produced by sequencing, answering questions about the origin of species by constructing phylogenetic trees - can either be phrased as graph algorithms or can operate on graphs. Developing animations for selected algorithms will introduce participants to algorithmic aspects of bioinformatics and provide deep insights into the workings of selected algorithms.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-12

Mentors: Nina Fefferman, fefferman (at) aesop.rutgers.edu, Ecology and Tamra Carpenter, tcar at dimacs.rutgers.edu, DIMACS

Project Title: Allocation of Monetary Resources in HIV-Infected Communities

In many impoverished nations, monetary resources to treat HIV are severely limited, so communities are unable to treat all of the infected population. While much research focuses on specific treatment strategies [18] or more macro-scale modeling [19], recent research is also looking at finer models of equitability [20] and economics [29]. We will develop and refine a mathematical epidemiological model that ties community economics with HIV progression. Through computer simulation, the model will be used to gain intuition and potentially inform policy decisions on how to allocate treatment resources. We will refine this model to capture features of populations in specific areas (e.g., sub-Saharan Africa) and use it to explore the impact of different treatment strategies with the goal of identifying those that best assure the stability and long-term survival of the community.

Reference:
R. F. Baggaley, G. P. Garnett, and N. M. Ferguson, "Modeling the Impact of Antiretroviral Use in Resource-Poor Settings," PLoS Medicine 3(4):e124, 2006.

Cuddington, J. Hancock, and C. Rogers, "A Dynamic Aggregative Model of the AIDS Epidemic with Possible Policy Interventions," Journal of Policy Modeling 16(5) (1994), 473-496.

D. Wilson, and S. Blower, "Designing Equitable Antiretroviral Allocation Strategies in Resource-Constrained Countries," PLoS Medicine 2(2):e50, 2005.

N. H. Fefferman, and J. N. Kibambe, "A Household Resource Model for Epidemic Control in Financially Constrained Populations," (Under Review).

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-13

Mentors: Linda Ness, (Telcordia Applied Research) and Tamra Carpenter, tcar at dimacs.rutgers.edu, DIMACS

Project Title: Mathematical Models for Cognitive Systems

There is an increasing need for real-time networked computing systems that fuse streaming data from multiple sensors, perceive events, act on them and communicate effectively to other people and systems. This project will a) conceive and document examples of such systems, b) attempt to develop mathematical models for these systems by identifying suites of relevant mathematical models and algorithms, c) identify the strengths and limitations and d) glimpse the research forefront in this area.

References:
D. Bassu and C. Behrens, "Distributed LSI: Scalable Concept-based Information Retrieval with High Semantic Resolution," Proceedings of the 3rd SIAM International Conference on Data Mining, 2003.

R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, "Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Multiscale Methods", PNAS 102:21 (2005), pp. 7432-7437.

P. W. Jones, M. Maggioni, and R. Schul, "Manifold Parametrizations by Eigenfunctions of the Laplacian and Heat Kernels," PNAS 105:6, pp. 1803-1808.

E. Liberty, F. Woolfe, P. G. Martinsson, V. Rokhlin, and M. Tygert, "A Fast Randomized Algorithm for the Approximation of Matrices", Applied and Computational Harmonic Analysis 25, pp.335-366, 2008.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-14

Mentor: Aaron Jaggard, adj at dimacs.rutgers.edu, DIMACS

Project Title: Pattern Avoidance and Parallels Between Families of Permutations

The broad questions motivating this project involve parallels between different classes of permutations: Given two different classes of permutations---e.g., all permutations and the set of involutions (permutations whose square is the identity permutation)---which combinatorial questions have (approximately) the same answer when asked of both classes? (For example, the probability that a permutation in the symmetric group Sn contains a specified subsequence of k distinct integers is exactly 1/k! , while the probability that an involution in Sn contains the subsequence approaches 1/k! as n approaches infinity [3].)

In this project, we'll focus on questions related to pattern avoidance by permutations; a permutation p:{1,2,..., n}--->{1,2,...,3} (which we write as its list of values p(1)p(2)...p(n)) avoids the pattern 231 if there is no triple (i,j,k) of indices such that i (i.e., the order relationships between p(i), p(j) , and p(k) are exactly the same as those between 2, 3, and 1); avoidance of other patterns is defined analogously. Depending on the interests of the student, the questions that we study may range from concrete (as in [1], answering questions about specific patterns) to more abstract (as in [2], answering questions about families of patterns or, even more generally, about parallels between different families of patterns). We'll use Mathematica or Maple to help suggest/test conjectures, but extensive programming experience is not required.

References:
O. Guibert, E. Pergola, and R. Pinzani, "Vexillary involutions are enumerated by Motzkin numbers," Ann. Comb., 5 (2):153-174, 2001 (link).

A.D. Jaggard, "Prefix exchanging and pattern avoidance by involutions," Elec. J. Comb., 9 (2): Research paper #16, 2003 (link).

B.D. McKay, J. Morse, and H.S. Wilf, "The Distributions of the Entries of Young Tableaux," J. Comb. Th. A, 97 (1):117-128, 2002 (link)

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-15

Mentor: Aaron Jaggard, adj at dimacs.rutgers.edu, DIMACS
Project Title: Permutation Statistics and Parallels Between Families of Permutations
The broad questions motivating this project involve parallels between different classes of permutations: Given two different classes of permutations---e.g., all permutations and the set of involutions (permutations whose square is the identity permutation)---which combinatorial questions have (approximately) the same answer when asked of both classes? (For example, the probability that a permutation in the symmetric group Sn contains a specified subsequence of k distinct integers is exactly 1/k! , while the probability that an involution in Sn contains the subsequence approaches 1/k! as n approaches infinity [3].)
In this project, we'll focus on questions related to permutation statistics. These are nonnegative-integer-valued functions on permutations; classical examples of statistics on a permutation p include its descent number des(p) (the number of i such that p(i)>p(i+1)) , the number of inversions inv(p) (the number of i < j such that p(i)>p(j)) , and the major index maj(p) (the sum of the values i such that p(i)>p(i+1)) . One important problem is the determination of the distribution of a statistic (or tuples of statistics)---i.e., the number of permutations for which the statistic takes a certain value---over all permutations of a given length. A number of interesting results have been obtained showing that certain statistics (or tuples of statistics) have the same distribution; see [1] for one recent example of this that also provides a way to decompose classical statistics into sums of other statistics.
References: 1. E. Babson and E. Steingr�msson, "Generalized Permutation Patterns and a Classification of the Mahonian Statistics," S�minaire Lotharingien de Combinatoire, B44b (2000) (link).
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-16

Mentors: Graham Cormode, graham at dimacs.rutgers.edu, At&T Labs and S Muthukrishnan, muthu at cs.rutgers.edu, Computer Science
Project Title: Outsourced Computation Verification
As more computations are outsourced to third parties (as in cloud computing), there is a need for methods which can help the data owners verify that the computation has been done correctly -- without forcing them to perform the full computation themselves. Prior work has addressed how to ensure that the data is *stored* correctly, but this line of work is concerned with showing that the result of a more complex computation is correct, by designing interactive proof systems that both parties can execute efficiently.
Prerequisite: A background in discrete math, number theory, or crypto is helpful but not necessary.
Reference:
Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Annotations in data streams. In International Colloquium on Automata, Languages and Programming (ICALP), 2009.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-17

Mentors: Liviu Iftode, iftode at cs.rutgers.edu and S. Muthukrishnan, muthu at cs.rutgers.edu, Rutgers University
Project Title: Game-theoretic analysis of climbing social ladder in networks
In social networks (say Facebook), there are rules for how people discover each other and become friends. For example, say person A desires to become friends with B. The rules constraint A's ability to become B's friend outright and induces a game-theoretic component. A has to judiciously make other intermediate friends in order to eventually become B's friend. The project will involve modeling this situation, studying the game theory and dynamics of various parties, and understanding the price of anarchy and stability of the game. The situation above is an example of a more general phenomenon in networks where people learn and gain trust of others incrementally for nefarious (in DHS settings) or potentially harmless (in social networks) purposes.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-18

Mentor: Ji Young Choi, JYChoi at ship.edu, Shippensburg University
Project Title: The boundaries of the minimum Pk-total weights for simple connected graphs
The sum of the weights on every path of length k in a graph with edges of weight 1 or -1 is called the Pk-total weight of the +/- 1-assigned graph. The minimum of the absolute value of the Pk-total weights of a graph considering every possible +/- 1 assignment is called the minimum Pk-total weight. This project is to study the boundaries of the minimum Pk-total weights for simple connected graphs.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-19

Mentor: Eugene Fiorini, gfiorini at dimacs.rutgers.edu, DIMACS
Project Title: Extremal Properties of k-Partite Graphs of Large Girth, k = 2, 3
Project description in pdf form
Extremal graph theory is a branch of graph theory that is interested in studying the following scenario: Suppose G is a family of graphs, P a property and an invariant for each . We wish to determine a value m such that whenever for , then G has property P. Those graphs for which and G does not have property P are said to be the extremal graphs with respect to property P and invariant . For example, suppose G is the family of simple graphs on n vertices, n a positive integer. Moreover, suppose P is the property that "G contains a cycle," and the invariant is the number of edges of G, . Then (the number of edges a graph on n vertices can possess and still not contain a cycle) and the extremal graphs are trees on n vertices.
Consider all k-partite simple graphs (k = 2, 3) on a finite set of vertices. The objective of this project is to explore bounds on the number of edges that k-partite graphs can have and still be of large girth. That is, what is the maximum number of edges a bipartite or tripartite graph can have and avoid cycles of length n (n = 3, 4, 5, ?)?
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-20

Mentor: Michael Littman, mlittman at cs.rutgers.edu, Computer Science
Project Title: In Search of Value Equilibria
The gold standard for behavior in game theory scenarios is the Nash equilibrium. A set of strategies is in Nash equilibrium if none can achieve higher expected payoffs by changing behavior. In the machine-learning community, learning algorithms typically do not work this way. Instead of evaluating possible complete strategies, they compute estimates of the values of individual decisions and select decisions with high value. Following this idea, we can say a "value equilibrium" is a collection of value estimates for which the payoffs of the corresponding high value decisions are accurately modeled. In this project, we will attempt to make this notion concrete, prove various properties about it, and compare it to the more common Nash equilibrium concept. Are value equilibria easier to find? The two concepts appear to be identical in zero-sum games and games with pure equilibria. Are there games that have Nash equilibria that are not value equilibria and vice versa?

Project #: DDD2010-21

Mentor: Art Chaovalitwongse, wchaoval at rci.rutgers.edu, Department of Industrial Engineering
Project Title: A New Paradigm for Master Defect Record Retrieval
As electronic records (e.g., medical records, technical defect records) accumulate, the retrieval of a record from a past instance with the same or similar circumstances has become extremely valuable. This value is because a past record may contain insight or the correct diagnosis or solution to the current circumstance. We refer to the two records of the same or similar circumstances as master and duplicate records. Current record retrieval techniques are lacking when applied to this special master defect record retrieval problem. Such an aim poses two challenges: (1) representation of the text records and (2) retrieval of desired record. In this study, the student may develop a new paradigm for master defect record retrieval, including keyword weighting and similarity measures. The student may implement supervised or unsupervised learning techniques. The retrieval paradigm can then be tested on a real-world large-scale defect record database from a telecommunications company. Alternatively, the student may focus on document representation. That is, efficient ways of representing large text files.

Project #: DDD2010-22

Mentor: William Pottenger, billp at dimacs.rutgers.edu, DIMACS and Computer Science
Project Title: Higher Order Learning
In our work we are developing a framework termed Higher Order Learning that builds models by leveraging associations in the form of latent relationships in an entity-relation graph. A general multi-attribute entity-relation graph has nodes that are people, places and things, etc. Higher Order Learning violates the underlying assumption made in traditional machine learning algorithms that records are independent and identically distributed (Taskar et al., 2002). Although this assumption simplifies the underlying mathematics of statistical models and of the corresponding parameter estimation procedures, it actually does not hold for many real world applications (Getoor & Diehl, 2005). Machine learning methods based on Higher Order Learning leverage relations between objects and features in an entity-relation graph and in doing so, operate on a much richer data representation compared to the traditional feature vector form.
In this research, Higher Order Learning techniques have been investigated and applied to a number of interesting datasets including Border Gateway Protocol (cybersecurity), E-Commerce (web mining), and Nuclear Detection (homeland security). The approach has been shown to outperform the popular Support Vector Machine algorithm on benchmark textual data sets as well as real-world streaming data (Ganiz, Lytkin & Pottenger, 2009). DHS summer interns will have an unparalleled opportunity to explore Higher Order Learning algorithms and applications in the context of law enforcement and other domains.
Requirements: Applicants should have a solid Java programming background. Familiarity with XML is a plus, as is experience in text processing with Perl or other scripting languages
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-23

Mentor: William Pottenger, billp at dimacs.rutgers.edu, DIMACS and Computer Science
Project Title: Sensemaking
Semantic web technologies are making it possible for analysts (such as detectives and other law-enforcement workers) to have smarter databases that not only store data but also model the knowledge contained in that data and reason with it.
As databases become more knowledge-based, analytics systems and their user interfaces should also become smarter in helping the analyst accomplish his or her investigative task. Cognitive science has developed models of the mental processes analysts use when they work to make sense of large amounts of information. We are using these Sensemaking models to build data analytics user interfaces that give more intelligent feedback and facilitate a more efficient work process.
This project provides the chance to work with real-world law enforcement data using semantic web technologies, to build and test software that provides new ways for analysts to interact with data, and to learn about Cognitive Science theories based on computational thinking.
Requirements: Applicants should have a solid Java programming background. Familiarity with XML is a plus, as is experience in text processing with Perl or other scripting languages.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-24

Mentor: Eugene Fiorini, gfiorini at dimacs.rutgers.edu, DIMACS
Project Title: Partitioning Graphs into Class-0 Pebbling Subgraphs
Graph pebbling consists of placing "pebbles" or markers on vertices of a simple graph and moving the pebbles according to the following rule: There must be at least two pebbles on a given vertex in order to make a pebbling move. A pebbling move from one vertex to an adjacent vertex consists of removing one pebble from the initial vertex and moving a second pebble from the initial vertex to the terminal vertex. The pebbling number of a graph, p(G), is the minimum number of of pebbles needed to, after a series of pebbling moves, have a pebble at any given vertex, given any initial configuration of the p(G) pebbles. A class-0 graph G on n vertices is one in which p(G) = n. We define a k class-0 graph G as one in which a simple graph G can be partitioned into k class-0 subgraphs on n vertices. Previous results have shown that complete graphs on n vertices, Kn, are k class-0 for a given n > N for some integer N (Fiorini, Verbeck, de Silva). This project will explore additional questions related to pebbling and k class-0 graphs.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2010-25

Mentor: Aaron Jaggard, adj at dimacs.rutgers.edu, DIMACS and Vijay Ramachandran, vijayr at cs dot colgate dot edu, Colgate University
Project Title: Susceptibility of IP Traceroute to Falsification
IP traceroute is a mechanism by which one can determine what Internet path is used to carry traffic for a source-destination pair. Several traceroute-probe methods exist [1], although many computer systems come with a built-in standard traceroute tool. Although networks concerned with revealing their topology in response to traceroute probes can simply drop traceroute-response packets, there may be reasons to falsify the responses (rather than hide them). For example, a network operator may want to lie and convince others that a path offered is much shorter than it actually is.
The purpose of this project is twofold: First, considering the various traceroute methods available, is it possible to design algorithms that can detect a traceroute probe and provide consistent, false answers? Second, if it is possible to do so, is it possible to develop a new probing strategy that can reveal or bypass false responses?
Basic knowledge of IP networking is helpful but not necessary for this project. [1] "Traceroute probe method and forward IP path inference." M. Luckie, Y. Hyun, B. Huffaker. IMC 2008. (http://portal.acm.org/ft_gateway.cfm?id=1452557&type=pdf&coll=portal&dl=ACM)
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-26

Mentor: Aaron Jaggard, adj at dimacs.rutgers.edu, DIMACS and Vijay Ramachandran, vijayr at cs dot colgate dot edu, Colgate University
Project Title: Characterizing the trade-off between network monitoring and network overload
Operators of networks are interested in performance anomalies that may occur, e.g., timing problems or traffic loss. To detect such anomalies, probing devices can be placed throughout the network; these devices can send test packets to each other, measure properties of the transmission (e.g., delay), and alert the operator when something abnormal is detected. However, there is a trade-off among the number of probe devices, amount of test traffic, and overhead that negatively affects regular network traffic.
Some work [1] has been done on algorithms to choose locations for probing devices such that the paths between the devices adequately cover the network). However, this work is only one of many possible ways to approach the trade-off between adequate probing and network overhead. The purpose of this project is to work towards characterizing this trade-off by (1) formalizing metrics by which to evaluate probing strategies and (2) identifying example strategies and studying their effects on such metrics.
Experience with graph algorithms is helpful but not necessary for this project.
[1] "Network Performance Anomaly Detection and Localization." P. Barford, N. Duffield, A. Ron, J. Sommers. INFOCOM 2008. (http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=5062053)
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-27

Mentor: Wilma Olson, olson at rutchem dot rutgers dot edu, Chemistry, Rutgers University
Project Title: Protein-induced DNA Looping
Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multi-site cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such long-range protein-protein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base-pairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressor-operator assembly in EscherichiaA0coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years. Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.
[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705-707.
[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199-223.
[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287-313.
[4] Olson, W. K., Gorin, A. A., Lu, X.-J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequence-dependent Deformability Deduced from Protein-DNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 11163-11168.
[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Three-dimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751-759.
[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Self-contact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 3967-3980.
[7] Muller-Hill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-28

Mentor: Howard Karloff, howard at research dot att dot com, ATT Labs
Project Title: Exploring Topics in Approximation, Online and Randomized Algorithms
Dr. Karloff is a leading researcher with ATT Labs who works in theoretical computer science with specific interests in approximation, online and randomized algorithms. Together with DIMACS/Rutgers faculty member, Dr. Karloff will work with a qualified REU student to identify a new problem from his area of research that will be of interest to both the student and the mentors. This project would appeal to those students who have a creative approach to problem-solving and want to develop their critical thinking skills in research.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-29

Mentor: Gyan Bhanot, gyanbhanot at gmail.com, Rutgers University
Project Title: Identifying patterns of migration and selection in HapMaP III data
The HapMaP project has collected blood samples from approximately 150 subjects in 11 populationw worldwide, analyzed them for polymorphisms and made the genotype data publicly available. We are developing methods to look for patterns of migration (out of Africa events and beyond) and selection within and between these populations. The REU student will help us in these studies.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2010-30

Mentor: Gyan Bhanot, gyanbhanot at gmail.com, Rutgers University
Project Title: Understanding the role of germline selection on patterns of linkage in HapMaP III data
We will develop simple models of germline selection to explain the linkage disequilibrium (correlations) seen in the HapMaP III data. This will involve making simple models and running simulations of these in Matlab.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.
Please revisit this page frequently, additional projects will be posted
Index of REU program
DIMACS Home Page
Contacting the Center
Document last modified on May 6, 2010.