2012 DIMACS REU Proposed Project Descriptions

Please revisit this page frequently, additional projects will be posted through January.

Project #: DC2012-01

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS

Project Title: Anomaly Detection

This project addresses the problem of finding persistent patterns in evolving networks. A central question is the characterization of patterns that can be used as the basis to detect anomalous activities in time-evolving networks.

Requirements: Background in Statistics and SQL is desired. Participants are expected to develop scripts to statistically analyze a variety of data sets.

References:
http://www.cs.cmu.edu/~christos/TALKS/SIGMOD-07-tutorial/ J. Abello, T.Eliassi Rad, N. Devanur, "Detecting Novel Discrepancies in Communication Networks", 10th IEEE International Conference on Data Mining, Australia, Dec 2010.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-02

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS

Project Title: DIMACS Graph Mining

In this project we will build a variety of multi-graphs that can be extracted from this data set. We will then use our current set of graph analysis tools to parse, navigate, visualize and synthesize the findings. One central challenge is to devise methods that are privacy preserving.

Requirements: Some Knowledge of SQL, fundamental graph algorithms and C/C++ programming is a plus. However, there are several analysis tasks that can be performed using our plug in graph visualization system and that require only minimal amount of programming.

References:
[AvHK06] J. Abello, F. van Ham, and N. Krishnan, "Ask-GraphView-: A Large Scale Graph Visualization System", IEEE Transactions in Visualization and Computer Graphics, 12(5): 669-676 (2006).
J. Abello, F. V. Ham, N. Krishnan, "AskGraphView: A Large Graph Visualization System", IEEE Transactions in Visualization and Computer Graphics, 12(5): 669-676 (2006)

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-03

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS

Project Title: Name That Cluster

Given a user query, search engines generally return a very sizeable collection of possible answers. Clustering has been proposed as a tool to partition the possible answer set into more manageable subsets of related results. There is no current agreement on the preferred mode of presentation of these clusters. Currently, most search engines display the set of results on almost a pure textual form. However, relatively recently we have witnessed some timid attempts to use some graphical representations. This study is a first step to elucidate when and why text appears to outperform graphics for certain fundamental clustering related tasks. To this end we designed three interfaces to display flat clusters of user queries. The interfaces are enhanced with mechanisms by which users provide feedback about the relevance of a cluster for a pre-specified input query. Subsequently, users are asked to provide a name for a given cluster that best describes the cluster contents. In this project, we will analyze the results obtained from a web user study that started on May 2008.

Requirements: Basic Statistics, reading clustering related papers and preparing summary of findings.

Reference:
[ASGT07] J. Abello, J, Schulz, H, Gaudin, B, and Tominski, C (2007). Name That Cluster - Text vs. Graphics, IEEE InfoVis Conference, Sacramento, November 2007.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-04

Mentor: Wilma Olson, olson at rutchem dot rutgers dot edu, Chemistry, Rutgers University

Project Title: Protein-Induced DNA Looping

Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multi-site cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such long-range protein-protein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base-pairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressor-operator assembly in EscherichiaA0coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years. Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.

[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705-707.
[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199-223.
[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287-313.
[4] Olson, W. K., Gorin, A. A., Lu, X.-J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequence-dependent Deformability Deduced from Protein-DNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 11163-11168.
[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Three-dimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751-759.
[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Self-contact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 3967-3980.
[7] Muller-Hill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DC2012-05

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS and Yifan Hu, yifanhu at research dot att dot com

Project Title: Visualization of Time Varying Graphs

This project addresses the problem of visualizing graphs that are being collected in a streaming fashion. A central question is the identification of graph sub-structures and statistics on them that can be used as the basis to depict the graph evolution through time. Another key consideration is to come up with a visualization algorithm that gives stable layout over time, thus preserving the user's mental map.

Requirements: Background in Graph Algorithms, Statistics and OpenGL is desired. Participants are expected to implement "new" algorithms using a variety of graph drawing libraries like graphviz.

References:
J. Abello, F. V. Ham, N. Krishnan, "AskGraphView: A Large Graph Visualization System", IEEE Transactions in Visualization and Computer Graphics, 12(5): 669-676 (2006)

Y. Hu, S. Kobourov and S. Veeramoni, Embedding, Clustering and Coloring for Dynamic Maps, proceedings of IEEE Pacific Visualization Symposium, 2012.
(http://www2.research.att.com/~yifanhu/PUB/cluster_matching.pdf)

Project #: DC2012-06

Mentor: Yifan Hu, yifanhu at research dot att dot com, AT&T Labs

Project Title: Using Clustering Relation to Influence Graph Layout

Given an abstract graph of vertices and edges, how should it be drawing nicely on a piece of paper? There are many algorithms to layout a graph aesthetically. Often, a good drawing conveys clustering relationship between items [1]. And there are reasons to believe that graph layout is a process of graph clustering [2]. However, in practice, it is often the case that even in a good layout, vertices in the same cluster may not be contiguous in space. For example, "countries" in the MusicMap [3] are often fragmented. This could be a reflection of the fact that some vertices may probabilistically belong to multiple clusters. E.g., a music group may be a hybrid of two genres, thus defies hard classification. But for practical visualization, could we use the clustering to influence the layout, devise an algorithm to "pull" the outliers into the main areas, thus making the "countries" contiguous? The project will help the student learn algorithms for graph layout, clustering and visualization.

Prerequisites: Good knowledge in programming (Unix, C) and in linear algebra would be highly desirable.

  [1] http://www2.research.att.com/~yifanhu/GALLERY/GRAPHS/index.html

  [2] Andreas Noack, Modularity clustering is force-directed layout, Phys. Rev. E 79, (2009) 

  [3] http://www2.research.att.com/~yifanhu/MusicMap/index.html

Project #: DC2012-07

Mentor: Yifan Hu, yifanhu at research dot att dot com, AT&T Labs

Project Title: Visualizing Twitter Trends

Twitter offers a window into the minds of millions. What are people talking about? Are there any emerging trends? In this project we will collect and analyse twitter streams, cluster the tweets as well as graphical relations between people, and summarise and visualize the results to get a big picture of what's being talked about. In the process the student will learn topics such as graph algorithms, machine learning (text model) and visualization. The eventual goal of the project is a website that allow users to enter a search term and get a dynamic, clustered and graphical view of tweets related to the term.

Prerequisites: Good knowledge in programming (Unix, C and web programming) and in linear algebra would be highly desirable.

Project #: DC-2012-08

Mentor: Bahman Kalantari, kalantar at cs.rutgers.edu, Computer Science

Project Title: Exploring Polynomiography and Its Algorithmic and Mathematical Applications

Dr. Bahman Kalantari is a leading researcher with the Department of Computer Science who was instrumental in developing the field of polynomiography. Polynomiography is a mathematically-inspired computer medium based on algorithmic visualizations of one of the most basic and fundamental tasks in science and mathematics: solving a polynomial equation. Polynomiography serves as a powerful medium for creativity, discovery, and learning, with numerous applications in education, math, science, art and design. Dr. Kalantari will work with a qualified REU student to identify a new problem from polynomiography that will be of interest to both the student and the mentor. In particular, some specific problems will be described from computational geometry, numerical analysis, discrete math, education, algorithmic mathematical art, fine art, and games. http://www.cs.rutgers.edu/~kalantar/

Reference:

B. Kalantari, A new medium for visual art: Polynomiography, Computer Graphics Quarterly, 38, 2004, 22-24.

B. Kalantari, Polynomiography and Applications in Art, Education, and Science, Computers & Graphic 28:3, (2004) pp. 417-430.

B. Kalantari, Polynomiography: From the Fundamental Theorem of Algebra to Art, LEONARDO, Vol. 38, No. 3, 233-238, 2005.

B. Kalantari, Polynomial Root-Finding and Polynomiography, World Scientific, New Jersey, 2008.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DC-2012-09

Mentor: Alexander Schleip, schliep at cs.rutgers.edu, BioMaps Institute for Quantitative Biology and Computer Science

Project Title: Algorithm animation for bioinformatics algorithms

Gato (http://gato.sf.net), an open source software system written in Python, animates graph algorithms to show their dynamic behavior. A wide range of bioinformatics algorithms - such as aligning two sequences to assess their similarity, assembling complete genomes from fragments produced by sequencing, answering questions about the origin of species by constructing phylogenetic trees - can either be phrased as graph algorithms or can operate on graphs. Developing animations for selected algorithms will introduce participants to algorithmic aspects of bioinformatics and provide deep insights into the workings of selected algorithms.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DC2012-10

Mentor: William Pottenger, billp at dimacs.rutgers.edu, DIMACS and Computer Science

Project Title: Higher Order Learning

In our work we are developing a framework termed Higher Order Learning that builds models by leveraging associations in the form of latent relationships in an entity-relation graph. A general multi-attribute entity-relation graph has nodes that are people, places and things, etc. Higher Order Learning violates the underlying assumption made in traditional machine learning algorithms that records are independent and identically distributed (Taskar et al., 2002). Although this assumption simplifies the underlying mathematics of statistical models and of the corresponding parameter estimation procedures, it actually does not hold for many real world applications (Getoor & Diehl, 2005). Machine learning methods based on Higher Order Learning leverage relations between objects and features in an entity-relation graph and in doing so, operate on a much richer data representation compared to the traditional feature vector form.

In this research, Higher Order Learning techniques have been investigated and applied to a number of interesting datasets including Border Gateway Protocol (cybersecurity), E-Commerce (web mining), and Nuclear Detection (homeland security). The approach has been shown to outperform the popular Support Vector Machine algorithm on benchmark textual data sets as well as real-world streaming data (Ganiz, Lytkin & Pottenger, 2009). REU summer interns will have an unparalleled opportunity to explore Higher Order Learning algorithms and applications in the context of law enforcement and other domains.

Requirements: Applicants should have a solid Java programming background. Familiarity with XML is a plus, as is experience in text processing with Perl or other scripting languages.

You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-11

Mentor: William Pottenger, billp at dimacs.rutgers.edu, DIMACS and Computer Science

Project Title: Entity Resolution

Emergencies invariably require crisis managers to field numerous calls for service such as 911, radio reports by police and fire teams, medical reporting, infrastructure support teams, etc. Technologies for Entity Resolution can be applied to handle the confusion by identifying geospatial, temporal, and content similarities in the messages, enabling first responders to resolve and identify the location and scope of multiple events. Entity Resolution also plays an important role in counter-terrorism, for example, in determining when two terrorist incidents are the same. This project will involve the research and development of Entity Resolution techniques to group incidents from the Internet Crime Complaint Center database (www.ic3.gov). REU summer interns will have an unparalleled opportunity to explore Entity Resolution algorithms and applications in the context of law enforcement and other domains.

Requirements: Applicants should have a solid Java programming background. Familiarity with XML is a plus, as is experience in text processing with Perl or other scripting languages.

You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-12

Mentor: Boram Park, DIMACS, borampark22 at gmail dot com

Project Title: Developing and Solving Problems in Graph Theory, Combinatorics, and Cooperative Game Theory

Dr. Park is a leading researcher with the Department of Mathematics at Seoul National University and DIMACS visitor who works in the areas of graph theory, combinatorics, and cooperative game theory. Dr. Park will work with a qualified REU student to identify a new problem on graph theory related to her research that will be of interest to both the student and the mentor. This project will appeal to those students who are interested in graph theory and combinatorics.

Project #: DC2012-13

Mentor: Nina Fefferman, DIMACS, fefferman at aseop dot rutgers dot edu

Project Title: Simulation methods for network based evolutionary sociobiol

This project will use network based metrics in self-organizing systems to explore concepts in the evolution of social systems. Topics could range from game theory and kin selection, to network-based epidemiology. Some previous programming experience helpful, but not necessary.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-14

Mentor: Nina Fefferman, DIMACS, fefferman at aseop dot rutgers dot edu

Project Title: Modeling consensus building in animal groups

Modeling consensus building in animal groups - this project will involve analysis of voting theory models in structured groups to analyze the success of individual-group level coordination games (game theory) for use in animal populations. Biological systems will range from quorum sensing in bacteria to collaborative foraging in animals. A background in game theory will be helpful, but not necessary.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-15

Mentor: Nina Fefferman, DIMACS, fefferman at aseop dot rutgers dot edu

Project Title: Epidemiological modeling of infectious diseases

This project will use ordinary differential equations to analyze a series of questions about population-level outbreaks of infectious diseases. Possible particular questions range from the impact of mosquito feeding behavior in vector-borne outbreaks to the impact of AIDS on the emergence of antibiotic resistence. Previous experience in either Matlab, Mathematica, or R is necessary.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-16

Mentor: Nina Fefferman, DIMACS, fefferman at aseop dot rutgers dot edu

Project Title: Game theory of mate selection

This project will use both agent based simulation and game theory to determine threshold effects in evolutionary fitness for individuals hoping to attract the best possible mate while hoping to avoid attracting competitors for those mates. Some previous experience in programming will be helpful, but not necessary.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-17

Mentor: Nina Fefferman, DIMACS, fefferman at aseop dot rutgers dot edu

Project Title: Modeling conservation of endangered species using linear algebra

Traditional matrix-based methods in conservation biology explore the probabilities of survival of populations in isolation. This project will expand these previous methods to include a broader ecological context (i.e. competition, predation, etc.). Some previous experience with Matlab would be helpful, but is not necessary.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-18

Mentor: Nina Fefferman, DIMACS, fefferman at aseop dot rutgers dot edu

Project Title: Comparing routes of transmission in multi-vector disease models

This project will involve analyzing a system of differential equations to determine the importance of vector ecology on disease spread in diseases that can be transmitted by multiple vectors (i.e. mostquitoes and fleas).

The goals of the project will be to isolate conditions under which and times in disease cycles when feedback loops may interrupted most effectively. Previous experience with Mathematica, Matlab, or R is necessary.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-19

Mentor: James Abello, abello at dimacs dot rutgers dot edu, DIMACS, and Darakhshan Mir, mir at cs dot rutgers dot edu

Project Title: Differential Graph Privacy

The notion of differential privacy has become a popular standard for database privacy. Differential privacy tries to formalize the idea that privacy is preserved if the risk of inferring anything sensitive about an individual does not change significantly if he/she participates in a statistical database.

In this project we will examine the possibility of generating synthetic graphs "similar" to an original graph that satisfies this definition of privacy, to enable sharing of potentially sensitive graphs for analysis and research. We will work on examining private estimation of various parameters of several Stochastic graph models (such as, the Kronecker Graph model and the Exponential Random Graph Model), and compare the experimental performance of synthetic graphs across models. A related aim is to study analytically and/or experimentally the growth of a quantity called the "smooth sensitivity" of some graph statistics like the number of triangles in a graph.

Requirements: Comfortable with programming (preferably in C/C++), and a basic course in Probability/Statistics and Discrete Math.

References

[1] D. Mir and R. Wright, "A Differentially Private Estimator for the Stochastic Kronecker Graph Model", to appear in Proceedings of Workshop on Privacy and Anonymity in Information Systems, March 2012.

[2] C. Dwork, F. Mcsherry, K. Nissim, and A. Smith,''Calibrating Noise to Sensitivity in Private Data Analysis'', In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.

[3] C. Dwork, ''Differential privacy'', in Proceedings of the 33rd International Colloquium on

Automata, Languages and Programming (2), pages 1–12, 2006.

[4]J. Leskovec and C. Faloutsos,'' Scalable Modeling of Real Graphs using Kronecker multiplication'', in Proceedings of the 24th International Conference on Machine Learning, pages 497–504, 2007.

[5] K. Nissim, S. Raskhodnikova, and A. Smith. ''Smooth sensitivity and sampling in private data analysis'', in Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pages 75–84, 2007. (Full version here: http://www.cse.psu.edu/%7Easmith/pubs/NRS07/)

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-20

Mentor: Robert Vanderbei, Princeton, rvdb at Princeton dot edu

Project Title: Climate Change Analysis

I would be happy to advise a summer REU student who is interested in refining the climate change analyses described in a "weather module" that is being developed for the Mathematics of Planet Earth 2013 project. In particular, it would be interesting to make a simplified version of the world climate change map in which each location is analyzed not by solving a daily model with sinusoidal seasonal fluctuations but rather a simpler model that uses annual averages for the input to a linear regression model.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DC2012-21

Mentor: Kevin Chen, kcchen at biology dot rutgers dot edu

Project Title: Statistical Algorithms in Population Genetics

Our group is developing statistical algorithms for several important problems in population genetics. One important problem is local ancestry detection in an admixed genome. For example, African Americans have some European ancestry and some West African ancestry so their genomes are a mosaic of segments of European and West African ancestry and it is important for many applications to figure out the local ancestry of each segment. The specific summer project is to conduct experiments with a number of different algorithms for detecting local ancestry using a set of population genetics data that our group has already worked with extensively. Priority will be given to students who are available in subsequent semesters to continue the project and help to extend the existing methods and develop a novel, improved method for this problem. The main requirements are a solid foundation in statistics and computer science. Knowledge of a scripting language such as Python and an interest in biology would also be helpful.

* You must be a enrolled at a U.S. university to be eligible for this project.

REU Home Page
DIMACS Contact Information
Page last modified on March 5, 2012.

Proposed projects for the 2012 DIMACS and DIMACS/DIMATIA REU Programs