2007 Proposed REU Project Descriptions

Proposed projects for the 2007 DIMACS and DIMACS/DIMATIA REU Programs

Project #: DDD2007-01

Mentor: Wilma Olson, olson at rutchem.rutgers.edu, Chemistry

Project title: Protein-induced DNA Looping

Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multi-site cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such long-range protein-protein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base-pairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressor-operator assembly in EscherichiaA0coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years. Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.

[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705-707.

[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199-223.

[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287-313.

[4] Olson, W. K., Gorin, A. A., Lu, X.-J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequence-dependent Deformability Deduced from Protein-DNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 11163-11168.

[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Three-dimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751-759.

[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Self-contact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 3967-3980.

[7] Muller-Hill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-02

Mentor: Michael Littman, mlittman at cs.rutgers.edu , Computer Science

Project title: Solve Pico

Pico 2 is a simple card game based on bluffing. See http://doris-frank.de/PlayPico_de.html . It's *just* small enough that it could be solved optimally by computer, although I'm not aware if anyone has done so. Solving it would require getting familiar with the idea of linear programming and Markov games, and hacking up the rules and some linear programming package. Then, we'd compare the resulting strategy with the online player, and (if we can find one!) a person who can beat the online player.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-03

Mentor: Michael Littman, mlittman at cs.rutgers.edu , Computer Science

Project title: Visualizations for CS

I'd like to work with someone to develop an interactive Python program to simulate some simple logical circuits that we'll use to build up to an entire (simulated) computer. This visualization of a computer could be used, for example, by instructors in CS courses for non-majors.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-04

Mentor: Michael Littman, mlittman at cs.rutgers.edu , Computer Science

Project title: POMDP history discounting

Agents acting in the real world are confronted with the problem of making good decisions with limited knowledge. Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited sensor feedback. Two popular classes of approaches are (1) estimating the hidden-state dynamics and solving the resulting model and (2) ignoring the hidden-state dynamics and working directly with observable quantities. Past research (Loch and Singh 1998) has shown that learning to work with the observable quantities can be made much more effective with the addition of a form of short-term memory in the learning process called an eligibility trace. In this project, we will compare the eligibility-trace approach to a novel approach based on learning an instance-based model of history directly from experience. We plan to work with simple demonstration problems from the POMDP.org repository and implement the eligibility-trace approach and novel Markov-decision-process-based history approach to allow for a direct comparison between them.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-05

Mentor: Michael Littman, mlittman at cs.rutgers.edu , Computer Science

Project title: Solving Flats

A great challenge in computer science is connecting formal representations (syntax) to real-world meanings (semantics). Language games are an entertaining venue for studying these issues. I would like to work with a student to develop a program for solving various kinds of "flats". These puzzles blend some very strong logical constraints with a little bit of semantics. Clues come in the form of short poems with two or more blanks. The puzzle type tells you what logical relation links the words that are needed to fill in the blanks. Here's an example from http://www.puzzlers.org/guide/index.php?expand=identity that is a "heteronym (8, 4 4)" (this type means that the answer is the same sequence of letters twice, first as a single 8-letter word, then as two 4-letter words):

You say she pulled your ONE out by
The roots? That TWO, I'll bet. If I
Were you, and had your wife-as craven
As this might sound-I'd stay clean shaven.

ONE = mustache
TWO = must ache

Generating a list of 8-letter words that are associated with "shaven", "roots", etc. isn't too hard. Latent semantic analysis ought to be a good starting point. Then, noticing that one of these words works as two four-letter words would also be pretty easy.

There's lots of different types of flats, some more interesting logically than others. Browsing http://www.puzzlers.org/guide/?expand=browse for a few minutes, gives a sense of the scope. There is a bright and creative community constructing and solving such puzzles, so there is a built-in audience for this kind of work.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-06

Mentor: Vadim Lozin, lozin at rutcor.rutgers.edu, Operations Research

Project title: Problems in Algorithmic Graph Theory

This project focuses on the study of several fundamental problems of algorithmic graph theory that are of both practical importance and theoretical significance. In a graph, a stable set is a subset of vertices, no two of which are adjacent. The maximum stable set problem is that of finding a stable set of maximum size. Independent domination is the problem of finding a stable set of minimum size so that every vertex not in the set is adjacent to a vertex in it. Vertex colorability can be formulated as the problem of partitioning the set of vertices into a minimum number of subsets, each of which is stable. In general, all three problems are computationally intractable. On the other hand, they admit efficient solutions for graphs in some particular classes, such as bipartite graphs (the stable set problem), split graphs (independent domination) or cubic graphs (vertex colorability). Finding new classes of graphs for which the problems are efficiently solvable is a challenging research direction. It involves both investigation of the structure of graphs in specific classes and development of various algorithmic graph techniques such as graph transformations, graph decompositions, etc..

Prerequisite: The student should have a course in graph theory.

V.V. Lozin, Conic reduction of graphs for the stable set problem, Discrete Mathematics, 222 (2000) 199-211.

R. Boliac and V.V. Lozin, Independent domination in finitely defined classes of graphs, Theoretical Computer Science, 301 (2003) 271-284.

M.U. Gerber and V.V. Lozin, Robust algorithms for the stable set problem, Graphs and Combinatorics, 19 (2003) 347-356.

A. Hertz and V.V. Lozin, The maximum independent set problem and augmenting graphs, in "Graph Theory and Combinatorial Optimization" (Eds D. Avis, A. Hertz, O. Marcotte), Springer, (2005) 69-99.

V.V. Lozin and M. Milanic, A Polynomial Algorithm to Find an Independent Set of Maximum Weight in a Fork-free Graph, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms SODA'06, (2006) 26-30.

V.V. Lozin and R. Mosca, Polar Graphs and Maximal Independent Sets, Discrete Mathematics, to appear.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-07

Mentor: Paul Kantor, paul.kantor at rutgers.edu, Communication, Information and Library Studies

Project title: Automated Entity Resolution

A challenging problem at the intersection of computation, mathematics and statistics arises when we ask whether two or more real entities are referenced by the same "label". For example, there are hundred of bio-science papers written by "T. Suzuki". How many different real persons does this one label represent? Stephen King writes under several different names: can he be discovered despite this disguise? Analysis is based on finding patterns in the relations between this author, other authors, physical institutions and locations, and abstract concepts such as the "topics of the papers". Students working on this project will have an opportunity to participate in one or more DIMACS efforts related to security and public safety.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2007-08

Mentor: Wanpracha (Art) Chaovalitwongse, wchaoval at rci.rutgers.edu, Department of Industrial Engineering

Title: Classification of Epileptic Brain Activity

Epilepsy, which is the second most common brain disorder after stroke, currently afflicts at least 2 million Americans. It is a chronic condition of diverse etiologies with the common symptom of spontaneous recurrent seizures. The goals of this project are: 1) to explore and develop quantitative techniques to investigate the spatio-temporal properties of the epileptic transition based on brain electrical activity (Electroencephalogram - EEG); 2) to develop novel pattern recognition, classification and clustering algorithms for epileptic brain activity classification based upon the brain electrical connectivity underlying the epileptic processes.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-09

Mentor: Wanpracha (Art) Chaovalitwongse, wchaoval at rci.rutgers.edu, Department of Industrial Engineering

Title: Mathematical Programming Models for Data Mining Tasks

Data mining seeks to discover hidden patterns in a data set without explicit knowledge about the data. Most of data mining tasks attempt to measure similarity or dissimilarity coefficients among all samples of the data for classification, clustering, and feature selection operations. The idea of the project is to investigate mathematical models that represent these data mining tasks. The outcome of this project can be applied to many real-world problems including quantitative finance, computational biology, manufacturing processes, supply chain, and operations management.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-10

Mentor: Wanpracha (Art) Chaovalitwongse, wchaoval at rci.rutgers.edu, Department of Industrial Engineering

Title: Sibling Relationship Reconstruction

This project is aimed at developing novel solution approaches for the problem of reconstructing sibling relationships based on single generation genetic data without parental information. This problem is very difficult but extremely important because knowledge of familial relationships is needed for many biological applications including the estimation of heritabilities of quantitative characters, studies of mating systems and fitness, and managing populations of endangered species. Typically, biologists have used parental data to establish sibling groups indirectly through parentage assignments. Reconstructing sibling relationships without parental data is a much more difficult problem, but one that faces many investigators who sample and genotype cohorts of offspring rather than parent/offspring groups. One of the most promising ideas is to employ simple genetic constraints on the full-sibling groups, imposed by the Mendelian inheritance rules, to data from codominant DNA markers. To incorporate this idea, novel statistical analyses and optimization techniques will be investigated in this project.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-11

Mentor: John Kolassa, kolassa at stat.rutgers.edu , Statistics

Statistical tests about the equality of two populations are often performed in two steps. An intial step determines whether the hypothesis of equal variances is supported, and the second step actually performs the test on the means. The form of the test used in the second step is determined by the result of the first step. This two-stage testing procedure is often criticized for changing the size of the resulting overall procedure. We will investigate the extent of this impact on size, using Levine's test for the first stage of testing.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-12

Mentor: Nina Fefferman, feferman_at_Math.Princeton.edu , Tufts University, DIMACS

Project title: Individual-based Utility Theory, Health Care Seeking Behavior, and Disease Outbreaks Part 1

There exists a fundamental game theoretic dichotomy between the individual and the public good in matters of public health. This dichotomy has been rigorously investigated in certain areas (e.g. Vaccination compliance, quarantine, public health directives in case of pandemic outbreak, etc.), but very little attention has been paid to the impact of this conflict on "normal" diseases: seasonal influenza, the common cold, giardiasis, etc. There are two projects with interactive components: (1) the game theoretic analysis of costs and benefits of seeking health care under different scenarios of disease risk, and (2) characterizing the epidemiological impact of the individual care decisions on predictions of disease outbreaks. Together, the hope is that these projects will provide a theoretical organizational strategy that could be recommended to businesses, local governments and school systems to minimize outbreak severity without too greatly compromising efficiency. In this project you will work on component 1.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-13

Mentor: Nina Fefferman, feferman_at_Math.Princeton.edu , Tufts University, DIMACS

Project title: Individual-based Utility Theory, Health Care Seeking Behavior, and Disease Outbreaks Part 2

There exists a fundamental game theoretic dichotomy between the individual and the public good in matters of public health. This dichotomy has been rigorously investigated in certain areas (e.g. Vaccination compliance, quarantine, public health directives in case of pandemic outbreak, etc.), but very little attention has been paid to the impact of this conflict on "normal" diseases: seasonal influenza, the common cold, giardiasis, etc. There are two projects with interactive components: (1) the game theoretic analysis of costs and benefits of seeking health care under different scenarios of disease risk, and (2) characterizing the epidemiological impact of the individual care decisions on predictions of disease outbreaks. Together, the hope is that these projects will provide a theoretical organizational strategy that could be recommended to businesses, local governments and school systems to minimize outbreak severity without too greatly compromising efficiency. In this project you will work on component 2.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-14

Mentor: Stanley Dunn, smd at occlusal.rutgers.edu, Department of Biomedical Engineering

Title: Eulerian Graph Representation for siRNA Sequence Structure Applications

RNA interference (RNAi) is a process by which gene expression is suppressed. Since RNAi is a powerful tool for determining gene function and preventing disease, much work has been done to understand the important role that short interfering RNA (siRNA) plays in RNAi. Characteristic sequence, thermodynamic, and structural properties of functional siRNAs have been identified by researchers. We have considered the Eulerian Graph Model of siRNA, introduced by Pancoska et al, in hopes that it would provide valuable insight into the level of functionality of a potential siRNA sequence. Using this graph structure, we have identified siRNA properties, captured in the graphs, that discriminate functional and non-functional siRNAs. The aim of this project is to consider the problems of existence and uniqueness of the Eulerian graphs for functional siRNA sequences.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-15

Mentors: Tami Carpenter, tcar at dimacs.rutgers.edu, DIMACS and Hanan Luss, hluss at telcordia.com, Telcordia Technologies

Title: Sensor Location

We will consider the problem of locating sensors to detect threats in a prescribed geographical area. Given sensor properties, we will develop algorithms for deciding where the sensors should be placed to maximize the reliability of object-detection, subject to cost constraints. The specific problem we will consider is the equitable allocation of sensors within the prescribed region, where an equitable solution has the property that no location can receive improved coverage without degrading the coverage of another whose coverage is already the same or worse. Algorithms for equitable resource allocation require the repeated solution of related minimax problems. Although lexicographic minimax solutions are attractive, computational challenges may lead us to define alternate objectives as reasonable approximations to equitable solutions.

Prerequisites: The student should demonstrate proficiency in computer programming, have taken a course in operations research or linear programming, and be familiar with basic concepts in graph theory.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2007-16

Academic Co-mentor: S. Muthukrishnan, muthu at cs.rutgers.edu, Computer Science, Industrial Co-mentor: Graham Cormode, cormode at aura.cs.bell-labs.com, AT&T Labs

Title: Algorithms for Analyzing Data Streams with Uncertainty

Data streams consist of long sequences of simple information (such as a series of stock trades or packets in a network). Many algorithms are known for computing functions of data streams efficiently, using an amount of working space that is much smaller than the size of the stream. For example, we can find what is approximately the most popular traded stock, or estimate the number of distinct IP addresses seen on a network, using simple randomized algorithms that succeed with high probability. A new set of problems arise when the data streams contain uncertainty. Suppose we see each piece of information in the stream tagged with a probability that this item is present. What can we compute? Can we find the expected number of distinct IP addresses, or the expected most popular stock and the expected number of trades? This project will involve picking a question like this, finding algorithms and techniques, and analyzing their accuracy.

Relevant references include:

Cormode, G., and Garofalakis, M., "Sketching Streams Through the Net: Distributed Approximate Query Tracking," Proc. of the International Conference on Very Large Data Bases, 2005.

Cormode, G., Garofalakis, M., Muthukrishnan, S., and Rastogi, R., "Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles," Proceedings of ACM SIGMOD International Conference on Management of Data, 2005.

Cormode, G., and Muthukrishnan, S., "An Improved Data Stream Summary: The Count-min Sketch and Its Applications," Journal of Algorithms, 55 (2005), 58-75.

Cormode, G., and Muthukrishnan, S., "What's New: Finding Significant Differences in Network Data Streams," Transactions on Networking, February 2006.

Cormode, G., and Muthukrishnan, S., "Space Efficient Mining of Multigraph Streams," Proceedings of ACM Principles of Database Systems, 2005.

Prerequisites: The student should have courses in dm, design and analysis of algorithms, and C coding.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2007-17

Academic Co-mentor: S. Muthukrishnan, muthu at cs.rutgers.edu, Computer Science, Industrial Co-mentor: Graham Cormode, cormode at aura.cs.bell-labs.com, AT&T Labs

Title: Designing Internet Ad Auctions

Internet search engines run auctions for search words as well as content on webpages millions of times a day. We need new auction models and efficient auction mechanisms as the nature of these auctions change and new features get added. Underlying these models and mechanisms are novel issues in game theory and approximation algorithms. We will study such combinatorial/stochastic auctions and algorithms, experiment with various algorithms and understand their fundamental limits. A good reference is:

Mehta, A., Saberi, A., Vazirani, U., and Vazirani, V., "Adwords and Generalized On-line Matching," FOCS 2005.

Prerequisites: The student should have courses in dm, design and analysis of algorithms, and C coding.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2007-18

Mentors: Ming Xie, mxie at stat.rutgers.edu, Department of Statistics and E.A. Elsayed, elsayed at rci.rutgers.edu, Department of Electrical Engineering

Project title: Optimization of Sequencing and Threshold Levels of Detection Systems

The candidate will provide assistance to an on-going project that considers a problem of container inspection at the port-of-entry. Containers are inspected at inspection stations, and data generated from different analytical methods, x-rays detectors, gamma-rays detectors and others sensors used for the detection of chemical, biological, radiological, nuclear, and explosive (CBRNE) agents are often relied upon to make critical decisions with regard to the nature of the containers and the appropriate response mechanism. The process of designing an efficient and accurate detection system must be deliberate and well thought-out. The threshold levels of sensors at the inspection stations might result in accepting undesired containers or subjecting "good" containers to unnecessary additional inspections. In this project we investigate approaches for determining the optimum arrangements of sensors and their corresponding threshold levels while considering potential measurement errors and cost and other constraints. Efficient approaches for investigating inspection systems with a large number of sensors will be investigated. The theoretical contributions of this research will be transferred via computational algorithms for practical applications.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2007-19

Mentor: Melike Gursoy, gursoy at rci.rutgers.edu , Industrial Engineering

Project title: Control Strategies in Cases of Pandemic Outbreaks in Urban Centers

We will examine how to manage human traffic flows of sick people under the constraints of urban environments in disease outbreak settings by clever routing/segregation strategies. To be involved in this project, the student has to be fluent either C or C++. Ideally, but not necessarily, the student should also know about linear programming and combinatorial optimization.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-20

Mentor: Chung-Chie Shan, ccshan at cs.rutgers.edu , Computer Science

A delimited continuation represents what is about to happen in a computation, as a function that maps an immediate result to a later answer. A substructural logic is a system of proofs and models that is sensitive to the order and use of resources, like food and unlike truth. Both delimited continuations and substructural logics are useful for explaining context sensitivity in natural and programming languages. We seek to relate them by using substructural logics to reason about delimited continuations, for instance to prove the soundness of a type system, or to produce a generic account of evaluation order.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2007-21

Mentor: William Pottenger, billp at dimacs.rutgers.edu , DIMACS

Project title: Higher Order Associations in Databases using an Apriori Algorithm

Introduction:

Association Rule Mining (ARM) has become quite popular in Computer Science data mining research. Association rules are formed from patterns discovered in data that occur frequently. A common example of such a pattern is the set of items that someone might buy at a grocery store. For example, if many customers buy milk and eggs together, then the two items �milk� and �eggs� together form a frequent itemset, which in turn forms the rule eggs � milk. A grocery store layout manager could use this rule to put the eggs near the milk, thereby encouraging other customers to buy both. Applying ARM to grocery-store data is one of the most common applications, and is termed market-basket analysis. ARM can be used on databases of other types too, however, including e-commerce websites.

Despite the plethora of work in ARM, little work has been done to investigate discovery of higher-order association rules. Higher-order association rules can be understood through the following example from e-commerce market-basket analysis: suppose customer �A� purchases {computer, OS} online, customer �B� buys {laptop, OS} and customer �C� gets {laptop, mouse, battery}. Then several higher-order associations can be formed, including computer-to-laptop through OS, OS-to-mouse through laptop, computer-to-battery through OS and laptop, etc. Li et al. have developed Higher Order Apriori, the first algorithm capable of discovering indirect (i.e., higher-order) associations based on propositional rules [1]. The approach is substantially less complex than previous attempts to discover higher-order associations, such as multi-relational ARM, that discovers relational rules. The method introduced in [1] is based on the Apriori algorithm and discovers rules for all sizes of itemsets across all orders.

Higher Order Apriori discovers itemsets that cross record boundaries, rather than from individual records. Itemsets of any size (k-itemsets) are discovered based on relations up to the nth-order; thus, the k-itemset has a higher order association of length n. The algorithm increases the size of k-itemsets in each iteration (as with Apriori), while generating them across all orders for the current size itemset. Next, support is calculated based on the order of the itemsets and the number of higher-order associations connecting the items [1]. Each itemset�s support is calculated locally for each order and then summed to get the global support. If global support does not meet the threshold, the itemset is discarded. The Higher Order Apriori algorithm generates association rules based on the remaining frequent itemsets.

REU Project Description:

Our research objective is to employ one or more REU students to test, evaluate, and optimize the Higher Order Apriori algorithm. This involves applying the algorithm to real world data including e-marketplace, law enforcement, public healthcare data and other new domains in which higher order associations may be useful. Evaluation can be done in terms of comparison between high-confidence, high-support rules generated by Higher Order Apriori and those generated by Apriori.

Secondly, it is crucial that we further investigate the support metric. The support metric employed in [1] was developed empirically; thus, it does not have a sound theoretical basis or consider all possible contributing factors. We seek to consider additional factors that may prove useful in developing an optimal support metric.

Finally, we plan to incorporate Higher Order Apriori in our Text Mining Infrastructure (TMI) [2]. The TMI is an open-source framework that provides a robust software core for research and development of text mining applications. Higher Order Apriori will be applied in distributed environments using the TMI.

The ideal REU will have sufficient background in Computer Science to enable him or her to work as a research programmer.

References:

[1] Shenzhi Li, Aditya P. Belapurkar, Murat Ganiz, Mark J. Dilsizian, Xiaoning Yang, Christopher D. Janneck, William M. Pottenger. (2006) Higher Order Apriori. http://www.dimacs.rutgers.edu/~billp/pubs/Higher Order Apriori.pdf

[2] L.E. Holzman, T.A. Fisher, L.M. Galitsky, A. Kontostathis and W. M. Pottenger. (2004) A Software Infrastructure for Research in Textual Data Mining. The International Journal on Artificial Intelligence Tools, volume 14, number 4, pages 829-849.

* You must be a enrolled at a U.S. university to be eligible for this project.

Please revisit this page frequently, additional projects will be posted through January.
Index of REU program
DIMACS Home Page
Contacting the Center
Document last modified on January 9, 2007.