DIMACS Conference on Linking Mathematics and Biology in the High Schools

April 29 - 30, 2005
DIMACS Center, CoRE Building, Rutgers University

Fred Roberts, DIMACS, froberts@dimacs.rutgers.edu
Midge Cozzens, Colorado Institute of Technology, mcozzens@coloradoit.org


Gary Benson, Bioinformatics and Biology, Boston University

Title: Waiting Time and Seed Selection for Homology Search

When studying a DNA or protein sequence, a common question is "What other sequences are homologous to this one?" Homology is similarity due to an evolutionary relationship in which the sequences have a common ancestor. Homologous sequences often have related functions, so if we can identify homologies, we obtain important clues to the biological role of the sequence.

Given a query sequence, searching for homology is usually done over a very large database of sequences which have a combined length of perhaps hundreds of BILLIONS of 'letters' (DNA nucleotides or protein amino acids). This search is speeded up by locating "seeds," short pieces of the query that have a perfect match in a database sequence. Seeds are then expanded to check if they are part of a true homology.

The best type of seed to use depends on the mathematical model describing the expected degree of similarity between homologous regions. In this talk I will focus on the "waiting time" function which is used to select an appropriate seed. Here is an example. Suppose we flip a fair coin 5 times and I ask "During the 5 flips, what is the probability that we get 3 heads in a row?" We can extend this to the case where the coin is biased towards heads and ask the same question. The waiting time calculation answers these questions and is directly related to choosing optimal seeds for homology search.

Gary Benson, Bioinformatics and Biology, Boston University

Title: Building DNA models with K'NEX

Simple and informative models of DNA can be built with K'NEX brand constructors, found at most toy stores. These models, which take only a few minutes to build, illustrate many important physical properties of DNA including the major and minor grooves, antiparallel strands, right-handed and left-handed helices, supercoiling, replication, etc. By building and manipulating the models, students (and teachers) can gain an intuitive sense of the physical properties of DNA which are essential to many cellular processes. Instructions for building the models and illustrations of various properties of DNA will be presented. A small set of KNEX pieces will be given to each participant.

L. Charles Biehl, Math teacher and administrator, Charter School of Wilmington, DE

Title: Biomathematics: Using Graph Models for High School

The vertex-edge graph is a powerful and effective tool in many problem-solving and mathematical modeling situations. This session will present an overview of a few topics from biology, including life processes, genetics, and ecology that can effectively use graph models as a basis for understanding and working with them. Topics will include matching DNA sequences, designing phylogenetic trees, modeling protein interaction networks, and references to graph models in epidemiology. The primary focus will be to convey an understanding to high school students an appreciation of how these models are generated, with less (but still some) emphasis on the actual algorithms and processes that are used on them.

Steve Billups, Center for Computational Biology, University of Colorado

Title: Bridge Courses for Cross-Training between Biology, Computer Science, and Mathematics

Under a grant from the Colorado Institute of Technology, the University of Colorado at Denver and Health Sciences Center is designing Bioinformatics options in each of the Biology, Mathematics, and Computer Science majors. A key component of these options is the development of "bridge" courses, which are designed to bring students of one major up to speed with essential knowledge from the other two majors. This talk will describe the design of these courses and discuss aspects that may be adaptable to the high school curriculum.

Bro. Pat Carney, Math Teacher, De Paul Catholic High School, Wayne, NJ

Title: The Role of Teacher Training in Overcoming the Inertia of the Status Quo

Change in education sometimes comes too quickly without the proper foundation and sometimes at a pace that would not qualify for the final heat in a snail race. We cannot afford either of these extremes.

Even with all of the best intentions and programs, the greatest ideas in education can be doomed if the rank and file teachers put up obstacles. This talk will attempt to look at some of these possible road blocks from the point of view of both the biology and mathematics perspectives. We will look at some possible ways to overcome the inevitable inertia so that teachers can be trained and not lose their enthusiasm in a hostile environment. We will draw upon the collected wisdom of teachers who have been involved in other successful innovations (and some not so successful) and learn from the accomplishments DIMACS has had in training leaders in its discrete math programs.

This is a work in progress and it is hoped that the audience will actively contribute to the session.

Valerie DeBellis, Shodor Education Foundation

Title: Changing the Culture: Professional Development for the 21st Century Teacher

What are essential aspects of cross-discipline professional development for high school math and science teachers and how can we embrace a true collaboration with so many players? What might be an efficient training design so that cutting-edge research can reach (and excite) both students and teachers in a timely way? For the past year, our TRED (teacher-researcher-educator) team has explored these questions in a North Carolina rural high school. We will share our experiences and lessons learned. Among them is the lesson that the culture will not change easily.

Kevin DeVizia, Math Teacher, Delaware Valley High School, Milford, PA

Title: Probability and Betweenness of Points: The Genetic Ruler

This presentation will focus on a connection between the Euclidean Geometry topic of Betweenness of Points, Probability, and Gene Mapping. The crossing over of genes provides information allowing for the mapping of the gene.

Kathy Erickson, Monument Mountain Regional High School, Great Barrington, MA

Title: Reverso: The Exciting Game of Genetic Inversion. A Reflection on Teaching a Biomathematics Unit to High School Students

This unit was developed by four teachers at the DIMACS Biomathematics Institute during the summer of 2005. Students explore the idea of genetic inversion, the process by which proteins in a DNA strand are rearranged. The mathematical question focuses on reversals by reflecting strings and involves finding the minimum number of possible moves between two different configurations. Students explored algorithmic thinking and optimization. The presenter will discuss what went well and what she will change about the unit.

Tom Fleetwood, Bio Teacher, Charter School of Wilmingon, DE

Title: Math!? Why Do I Need That? The Biology Student's Dilemma with a Problem-based Solution

Biology has not traditionally been associated with mathematics like physics or chemistry and most students who enjoy biology like it that way. However, amazing advances in technology are causing the math/biology relationship to change and these students are being rapidly left behind. This presentation will discuss ways to engage the biology student with mathematics using the strategy of "problem-based learning" by exposing students to complex real-world problems.

Kathleen Gabric, Biology Teacher, Hinsdale Central High School, Hinsdale, IL

Title: Using Bioinformatics to Make the Bio-Math Connection

Since the implementation of sophisticated bioinformatic tools and databases, I have witnessed an improvement in the level of biological questions and discussions in my classroom. If your school has computers and an Internet connection, this could happen in your classroom, too. The student interface at Biology Student Workbench makes maneuvering through sequence databases very intuitive. In this session, I will demonstrate how my students use these tools to access real data and apply these tools in independent problem solving while being actively engaged in the process of learning. A connection will be made to the math behind these tools.

Kathleen Gabric, Biology Teacher, Hinsdale Central High School, Hinsdale, IL

Poster: BLAST Your Way into Bioinformatics using Biology Student Workbench

Come in and try your hand at with Biology Student Workbench. This session will provide time to experience some of the computational tools it provides: BLAST, CLUSTALW, SIXFRAME, and DRAWTREE. When you leave, you'll be on your way to utilizing this great source (that's free!) in your math or biology classroom.

Sol Garfunkel, COMAP

Title: Mathematics in Service to Biology and Vice Versa

We will present a series of mathematical models of biological phenomena appropriate for the high school classroom. These can be presented in a math class, a biology class or in an integrated setting. Jokes will likely be told.

Mike Gargano, Pace University

Title: An Angiogenesis Model using Graph Theory and the Generalized Ballot Problem

Angiogenesis, the growth of blood vessels in a vascular network, uses the mechanisms of budding (sprouting), connecting (anastomosing), and splitting (intussusception).

Beginning with a single edge, our model simulates the growth of a network. At each time period one of the mechanisms is executed randomly. Thus a network is formed from a sequence of budding, connecting, and splitting. How many legal sequences are there?

The well known "Ballot Problem" in mathematics is used to solve this problem.

High school students will be exposed to practical applications of simple combinatorics (combinations, permutations, and counting), and graph theory by seeing the solution of this problem).

In collaboration with Lorraine Lurie, Lou Quintas, and Eric Wahl.

Lou Giglio, High School Teacher, Riverdell Regional High School, Oradell, NJ

Title: Applications Several Dynamic Programming Techniques

My team of two math teachers developed a classroom module that introduces the student to one of the mathematical models utilized by biologists looking for similarities between the genes in different species for the purpose of making evolutionary connections or to compare an unknown gene sequence to a known genome. Specifically, students learn how to apply several dynamic programming techniques to optimally align base strings globally and locally. By applying these algorithms students work through biological problems using a mathematical approach.

Alan Hastings, Dept. of Environmental Science & Policy, University of California

Title: Quantitative courses and training in mathematical biology, particularly ecology

One of the difficulties with training students in mathematical biology is that the math background students get is often inappropriate for study in biology. Also, math courses tend to use examples from the physical sciences. I will describe several efforts at incorporating mathematics into biological courses in population biology that could be used at earlier levels. Additionally, I will describe and give examples from an undergraduate population biology course of how quantitative ideas can be incorporated into simple spreadsheets in Excel. This method eliminates some of the hurdles associated with other environments like Matlab. Also, use of software of this kind illustrates the 'experimental' aspect of mathematical approaches.

Katherine G. Herbert and James H. Dyer, Montclair State University

Poster: Science Informatics at Montclair State University

Bioinformatics and Bio-Math represent new and exciting ways to teach students about the practical applications within science. Currently, with wide-spread media coverage of bioinformatics projects such as the human genome project and drug discovery projects, students are energized to investigate multi-discipline projects. Montclair State University is answering this call by students for scientific multi-discipline studies through its Science Informatics Program. The Science Informatics Major combines Biology, Chemistry, Computer Science and Mathematics to instruct students in the problem-solving techniques for both Bioinformatics and Cheminformatics. Students complete a core set of courses in each of the subject concentrations. They next complete advanced courses within the concentration of their choice. Finally, they fulfill a ten credit research sequence to practice problem-solving skills combining their knowledge from the four primary subject areas, with an emphasis on developing computer science skills. This poster discussion will present the Science Informatics Major and the successes and challenges it has faced during its creation. It will highlight the unique and significant problem-solving component of the major which is a four course research sequence. Finally, it will discuss future directions for the program.

Laurie Heyer, Mathematics, Davidson College

Title: Bioinformatics? One Minute and One Hour at a Time

Two courses at Davidson College take complementary approaches to integrating mathematics, computer science, and biology. In the Genomics, Proteomics and Systems Biology course, we use case studies to illustrate how mathematics and computer science help solve biological problems. In the Bioinformatics course, we give students the mathematical and computational skills needed to address the biological problems of the future. I will briefly describe each of these courses and discuss a few course topics in more detail.

Rob Hochberg, Computer Science, East Carolina University

Title: Co-Evolution

The remarkable similarity between the shape of the beaks of some Latin American hummingbirds and the shape of the flowers from which they sip nectar, and which they pollinate, has led scientists to speculate that a sort of co-evolution has been occurring between these species: Each has adapted to accomodate mutations in the other. The same seems to happen also between birds and the seeds they eat, the butterflies they don't eat and the eggs they lay. These studies make fascinating reading.

Something similar seems to be happening at the protein level as well. Two proteins which interact can be seen to have co-evolved in order to maintain thier ability to work together. This important observation implies that it may be possible to discover which pairs of proteins interact by trying to detect whether they have co-evolved. We'll show how the simple idea of parsimony can be used to detect this co-evolution, and how it may help biologists in the important task of discovering protein-protein interactions.

Ben Hughes, Galileo Magnet High School

High School Student Title: Applying Abstract Algebra and Graph Theory to Model Flu Seasons
Under the direction of Olgamary Rivera-Marrero, Virginia Tech

Over the summer, I attended a workshop discussing the applications of mathematics to biological problems. Working with a small group of students, I studied influenza dynamics in Virginia and formed models to predict possible future epidemics. We used the Discrete Visualizer of Dynamics software and abstract algebra concepts to make mathematical models from real flu data. Once we had a model to work with, we modeled the optimal methods for controlling the damage caused by the flu in my home county. We used the Mathematical Modeling Using Graphs and Matrices software and graph theory to calculate the best location for a treatment center, based on environmental factors. Using basic knowledge of the flu, including prevention education, we combined the raw data and models to form a presentation outlining, in a comprehensive manner, our findings. The whole situation was hypothetical, but it opened my mind to new ways of combining math and biology to find simple solutions for basic problems.

Eric Jakobsson, Ph.D., Director, NIGMS Center for Bioinformatics and Computational Biology at NIH, and Chair, NIH Biomedical Information Science and Technology Initiative Consortium

Title: The Interdisciplinary Scientist of the 21st Century

In the 20th century, molecular biology emerged as a branch of applied physics. Every page of a molecular biology textbook, or the molecular section of a general biology textbook, is filled with images and facts that could not have been discovered without physics that was discovered after 1900. Deep understandings about biology emerged from understanding the structures and dynamics of biological molecules, using the new physics. But another awareness began to take hold at the end of the century: biological systems are way more than the sum of their molecular building blocks. The relationships between the parts are complex and non-obvious, and understanding those relationships is essential to moving forward in our understanding of living systems. This new agenda of discovery has moved mathematical theories and models, and their implementation in computational algorithms, to the heart of biological research. The new biologist must know mathematics, physics, chemistry, and computation in addition to biology. To continue to teach biology and mathematics in separate silos is to cut off future progress in biological research at its source.

John Jungck, Biology, Beloit College

Title: Valuing Voronoi Visualization: Patterns in Nature for Art, Biology, and Mathematics

What mathematics is it important for biology students to learn? The National Research Council's Bio 2010 has recommended much more mathematics be included in biology curricula and that biologists should collaborate much more with mathematicians. Therefore, it is important to explore challenges to both communities in light of a forthcoming explosion of interdisciplinary interactions. Allometry and fractals have captured the imagination of mathematical biologists as well as amateurs because both apply across at least ten orders of magnitude of biological phenomena and structures from the molecular to the ecological level. Voronoi (nearest neighbor) polygons and polyhedra are less well known to both audiences, but scale equally well. Voronoi polygons and polyhedra are associated with additional mathematical methods that allow deeper insight into a variety of biological phenomena such as growth, diffusion, division, packing, docking of ligands, strength of materials, molecular folding, foraging behavior, predator avoidance, and crowding as well as to their utility in making measurements, modeling interactions, relationship of two- and three-dimensional tomographic structures, and visualization per se. Furthermore, by employing Voronoi polygons and polyhedra in mathematical biology education, we will illustrate the five fold approach of curricular reform in mathematics: (1) analytical (theorem/proof), (2) numerical, (3) symbolic, (4) visual/graphical, and (5) applied to relevant scientific and social problems. While various approaches to constructing Voronoi polygons and polyhedra, Delaunay triangulations, and minimal spanning trees may be formally isomorphic from a mathematical or computer science perspective, different techniques are much better than others in helping students relate a causal mechanical and material model of their biological phenomena of interest, simulating phenomena realistically, or in making appropriate measurements. Multiple methods for constructing Voronoi polygons and polyhedra, Delaunay triangulations, and minimal spanning trees will be applied to epithelial cell boundaries, fish boundaries on sandy lake bottoms, dragonfly wing venation, cross-sections of leaves, fiddler crab flocking behavior, drug design, packing of side chains in polypeptides, bird territories, and forest canopies to illustrate their commonalities and differences? Statistical analyses of Voronoi polygons and L-mosaics will be compared to determine whether nearest neighbor or long-range interactions better apply to a given set of biological data and the application of Voronoi polyhedra to structural bioinformatics will be introduced. Finally, the goals of visualization, geometry, and topology in biology education will be addressed within the context of engaging students with their world, exploring images as data, and aesthetic appreciation of asymmetric, irregular, and fractured forms.

Linda Lundgren, Author, Biology the Dynamics of Life, Glencoe McGraw-Hill

Title: Publishing Bioinformatics in a Well Established Biology Program

Do you want to publish your bioinformatics materials with a main stream biology textbook publisher? Consider the details and requirements of publishing biology materials: State Standards, accountability to local districts, standardized testing, teacher training, No Child Left Behind. How does your material fit in?

Joseph Malkevich, CUNY

Title: Teaching Mathematical Biology in High School

A great variety of biological topics can profit from a mathematical treatment at the high school level. This talk will discuss some of these topics (e.g. clustering, edit distance, population growth, animal harvesting, and phylogenetic trees) and give pointers for materials about these subjects suitable for high school teachers and students.

Michael Martin, Johnson County Community College and University of Kansas

Title: Dynamic Web Tools for Biomathematics: Bringing Realistic Models to Secondary Education

Do tedious computations interfere with meaningful discussions of math as it applies to biology? Moreover, do aspects of computation, visualization, and computer algebra coding overburden the instruction of mathematical biology? In this talk, some publicly available dynamic web tools to refocus the discussion will be demonstrated. These are class-tested tools in areas ranging from population dynamics, pharmacology, cardiology, respiratory exchange, neurology, and selection; mathematical areas include algebra, trigonometry, calculus, difference and differential equations, and stochastic processes. The presenter and a colleague recently won the 2004 Award for Excellence and Innovation at the International Conference on Technology in Collegiate Mathematics for their development of some of these tools. The emphasis of the talk will be to overview the tools and modules that are built around them, and to show how they can be integrated into existing curriculum and delivery and assessment modalities.

Linda Morris, Jefferson County, Colorado School District

Title: Cross-Disciplinary High School Teacher Professional Development

With biology emerging as a dominant scientific partner for mathematics there is a need for cross-disciplinary teacher training at the high school level. Biology and mathematics teachers need to share common theories, methodologies and vocabulary. In this session, participants will review two recently developed BSCS modules, Genes, Environment, and Human Behavior and Bioinformatics and the Human Genome Project to begin a conversation about interdisciplinary instructional materials and what mathematics and biology teachers need to know and be able to do to teach these modules as interdisciplinary units. Specifically, participants will address the following questions:

Charles Mullins, Arkansas School for Math, Sciences and Arts, Hot Springs, AR

Title: Researching the Superstring Problem

At the BioMath Connect Institute '04, my research team, consisting of 7 high school math teachers, investigated the "Superstring Problem." The problem arose in sequencing the human chromosome. Since current techniques can only sequence "small" strings of length around 500-700 bases, it is necessary to cut the chromosome into smaller strings. After these are sequenced it is necessary to reconstruct these fragments into a "superstring" which led mathematicians to the Superstring Problem. Some of this summer research was extended by one of my students into an award-winning project, placing 2nd in mathematics at the Arkansas State Science Fair. In my talk, I will give more details about our summer research and then describe the student/teacher collaboration on research at my school.

S. Muthukrishnan, Computer Science, Rutgers University

Title: Randomly Dealing with Biological Strings by Embeddings

An exciting meeting point for Mathematics and Computer Science with Biology is in dealing with genomic sequences. Early interaction produced a number of algorithmic results using "dynamic programming" taking "quadratic time", and led to many usable programs. However, in the past few years, genomic data has grown in size and we need much faster methods. The solution has been to use the powerful mathematical approach of "embeddings" and algorithmic strategy of "random projections". In this talk, I will provide an introduction to these developments through examples.

Robert Panoff, Shodor Education Foundation

Title: Interactive Modeling Environments: Sources and Resources for Quantitative Reasoning in Biology

Computational science continues to advance the accurate description and prediction of the dynamics of complex systems. Moving from the researcher's workbench to the classroom, real-time model solutions and simulations are now possible in most every area of education in the sciences and mathematics. These interactive learning environments help us to understand diverse phenomena, while opening up new areas of learner-centered, group-oriented, discovery-based learning.

I will explore several examples from recent work in mathematics and biology to demonstrate the full impact of numerical modeling and scientific visualization in the classroom. Besides a variety of free and low-cost sources for modeling tools, resources from the Computational Science Education Reference Desk, a pathway project of the National Science Digital Library will enable participants to bring interactive modeling environments to their own teaching practice.

Jess Platt and Riley Baldwin, Monument Mountain Regional High School, Great Barrington, MA
Under the direction of Kathy Erickson, Monument Mountain Regional High School, BMCI 2004 Participant

High School Student Title: Algorithms for Genetic Inversions

One question in Biomathametics invloves inverting a given sequence of numbers into the identity sequence. The presenters will explain and analyze algorithms to investigate inversions using rotations. Topics will include signed and unsigned sequences and linear and circular setups.

Mjumbe Poe, Computer Science, Harvey Mudd College

Title: Games, Metaphor, and Learning

In this talk we describe an experimental computer game designed to teach protein synthesis to high school and college students. This game is novel in its use of metaphor to introduce game content; specifically our game focuses on elixirs, which are metaphors for proteins.

In our game world, elixirs are essential to life. The raw materials used to produce elixirs are growing scarce and humankind is facing extinction. The adventure game centers around a 16 year-old girl named Rosalyn. As the game begins, Rosalyn returns home to find that her parents have been arrested for "subversion of elixir production." The player must guide Rosalyn to rescue her parents. In the process Rosalyn re-discovers an ancient method of elixir production that uses renewal resources, survives attempts by various political factions to squash her efforts, and saves the world!

Joint work with Elizabeth Sweedyk, Computer Science, Harvey Mudd College

Olgamary Rivera-Marrero and Brandilyn Stigler, Virginia Tech

Poster: Reaching the Community with Mathematical Biology

In response to new ways of teaching, the presenters developed and implemented a mathematics workshop for secondary mathematics teachers and high school students in Danville, VA. The workshop was designed to introduce mathematical biology, an emerging discipline within the field of mathematics, and to demonstrate innovative ways of teaching mathematics with graphical modeling software.

Mathematical modeling was the primary topic of the workshop in which the participants explored basic concepts in abstract algebra and graph theory. The participants were engaged in discovering applications of mathematics through a hypothetical epidemiological problem based on actual data. This experience fostered the educators' appreciation of innovative ways of teaching mathematics using advanced mathematics and mathematical modeling software, with an emphasis on integration of these topics into the Standards of Learning mathematics curriculum. In addition, the activities generated interest of mathematical modeling in the students, while maintaining relevance of advanced mathematics.

During this workshop, we found that the students extended their problem solving and communication skills. All participants established a connection between mathematics and biology and discovered novel applications of technology to solve mathematical problems. Through the creation of an intergenerational environment in which educator worked alongside student and illustration to creative approaches to instruction, the high school teachers developed novel methods to integrate technology and advanced mathematics concepts into the Standards of learning mathematics curriculum.

The presenters will introduce the framework of the workshop, including a description of the mathematics topics and the mathematical software. We will also present the epidemiological problem given to the participants. We will close with a discussion of the implications of this outreach experience.

Christina Sartorio, Seton Catholic High School, Pittston, PA and Zachary Brady, Wyoming Area High School
Under the direction of Jim Kupetz, Seton Catholic High School, BMCI 2004 Participant

High School Student Title: Comparison of the Cytochrome Oxidase subunit I Gene in Soldier and Non-soldier Aphids.

Aphids classified as "soldier" type produce soldiers which protect the rest of the community. The soldiers are genetic clones of the colony "mothers." "Non-soldier" type aphids do not produce any such offspring. The cytochrome oxidase subunit I gene is the last carrier in the electron transport system for cellular respiration. Comparisons between the cytochrome oxidase gene in soldier and non-soldier aphids have shown that the non-soldier and soldier aphids are genetically more similar within their groups, than from one group to the other.

William Sofer, Genetics, Rutgers University

Title: Learning Science/Mathematics by Doing Science/Mathematics: Genuine Research Projects for High School Students

Many educators believe that an ideal way to learn science and mathematics is to participate in a real research project. In this "apprenticeship" model, students would interact with mentor scientists/mathematicians, deal with new data, and generate novel findings that could be published in the literature. At first glance, it would appear that such projects are impractical for high school students to pursue. However, I will attempt to show that high school students can carry out genuine research projects in bioinformatics in their schools under the supervision of their teachers and thereby get a rich exposure to modern molecular biology and mathematics.

James Stevens, University of Colorado, Colorado Springs

Title: A Biology and Engineering Cooperative Project

This presentation will focus on the cooperative aspects of one joint research project between the departments of biology and mechanical engineering at the University of Colorado at Colorado Springs. This project, funded by the Colorado Institute of Technology, is examining the heat generation characteristics of wild and drug resistant lines of tumor cells as one tool to explore the relationship between cellular metabolic activities and immune system visibility. The complementary strengths of the two departments as applied to the necessary measurements and analysis will be discussed. For this particular project, the biology department is the science driver and brings expertise in cell culture and sample preparation while the mechanical engineering department brings expertise in data acquisition, mathematics and thermodynamic analysis. Other ongoing cooperative projects will be described briefly.

DeWitt Sumners, Math and Molecular Biophysics, Florida State

Title: Developing the new Biomedical Mathematics Major at Florida State University

At Florida State University the Department of Mathematics has developed a new undergraduate major in Biomedical Mathematics. We are also cooperating with the Department of Biological Sciences on a grant from the Howard Hughes Foundation whose aim is to produce more computationally literate biology majors. This talk will discuss our efforts in curriculum development. The talk will also discuss some interesting new applications of topology and geometry in molecular biology and medical imaging, which may be of interest in high school curriculum development in both mathematics and biology.

Johnson C. Wong, Arkansas School for Math, Sciences and Arts, Hot Springs, AR
Under the direction of Charles Mullins, Arkansas School for Math, Sciences and Arts, BMCI 2004 Participant

High School Student Title: Effectiveness of the Greedy Algorithm in DNA Sequencing

This project is a study on the effectiveness of the Greedy algorithm in the shortest superstring problem, with focus placed on DNA shotgun sequencing. The effectiveness of the algorithm was measured by the ratio alpha between the lengths of the Greedy solution and the true optimal solution.

To test this, a computer program was written to randomly generate sets of strings given certain parameters. The Greedy solution was calculated and compared to the true solution found by brute force. The program sequentially cycled through combinations of the three parameters tested given a range for each: string length, number of strings, and alphabet size. The ratio between the times taken for each method was used to find an equation of time as a function of the three variables by running a multiple regression.

The value of alpha tended to increase with increasing number of strings and decreasing string length and alphabet size. Generally, the Greedy algorithm produced the correct solution with greater consistency in cases with lower average alpha values. Time increased exponentially with respect to all variables, with number of strings having the most influence and alphabet size the least.

The Greedy algorithm was found to be effective in the cases studied. Match frequency for most situations was high - around 90% for most cases and never below 70% - and alpha remained close to one. Even in the relatively simple cases studied, the time required for brute force became thousands of times larger than the time taken by the algorithm. Under optimal choices of parameters, the Greedy algorithm is a good and quick approximation for the shortest superstring problem.

Previous: Program

Workshop Index
DIMACS Homepage
Contacting the Center
Document last modified on May 2, 2005.