Hello.
My name is Steve, and this is my home page.
Well. There doesn't appear to be anything really intricate on this page
as of yet, so let's talk about what I'm doing over the summer, and a little
bit about myself...
Name: Stephen Marotta
E-Mail Address: smarotta@drew.edu
School of Origin: Drew University
Advisor: Dr. William Steiger
So the question on your mind is most likely, "What did I do during the summer of
2000?". I'm almost too happy to answer.
Over the course of my stay with DIMACS at
Rutgers University, I have been working with Dr.
Steiger on the problem of slope selection. The problem, in a nutshell, is the following:
Given a set of data points, usually in a statistical context, find a line in the format
l defined as y = mx + b such that l is the best-fit line
for the given set of data points.
You're probably thinking to yourself,
"Well, what about the least-squares method? That's always been good to me in the past..."
That's as may be, but the problem with least-squares is that it is suceptible to one
erroneous data point. If just one point is hopelessly wrong, your results are rendered
meaningless. So we want a more robust solution
Dr. Henry Teil, in 1950, outlined
a new best-fit technique, which was to find all of the possible slopes formed by the (n choose 2)
pairs of points in our data set, take the median of those slopes, and that would be
the slope of your best-fit line. The snag is, how do we compute this in optimal time?
We could always just compute all of the slopes, sort them, and pick out
the one in the middle. But first, it costs O(n^2) to compute those slopes, then
a further O(n^2 log n) to sort them. So we need a better algorithm. The methods
required in such an algorithm are described in detail in a paper by Drs. William Steiger, Robert Cole,
Jeffrey Salowe, and Endre Szemeredi, in which they outline a deterministic algorithm
to compute the median slope in O(n log n). Unfortunately, the constant in front of that
expression is something like 10^120, which is, in seconds, orders of magnitude longer than the
age of the universe.
My project is to implement and evaluate probabilistic algorithms
to compute the median of the slopes in O(n log n) time, whilst at the same time keeping the constant
nice and low, hovering about 20 or 30 or so. If you would like more information regarding
how this would be accomplished, feel free to email me with
any questions.
Goodbye.