# Hello.

My name is Steve, and this is my home page.
Well. There doesn't appear to be anything really intricate on this page as of yet, so let's talk about what I'm doing over the summer, and a little bit about myself...

Name: Stephen Marotta
School of Origin: Drew University
Advisor: Dr. William Steiger

So the question on your mind is most likely, "What did I do during the summer of 2000?". I'm almost too happy to answer.
Over the course of my stay with DIMACS at Rutgers University, I have been working with Dr. Steiger on the problem of slope selection. The problem, in a nutshell, is the following: Given a set of data points, usually in a statistical context, find a line in the format l defined as y = mx + b such that l is the best-fit line for the given set of data points.

You're probably thinking to yourself, "Well, what about the least-squares method? That's always been good to me in the past..." That's as may be, but the problem with least-squares is that it is suceptible to one erroneous data point. If just one point is hopelessly wrong, your results are rendered meaningless. So we want a more robust solution

Dr. Henry Teil, in 1950, outlined a new best-fit technique, which was to find all of the possible slopes formed by the (n choose 2) pairs of points in our data set, take the median of those slopes, and that would be the slope of your best-fit line. The snag is, how do we compute this in optimal time?

We could always just compute all of the slopes, sort them, and pick out the one in the middle. But first, it costs O(n^2) to compute those slopes, then a further O(n^2 log n) to sort them. So we need a better algorithm. The methods required in such an algorithm are described in detail in a paper by Drs. William Steiger, Robert Cole, Jeffrey Salowe, and Endre Szemeredi, in which they outline a deterministic algorithm to compute the median slope in O(n log n). Unfortunately, the constant in front of that expression is something like 10^120, which is, in seconds, orders of magnitude longer than the age of the universe.

My project is to implement and evaluate probabilistic algorithms to compute the median of the slopes in O(n log n) time, whilst at the same time keeping the constant nice and low, hovering about 20 or 30 or so. If you would like more information regarding how this would be accomplished, feel free to email me with any questions.