DIMACS Theoretical Computer Science Seminar

Title: Near optimality of the priority sampling procedure

Speaker: Mario Szegedy , Rutgers University

Date: April 12, 2005 2:00-3:00pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ

Abstract:

N. Duffield, C. Lund and M. Thorup conjectured that the variance of their highly successful priority sampling procedure is not larger than the variance of the threshold sampling procedure with sample size one smaller. The conjecture's theoretical significance is that the latter procedure is provably optimal among all off-line sampling procedures. Its practical significance is that it rigorously explains why testing priority sampling on internet traffic analysis has found to be performing magnitudes better than previous schemes. We prove the conjecture. Our result also gives an affirmative answer to the conjecture of N. Alon, N. Duffield, C. Lund and M. Thorup, which states that the standard deviation for the subset sum estimator obtained from k priority samples is upper bounded by W/\sqrt{k-1}. (W is the actual subset sum.) The paper is available electronically (ECCC 2005) http://eccc.uni-trier.de/eccc-reports/2005/TR05-001/index.htm