Mathematical Methods for Mining in Massive Data Sets


Helene E. Kulsrud
Center for Communications Research, Princeton/IDA

Data mining problems fall into three general categories: the discovery of associations, of patterns, or of special events. Originally, traditional database management techniques and programs were used to obtain this information. However, with the advent of higher bandwidth and faster computers, the speed needed to retrieve the desired information from very large databases requires more powerful mathematical methods. New and improved techniques that have proved effective include statistical methods such as Regression Trees, Vector Space Methods, Wavelet Coding, and Parentage/Acquaintance; probabilistic techniques such as Bayesian Nets and Markov Models; and associative methods such as Neural Nets and the Apriori Algorithm. The relative effectiveness of these methods and examples of success stories will be presented.