DIMACS/Computer Science/Statistics Joint Seminar

Title: Not so naive Bayesian classification

Speaker: Geoff Webb, Monash University

Date: Thursday, August 24, 2006 11:00am

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Abstract:

Naive Bayes is an extremely efficient classification learning technique. Despite its simplicity, naive Bayes has proved remarkably accurate for many tasks. In consequence, it is widely deployed, even though its accuracy is known to be limited by its attribute independence assumption. Of the numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and SuperParent TAN have demonstrated remarkable accuracy. However, both techniques attain this accuracy at a considerable computational cost. Motivated by both theoretical and practical considerations, we present a new approach to weakening the attribute independence assumption by averaging all of a constrained class of semi-naive Bayesian classifiers. In extensive experiments this technique delivers prediction accuracy comparable to that of LBR and SuperParent TAN, with substantially improved computational efficiency. It has the desirable properties of

Despite being generative, it delivers classification accuracy competitive with state-of-the-art discriminative techniques.
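
The idea of averaging a constrained class of semi-naive classifiers can be illustrated with a minimal sketch. This is not the speaker's implementation. It assumes that the constrained class consists of one-dependence classifiers in which every attribute depends on the class and on one shared parent attribute, with each attribute taking a turn as the parent. The function names, the dictionary layout, and the add-one smoothing are hypothetical choices made for illustration only.

```python
from collections import Counter


def fit(rows, labels):
    """Count class, (class, attribute) and (class, attribute pair) frequencies."""
    n_attrs = len(rows[0])
    model = {
        "n": len(rows),
        "classes": sorted(set(labels)),
        "class_counts": Counter(labels),
        "single": Counter(),  # (y, i, x_i) -> count
        "pair": Counter(),    # (y, p, x_p, j, x_j) -> count
        "values": [set() for _ in range(n_attrs)],
    }
    for x, y in zip(rows, labels):
        for i, xi in enumerate(x):
            model["values"][i].add(xi)
            model["single"][(y, i, xi)] += 1
            for j, xj in enumerate(x):
                if j != i:
                    model["pair"][(y, i, xi, j, xj)] += 1
    return model


def _normalise(scores):
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def naive_bayes_posterior(model, x):
    """P(y | x) assuming all attributes are independent given the class."""
    k = len(model["classes"])
    scores = {}
    for y in model["classes"]:
        n_y = model["class_counts"][y]
        score = (n_y + 1) / (model["n"] + k)  # smoothed P(y)
        for i, xi in enumerate(x):
            # smoothed P(x_i | y)
            score *= (model["single"][(y, i, xi)] + 1) / (n_y + len(model["values"][i]))
        scores[y] = score
    return _normalise(scores)


def averaged_one_dependence_posterior(model, x):
    """P(y | x) averaged over one-dependence models, one per parent attribute."""
    k = len(model["classes"])
    scores = {}
    for y in model["classes"]:
        total = 0.0
        for p, xp in enumerate(x):
            n_yp = model["single"][(y, p, xp)]
            # smoothed joint P(y, x_p)
            term = (n_yp + 1) / (model["n"] + k * len(model["values"][p]))
            for j, xj in enumerate(x):
                if j != p:
                    # smoothed P(x_j | y, x_p)
                    term *= (model["pair"][(y, p, xp, j, xj)] + 1) / (
                        n_yp + len(model["values"][j])
                    )
            total += term
        scores[y] = total / len(x)  # average over the parent choices
    return _normalise(scores)
```

On data labelled by the exclusive-or of two binary attributes, the naive Bayes function above assigns equal posterior probability to both classes, whereas the averaged one-dependence function favours the correct class, because each member model conditions one attribute on the other. No model selection is performed: every member of the class contributes to the average.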

BIO:

Geoff Webb holds a research chair in the Faculty of Information Technology at Monash University. Prior to Monash he held appointments at Griffith University and then Deakin University, where he received a personal chair. His primary research areas are machine learning, data mining, and user modelling. He is widely known for his contribution to the debate about the application of Occam's razor in machine learning and for the development of numerous algorithms and techniques for machine learning, data mining, and user modelling. His commercial data mining software, Magnum Opus, is marketed internationally by Rulequest Research. He is editor-in-chief of the highest-impact data mining journal, Data Mining and Knowledge Discovery, and a member of the editorial boards of Machine Learning, ACM Transactions on Knowledge Discovery in Data, and User Modeling and User-Adapted Interaction.