DIMACS Monitoring Message Streams Seminar

Title: Using Prior Knowledge for Text Categorization

Speaker: Aynur Dayanik, DIMACS, Rutgers University

Date: Monday, February 14, 2005, 4:15 pm

Location: DIMACS Center, CoRE Bldg, Room 433, Rutgers University, Busch Campus, Piscataway, NJ


Text categorization is the problem of assigning texts to predefined categories. Automatic text categorization algorithms learn from labeled training texts in order to assign labels to new texts. When training data are limited, prior knowledge becomes crucial for building accurate classifiers.

This talk will present a Bayesian logistic regression method for text categorization. The method allows an expert to incorporate prior knowledge into the categorization effort through informative prior distributions on the model's parameters. Our preliminary results suggest that when training data are insufficient, prior knowledge increases classifier accuracy.
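To illustrate how an informative prior can enter a logistic regression classifier, here is a minimal sketch. It assumes independent Gaussian priors on each word's coefficient and fits the MAP (posterior mode) estimate by plain gradient ascent. The prior family, fitting algorithm, function names (fit_map_logistic, predict_proba), and the toy vocabulary are all hypothetical and chosen for illustration; they are not the method presented in the talk. An expert "encodes" knowledge by shifting the prior mean of a word's coefficient, here saying that the word "goal" suggests the sports category.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_map_logistic(X, y, prior_mean, prior_var, lr=0.1, epochs=500):
    """MAP weights for logistic regression under independent Gaussian
    priors w_j ~ N(prior_mean[j], prior_var[j]) (an assumed prior family).
    X: list of feature vectors, y: labels in {0, 1}."""
    d = len(prior_mean)
    w = [float(m) for m in prior_mean]  # start at the prior mean
    for _ in range(epochs):
        # Gradient of the log prior: -(w_j - m_j) / v_j
        grad = [-(w[j] - prior_mean[j]) / prior_var[j] for j in range(d)]
        # Gradient of the log likelihood: sum_i (y_i - p_i) * x_ij
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad[j] += (yi - p) * xi[j]
        # Gradient ascent step on the (concave) log posterior
        w = [w[j] + lr * grad[j] for j in range(d)]
    return w

def predict_proba(w, x):
    # Probability that document x belongs to the category.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical features: [bias, "goal", "vote"]; category = sports.
X = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
y = [0, 0, 1]
prior_var = [10.0, 1.0, 1.0]  # wide prior on the bias term

flat = fit_map_logistic(X, y, [0.0, 0.0, 0.0], prior_var)      # no prior knowledge
informed = fit_map_logistic(X, y, [0.0, 2.0, 0.0], prior_var)  # expert: "goal" => sports

test_doc = [1.0, 1.0, 0.0]  # a new document containing "goal"
print(predict_proba(flat, test_doc), predict_proba(informed, test_doc))
```

Because no training document contains "goal", the likelihood carries no information about that word's coefficient, so its MAP estimate stays at the prior mean. This is exactly the sparse-data situation in which an informative prior changes the classifier's predictions, while the uninformative (zero-mean) prior leaves the word with no influence.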

(This is joint work with Alex Genkin, David D. Lewis, David Madigan, and Vladimir Menkov.)