SCILS/DIMACS Monitoring Message Streams Seminar Series

Title: Thresholding Support Vector Machines for Text Classification

Speaker: James Shanahan, Clairvoyance Corporation

Date: December 11, 2003 3:00pm

Location: DIMACS Center, CoRE Bldg, Room 433, Rutgers University, Busch Campus, Piscataway, NJ


Support vector machine (SVM) learning algorithms focus on finding the hyperplane that best separates positive and negative training examples. By "best", we mean the hyperplane that maximizes the margin (the distance from the separating hyperplane to the nearest examples), since this criterion provides a good upper bound on the generalization error. When applied to the practical problem of text classification, commonly used learning algorithms produce SVMs with excellent precision but poor recall. In my talk, I will briefly review various relaxation approaches that have been proposed to counter this poor recall. Then, I will present two new threshold relaxation algorithms that I have recently developed, which boost the performance of baseline SVMs by at least 20% on standard information retrieval measures.
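As a rough illustration of what threshold relaxation means in general, the sketch below moves the decision threshold of an already trained classifier away from the default of 0 to whichever candidate value gives the best F1 on held-out data. It is not one of the algorithms presented in the talk, which the abstract does not describe. The function names, the choice of F1 as the target measure, and the toy scores are all assumptions made for this example.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the rule 'predict positive when score >= threshold'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relax_threshold(scores, labels):
    """Pick the threshold with the best F1 among the observed scores and 0.0.

    Ties are broken in favour of the larger (more conservative) threshold.
    """
    candidates = sorted(set(scores) | {0.0})
    return max(candidates, key=lambda t: (f1_at_threshold(scores, labels, t), t))


# Hypothetical held-out SVM scores (signed distances to the hyperplane)
# and their true labels (1 = positive, 0 = negative).
scores = [1.2, 0.4, -0.3, -0.6, -0.9, -1.5]
labels = [1, 1, 1, 1, 0, 0]

default_f1 = f1_at_threshold(scores, labels, 0.0)
best_threshold = relax_threshold(scores, labels)
relaxed_f1 = f1_at_threshold(scores, labels, best_threshold)
```

In this toy data the default threshold of 0 misses two of the four positives (perfect precision, recall of 0.5), while lowering the threshold recovers them. That is the precision/recall trade-off the abstract refers to.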

I will also give an overview of ongoing work in the area of opinion mining, and of the "stunning cluster" hypothesis for automatic query expansion.

Bio for James Shanahan

Dr. James G. Shanahan is a Senior Research Scientist at Clairvoyance Corporation, where he heads the Filtering and Machine Learning Group. At Clairvoyance Corp, he is actively involved in developing cutting-edge information management systems that harness information retrieval, linguistics, text/data mining and machine learning. Prior to joining Clairvoyance, he was a research scientist at Xerox Research Center Europe (XRCE), Grenoble, France, where, as a member of the Co-ordination Technologies Group, he developed and patented new document-centric approaches to information access (known as Document Souls). Before joining Xerox, he completed his PhD at the University of Bristol in 1998, on probabilistic fuzzy approaches to machine learning. He has extensive industrial experience both at the AI group at Mitsubishi in Tokyo, Japan, and at the satellite-scheduling group of the Iridium project at Motorola, Phoenix, AZ (over 5 years).