DIMACS TR: 2005-35

Machine Learning Methods in the Analysis of Lung Cancer Survival Data

Authors: Dmitriy Fradkin, Dona Schneider and Ilya Muchnik


Support Vector Machines (SVM) and penalized logistic regression are well known to the machine learning community but have yet to see active use in epidemiological applications. We apply them to the task of constructing a predictive model for the survival of patients diagnosed with lung cancer and to analyzing the importance of features based on model parameters. The methods produce distinct and complementary models, making it advantageous to consider both whenever possible to gain different perspectives on large datasets. After applying the methods to Surveillance, Epidemiology and End Results (SEER) data, we also compute several measures of feature importance in the final models and show these measures to be strongly correlated.
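
As a rough illustration of the kind of comparison described above, the following minimal sketch fits a linear SVM (hinge loss) and an L2-penalized logistic regression by plain gradient descent and ranks features by absolute weight. Everything in it is an assumption made for illustration: the data are synthetic (not SEER), the helper names (make_data, fit_linear, rank_features) are hypothetical, and absolute weight is only one possible importance measure, not necessarily one used in the report.

```python
import math
import random


def make_data(n=400, seed=0):
    """Synthetic stand-in data (NOT SEER): 3 features, the third is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        score = 2.0 * x[0] - 1.0 * x[1] + rng.gauss(0, 0.5)
        X.append(x)
        y.append(1 if score > 0 else -1)  # labels in {-1, +1}
    return X, y


def fit_linear(X, y, loss, lam=0.01, lr=0.1, epochs=200):
    """Batch (sub)gradient descent on mean loss + (lam/2)*||w||^2, no bias term."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wi for wi in w]  # gradient of the L2 penalty
        for x, yi in zip(X, y):
            m = yi * sum(wi * xi for wi, xi in zip(w, x))  # margin
            if loss == "hinge":
                g = -1.0 if m < 1 else 0.0       # subgradient of max(0, 1 - m)
            else:
                g = -1.0 / (1.0 + math.exp(m))   # derivative of log(1 + exp(-m))
            for j in range(d):
                grad[j] += g * yi * x[j] / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def rank_features(w):
    """Feature indices ordered from largest to smallest absolute weight."""
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))


X, y = make_data()
w_svm = fit_linear(X, y, "hinge")
w_lr = fit_linear(X, y, "logistic")
print("SVM weights:", w_svm, "ranking:", rank_features(w_svm))
print("LR weights: ", w_lr, "ranking:", rank_features(w_lr))
```

The two losses generally yield different weight vectors, so comparing the rankings they induce is one simple way to see where the models agree and where they offer different perspectives.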

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-35.ps.gz