DIMACS TR: 2005-35
Machine Learning Methods in the Analysis of Lung Cancer Survival Data
Authors: Dmitriy Fradkin, Dona Schneider and Ilya Muchnik
Support Vector Machines (SVM) and penalized logistic regression are well
known in the machine learning community but have yet to be actively used in
epidemiological applications. We apply them to the task of constructing
a predictive model for the survival of patients diagnosed with lung cancer
and to analyzing the importance of features based on model parameters.
The two methods produce distinct and complementary models, so it is
advantageous to consider both whenever possible to gain different
perspectives on large datasets. After applying the methods to
Surveillance, Epidemiology and End Results (SEER) data, we also compute
several measures of feature importance in the final models and show that
these measures are strongly correlated.
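The general workflow described above (fit a penalized logistic regression and a linear SVM, read feature importance off the model weights, then check how well the importance measures agree) can be illustrated with a minimal, dependency-free sketch. It rests on loud assumptions that do not come from the report. The data are synthetic, not SEER. Both models are generic L2-penalized formulations trained by plain full-batch (sub)gradient descent, which are not necessarily the penalties or solvers the authors used. Importance is taken as the absolute weight on standardized features, which is only one common measure. Agreement is summarized with Spearman rank correlation. All function names and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical.

```python
import math
import random

def make_synthetic(n=400, seed=42):
    """Hypothetical toy data (NOT SEER): 4 features, label driven mostly by feature 0."""
    rng = random.Random(seed)
    true_w = [3.0, -1.5, 0.5, 0.0]  # illustrative assumption; feature 3 is pure noise
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        score = sum(w * v for w, v in zip(true_w, x)) + rng.gauss(0.0, 0.5)
        X.append(x)
        y.append(1 if score > 0 else 0)
    return X, y

def standardize(X):
    """Scale each column to zero mean and unit variance so |weights| are comparable."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_l2(X, y, lam=0.01, lr=0.1, epochs=300):
    """L2-penalized logistic regression, full-batch gradient descent; y in {0, 1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        # Gradient of mean log-loss plus (lam / 2) * ||w||^2; the bias is not penalized.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Linear soft-margin SVM, full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - t * (w.x + b))) with t in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            t = 1.0 if yi == 1 else -1.0
            if t * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1.0:
                # Margin violated: the hinge-loss subgradient is -t * x (and -t for b).
                gw = [g - t * xj for g, xj in zip(gw, xi)]
                gb -= t
        w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va, vb = sum((x - ma) ** 2 for x in ra), sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / math.sqrt(va * vb)

if __name__ == "__main__":
    X, y = make_synthetic()
    Xs = standardize(X)
    # One importance measure per model: absolute weight on standardized features.
    imp_lr = [abs(w) for w in fit_logistic_l2(Xs, y)[0]]
    imp_svm = [abs(w) for w in fit_linear_svm(Xs, y)[0]]
    print("logistic |w|:", [round(v, 3) for v in imp_lr])
    print("SVM |w|:     ", [round(v, 3) for v in imp_svm])
    print("Spearman rho between importance measures:", round(spearman(imp_lr, imp_svm), 3))
```

Standardizing first matters because a raw weight's magnitude depends on its feature's scale; only on standardized inputs can |w_j| be compared across features. In practice, library solvers such as scikit-learn's `LogisticRegression` and `LinearSVC` would replace the hand-written training loops.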