Mining Massive Text Streams
Below are my recent papers that are relevant to this project,
with PDF files and editorial notes.
This paper presents a sequential Monte Carlo algorithm for
fully Bayesian analysis of large-scale parametric models. One of the models
considered is a sparse Bayesian classifier, which may be quite relevant to our
project. Code is available, but may need work.
This paper empirically explores several issues concerning
sparse Bayesian classifiers. In particular, it compares fully Bayesian
prediction with plug-in Bayesian prediction. This is relevant because
plug-in Bayesian prediction is computationally simpler than fully Bayesian
prediction. The paper cites some theoretical evidence that plug-in Bayes can
sometimes beat fully Bayes (in terms of predictive loss), and the experiments
bear this out. The paper also experiments with kernels and different Bayesian
priors. Code is available, but may need work.
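To make the distinction between the two prediction rules concrete, here is a
minimal sketch in Python. It is not the code from the paper. It assumes a
logistic classifier, uses the posterior mean as the plug-in point estimate, and
replaces real posterior draws with simulated Gaussian samples; the function
names and all numbers are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def plug_in_predict(x, weight_samples):
    """Plug-in prediction: collapse the posterior to one point estimate
    (here the posterior mean, an assumption) and predict with it."""
    w_hat = weight_samples.mean(axis=0)
    return float(sigmoid(x @ w_hat))


def fully_bayes_predict(x, weight_samples):
    """Fully Bayesian prediction: average the predicted probability
    over all posterior samples of the weights."""
    return float(sigmoid(weight_samples @ x).mean())


# Hypothetical stand-in for posterior samples; a real analysis would draw
# these with a sampler such as sequential Monte Carlo or MCMC.
rng = np.random.default_rng(0)
posterior_mean = np.array([2.0, 0.0, 0.0])
weight_samples = rng.normal(loc=posterior_mean, scale=2.0, size=(5000, 3))

x = np.array([1.0, 0.0, 0.0])  # a single hypothetical test input
p_plug = plug_in_predict(x, weight_samples)
p_full = fully_bayes_predict(x, weight_samples)
print(f"plug-in: {p_plug:.3f}  fully Bayesian: {p_full:.3f}")
```

The plug-in rule needs one pass over a single weight vector per test input,
while the fully Bayesian rule needs one pass per posterior sample, which is the
computational saving mentioned above. In this simulated setting the fully
Bayesian probability is pulled toward 0.5 because it accounts for uncertainty
in the weights.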
This paper explores variants of the Naive Bayes classifier
and clarifies some of the basic assumptions of the model. Our project will
probably eschew Naive Bayes models in favor of sparse Bayesian classifiers.