CCICADA Student-organized Seminar Series

Title: Literature Search through Mixed-Membership Community Discovery

Speaker: Tina Eliassi-Rad, Lawrence Livermore National Laboratory

Date: Thursday, September 24, 2009; 12:00-1:00pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Given a research topic (e.g., reconstruction of the 1918 influenza virus) and a couple of seminal papers on that topic, how do we find authors who are conducting similar research? Traditional solutions to this problem include looking at the citations in the seminal papers and/or conducting Web searches on keywords associated with the chosen topic. Both of these common solutions have biases that limit their effectiveness. For example, the citations of a paper provide only a partial view of the domain (namely, the works its authors chose to cite). A keyword-based Web search neglects the wealth of information embedded in social networks, such as co-authorship graphs.

In this work, we propose a new approach to the literature search problem based on finding mixed-membership communities in an augmented co-authorship (ACA) graph. We construct an ACA graph by fusing the information from a bipartite expertise-by-author graph into a co-authorship graph, which produces a denser and more structured version of the original co-authorship graph. For mixed-membership community discovery, we use our Latent Dirichlet Allocation for Graphs (LDA-G) algorithm. LDA-G is a scalable generative model that adapts the Latent Dirichlet Allocation (LDA) topic-modeling algorithm for use on graphs rather than text corpora. A simple post-analysis of LDA-G's communities yields a ranking of the most similar authors. In our experiments on PubMed data, LDA-G produces better solutions on ACA graphs than on regular co-authorship graphs or bipartite expertise-by-author graphs. In addition to qualitative results, we provide quantitative results based on the link prediction performance of LDA-G's posterior estimate. This is joint work with Keith Henderson at LLNL.
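The abstract does not say exactly how the expertise graph is fused, how LDA-G is fit, or what the post-analysis computes. The following Python sketch illustrates one plausible reading of the pipeline, and every specific in it is an assumption rather than the speaker's method. Fusion is modeled as linking authors who co-authored a paper or share an expertise label, which is a one-mode projection of the bipartite graph. LDA-G is approximated by treating each node as a "document" whose "words" are its neighbors and fitting it with collapsed Gibbs sampling, a standard LDA inference method swapped in here. The ranking step uses cosine similarity to the seed authors' average community mixture. The function names `build_aca_graph`, `lda_g`, and `rank_similar` and all toy data are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def build_aca_graph(coauthor_edges, expertise_by_author):
    """Fuse an expertise-by-author bipartite graph into a co-authorship graph.

    ASSUMPTION (not from the talk): two authors become linked if they
    co-authored a paper OR share at least one expertise label.
    """
    adj = defaultdict(set)
    for a, b in coauthor_edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    authors_by_label = defaultdict(set)
    for author, labels in expertise_by_author.items():
        adj.setdefault(author, set())  # keep authors with no co-authors
        for label in labels:
            authors_by_label[label].add(author)
    # One-mode projection of the bipartite graph onto the author nodes.
    for members in authors_by_label.values():
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def lda_g(adj, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """LDA on a graph: each node is a 'document' whose 'words' are its
    neighbors (assumed representation), fit by collapsed Gibbs sampling
    (a stand-in inference method). Returns each node's mixture over k
    communities."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    V = len(nodes)
    docs = [[index[w] for w in adj[n]] for n in nodes]
    ndk = [[0] * k for _ in docs]       # node-community counts
    nkw = [[0] * V for _ in range(k)]   # community-neighbor counts
    nk = [0] * k                        # total assignments per community
    z = []
    for d, doc in enumerate(docs):      # random initial assignments
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]             # remove the current assignment
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + V * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t             # record the resampled assignment
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return {n: [(ndk[d][j] + alpha) / (len(docs[d]) + k * alpha)
                for j in range(k)] for d, n in enumerate(nodes)}


def rank_similar(theta, seeds, top_n=5):
    """Hypothetical post-analysis: rank non-seed authors by cosine similarity
    between their community mixture and the average seed mixture."""
    k = len(next(iter(theta.values())))
    centroid = [sum(theta[s][j] for s in seeds) / len(seeds) for j in range(k)]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    candidates = [a for a in theta if a not in seeds]
    return sorted(candidates, key=lambda a: cosine(theta[a], centroid),
                  reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy, made-up data: authors A..F with hypothetical expertise labels.
    coauthors = [("A", "B"), ("B", "C"), ("D", "E")]
    expertise = {"A": {"influenza"}, "C": {"influenza"}, "F": {"influenza"},
                 "D": {"genomics"}, "E": {"genomics"}}
    aca = build_aca_graph(coauthors, expertise)
    theta = lda_g(aca, k=2, iters=100)
    print(rank_similar(theta, seeds={"A"}, top_n=3))
```

In the toy run, author F has no co-authors but appears in the ACA graph through the shared "influenza" label. This is the kind of densification the abstract attributes to fusing expertise into the co-authorship graph. Because Gibbs sampling is stochastic, the exact ranking depends on the seed and the number of iterations.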