DIMACS TR: 2002-58

Methods of data fusion in information retrieval: rank vs. score combination



Authors: D. Frank Hsu, Jacob Shapiro and Isak Taksa

ABSTRACT

Combining multiple sources of evidence (multiple query formulations, multiple retrieval schemes or systems) has been shown, mostly experimentally, to be effective for data fusion in information retrieval. However, the question of why and how combination should be done remains largely unanswered. In this paper, we provide a model for simulation and analysis in the study of data fusion in the information retrieval domain. A rank-score function is defined, and the concept of a Cayley graph is used in the design and analysis of our framework. Our model and results have led to a better understanding of data fusion phenomena in the information retrieval domain. In particular, we have shown, both analytically and in simulation, that combination using rank performs better than combination using score under certain conditions.
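
To make the distinction between the two kinds of combination concrete, the following is a minimal Python sketch and not the model from the report. It assumes two hypothetical systems that each assign a score to the same four documents, uses min-max normalization before averaging scores, and averages rank positions for rank combination. The function names, the sample scores, the normalization, and the use of a plain average are all illustrative assumptions; the report's rank-score function and Cayley graph framework are not reproduced here.

```python
from typing import Dict, List

Scores = Dict[str, float]


def ranks_from_scores(scores: Scores) -> Dict[str, int]:
    """Assign rank 1 to the highest score; break ties by document id."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {doc: i + 1 for i, doc in enumerate(ordered)}


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] (an assumed normalization, not the report's)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def combine_by_score(a: Scores, b: Scores) -> List[str]:
    """Order documents by the average of their normalized scores (high first)."""
    na, nb = min_max_normalize(a), min_max_normalize(b)
    fused = {d: (na[d] + nb[d]) / 2 for d in a}
    return sorted(fused, key=lambda d: (-fused[d], d))


def combine_by_rank(a: Scores, b: Scores) -> List[str]:
    """Order documents by the average of their rank positions (low first)."""
    ra, rb = ranks_from_scores(a), ranks_from_scores(b)
    fused = {d: (ra[d] + rb[d]) / 2 for d in a}
    return sorted(fused, key=lambda d: (fused[d], d))


# Hypothetical scores from two retrieval systems over the same documents.
system_a = {"d1": 0.95, "d2": 0.30, "d3": 0.25, "d4": 0.10}
system_b = {"d1": 0.40, "d2": 0.90, "d3": 0.85, "d4": 0.20}

print("score combination:", combine_by_score(system_a, system_b))
print("rank combination: ", combine_by_rank(system_a, system_b))
```

On this toy input the two methods produce different orderings for the top two documents, because system A's large score gap for d1 carries over into the averaged score but not into the averaged rank. The sketch says nothing about which ordering is better; that depends on relevance judgments and on the conditions analyzed in the report.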

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2002/2002-58.doc.gz