DIMACS TR: 2005-42

Simulated Entity Resolution by Diverse Means: DIMACS Work on the KDD Challenge of 2005

Authors: Andrei Anghelescu, Aynur Dayanik, Dmitriy Fradkin, Alex Genkin, Paul Kantor, David Lewis, David Madigan, Ilya Muchnik and Fred Roberts


This report describes DIMACS work on two groups of entity resolution problems, ER1 and ER2, from the KDD Challenge of 2005. We presume that the challenge is intended to mimic, using abstracts and author information from the life sciences, a real-world problem in which it is important to recognize the identity of an individual even though he may share his name with other individuals (ER1), or may actively seek to hide his identity by removing his own name from a work or replacing it with an alias (ER2a and ER2b,c). The specific problems investigated thus include author resolution, finding a missing author of a paper, and detecting a false author of a paper. The methods used to attack these problems include combinatorial cluster analysis, fusion of methods, penalized logistic regression / maximum entropy approaches, and dependency modeling.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-42.ps.gz