DIMACS TR: 2005-42
Simulated Entity Resolution by Diverse Means: DIMACS Work on the KDD Challenge of 2005
Authors: Andrei Anghelescu, Aynur Dayanik, Dmitriy Fradkin, Alex Genkin, Paul Kantor, David Lewis, David Madigan, Ilya Muchnik and Fred Roberts
ABSTRACT
This report describes DIMACS work on two of the groups of
entity resolution problems, ER1 and ER2, for the KDD Challenge
of 2005. We presume that the challenge is intended to mimic,
using abstracts and author information from the life sciences,
some real-world problem in which it is important to recognize
the identity of an individual, even though he may share his
name with other individuals (ER1), or may actively seek to
hide his identity by removing his own name from a work or
replacing it with an alias (ER2a and ER2b,c). Thus the specific
problems investigated include author resolution, finding a
missing author of a paper, and detecting a false author of a
paper. The methods used to attack these problems include
combinatorial cluster analysis, fusion of methods, penalized
logistic regression / maximum entropy approaches, and
dependency modeling.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-42.ps.gz