DyDAn Homeland Security and DIMACS Computational and Mathematical Epidemiology Joint Seminar


Title: Link Mining: Current State of the Art

Speaker: Ronen Feldman, Bar-Ilan University

Date: February 19, 2007, 12:00 - 1:30 pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Abstract:

The information age has made it easy to store large amounts of data. The proliferation of documents available on the Web, on corporate intranets, on news wires, and elsewhere is overwhelming. However, while the amount of data available to us is constantly increasing, our ability to absorb and process this information remains constant. Search engines only exacerbate the problem by making more and more documents available within a few keystrokes.

Link Mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, information extraction, text categorization, visualization, and knowledge management. Link Mining is the process of building up networks of objects interconnected through various relationships in order to discover patterns and trends. The relationships could be transactional, geographical, social, or temporal. The main tasks of Link Mining are to extract, discover, and link together sparse evidence from a vast number of data sources, to represent and evaluate the significance of the related evidence, and to learn patterns to guide the extraction, discovery, and linkage of entities.

Link Mining involves the preprocessing of document collections (text categorization, term extraction, and information extraction), integration with structured information sources, the storage of the intermediate representations, the techniques to analyze these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.), and visualization of the results.

In this tutorial we will present the general theory of Link Mining and will demonstrate several systems that use these principles to enable interactive exploration of a combination of structured and unstructured collections. We will present a general architecture of Link Mining systems that operate on the Web and will outline the algorithms and data structures behind the systems. The tutorial will cover the state of the art in this rapidly growing area of research. Several real-world applications of Link Mining will be presented.

See: DIMACS Computational and Mathematical Epidemiology Seminar Series 2006 - 2007 and the DyDAn - Center for Dynamic Data Analysis Home Page, http://www.dydan.rutgers.edu/