DIMACS Center, CoRE Building, Rutgers University

**Organizers:****David Ozonoff**, Boston University, School of Public Health, dozonoff@bu.edu**Melvin Janowitz**, Rutgers University, melj@dimacs.rutgers.edu**Fred Roberts**, Rutgers University, froberts@dimacs.rutgers.edu

Title: A Connection Between Formal Concept Analysis and Boolean Dissimilarities

A simple view of the input to Formal Concept Analysis defines it as a finite collection of binary attributes on a finite set P of objects, with |P|=p. View this as a binary matrix, where the (i,j) entry is 1 if |P|object i has attribute j. We associate with any p-element binary vector V the set of all objects which are 1 on V.

A Boolean dissimilarity is a seemingly different object. It is a mapping d from P\times P into the Boolean algebra B. It satisfies d(x,y)=d(y,x) when x ,y are distinct. If d(x,x) is properly defined, we will show how formal concept analysis may be viewed as a type of Boolean dissimilarity., and how this leads naturally to a more general type of formal concept analysis. Hopefully this generalization can be used to gain additional insight into the understanding of epidemiological data. The ideas will be illustrated with a concrete example.

Time permitting it will be shown how this circle of ideas leads to some interesting open problems in abstract lattice theory.

Title: Management of Quantified Semantic Taxonomies for Biothreat Response

Bio-ontologies are an increasingly important class of semantically enhanced database technologies necessary for the rapid integration of knowledge in the face of imminent threats such as the appearance of novel pathogens. In this talk, we discuss current work in applied order theory for the categorization of gene lists, annotation of novel protein function, and inter-operability of distinct bio-ontologies. We begin by introducing our overall context, which is the representation of bio-ontologies as semantic hierarchies; mathematically, as labeled, possibly weighted, directed acyclic graphs, and thereby as the finite partially ordered sets (posets) generated by them. After reviewing our prior work in the use of pseudo-distances among comparables poset nodes to build scoring algorithms for categorization, we introduce two new approaches for quantified posets: Markov process-based pseudo-distances and weighted normalized pseudo-distances. We conclude by defining the inter-operability task as the problem of inducing an appopriate sub-poset of the product of two constituent posets informed by an "anchoring" fuzzy relation on their nodes.

Title: Superquick tutorial on Epidemiology for mathematicians

What is epidemiology and what do epidemiologists worry about? Epidemiology is an observational science that differs from experimental sciences like physics and chemistry in that it cannot manipulate the independent variable. In that regard it is closer to natural history seismology or cosmology. This tutorial will be a very quick introduction to the concepts, preoccupations and methods of epidemiologists, tailored for mathematicians. We will give an overview of descriptive and analytic epidemiology and their concepts and language, such as incidence and prevalence, review how mathematics has been used in epidemiology, and try to encourage participants to think about ways discrete mathematics can be applied to epidemiological problems. In particular we will discuss how order theory and data mining techniques relate to epidemiological questions and, if time permits, preview some current work.

Title: Concept Lattices in Epidemiology

This talk will focus on our interpretation of the concept lattice in an epidemiological context, including its relationship to the generalized contingency table (introduced by Ozonoff et al), and its use in exploring epidemiological data, via a software demonstration. We will discuss various methods of determining subcontexts of a formal context and sublattices of the concept lattice, for example, the use of interval isomorphism theorems (in Formal Concept Analysis, Ganter and Wille, Springer 1999) in conjunction with a lattice diagram, and the application of various concept filtering algorithms and bigraph decomposition algorithms (Abello et al).

Title: Tutorial on the Theory of Measurement and its Applications

Measurement has something to do with numbers. But what kinds of properties do we expect of those numbers and what kinds of empirical relationships should those numbers reflect? And what kinds of statements involving those numbers are meaningful? Attempting to put the process of measurement on a firm mathematical foundation, mathematicians, psychologists, economists, and philosophers of science, building on the paradigm examples of physics, developed what is today called the "representational theory of measurement." The theory describes the process of measurement as the process of finding homomorphisms on ordered algebraic systems. This tutorial will describe the theory and give examples from a variety of disciplines, including physics, psychology, economics, operations research, and public policy. We will develop the theory of meaningful statements and give examples of statements about averages, algorithms, optimal solutions, and statistical tests that are meaningless. While there will be no examples from epidemiology, the goal will be to encourage participants to explore the relevance of this theory to epidemiological applications.

Title: Formal Concept Analysis with Galicia

Initially motivated by the need to make the lattice theory more intuitive and easier to teach, Formal Concept Analysis (FCA) has grown up into a first-class knowledge discovery paradigm with successful applications to the resolution of practical problems involving clustering, classification or association discovery. Yet there are only few tools available to support the FCA practitioner in his analysis task and even fewer for the developer of new FCA methods. The Galicia project is a first attempt at building a software platform that covers all the steps of an FCA-based data analysis task, from data preparation to lattice visualization. It has been designed as an open-source, open-ended tool, implemented in Java and freely distributed under the LGPL license.

In this presentation, we shall examine current research on FCA algorithms for data analysis/mining and then show how they reflect into the latest development work on the Galicia project.

In the first part, FCA will be introduced together with its basic constructs: the Galois connection associated to a binary relation R over a pair of sets (A, B), the corresponding pair of closure operators on 2^A and 2^B, and the implication relation between subsets. The way the concept lattice comes out of these will be described and two further analysis-related families of subsets will also be presented, i.e., the pseudo-closed and the minimal generators. The efficient computation of the three structures will be discussed as well.

In the second part, the Galicia platform will be presented. The currently available algorithmic tools within the platform, either for analysis or for visualization of the analysis results, will be covered. Focus will be put on the various evolution scenarios for input data and the way already available analysis results are maintained to reflect this evolution.

A demonstration of the platform will follow the presentation.

Program

Working Group Index

DIMACS Homepage

Contacting the Center

Document last modified on March 3, 2005.