# DIMACS Seminar on Math and CS in Biology

## Title:

An Algorithm for Finding Maximal Common Subtopologies in Protein Structures

## Speaker:

- Ina Koch
- Institute for Algorithms and Scientific Computing, SCAI
- GMD - German National Research Center for Information Technology
- D-53754 Sankt Augustin, Germany

## Place:

- Computer Science Building, Room 402
- Princeton University

## Time:

- 11:00 AM
- Tuesday, March 26, 1996

## Abstract:

For many applications, the determination of structural similarities is an
essential part of analyzing chemical compounds and biochemical structures. In
contrast to small chemical compounds, protein structures are more complex such
that they are often described on different structural levels. The secondary
and supersecondary structure level is of special interest, because structural
motifs for spatially adjacent secondary structure elements are often
responsible for the function of the protein without constituting already a
domain structure itself.
We describe a method which enables us to search efficiently for maximal common
substructures in a set of protein structures. In order to obtain a unique
description of the secondary structure of a protein we model it as an
undirected labeled graph such that the vertices represent helices or strands
and the edges describe spatial adjacencies between the vertices. Connected
components in these graphs are called "topology graphs".

In order to search for maximal common subgraphs in more than two topology
graphs we must first compute all maximal common subgraphs in all graph
pairs. We express all possibilities of compatible edge pairs in the considered
topology graph pair in one graph [1], the so-called "product graph". Then, a
maximal common substructure in both topology graphs corresponds to a maximal
clique in the product graph.

For solving the NP-hard maximum-clique problem we enumerate all maximal cliques
in product graphs. Our method is based on the algorithm of Bron & Kerbosch [2].
The efficiency of our technique is based on two optimizations. First, we
recognize and eliminate equal search spaces. Second, we cut the search space
and reduce the number of maximal cliques drastically by building up only those
maximal cliques, which correspond exclusively to connected subgraphs in the
participating topology graphs.

For large topology graphs, for instance with 32 and 55 vertices, respectively,
and 36 and 74 edges, respectively, the product graph has 316 vertices and 40109
edges. We can enumerate the resulting 323 cliques for this example in a few
seconds, whereas the original algorithm needs more than four days.

Searching for maximal connected subgraphs in more than two topology graphs we
have to compute all maximal common subgraphs in all graph pairs. We store the
appropriate edge sets representing maximal common substructures for every
topology graph pair in ordered binary trees. By intersection of these edge sets
we obtain the maximal common substructure in a set containing more than two
topology graphs.

- [1] Levi,G. 1972.Calcolo 9, 341--352.
- [2] Bron,C. and Kerbosch,J. 1973.Commun.ACM 16, 575--577.

Document last modified on March 19, 1996