## Abstract

In the analysis, comparison, and classification of protein conformations, a common computational task is the extraction of similar substructures. Structural comparisons are usually based on either of two measures of similarity: the root-mean-square (r.m.s.) deviation upon optimal superposition, or the maximal element of the difference distance matrix. The analysis presented here clarifies the relationships between different measures of structural similarity, and can provide a basis for developing algorithms and software to extract all maximal common well-fitting substructures from proteins. Given atomic coordinates of two proteins, many methods have been described for extracting some substantial (if not provably maximal) common substructure with low r.m.s. deviation. This is a relatively easy task compared with the problem addressed here, i.e., that of finding all common substructures with r.m.s. deviation less than a prespecified threshold. The combinatorial problems associated with similar subset extraction are more tractable if expressed in terms of the maximal element of the difference distance matrix than in terms of the r.m.s. deviation. However, it has been difficult to correlate these alternative measures of structural similarity. The purpose of this article is to make this connection. We first introduce a third measure of structural similarity: the maximum distance between corresponding pairs of points after superposition to minimize this value. This corresponds to fitting in the Chebyshev norm. Properties of Chebyshev superposition are derived. We describe relationships between the r.m.s. and minimax (Chebyshev) deviations upon optimal superposition, and between the Chebyshev deviation and the maximal element of the difference distance matrix. Combining these produces a relationship between the r.m.s. deviation upon optimal superposition and the maximal element of the difference distance matrix.
Based on these results, we can apply algorithms and software for finding subsets of the difference distance matrix for which all elements are less than a specified bound, either to select only subsets for which the r.m.s. deviation is less than or equal to a specified threshold, or to select subsets that include all subsets for which the r.m.s. deviation is less than or equal to a threshold.
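
As a small illustration of the quantities named in the abstract, the sketch below computes the maximal element of the difference distance matrix for two equally sized point sets, together with the r.m.s. and maximum point-to-point deviations for the coordinates exactly as given. It is not the article's algorithm: no optimal superposition (least-squares or Chebyshev) is performed, and the function names, the toy coordinates `A` and `B`, and the brute-force subset search are assumptions made for illustration. The only relationship it relies on is the elementary triangle-inequality bound that, for any placement of the two structures, the maximal difference distance is at most twice the maximum point-to-point deviation.

```python
import math
from itertools import combinations


def max_difference_distance(a, b):
    """Largest |d_A(i, j) - d_B(i, j)| over all point pairs i < j.

    This value does not depend on how the two structures are placed
    relative to each other, so no superposition is needed.
    """
    if len(a) != len(b):
        raise ValueError("point sets must have the same length")
    return max(
        (abs(math.dist(a[i], a[j]) - math.dist(b[i], b[j]))
         for i, j in combinations(range(len(a)), 2)),
        default=0.0,
    )


def deviations_as_placed(a, b):
    """Return (r.m.s., maximum) distance between corresponding points.

    The coordinates are used as given; no rotation or translation is
    optimized here, so these are upper bounds on the optimal values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("point sets must be non-empty and equally long")
    d = [math.dist(p, q) for p, q in zip(a, b)]
    rms = math.sqrt(sum(x * x for x in d) / len(d))
    return rms, max(d)


def subsets_within_bound(a, b, bound, size):
    """Brute-force search (hypothetical helper, exponential cost).

    Returns every index subset of the given size for which all elements
    of the difference distance sub-matrix are strictly below `bound`.
    """
    hits = []
    for subset in combinations(range(len(a)), size):
        if all(
            abs(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) < bound
            for i, j in combinations(subset, 2)
        ):
            hits.append(subset)
    return hits


# Toy coordinates (assumed for illustration): B equals A except that
# the last point is displaced along the z axis.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]

if __name__ == "__main__":
    rms, worst = deviations_as_placed(A, B)
    print("max difference distance:", max_difference_distance(A, B))
    print("r.m.s. as placed:", rms, "max deviation as placed:", worst)
    print("3-point subsets under 0.5:", subsets_within_bound(A, B, 0.5, 3))
```

The subset search enumerates all combinations and is practical only for very small inputs; it is meant to show what "all elements less than a specified bound" means, not how to find such subsets efficiently.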

Original language | English (US) |
---|---|
Pages (from-to) | 320-328 |
Number of pages | 9 |
Journal | Proteins: Structure, Function and Genetics |
Volume | 33 |
Issue number | 3 |
State | Published - Nov 15 1998 |

## All Science Journal Classification (ASJC) codes

- Structural Biology
- Biochemistry
- Molecular Biology