Pubmed for Handhelds

Title: Clustering of database sequences for fast homology search using upper bounds on alignment score.
Author: Itoh M, Akutsu T, Kanehisa M.
Journal: Genome Inform; 2004; 15(1):93-104. PubMed ID: 15712113.
Abstract:
Homology data are among the most important information used to predict the functions of unknown proteins, so fast and accurate search methods are needed. In this paper, we propose a new approach for fast and accurate homology search using pre-computed all-against-all similarity scores in a target database. We previously developed a method for deriving an upper bound on the Smith-Waterman score (SW-score) between a query and a homolog candidate sequence from the SW-score between the candidate and a sequence similar to the query. Using this upper bound, we first cluster the sequences in the target database so that the upper bounds of the SW-scores for all members of each cluster are less than a given value, and we select a representative sequence for each cluster. The query sequence is then searched against the representative sequences, and the upper bound of the SW-scores is estimated for each cluster. SW-alignments are computed for all the sequences in a cluster only if its upper bound is higher than a given threshold. We performed computational experiments to test the efficiency of the proposed method on the KEGG/GENES database using KEGG/SSDB. The results suggest that our method is efficient for redundant databases that include multiple closely related species.
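
The search stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formula for the upper bound, so the bound is passed in as a caller-supplied function (`cluster_bound`, a hypothetical name), and the demonstration uses a trivial length-based bound as a stand-in. The names `sw_score`, `filtered_search`, and `length_bound`, the scoring parameters, and the linear gap penalty are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2  # illustrative scoring parameters (assumed)


def sw_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical signature: (query, SW-score against representative, members) -> bound
BoundFn = Callable[[str, int, List[str]], int]


def length_bound(query: str, rep_score: int, members: List[str]) -> int:
    """Stand-in bound, NOT the bound from the paper: a local alignment has at
    most min(len) aligned pairs, each scoring at most MATCH. It ignores
    rep_score, which the paper's bound is said to use."""
    return MATCH * min(len(query), max(len(m) for m in members))


def filtered_search(
    query: str,
    clusters: Dict[str, List[str]],  # representative -> all cluster members
    cluster_bound: BoundFn,
    threshold: int,
) -> List[Tuple[str, int]]:
    """Score the query against representatives first; align against every
    member only in clusters whose upper bound exceeds the threshold."""
    scored = []
    for rep, members in clusters.items():
        rep_score = sw_score(query, rep)
        if cluster_bound(query, rep_score, members) <= threshold:
            continue  # no member can score above the threshold; skip cluster
        scored.extend((seq, sw_score(query, seq)) for seq in members)
    return sorted(scored, key=lambda hit: -hit[1])


if __name__ == "__main__":
    demo_clusters = {
        "ACGTACGA": ["ACGTACGA", "ACGTACGT"],
        "TT": ["TT", "TG"],
    }
    print(filtered_search("ACGTACGT", demo_clusters, length_bound, threshold=6))
```

The saving comes from the `continue` branch: a cluster whose bound does not exceed the threshold costs one alignment against its representative instead of one per member. A bound that uses the representative's score, as the abstract describes, would prune more clusters than the length-based stand-in used here.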
