MEDLINE/PubMed Journal Browser Search

Pubmed for Handhelds

PUBMED FOR HANDHELDS

Search MEDLINE/PubMed

Title: Hierarchical method to align large numbers of biological sequences.
Author: Taylor WR.
Journal: Methods Enzymol; 1990; 183():456-74. PubMed ID: 2156130.
Abstract:
The method presented here is intended as a compromise between finding a good overall alignment and the time taken to do so. Many multiple alignment algorithms spend an excessively large amount of effort trying to find the best global alignment. This time is often ill spent because the results of the standard dynamic programming alignment algorithm are dominated by the choice of gap penalty and the form of the score matrix, both of which have a poor theoretical foundation. Nonetheless, it is important that savings in time do not compromise the quality of the alignment. By using the consensus sequence approach, this danger is largely avoided as the conserved features of the sequences are quickly identified and preserved through further cycles. In the alignment of existing alignments, which is one of the more novel aspects of the method, each alignment was treated as an averaged consensus sequence with gaps making no contribution. This gives rise to the advantageous property that gaps will have a greater propensity to be inserted where there are already gaps and is equivalent to a local change in the gap penalty. This type of behavior represents a transition away from the homogeneous scoring schemes used in aligning two sequences toward a scoring scheme that depends on position in the sequence. The alignment of consensus sequences thus forms a bridge between simple pair alignment and the alignment of discrete patterns in which sequence features and allowed gap locations are exaggerated. To complete this transition the program described above has been integrated into the earlier pattern matching (template) program. Such templates can reliably locate sequence similarities that are too weak or scattered to be found by the more standard alignment methods and should therefore produce a further condensation of the sequence data bank. Only by continually extending our knowledge of the relationships between sequences to increasingly distant similarities can we hope to avoid being overwhelmed by the increasing amount of data.

[Abstract] [Full Text] [Related] [New Search]