next up previous
Next: Introduction

Efficiency of Shared-Memory Multiprocessors for a Genetic Sequence Similarity Search Algorithm

Ed Huai-hsin Chi*, Elizabeth Shoop*, John Carlis*, Ernest Retzel**, John Riedl*

* Computer Science Department
University of Minnesota
4-192 EE/CSci Building
Minneapolis, MN 55455
echi@cs.umn.edu

** Computational Biology Centers
Medical School, University of Minnesota
Box 196, UMHC, 1460 Mayo Building
420 Delaware Street S.E.
Minneapolis, MN 55455

Abstract:

Molecular biologists who conduct large-scale genetic sequencing projects are producing an ever-increasing amount of sequence data. GenBank, the primary repository for DNA sequence data is doubling in size every 1.3 years. Keeping pace with the analysis of this data is a difficult task. One of the most successful techniques for analyzing genetic data is sequence similarity analysis---the comparison of unknown sequences against known sequences kept in databases. As biologists gather more sequence data, sequence similarity algorithms are more and more useful, but take longer and longer to run. BLAST is one of the most popular sequence similarity algorithms in use today, but its running time is proportional to the size of the database. Sequence similarity analysis using BLAST is becoming a bottleneck.

Shared-Memory Multiprocessors (SMPs) may offer performance that scales with the growth of the genetic databases. This paper analyzes the performance of BLAST on SMPs, to improve our theoretical and practical understanding of the scalability of the algorithm. We conducted experiments on throughput and response time of BLAST on three different SMPs---SGI Challenge, Sun Sparc Center 2000, and Cray CS6400. We experimentally determined the scalability of the three major variations of the BLAST algorithm, BLASTX, BLASTN, and BLASTP, and compared the experimental results to the theoretically predicted scalability. The results indicate that BLASTX and BLASTP scale well on SMPs. Moreover, achieving acceptable response times will require more processors as sequence databases increase in size, even considering the likely future improvements in processor speed.





Ed H. Chi
Wed May 1 17:13:37 CDT 1996