We conducted experiments on shared-memory multiprocessor (SMP) machines to better understand their suitability for genetic sequence similarity computation. Experiments covering BLASTX, BLASTN, and BLASTP, the three major variants of the BLAST algorithm, were run on three SMP machines: the SGI Challenge, the Sun SPARCcenter 2000, and the Cray CS6400.
We found that throughput scaled linearly with the number of processors, with little performance degradation: running one BLAST process per processor, doubling the number of processors doubled throughput.
Using multiple processors on a single input sequence significantly improved response time. For instance, a BLASTX search that took nearly 1.3 hours on a single processor completed in only 3.5 minutes on 24 processors of the Cray CS6400, a speedup of about 22. We also measured the serial components of the parallel algorithm; for BLASTN and BLASTP, the speedup predicted from these measurements using Amdahl's law agrees closely with the measured speedup.
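Amdahl's law predicts the speedup on p processors of a program whose serial fraction is f as S(p) = 1 / (f + (1 - f)/p). The sketch below, using hypothetical helper names, computes that prediction and inverts it to estimate the serial fraction implied by a measured speedup. Applying the inversion to the BLASTX run is purely illustrative: it assumes Amdahl's law holds for that run (the close agreement is reported only for BLASTN and BLASTP) and treats "nearly 1.3 hours" as 78 minutes.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Predicted speedup S(p) = 1 / (f + (1 - f) / p) under Amdahl's law."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def implied_serial_fraction(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: f = (1/S - 1/p) / (1 - 1/p), valid for p > 1."""
    if processors < 2:
        raise ValueError("need at least 2 processors to infer f")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # Illustrative only: ~78 min serial vs 3.5 min on 24 processors (BLASTX, Cray CS6400).
    measured = 78.0 / 3.5
    f = implied_serial_fraction(measured, 24)
    print(f"measured speedup {measured:.1f}, implied serial fraction {f:.4f}")
    # Speedup the same serial fraction would predict at other processor counts.
    for p in (4, 8, 16, 24):
        print(p, round(amdahl_speedup(f, p), 1))
```

Under these assumptions the implied serial fraction is roughly 0.3%. The prediction at 24 processors reproduces the measurement by construction; only the predictions at other processor counts carry information.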
Figure: Predicted applicability of SMP machines to genome sequence analysis. (a) Predicted number of processors that could be used effectively as database size grows. (b) Predicted number of processors needed to keep the response time fixed for a 500-base sequence.
Knowing the efficiency of SMPs for BLAST computation, we can predict
how many processors can be used efficiently as genetic sequence
databases grow. These databases double in size approximately every
1.3 years, while processor speed doubles roughly every 1.5 years.
Because the databases are growing faster than processor speed, more
processors will be needed over time to keep response time low. As
panel (b) of the figure shows, 17 processors will soon be needed to
keep the BLASTX response time for a 500-base sequence at 15 seconds.
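The two doubling times can be turned into a rough back-of-the-envelope check. Assuming (our simplification, not part of the measured model) that BLAST work per query grows in proportion to database size and that throughput scales perfectly with processor count, the processor count needed for a fixed response time grows as p(t)/p(0) = 2^(t/1.3) / 2^(t/1.5), which doubles every 1 / (1/1.3 - 1/1.5) = 9.75 years. The function names below are hypothetical, and this ideal ratio is a sanity check, not a reproduction of the figure's predictions.

```python
DB_DOUBLING_YEARS = 1.3   # database size doubles about every 1.3 years (from the text)
CPU_DOUBLING_YEARS = 1.5  # processor speed doubles about every 1.5 years (from the text)


def relative_processor_demand(years: float) -> float:
    """Factor by which the processor count must grow after `years` to hold
    response time fixed, assuming work ~ database size and ideal scaling."""
    work_growth = 2.0 ** (years / DB_DOUBLING_YEARS)
    per_cpu_speed_growth = 2.0 ** (years / CPU_DOUBLING_YEARS)
    return work_growth / per_cpu_speed_growth


def demand_doubling_time() -> float:
    """Years for the idealized processor demand to double: 1 / (1/1.3 - 1/1.5)."""
    return 1.0 / (1.0 / DB_DOUBLING_YEARS - 1.0 / CPU_DOUBLING_YEARS)


if __name__ == "__main__":
    print(f"demand doubles every {demand_doubling_time():.2f} years")
    for years in (1, 5, 10):
        print(years, round(relative_processor_demand(years), 3))
```

Under these idealized assumptions the required processor count grows by about 7% per year.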
Moreover, at a fixed processor efficiency, more processors can be
used to improve performance as the databases grow. As panel (a) of
the figure shows, keeping processor utilization at 85% efficiency
would allow up to 22 processors to be used for running BLASTX on a
500-base sequence by the year 2005.
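One way to see why a fixed efficiency target admits more processors on larger databases is Amdahl's efficiency, E(p) = S(p)/p = 1 / (1 + f(p - 1)). Requiring E(p) >= E* gives p <= 1 + (1/E* - 1)/f. The sketch below additionally assumes, hypothetically, that the serial time stays fixed while the parallel search time scales with database size, so the serial fraction f shrinks as the database grows. The starting fraction 0.01 and the scale factors are illustrative inputs, not measured values, and this is not the model behind the predictions above.

```python
import math


def efficiency(serial_fraction: float, processors: int) -> float:
    """Amdahl efficiency E(p) = S(p) / p = 1 / (1 + f (p - 1))."""
    return 1.0 / (1.0 + serial_fraction * (processors - 1))


def max_processors(serial_fraction: float, target: float) -> int:
    """Largest p with E(p) >= target, i.e. p <= 1 + (1/target - 1) / f."""
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in (0, 1]")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    bound = 1.0 + (1.0 / target - 1.0) / serial_fraction
    p = max(1, math.floor(bound))
    # Guard against floating-point error right at the boundary.
    while efficiency(serial_fraction, p + 1) >= target:
        p += 1
    while p > 1 and efficiency(serial_fraction, p) < target:
        p -= 1
    return p


def serial_fraction_at_scale(f0: float, scale: float) -> float:
    """Hypothetical model: serial time fixed, parallel time grows with the
    database by `scale`, so f = f0 / (f0 + (1 - f0) * scale)."""
    return f0 / (f0 + (1.0 - f0) * scale)


if __name__ == "__main__":
    f0 = 0.01  # illustrative starting serial fraction (assumption)
    for scale in (1, 2, 4, 8):
        f = serial_fraction_at_scale(f0, scale)
        print(scale, round(f, 5), max_processors(f, 0.85))
```

With f = 0.01 and an 85% target, the bound allows 18 processors; doubling the database under this model roughly doubles that count.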
These predictions show that shared-memory multiprocessor architectures will offer significant performance benefits for genetic sequence similarity computation for years to come.