The performance of the SMP machines for throughput was striking.
The following figure shows linear improvement as we add more processors, with almost no loss of efficiency. Adding processors beyond 24 is likely to yield still higher throughput without encountering system bus limits. The measured results are close to optimal, which rejects Hypothesis II-A for the throughput case. They are explained by SMP architectures that provide sufficient memory bandwidth for each processor to work at nearly peak efficiency even when all other processors are also working at peak efficiency.
Figure: Throughput performance, measured in bases per second, on the Cray CS6400
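"Linear improvement with almost no loss of efficiency" can be made concrete as a throughput scaling efficiency: the throughput on p processors divided by p times the single-processor throughput, where 1.0 means perfectly linear scaling. The Python sketch below is a minimal illustration under that assumed definition; the function name `throughput_efficiency` and the sample values are hypothetical and are not the measured Cray CS6400 data.

```python
def throughput_efficiency(throughput_1: float, throughput_p: float, p: int) -> float:
    """Return X_p / (p * X_1); 1.0 means perfectly linear throughput scaling.

    throughput_1 -- throughput (e.g. bases per second) on one processor
    throughput_p -- throughput on p processors
    p            -- number of processors
    """
    if p < 1 or throughput_1 <= 0:
        raise ValueError("need p >= 1 and a positive single-processor throughput")
    return throughput_p / (p * throughput_1)


if __name__ == "__main__":
    # Hypothetical throughputs for illustration only (not measurements from this study).
    x1 = 1000.0
    for p, xp in [(1, 1000.0), (8, 7900.0), (24, 23500.0)]:
        print(f"p={p:2d}  efficiency={throughput_efficiency(x1, xp, p):.3f}")
```

Values that stay close to 1.0 as p grows correspond to the near-linear throughput curve described above.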
Response time also decreases as more processors are added, but in this case there is some loss of efficiency. Using traditional scalability analysis, speedup is defined as $S_p = T_1 / T_p$, where $T_1$ is the one-processor serial run time and $T_p$ is the parallel run time on $p$ processors [9]. The speedup curves of BLASTN, BLASTP, and BLASTX on the three SMP architectures are plotted in the following figure.
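Under the traditional definition $S_p = T_1 / T_p$, a speedup curve is simply each parallel run time divided into the one-processor run time. The sketch below illustrates this in Python; the helper names `speedup` and `speedup_curve` and the run times are hypothetical, chosen only to show the computation, and do not reproduce the measured BLAST results.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p: one-processor serial run time over parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def speedup_curve(times: dict[int, float]) -> dict[int, float]:
    """Map processor count p -> S_p, given run times keyed by p (must include p=1)."""
    t1 = times[1]
    return {p: speedup(t1, tp) for p, tp in sorted(times.items())}


if __name__ == "__main__":
    # Hypothetical run times in seconds, for illustration only.
    times = {1: 120.0, 4: 32.0, 16: 10.0, 24: 8.0}
    for p, s in speedup_curve(times).items():
        print(f"p={p:2d}  speedup={s:.2f}")
```

Ideal (linear) speedup would give $S_p = p$; plotting `speedup_curve` output against p makes any shortfall from that line visible.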
Figure: Speedup vs. number of processors for all three algorithms on three SMP machines: (a) BLASTN, (b) BLASTP, (c) BLASTX
The speedup curves indicate that BLASTX and BLASTP have fairly good response time speedups to at least 24 processors. Moreover, the three different SMP architectures obtained similar speedups. BLASTN, on the other hand, has significantly poorer speedup. What caused this difference in performance between the algorithms? Let us analyze the efficiency and the overhead of each algorithm.
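As a starting point for that analysis, the usual scalability quantities can be derived from the same two run times: efficiency $E_p = S_p / p = T_1 / (p T_p)$ and total overhead $T_o = p T_p - T_1$, the processor-time spent beyond the serial run time. The Python sketch below assumes these textbook definitions; the `Scalability` record and `analyze` function are hypothetical names, and the analysis that follows in this paper may define overhead differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalability:
    p: int             # number of processors
    speedup: float     # S_p = T_1 / T_p
    efficiency: float  # E_p = S_p / p
    overhead: float    # T_o = p * T_p - T_1 (processor-time beyond the serial run time)


def analyze(t_serial: float, t_parallel: float, p: int) -> Scalability:
    """Compute speedup, efficiency, and total overhead (assumed textbook definitions)."""
    if p < 1 or t_serial <= 0 or t_parallel <= 0:
        raise ValueError("need p >= 1 and positive run times")
    s = t_serial / t_parallel
    return Scalability(p=p, speedup=s, efficiency=s / p, overhead=p * t_parallel - t_serial)


if __name__ == "__main__":
    # Hypothetical run times (seconds) for illustration only.
    for t_p in (5.0, 10.0):
        print(analyze(120.0, t_p, 24))
```

Falling efficiency together with growing overhead as p increases is the quantitative signature of a poorer speedup curve such as BLASTN's; the sketch measures that loss but does not identify its cause.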