The overhead of a parallel system is defined as , which is the parallel cost minus the serial cost. Overhead characterizes the scalability of the parallel system . It is directly related to the rate at which the problem size must increase in order for the system to maintain cost-optimality.
We plotted the overhead of the three algorithms in Figure . BLASTN and BLASTP's overhead increased slowly and steadily. BLASTX's overhead increased much more dramatically.
Figure: Overhead vs. # of processors on SGI Challenge for all three algorithms
The common sources of overhead in a parallel system are interprocessor communication, contention for memory and disk, load imbalance, and serial components of the algorithm . Understanding the parallel algorithm enables us to analyze the different components that contribute to overhead. The parallel BLAST algorithm is shown in Figure . We measured various overheads of the parallel system as we increased the number of processors to better understand how various components of the parallel algorithm affect the speedup:
We measured the effect of semaphore calls on parallel run time. We removed the search code from the algorithm, and ran it with just the semaphore calls. The total amount of time used was a small fraction of the original run time, suggesting that semaphore calls are not a significant contributor to overhead.
We tested whether load-balancing was a problem for BLASTX, since each of the task chunks represents much more work than in the BLASTN or BLASTP case. The unevenness of these task chunks could be a source of load imbalance. To test this, we compiled different SMP versions of BLASTX and increased the number of task chunks each time (100, 200, 500, 1000, 2000, 10000, 100000). We then ran the tests from 4 processors to 8 processors for each of these versions. The results indicate that run times are similar even when the number of task chunks are increased. Therefore, load imbalance does not seem to hinder the speedup of BLASTX. This was surprising because we expected that a static choice of 500 task chunks will be problematic for load-balancing.
We measured the serial components of the parallel BLAST algorithm. For BLASTN and BLASTP, the serial parts of the algorithm, including building the word list, collating and sorting the hits, and printing the search results, contributed nearly 50% of the overhead. The mutual exclusion semaphore calls account for the rest of the overhead. Using Amdahl's Law and measuring the serial components of the BLASTN and BLASTP, the theoretical model closely predicts the speedup of the algorithms, supporting Hypothesis II-B. The resulting plot is shown in Figure . Thus, we are able to explain why BLASTN and BLASTP behave the way they do on SMP machines.
Figure: Speedup predictions made based on serial components in the parallel algorithm
For BLASTX, however, the serial components of the algorithm only explain 20% of the measured overhead. The remainder of the overhead we have not yet identified, so the following extrapolations for BLASTX are based on curve fits to the experimental data, rather than on a complete performance model.
The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.