To formulate the research questions, we need to identify the factors
that could affect the run time of the parallel BLAST algorithm. So we
need to understand the parallel BLAST algorithm. We use the the
parallel SMP implementation of BLAST provided by the National Center
for Biotechnology Information [5]. A high-level sketch of
the parallel algorithm is shown in Figure
. The BLAST
algorithm is done in three stages:
Algorithm BLAST
BEGIN
//stage 1: compile a list of high-scoring words.
set time mark T0;
Parse command line arguments;
set time mark T1;
Build word list and DFA;
set time mark T2;
//stage 2: search on database using word list
Create a new task;
Fork threads to run on different processors {
Sync processors;
Init task into 500 chunks;
Allocate chucks on demand to different processors;
Each processor do search and extend on the
sequences in the task chunks allocated;
}
//stage 3: collate, sort, and print hits
Set time mark ED3;
Collate hits;
Sort hits;
set time mark T4;
Print search reports;
set time mark ED5;
END
Figure: The parallel BLAST algorithm on SMP machines
In the first stage, the BLAST algorithm compiles a list of words based on the input sequence. Each word is a short segment of the input sequence that are likely to have some statistical significance. This stage is done serially. In the second stage, the parallel BLAST algorithm scans the database for hits by dividing the database into 500 task chunks, and allocating these task chunks on demand to the processors. Each processor looks for hits by matching the word list against the sequences in its task chunk, and trys to extend each of these hits to longer matches. When a processor finishes its chunk, the master processor allocates another task chunk to it. In the third stage, the master processor collates, sorts, and outputs the hits.
Looking at the algorithm, the possible factors that could affect the run time are the sequence itself, the amount of resource contention in the second stage, and the amount of serial component in the algorithm. To direct our experiments, we formulated research questions and hypotheses based on these factors.

In different sequence similarity analysis scenarios, many different sequences are analyzed. These sequences differ substantially in their length, ranging from a few hundred to a few thousand bases. They also differ in content, which is our term for differences in the patterns that occur in different sequences. Content changes between organisms as well as between genes within one organism. The BLAST algorithm is theoretically quite sensitive to sequence length, and may be sensitive to content as well. It would be prohibitive to exhaustively test every length and type of sequence. Instead we seek to measure the effects of sequence length and content on BLAST performance. Knowing this relationship will help us understand the range of sequence analysis scenarios to which our results apply.

The more sensitive BLAST performance is to sequence content, the more sequences we must use in our experiments to ensure the results generalize.

Understanding the effect of sequence length on BLAST performance will allow us to extrapolate our experimental results to other sequence lengths. Theoretical analysis suggests a linear relationship, which we will test experimentally.

Since BLAST performance is a bottleneck in sequence analysis, we would like to speed BLAST processing. One approach is to use multiple processors. The BLAST algorithm is memory and I/O intensive, suggesting that shared-memory multiprocessors may be most effective. However, scalability on shared-memory multiprocessors may be limited by bus bandwidth, serial components of the blast algorithm, memory contention, or other factors. Understanding the scalability of the BLAST algorithm will improve our understanding of the practicality of SMP machines for genetic sequence similarity computation.

To maximize throughput, we run a separate BLAST process on each processor. To minimize response time, the SMP version of BLAST uses multiple processors to scan different portions of the database. In each case, shared memory is used to store the entire database. As processors scan the database in parallel, memory use will be heavy. Memory contention and bus bandwidth may limit performance as the number of processors increases.

Of these three stages in the BLAST algorithm, only stage two is parallelized in the SMP version. Therefore, the speedup obtained by using SMP machines will be limited by Amdahl's law by the preprocessing and postprocessing that are done serially.
The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.