
Analysis

The running time of the BLAST algorithm is determined by N and W, where N is the length of the database in residues and W is the number of neighborhood words in the word list. According to this theoretical model, two sequences that generate a similar number of neighborhood words should have similar search times. We expect sequences of similar length to generate a similar number of words [1], and therefore to have similar run times.

However, the results suggest that sequence content does affect the run time for BLASTX and BLASTP, partially rejecting Hypothesis I-A. In the output of the runs, sequence 13955 from the BLASTP runs took considerably more time than the others, although its number of neighborhood words was not significantly larger. A closer examination of the results shows that run time is correlated with the number of word hits. Some sequences, however, produce more word hits for the same number of neighborhood words, suggesting that the database is not uniform in content.
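The distinction between neighborhood words and word hits can be illustrated with a small sketch. It is a deliberate simplification: real BLAST builds its neighborhood word list with a scoring matrix and a threshold, whereas this sketch assumes only the exact words of the query. The sequences, the toy database, and the function names are all hypothetical and are not taken from the experiment.

```python
def words(seq, w=3):
    """Return the set of distinct length-w words in seq.

    Assumption: exact words only, standing in for a real
    neighborhood word list built from a scoring matrix.
    """
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def count_word_hits(word_set, database, w=3):
    """Count the database positions whose length-w window is in word_set."""
    return sum(
        1
        for i in range(len(database) - w + 1)
        if database[i:i + w] in word_set
    )


# Hypothetical toy data: two queries of equal length and a database
# whose content is skewed toward the first query.
query_a = "ACDEFG"
query_b = "KLMNPQ"
database = "ACDEACDEACDEKLMW"

words_a = words(query_a)
words_b = words(query_b)
hits_a = count_word_hits(words_a, database)
hits_b = count_word_hits(words_b, database)

print(len(words_a), hits_a)
print(len(words_b), hits_b)
```

Both queries yield the same number of words, but the first finds many more hits because the toy database repeats its content. Under this simplified model, that is the kind of non-uniformity that would let one sequence run longer than another of equal length.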

On the other hand, the length test plot suggests that run time grows linearly with sequence length for BLASTP and BLASTX, partially supporting Hypothesis I-B. BLASTN, however, appears to have a more complex relationship between run time and sequence length. A line fit on the data produced correlation coefficients greater than 0.99 for all three algorithms, though the linear relationship appears weak for BLASTN at very short sequence lengths.
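A line fit of this kind can be sketched as an ordinary least-squares fit together with the Pearson correlation coefficient. This is an assumption about the method, since the fitting procedure used in the experiment is not stated here. The (length, seconds) pairs below are hypothetical placeholders, not measurements from the runs.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line through (xs, ys).

    Returns (slope, intercept, r), where r is the Pearson
    correlation coefficient of the two samples.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data: sequence lengths in residues and run times in seconds.
lengths = [100, 200, 400, 800, 1600]
seconds = [2.1, 3.9, 8.2, 15.8, 32.1]

slope, intercept, r = linear_fit(lengths, seconds)
print(f"slope={slope:.4f} s/residue, intercept={intercept:.3f} s, r={r:.4f}")
```

A coefficient near 1 over the whole range does not rule out curvature in a small part of it, because the long sequences dominate the sums. Fitting the short sequences separately, or inspecting the residuals, would be one way to examine the BLASTN behavior at short lengths.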



Ed H. Chi
Wed May 1 17:13:37 CDT 1996
