The run time of the BLAST algorithm is a function of N and W, where N
is the length of the database in residues and W is the number of
neighborhood words in
the word list. According to the theoretical model, if two sequences
generate a similar number of neighborhood words, then they should have
similar search time. We expect sequences of similar length to
generate a similar number of words [1], so we expect
similar run time.
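The link between query length and the number of neighborhood words can be sketched as follows. This is only an illustration under stated assumptions: the scoring scheme (match = 5, mismatch = -1), the threshold of 9, the word size of 3, and the function names are all hypothetical stand-ins, not the substitution matrix or parameters used by the BLAST programs in these runs.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(a, b, match=5, mismatch=-1):
    """Toy scoring scheme (an assumption; real BLASTP uses a substitution matrix)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood(word, threshold=9):
    """All words of the same length scoring at least `threshold` against `word`."""
    return {
        "".join(cand)
        for cand in product(AMINO_ACIDS, repeat=len(word))
        if toy_score(word, cand) >= threshold
    }


def neighborhood_words(query, w=3, threshold=9):
    """Union of the neighborhoods of every overlapping length-w word in `query`."""
    result = set()
    for i in range(len(query) - w + 1):
        result |= neighborhood(query[i:i + w], threshold)
    return result


if __name__ == "__main__":
    # Two made-up queries of equal length; compare the sizes of their word lists.
    for query in ("ACDEFGHIKL", "MNPQRSTVWY"):
        print(query, len(neighborhood_words(query)))
```

A query of length L contributes L - w + 1 overlapping words, so under this toy scheme the size of the word list grows with L, which is the intuition behind expecting similar run times for sequences of similar length.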
However, the results suggest that sequence content does affect run time for BLASTX and BLASTP, partially rejecting Hypothesis I-A. In the output of the runs, sequence 13955 from the BLASTP runs took considerably more time than the others, yet its number of neighborhood words was not significantly larger. A closer examination shows that run time is correlated with the number of word hits. Some sequences produce more word hits for the same number of neighborhood words, which suggests that the database is not uniform in content.
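The distinction between the number of words and the number of word hits can be illustrated with a small sketch. The database string and both word lists below are made up for the example, and `count_word_hits` is a hypothetical helper, not part of BLAST; it only counts exact matches of listed words at each database position.

```python
def count_word_hits(words, database, w=3):
    """Count database positions whose length-w word appears in `words`."""
    words = set(words)
    return sum(
        1
        for i in range(len(database) - w + 1)
        if database[i:i + w] in words
    )


if __name__ == "__main__":
    # Toy database with a repeated motif, i.e. non-uniform content.
    database = "ACDACDACDKLM"
    # Two word lists of the same size.
    common_words = {"ACD", "CDA"}
    rare_words = {"KLM", "WWW"}
    print(count_word_hits(common_words, database))
    print(count_word_hits(rare_words, database))
```

Both lists contain two words, but the list that matches the repeated motif produces several times as many hits, so each hit that has to be processed adds to the run time even though W is unchanged.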
On the other hand, the length test plot in the figure suggests that
run time is linearly proportional to sequence length for
BLASTP and BLASTX, partially supporting Hypothesis I-B. However,
BLASTN appears to have a more complex relationship between run time
and sequence length. A linear fit to the data gave correlation
coefficients greater than 0.99 for all three algorithms, although the
linear relationship appears weak for BLASTN at very short sequence
lengths.
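A least-squares line fit and its correlation coefficient can be computed as in the sketch below. The function name `linear_fit` is hypothetical, and the lengths and run times in the usage section are placeholder values chosen to lie exactly on a line; they are not the measurements from these experiments.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    # Placeholder data only: sequence lengths and run times on an exact line.
    lengths = [100, 200, 300, 400]
    run_times = [2 * x + 1 for x in lengths]
    print(linear_fit(lengths, run_times))
```

A high correlation coefficient over the whole range does not rule out curvature at one end, so inspecting the residuals at short sequence lengths is a reasonable check when a case such as BLASTN looks nonlinear there.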