The expect value (E-value) threshold can be changed in order to limit
the number of hits to the most significant ones. The lower the
E-value, the better the hit. The E-value is dependent on the length
of the query sequence and the size of the database. For example, an
alignment with an E-value of 0.05 means that about 0.05 hits with an
equally good score are expected by chance alone, which corresponds to
roughly a 5 in 100 chance of seeing such a hit by chance.
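
The dependence on query length and database size can be illustrated with the standard approximation E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score of the alignment. The sketch below is only an approximation: the function name and the example lengths are chosen for illustration, and BLAST itself uses corrected "effective" lengths, so the numbers it reports will differ somewhat.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Assumption: raw lengths are used as the search space. BLAST applies
    edge corrections (effective lengths), so reported values differ.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    score = 40.0  # hypothetical bit score, identical in all three searches
    cases = [
        ("20 nt query, 1e6 nt database", 20, 1_000_000),
        ("20 nt query, 1e9 nt database", 20, 1_000_000_000),
        ("2000 nt query, 1e9 nt database", 2_000, 1_000_000_000),
    ]
    for label, m, n in cases:
        print(f"{label}: E = {expect_value(score, m, n):.3g}")
```

For a fixed bit score the E-value grows in direct proportion to the search space, so the same alignment looks less significant against a larger database or with a longer query.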
Because E-values depend so strongly on the query sequence length and the
database size, short identical sequences may have a high E-value and
may be regarded as "false positive" hits. This is often seen when
searching for short primer regions, small domain regions, etc. The
default threshold for the E-value on the BLAST web page is 10.
Increasing this value will most likely generate more hits. Below are
some rules of thumb which can be used as a guide but should be
considered with common sense.
- E-value < 10e-100: Identical sequences. You will get long alignments across the entire query and hit sequence.
- 10e-100 < E-value < 10e-50: Almost identical sequences. A long stretch of the query protein is matched to the database.
- 10e-50 < E-value < 10e-10: Closely related sequences, could be a domain match or similar.
- 10e-10 < E-value < 1: Could be a true homologue but it is a gray area.
- E-value > 1: Proteins are most likely not related.
- E-value > 10: Hits are most likely junk unless the query sequence is very short.
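
As a rough sketch, the rules of thumb above can be written as a small lookup function. The function name and the returned labels are hypothetical, and the thresholds are copied literally from the list (Python reads `10e-100` as 10 x 10^-100). The list uses strict inequalities, so assigning a value that falls exactly on a boundary to the weaker category is an assumption made here.

```python
def interpret_evalue(evalue: float) -> str:
    """Map an E-value to the rule-of-thumb category listed above.

    Assumption: a value exactly on a boundary falls into the weaker
    (higher E-value) category; the list itself leaves this open.
    """
    if evalue < 0:
        raise ValueError("E-values cannot be negative")
    if evalue < 10e-100:
        return "identical sequences"
    if evalue < 10e-50:
        return "almost identical sequences"
    if evalue < 10e-10:
        return "closely related sequences"
    if evalue < 1:
        return "possible homologue (gray area)"
    if evalue <= 10:
        return "most likely not related"
    return "most likely junk unless the query is very short"


if __name__ == "__main__":
    for e in (0.0, 1e-60, 1e-20, 0.05, 5.0, 50.0):
        print(f"{e:g}: {interpret_evalue(e)}")
```

Such a function is only a first filter; as noted above, the categories should be weighed against query length and database size.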