BLAST at NCBI

When you run a BLAST search at the NCBI, the Workbench sends the sequences you select to the NCBI's BLAST servers. When the results are ready, they are automatically downloaded and displayed in the Workbench. When you enter a large number of sequences for searching with BLAST, the Workbench automatically splits the sequences into smaller subsets and sends one subset at a time to the NCBI. This avoids exceeding any internal limits the NCBI places on the number of sequences that can be submitted for BLAST searching. The size of each subset created by the CLC software depends on both the number and the size of the sequences.
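The splitting described above can be pictured with a minimal Python sketch. It is an illustration only, not the Workbench's actual algorithm: the function name split_into_batches and the limits max_count and max_residues are hypothetical, and the real thresholds used by the CLC software or the NCBI are not stated here.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence residues)


def split_into_batches(
    records: Iterable[Record],
    max_count: int = 10,         # hypothetical cap on sequences per subset
    max_residues: int = 10_000,  # hypothetical cap on total residues per subset
) -> Iterator[List[Record]]:
    """Yield subsets that respect both a count limit and a total-size limit."""
    batch: List[Record] = []
    total = 0
    for name, seq in records:
        # Close the current subset if adding this sequence would break a limit.
        if batch and (len(batch) >= max_count or total + len(seq) > max_residues):
            yield batch
            batch, total = [], 0
        # A single sequence longer than max_residues still gets its own subset.
        batch.append((name, seq))
        total += len(seq)
    if batch:
        yield batch


if __name__ == "__main__":
    demo = [("a", "A" * 6), ("b", "A" * 6), ("c", "A" * 2), ("d", "A")]
    for subset in split_into_batches(demo, max_count=2, max_residues=10):
        print([name for name, _ in subset])
```

Each subset would then be submitted in turn, with the next one sent only after the previous one has been handled.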

To start a BLAST job to search your sequences against databases held at the NCBI, go to:

        Toolbox | BLAST | BLAST at NCBI

Alternatively, use the keyboard shortcut: Ctrl+Shift+B on Windows and Cmd+Shift+B on Mac OS.

This opens the dialog seen in figure 23.2.

Image NCBIBLASTsearchstep1
Figure 23.2: Choose one or more sequences to conduct a BLAST search with.

Select one or more sequences of the same type (either DNA or protein) and click Next.

In this dialog, you choose which type of BLAST search to conduct and which database to search against (figure 23.3). The NCBI databases listed in the dropdown box correspond to the query sequence type you have, DNA or protein, and to the type of BLAST search you choose to run. A complete list of these databases can be found in BLAST databases. There you can also read how to add further databases available at the NCBI to the list provided in the dropdown menu.

Image NCBIBLASTsearchstep2
Figure 23.3: Choose a BLAST Program and a database for the search.

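BLAST programs for DNA query sequences:

        blastn: DNA sequence against a DNA database.
        blastx: Translated DNA sequence against a protein database.
        tblastx: Translated DNA sequence against a translated DNA database.

BLAST programs for protein query sequences:

        blastp: Protein sequence against a protein database.
        tblastn: Protein sequence against a translated DNA database.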
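The way the query type narrows the choice of program can be sketched as a simple lookup. This is a sketch based on the standard BLAST program names, not on the Workbench's internal logic; the dictionary BLAST_PROGRAMS and the function programs_for are hypothetical names used only for illustration.

```python
# Standard BLAST programs grouped by query type. Each program maps to the
# kind of database it searches. The structure itself is illustrative only.
BLAST_PROGRAMS = {
    "DNA": {
        "blastn": "DNA database",
        "blastx": "protein database (query is translated)",
        "tblastx": "translated DNA database (query is translated)",
    },
    "protein": {
        "blastp": "protein database",
        "tblastn": "translated DNA database",
    },
}


def programs_for(query_type: str) -> list:
    """Return the program names that accept the given query type."""
    return sorted(BLAST_PROGRAMS[query_type])


if __name__ == "__main__":
    for query_type in ("DNA", "protein"):
        print(query_type, "->", ", ".join(programs_for(query_type)))
```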

If you search against the Protein Data Bank protein database and sequences homologous to the query sequence are found, these can be downloaded and opened with the 3D view.

Click Next.

This window (figure 23.4) allows you to choose parameters that tune your BLAST search to your requirements.

Image NCBIBLASTsearchstep3
Figure 23.4: Parameters that can be set before submitting a BLAST search.

When choosing blastx or tblastx to conduct a search, you get the option of selecting a translation table for the genetic code. The standard genetic code is set as default. This setting is particularly useful when working with organisms or organelles that have a genetic code different from the standard genetic code.
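To see why the translation table matters, the following minimal sketch translates the same DNA with two NCBI genetic codes: table 1 (standard) and table 2 (vertebrate mitochondrial). It only illustrates the idea and is not how the Workbench or BLAST performs translation; the function translate and the constant names are hypothetical.

```python
# Amino acids for all 64 codons in NCBI's compact notation, with the bases
# of each codon position ordered T, C, A, G.
BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERTEBRATE_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def translate(dna: str, table: str = STANDARD) -> str:
    """Translate DNA in reading frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only, ignoring any trailing 1-2 bases.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        first, second, third = (BASES.index(base) for base in dna[i:i + 3])
        protein.append(table[16 * first + 4 * second + third])
    return "".join(protein)


if __name__ == "__main__":
    query = "ATGTGAATA"
    print(translate(query, STANDARD))         # TGA is read as a stop codon
    print(translate(query, VERTEBRATE_MITO))  # TGA is read as tryptophan
```

With the standard code the example query contains a stop codon, whereas the vertebrate mitochondrial code reads through it, so a translated search of mitochondrial DNA with the wrong table could miss or truncate matches.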

The following description of BLAST search parameters is based on information from http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml.

The parameters you choose affect how long BLAST takes to run. A search of a small database that requests only hits meeting stringent criteria will generally be quick. Searching large databases, or allowing for very remote matches, will take longer.

Click Finish to start the tool.

BLAST a partial sequence against NCBI

You can search a database using only a part of a sequence directly from the sequence view:

        select the sequence region to send to BLAST | right-click the selection | BLAST Selection Against NCBI

This goes directly to the dialog shown in figure 23.3, and the rest of the options are the same as when performing a BLAST search with a full sequence.