2023 BLAST NEWS¶
Tue, 28 Nov 2023¶
BLAST+ 2.15.0 is here!¶
We have included two exciting new features in the latest BLAST+ release.
One will run searches faster for you. The other allows you to limit your search easily by organism.
Let’s talk about how this version of BLAST runs faster in some cases. If you run BLAST with multiple threads (more than one CPU), there are two ways that BLAST can divide the work among the threads. Which method works better depends (among other factors) on the size of the database, the BLAST program, and the number of queries. Picking the appropriate threading model can speed up a search of a small database (e.g., Swiss-Prot, 364 MB) with many queries by a factor of 2 to 10 without affecting your results, and this release now picks the model for you. Read more about this feature and the two BLAST threading models here.
The second feature significantly simplifies limiting searches to a non-leaf taxonomic node (e.g., bacteria). To limit a search by taxonomy, use a taxID (the number that specifies a taxonomic node) for your search. Read more about this feature here.
The exciting part is that these two features can be used together to deliver more targeted and faster results: when you limit your search taxonomically, you are effectively searching a smaller database. BLAST figures this out, and if your new database size is small enough (often the case with taxonomic limits), switching threading methods means BLAST can work much faster. It also limits the results to what you asked for.
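As a rough illustration of combining the two features, the sketch below assembles a taxonomically limited, multi-threaded blastp command. The database name (swissprot), query file (queries.faa), taxID 2 (Bacteria), and the DRY_RUN variable are placeholders chosen for this example, not values from the release. It assumes a database that carries taxonomy information, and it leaves the choice of threading model to BLAST, as described above.

```shell
# Sketch only: database, query file, taxID, and DRY_RUN are illustrative
# assumptions, not part of the BLAST+ release itself.
limited_search() {
    # $1 = database, $2 = query file, $3 = taxID, $4 = thread count
    set -- blastp -db "$1" -query "$2" -taxids "$3" -num_threads "$4" -outfmt 6
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

cmd=$(limited_search swissprot queries.faa 2 8)
echo "$cmd"
```

Setting DRY_RUN=0 runs the search instead of printing it, which requires BLAST+ and the database to be installed locally.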
Download BLAST+ 2.15.0
Check out all the BLAST release notes.
Questions or comments? Please write the BLAST help desk.
Thu, 24 Aug 2023¶
ClusteredNR database on BLAST+¶
The ClusteredNR database is now available for BLAST+.
Accessing cluster information from the experimental ClusteredNR database for BLAST+¶
This document shows how to retrieve cluster information from a local copy of the ClusteredNR database on your computer.
The partial results below are from a protein BLAST search against the ClusteredNR database with BLAST+. The accessions in the second column are matches to your query. These accessions correspond to the representative sequences for each cluster and they serve as the identifiers for each cluster. You can use these representative accessions to retrieve cluster information using the scripts described below, which are provided with the ClusteredNR database.
Protein BLAST results for the ClusteredNR database on BLAST+
$ blastp -db nr_cluster_seq -query query.faa -num_threads 8 -outfmt 6 | head -3
XP_013375972.1 XP_013375972.1 100.000 651 0 0 1 651 1 651 0.0 1354
XP_013375972.1 XP_010640962.1 89.555 651 66 2 1 651 1 649 0.0 1188
XP_013375972.1 KFO25476.1 88.786 651 71 2 1 651 113 761 0.0 1180
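To feed these hits to the scripts described below, you may want just the representative accessions. A minimal sketch, assuming the tabular (-outfmt 6) layout shown above, where the second column holds the subject accession:

```shell
# Sample rows copied from the results above; in practice this would be
# the saved output of the blastp command.
hits='XP_013375972.1 XP_013375972.1 100.000 651 0 0 1 651 1 651 0.0 1354
XP_013375972.1 XP_010640962.1 89.555 651 66 2 1 651 1 649 0.0 1188
XP_013375972.1 KFO25476.1 88.786 651 71 2 1 651 113 761 0.0 1180'

# Print each subject accession (column 2) once, in order of first appearance.
representatives=$(printf '%s\n' "$hits" | awk '!seen[$2]++ { print $2 }')
printf '%s\n' "$representatives"
```

awk splits on any whitespace by default, so the same command works on tab-separated BLAST output.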
Included scripts¶
get-cluster-members.sh
- lists all member accessions for a cluster containing the provided representative accession.
count-cluster-members.sh
- returns the size of the cluster for a provided representative sequence.
get-cluster-representatives-for-taxid.sh
- lists the representative accessions from a given taxonomy identifier (taxID).
get-cluster-repr-for-accession.sh
- returns the representative accession for the cluster containing the given accession.
These scripts allow you to retrieve all member accessions of a cluster based on the representative accession (i.e., the cluster identifier). They also allow you to retrieve the representative accession with any member accession. You can also retrieve extra fields such as the NCBI taxonomy ID (taxID), an integer identifying a specific taxonomic node, or the title for each member accession in a cluster.
The scripts use an SQLite3 database that must be present in the same directory. You should use the accession.version (e.g., XP_013375972.1) and not just the accession for these scripts to work correctly.
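Because the scripts expect accession.version, it can help to check identifiers before passing them along. The has_version helper below is a hypothetical convenience written for this example, not one of the provided scripts. It only tests for a trailing ".<digits>" suffix.

```shell
# has_version: succeed only if the argument ends in ".<digits>".
# Hypothetical helper for illustration; not shipped with ClusteredNR.
has_version() {
    version=${1##*.}                      # text after the last dot
    [ "$version" != "$1" ] || return 1    # no dot at all
    case "$version" in
        ''|*[!0-9]*) return 1 ;;          # empty or non-numeric suffix
        *) return 0 ;;
    esac
}

for acc in XP_013375972.1 XP_013375972; do
    if has_version "$acc"; then
        echo "$acc: ok"
    else
        echo "$acc: missing version suffix"
    fi
done
```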
Read the NCBI blog post on the new ClusteredNR database to learn about the value and basics of clustering.
Prerequisites¶
Usage examples¶
Note: Invoke any of the scripts with the -h option to see their usage instructions.
Retrieving cluster members, taxonomy information, and sequence titles
$ ./get-cluster-members.sh -a XP_013375972.1 -T -t
member_accession member_taxid member_title
-------------------- --------------- --------------------------------------------------------------------------------
XP_013375972.1 34839 PREDICTED: rab proteins geranylgeranyltransferase component A 2 [Chinchilla lani
XP_005005664.1 10141 rab proteins geranylgeranyltransferase component A 2 [Cavia porcellus]
XP_012369165.1 10160 rab proteins geranylgeranyltransferase component A 2 [Octodon degus]
XP_021121136.1 10181 rab proteins geranylgeranyltransferase component A 2 [Heterocephalus glaber]
Getting the number of members for a cluster
$ ./count-cluster-members.sh -a XP_013375972.1
4
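To size several clusters at once, the counting script can be called in a loop. This is a sketch under the assumption that count-cluster-members.sh prints a single number for each -a argument, as in the example above. The cluster_sizes function and its optional argument are hypothetical names introduced here.

```shell
# cluster_sizes: read representative accessions (one per line) on stdin and
# print "accession<TAB>cluster size" for each.
# $1 (optional, hypothetical): command used to count members; defaults to
# the provided ./count-cluster-members.sh.
cluster_sizes() {
    counter="${1:-./count-cluster-members.sh}"
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue
        # Redirect stdin so the counter cannot consume the accession list.
        size=$("$counter" -a "$acc" < /dev/null)
        printf '%s\t%s\n' "$acc" "$size"
    done
}

# Runs only where the ClusteredNR scripts are present in the current directory.
if [ -x ./count-cluster-members.sh ]; then
    printf '%s\n' XP_013375972.1 | cluster_sizes
fi
```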
Retrieving all representative accessions for an NCBI taxonomy ID (taxID)
Note: this includes all taxIDs below the chosen node. For example, a taxID for a genus will include all representatives for its species and subspecies.
$ ./get-cluster-representatives-for-taxid.sh -t 10141
representative
--------------------
XP_012997687.1
XP_013013097.1
XP_012999071.1
XP_005005726.1
XP_013008508.2
XP_013005496.1
XP_003461046.1
XP_012997197.2
XP_003465871.2
...
Listing the cluster representative accession and the protein title for any protein accession
$ ./get-cluster-repr-for-accession.sh -a XP_021121136.1 -T
representative title
-------------------- ----------------------------------------------------------------------------
XP_013375972.1 rab proteins geranylgeranyltransferase component A 2 [Heterocephalus glaber]
Questions or comments? Please tell us what you think. Write the BLAST help desk.
Tue, 22 Aug 2023¶
Try BLAST+ 2.14.1 today!¶
We added the cleanup-blastdb-volumes.py script to remove unused BLAST database volumes. Read the documentation here.
We also switched the protocol from ftp to https to access BLAST databases for increased performance and reliability when downloading data from the NCBI with the update_blastdb.pl script.
And we fixed a few bugs related to downloading data from the NCBI, and mt_mode crashing blastn and blastx.
Check out the release notes.
Download BLAST+ 2.14.1
Questions or comments? Please write the BLAST help desk.
Thu, 22 Jun 2023¶
BLAST Quick Start guides!¶
Need some help getting started with BLAST?
Use the BLAST quick start guides to learn how to perform a BLAST search and understand your results.
These quick start guides for the search and result pages show you the minimal steps needed to perform a BLAST search and how to navigate your search results.
Take a little time to check out the BLAST search page guide and the result page guide.
Questions or comments? Please write the BLAST help desk.
Fri, 28 Apr 2023¶
BLAST+ 2.14.0 is here!¶
BLASTP, BLASTX, and TBLASTN are faster than before.
We have made BLAST searches for proteins and translated DNA (BLASTP, BLASTX, and TBLASTN) faster by improving support for initial long words. This improvement helps us speed up the fast modes (e.g., using -task blastp-fast on the command line).
In one example, a search of phage reads (ERR7959948) against swissprot using “-task blastx-fast” was four times faster than the default search (4 hours vs. 16 hours). This query (ERR7959948) has 2.2 million reads and 324.4 million bases.
We have also fixed a number of bugs and added some other improvements.
Check out the release notes.
Download BLAST+ 2.14.0.
Mon, 24 Apr 2023¶
Faster BLASTP and BLASTX searches on the web!¶
BLASTP and BLASTX performance has been improved on the web.
Improvements¶
We have improved the BLASTP (protein-protein) and BLASTX (DNA-protein) searches with better support for longer word sizes. With this change, searches against nr run about 20% faster, with larger speed improvements for smaller databases like ClusteredNR, UniProtKB/Swiss-Prot, or PDB. This new version of BLAST produces equivalent results, with the overwhelming majority of searches returning the same results as the previous version of BLASTP and BLASTX.
Questions or comments? Please write the BLAST help desk.
Check out BLASTP and BLASTX on the BLAST web service.
Tue, 21 Mar 2023¶
IgBLAST 1.21.0 is now available!¶
The improvements in this latest version are available to both command-line and web IgBLAST users.
Improvements¶
Added gaps to all _alignment_aa fields (such as sequence_alignment_aa) to reflect gaps in nucleotide sequence alignment.
Added the new AIRR format field: sequence_aa. This is the direct translation (no gaps) of a nucleotide sequence using the reading frame determined by the nucleotide alignment to its closest germline V gene.
Added the new AIRR format field: d_frame. This is the D gene frame that is in-frame with the J gene coding frame. IgBLAST offers built-in IGHD gene frame support for mouse as defined by Ichihara Y et al (European Journal of Immunology Volume 19, Issue 10 p. 1849-1854). Users can use their own custom D gene definition with IgBLAST depending on their needs.
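The new fields can be pulled out of AIRR-format output by column name. The sketch below assumes the output is a tab-separated table with a header row, as in the AIRR rearrangement format. The select_columns function is a hypothetical helper, and the sample rows are invented placeholders, not real IgBLAST results.

```shell
# Made-up sample standing in for AIRR-format output (placeholders only).
airr_tsv=$(printf 'sequence_id\tsequence_aa\td_frame\nread1\tQVQLVQSG\t1\nread2\tEVQLVESG\t3\n')

# select_columns: print the named columns (comma-separated list in $1)
# from a tab-separated table with a header row read on stdin.
select_columns() {
    awk -F'\t' -v want="$1" '
        NR == 1 {
            n = split(want, names, ",")
            for (i = 1; i <= NF; i++) idx[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in idx)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
                cols[j] = idx[names[j]]
            }
        }
        {
            line = $(cols[1])
            for (j = 2; j <= n; j++) line = line "\t" $(cols[j])
            print line
        }'
}

printf '%s\n' "$airr_tsv" | select_columns sequence_id,d_frame
```

Looking columns up by header name rather than by position keeps the command working if the field order differs between versions.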
Download IgBLAST here https://ftp.ncbi.nlm.nih.gov/blast/executables/igblast/release/LATEST.
Check out the IgBLAST GitHub page at https://ncbi.github.io/igblast/.
Mon, 09 Jan 2023¶
ElasticBLAST 1.0.0 is now available!¶
ElasticBLAST version 1.0.0 supports faster, cheaper disks on AWS and better supports Kubernetes on GCP!
ElasticBLAST versions prior to version 1.0.0 will stop working on GCP after January 31, 2023.
This is because older versions of ElasticBLAST rely on version 1.21 of Kubernetes, which will reach its end of life on the Google Kubernetes Engine on that date. Please upgrade your installation of ElasticBLAST to the latest version.
Improvements¶
ElasticBLAST on AWS now defaults to the faster and cheaper EBS gp3 disk type.
ElasticBLAST on GCP now supports all of the versions of Kubernetes offered by Google Kubernetes Engine.
ElasticBLAST on GCP defaults to the stable version of Kubernetes offered by Google Kubernetes Engine.
Bug fixes¶
ElasticBLAST uses GCP’s recommended way of dealing with read/write persistent disk.
Long user names no longer cause errors in AWS.
Fixed error caused by APIs not being enabled in GCP.
Please check out this bioRxiv paper: ElasticBLAST: Accelerating Sequence Search via Cloud Computing.
Our ElasticBLAST GitHub page: https://github.com/ncbi/elastic-blast.