Create BLAST database metadata
ElasticBLAST includes the create-blastdb-metadata.py script, which generates a metadata file that ElasticBLAST uses to configure the memory limit for each BLAST job.
These metadata files are provided and maintained by NCBI in both GCP and AWS buckets, but if you are working with your own BLAST database, you will benefit from creating the metadata file and uploading it to the cloud alongside the BLAST database files. This tutorial will show you how to do that.
Note: If you create a BLAST database using makeblastdb version 2.13 or newer, you do not need to use the create-blastdb-metadata.py script. Just upload the BLAST database files to the cloud bucket of your choice!
The code sample below assumes you are creating a nucleotide BLAST database from a FASTA file called MY_FASTA_FILE.fsa and uploading the database to the AWS S3 bucket s3://mybucket/blastdb.
# Create BLASTDB
makeblastdb -in MY_FASTA_FILE.fsa -dbtype nucl -title "My database title" -out my-database
# Upload BLASTDB
for f in my-database*; do aws s3 cp "$f" s3://mybucket/blastdb/; done
If you do not have makeblastdb version 2.13 or newer, please follow the instructions below.
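One way to decide which path applies is to compare the installed version against 2.13. The sketch below is illustrative only: needs_metadata_script is a hypothetical helper, not part of ElasticBLAST, and the commented usage line assumes that the first line printed by makeblastdb -version looks like "makeblastdb: 2.13.0+".

```shell
# Hypothetical helper: succeeds (exit status 0) when the given BLAST+ version
# is older than 2.13, i.e. when create-blastdb-metadata.py is still needed.
needs_metadata_script() {
    ver=${1%+}            # drop the trailing "+" that BLAST+ versions carry
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # second version component
    [ "$major" -lt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -lt 13 ]; }
}

# Possible usage (assumes the version is the second field of the first line):
#   ver=$(makeblastdb -version | awk 'NR==1 {print $2}')
#   if needs_metadata_script "$ver"; then echo "run create-blastdb-metadata.py"; fi
```

Only the major and minor components are compared, because the threshold in question is 2.13 regardless of patch level.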
The examples below assume that you have a nucleotide BLAST database called ecoli located in your computer’s /blast/db directory and that you will store said BLAST database in s3://mybucket/blastdb. Please update these values accordingly.
Create BLASTDB metadata file
Run the command below to create a BLASTDB metadata file for /blast/db/ecoli.
create-blastdb-metadata.py --db /blast/db/ecoli --dbtype nucl --pretty
You can verify that the metadata file was generated as follows:
cat ecoli-nucl-metadata.json
The output will resemble what is shown below:
{
  "dbname": "ecoli",
  "version": "1.1",
  "dbtype": "Nucleotide",
  "description": "ecoli",
  "number-of-letters": 4662239,
  "number-of-sequences": 400,
  "files": [
    "ecoli.ndb",
    "ecoli.nhr",
    "ecoli.nin",
    "ecoli.nnd",
    "ecoli.nni",
    "ecoli.nos",
    "ecoli.not",
    "ecoli.nsq",
    "ecoli.ntf",
    "ecoli.nto",
    "ecoli.nog"
  ],
  "last-updated": "2020-01-10",
  "bytes-total": 1319541,
  "bytes-to-cache": 1170705,
  "number-of-volumes": 1
}
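Before uploading, it can be useful to see which database files the metadata refers to. The following is a rough sketch rather than a supported tool: list_metadata_files is a hypothetical helper, and it assumes the file was written with --pretty so that each entry of "files" sits on its own line. A real JSON parser would be more robust.

```shell
# Hypothetical helper: print the entries of the "files" array, one per line.
# Assumes pretty-printed JSON (one array entry per line).
list_metadata_files() {
    # Keep the lines from the "files": [ line through the closing ]
    sed -n '/"files"[[:space:]]*:[[:space:]]*\[/,/]/p' "$1" |
        grep -o '"[^"]*"' |     # pull out every quoted string
        tr -d '"' |             # strip the quotes
        grep -v '^files$'       # drop the key name itself
}

# Possible usage:
#   list_metadata_files ecoli-nucl-metadata.json
```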
Please do not rename the metadata file: ElasticBLAST looks for it under that exact name in the cloud.
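The expected name follows the default pattern shown in the script's help text, ${db}-${dbtype}-metadata.json. As a small illustration, the hypothetical helper below (metadata_filename, not part of ElasticBLAST) rebuilds that name; it assumes the database part is the base name of the path passed to --db, which matches the ecoli example above.

```shell
# Hypothetical helper: derive the default metadata file name from the
# database path and molecule type (prot or nucl).
metadata_filename() {
    db=$1
    dbtype=$2
    # Assumption: only the last path component of the database is used
    printf '%s-%s-metadata.json\n' "$(basename "$db")" "$dbtype"
}

metadata_filename /blast/db/ecoli nucl   # prints ecoli-nucl-metadata.json
```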
Upload BLASTDB and metadata file to the cloud
To upload your BLAST database and metadata file to AWS, please run commands like the ones below (again, please update the values accordingly):
aws s3 cp ecoli-nucl-metadata.json s3://mybucket/blastdb/
for f in /blast/db/ecoli.n* ; do aws s3 cp "$f" s3://mybucket/blastdb/; done
To upload your BLAST database and metadata file to GCP, please run commands like the ones below (again, please update the values accordingly):
gsutil cp ecoli-nucl-metadata.json gs://mybucket/blastdb/
gsutil cp /blast/db/ecoli.n* gs://mybucket/blastdb/
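Since the AWS and GCP commands differ only in the tool and the bucket URL scheme, a dry run can help confirm what will be uploaded before anything is copied. The sketch below only prints commands and never runs them; print_upload_cmds is a hypothetical helper, and it assumes a nucleotide database whose files match the .n* pattern used above.

```shell
# Hypothetical helper: print (do not run) one upload command per file.
# Arguments: database path prefix, metadata file, destination bucket URL.
print_upload_cmds() {
    db=$1
    meta=$2
    dest=${3%/}/                      # normalize to exactly one trailing slash
    case $dest in
        s3://*) tool="aws s3 cp" ;;   # AWS destination
        gs://*) tool="gsutil cp" ;;   # GCP destination
        *) echo "unsupported destination: $dest" >&2; return 1 ;;
    esac
    printf '%s %s %s\n' "$tool" "$meta" "$dest"
    for f in "$db".n*; do
        [ -e "$f" ] || continue       # skip the literal pattern if nothing matches
        printf '%s %s %s\n' "$tool" "$f" "$dest"
    done
}

# Possible usage:
#   print_upload_cmds /blast/db/ecoli ecoli-nucl-metadata.json s3://mybucket/blastdb
```

The printed commands are not quoted, so paths containing spaces would need extra care before being pasted into a shell.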
Getting online help
You can obtain the script’s online help by running the command below:
create-blastdb-metadata.py --help
usage: create-blastdb-metadata.py [-h] --db DBNAME --dbtype {prot,nucl}
                                  [--out FILENAME] [--output-prefix PATH]
                                  [--pretty] [--logfile LOGFILE]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                                  [--version]

This program creates BLAST database metadata in JSON format.

required arguments:
  --db DBNAME           A BLAST database
  --dbtype {prot,nucl}  Database molecule type

optional arguments:
  --out FILENAME        Output file name. Default: ${db}-${dbtype}-metadata.json
  --output-prefix PATH  Path prefix for location of database files in metadata
  --pretty              Pretty-print JSON output
  --logfile LOGFILE     Default: create-blastdb-metadata.log
  --loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --version             show program's version number and exit