Create BLAST database metadata
ElasticBLAST includes the create-blastdb-metadata.py script to generate a file containing metadata used by ElasticBLAST to configure the memory limit for each BLAST job.
NCBI provides and maintains these metadata files in both GCP and AWS buckets. If you are working with your own BLAST database, however, you will benefit from creating the metadata file and uploading it to the cloud alongside the BLAST database files. This tutorial shows you how to do that.
The example below assumes that you have a nucleotide BLAST database called ecoli located in your computer’s /blast/db directory and that you will store this BLAST database in s3://mybucket/blastdb. Please update these values accordingly.
Create BLASTDB metadata file
Run the command below to create a BLASTDB metadata file for /blast/db/ecoli to be uploaded to s3://mybucket/blastdb.
create-blastdb-metadata.py --db /blast/db/ecoli --dbtype nucl --pretty --output-prefix s3://mybucket/blastdb
You can verify that the metadata file was generated as follows:
cat ecoli-nucl-metadata.json
The output will resemble what is shown below:
{
  "dbname": "ecoli",
  "version": "1.1",
  "dbtype": "Nucleotide",
  "description": "ecoli",
  "number-of-letters": 4662239,
  "number-of-sequences": 400,
  "files": [
    "s3://mybucket/blastdb/ecoli.ndb",
    "s3://mybucket/blastdb/ecoli.nhr",
    "s3://mybucket/blastdb/ecoli.nin",
    "s3://mybucket/blastdb/ecoli.nnd",
    "s3://mybucket/blastdb/ecoli.nni",
    "s3://mybucket/blastdb/ecoli.nos",
    "s3://mybucket/blastdb/ecoli.not",
    "s3://mybucket/blastdb/ecoli.nsq",
    "s3://mybucket/blastdb/ecoli.ntf",
    "s3://mybucket/blastdb/ecoli.nto",
    "s3://mybucket/blastdb/ecoli.nog"
  ],
  "last-updated": "2020-01-10",
  "bytes-total": 1319541,
  "bytes-to-cache": 1170705,
  "number-of-volumes": 1
}
Please do not rename this file: ElasticBLAST expects this exact file name when it searches for the metadata in the cloud.
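Because the entries under "files" are built from the value given to --output-prefix, they should point at the location the database is actually uploaded to. The sketch below is one possible way to check that before uploading. check_metadata_prefix is a hypothetical helper, not part of ElasticBLAST, and it assumes the pretty-printed layout shown in the sample output, with one quoted URI per line.

```shell
# Hypothetical helper (not part of ElasticBLAST): report "files" entries in a
# pretty-printed metadata file that do not start with the expected prefix.
# Usage: check_metadata_prefix ecoli-nucl-metadata.json s3://mybucket/blastdb
check_metadata_prefix() {
    metadata=$1
    prefix=${2%/}   # drop one trailing slash, if present
    # Assumption: each file URI sits on its own line, e.g. "s3://bucket/db.nsq",
    uris=$(grep -E '^[[:space:]]*"[a-z0-9]+://' "$metadata" || true)
    if [ -z "$uris" ]; then
        echo "No file URIs found in $metadata" >&2
        return 2
    fi
    # Keep only the URI lines that do NOT begin with the expected prefix.
    mismatched=$(printf '%s\n' "$uris" | grep -v -F "\"$prefix/" || true)
    if [ -n "$mismatched" ]; then
        echo "Entries outside $prefix/:" >&2
        printf '%s\n' "$mismatched" >&2
        return 1
    fi
    echo "All file entries start with $prefix/"
}
```

A non-zero return value signals that the metadata file should be regenerated with the intended --output-prefix rather than edited by hand.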
Upload BLASTDB and metadata file to the cloud
To upload your BLAST database and metadata file to AWS, run commands like the ones below (again, please update the values accordingly):
aws s3 cp ecoli-nucl-metadata.json s3://mybucket/blastdb/
for f in /blast/db/ecoli.n* ; do aws s3 cp "$f" s3://mybucket/blastdb/; done
To upload your BLAST database and metadata file to GCP, run commands like the ones below (again, please update the values accordingly):
gsutil cp ecoli-nucl-metadata.json gs://mybucket/blastdb/
gsutil cp /blast/db/ecoli.n* gs://mybucket/blastdb/
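Before copying anything, it can help to preview the commands that would run. The following is a minimal sketch, assuming a nucleotide database as in the example above. print_upload_commands is a hypothetical helper that only prints the aws or gsutil commands and picks between them from the destination's URI scheme.

```shell
# Hypothetical helper: print, without running them, the copy commands for a
# nucleotide database and its metadata file.
# Usage: print_upload_commands /blast/db/ecoli ecoli-nucl-metadata.json s3://mybucket/blastdb
print_upload_commands() {
    db=$1
    metadata=$2
    dest=${3%/}   # drop one trailing slash, if present
    # Choose the copy tool from the destination's URI scheme.
    case $dest in
        s3://*) copy="aws s3 cp" ;;
        gs://*) copy="gsutil cp" ;;
        *) echo "Unsupported destination: $dest" >&2; return 1 ;;
    esac
    echo "$copy $metadata $dest/"
    # Nucleotide database files share the prefix "<db>.n" (ecoli.nhr, ecoli.nsq, ...).
    for f in "$db".n*; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        echo "$copy $f $dest/"
    done
}
```

Reviewing the printed list is a cheap way to confirm that the metadata file and every database file go to the same bucket path.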
Getting online help
You can obtain the script’s online help by running the command below:
create-blastdb-metadata.py --help
usage: create-blastdb-metadata.py [-h] --db DBNAME --dbtype {prot,nucl} [--out FILENAME] [--output-prefix PATH] [--pretty] [--logfile LOGFILE] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--version]

This program creates BLAST database metadata in JSON format.

required arguments:
  --db DBNAME           A BLAST database
  --dbtype {prot,nucl}  Database molecule type

optional arguments:
  --out FILENAME        Output file name. Default: ${db}-${dbtype}-metadata.json
  --output-prefix PATH  Path prefix for location of database files in metadata
  --pretty              Pretty-print JSON output
  --logfile LOGFILE     Default: create-blastdb-metadata.log
  --loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --version             show program's version number and exit
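The default output name listed for --out follows the pattern ${db}-${dbtype}-metadata.json. As a rough illustration of that pattern, the sketch below rebuilds the name in the shell. default_metadata_name is a hypothetical helper, and it assumes that only the last path component of --db is used, which is consistent with /blast/db/ecoli producing ecoli-nucl-metadata.json earlier in this tutorial.

```shell
# Hypothetical helper: build the default metadata file name from the
# database path and the molecule type (prot or nucl).
# Usage: default_metadata_name /blast/db/ecoli nucl
default_metadata_name() {
    db=$1
    dbtype=$2
    # Accept only the two values that --dbtype allows.
    case $dbtype in
        prot|nucl) ;;
        *) echo "dbtype must be prot or nucl" >&2; return 1 ;;
    esac
    # Assumption: only the final path component of the database path is used.
    printf '%s-%s-metadata.json\n' "$(basename "$db")" "$dbtype"
}
```

This can be handy in upload scripts that need to refer to the metadata file without hard-coding its name.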