Create BLAST database metadata

ElasticBLAST includes the create-blastdb-metadata.py script to generate a file containing metadata used by ElasticBLAST to configure the memory limit for each BLAST job.

These metadata files are provided and maintained by NCBI in both GCP and AWS buckets, but if you are working with your own BLAST database, you will benefit from creating the metadata file and uploading it to the cloud alongside the BLAST database files. This tutorial will show you how to do that.

The example below assumes that you have a nucleotide BLAST database called ecoli located in your computer’s /blast/db directory and that you will store said BLAST database in s3://mybucket/blastdb. Please update these values accordingly.

Create BLASTDB metadata file

Run the command below to create a BLASTDB metadata file for /blast/db/ecoli to be uploaded to s3://mybucket/blastdb.

create-blastdb-metadata.py --db /blast/db/ecoli --dbtype nucl --pretty  --output-prefix s3://mybucket/blastdb

You can verify that the metadata file was generated as follows:

cat ecoli-nucl-metadata.json

The output will resemble what is shown below:

{
  "dbname": "ecoli",
  "version": "1.1",
  "dbtype": "Nucleotide",
  "description": "ecoli",
  "number-of-letters": 4662239,
  "number-of-sequences": 400,
  "files": [
    "s3://mybucket-name/blastdbs/ecoli.ndb",
    "s3://mybucket-name/blastdbs/ecoli.nhr",
    "s3://mybucket-name/blastdbs/ecoli.nin",
    "s3://mybucket-name/blastdbs/ecoli.nnd",
    "s3://mybucket-name/blastdbs/ecoli.nni",
    "s3://mybucket-name/blastdbs/ecoli.nos",
    "s3://mybucket-name/blastdbs/ecoli.not",
    "s3://mybucket-name/blastdbs/ecoli.nsq",
    "s3://mybucket-name/blastdbs/ecoli.ntf",
    "s3://mybucket-name/blastdbs/ecoli.nto",
    "s3://mybucket-name/blastdbs/ecoli.nog"
  ],
  "last-updated": "2020-01-10",
  "bytes-total": 1319541,
  "bytes-to-cache": 1170705,
  "number-of-volumes": 1
}

Please do not rename this file as ElasticBLAST expects that file name when searching for it in the cloud.

Upload BLASTDB and metadata file to the cloud

To upload your BLAST database and metadata file to AWS please run a command like the one below (again, please update the values accordingly):

aws s3 cp ecoli-nucl-metadata.json s3://mybucket/blastdb/
for f in /blast/db/ecoli.n* ; do aws s3 cp $f s3://elasticblast-camacho/blastdb/; done

To upload your BLAST database and metadata file to GCP please run a command like the one below (again, please update the values accordingly):

gsutil cp ecoli-nucl-metadata.json gs://mybucket/blastdb/
gsutil cp /blast/db/ecoli.n* gs://mybucket/blastdb/

Getting online help

You can obtain the script’s online help by running the command below:

create-blastdb-metadata.py --help
usage: create-blastdb-metadata.py [-h] --db DBNAME --dbtype {prot,nucl} [--out FILENAME] [--output-prefix PATH] [--pretty] [--logfile LOGFILE] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--version]

This program creates BLAST database metadata in JSON format.

required arguments:
  --db DBNAME           A BLAST database
  --dbtype {prot,nucl}  Database molecule type

optional arguments:
  --out FILENAME        Output file name. Default: ${db}-${dbtype}-metadata.json
  --output-prefix PATH  Path prefix for location of database files in metadata
  --pretty              Pretty-print JSON output
  --logfile LOGFILE     Default: create-blastdb-metadata.log
  --loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --version             show program's version number and exit