MegaBLAST on a large nucleotide set

In this example, you search 87,374 hepatitis virus sequences against the nt database, producing tabular output. The search should take about 75 minutes and cost less than $10. With preemptible or spot instances, the cost could be as little as 20% of that, but the search may take longer to complete.

Below is the configuration file for this example. Copy it into a new file with a text editor, then fill in the needed sections, which include the cloud-provider information, the query path, and a bucket for your results. If you are using the same account as in the quickstart, use the same cloud-provider information. For the query path, uncomment either the GCP (gs://) or the AWS (s3://) option and delete the other one. You may reuse the results bucket from the quickstart, but you should change the final location (BDQA) so that the new results are kept separate.

The instructions below assume the configuration file is named hepatitis.ini. If you use a different name, adjust the commands accordingly.

[cloud-provider]
**FILL IN**

[cluster]
num-nodes = 4

[blast]
program = blastn
db = nt
#queries = gs://elastic-blast-samples/queries/tests/hepatitis.fsa
#queries = s3://elasticblast-test/queries/hepatitis.fsa.gz
results = **FILL IN**
options = -evalue 0.01 -outfmt 7

Once you have finished editing the configuration file, you are ready to start your run. Follow the same steps you used in the quickstart.

First, run elastic-blast with the submit command:

elastic-blast submit --cfg hepatitis.ini

Once the above command returns (which may take a few minutes), you can check the status of the search:

elastic-blast status --cfg hepatitis.ini

Once your search is done, you may download the results as shown below.
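The download commands read the results location from a shell variable named YOUR_RESULTS_BUCKET. One way to set it without retyping the path is to read it from the configuration file. The following is a minimal sketch, not part of ElasticBLAST: the helper name results_from_cfg is hypothetical, and it assumes the file contains a single uncommented "results = ..." line, as in the example above.

```shell
# Hypothetical helper (not an elastic-blast command): print the value of the
# first uncommented "results = ..." line in the given configuration file.
results_from_cfg() {
    sed -n 's/^[[:space:]]*results[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}
```

For example, run export YOUR_RESULTS_BUCKET="$(results_from_cfg hepatitis.ini)" and then echo "$YOUR_RESULTS_BUCKET" to confirm the value before downloading.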

For GCP, use the command:

gsutil -qm cp ${YOUR_RESULTS_BUCKET}/*.out.gz .

For AWS, use the command:

aws s3 cp ${YOUR_RESULTS_BUCKET}/ . --exclude "*" --include "*.out.gz" --recursive

Here, YOUR_RESULTS_BUCKET should be set to the name of the results bucket used in your configuration file.
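After the download, a quick sanity check is to count the hits in the compressed files. With -outfmt 7, every line that starts with "#" is a comment and every other line is one hit, so counting the non-comment lines gives the total number of hits. This is a rough sketch that assumes the *.out.gz files are in the current directory; the function name count_hits is hypothetical.

```shell
# Hypothetical helper: count hit lines in gzip-compressed BLAST tabular
# (-outfmt 7) results. Comment lines (starting with "#") and empty lines
# are skipped; all files given as arguments are counted together.
count_hits() {
    gzip -dc "$@" | grep -vc -e '^#' -e '^$'
}
```

Calling count_hits ./*.out.gz prints a single total for all downloaded files. A total of 0 means that no query produced a hit at the configured E-value threshold.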

Finally, make sure to delete your resources if the Auto-shutdown feature is not enabled:

elastic-blast delete --cfg hepatitis.ini

You should also run the checks outlined in the quickstart to confirm that all resources have been deleted. For details, see "Clean up cloud resources" for GCP or "Clean up cloud resources" for AWS.