Tips for GCP

How to easily try elastic-blast on GCP?

To try elastic-blast with a relatively small input (fewer than 10k residues or fewer than 100k bases), run elastic-blast from the GCP Cloud Shell. You can open Cloud Shell in your web browser at https://console.cloud.google.com/?cloudshell=true or connect to it with the gcloud command:

gcloud alpha cloud-shell ssh --ssh-flag=-A
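
The size thresholds above refer to the total length of your query sequences. As a rough pre-check, the shell sketch below counts the residues or bases in an uncompressed FASTA file, skipping ">" header lines, and compares the total against a limit. It is an illustration, not an ElasticBLAST feature; count_residues and under_limit are hypothetical helper names.

```shell
# Hypothetical helper: total sequence length in a FASTA file.
# Header lines (starting with ">") are skipped; spaces, tabs and
# carriage returns inside sequence lines are not counted.
count_residues() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Hypothetical helper: succeed if the file's total length is below $2,
# e.g. 10000 for protein residues or 100000 for nucleotide bases.
under_limit() {
  [ "$(count_residues "$1")" -lt "$2" ]
}
```

For example, under_limit queries.fa 10000 && echo "small enough for Cloud Shell" checks a protein query file, where queries.fa is a placeholder for your own local file.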

How to install dependencies on Debian/Ubuntu machines?

If you are working on a Debian or Ubuntu Linux distribution and have root permissions, you can install kubectl and python3-distutils as follows:

sudo apt-get -yqm update
sudo apt-get install -yq kubectl python3-distutils
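
After installation, you can confirm that the dependencies are available with a quick check like the sketch below. report_missing and has_distutils are hypothetical helper names, not part of ElasticBLAST, and the distutils check assumes python3 is the interpreter you will use to run elastic-blast.

```shell
# Hypothetical helper: print "missing: NAME" for each command not found on PATH.
report_missing() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
  done
}

# Hypothetical helper: succeed if python3 can import the distutils module,
# which the python3-distutils package provides on Debian/Ubuntu.
has_distutils() {
  python3 -c 'import distutils' >/dev/null 2>&1
}
```

For example, report_missing kubectl python3 prints nothing when both commands are installed, and has_distutils || echo "python3-distutils is not importable" reports a missing module.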

Using the Free Trial at GCP

GCP offers a Free Trial for new users (https://cloud.google.com/free). The Free Trial comes with restrictions that matter for ElasticBLAST users: you can run at most eight cores concurrently, and persistent disk size is limited to 250G (https://cloud.google.com/terms/free-trial). Normally, ElasticBLAST runs more than eight cores at a time, and its default persistent disk size is 3000G.

You should be able to run ElasticBLAST under the Free Trial by following the instructions in Quickstart for GCP, but you will need to modify the configuration file to use fewer resources. Using the cloud shell together with the instance suggested below may exceed the Free Trial quota on concurrent cores. In that case, submit your ElasticBLAST search from your own computer instead.

For additional details about GCP’s free tier (duration, products included, etc.), please visit https://cloud.google.com/free/docs/gcp-free-tier.

Below is a configuration file that should work under the Free Trial as of January 2022. This file has been modified from the one in Quickstart for GCP in the following ways:

  • num-nodes has been set to 1 rather than 2.

  • A machine-type of n1-highmem-8, which has 8 vCPUs, has been specified. Normally, ElasticBLAST automatically sets the machine type based on the size of the database and the program.

  • A persistent disk size (pd-size) of 200G has been specified.

  • The database is set to swissprot, which is small enough to fit into the memory of the n1-highmem-8 machine.

[cloud-provider]
gcp-project = YOUR_GCP_PROJECT_ID

[cluster]
num-nodes = 1
labels = owner=USER
machine-type = n1-highmem-8
pd-size = 200G

[blast]
program = blastp
db = swissprot
queries = gs://elastic-blast-samples/queries/protein/BDQA01.1.fsa_aa
results = gs://elasticblast-USER/results/BDQA
options = -task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"
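
Before submitting, you may want to confirm that a configuration stays within the Free Trial limits described above (at most 8 concurrent cores and 250G of persistent disk). The sketch below is hypothetical: cfg_get and fits_free_trial are not ElasticBLAST commands. It assumes one "key = value" pair per line, a pd-size given in G, and an n1 machine type whose trailing number is its vCPU count (n1-highmem-8 has 8).

```shell
# Hypothetical helper: print the value of key $2 from INI-style config $1
# (first match in any section; whitespace around the first "=" is trimmed).
cfg_get() {
  awk -F'=' -v k="$2" '
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
    key == k { sub(/^[^=]*=[ \t]*/, ""); sub(/[ \t\r]+$/, ""); print; exit }
  ' "$1"
}

# Hypothetical check: num-nodes * vCPUs <= 8 and pd-size <= 250G.
fits_free_trial() {
  nodes=$(cfg_get "$1" num-nodes)
  mtype=$(cfg_get "$1" machine-type)
  pd=$(cfg_get "$1" pd-size)
  [ -n "$nodes" ] && [ -n "$mtype" ] && [ -n "$pd" ] || return 1
  # Assumption: trailing number of an n1 machine type is its vCPU count.
  vcpus=${mtype##*-}
  # Assumption: pd-size is written in gigabytes with a "G" suffix.
  pd_gb=${pd%G}
  [ "$((nodes * vcpus))" -le 8 ] && [ "$pd_gb" -le 250 ]
}
```

For example, fits_free_trial elastic-blast.ini && echo "fits the Free Trial limits" checks a local copy of the configuration above, where elastic-blast.ini is a placeholder file name. With num-nodes = 2, the same machine type would need 16 cores and the check fails.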