Getting started on Google Cloud with the Genomics Pipelines API

Pipelines API v2

Setting up PAPIv2

For now the easiest way to try PAPIv2 is to start with the sample configuration in cromwell.examples.conf and adjust it to fit your needs.

Permissions:

Google recommends using a service account to authenticate to GCP.

You can create a service account with the gcloud command. Consider running the following script, replacing MY-GOOGLE-PROJECT with your own project id:

#!/bin/bash
export LC_ALL=C 
RANDOM_BUCKET_NAME=$(head /dev/urandom | tr -dc a-z | head -c 32 ; echo '')

# Create a new service account called "my-service-account" and capture the email address generated for it
EMAIL=$(gcloud beta iam service-accounts create my-service-account --description "to run cromwell" --display-name "cromwell service account" --format json | jq -r '.email')

# add all the roles to the service account
for i in storage.objectCreator storage.objectViewer lifesciences.workflowsRunner lifesciences.admin iam.serviceAccountUser
do
    gcloud projects add-iam-policy-binding MY-GOOGLE-PROJECT --member serviceAccount:"$EMAIL" --role roles/$i
done

# create a bucket to keep the execution directory
gsutil mb gs://"$RANDOM_BUCKET_NAME"

# give the service account write access to the new bucket
gsutil acl ch -u "$EMAIL":W gs://"$RANDOM_BUCKET_NAME"

# create a file that represents your service account.  KEEP THIS A SECRET.
gcloud iam service-accounts keys create sa.json --iam-account "$EMAIL"
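
The script above generates a random bucket name from 32 lowercase letters. If you would rather choose the name yourself, it can help to check it against the basic Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit) before calling gsutil mb. The following is a minimal sketch; is_valid_bucket_name is a hypothetical helper, not part of gcloud or gsutil, and it does not cover every rule (for example, reserved prefixes or longer dotted names).

```shell
#!/bin/bash
export LC_ALL=C

# Hypothetical helper: returns 0 if the name passes the basic naming checks.
is_valid_bucket_name() {
    name="$1"
    # Length must be between 3 and 63 characters.
    [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
    case "$name" in
        # Reject any character outside lowercase letters, digits, '.', '_' and '-'.
        *[!a-z0-9._-]*) return 1 ;;
        # Reject names that do not start and end with a letter or digit.
        [!a-z0-9]*|*[!a-z0-9]) return 1 ;;
    esac
    return 0
}
```

For example, `is_valid_bucket_name "my-cromwell-bucket" && gsutil mb gs://my-cromwell-bucket` would only attempt to create the bucket when the name passes these checks.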

Prerequisites

This tutorial page relies on completing the previous tutorial:

Goals

At the end of this tutorial you'll have run your first workflow against the Google Pipelines API.

Let's get started!

Configuring a Google Project

Install the Google Cloud SDK. Create a Google Cloud Project and give it a project id (e.g. sample-project). We’ll refer to this as <google-project-id> and your user login (e.g. username@gmail.com) as <google-user-id>.
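
Before continuing, it may be worth confirming that the command-line tools used in this tutorial are on your PATH. The sketch below assumes you need gcloud and gsutil (from the Cloud SDK) and java (to run the Cromwell jar); missing_tools is a hypothetical helper written for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints the name of each listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (commented out so that sourcing this file has no side effects):
# missing_tools gcloud gsutil java
```

An empty result means every listed tool was found.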

On your Google project, open up the API Manager and enable the following APIs:

  • Google Compute Engine API
  • Cloud Storage
  • Google Cloud Life Sciences API
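
As an alternative to the console, the same APIs can usually be enabled from the command line with gcloud services enable. The sketch below only prints the command so that you can review it first. The three service names are assumptions about how the APIs listed above are identified; check them against the output of `gcloud services list --available` for your project. enable_apis_command is a hypothetical helper.

```shell
#!/bin/bash
# Assumed service names for the three APIs listed above; verify before use.
REQUIRED_APIS="compute.googleapis.com storage.googleapis.com lifesciences.googleapis.com"

# Hypothetical helper: prints the gcloud command instead of running it.
enable_apis_command() {
    project="$1"
    echo "gcloud services enable $REQUIRED_APIS --project $project"
}

# Example (commented out): print the command for a given project id.
# enable_apis_command sample-project
```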

Authenticate to Google Cloud Platform
gcloud auth login <google-user-id>

Set your default account (you will be required to log in again)
gcloud auth application-default login

Set your default project
gcloud config set project <google-project-id>

Create a Google Cloud Storage (GCS) bucket to hold Cromwell execution directories. We will refer to this bucket as <google-bucket-name>, and to its full identifier as gs://<google-bucket-name>.
gsutil mb gs://<google-bucket-name>

Workflow Source Files

Copy over the sample hello.wdl and hello.inputs files to the same directory as the Cromwell jar. This workflow takes a string value as specified in the inputs file and writes it to stdout.

hello.wdl

task hello {
  String addressee  
  command {
    echo "Hello ${addressee}! Welcome to Cromwell . . . on Google Cloud!"  
  }
  output {
    String message = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:latest"
  }
}

workflow wf_hello {
  call hello

  output {
     hello.message
  }
}

hello.inputs

{
  "wf_hello.hello.addressee": "World"
}
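
To greet someone other than World, the inputs file can be generated rather than edited by hand. This is a minimal sketch; write_hello_inputs is a hypothetical helper, and it assumes the addressee contains no characters that need JSON escaping, such as double quotes or backslashes.

```shell
#!/bin/bash
# Hypothetical helper: writes an inputs file for the wf_hello workflow.
# Usage: write_hello_inputs <addressee> <output-file>
write_hello_inputs() {
    addressee="$1"
    outfile="$2"
    # The key matches the workflow, call and input names used in hello.wdl.
    printf '{\n  "wf_hello.hello.addressee": "%s"\n}\n' "$addressee" > "$outfile"
}

# Example (commented out):
# write_hello_inputs "Cromwell" hello.inputs
```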

Google Configuration File

Copy the sample google.conf file, which uses Application Default credentials, to the same directory that contains your sample WDL, inputs, and Cromwell jar. Replace <google-project-id> and <google-bucket-name> in the configuration file with your project id and bucket name. Replace <google-billing-project-id> with the id of the project to be billed for requests (for more information, see Requester Pays).
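
The placeholder replacement can also be scripted. The following is a minimal sketch, assuming you keep an unmodified copy of the configuration as a template; fill_google_conf and the template file name are hypothetical, and the values are assumed not to contain '|', '&', or backslash characters, which sed would treat specially.

```shell
#!/bin/bash
# Hypothetical helper: fills the three placeholders and prints the result.
# Usage: fill_google_conf <template> <project-id> <bucket-name> <billing-project-id>
fill_google_conf() {
    template="$1"
    project="$2"
    bucket="$3"
    billing="$4"
    sed -e "s|<google-project-id>|$project|g" \
        -e "s|<google-bucket-name>|$bucket|g" \
        -e "s|<google-billing-project-id>|$billing|g" \
        "$template"
}

# Example (commented out): write the filled-in configuration to google.conf.
# fill_google_conf google.conf.template sample-project my-bucket sample-project > google.conf
```

Writing to standard output rather than editing in place avoids differences between sed implementations and leaves the template untouched.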

google.conf

include required(classpath("application"))

google {

  application-name = "cromwell"

  auths = [
    {
      name = "application-default"
      scheme = "application_default"
    }
  ]
}

engine {
  filesystems {
    gcs {
      auth = "application-default"
      project = "<google-billing-project-id>"
    }
  }
}

backend {
  default = "PAPIv2"
  providers {
    PAPIv2 {
      actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
      config {
        // Google project
        project = "<google-project-id>"

        // Base bucket for workflow executions
        root = "gs://<google-bucket-name>/cromwell-execution"

        // Polling for completion backs-off gradually for slower-running jobs.
        // This is the maximum polling interval (in seconds):
        maximum-polling-interval = 600

        // Optional Dockerhub Credentials. Can be used to access private docker images.
        dockerhub {
          // account = ""
          // token = ""
        }

        genomics {
          // A reference to an auth defined in the `google` stanza at the top.  This auth is used to create
          // Pipelines and manipulate auth JSONs.
          auth = "application-default"

          // Endpoint for APIs, which defaults to us-central1. To run with a location different from us-central1,
          // change the endpoint-url to start with the location, such as https://europe-west2-lifesciences.googleapis.com/
          endpoint-url = "https://lifesciences.googleapis.com/"

          // This allows you to use an alternative service account to launch jobs. By default, the default service account is used.
          compute-service-account = "default"

          // Cloud Life Sciences API is limited to certain locations. See https://cloud.google.com/life-sciences/docs/concepts/locations
          // and note that changing the location also requires changing the endpoint-url.
          location = "us-central1"  

          // Pipelines v2 only: specify the number of times localization and delocalization operations should be attempted.
          // There is no logic to determine whether an error was transient; everything is retried upon failure.
          // Defaults to 3
          localization-attempts = 3
        }

        filesystems {
          gcs {
            // A reference to a potentially different auth for manipulating files via engine functions.
            auth = "application-default"
            project = "<google-billing-project-id>"
          }
        }
      }
    }
  }
}
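
The comments in the genomics stanza describe how endpoint-url depends on location: the default endpoint is used with us-central1, and other locations prefix the host name with the location. That rule can be sketched as a small function. lifesciences_endpoint is a hypothetical helper that only restates the pattern from the comments; it does not check that the location is one the Cloud Life Sciences API supports.

```shell
#!/bin/bash
# Hypothetical helper: prints the endpoint-url value for a given location,
# following the pattern described in the configuration comments.
lifesciences_endpoint() {
    location="$1"
    if [ "$location" = "us-central1" ]; then
        echo "https://lifesciences.googleapis.com/"
    else
        echo "https://${location}-lifesciences.googleapis.com/"
    fi
}

# Example (commented out):
# lifesciences_endpoint europe-west2
```

Remember to set both location and endpoint-url in the configuration when moving away from us-central1.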

Run Workflow

java -Dconfig.file=google.conf -jar cromwell-67.jar run hello.wdl -i hello.inputs

Outputs

The end of your workflow logs should report the workflow outputs.

[info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "wf_hello.hello.message": "Hello World! Welcome to Cromwell . . . on Google Cloud!"
  },
  "id": "08213b40-bcf5-470d-b8b7-1d1a9dccb10e"
}
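
If you run the workflow from a script, you may want to check for the success line programmatically. The sketch below assumes the log has been saved to a file and that the final status line has the form shown above; workflow_succeeded and the log file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the saved log contains the success line.
workflow_succeeded() {
    logfile="$1"
    grep -q "workflow finished with status 'Succeeded'" "$logfile"
}

# Example (commented out): save the log while running, then check it.
# java -Dconfig.file=google.conf -jar cromwell-67.jar run hello.wdl -i hello.inputs 2>&1 | tee cromwell.log
# workflow_succeeded cromwell.log && echo "Workflow succeeded"
```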

Success!

Next steps

You might find the following tutorials interesting to tackle next: