AwsBatch101

Getting started on AWS with AWS Batch (beta)

Prerequisites

This tutorial page relies on completing the previous tutorial:

Downloading Prerequisites

Goals

At the end of this tutorial you'll have configured your local environment to run workflows using Cromwell on AWS Batch.

Let's get started!

To create all the resources for running a Cromwell server on AWS using CloudFormation, launch the Cromwell Full Stack Deployment. Alternatively, this page will walk through the specific steps to configure and run a local Cromwell server using AWS Batch.

Authenticating a local Cromwell server with AWS
Configuring the AWS environment
Configuring Cromwell
Workflow Source Files
Running Cromwell and AWS
Outputs

Authenticating a local Cromwell server with AWS

The easiest way to allow a local Cromwell server to talk to AWS is to:

Install the AWS CLI through Amazon's user guide.
Configure the AWS CLI by calling aws configure (provide your Access Key and Secret Access Key when prompted).

Cromwell can access these credentials through the default authentication provider. For more options, see the Configuring authentication of Cromwell with AWS section below.

Configuring the AWS environment

Next you'll need the following setup in your AWS account: - The core set of resources (S3 Bucket, IAM Roles, AWS Batch) - Custom Compute Resource (Launch Template or AMI) with Cromwell Additions

Information and instructions to setup an AWS environment to work properly with Cromwell can be found on AWS for Genomics Workflow. By deploying the CloudFormation templates provided by AWS, the stack will output the S3 bucket name and two AWS Batch queue ARNs (default and high-priority) used in the Cromwell configuration.

Configuring Cromwell

Now we're going to configure Cromwell to use the AWS resources we just created by updating a *.conf file to use the AWSBackend at runtime. This requires three pieces of information:

The AWS Region where your resources are deployed.
S3 bucket name where Cromwell will store its execution files.
The ARN of the AWS Batch queue you want to use for your tasks.

You can replace the placeholders (<your region>, <your-s3-bucket-name> and <your-queue-arn>) in the following config:

`aws.conf`

include required(classpath("application"))

aws {

  application-name = "cromwell"
  auths = [
    {
      name = "default"
      scheme = "default"
    }
  ]
  region = "<your-region>"
}

engine {
  filesystems {
    s3.auth = "default"
  }
}

backend {
  default = "AWSBatch"
  providers {
    AWSBatch {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {

        numSubmitAttempts = 6
        numCreateDefinitionAttempts = 6

        // Base bucket for workflow executions
        root = "s3://<your-s3-bucket-name>/cromwell-execution"

        // A reference to an auth defined in the `aws` stanza at the top.  This auth is used to create
        // Jobs and manipulate auth JSONs.
        auth = "default"

        default-runtime-attributes {
          queueArn: "<your arn here>"
        }

        filesystems {
          s3 {
            // A reference to a potentially different auth for manipulating files via engine functions.
            auth = "default"
          }
        }
      }
    }
  }
}

For more information about this configuration or how to change the behaviour of AWS Batch, visit the AWS Backend page.

Workflow Source Files

Lastly, create an example workflow to run. We're going to define a simple workflow that will echo a string to the console and return the result to Cromwell. Within AWS Batch (like other cloud providers), we're required to specify a Docker container for every task.

`hello.wdl`

task hello {
  String addressee = "Cromwell"
  command {
    echo "Hello ${addressee}! Welcome to Cromwell . . . on AWS!"
  }
  output {
    String message = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:latest"
  }
}

workflow wf_hello {
  call hello
  output { hello.message }
}

Running Cromwell and AWS

Provided all of the files are within the same directory, we can run our workflow with the following command:

Note: You might have a different Cromwell version number here

java -Dconfig.file=aws.conf -jar cromwell-36.jar run hello.wdl

This will: 1. Start Cromwell in run mode, 2. Prepare hello.wdl as a job and submit this to your AWS Batch queue. You can monitor the job within your AWS Batch dashboard. 3. Run the job, write execution files back to S3, and report progress back to Cromwell.

Outputs

The end of your workflow logs should report the workflow outputs.

[info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "wf_hello.hello.message": "Hello World! Welcome to Cromwell . . . on AWS!"
  },
  "id": "08213b40-bcf5-470d-b8b7-1d1a9dccb10e"
}

Success!

Next steps

You might find the following tutorials and guides interesting to tackle next: