AWS Batch Backend

AWS Batch is a set of batch management capabilities that dynamically provision the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.

This section provides details on how to configure the AWS Batch backend with Cromwell. For instructions on common configuration and deployment tutorial, see Getting started with AWS Batch.

Resources and Runtime Attributes

Cromwell and AWS Batch recognizes number of runtime attributes, more information can be found in the customize tasks page.

Running Cromwell on an EC2 instance

Cromwell can be run on an EC2 instance and submit jobs to AWS Batch, AWS provide CloudFormation stacks and guides to building the correct IAM permissions.

Scaling Requirements

For a Cromwell server that will run multiple workflows, or workflows with many steps (e.g. ones with large scatter steps), it is recommended to setup a database to store workflow metadata. The application config file will expect a SQL database location. Follow these instructions on how to create a serverless Amazon Aurora database.

Configuring Cromwell for AWS Batch

Within the *.conf file, you have a number of options to change the Cromwell's interaction with AWS Batch.

Filesystems

More information about filesystems can be found on the Filesystems page.

Amazon's S3 storage is a supported filesystem in both the engine and backend, this means that S3 files can be referenced at a workflow level, and as input files, provided they are prefixed by 's3://'.

  • filesystems
  • filesystems.s3.auth
  • filesystems.s3.caching.duplication-strategy

Configuring Authentication

To allow Cromwell to talk to AWS, the default authentication scheme uses the default authentication provider with the following AWS search paths: - Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY - Java system properties - aws.accessKeyId and aws.secretKey - Default credential profiles file - Created by the AWS CLI, typically located at ~/.aws/credentials - Instance profile credentials - Only relevant on EC2 instances

Allowing private Docker containers

AWS Batch allows the use of private Docker containers by providing dockerhub credentials. Under the specific backend's configuration, you can provide the following object:

(backend.providers.AWSBatch.config.)dockerhub = {
  // account = ""
  // token = ""
}

More configuration options

  • (backend.providers.AWSBatch.config.)concurrent-job-limit specifies the number of jobs that Cromwell will allow to be running in AWS at the same time. Tune this parameter based on how many nodes are in the compute environment.
  • (backend.providers.AWSBatch.config.)root points to the S3 bucket where workflow outputs are stored. This becomes a path on the root instance, and by default is cromwell_root. This is monitored by preinstalled daemon that expands drive space on the host, ie AWS EBS autoscale. This path is used as the 'local-disk' for containers.