Google Cloud Storage (GCS)
Cromwell supports workflows that reference objects stored in Google Cloud Storage. The Cromwell configuration for GCS is as follows:
```
filesystems {
  gcs {
    # A reference to a potentially different auth for manipulating files via engine functions.
    auth = "application-default"
    # Google project which will be billed for requests on buckets with requester pays enabled
    project = "google-billing-project"

    caching {
      # When a cache hit is found, the following duplication strategy will be followed to use the cached outputs
      # Possible values: "copy", "reference". Defaults to "copy"
      # "copy": Copy the output files
      # "reference": DO NOT copy the output files but point to the original output files instead.
      #              Will still make sure that all the original output files exist and are accessible before
      #              going forward with the cache hit.
      duplication-strategy = "copy"
    }
  }
}
```
The `auth` field names the authentication scheme used to authenticate requests. See the Google backend documentation for more info.
The `project` field is used by the Requester Pays feature (see below).
The `caching.duplication-strategy` field determines how Cromwell handles output files when a call is cached. The default strategy, `copy`, copies the files to the new call's location. As mentioned in the configuration comments, `reference` does not copy the files and simply points the results to the existing location. See the Call Caching documentation for more information.
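To make the difference between the two strategies concrete, here is a minimal Python sketch that assumes local file paths stand in for GCS objects. `reuse_cached_outputs` is a hypothetical helper written for illustration only; it is not Cromwell's implementation, and it models "accessible" as a simple existence check.

```python
import shutil
import tempfile
from pathlib import Path


def reuse_cached_outputs(cached_outputs, new_call_dir, strategy="copy"):
    """Hypothetical model of caching.duplication-strategy on a cache hit.

    Local paths stand in for GCS objects. Returns the output paths the
    new call should report.
    """
    if strategy not in ("copy", "reference"):
        raise ValueError(f"unknown duplication strategy: {strategy!r}")
    sources = [Path(p) for p in cached_outputs]
    # Both strategies rely on the original outputs: "reference" verifies they
    # exist before accepting the cache hit, and "copy" must read them.
    missing = [str(p) for p in sources if not p.exists()]
    if missing:
        raise FileNotFoundError(f"cached outputs missing: {missing}")
    if strategy == "reference":
        # Point the results at the original files; nothing is duplicated.
        return sources
    # "copy" (the default): duplicate each file into the new call's directory.
    target = Path(new_call_dir)
    target.mkdir(parents=True, exist_ok=True)
    copies = []
    for src in sources:
        dst = target / src.name
        shutil.copy2(src, dst)
        copies.append(dst)
    return copies


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "call-1", "out.txt")
    original.parent.mkdir()
    original.write_text("result")
    referenced = reuse_cached_outputs([original], Path(tmp, "call-2"), "reference")
    copied = reuse_cached_outputs([original], Path(tmp, "call-2"), "copy")
    print(referenced == [original], copied[0].read_text())  # True result
```

The trade-off the sketch highlights is that `reference` saves storage and copy time, but the new call's results depend on the original files staying in place.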
GCS has a feature called Requester Pays (RP). This section describes how Cromwell supports it and how it affects cost. If you are not already familiar with Requester Pays, please read the official documentation first.
The billing project Cromwell uses to access a bucket with requester pays is determined as follows:
- If a `google_project` was set in the workflow options when the workflow was submitted, this value is used.
- Otherwise, the value of the `project` field in the `gcs` filesystem configuration is used.
- Otherwise, if the machine Cromwell runs on is authenticated using gcloud and a default project is set, this value is used.
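The precedence above can be sketched as a small Python function. This is a hypothetical illustration, not Cromwell code. It assumes the workflow options have already been parsed from their JSON file, that the gcloud default project is passed in rather than looked up, and that an empty string counts as "not set". The project names are placeholders.

```python
import json
from typing import Mapping, Optional


def resolve_billing_project(
    workflow_options: Mapping[str, str],
    gcs_config: Mapping[str, str],
    gcloud_default_project: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical sketch of the requester-pays billing project precedence."""
    # 1. google_project from the workflow options supplied at submission
    if workflow_options.get("google_project"):
        return workflow_options["google_project"]
    # 2. project field of the gcs filesystem configuration
    if gcs_config.get("project"):
        return gcs_config["project"]
    # 3. default project of the gcloud-authenticated machine, if any
    return gcloud_default_project


options = json.loads('{"google_project": "my-billing-project"}')
gcs = {"project": "google-billing-project"}
print(resolve_billing_project(options, gcs))  # my-billing-project
print(resolve_billing_project({}, gcs))  # google-billing-project
print(resolve_billing_project({}, {}, "gcloud-default"))  # gcloud-default
```

In practice this means a per-workflow `google_project` option lets different submissions bill different projects while sharing one Cromwell configuration.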
Important Note #1: For a project to be billed for access to a bucket with requester pays, the credentials used need the `serviceusage.services.use` permission on that project.
Important Note #2: Pipelines API version 1 does not support buckets with requester pays. So while Cromwell itself might be able to access buckets with RP, jobs running on Pipelines API v1 with file inputs and/or outputs in such buckets will not work. For full requester pays support, use the Pipelines API v2 Cromwell backend.
Important Note #3: Access to requester pays buckets from Cromwell is seamless. This also means that Cromwell does not report in the logs or metadata when it accesses a bucket with requester pays. It is the user's responsibility to be aware of the extra cost of running workflows that access requester pays buckets.
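Because Cromwell stays silent about requester pays access, it can help to audit a workflow's inputs before submitting it. The following Python sketch uses a hypothetical helper, `referenced_buckets`, to collect the distinct bucket names from `gs://` URIs in an inputs JSON. Each bucket can then be checked with `gsutil requesterpays get gs://BUCKET`. This only catches URIs written literally in the inputs file. Buckets reached any other way, such as the workflow's output location, still need to be checked separately.

```python
import json
import re
from typing import Set

# Captures the bucket component of a gs:// URI (bucket names contain no "/").
GCS_URI = re.compile(r"gs://([^/\s]+)")


def referenced_buckets(inputs_json: str) -> Set[str]:
    """Return the distinct GCS bucket names referenced in an inputs JSON."""
    buckets: Set[str] = set()

    def walk(value):
        # Recurse through nested objects and arrays; only strings can hold URIs.
        if isinstance(value, str):
            buckets.update(GCS_URI.findall(value))
        elif isinstance(value, dict):
            for item in value.values():
                walk(item)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(json.loads(inputs_json))
    return buckets


inputs = (
    '{"wf.reads": ["gs://rp-bucket/a.bam", "gs://my-bucket/b.bam"],'
    ' "wf.ref": "gs://rp-bucket/ref.fa"}'
)
print(sorted(referenced_buckets(inputs)))  # ['my-bucket', 'rp-bucket']
```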