HTTP Inputs
Overview
For shared filesystem and Batch backends Cromwell can support workflow inputs specified by http and https URLs.
Please note this is not true "filesystem" support for HTTP URLs;
if inputs to a workflow are specified by HTTP URLs the outputs of steps will nevertheless appear at local or GCS paths and not HTTP
URLs.
Configuration
Cromwell's default configuration defines an instance of the HTTP filesystem named http. There is no additional configuration
required for the HTTP filesystem itself so adding HTTP filesystem support to a backend is a simple as
adding a reference to this filesystem within the backend's filesystems stanza. e.g. Cromwell's default Local shared filesystem
backend is configured like this (a GCP Batch or AWS backend would be configured in a similar way):
backend {
default = "Local"
providers {
Local {
...
config {
filesystems {
local {
...
}
http { }
}
}
...
}
...
}
}
If there is a need to turn off this http filesystem in the default Local backend the following Java property
allows for this: -Dbackend.providers.Local.config.filesystems.http.enabled=false.
Caveats
Using HTTP inputs in Cromwell can produce some unexpected behavior:
- Files specified by HTTP URIs will be renamed locally, so programs that rely on file extensions or other filenaming conventions may not function properly.
- Files located in the same remote HTTP-defined directory will not be colocated locally. This can cause problems if a program is expecting an index file (e.g. .fai) to appear in the same directory as the associated data file (e.g. .fa) without specifying the index location.