Filesystems

Most workflows represent their inputs and outputs as files, and those files are stored in filesystems. Many filesystems exist; this section describes which ones Cromwell supports.

Overview

Filesystems are configurable. The reference.conf file, which holds the default configuration inherited by every Cromwell instance, contains the following:

# Filesystems available in this Cromwell instance
# They can be enabled individually in the engine.filesystems stanza and in the config.filesystems stanza of backends
# There is also a default built-in local filesystem that can be referenced as "local".
filesystems {
  drs {
    class = "cromwell.filesystems.drs.DrsPathBuilderFactory"
    # Used to share a unique global object across all instances of the factory
    global {
      # Class to instantiate and propagate to all factories. Takes a single typesafe config argument
      class = "cromwell.filesystems.drs.DrsFileSystemConfig"
      config {
        resolver {
          url = "https://martha-url-here or https://drshub-url-here"
          # The number of times to retry connection failures or HTTP 429 or HTTP 5XX responses, default 3.
          num-retries = 3
          # How long to wait between retrying HTTP 429 or HTTP 5XX responses, default 10 seconds.
          wait-initial = 10 seconds
          # The maximum amount of time to wait between retrying HTTP 429 or HTTP 5XX responses, default 30 seconds.
          wait-maximum = 30 seconds
          # The factor by which to multiply the wait time between retries of HTTP 429 or HTTP 5XX responses.
          # Default 2.0. The wait time is never increased beyond wait-maximum.
          wait-mulitiplier = 2.0
          # The randomization factor to use for creating a range around the wait interval.
          # A randomization factor of 0.5 results in a random period ranging between 50% below and 50% above the wait
          # interval. Default 0.1.
          wait-randomization-factor = 0.1
        }
      }
    }
  }
  gcs {
    class = "cromwell.filesystems.gcs.GcsPathBuilderFactory"
  }
  s3 {
    class = "cromwell.filesystems.s3.S3PathBuilderFactory"
  }
  http {
    class = "cromwell.filesystems.http.HttpPathBuilderFactory"
  }
}

This stanza defines the filesystems that Cromwell can access. Other parts of the configuration refer to them by name (drs, gcs, s3, http, and local).

Note:
- The S3 filesystem is experimental.
- The DRS filesystem has initial support only. Currently, it works only with the GCS filesystem in the PAPIv2 backend.
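The DRS resolver settings in the reference.conf excerpt describe an exponential backoff with jitter. The Python sketch below shows one way to read those parameters. It is a hypothetical model, not Cromwell's implementation: it assumes that num-retries counts the retries after the first attempt, that the multiplier is applied after each wait and capped at wait-maximum, and that the randomization is applied to the capped interval (so a randomized wait can land slightly above wait-maximum). The function name drs_retry_waits is made up for illustration.

```python
import random
from typing import List, Optional


def drs_retry_waits(
    num_retries: int = 3,
    wait_initial: float = 10.0,        # seconds, mirrors wait-initial
    wait_maximum: float = 30.0,        # seconds, mirrors wait-maximum
    wait_multiplier: float = 2.0,      # mirrors the multiplier setting
    randomization_factor: float = 0.1, # mirrors wait-randomization-factor
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait in seconds before each retry.

    Hypothetical model of the reference.conf comments, not Cromwell code.
    """
    rng = rng if rng is not None else random.Random()
    waits = []
    interval = wait_initial
    for _ in range(num_retries):
        # A factor of 0.1 picks a wait between 10% below and 10% above the interval.
        delta = randomization_factor * interval
        waits.append(rng.uniform(interval - delta, interval + delta))
        # Grow the base interval for the next retry, but never beyond wait_maximum.
        interval = min(interval * wait_multiplier, wait_maximum)
    return waits


# With jitter disabled, the defaults give 10 s, 20 s, then 30 s (capped from 40 s).
print(drs_retry_waits(randomization_factor=0.0))  # [10.0, 20.0, 30.0]
```

With the default factor of 0.1, the three waits fall roughly within 9 to 11, 18 to 22, and 27 to 33 seconds under these assumptions.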

Also note that the local filesystem (the one Cromwell runs on) is implicitly accessible but can be disabled. To disable it, add local.enabled: false to each filesystems stanza in which the local filesystem should be unavailable.

Engine Filesystems

Cromwell is conceptually divided into an engine part and a backend part. One Cromwell instance corresponds to one engine, but it can have multiple backends configured. The engine.filesystems section configures the filesystems Cromwell uses when it needs to interact with files outside the context of a backend.

For instance, consider the following WDL:

version 1.0

workflow my_workflow {
    String s = read_string("/Users/me/my_file.txt")
    output {
        String out = s
    }
}

This workflow is valid WDL and involves no backend, not even a task. It does, however, need to read from a filesystem to retrieve the content of my_file.txt. With the default configuration, Cromwell can run this workflow because the local filesystem is enabled by default. If the file is located on a different filesystem (a cloud filesystem, for instance), we need to modify the configuration to tell Cromwell how to interact with that filesystem:

engine {
  filesystems {
    gcs {
      auth = "application-default"
    }
  }
}

(See the Google section for information about the auth field.)

We can now run this workflow:

version 1.0

workflow my_workflow {
    String s = read_string("gs://mybucket/my_file.txt")
    output {
        String out = s
    }
}
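To make the role of engine.filesystems concrete, here is a simplified Python model of the two cases above: a path is only usable if the filesystem that handles it is enabled. The prefix table and the function filesystem_for are hypothetical illustrations; Cromwell actually resolves paths through the path builder factories listed in reference.conf, not through a lookup like this.

```python
from typing import Iterable, Optional

# Hypothetical prefix table for illustration only; Cromwell uses its path
# builders to decide which filesystem handles a path.
PREFIX_TO_FILESYSTEM = {
    "gs://": "gcs",
    "s3://": "s3",
    "drs://": "drs",
    "http://": "http",
    "https://": "http",
}


def filesystem_for(path: str, enabled: Iterable[str]) -> Optional[str]:
    """Return the enabled filesystem that would handle `path`, or None."""
    enabled_set = set(enabled)
    for prefix, name in PREFIX_TO_FILESYSTEM.items():
        if path.startswith(prefix):
            return name if name in enabled_set else None
    # In this sketch, anything without a known scheme is treated as a local path.
    return "local" if "local" in enabled_set else None


default_engine = {"local", "http"}
print(filesystem_for("/Users/me/my_file.txt", default_engine))                 # local
print(filesystem_for("gs://mybucket/my_file.txt", default_engine))             # None
print(filesystem_for("gs://mybucket/my_file.txt", default_engine | {"gcs"}))   # gcs
```

In this model, the gs:// path only resolves once gcs is added to the engine's enabled set, which mirrors adding the gcs stanza under engine.filesystems.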

Default "engine" Filesystems

If you don't change anything in your own configuration file, the following default is inherited from reference.conf:

engine {
  filesystems {
    local {
      enabled: true
    }
    http {
      enabled: true
    }
  }
}

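Note: because Cromwell's configuration files are HOCON, your configuration file is merged on top of reference.conf rather than replacing it. To disable a filesystem, you must therefore set enabled: false for it in your overriding configuration file; simply omitting the filesystem from your stanza is not sufficient.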

For example, adding this to your configuration file disables the http filesystem and leaves local enabled for use in the engine:

engine {
  filesystems {
    http {
      enabled: false
    }
  }
}

By contrast, the following example leaves http unchanged and merely re-asserts the default enabling of local. In other words, it does nothing:

engine {
  filesystems {
    local {
      enabled: true
    }
  }
}
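The reason omission has no effect is HOCON's object merging: when two files define the same object, their keys are merged rather than one object replacing the other. The Python sketch below models that merge with plain dictionaries. It is a simplification (HOCON also has substitutions, includes, and other features not modeled here), and it assumes, consistent with the engine gcs example, that a listed filesystem counts as enabled unless it sets enabled to false. The helper names are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict, List

Config = Dict[str, Any]


def hocon_merge(base: Config, override: Config) -> Config:
    """Merge `override` onto `base`: nested objects merge key by key, other values are replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = hocon_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def enabled_engine_filesystems(config: Config) -> List[str]:
    """Filesystems listed under engine.filesystems that are not disabled."""
    stanza = config["engine"]["filesystems"]
    return sorted(name for name, fs in stanza.items() if fs.get("enabled", True))


# The engine defaults inherited from reference.conf.
REFERENCE = {"engine": {"filesystems": {"local": {"enabled": True}, "http": {"enabled": True}}}}

# The two overriding files from the examples in this section.
DISABLE_HTTP = {"engine": {"filesystems": {"http": {"enabled": False}}}}
REASSERT_LOCAL = {"engine": {"filesystems": {"local": {"enabled": True}}}}

print(enabled_engine_filesystems(hocon_merge(REFERENCE, DISABLE_HTTP)))    # ['local']
print(enabled_engine_filesystems(hocon_merge(REFERENCE, REASSERT_LOCAL)))  # ['http', 'local']
```

Because http never appears in REASSERT_LOCAL, its enabled: true from REFERENCE survives the merge untouched.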

Backend Filesystems

As with the engine, you can configure filesystems individually for each backend. Some backends require a specific filesystem; for example, the Pipelines API backend requires Google Cloud Storage. Consider another example:

version 1.0

task my_pipelines_task {
    input {
        File input_file
    }
    String content = read_string(input_file)

    command {
        echo ~{content}
    }

    runtime {
        docker: "ubuntu"
    }
}

workflow my_workflow {
    call my_pipelines_task { input: input_file = "gs://mybucket/my_file.txt" }
}

Suppose this workflow is submitted to a Cromwell instance running a Pipelines API backend. This time, read_string is evaluated in the context of a task run by the backend, so Cromwell uses the filesystem configuration in the config section of the Pipelines API backend rather than the engine.filesystems section.
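The choice between the engine stanza and a backend's config.filesystems stanza can be sketched as a small Python model. It assumes the usual backend.providers.<name>.config layout, treats a listed filesystem as enabled unless it sets enabled to false, and follows the Overview's statement that local is implicitly available unless a stanza sets local.enabled: false. The backend name "PAPIv2" and the function filesystems_in_scope are hypothetical.

```python
from typing import Any, Dict, List, Optional

Config = Dict[str, Any]


def filesystems_in_scope(config: Config, backend: Optional[str] = None) -> List[str]:
    """Filesystems visible to the engine (backend=None) or to the named backend.

    Simplified model, not Cromwell code.
    """
    if backend is None:
        stanza = config["engine"]["filesystems"]
    else:
        stanza = config["backend"]["providers"][backend]["config"].get("filesystems", {})
    names = {name for name, fs in stanza.items() if fs.get("enabled", True)}
    # local is implicitly available unless this stanza disables it.
    if stanza.get("local", {}).get("enabled", True):
        names.add("local")
    return sorted(names)


config = {
    "engine": {"filesystems": {"local": {"enabled": True}, "http": {"enabled": True}}},
    "backend": {"providers": {"PAPIv2": {"config": {"filesystems": {
        "gcs": {"auth": "application-default"},
        "local": {"enabled": False},
    }}}}},
}

print(filesystems_in_scope(config))            # ['http', 'local']  (engine context)
print(filesystems_in_scope(config, "PAPIv2"))  # ['gcs']            (task context)
```

Under this model, read_string in the workflow body of the earlier engine example sees the engine set, while read_string inside my_pipelines_task sees only what the backend's stanza enables.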

Supported Filesystems