Cromwell Backend Development
What’s a Backend?
Cromwell is a workflow execution service, which means it takes a representation of work to do (tasks) as well as the dependencies and code flow between them. When it comes to actually executing the specific task, Cromwell delegates that work to a “backend”. A backend is therefore responsible for the actual execution of a single task on some underlying computing platform, such as Google Cloud Platform, AWS, TES, Local, etc. Backends are implemented as a software layer between the Cromwell engine and that underlying platform.
The underlying platform services requests for a job to run while the backend shim provides the interface layer between Cromwell and the platform. In general, the more sophisticated the platform the thinner the shim needs to be and vice versa.
A job from Cromwell’s perspective is a collection of the following information:
- A unix command line to run
- A mapping of where its input files currently live to where the command line expects them to be
- A mapping of where the command line will write its outputs to where they should eventually wind up
- An optional Docker image which if supplied will be the environment in which the command will be run
- A collection of arbitrary key/value pairs which can be used by the execution platform to tune the request, e.g. amount of memory or the number of CPUs.
The Docker image is not required for a backend but is highly recommended. The bioinformatics workflow field is rapidly moving towards a Docker model so one will find better support for their backend if it is built around using Docker containers.
The input/output file mappings will depend on the needs of the platform. In the simplest example these values would be identical, living on a shared filesystem. However an example of where this would be useful would be a model where an input file lives in a cloud bucket and is directly copied onto the machine running the command line followed by copying the program’s outputs back to a cloud bucket.
The underlying platform can be extremely simple or have arbitrary levels of complexity as long as one can map from the above concepts to running a command. That mapping could happen completely in the platform, the Cromwell backend, or a mixture of the two. The implementer of a backend will need to strike the balance which works best for both their needs and the particulars of the platform itself.
Throughout the rest of this document the following terms are used, the difference between them may be subtle but important:
- Workflow: a container of one or more calls, possibly expressing execution dependencies on each other. May contain scatters, conditional logic, or subworkflow invocations.
- Task: The abstract definition of a thing to run. Think of this like a function in a programming language.
- Call: An instantiated request in Cromwell of a thing to run. To further the function analogy this would be an invocation of that function.
- Job: The physical manifestation of a call by the backend. Examples of this would be an SGE job, a unix process, or a Google Life Sciences operation ID.
Within a running workflow, the backend is used in the following manner:
- Initialization: Initialization routine called the first time a workflow uses a backend
- Execute: The workflow requests that the backend run a job
- Recover: Attempt to reconnect to a previously started job. An example of this would be if Cromwell was restarted and wanted to reattach to currently running jobs.
- Abort: Request that a running job be halted
- Finalization: When a workflow is complete, allows for any workflow level cleanup to take place.
The initialization and finalization steps are optional and are provided for cases where a backend needs that behavior; not all backends will need these.
The implementations of both the recover and abort steps are up to the backend developer. For instance the recover function could be implemented to actually perform an execution and/or the abort function could be written to do nothing at all. Implementing these in a more robust manner is recommended for most cases but neither are universally appropriate.
How do I create a Backend?
This section is assuming that you both know the underlying execution platform you wish to use and that you know how to programmatically submit work to it.
The Cromwell engine uses backend-specific types extending Akka
Actor while processing a workflow:
BackendLifecycleActorFactoryThe entry point into the backend implementation specified in Cromwell configuration. e.g. a sample configuration for the Local backend
BackendWorkflowInitializationActorHandles the initialization phase
- Usually a derivative
StandardAsyncExecutionActorto run an individual job within the workflow
BackendWorkflowFinalizationActorHandles the finalization phase
These three actors are all represented by a trait which a backend developer would need to implement. Both the initialization and finalization actors are optional.
The following explanations of the traits mention multiple types defined in the Cromwell codebase. In particular
BackendJobDescriptor wraps all the information necessary for a backend to instantiate a job and
JobKey provides the
information to uniquely identify a job. It is recommended that one look in the Cromwell codebase for more information on
these and other types.
If a backend developer wishes to take advantage of the initialization phase of the backend lifecycle they must implement this trait. There are three functions which must be implemented:
abortInitialization: Unitspecifies what to do, if anything, when a workflow is requested to abort while a backend initialization is in progress.
validate: Future[Unit]is provided so that the backend can ensure that all of the calls it will handle conform to the rules of that backend. For instance, if a backend requires particular runtime attributes to exist.
beforeAll: Future[Option[BackendInitializationData]]is the actual initialization functionality. If a backend requires any work to be done prior to handling a call, that code must be called from here.
Nearly all production backend implementations in Cromwell extend the StandardAsyncExecutionActor trait. The minimum overrides for implementations of this trait are enumerated below, but overrides of other methods will also be required. There are several backend implementations in the Cromwell codebase that can serve as references, for example:
- Google Life Sciences / Pipelines API common layer
- GA4GH Task Execution Service (TES)
- AWS Batch
Overrides required for compilation and basic execution of
type StandardAsyncRunInfoencapsulates the type of the run info when a job is started.
type StandardAsyncRunStateencapsulates the type of the run status returned during each poll.
def statusEquivalentTo(thiz: StandardAsyncRunState)(that: StandardAsyncRunState): Booleanshould return true if the status contained in
thizis equivalent to
that, delta any other data that might be carried around in the state type
def standardParams: StandardAsyncExecutionActorParamsa standard set of parameters passed to the backend.
def isTerminal(runStatus: StandardAsyncRunState): BooleanReturns true when a job is complete, either successfully or unsuccessfully.
- At least one of:
def execute(): ExecutionHandleexecutes the job specified in the params
def executeAsync(): Future[ExecutionHandle]asynchronously executes the job specified in the params
- At least one of:
def pollStatus(handle: StandardAsyncPendingExecutionHandle): StandardAsyncRunStatereturns the run status for the job
def pollStatusAsync(handle: StandardAsyncPendingExecutionHandle): Future[StandardAsyncRunState]asynchronously returns the run status for the job
There is only one function to override for this trait if the backend developer chooses to use the finalization functionality:
afterAll: Future[Unit]is the hook to perform any desired functionality, and is called when the workflow is completed.