Our Solution
Pypline will replace the existing image processing pipeline with software that dynamically creates job-specific workflows for execution on the USGS processing cluster. A new UI presents options to the user based on mission-specific recipes, allowing greater flexibility in workflow creation. The job-generating script then takes the user's input and dynamically creates a workflow unique to that user's processing job. Submitted jobs and their execution are managed by a new workflow management tool deployed at USGS. The new image processing pipeline, seen in the diagram below, provides the flexibility needed by USGS and end users.
The heart of the new pipeline is its workflow management software and job scheduler. USGS selected Airflow to provide both of these services. In Airflow, a workflow is broken down into its individual processing steps and submitted as a Python script that represents the workflow as a directed acyclic graph (DAG). A DAG consists of individual tasks and the order in which they must be executed. Airflow allows these tasks to be built from predefined templates, so each ISIS command can be represented as a templated task and included within a DAG script. This allows a workflow to be generated dynamically for each image processing job. Additionally, the built-in Airflow UI displays submitted jobs in DAG form, making generated workflows easier to monitor and troubleshoot.
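The two ideas above, commands filled in from templates and tasks ordered by a DAG, can be illustrated without Airflow itself. The sketch below is hypothetical: `ISIS_TEMPLATES`, `render_task`, and `execution_order` are illustrative names, Python's `string.Template` stands in for Airflow's own task templating, and `graphlib.TopologicalSorter` computes an order that respects every dependency, which is the guarantee a scheduler provides when it runs a DAG.

```python
from graphlib import TopologicalSorter
from string import Template

# Hypothetical command templates for three ISIS applications used with MRO CTX
# images. ISIS apps take key=value arguments; $src/$dst are placeholders.
ISIS_TEMPLATES = {
    "mroctx2isis": Template("mroctx2isis from=$src to=$dst"),
    "spiceinit": Template("spiceinit from=$src"),
    "ctxcal": Template("ctxcal from=$src to=$dst"),
}


def render_task(tool: str, **params: str) -> str:
    """Fill in one command template; raises KeyError if a placeholder is unset."""
    return ISIS_TEMPLATES[tool].substitute(**params)


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order task ids so every task runs after the tasks it depends on.

    `dependencies` maps each task id to its upstream task ids, i.e. the edges
    of a DAG. graphlib raises CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    tasks = {
        "ingest": render_task("mroctx2isis", src="input.IMG", dst="raw.cub"),
        "spice": render_task("spiceinit", src="raw.cub"),
        "calibrate": render_task("ctxcal", src="raw.cub", dst="cal.cub"),
    }
    # Each task lists the tasks that must finish before it starts.
    for task_id in execution_order({"spice": {"ingest"}, "calibrate": {"spice"}}):
        print(task_id, "->", tasks[task_id])
```

Keeping the command text in templates means that adding or correcting an ISIS tool touches one template, not every workflow that uses it.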
Our new user interface uses the existing mission-specific processing recipes to generate a visualization of the workflow. The recipes contain the relevant mission data, including which ISIS tools must run first to convert the mission's image format to the ISIS cube format and which ISIS tools are available for that mission. From the UI, a user can choose which toolkit steps are included and set the parameters passed to each of them. The resulting workflow is then translated into a JSON object. Our workflow generation solution receives this JSON object and outputs an Airflow-compatible DAG, which is submitted to the Airflow scheduler for execution.
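A minimal sketch of the recipe → JSON → DAG path might look like the following. Everything here is an assumption for illustration: the recipe shape (`mission`, `required`, `available`), the JSON schema, the function names `build_job` and `json_to_dag_source`, and the choice to run the selected steps as a linear chain in recipe order. The emitted file assumes Airflow 2.4 or later (for the `schedule=` argument and the `airflow.operators.bash.BashOperator` import path).

```python
import json
import shlex


def build_job(recipe: dict, selected: list[str], params: dict[str, dict[str, str]]) -> str:
    """Return the JSON job description for one user's processing request.

    The recipe's required (ingestion) steps always run first; optional tools
    are kept in recipe order and included only if the user selected them.
    """
    unknown = sorted(set(selected) - set(recipe["available"]))
    if unknown:
        raise ValueError(f"tools not offered for {recipe['mission']}: {unknown}")
    tools = recipe["required"] + [t for t in recipe["available"] if t in selected]
    steps = [{"tool": t, "params": params.get(t, {})} for t in tools]
    return json.dumps({"mission": recipe["mission"], "steps": steps})


def json_to_dag_source(job_json: str, dag_id: str) -> str:
    """Emit the text of an Airflow DAG file that runs each step in sequence."""
    job = json.loads(job_json)
    lines = [
        "from datetime import datetime",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        f"with DAG(dag_id={dag_id!r}, start_date=datetime(2024, 1, 1),",
        "         schedule=None, catchup=False) as dag:",
    ]
    task_vars = []
    for i, step in enumerate(job["steps"]):
        # ISIS applications take key=value arguments; quote values for the shell.
        args = " ".join(f"{k}={shlex.quote(str(v))}" for k, v in step["params"].items())
        command = f"{step['tool']} {args}".strip()
        task_id = f"{i:02d}_{step['tool']}"
        var = f"t{i}"
        lines.append(f"    {var} = BashOperator(task_id={task_id!r}, bash_command={command!r})")
        task_vars.append(var)
    if not task_vars:
        lines.append("    pass")  # keep the generated file valid Python
    elif len(task_vars) > 1:
        # Linear chain: each step waits for the one before it.
        lines.append("    " + " >> ".join(task_vars))
    return "\n".join(lines) + "\n"
```

For a CTX-style recipe with `required=["mroctx2isis", "spiceinit"]`, selecting `ctxcal` yields three chained `BashOperator` tasks. Generating source text, instead of building the DAG object in memory, means each job leaves behind a readable file that can be inspected or re-run on its own.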
Finally, all components of the solution are containerized to make maintenance easier after the project is complete. Containerization allows applications to be deployed and run in isolation through OS-level virtualization, without the overhead of running an entire virtual machine. The replacement pipeline rectifies issues seen in the current solution with these key features:
Dynamic Workflows
The flexibility issues in the current pipeline are solved by eliminating prebuilt workflow scripts in favor of dynamically generated workflows. This allows the full menu of relevant image processing options to be presented to the user. It also gives end users the ability to stop a workflow at any step of what used to be a pre-generated workflow, or to remove individual steps from it. Overall, the new pipeline gives users more ability to tailor the output images to their needs.
Easier Maintenance
The new pipeline separates out the parts that may require maintenance, making the pipeline easier to keep up. Mission-specific details are contained within the instrument recipes, and each ISIS tool is implemented as a templated Airflow task. This isolation supports maintaining existing mission data as well as adding future missions over the lifetime of the new pipeline.
Additional Image Output Options
The new UI also lets the user specify an output location for the images. For users outside USGS, the images are always packaged for download (as in the current process). Internally, USGS employees are given the option to specify a folder path on the server where the finalized output is placed. This avoids wasting bandwidth downloading images only to place them back on USGS servers.
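The output rule above reduces to a small decision, sketched here under assumptions: `OutputTarget` and `resolve_output` are hypothetical names, an internal user who gives no path is assumed to fall back to a packaged download, and rejecting relative paths is an assumed sanity check not stated in the design.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class OutputTarget:
    """Where a finished job's images go (hypothetical structure)."""
    mode: str  # "download" (packaged for the user) or "server" (folder on USGS servers)
    path: Optional[PurePosixPath] = None


def resolve_output(is_usgs_user: bool, requested_path: Optional[str] = None) -> OutputTarget:
    """Pick the output destination for a job.

    External users always get a packaged download, as in the current process.
    Internal users may name a folder on the server; without one they also get
    a download (assumed fallback). Only absolute paths are accepted (assumed check).
    """
    if not is_usgs_user or not requested_path:
        return OutputTarget("download")
    path = PurePosixPath(requested_path)
    if not path.is_absolute():
        raise ValueError(f"server output path must be absolute: {requested_path}")
    return OutputTarget("server", path)
```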
The new image processing pipeline adds the flexibility to generate dynamic workflows and provides a better experience for end users of the POW website. The focus on dynamic job creation also makes future modifications to the pipeline and its ongoing maintenance easier.