Our Solution

Pypline will replace the existing image processing pipeline with software that dynamically creates job-specific workflows for execution on the USGS processing cluster. A new UI that presents options to the user based on mission-specific recipes allows for greater flexibility in workflow creation. The user's input is then taken by the job-generating script to dynamically create a workflow unique to that user's processing job. Management of submitted jobs and their execution will be performed by a new workflow management tool deployed at USGS. The new image processing pipeline, seen in the diagram below, provides the flexibility needed by USGS and end users.

The heart of the new pipeline is its workflow management software and job scheduler. Airflow was selected by USGS to provide both of these services. Airflow breaks workflows down into individual processing steps, submitted as Python scripts that represent workflows as directed acyclic graphs (DAGs). A DAG is composed of individual tasks and the order in which they must be executed. Airflow allows these tasks to be built from pre-generated templates, so each individual ISIS command can be represented as a templated task within a DAG script. This allows for dynamic job generation unique to each image processing job. Additionally, the included Airflow UI allows submitted jobs to be viewed in DAG format, providing easier monitoring and troubleshooting of generated workflows.
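As a rough illustration of the idea, the sketch below expresses two ISIS commands as templated Airflow tasks inside a DAG. The DAG id, file paths, and the choice of mroctx2isis and spiceinit as example commands are illustrative assumptions, not the project's actual recipes or DAG code:

```python
# A minimal sketch of ISIS commands as templated Airflow tasks.
# All names and paths here are hypothetical stand-ins.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_isis_job",       # hypothetical job-specific DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,          # run only when a user submits a job
    catchup=False,
) as dag:
    # Convert the mission image format to an ISIS cube.
    ingest = BashOperator(
        task_id="ingest",
        bash_command="mroctx2isis from={{ params.img }} to={{ params.cube }}",
        params={"img": "/data/raw.img", "cube": "/data/image.cub"},
    )

    # Attach SPICE geometry data to the cube before further processing.
    spice = BashOperator(
        task_id="spiceinit",
        bash_command="spiceinit from={{ params.cube }}",
        params={"cube": "/data/image.cub"},
    )

    # The DAG encodes the order of execution: ingest runs before spiceinit.
    ingest >> spice
```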

Our new user interface utilizes the existing mission-specific processing recipes to generate a visualization of the workflow. The recipes contain relevant mission-specific data, including which initial ISIS tools must be used to convert from the mission image format to the ISIS cube format and which ISIS tools are available for that mission. From the UI, a user is able to manipulate which toolkit steps are included and the parameters passed to those steps. That workflow is then translated into a JSON object for use in generating a DAG that is submitted to the Airflow scheduler. DAG generation is handled by our workflow generation solution, which receives the JSON object and outputs an Airflow-compatible DAG ready for execution.
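A sketch of this translation step is shown below. The JSON shape (a job id plus an ordered list of steps) and all command strings are assumptions made for illustration, not the schema the UI actually emits:

```python
# A minimal sketch of turning a UI-produced JSON workflow into an Airflow
# DAG. The JSON schema and ISIS commands are illustrative assumptions.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

workflow_json = """
{
  "job_id": "pow_job_42",
  "steps": [
    {"name": "ingest",    "command": "mroctx2isis from=raw.img to=image.cub"},
    {"name": "spiceinit", "command": "spiceinit from=image.cub"},
    {"name": "cal",       "command": "ctxcal from=image.cub to=image.cal.cub"}
  ]
}
"""

spec = json.loads(workflow_json)

with DAG(
    dag_id=spec["job_id"],
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    previous = None
    for step in spec["steps"]:
        task = BashOperator(task_id=step["name"], bash_command=step["command"])
        # Chain each step after the previous one, preserving recipe order.
        if previous is not None:
            previous >> task
        previous = task
```

Because the DAG is built by iterating over the JSON object rather than being hand-written, each submitted job can produce a workflow unique to that user's selections.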

Finally, all components of the final solution are containerized to allow for easier maintenance after project completion. Application containerization allows for deployment and isolated execution of applications through OS-level virtualization, without the overhead of running an entire virtual machine. The replacement pipeline rectifies issues seen in the current solution through these key features: dynamic, recipe-driven workflow generation; Airflow-based job scheduling and monitoring; a UI that lets users tailor toolkit steps and parameters; and containerized deployment of every component.

The new image processing pipeline adds the flexibility to generate dynamic workflows and provides a better experience for POW website end users. The focus on dynamic job creation simplifies future modifications to the pipeline and its ongoing maintenance.