{"_id":"564246059f4ed50d008be1af","parentDoc":null,"category":{"_id":"5601aee850ee460d0002224c","__v":20,"project":"55faeacad0e22017005b8265","version":"55faeacad0e22017005b8268","pages":["56023786930fe1170074bd2c","561d53a09463520d00cd11ef","561d546d31d9630d001eb5d1","561d54af31d9630d001eb5d3","561d54e56386060d00e0601e","561d554d9463520d00cd11f2","564246059f4ed50d008be1af","5643712a0d9748190079defb","564372751ecf381700343c1e","5643742008894c0d00031ed3","5643747a0d9748190079df01","564375c988f3a60d00ac86b0","56437d0f0d9748190079df13","56437e83f49bfa0d002f560a","56437f7d0d9748190079df15","5643810508894c0d00031ef5","5643826f88f3a60d00ac86cb","564382de88f3a60d00ac86ce","56e07ba14685db1700d94873","56e08c9b903c7a29001d5352"],"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2015-09-22T19:41:28.703Z","from_sync":false,"order":8,"slug":"tasks-and-workflows-guide","title":"Tasks and Workflows Guide"},"version":{"_id":"55faeacad0e22017005b8268","project":"55faeacad0e22017005b8265","__v":33,"createdAt":"2015-09-17T16:31:06.800Z","releaseDate":"2015-09-17T16:31:06.800Z","categories":["55faeacbd0e22017005b8269","55faf550764f50210095078e","55faf5b5626c341700fd9e96","55faf8a7825d5f19001fa386","560052f91503430d007cc88f","560054f73aa0520d00da0b1a","56005aaf6932a00d00ba7c62","56005c273aa0520d00da0b3f","5601ae7681a9670d006d164d","5601ae926811d00d00ceb487","5601aeb064866b1900f4768d","5601aee850ee460d0002224c","5601afa02499c119000faf19","5601afd381a9670d006d1652","561d4c78281aec0d00eb27b6","561d588d8ca8b90d00210219","563a5f934cc3621900ac278c","5665c5763889610d0008a29e","566710a36819320d000c2e93","56ddf6df8a5ae10e008e3926","56e1c96b2506700e00de6e83","56e1ccc4e416450e00b9e48c","56e1ccdfe63f910e00e59870","56e1cd10bc46be0e002af26a","56e1cd21e416450e00b9e48e","56e3139a51857d0e008e77be","573b4f62ef164e2900a2b881","57c9d1335fd8ca0e006308ed","57e2bd9d1e7b7220000d7fa5","57f2b992ac30911900c7c2b6","58adb5c275df0f1b001ed59b","58c81b5c6dc7140f003c3c46","595412446ed4d9001b3e7b37"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"v1","version_clean":"1.0.0","version":"1"},"__v":54,"project":"55faeacad0e22017005b8265","user":"55fae9d4825d5f19001fa379","updates":["57364fcfda06991700865d90"],"next":{"pages":[],"description":""},"createdAt":"2015-11-10T19:31:17.558Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":0,"body":"A \"workflow\" is a series of tasks chained together to run on the GBDX platform. Each \"task\" is an individual process that performs a specific action. In order to run as part of a workflow, a task must be registered in the workflow system's task registry.\n\nTasks are run from Docker containers that are available via Docker Hub. User inputs must be validated against the system's task and workflow JSON schemas. 
When a workflow is run, the system generates events that indicate its status.

For more information on how tasks integrate into the workflow system, read our [Task and Workflow Course](doc:task-and-workflow-course).

# Workflow Resources

The Workflow API allows users to:
* register a task in the workflow system's task registry
* manage tasks in the task registry
* create a new workflow by submitting a workflow definition
* see the status of a workflow
* search for workflows by status, time range, or owner.

Resource | Description
--- | ---
Workflows | A series of tasks chained together to complete an action
Tasks | Atomic activities that perform a given action. They have one or more inputs and can have one or more outputs.
Workflow Events | Status events generated by the system. The events for a task include submission, starting, running, and completing.
Schemas | JSON schemas used to validate user inputs. The two schemas are the task definition schema and the workflow schema.

# Registering a Task on GBDX

If you are an algorithm producer, you will need to register your task on GBDX. This requires you to "Dockerize" your task and submit a JSON task definition to the GBDX Workflow system.

To learn how to create and Dockerize a task to run on GBDX, see <a href="http://gbdxstories.digitalglobe.com/create-task">How to Create a Task</a>.

## Add "tdgpdeploy" as a Collaborator

Before you register your task in the GBDX task registry (see below), you'll need to add "tdgpdeploy" as a collaborator in Docker Hub. Depending on the type of Docker Hub account you have, you'll add it either as a direct collaborator or as a member of your read-only access collaborator team.

This example shows how to add "tdgpdeploy" as a direct collaborator. Type the name in the "username" box and click "add user". It will appear in the user list with "Collaborator" access.

![Task collaborator](https://files.readme.io/58bb0f8-Task_collaborator.png)

If your Docker Hub account level includes organizations, refer to the <a href="https://docs.docker.com/docker-hub/orgs/">Docker Hub documentation</a> to learn how to add tdgpdeploy to your read-only access team.

## Register the Task in the GBDX Task Registry

Tasks must be registered with the workflow system's task registry before they can be added to a workflow. If you are using a task that's already been registered on the GBDX platform, you can skip this step.

To register a task, create the task definition JSON and send it as the body of a POST request to the /workflows/v1/tasks endpoint.
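As a rough sketch of that request using Python's `requests` library (the token value is a placeholder, and the abbreviated task definition mirrors the "test-success" example later in this section):

```python
import requests

GBDX_TOKEN = "<your GBDX access token>"  # placeholder; obtain via the GBDX auth flow

# Abbreviated version of the "test-success" definition shown below.
task_definition = {
    "name": "test-success",
    "version": "0.0.1",
    "description": "Runs a no-op task that writes successful output status.",
    "properties": {"isPublic": False, "timeout": 7200},
    "inputPortDescriptors": [{"name": "inputstring", "type": "string", "required": True}],
    "outputPortDescriptors": [{"name": "dependency_output", "type": "string"}],
    "containerDescriptors": [
        {"type": "DOCKER", "command": "", "properties": {"image": "tdgp/test-success"}}
    ],
}

# Send the definition as the body of a POST to the task registry endpoint.
response = requests.post(
    "https://geobigdata.io/workflows/v1/tasks",
    json=task_definition,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
response.raise_for_status()
```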
### Rules for Tasks
* Tasks require a name.
* Tasks require a version number. When registering a new version of a task, the version number must be incremented.
* When a new task is registered, the "isPublic" property must be set to "false" or left out of the task definition. Only a superuser can set a task to "public".
* If a new version of a public task is registered, the owner can set the new version to public.
* Tasks require at least one input port and one output port.
* Task ports require a name and a type.
* Tasks require one container. Only Docker is currently supported.

All tasks must be defined and registered in the GBDX Task Registry.

### Create the Task Definition

The task definition has four main sections:

1. Name and Properties
2. Input Port Descriptors
3. Output Port Descriptors
4. Container Descriptors

#### Example Task Definition

```json
{
	"name": "test-success",
	"description": "Runs a no-op task that writes successful output status.",
	"version": "0.0.1",
	"properties": {
		"isPublic": false,
		"timeout": 7200,
		"authorizationRequired": true
	},
	"inputPortDescriptors": [{
			"required": true,
			"description": "A string input.",
			"name": "inputstring",
			"type": "string"
		},
		{
			"name": "dependency_input",
			"type": "string"
		}
	],
	"outputPortDescriptors": [{
		"name": "dependency_output",
		"type": "string"
	}],
	"containerDescriptors": [{
		"type": "DOCKER",
		"command": "",
		"properties": {
			"image": "tdgp/test-success",
			"mounts": [{
				"local": "$task_data_dir",
				"container": "/mnt/work",
				"read_only": false
			}]
		}
	}]
}
```

#### Name, Version, Description, and Properties

Element | Description
--- | ---
Name | The name you define for the task
Version | The version number for the task. New tasks default to 0.0.1 if the version is not defined. See [How to Version a Task](doc:how-to-version-a-task)
Description | A brief description of the task
Properties | See below

The properties section includes the following:

Property | Description
--- | ---
isPublic | Boolean value that determines whether a task is private or public.
timeout | The task times out if it does not complete within the set time frame. The timeout value is in seconds. See below for the minimum, default, and maximum values allowed.
authorizationRequired | Boolean value. If authorizationRequired is true, the task captures the requestor's GBDX token and makes GBDX requests on their behalf. If impersonation_allowed is true for the task in the workflow definition, authorizationRequired must also be true. See [Task User Impersonation](#task-user-impersonation) for details.

Note: The properties section is not required. If the properties are not included as part of the task definition, the default values are used. See below for default values. However, if the properties section is part of your task definition, the "timeout" property is required and a value must be set.

### Private and Public Tasks

These are the business rules for the "isPublic" flag:

* New tasks must be registered as private tasks.
* Only a member of the GBDX team at DigitalGlobe can change a new task from "private" to "public".
* If a new version of a public task is published, the task owner can set the new version to public.
* The task owner can change a task from "public" to "private".

When a new task is registered on GBDX, it must be set to "private". "Private" means only users from the account that registered the task can run it.

#### Definitions

Value | Description
--- | ---
Public | Any GBDX user can access or use this task, regardless of what account they're associated with.
Private | Only users associated with the account the task is registered under can access or use this task. A "private" task cannot be shared between multiple accounts.

To register a private task, set ```"isPublic": false``` under "properties" in the JSON task definition.

If the properties section is not included in the task definition, the task will default to ```"isPublic": false```.

#### Flag Settings

Setting | Result
--- | ---
"isPublic": true | The task will be public.
"isPublic": false | The task will be private.
"isPublic": [no value] | The default value of "false" will be used. The task will be private.
"isPublic" not included in the task definition | The default value of "false" will be used. The task will be private.

#### Process

Before a task can be set to public, a review process is required. Setting a task to "public" is part of the algorithm submission process. The task is set to public by the GBDX team once it has been reviewed.

#### Error Conditions

Submitting a task definition for a new task with ```"isPublic": true``` will result in the following error, and the task will not be registered:

Status code: 403, Message: 'Creating a public task is unauthorized.'

### Define a Timeout Threshold for a Task

By default, tasks time out after 7200 seconds (2 hours).

The default timeout value can be changed in the properties section of the task definition JSON by setting a value in seconds for "timeout": [value].

Type | Value
--- | ---
Default | 7200 (2 hours)
Minimum | 0
Maximum | 172800 (48 hours)

Important: If the "properties" section is included in the task definition, the "timeout" property is required and a value must be set.

### Updating the Task Definition

All fields in the task definition except Name and Version can be updated in a PUT request. A PUT request does not require the version to be incremented, and it does not trigger a task Docker migration.

Make a ```PUT``` request to the /workflows/v1/tasks endpoint. Include the full task definition, including the changes.
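Continuing the registration sketch above, a non-authoritative sketch of such an update (only the payload and HTTP verb change):

```python
import requests

# Change any field except "name" and "version", then resend the full definition.
updated_definition = dict(task_definition, description="An updated description.")

response = requests.put(
    "https://geobigdata.io/workflows/v1/tasks",
    json=updated_definition,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
response.raise_for_status()
```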
#### Input Port Descriptors

Use this section to define the input ports for the task. Input ports have:

Element | Value
--- | ---
Name | The input port name
Type | The input port type (string, directory)
Description | A short description of the input type for this port
Required | Boolean

#### Output Port Descriptors

Use this section to define the output ports for the task.

Element | Value
--- | ---
Name | The output port name
Type | The output port type (string, directory)
Description | A short description of the output type for this port

> **Important:** Symbolic links should not be written as task output.

#### Container Descriptors

Use this section to define the container in which the task is run. Only Docker containers are supported on GBDX.

Element | Value
--- | ---
Type | The domain on which the task is run
Image | The name of the Docker image that is pulled from Docker Hub
Command | The command to run within the container

To see tasks that are already registered, run [List Tasks](doc:list-tasks-in-thetask-registry).

To see the task descriptor schema, run [Task Definition Schema](doc:get-the-task-definition-schema).

# Search for a Workflow

You can search for workflows using the following criteria.

Filter | Description
--- | ---
Search by State | The state of the workflow, for example submitted, failed, timedout, or succeeded. You can only search by one state at a time.
Search by Time Range | The lookback time (last N hours) is based on the time the workflow search query is run. For example, if a query is run at 14:00 and the value is set to 2, the workflows that started at 12:00 or later will be listed. The default value is 24.
Search by Owner | Returns all workflows that have the specified owner.

# Run a Workflow

To run a workflow, make a POST request to https://geobigdata.io/workflows/v1/workflows with the workflow definition in the request body. The request lists each task to be run in the workflow, in order from top to bottom, and includes a workflow name plus the task name, task type, and inputs and outputs for each task.

To see the workflow definition schema, see [Workflow Definition Schema](doc:get-the-workflow-schema).
To see an example workflow definition as a POST request body, see [Submit a Workflow](doc:submit-a-workflow).
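As a minimal sketch of that request (reusing the token setup from the registration example; the bucket locations are placeholders, and the workflow body mirrors the AOP-to-S3 example later in this guide):

```python
import requests

workflow_definition = {
    "name": "AOP_stagetoS3",
    "tasks": [
        {
            "name": "AOP",
            "taskType": "AOP_Strip_Processor",
            "inputs": [{"name": "data", "value": "<bucket location>"}],
            "outputs": [{"name": "data"}, {"name": "log"}],
        },
        {
            "name": "StagetoS3_Data",
            "taskType": "StageDataToS3",
            "inputs": [
                {"name": "data", "source": "AOP:data"},  # chained from the AOP task's output
                {"name": "destination", "value": "<bucket location>"},
            ],
        },
    ],
}

response = requests.post(
    "https://geobigdata.io/workflows/v1/workflows",
    json=workflow_definition,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
response.raise_for_status()
# Assumption: the response body carries the new workflow's ID.
print(response.json())
```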
# Retrieve STDOUT output from a completed task in a workflow

GET to /workflows/v1/workflows/<workflow_id>/tasks/<task_id>/stdout

Response: 200 OK
The response body contains the STDOUT from the Docker container that ran the task. When there is no data, the request returns "empty."

# Retrieve STDERR output from a completed task in a workflow

GET to /workflows/v1/workflows/<workflow_id>/tasks/<task_id>/stderr

Response: 200 OK
The response body contains the STDERR from the Docker container that ran the task. When there is no data, the request returns "empty."
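A sketch of both retrievals using the endpoints above (token setup as in the earlier examples; the IDs are placeholders):

```python
import requests

BASE = "https://geobigdata.io/workflows/v1"
headers = {"Authorization": "Bearer " + GBDX_TOKEN}
workflow_id = "<workflow_id>"  # placeholder
task_id = "<task_id>"          # placeholder

stdout = requests.get(f"{BASE}/workflows/{workflow_id}/tasks/{task_id}/stdout", headers=headers)
stderr = requests.get(f"{BASE}/workflows/{workflow_id}/tasks/{task_id}/stderr", headers=headers)
print(stdout.text)  # "empty" when the task wrote nothing to STDOUT
print(stderr.text)  # "empty" when the task wrote nothing to STDERR
```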
# Task User Impersonation

Tasks can perform GBDX operations on behalf of a user. For example, a task can make a catalog query, place an order, or use any other GBDX API. By default, however, a task runs in a workflow without this authorization.

For a task to perform GBDX operations on behalf of the user, both the task definition and the workflow definition must allow it.

To allow user impersonation:

* In the task definition, "authorizationRequired" must be set to true (false is the default if the value is not set).
* In the workflow definition, "impersonation_allowed" must be set to true (false is the default if the value is not set).

When impersonation is allowed, the user token is passed as part of the runtime information and is available in the user_token field of gbdx_runtime.json.
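Inside the task's container, picking up that token might look like the sketch below. The file location is an assumption based on the /mnt/work mount used in the container descriptors in this guide:

```python
import json

# Assumption: gbdx_runtime.json lands in the task's mounted work directory.
with open("/mnt/work/gbdx_runtime.json") as f:
    runtime = json.load(f)

user_token = runtime["user_token"]  # present when impersonation is allowed
headers = {"Authorization": "Bearer " + user_token}
# ...use `headers` to call GBDX APIs on the requesting user's behalf...
```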
## Task Definition

Name | Description | Type | Default
--- | --- | --- | ---
authorizationRequired | Indicates that the task logic requires GBDX authorization from the running user. If this is true and the 'impersonation_allowed' flag is not set on the task during workflow invocation, the workflow will fail. | boolean | false

Although the task descriptor schema has changed, tasks that are already registered are not updated with the authorizationRequired flag. These tasks, and any new tasks registered without the flag, default to "false."

To add the flag or change the value for an existing task, the task must be re-registered.

This example shows a task descriptor with "authorizationRequired" = true.

```json
{
 "inputPortDescriptors": [
   {
     "name": "dummy",
     "type": "string"
   }
 ],
 "containerDescriptors": [
   {
     "type": "DOCKER",
     "command": "",
     "properties": {
       "image": "dummy"
     }
   }
 ],
 "description": "Test task",
 "name": "test-task-with-required-authorization",
 "properties": {
   "timeout": 7200,
   "authorizationRequired": true
 }
}
```

## Workflow Definition

Name | Description | Type | Default
--- | --- | --- | ---
impersonation_allowed | When set to true, the task can use the user's security token; impersonation is allowed. | boolean | false

When multiple tasks are defined in a single workflow, the tasks can have different values and the workflow will still run successfully. This example shows a workflow definition with two tasks: one allows impersonation and one does not.

```json
{
    "name": "AOP_stagetoS3",
    "tasks": [
        {
            "name": "AOP",
            "outputs": [
                {
                    "name": "data"
                },
                {
                    "name": "log"
                }
            ],
            "inputs": [
                {
                    "name": "data",
                    "value": "<bucket location>"
                },
                {
                    "name": "enable_acomp",
                    "value": "true"
                }
            ],
            "taskType": "AOP_Strip_Processor",
            "impersonation_allowed": false
        },
        {
            "name": "StagetoS3_Data",
            "inputs": [
                {
                    "name": "data",
                    "source": "AOP:data"
                },
                {
                    "name": "destination",
                    "value": "<bucket location>"
                }
            ],
            "taskType": "StageDataToS3",
            "impersonation_allowed": true
        }
    ]
}
```

## Scenarios

Number | Scenario | Result
--- | --- | ---
Scenario #1 | Task Definition "authorizationRequired" = true; Workflow Definition task impersonation_allowed = true | Task runs successfully
Scenario #2 | Task Definition "authorizationRequired" = false; Workflow Definition task impersonation_allowed = false | Task runs successfully
Scenario #3 | Task Definition "authorizationRequired" = true; Workflow Definition task impersonation_allowed = false | Workflow is rejected with an error that the task requires impersonation
Scenario #4 | Task Definition "authorizationRequired" = false; Workflow Definition task impersonation_allowed = true | Task runs successfully

# Batch Workflows

This example is from a batch workflow named "batch_test." It shows the batch value list for the input port named "input_data."

```json
{
 "name": "batch_test",
 "batch_values": [
     {
         "name": "input_data",
         "values": [
             "CAT1",
             "CAT2",
             "CAT3"
         ]
     }
```

This is the **old** method for defining a batch value for a port descriptor. With this breaking change, this format will now cause an error.

```json
"inputs": [
      {
        "name": "data",
        "value": "$batch_value:input_data"
      }
```

This is the **new** method for defining a batch value for a port descriptor:

```json
"inputs": [
      {
        "name": "data",
        "value": "{{input_data}}"
      }
```

## Batch workflow overview

With batch workflows, you can submit multiple workflows with one API request. A batch workflow request spawns multiple workflows that run the same tasks on multiple inputs concurrently.

In a batch workflow request, a single workflow definition is created, but multiple input values are allowed for an input port. Each input value is processed as a separate workflow, and each instance has its own workflow ID.

In a batch workflow definition, output ports can also have batch values. In that case, the batch value list for the output port descriptor must have the same number of values as the input ports for that task.

## Batch Workflow API

The API endpoint for submitting batch workflows is

    /workflows/v1/batch_workflows

This endpoint does not replace workflows/v1/workflows, which is still used to submit a single workflow request. The batch workflow endpoint accepts multiple values for task inputs in one submission.
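A sketch of a batch submission (token setup as before; the payload is a trimmed version of the full request example below):

```python
import requests

batch_workflow = {
    "name": "batch_test",
    "batch_values": [
        {"name": "input_data", "values": ["CAT1", "CAT2", "CAT3"]},
    ],
    "tasks": [
        {
            "name": "task_1",
            "taskType": "test_task",
            "inputs": [{"name": "data", "value": "{{input_data}}"}],  # new-style batch reference
        }
    ],
}

response = requests.post(
    "https://geobigdata.io/workflows/v1/batch_workflows",
    json=batch_workflow,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
response.raise_for_status()
```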
## Batch workflow definition

In the batch workflow definition:

* the batch workflow is given a name
* the batch value lists are defined for each input or output
* the tasks are defined
* input ports and output ports are defined to accept batch values if applicable.

### Business Rules

1. The number of values in the batch value lists determines the number of workflows that will be started by the batch workflow request.

2. All inputs and outputs that accept batch values must have the same number of values in their lists. If not, the batch workflow will error out.

3. The batch value names in the batch workflow definition must be identical to the batch value names referenced in the task definitions.

For example, if the batch value list in the workflow definition is named "input_data", the batch value name referenced on the input port must also be "input_data."

This batch value list is named "input_data."

```json
"batch_values": [
     {
         "name": "input_data",
         "values": [
             "CAT1",
             "CAT2",
             "CAT3"
         ]
     },
```

The list above corresponds to the value for the input port named "data." The value in double curly brackets is also named "input_data."

```json
"tasks": [
     {
         "name": "task_1",
         "outputs": [
             {
                 "name": "data",
                 "persist": true,
                 "persistLocation": "{{destination}}"
             }
         ],
         "inputs": [
             {
                 "name": "data",
                 "value": "{{input_data}}"
             }
```

## Batch values

The batch workflow definition includes a section called "batch_values." In the example below, each list has three values, so three workflows will be spawned and run in parallel.

Batch value lists must have the same number of values. The first value in the first input, the first value in the second input, and the first value of the third input together create the first workflow.
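Conceptually, the spawned workflows pair the batch values position by position, like a zip over the lists. A small plain-Python illustration (independent of the API):

```python
input_data = ["CAT1", "CAT2", "CAT3"]
input_dem = ["SRTM90", "SRTM120", "SRTM150"]
destination = ["result/1", "result/2", "result/3"]

# Position n across all lists supplies the values for spawned workflow n.
for n, (data, dem, dest) in enumerate(zip(input_data, input_dem, destination), start=1):
    print(f"workflow {n}: data={data}, dem={dem}, destination={dest}")
```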
## Batch workflow request example

This is an example of a batch workflow request. In this example, two input ports and one output port accept batch values. Each batch value list has three values.

```json
{
   "name": "batch_test",
   "batch_values": [
       {
           "name": "input_data",
           "values": [
               "CAT1",
               "CAT2",
               "CAT3"
           ]
       },
       {
           "name": "input_dem",
           "values": [
               "SRTM90",
               "SRTM120",
               "SRTM150"
           ]
       },
       {
           "name": "destination",
           "values": [
               "result/1",
               "result/2",
               "result/3"
           ]
       }
   ],
   "tasks": [
       {
           "name": "task_1",
           "outputs": [
               {
                   "name": "data",
                   "persist": true,
                   "persistLocation": "{{destination}}"
               }
           ],
           "inputs": [
               {
                   "name": "data",
                   "value": "{{input_data}}"
               },
               {
                   "name": "demspecifier",
                   "value": "{{input_dem}}"
               }
           ],
           "taskType": "test_task"
       }
   ]
}
```

## Rate Limits

Batch workflows can run up to 100 workflows concurrently. This number may vary depending on your account limits. If too many inputs are submitted in a single batch workflow request, the system will return a "too many workflows launched" error message.

## Checking the batch workflow's status

Checking a batch workflow's status returns a list of all workflows in the batch and their individual states. Although there is a batch workflow ID, there is no batch workflow state.

To check the states of the workflows in a batch workflow, see [Get a batch workflow](doc:get-a-batch-workflow).
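As a sketch of such a status check (the exact path is an assumption extrapolated from the batch_workflows endpoint above; see [Get a batch workflow](doc:get-a-batch-workflow) for the authoritative request):

```python
import requests

batch_workflow_id = "<batch_workflow_id>"  # placeholder

# Assumed path: the batch_workflows endpoint plus the batch workflow ID.
response = requests.get(
    f"https://geobigdata.io/workflows/v1/batch_workflows/{batch_workflow_id}",
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
response.raise_for_status()
print(response.json())  # expected to list each spawned workflow and its individual state
```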
## Canceling a Batch Workflow

Submitting a request to cancel a batch workflow cancels all individual workflows in the batch that have not already completed. For more about canceling a batch workflow, see [Cancel a batch workflow](doc:cancel-a-batch-workflow).

To cancel a single workflow in a batch, use the workflows endpoint to submit the request. See [Cancel a Workflow](doc:cancel-a-workflow).

## Error messaging

If the value names in the workflow and task sections don't match, or if there is a batch value list in the workflow definition but no corresponding batch value, the system will return the following error:

"batch value names do not match in the workflow JSON"

If the number of items in the input value lists doesn't match, the system will return the following error:

"invalid number of parameters. batch values must be consistent across all workflow."

# Multiplex Ports

Multiplex ports can be used when a task has a variable number of inputs and outputs. A port is defined as "multiplex" in the task definition. When a workflow runs the task, multiple inputs can then be supplied for that port.

## Create a multiplex port in a task

Multiplex ports are set in the task definition. To define a task input port as a multiplex port:
1. Give the input port a name.
2. Add "multiplex": true to the inputPortDescriptor.

For example, in the task "DGLayers", the input port named "SRC" can accept more than one input because it was defined as a multiplex port. The number of inputs that will be accepted is determined when the task is run as part of a workflow.

When the DGLayers task was defined, this port was given a single name, "SRC", and "multiplex": true was included in the inputPortDescriptor.

```json
"inputPortDescriptors": [
  {
    "name": "SRC",
    "required": true,
    "type": "directory",
    "description": "S3 path containing input layers.",
    "multiplex": true
  },
  {
    "name": "recipe_filename",
    "required": true,
    "type": "string"
  },
```

Output ports can also be set up as multiplex if the number of output files will vary.

In the DGLayers example, the output port "DST" is a multiplex port.

```json
"outputPortDescriptors": [
    {
      "required": true,
      "type": "directory",
      "multiplex": true,
      "name": "DST",
      "description": "Output directory."
    },
```

When the task is run as part of a workflow, the multiple values for the multiplex port are defined. This example shows the naming convention used for the values:

```json
"inputs": [
{
    "name": "SRC_1",
    "value": "val_1"
},
{
    "name": "SRC_2",
    "value": "val_2"
}
```

### gbdxtools example

If you're running <a href="https://github.com/digitalglobe/gbdxtools" target="_blank">**gbdxtools**</a>, use this example. Multiplex ports are supported like normal ports when a workflow is run.

```python
dgl_task = gbdx.Task("DGLayers_v_2_0", SRC_1="val_1", SRC_2="val_2", SRC_3="val_3", recipe_dir=recipe_dir, recipe_filename=recipe_filename)
save_dgl_task = gbdx.Task("StageDataToS3", data=dgl_task.outputs.DST.value, destination=out_classmap_loc)
workflow = gbdx.Workflow([dgl_task, save_dgl_task])
```

Note: Name/source pairs are supported in addition to name/value pairs.

# Workflow Library Tasks

A workflow library task contains a workflow definition. Its purpose is to let a user run the same series of tasks multiple times without having to recreate the workflow.

For example, let's say you want to run the following tasks in this order:
1. AOP_Strip_Processor
2. protogenV2PANTEX10
3. StageDataToS3

To run these tasks, you create a workflow. The workflow defines which tasks are run, the input values and other optional values for those tasks, and the order they're run in.

Now suppose you want to run this same set of tasks again, and you'll likely run them many times in the future. This is where the workflow library task is useful: you create a task that includes the workflow definition, then run a workflow that runs only the workflow library task.
## Rules for Workflow Library Tasks

* Workflow library tasks require a name.
* Workflow library tasks require at least one input port.
* Workflow library task ports require a name and a type.
* Workflow library tasks require a taskSequenceDescriptor.
* The taskSequenceDescriptor has the same requirements as a workflow, with the exception of an outputMapping object.
* Values are mapped from the inputPortDescriptors to the taskSequenceDescriptor via "$ref".

## Register a Workflow Library Task

A workflow library task is registered by posting the task definition to the Tasks endpoint:

    POST to /workflows/v1/tasks

```json
{
    "name": "test_sequence",
    "description": "Runs a sequence of tasks.",
    "inputPortDescriptors": [
        {
            "name": "input_1",
            "description": "String input 1.",
            "required": true,
            "type": "string"
        },
        {
            "name": "input_2",
            "description": "String input 2.",
            "type": "string"
        }
    ],
    "outputPortDescriptors": [
        {
            "name": "output_1",
            "type": "string"
        }
    ],
    "taskSequenceDescriptor": {
        "name": "test_name",
        "tasks": [
            {
                "name": "Task_1",
                "taskType": "test-success",
                "inputs": [
                    {
                        "name": "inputstring",
                        "value": "$ref:input_1"
                    }
                ],
                "outputs": [
                    {
                        "name": "dependency_output"
                    }
                ]
            },
            {
                "name": "Task_2",
                "taskType": "test-success",
                "inputs": [
                    {
                        "name": "inputstring",
                        "source": "Task_1:dependency_output"
                    },
                    {
                        "name": "dependency_input",
                        "value": "$ref:input_2"
                    }
                ],
                "outputs": [
                    {
                        "name": "dependency_output"
                    }
                ]
            }
        ],
        "outputMapping": [
            {
                "name": "output_1",
                "source": "Task_2:dependency_output"
            }
        ]
    }
}
```

Response:

    test_sequence successfully registered.

## Run a workflow library task in a workflow

This is a sample workflow that runs the workflow library task registered above.
```json
{
    "name": "Sample workflow with task sequence.",
    "tasks": [
        {
            "name": "Task_Sequence_1",
            "taskType": "test_sequence",
            "inputs": [
                {
                    "name": "input_1",
                    "value": "test123"
                }
            ],
            "outputs": [
                {
                    "name": "output_1"
                }
            ]
        }
    ]
}
```

# Auto Ordering Task

The auto ordering task places an order for a catalog ID. If the catalog ID has already been delivered to the catalog, the task returns its S3 location. If the catalog ID needs to be ordered, the task places the order and waits for it to be delivered. Once the order has been delivered, the task returns the S3 location.

**GBDX Registered Task Name**: Auto_Ordering
**Input**: a single catalog ID
**Output**: S3 location for the catalog ID

## Task Definition

This is the task definition for the auto ordering task.

```json
{
    "inputPortDescriptors": [
        {
            "required": true,
            "description": "Catalog Id",
            "name": "cat_id",
            "type": "string"
        }
    ],
    "outputPortDescriptors": [
        {
            "name": "s3_location",
            "type": "string"
        }
    ],
    "containerDescriptors": [
        {
            "type": "DOCKER",
            "command": "",
            "properties": {
                "image": "tdgp/auto_ordering",
                "mounts": [
                    {
                        "local": "$task_data_dir",
                        "container": "/mnt/work",
                        "read_only": false
                    }
                ]
            }
        }
    ],
    "description": "GBDX Auto Ordering Task",
    "name": "Auto_Ordering",
    "properties": {
        "authorizationRequired": true,
        "timeout": 36000,
        "isPublic": true
    }
}
```

## How to Use the Auto Ordering Task

Submit the following workflow to the workflows/v1/workflows endpoint with a valid catalog ID:

```json
{
    "name": "Auto_Ordering_Workflow",
    "tasks": [
        {
            "outputs": [
                {
                    "name": "s3_location"
                }
            ],
            "name": "auto_ordering",
            "taskType": "Auto_Ordering",
            "impersonation_allowed": true,
            "inputs": [
                {
                    "name": "data",
                    "value": "<string>"
                }
            ]
        }
    ]
}
```

## Rules for the Auto Ordering Task

The Auto Ordering task submits an order using your GBDX credentials to the ordering endpoint `orders/v2/ordercb`. The "ordercb" endpoint returns a callback when the order is delivered.

- If the order has already been delivered, the task returns the S3 location immediately.

- If the order has not been delivered, the task waits for the order to be fulfilled.

- Once the order is delivered, a callback is sent.
  When the callback is received, the Auto Ordering task resumes and returns the S3 location.

- When the Auto Ordering task is in the "waiting" state, it returns the order ID as a "note." You can use this order ID to query the Orders API for status. See [Get Order Status v2](doc:get-order-status-v2).

- The S3 location returned from the Auto Ordering task can be piped into another task by using the output port `s3_location`.

## Workflow Events for the Auto Ordering Task

The following are workflow events for the Auto Ordering task:

```json
{
    "Events": [
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:15:42.795775+00:00",
            "when": "20 minutes ago",
            "note": "",
            "state": "pending",
            "event": "submitted"
        },
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:16:45.145101+00:00",
            "when": "19 minutes ago",
            "note": "instance_id: i-86f35cb7",
            "state": "pending",
            "event": "scheduled"
        },
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:16:45.225323+00:00",
            "when": "19 minutes ago",
            "note": "instance_id: i-14933a25, domain: default",
            "state": "running",
            "event": "started"
        },
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:17:40.404544+00:00",
            "when": "18 minutes ago",
            "note": "instance_id: i-14933a25,  Note: Waiting for Ordering System to Complete Order. OrderId: b77224ec-3015-4460-89f9-17ddaba9a6bc",
            "state": "pending",
            "event": "waiting"
        },
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:33:57.300772+00:00",
            "when": "2 minutes ago",
            "note": "instance_id: i-86f35cb7",
            "state": "pending",
            "event": "scheduled"
        },
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:33:57.417425+00:00",
            "when": "2 minutes ago",
            "note": "instance_id: i-959039a4, domain: default",
            "state": "running",
            "event": "started"
        },
        {
            "task": "auto_ordering",
            "timestamp": "2016-09-01T19:35:23.740966+00:00",
            "when": "seconds ago",
            "note": "instance_id: i-959039a4,  Note: Waiting for Ordering System to Complete Order. OrderId: b77224ec-3015-4460-89f9-17ddaba9a6bc, Order has been Completed, Location: s3://receiving-dgcs-tdgplatform-com/055567707010_01_003",
            "state": "complete",
            "event": "succeeded"
        }
    ]
}
```

## Workflow and Task Callbacks

Workflows and tasks within a workflow can have a URL callback defined. The callback is an HTTP POST to the URL defined in the task or workflow, and it includes a small JSON packet describing the event that triggered the callback.
Workflow callbacks are set as part of the workflow definition.

Task callbacks are set only at the "run" time of a task, not at definition time. That means you include the callback in the task definition within the workflow, not in the task definition in the task registry.

### Task Callbacks

To set a callback on a task, add the following to the workflow JSON that is passed when starting a workflow with a set of tasks:

```json
{
  "name": "AOP_Strip_Processor_protogenV2LULC",
  "callback": "http://www.somehost.tld/some/url",
  "tasks": [
    {
      "name": "AOP",
      "outputs": [
        {
          "name": "log"
        }
      ],
      "inputs": [
        {
          "name": "data",
          "value": "<string>"
        }
      ],
      "taskType": "AOP_Strip_Processor",
      "callback": "http://www.somehost.tld/someother/url"
    }
  ]
}
```

Note that while the example shows two different URL endpoints, that is not a requirement. However, the data that is passed to the endpoint differs depending on whether the callback is from a task or from a workflow. The message that is sent when a task changes state is below; the same message structure is used for all event states.

```json
{
    "userName": "testuser",
    "payload": {
      "taskDomain": "default",
      "instanceSize": "LOCAL",
      "workflowName": "AOP_Strip_Processor_protogenV2LULC",
      "instanceId": "LOCAL",
      "taskState": "succeeded",
      "workflowId": "4517305590090108113",
      "workflowState": "running",
      "taskId": "4517305590044912472",
      "taskType": "AOP_Strip_Processor",
      "taskName": "Automated LULC"
    },
    "entity": "task",
    "environment": "alpha",
    "source": "worker",
    "action": "state changed",
    "properties": {
    },
    "accountId": "<YOUR ACCOUNT ID>"
}
```

A workflow event message is similar but has fewer fields:

```json
{
    "message": {
      "userName": "testuser",
      "payload": {
        "workflowId": "4517315362701210938",
        "workflowName": "AOP_Strip_Processor_protogenV2LULC",
        "workflowState": "succeeded"
      },
      "entity": "workflow",
      "environment": "LOCAL",
      "source": "decider",
      "action": "state changed",
      "properties": {
      },
      "accountId": "<YOUR ACCOUNT ID>"
    }
}
```

### Additional Information

The URL that is called with the JSON message can be any **publicly** accessible URL. For example, you won't be able to test with "localhost" or anything else that can't be accessed from the GBDX environment.

The callback manager waits five seconds to connect to the URL endpoint and an additional five seconds for the endpoint to respond. The endpoint must respond to the callback with an HTTP "200" code. If it responds with anything else, or if the call times out, the system tries again in 30 seconds; it tries 5 times before giving up.
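For testing, a minimal publicly reachable receiver only needs to accept the POST and answer 200 within those limits. A sketch using only the Python standard library:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(self.rfile.read(length).decode("utf-8", "replace"))  # the JSON event message
        self.send_response(200)  # anything other than 200 triggers a retry
        self.end_headers()

# Must be reachable from the GBDX environment (a localhost-only server won't receive callbacks).
HTTPServer(("0.0.0.0", 8080), CallbackHandler).serve_forever()
```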
The current configuration accepts any SSL certificate, so you can use a self-signed certificate for the callback URL.

Workflow API Course

Overview of the the GBDX Workflow system

A "workflow" is a series of tasks chained together to run on the GBDX platform. Each "task" is an individual process that performs a specific action. In order to run as part of a workflow, a task must be registered in the workflow system's task registry. Tasks are run from Docker containers that are available via Docker Hub. User inputs must be validated against the system's task and workflow JSON schemas. When a workflow is run, the system generates events that indicate status. For more information on how tasks integrate into the workflow system, read our [Task and Workflow Course](doc:task-and-workflow-course) #Workflow Resources The Workflow API allows users to: * register a task in the workflow system's task registry * manage tasks in the task registry * create a new workflow by submitting a workflow definition * see the status of a workflow * search for workflows by status, time range, or owner. [block:parameters] { "data": { "h-0": "Resource", "h-1": "Description", "0-0": "Workflows", "0-1": "A series of tasks chained together to complete an action", "1-0": "Tasks", "1-1": "Atomic activities that perform a given action. They have one or more inputs and can have one or more outputs.", "2-0": "Workflow Events", "2-1": "System status events generated by the system, the events for a task include submission, starting, running and completing", "3-0": "Schemas", "3-1": "JSON schemas used to validate user inputs. The two schemas are the task definition schema and the workflow schema." }, "cols": 2, "rows": 4 } [/block] [block:api-header] { "type": "basic", "title": "Registering a Task on GBDX" } [/block] If you are an algorithm producer, you will need to register your task on GBDX. This requires you to "Dockerize" your task and submit a JSON task definition to the GBDX Workflow system. ## To learn how to create and Dockerize a task to run on GBDX, see <a href="http://gbdxstories.digitalglobe.com/create-task">How to Create a Task</a> ## Add "tdgpdeploy" as a Collaborator Before you register your task in the GBDX task registry (see below), you'll need to add "tdgpdeploy" as a collaborator in Docker Hub. Depending on the type of Docker Hub account you have, you'll either add it as a direct collaborator or as a member of your read-only access collaborator team. This example shows how to add "tdgpdeploy" as a direct collaborator. Type the name in the "username" box and click "add user". It will appear in the user list with "Collaborator" access. [block:image] { "images": [ { "image": [ "https://files.readme.io/58bb0f8-Task_collaborator.png", "Task collaborator.png", 1308, 275, "#e7ecf4" ] } ] } [/block] If your Docker Hub account level includes organizations, refer to the <a href="https://docs.docker.com/docker-hub/orgs/">Docker Hub documentation</a> to learn how to add tdgpdeploy to your readonly access team. ## Register the Task in the GBDX Task Registry Tasks must be registered with the workflows task registry before they can be added to a workflow. If you are using a task that's already been registered on the GBDX platform, you can skip this step. To register a task, the definition JSON is created and and sent as a Body POST to the /workflows/v1/tasks endpoint. ### Rules for Tasks * Tasks require a name * Tasks require a version number. When registering a new version of a task, the version number must be incremented. * When a new task is registered, the "isPublic" task must be set to "false" or not included in the task definition. Only a superuser can set a task to "public". 
*If a new version of a public task is registered, the owner can set the new version to public. * Tasks require at least one input port and one output port. * Task ports require a name and a type * Tasks require one container. Only Docker is currently supported. All tasks must be defined and registered in the GBDX Task Registry. The Task definition JSON file includes the following sections: ### Create the Task Definition The task definition has four main sections: 1. Name and Properties 2. Input Port Descriptors 3. Output Port Descriptors 4. Container Descriptors #### Example Task Definition [block:code] { "codes": [ { "code": "{\n\t\"name\": \"test-success\",\n\t\"description\": \"Runs a no-op task that writes successful output status.\",\n\t\"version\": \"0.0.1\",\n\t\"properties\": {\n\t\t\"isPublic\": false,\n\t\t\"timeout\": 7200,\n \"authorizationRequired\": true\n\t},\n\t\"inputPortDescriptors\": [{\n\t\t\t\"required\": true,\n\t\t\t\"description\": \"A string input.\",\n\t\t\t\"name\": \"inputstring\",\n\t\t\t\"type\": \"string\"\n\t\t},\n\t\t{\n\t\t\t\"name\": \"dependency_input\",\n\t\t\t\"type\": \"string\"\n\t\t}\n\t],\n\t\"outputPortDescriptors\": [{\n\t\t\"name\": \"dependency_output\",\n\t\t\"type\": \"string\"\n\t}],\n\t\"containerDescriptors\": [{\n\t\t\"type\": \"DOCKER\",\n\t\t\"command\": \"\",\n\t\t\"properties\": {\n\t\t\t\"image\": \"tdgp/test-success\",\n\t\t\t\"mounts\": [{\n\t\t\t\t\"local\": \"$task_data_dir\",\n\t\t\t\t\"container\": \"/mnt/work\",\n\t\t\t\t\"read_only\": false\n\t\t\t}]\n\t\t}\n\t}]\n\n}", "language": "json" } ] } [/block] #### Name, Version, Description, and Properties Element | Description --- | --- Name | The name you define for the task Version | The version number for the task. New tasks default to 0.0.1 if the version is not defined. See [How to Version a Task](doc:how-to-version-a-task) Description | A brief description of the task Properties | See below The properties section includes the following: Property | Description --- | --- isPublic | Boolean value that determines whether a task is private or public. timeout | The task will time out if not completed within the set time frame. The timeout value is in seconds. See below for minimum, default, and maximum values allowed. authorizationRequired | Boolean value. If authorizationRequired is True, the task will capture the requestor's GBDX token and make GBDX requests on their behalf. If the task has impersionation_allowed is True in the Workflow definition, authorizationRequired is required to be True. See [Task User Impersonation](#task-user-impersonation) for details. Note: The properties section is not required. If the properties are not included as part of the task definition the default values will be used. See below for default values. However, if the properties section is part of your task definition, the "timeout" property is required and a value must be set. Example: ### Private and Public tasks These are the business rules for the "isPublic" flag: * New tasks must be registered as private tasks. * Only a member of the GBDX team at DigitalGlobe can change a new task from "private" to "public". *If a new version of a public task is published, the task owner can set the new version to public. *The task owner can change a task from "public" to "private". When a new task is registered on GBDX, it must be set to "private". "Private" means only users from the account that registered the task can run it. 
####Definitions Value | Description --- | --- Public | Any GBDX user can access or use this task, regardless of what account they're associated with. Private | Only users associated with the account the task is registered under can access or use this task. A "private" task cannot be shared between multiple accounts. To register a private task: In the JSON task definition, under "properties", set ```"isPublic": false``` If the properties section is not included in the task definition, the task will default to ```"isPublic": false```. #### Flag Settings Setting | Result --- | --- "isPublic": true | The task will be public "isPublic": false | the task will be private "isPublic": [no value] | The default value of "false" will be used. The task will be private. 'isPublic" not included in the task definition | The default value of "false" will be used. The task will be private. ####Process Before a task can be set to public, a review process is required. Setting a task to "public" is part of the algorithm submission process. The task is set to public by the GBDX team once it has been reviewed. ####Error Conditions Submitting a task definition for a new task with ```isPublic":true``` will result in the following error, and the task will not be registered. Status code:403, Message: 'Creating a public task is unauthorized.' ###Define a Time-out threshold for a Task By default, tasks time out after 7200 seconds (2 hours). The default time out value can be changed in the properties section of the task definition jSON. You can do this by setting a value in seconds for "timeout": [value]. Type | Value --- | --- Default | 7200 (2 hours) Minimum | 0 Maximum | 172800 (48 hours) Important: If the "properties" section is included in the task definition, the "timeout" property is required and a value must be set. ### Updating the Task Definition All fields in the task definition except Name and Version can be updated in a PUT request. A PUT request does not require the version to be incremented, and it does not trigger a task Docker migration. Make a ```PUT``` request to the /workflows/v1/tasks endpoint. Include the full task definition, including the changes. #### Input Port Descriptors Use this section to define the task input ports for this task. Input ports have: Element | Value --- | --- Name | The input port name Type | input port type (string, directory) Description | A short description of the input type for this port required | boolean #### Output Port Descriptors Use this section to define the task output ports for this task. Element | Value --- | --- Name | The output port name Type | output port type (string, directory) Description | A short description of the output type for this port [block:callout] { "type": "info", "title": "Important: Symbolic links should not be written as task output." } [/block] #### Container Descriptors Use this section to define the container in which the task is run. Only Docker containers are supported on GBDX. Element | Value --- | --- Type | The domain on which the task is run Image | The name of the Docker image that is pulled from Docker Hub Command | The command to run within the container To see tasks that are already registered, run [List Tasks](doc:list-tasks-in-thetask-registry) To see the task descriptor schema, run [Task Definition Schema](doc:get-the-task-definition-schema) [block:api-header] { "type": "basic", "title": "Search for a Workflow" } [/block] You can search for workflows using the following criteria. 
#### Input Port Descriptors
Use this section to define the input ports for this task. Input ports have:

Element | Value
--- | ---
Name | The input port name
Type | The input port type (string, directory)
Description | A short description of the input type for this port
required | Boolean value that indicates whether the port is required

#### Output Port Descriptors
Use this section to define the output ports for this task.

Element | Value
--- | ---
Name | The output port name
Type | The output port type (string, directory)
Description | A short description of the output type for this port

[block:callout]
{
  "type": "info",
  "title": "Important: Symbolic links should not be written as task output."
}
[/block]
#### Container Descriptors
Use this section to define the container in which the task is run. Only Docker containers are supported on GBDX.

Element | Value
--- | ---
Type | The domain on which the task is run
Image | The name of the Docker image that is pulled from Docker Hub
Command | The command to run within the container

To see tasks that are already registered, run [List Tasks](doc:list-tasks-in-thetask-registry).

To see the task descriptor schema, run [Task Definition Schema](doc:get-the-task-definition-schema).

[block:api-header]
{
  "type": "basic",
  "title": "Search for a Workflow"
}
[/block]
You can search for workflows using the following criteria.

__Filter__ | __Description__
--- | ---
Search by State | The state of the workflow, for example "submitted", "failed", "timedout", or "succeeded". You can only search by one state at a time.
Search by Time Range | The lookback time (last N hours) is based on the time the workflow search query is run. For example, if a query is run at 14:00 and the value is set to 2, workflows that started at 12:00 or later will be listed. The default value is 24.
Search by Owner | Searching by workflow owner returns all workflows that list the specified owner.

[block:api-header]
{
  "type": "basic",
  "title": "Run a Workflow"
}
[/block]
To run a workflow, make a POST request to https://geobigdata.io/workflows/v1/workflows with the workflow definition in the request body. The request lists each task to be run in the workflow, in the order it should run, from top to bottom. The request includes a workflow name, and the task name, task type, and inputs and outputs for each task.

To see the workflow definition schema, see [Workflow Definition Schema](doc:get-the-workflow-schema). To see an example workflow definition as a POST request body, see [Submit a Workflow](doc:submit-a-workflow).
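As a concrete illustration, the sketch below submits a small single-task workflow definition and prints the response. The task name and input value are placeholders taken from the earlier examples, and the token and header format are assumptions, as in the previous sketches.

```python
import requests

GBDX_TOKEN = "<your GBDX token>"  # placeholder credential

# A minimal workflow definition: one registered task, run as-is.
workflow_definition = {
    "name": "test_workflow",
    "tasks": [
        {
            "name": "Task_1",
            "taskType": "test-success",  # the registered task name
            "inputs": [{"name": "inputstring", "value": "hello"}],
            "outputs": [{"name": "dependency_output"}],
        }
    ],
}

# Submit the workflow definition in the request body.
response = requests.post(
    "https://geobigdata.io/workflows/v1/workflows",
    json=workflow_definition,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
print(response.status_code, response.text)
```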
[block:api-header]
{
  "type": "basic",
  "title": "Retrieve STDOUT output from a completed task in a workflow"
}
[/block]
GET to /workflows/v1/workflows/<workflow_id>/tasks/<task_id>/stdout

Response: 200 OK. The response body contains the STDOUT from the Docker container that ran the task. When there is no data, the request will return "empty."

[block:api-header]
{
  "type": "basic",
  "title": "Retrieve STDERR output from a completed task in a workflow"
}
[/block]
GET to /workflows/v1/workflows/<workflow_id>/tasks/<task_id>/stderr

Response: 200 OK. The response body contains the STDERR from the Docker container that ran the task. When there is no data, the request will return "empty."

[block:api-header]
{
  "type": "basic",
  "title": "Task User Impersonation"
}
[/block]
Tasks can perform GBDX operations on behalf of a user. For example, a task can make a catalog query, place an order, or use any other GBDX API. By default, a task runs in a workflow without any GBDX authorization. For a task to perform GBDX operations on behalf of the user, both the task definition and the workflow definition must allow it.

To allow user impersonation:
* In the task definition, "authorizationRequired" must be set to true (false is the default if the value is not set).
* In the workflow definition, "impersonation_allowed" must be set to true (false is the default if the value is not set).

When impersonation is allowed, the user token is passed as part of the runtime information and is available in the "user_token" field of gbdx_runtime.json.

## Task Definition

Name | Description | Type | Default
--- | --- | --- | ---
authorizationRequired | Indicates that the task logic requires GBDX authorization from the running user. If this is true and the "impersonation_allowed" flag is not set on the task during workflow invocation, the workflow will fail. | boolean | false

While the task descriptor schema has changed, tasks that are already registered are not updated with the authorizationRequired flag. These tasks, and any new tasks registered without the flag, will default to "false." To add the flag or change the value for an existing task, the task must be re-registered.

This example shows a task descriptor with "authorizationRequired" = true.

[block:code]
{
  "codes": [
    {
      "code": "{\n \"inputPortDescriptors\": [\n {\n \"name\": \"dummy\",\n \"type\": \"string\"\n }\n ],\n \"containerDescriptors\": [\n {\n \"type\": \"DOCKER\",\n \"command\": \"\",\n \"properties\": {\n \"image\": \"dummy\"\n }\n }\n ],\n \"description\": \"Test task\",\n \"name\": \"test-task-with-required-authorization\",\n \"properties\": {\n \"timeout\": 7200,\n \"authorizationRequired\": true\n }\n}",
      "language": "json"
    }
  ]
}
[/block]
## Workflow Definition

Name | Description | Type | Default
--- | --- | --- | ---
impersonation_allowed | When set to true, the task can use the user's security token. Impersonation is allowed. | boolean | false

When there are multiple tasks defined in a single workflow, the tasks can have different values and the workflow will run successfully. This example shows a workflow definition with two tasks. One allows impersonation and one does not.

[block:code]
{
  "codes": [
    {
      "code": "{\n \"name\": \"AOP_stagetoS3\",\n \"tasks\": [\n {\n \"name\": \"AOP\",\n \"outputs\": [\n {\n \"name\": \"data\"\n },\n {\n \"name\": \"log\"\n }\n ],\n \"inputs\": [\n {\n \"name\": \"data\",\n \"value\": \"<bucket location>\"\n },\n {\n \"name\": \"enable_acomp\",\n \"value\": \"true\"\n }\n ],\n \"taskType\": \"AOP_Strip_Processor\",\n \"impersonation_allowed\": false\n },\n {\n \"name\": \"StagetoS3_Data\",\n \"inputs\": [\n {\n \"name\": \"data\",\n \"source\": \"AOP:data\"\n },\n {\n \"name\": \"destination\",\n \"value\": \"<bucket location>\"\n }\n ],\n \"taskType\": \"StageDataToS3\",\n \"impersonation_allowed\": true\n }\n ]\n}\n",
      "language": "json"
    }
  ]
}
[/block]
## Scenarios

Number | Scenario | Result
--- | --- | ---
Scenario #1 | Task definition "authorizationRequired" = true; workflow definition task "impersonation_allowed" = true | Task runs successfully
Scenario #2 | Task definition "authorizationRequired" = false; workflow definition task "impersonation_allowed" = false | Task runs successfully
Scenario #3 | Task definition "authorizationRequired" = true; workflow definition task "impersonation_allowed" = false | Workflow is rejected with an error that the task requires impersonation
Scenario #4 | Task definition "authorizationRequired" = false; workflow definition task "impersonation_allowed" = true | Task runs successfully

# Batch Workflows
This example is from a batch workflow named "batch_test." This is the batch value list for the input port named "input_data."

[block:code]
{
  "codes": [
    {
      "code": " {\n \"name\": \"batch_test\",\n \"batch_values\": [\n {\n \"name\": \"input_data\",\n \"values\": [\n \"CAT1\",\n \"CAT2\",\n \"CAT3\"\n ]\n }",
      "language": "json"
    }
  ]
}
[/block]
This is the **old** method for defining a batch value for a port descriptor. With this breaking change, this format will cause an error.

[block:code]
{
  "codes": [
    {
      "code": " \"inputs\": [\n {\n \"name\": \"data\",\n \"value\": \"$batch_value:input_data\"\n }",
      "language": "json"
    }
  ]
}
[/block]
This is the **new** method for defining a batch value for a port descriptor:

[block:code]
{
  "codes": [
    {
      "code": " \"inputs\": [\n {\n \"name\": \"data\",\n \"value\": \"{{input_data}}\"\n }",
      "language": "json"
    }
  ]
}
[/block]
## Batch workflow overview
With batch workflows, you can submit multiple workflows with one API request. A batch workflow request spawns multiple workflows that run the same tasks on multiple inputs concurrently. In a batch workflow request, a single workflow definition is created, but multiple input values are allowed for an input port. Each input value is processed as a separate workflow, and each spawned workflow has its own workflow ID.

In a batch workflow definition, output ports can also have batch values. When this is the case, the batch value list for the output port descriptor must have the same number of values as the input ports for that task.
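To make the expansion concrete, the sketch below shows, in plain Python and entirely outside the API, how one workflow definition plus a batch value list yields one workflow per value: each value is substituted into the corresponding ```{{...}}``` placeholder. This illustrates the semantics only; it is not how the platform implements batch workflows.

```python
import json

# One workflow definition with a {{input_data}} placeholder (new-style syntax).
template = {
    "name": "batch_test",
    "tasks": [
        {
            "name": "task_1",
            "taskType": "test_task",
            "inputs": [{"name": "data", "value": "{{input_data}}"}],
        }
    ],
}

batch_values = [{"name": "input_data", "values": ["CAT1", "CAT2", "CAT3"]}]

# Each value in the batch list yields one spawned workflow: substitute
# the placeholder named "input_data" with the i-th value.
for i, value in enumerate(batch_values[0]["values"]):
    spawned = json.loads(json.dumps(template).replace("{{input_data}}", value))
    print("workflow", i + 1, "->", spawned["tasks"][0]["inputs"][0]["value"])
```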
## Batch Workflow API
The API endpoint for submitting batch workflows is /workflows/v1/batch_workflows.

This endpoint does not replace workflows/v1/workflows; use that endpoint to submit a single workflow request. The batch workflow endpoint, workflows/v1/batch_workflows, accepts multiple values for task inputs in one submission.

## Batch workflow definition
In the batch workflow definition:
* the batch workflow is given a name
* the batch value lists are defined for each input or output
* the tasks are defined
* input ports and output ports are defined to accept batch values, if applicable.

### Business Rules
1. The number of values in the batch value lists determines the number of workflows that will be started by the batch workflow request.
2. All inputs and outputs that accept batch values must have the same number of values in their lists. If not, the batch workflow will fail with an error.
3. The batch value names in the batch workflow definition must be identical to the batch value names in the task definition. For example, if the name of the batch value list in the workflow definition is "input_data", the batch value name defined on the input port must also be "input_data."

This batch value list is named "input_data."

[block:code]
{
  "codes": [
    {
      "code": " \"batch_values\": [\n {\n \"name\": \"input_data\",\n \"values\": [\n \"CAT1\",\n \"CAT2\",\n \"CAT3\"\n ]\n },",
      "language": "json"
    }
  ]
}
[/block]
The list above corresponds to the value for the input port named "data." The value in double curly braces is also named "input_data."

[block:code]
{
  "codes": [
    {
      "code": " \"tasks\": [\n {\n \"name\": \"task_1\",\n \"outputs\": [\n {\n \"name\": \"data\",\n \"persist\": true,\n \"persistLocation\": \"{{destination}}\"\n }\n ],\n \"inputs\": [\n {\n \"name\": \"data\",\n \"value\": \"{{input_data}}\"\n }",
      "language": "json"
    }
  ]
}
[/block]
## Batch values
The batch workflow definition includes a section called "batch_values." In the example below, each list has three values. As a result, three workflows will be spawned and run in parallel. Batch value lists must have the same number of values. The first value in the first input, the first value in the second input, and the first value of the third input together create the first workflow.
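A batch submission uses the same request pattern as a single workflow, just against the batch endpoint. The sketch below POSTs a small batch workflow definition with one batch value list; the token and header format are placeholders, and a fuller request body is shown in the example that follows.

```python
import requests

GBDX_TOKEN = "<your GBDX token>"  # placeholder credential

batch_workflow = {
    "name": "batch_test",
    "batch_values": [
        {"name": "input_data", "values": ["CAT1", "CAT2", "CAT3"]}
    ],
    "tasks": [
        {
            "name": "task_1",
            "taskType": "test_task",
            "inputs": [{"name": "data", "value": "{{input_data}}"}],
            "outputs": [{"name": "data"}],
        }
    ],
}

# Three values in the batch value list -> three workflows are spawned.
response = requests.post(
    "https://geobigdata.io/workflows/v1/batch_workflows",
    json=batch_workflow,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
print(response.status_code, response.text)
```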
## Batch workflow request example
This is an example of a batch workflow request. In this example, there are two input ports and one output port that accept batch values. Each batch value list has three values.

[block:code]
{
  "codes": [
    {
      "code": "{\n \"name\": \"batch_test\",\n \"batch_values\": [\n {\n \"name\": \"input_data\",\n \"values\": [\n \"CAT1\",\n \"CAT2\",\n \"CAT3\"\n ]\n },\n {\n \"name\": \"input_dem\",\n \"values\": [\n \"SRTM90\",\n \"SRTM120\",\n \"SRTM150\"\n ]\n },\n {\n \"name\": \"destination\",\n \"values\": [\n \"result/1\",\n \"result/2\",\n \"result/3\"\n ]\n }\n ],\n \"tasks\": [\n {\n \"name\": \"task_1\",\n \"outputs\": [\n {\n \"name\": \"data\",\n \"persist\": true,\n \"persistLocation\": \"{{destination}}\"\n }\n ],\n \"inputs\": [\n {\n \"name\": \"data\",\n \"value\": \"{{input_data}}\"\n },\n {\n \"name\": \"demspecifier\",\n \"value\": \"{{input_dem}}\"\n }\n ],\n \"taskType\": \"test_task\"\n }\n ]\n }",
      "language": "json"
    }
  ]
}
[/block]
## Rate Limits
Batch workflows can run up to 100 workflows concurrently. This number may vary, depending on your account limits. If too many inputs are submitted in a single batch workflow request, the system will return a "too many workflows launched" error message.

## Checking the batch workflow's status
Checking a batch workflow's status will return a list of all workflows in the batch and their individual states. Although there is a batch workflow ID, there is no batch workflow state. To check the states of the workflows in a batch workflow, see [Get a batch workflow](doc:get-a-batch-workflow).

## Canceling a Batch Workflow
Submitting a request to cancel a batch workflow will cancel all individual workflows in the batch that have not already completed. For more about canceling a batch workflow, see [Cancel a batch workflow](doc:cancel-a-batch-workflow).

To cancel a single workflow in a batch, use the workflows endpoint to submit the request. See [Cancel a Workflow](doc:cancel-a-workflow).

## Error messaging
If the value names in the workflow and task sections don't match, or if there is a batch value list in the workflow definition but no corresponding batch value, the system will return the following error: "batch value names do not match in the workflow JSON"

If the number of items in the input value lists doesn't match, the system will return the following error: "invalid number of parameters. batch values must be consistent across all workflow."

# Multiplex Ports
Multiplex ports can be used when a task has a variable number of inputs and outputs. A port is defined as "multiplex" in the task definition. Then, when a workflow runs the task, multiple inputs can be defined for that port.

## Create a multiplex port in a task
Multiplex ports are set in the task definition. To define a task input port as a multiplex port:

1. Give the input port a name.
2. Add "multiplex": true to the inputPortDescriptors.

For example, in the task "DGLayers", the input port named "SRC" can accept more than one input because it was defined as a multiplex port. The number of inputs that will be accepted is determined when the task is run as part of a workflow. When the DGLayers task was defined, this port was given a single name, "SRC", and "multiplex": true was included in the inputPortDescriptors.

[block:code]
{
  "codes": [
    {
      "code": "\"inputPortDescriptors\": [\n    {\n      \"name\": \"SRC\",\n      \"required\": true,\n      \"type\": \"directory\",\n      \"description\": \"S3 path containing input layers.\",\n      \"multiplex\": true\n    },\n    {\n      \"name\": \"recipe_filename\",\n      \"required\": true,\n      \"type\": \"string\"\n    },\n",
      "language": "json"
    }
  ]
}
[/block]
Output ports can also be set up as multiplex if the number of output files will vary.
In the DGLayers example, the output port "DST" is a multiplex port.

[block:code]
{
  "codes": [
    {
      "code": "\"outputPortDescriptors\": [\n    {\n      \"required\": true,\n      \"type\": \"directory\",\n      \"multiplex\": true,\n      \"name\": \"DST\",\n      \"description\": \"Output directory.\"\n    },",
      "language": "json"
    }
  ]
}
[/block]
When the task is run as part of a workflow, the multiple values for the multiplex port are defined. This example shows the convention used to describe the values.

[block:code]
{
  "codes": [
    {
      "code": "\"inputs\": [\n{\n \"name\": \"SRC_1\",\n \"value\": \"val_1\"\n},\n{\n \"name\": \"SRC_2\",\n \"value\": \"val_2\"\n}",
      "language": "json"
    }
  ]
}
[/block]
### gbdxtools example
If you're running <a href="https://github.com/digitalglobe/gbdxtools" target="_blank">**gbdxtools**</a>, use this example. Multiplex ports are supported like normal ports when a workflow is run.

[block:code]
{
  "codes": [
    {
      "code": "dgl_task = gbdx.Task(\"DGLayers_v_2_0\", SRC_1=\"val_1\", SRC_2=\"val_2\", SRC_3=\"val_3\", recipe_dir=recipe_dir, recipe_filename=recipe_filename)\nsave_dgl_task = gbdx.Task(\"StageDataToS3\",data=dgl_task.outputs.DST.value,destination=out_classmap_loc)\nworkflow = gbdx.Workflow([ dgl_task,save_dgl_task ])",
      "language": "python"
    }
  ]
}
[/block]
Note: Name/source pairs are supported in addition to name/value pairs.

# Workflow Library Tasks
A workflow library task contains a workflow definition. The purpose of the workflow library task is to let a user run the same series of tasks multiple times without having to recreate the workflow.

For example, let's say you want to run the following tasks in this order:

1. AOP_Strip_Processor
2. protogenV2PANTEX10
3. StageDataToS3

To run these tasks, you create a workflow. The workflow defines which tasks are run, the input values and other optional values for those tasks, and the order they're run in. Now you want to run this same set of tasks again, and you'll likely run them many times in the future. This is where the workflow library task is useful. You'll create a task that includes that workflow definition in it. Then you'll run a workflow that runs only the workflow library task.

## Rules for Workflow Library Tasks
* Workflow library tasks require a name.
* Workflow library tasks require at least one input port.
* Workflow library task ports require a name and a type.
* Workflow library tasks require a taskSequenceDescriptor.
* The taskSequenceDescriptor has the same requirements as a workflow, with the exception of an outputMapping object.
* Values are mapped from the inputPortDescriptors to the taskSequenceDescriptor via "$ref".

## Register a Workflow Library Task
A workflow library task is registered by posting the task definition to the tasks endpoint.
POST to /workflows/v1/tasks

[block:code]
{
  "codes": [
    {
      "code": "{\n \"name\": \"test_sequence\",\n \"description\": \"Runs a sequence of task.\",\n \"inputPortDescriptors\": [\n {\n \"name\": \"input_1\",\n \"description\": \"String input 1.\",\n \"required\": true,\n \"type\": \"string\"\n },\n {\n \"name\": \"input_2\",\n \"description\": \"String input 2.\",\n \"type\": \"string\"\n }\n ],\n \"outputPortDescriptors\": [\n {\n \"name\": \"output_1\",\n \"type\": \"string\"\n }\n ],\n \"taskSequenceDescriptor\": {\n \"name\": \"test_name\",\n \"tasks\": [\n {\n \"name\": \"Task_1\",\n \"taskType\": \"test-success\",\n \"inputs\": [\n {\n \"name\": \"inputstring\",\n \"value\": \"$ref:input_1\"\n }\n ],\n \"outputs\": [\n {\n \"name\": \"dependency_output\"\n }\n ]\n },\n {\n \"name\": \"Task_2\",\n \"taskType\": \"test-success\",\n \"inputs\": [\n {\n \"name\": \"inputstring\",\n \"source\": \"Task_1:dependency_output\"\n },\n {\n \"name\": \"dependency_input\",\n \"value\": \"$ref:input_2\"\n }\n ],\n \"outputs\": [\n {\n \"name\": \"dependency_output\"\n }\n ]\n }\n ],\n \"outputMapping\": [\n {\n \"name\": \"output_1\",\n \"source\": \"Task_2:dependency_output\"\n }\n ]\n }\n}\n",
      "language": "json"
    }
  ]
}
[/block]
Response: test_sequence successfully registered.

## Run a workflow library task in a workflow
This is a sample workflow that runs only the workflow library task.

[block:code]
{
  "codes": [
    {
      "code": "{\n \"name\": \"Sample workflow with task sequence.\",\n \"tasks\": [\n {\n \"name\": \"Task_Sequence_1\",\n \"taskType\": \"test_sequence\",\n \"inputs\": [\n {\n \"name\": \"input_1\",\n \"value\": \"test123\"\n }\n ],\n \"outputs\": [\n {\n \"name\": \"output_1\"\n }\n ]\n }\n ]\n}",
      "language": "json"
    }
  ]
}
[/block]
# Auto Ordering Task
The auto ordering task places an order for a catalog ID. If the catalog ID has already been delivered to the catalog, the task will return its S3 location. If the catalog ID needs to be ordered, the task will place the order and wait for it to be delivered. Once the order has been delivered, the task will return the S3 location.

**GBDX Registered Task Name**: Auto_Ordering
**Input**: a single catalog ID
**Output**: S3 location for the catalog ID

## Task Definition
This is the task definition for the auto ordering task.

```json
{
  "inputPortDescriptors": [
    {
      "required": true,
      "description": "Catalog Id",
      "name": "cat_id",
      "type": "string"
    }
  ],
  "outputPortDescriptors": [
    {
      "name": "s3_location",
      "type": "string"
    }
  ],
  "containerDescriptors": [
    {
      "type": "DOCKER",
      "command": "",
      "properties": {
        "image": "tdgp/auto_ordering",
        "mounts": [
          {
            "local": "$task_data_dir",
            "container": "/mnt/work",
            "read_only": false
          }
        ]
      }
    }
  ],
  "description": "GBDX Auto Ordering Task",
  "name": "Auto_Ordering",
  "properties": {
    "authorizationRequired": true,
    "timeout": 36000,
    "isPublic": true
  }
}
```

## How to Use the Auto Ordering Task
Submit the following workflow to the workflows/v1/workflows endpoint, with a valid catalog ID as the "cat_id" input:

```json
{
  "name": "Auto_Ordering_Workflow",
  "tasks": [
    {
      "outputs": [
        {
          "name": "s3_location"
        }
      ],
      "name": "auto_ordering",
      "taskType": "Auto_Ordering",
      "impersonation_allowed": true,
      "inputs": [
        {
          "name": "cat_id",
          "value": "<string>"
        }
      ]
    }
  ]
}
```

## Rules for the Auto Ordering Task
The Auto Ordering task will submit an order using your GBDX credentials to the ordering endpoint `orders/v2/ordercb`. The "ordercb" endpoint returns a callback when the order is delivered.

- If the order has already been delivered, the task will return the S3 location immediately.
- If the order has not been delivered, the task will wait for the order to be fulfilled.
- Once the order is delivered, a callback is sent. When the callback is received, the Auto Ordering task will resume and return the S3 location.
- When the Auto Ordering task is in the "waiting" state, it will return the order ID as a "note." You can use this order ID to query the Orders API for status, as sketched below. See [Get Order Status v2](doc:get-order-status-v2).
- The S3 location returned from the Auto Ordering task can be piped into another task by using the output port `s3_location`.
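While the task is waiting, you can check on the order yourself using the order ID taken from the task's "waiting" note. The documented endpoint is described in [Get Order Status v2](doc:get-order-status-v2); the status path used in the sketch below (`orders/v2/order/<order_id>`), the token, and the header format are assumptions.

```python
import requests

GBDX_TOKEN = "<your GBDX token>"  # placeholder credential
ORDER_ID = "<order id from the waiting note>"  # e.g. taken from the event note

# Assumed status path; see "Get Order Status v2" for the documented endpoint.
response = requests.get(
    "https://geobigdata.io/orders/v2/order/" + ORDER_ID,
    headers={"Authorization": "Bearer " + GBDX_TOKEN},
)
print(response.status_code, response.text)
```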
## Workflow Events for the Auto Ordering Task
The following are workflow events for the Auto Ordering task:

```json
{
  "Events": [
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:15:42.795775+00:00",
      "when": "20 minutes ago",
      "note": "",
      "state": "pending",
      "event": "submitted"
    },
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:16:45.145101+00:00",
      "when": "19 minutes ago",
      "note": "instance_id: i-86f35cb7",
      "state": "pending",
      "event": "scheduled"
    },
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:16:45.225323+00:00",
      "when": "19 minutes ago",
      "note": "instance_id: i-14933a25, domain: default",
      "state": "running",
      "event": "started"
    },
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:17:40.404544+00:00",
      "when": "18 minutes ago",
      "note": "instance_id: i-14933a25, Note: Waiting for Ordering System to Complete Order. OrderId: b77224ec-3015-4460-89f9-17ddaba9a6bc",
      "state": "pending",
      "event": "waiting"
    },
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:33:57.300772+00:00",
      "when": "2 minutes ago",
      "note": "instance_id: i-86f35cb7",
      "state": "pending",
      "event": "scheduled"
    },
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:33:57.417425+00:00",
      "when": "2 minutes ago",
      "note": "instance_id: i-959039a4, domain: default",
      "state": "running",
      "event": "started"
    },
    {
      "task": "auto_ordering",
      "timestamp": "2016-09-01T19:35:23.740966+00:00",
      "when": "seconds ago",
      "note": "instance_id: i-959039a4, Note: Waiting for Ordering System to Complete Order. OrderId: b77224ec-3015-4460-89f9-17ddaba9a6bc, Order has been Completed, Location: s3://receiving-dgcs-tdgplatform-com/055567707010_01_003",
      "state": "complete",
      "event": "succeeded"
    }
  ]
}
```

## Workflow and Task Callbacks
Workflows, and tasks within a workflow, can have a URL callback defined. The callback is an HTTP POST to the URL that is defined in the task or workflow. It includes a small JSON packet describing the event that triggered the callback.

Workflow callbacks are set as part of the workflow definition. Task callbacks are only set at the "run" time of a task, not at definition time. That means you include the callback in the task's entry within the workflow definition, not in the task definition in the task registry.

### Task Callbacks
To set a callback on a task, add the following to the workflow JSON that is passed when starting a workflow with a set of tasks:

```json
{
  "name": "AOP_Strip_Processor_protogenV2LULC",
  "callback": "http://www.somehost.tld/some/url",
  "tasks": [
    {
      "name": "AOP",
      "outputs": [
        {
          "name": "log"
        }
      ],
      "inputs": [
        {
          "name": "data",
          "value": "<string>"
        }
      ],
      "taskType": "AOP_Strip_Processor",
      "callback": "http://www.somehost.tld/someother/url"
    }
  ]
}
```

Note that while the example shows two different URL endpoints, that is not a requirement.
However, the data that is passed to the endpoint will differ depending on whether the callback is from a task or from a workflow. The message that is sent when a task changes state is below. The message structure is the same for all event states.

```json
{
  "userName": "testuser",
  "payload": {
    "taskDomain": "default",
    "instanceSize": "LOCAL",
    "workflowName": "AOP_Strip_Processor_protogenV2LULC",
    "instanceId": "LOCAL",
    "taskState": "succeeded",
    "workflowId": "4517305590090108113",
    "workflowState": "running",
    "taskId": "4517305590044912472",
    "taskType": "AOP_Strip_Processor",
    "taskName": "Automated LULC"
  },
  "entity": "task",
  "environment": "alpha",
  "source": "worker",
  "action": "state changed",
  "properties": {},
  "accountId": "<YOUR ACCOUNT ID>"
}
```

A workflow event message is similar but has fewer fields:

```json
{
  "message": {
    "userName": "testuser",
    "payload": {
      "workflowId": "4517315362701210938",
      "workflowName": "AOP_Strip_Processor_protogenV2LULC",
      "workflowState": "succeeded"
    },
    "entity": "workflow",
    "environment": "LOCAL",
    "source": "decider",
    "action": "state changed",
    "properties": {},
    "accountId": "<YOUR ACCOUNT ID>"
  }
}
```

### Additional Information
The URL that is called with the JSON message can be any **publicly** accessible URL. For example, you won't be able to test with "localhost" or anything else that can't be accessed from the GBDX environment.

The callback manager will wait five seconds to connect to the URL endpoint and an additional five seconds for the endpoint to respond. The endpoint must respond to the callback with an HTTP "200" code. If it responds with anything else, or if the call times out, the system will try again in 30 seconds. It will retry 5 times before giving up.

The current configuration accepts any SSL certificate, so you can use a self-signed certificate for the callback URL.
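As an illustration of the receiving side, the minimal sketch below (using only the Python standard library) accepts the callback POST and immediately returns the required HTTP 200. It assumes you run it on a host that is publicly reachable from the GBDX environment; the port is arbitrary.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and log the JSON event message sent by the callback manager.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print("callback received:", json.loads(body))
        except ValueError:
            print("callback received with a non-JSON body:", body)
        # Respond 200 quickly; anything else (or a timeout) triggers retries.
        self.send_response(200)
        self.end_headers()

# Listen on an arbitrary port; the host must be publicly reachable.
HTTPServer(("0.0.0.0", 8080), CallbackHandler).serve_forever()
```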