Jobs
Jobs are a way to schedule recurring or batched/grouped executions. Some common use cases:
- Periodic scraping jobs. Let's assume you have a scraper that needs to run every hour to get the latest data. In this case, you can build a set of APIs that do the scraping and return the data, then set up a job to run every hour and send the data to a webhook or store it in S3.
- On-demand task batches. Let's say you are automating a set of tasks that need to be completed on demand. In this case, you can build a set of APIs that perform those tasks and set up a job without a schedule that you can trigger whenever you need to run them. The payload can be one or more APIs, each with the parameters to run it.
You can think of jobs as a higher-level abstraction on top of the direct run API. If you didn't use jobs, you would call the run API directly yourself, either periodically or whenever needed.
Each job has a payload (what to run), a configuration (how to run it), an optional sink (where to send the results), and an optional schedule (when to run). More on each of these below.
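To make the shape concrete, here is a minimal sketch of a job definition. The field names mirror the property names used on this page, but the exact schema is an assumption; check the API reference for the authoritative shape.

```ts
// Hypothetical job definition illustrating the four job properties.
// Field names follow this page's terminology; the real schema may differ.
const job = {
  payload: [
    // what to run: one or more APIs, each with its parameters
    { api: "scrape-products", parameters: { category: "electronics" } },
  ],
  configuration: {
    // how to run it: retry policy and concurrency (detailed below)
    retryPolicy: { maximumAttempts: 3 },
    maximumConcurrentRequests: 5,
  },
  sink: {
    // where to send the results (optional)
    type: "webhook",
    url: "https://example.com/results",
  },
  schedule: {
    // when to run (optional): every hour
    intervals: ["1h"],
  },
};
```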
Job runs
Think of a job as a template or plan of execution; each time it executes, that execution is called a job run.
Whenever a job is triggered, a job run is created. The job run contains information about the status of that specific run and the number of pending and completed payloads. Any API runs resulting from the job run are also associated with it, so you can track the inputs and results of specific APIs within that job run.
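As an illustration, a job run's status might look like the following; this shape is a sketch based on the fields described above, not the exact API response.

```ts
// Hypothetical job run status, assuming fields matching the description above.
const jobRun = {
  id: "run_123",
  jobId: "job_456",
  status: "running",        // e.g. running | paused | completed | terminated
  pendingPayloads: 4,       // payloads not yet executed
  completedPayloads: 6,     // payloads that have finished
  apiRunIds: ["api_run_1", "api_run_2"], // API runs associated with this job run
};
```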
Job management and operations
Job management and operations can be done via the UI or the API. Anything mentioned here can be done either in the UI (from the Jobs tab) or via the API; a sketch of the API calls follows the list below.
Job operations
- Pause jobs: This will pause all in-progress job runs, stopping them from executing new payloads (payloads that are already running will continue, but won't be retried). It will also pause the job's schedule, so no new job runs will be created.
- Resume jobs: This will resume all paused job runs and the schedule of the job.
- Trigger jobs: This will manually trigger the job to run immediately, regardless of the schedule. Jobs cannot be triggered if they are paused.
- Terminate job run: Terminates a specific job run.
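As a sketch, the operations above might map to API calls like the following; the endpoint paths here are assumptions for illustration, not the documented routes.

```ts
// Hypothetical REST calls for job operations; real endpoint paths may differ.
const BASE = "https://api.example.com/v1";

// Pause a job: pauses in-progress runs and the schedule.
await fetch(`${BASE}/jobs/job_456/pause`, { method: "POST" });

// Resume a paused job and its schedule.
await fetch(`${BASE}/jobs/job_456/resume`, { method: "POST" });

// Trigger a job run immediately (fails if the job is paused).
await fetch(`${BASE}/jobs/job_456/trigger`, { method: "POST" });

// Terminate a specific job run.
await fetch(`${BASE}/jobs/job_456/runs/run_123/terminate`, { method: "POST" });
```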
Job properties
Job payload (what to run)
The payload configures which APIs to run as part of a job run. It is an array of payload objects; each payload object specifies the API to run and the parameters to run it with.
During a job run, the payload of the job can be extended to include new APIs (with new parameters). Check out Nested scheduling for more info.
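For illustration, a payload with two APIs might look like this; the api and parameters field names follow the description above, but the exact schema is an assumption.

```ts
// Hypothetical payload: an array of payload objects, each naming an API
// and the parameters it should run with.
const payload = [
  { api: "scrape-category", parameters: { category: "laptops", pages: 5 } },
  { api: "scrape-category", parameters: { category: "phones", pages: 3 } },
];
```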
Configuration (how to run)
The configuration of a job sets the retry policy and the maximum number of requests to run concurrently.
Retry policy
The retry policy consists of the following properties:
- maximumAttempts: The job-level maximum number of attempts to run a payload. This value can be overridden by the payload.
- initialInterval: The initial interval between retries, as milliseconds or an ms-formatted string.
- maximumInterval: The maximum interval between retries, as milliseconds or an ms-formatted string.
- backoffCoefficient: The exponential backoff coefficient used to calculate the next retry interval.

The total time to wait between retries is calculated as follows (i is the current attempt number, starting from 0): min(initialInterval × backoffCoefficient^i, maximumInterval).
If no retry policy is provided, it will default to 3 maximum attempts with no delay.
The retries in jobs are not guaranteed to be executed in order. The job run will continue to process other payloads while the retries are pending.
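To make the backoff concrete, here is a small sketch that computes the wait before each retry, assuming the min(initialInterval × backoffCoefficient^i, maximumInterval) formula above.

```ts
// Compute the wait (in ms) before retry i, per the formula above.
function retryInterval(
  i: number,                 // current attempt number, starting from 0
  initialInterval: number,   // ms
  backoffCoefficient: number,
  maximumInterval: number,   // ms
): number {
  return Math.min(initialInterval * backoffCoefficient ** i, maximumInterval);
}

// With initialInterval = 1000, backoffCoefficient = 2, maximumInterval = 5000,
// retries wait 1000, 2000, 4000, 5000, 5000 ms.
for (let i = 0; i < 5; i++) {
  console.log(retryInterval(i, 1000, 2, 5000));
}
```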
Maximum concurrent requests
The maximum concurrent requests configuration controls how many payloads can run at the same time, at most. It does not guarantee that the payloads will start running at the same time.
If this is not configured, the job will default to running 5 payloads concurrently.
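Putting the configuration together, a job configuration might look like the sketch below; the field names (retryPolicy, maximumConcurrentRequests) are assumptions based on the properties described above.

```ts
// Hypothetical job configuration combining retry policy and concurrency.
const configuration = {
  retryPolicy: {
    maximumAttempts: 5,
    initialInterval: "1s",   // ms number or ms-formatted string
    maximumInterval: "30s",
    backoffCoefficient: 2,
  },
  maximumConcurrentRequests: 10, // at most 10 payloads run at once
};
```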
Schedule (when to run)
The schedule is an optional property that configures job runs to happen periodically.
It can be configured using two methods:
- Intervals: A simple way to configure the job to run every X period. The period can be specified as milliseconds or an ms-formatted string. All intervals are relative to the Unix epoch time.
- Calendars: A granular way to control when a job is scheduled to run. Calendars are objects that define the days and times when a job should run (similar to a cron string, but more verbose). They can be configured to run on specific years, months, days of month, days of week, hours, minutes, and seconds, and can be set to be a single value, a range, or a list of ranges. An example of a calendar configuration is:
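(The field names below are illustrative; check the API reference for the exact calendar schema.)

```ts
// Hypothetical calendar: run at 08:00 and 20:00 on weekdays, June-August.
// Each field can be a single value, a range, or a list of ranges.
const calendar = {
  month: [{ start: 6, end: 8 }],       // June through August
  dayOfWeek: [{ start: 1, end: 5 }],   // Monday through Friday
  hour: [{ start: 8, end: 8 }, { start: 20, end: 20 }], // at 8:00 and 20:00
  minute: 0,
  second: 0,
};
```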
The job will be scheduled based on the union of the intervals and calendars provided. For example, if a job is configured to run every 7 days and also on the first day of every month, it will trigger whenever either condition is met.
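A schedule combining both methods might look like the following sketch; the intervals and calendars field names are assumptions.

```ts
// Hypothetical schedule: runs every 7 days AND on the 1st of every month
// (the union of both triggers).
const schedule = {
  intervals: ["7d"],                        // ms-formatted string
  calendars: [
    { dayOfMonth: 1, hour: 0, minute: 0 },  // first of each month at midnight
  ],
};
```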
Job sink (where to send the results)
You can optionally sink the results of the job to a destination (example configurations for both appear after the list). It can be:
- Webhook: The configuration includes the URL to send the results to. The results will be sent as a POST request.
- S3: The configuration includes the bucket name, region, credentials, and the key to store the results. The credentials must have write access to the bucket.
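As sketches, the two sink configurations might look like this; the field names are illustrative, and the API reference for sinks (linked below) has the authoritative schema.

```ts
// Hypothetical webhook sink: results are POSTed to this URL.
const webhookSink = {
  type: "webhook",
  url: "https://example.com/job-results",
};

// Hypothetical S3 sink: the credentials must have write access to the bucket.
const s3Sink = {
  type: "s3",
  bucket: "my-results-bucket",
  region: "us-east-1",
  key: "jobs/results.json",
  credentials: {
    accessKeyId: "AKIA...",   // placeholder
    secretAccessKey: "...",   // placeholder
  },
};
```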
Check out the API reference for sinks for more information about the configuration and output format of the sinks.