Multiple Tasks

Scoopi can execute multiple tasks on a single locator group, and also multiple tasks across multiple locator groups.

Multiple tasks and single locator group

Example 1 in the fin folder scrapes price data from the acme-snapshot.html page, while Example 2 extracts snapshot data from the same page. One option is to define the acme-snapshot.html locator in two locatorGroups and assign a task to each of them, but this downloads acme-snapshot.html twice. It is better to define a single locatorGroup and assign it two tasks, priceTask and snapshotTask, so that the page is downloaded only once.

Example 6 extracts both price and snapshot data from the acme-snapshot.html page.

To run multiple tasks, we define taskGroups as below.

defs/examples/fin/jsoup/ex-6/job.yml

locatorGroups:

  snapshotGroup:
    locators: [
       { name: acme, url: "/defs/examples/fin/page/acme-snapshot.html" }
    ]

taskGroups:

  snapshotGroup:

    priceTask:
      dataDef: price

    snapshotTask:
      dataDef: snapshot

The above snippet defines two tasks. The first, priceTask, applies the price dataDef and the second, snapshotTask, applies the snapshot dataDef; both run on all locators of snapshotGroup.

Multiple tasks and multiple locator groups

Example 7 extends the previous one and executes multiple tasks on multiple locator groups. It scrapes price and snapshot data from acme-snapshot.html and bs data from the acme-bs.html page. The locator and task snippet from the example is shown below.

defs/examples/fin/jsoup/ex-7/job.yml

locatorGroups:

  snapshotGroup:
    locators: [
       { name: acme, url: "/defs/examples/fin/page/acme-snapshot.html" }
    ]

  bsGroup:
    locators: [
       { name: acme, url: "/defs/examples/fin/page/acme-bs.html" }
    ]

taskGroups:

  snapshotGroup:
    priceTask:
      dataDef: price

    snapshotTask:
      dataDef: snapshot

  bsGroup:
    bsTask:
      dataDef: bs

It defines two locator groups, snapshotGroup and bsGroup, and a task group of the same name for each. The snapshotGroup task group defines two tasks, priceTask and snapshotTask, and the second task group, bsGroup, defines a single task, bsTask.

In all, Scoopi executes three tasks:

  1. priceTask parses the locator acme-snapshot.html with the price dataDef.
  2. snapshotTask parses the same instance of acme-snapshot.html again, this time with the snapshot dataDef.
  3. bsTask parses acme-bs.html with the bs dataDef.
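The pairing of tasks and locators described above can be sketched in a few lines of Python. This is only a model of the pairing, not Scoopi's implementation: the dicts mirror the locatorGroups and taskGroups of ex-7/job.yml, and the helper name expand_tasks is hypothetical. It assumes one page load per locator in a group, which is how the text above describes the behaviour.

```python
# Plain-dict mirror of the locatorGroups and taskGroups in ex-7/job.yml.
locator_groups = {
    "snapshotGroup": [
        {"name": "acme", "url": "/defs/examples/fin/page/acme-snapshot.html"},
    ],
    "bsGroup": [
        {"name": "acme", "url": "/defs/examples/fin/page/acme-bs.html"},
    ],
}
task_groups = {
    "snapshotGroup": {
        "priceTask": {"dataDef": "price"},
        "snapshotTask": {"dataDef": "snapshot"},
    },
    "bsGroup": {
        "bsTask": {"dataDef": "bs"},
    },
}


def expand_tasks(locator_groups, task_groups):
    """Pair each task of a task group with each locator of the
    locator group that has the same name (hypothetical helper)."""
    runs = []
    for group, tasks in task_groups.items():
        for locator in locator_groups[group]:
            for task_name, task in tasks.items():
                runs.append((task_name, locator["url"], task["dataDef"]))
    return runs


runs = expand_tasks(locator_groups, task_groups)
for task_name, url, data_def in runs:
    print(f"{task_name}: {url} -> {data_def}")

# Assumption: a page is loaded once per locator in a group.
page_loads = sum(len(locators) for locators in locator_groups.values())
print(len(runs), "tasks,", page_loads, "page loads")  # 3 tasks, 2 page loads
```

Had acme-snapshot.html been listed in two separate locator groups, the same model would count three page loads for the same three tasks.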

The output of all three tasks goes to output/data.txt.

But there is a problem in the output data: the date in the bs data is in MMM 'YY format, while in the price and snapshot data the date is in ISO date format. If we try to import the output file into a database, the import fails.
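To make the mismatch concrete, here is a minimal Python sketch. It is not Scoopi code. The sample values are hypothetical, the function names are made up for illustration, and mapping a MMM 'YY value to the last day of its month is an assumption, since that format carries no day.

```python
import calendar
from datetime import date, datetime


def is_iso_date(value: str) -> bool:
    """Return True if value parses as an ISO calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def month_year_to_iso(value: str) -> str:
    """Convert a MMM 'YY value to an ISO date string.

    Assumption: the value is mapped to the last day of its month.
    Note that %b matches month abbreviations of the current locale.
    """
    parsed = datetime.strptime(value, "%b '%y")
    last_day = calendar.monthrange(parsed.year, parsed.month)[1]
    return date(parsed.year, parsed.month, last_day).isoformat()


# Hypothetical sample values, one per format.
print(is_iso_date("2016-03-31"))     # True
print(is_iso_date("Mar '16"))        # False
print(month_year_to_iso("Mar '16"))  # 2016-03-31
```

A loader that expects ISO dates accepts the first value and rejects the second, which is why the bs rows need conversion before they are written.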

We have to plug in a converter that formats or changes the value before the data is appended to the output file. In the next chapter, we explain the Scoopi workflow, step and plugin design, and show how to override the default workflow steps.