A tasks element can hold multiple task elements, and a job file can hold multiple tasks elements.

Example-1 scrapes Price data from the acme-quote.html page, while Example-2 extracts Snapshot data from the same page. We can add both tasks within a single tasks element.

But before doing that, let's introduce one more feature, steps reference, which extracts steps into a separate entity that both tasks can use.


Steps Reference

In Example-6, steps is moved out of the task element and defined as a standalone element under <fields>.

<fields name="locator" class="org.codetab.gotz.model.Locator"
    xmlns="http://codetab.org/xfields">

  <steps name="commonSteps">
    <step name="seeder"
        class="org.codetab.gotz.step.extract.LocatorSeeder">
      <nextStep>loader</nextStep>
    </step>
    ....
  </steps>

This lets multiple tasks refer to the same steps definition.

Multiple Task

Next, to run multiple tasks, price and snapshot, on acme-quote.html, we define tasks as shown below.

<fields name="locator" class="org.codetab.gotz.model.Locator"
    xmlns="http://codetab.org/xfields">

    <steps name="commonSteps">
        .... steps ....
    </steps>

    <tasks name="quote tasks" group="quote">
        <task name="snapshot task" dataDef="snapshot">
            <steps ref="commonSteps" />
        </task>
        <task name="price task" dataDef="price">
            <steps ref="commonSteps" />
        </task>
    </tasks>
</fields>

The above snippet defines two tasks: the first applies the snapshot dataDef and the second the price dataDef to the locators of group quote, i.e. the acme-quote.html page. Both tasks use commonSteps through <steps ref="commonSteps" />.
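
To make the reference mechanism concrete, here is a minimal Python sketch of how a steps reference can be resolved against a named steps definition. It is an illustration only, not Gotz's actual implementation: the `resolve_steps` function is hypothetical, and the XML contains only the `seeder` step shown earlier because the remaining steps are elided in the example.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <fields> in the example snippets.
NS = {"xf": "http://codetab.org/xfields"}

XML = """
<fields name="locator" class="org.codetab.gotz.model.Locator"
    xmlns="http://codetab.org/xfields">
    <steps name="commonSteps">
        <step name="seeder"
            class="org.codetab.gotz.step.extract.LocatorSeeder">
            <nextStep>loader</nextStep>
        </step>
    </steps>
    <tasks name="quote tasks" group="quote">
        <task name="snapshot task" dataDef="snapshot">
            <steps ref="commonSteps" />
        </task>
        <task name="price task" dataDef="price">
            <steps ref="commonSteps" />
        </task>
    </tasks>
</fields>
"""


def resolve_steps(fields):
    """Map each task name to the step names it ends up using (hypothetical helper)."""
    # Index the standalone <steps name="..."> elements directly under <fields>.
    named = {s.get("name"): s for s in fields.findall("xf:steps", NS)}
    resolved = {}
    for task in fields.findall("xf:tasks/xf:task", NS):
        steps = task.find("xf:steps", NS)
        ref = steps.get("ref")
        # A ref points at a named definition; otherwise use the inline steps.
        target = named[ref] if ref else steps
        resolved[task.get("name")] = [
            step.get("name") for step in target.findall("xf:step", NS)
        ]
    return resolved


resolved = resolve_steps(ET.fromstring(XML))
for task_name, step_names in resolved.items():
    print(task_name, "->", step_names)
```

Both tasks resolve to the same list of steps, which is the point of moving steps out of the task element.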


Multiple tasks

Example-7 extends the previous one to execute multiple tasks on multiple locator groups. It scrapes price and snapshot data from acme-quote.html and bs data from acme-bs.html. The relevant snippet from the example is shown below.

<locators group="quote">
    <locator name="acme" url="/defs/examples/page/acme-quote.html" />
</locators>

<locators group="bs">
    <locator name="acme" url="/defs/examples/page/acme-bs.html" />
</locators>

<fields name="locator" class="org.codetab.gotz.model.Locator"
    xmlns="http://codetab.org/xfields">

    <steps name="commonSteps">
        .... steps ....
    </steps>

    <tasks name="quote tasks" group="quote">
        <task name="snapshot task" dataDef="snapshot">
            <steps ref="commonSteps" />
        </task>
        <task name="price task" dataDef="price">
            <steps ref="commonSteps" />
        </task>
    </tasks>

    <tasks name="bs" group="bs">
        <task name="bs" dataDef="bs">
            <steps ref="commonSteps" />
        </task>
    </tasks>
</fields>

It defines two locator groups, quote and bs, with a corresponding tasks element for each. The first tasks element (group quote) has two task elements, price and snapshot, and the second (group bs) has one task, bs.

In all, Gotz executes three tasks:

  1. The price task parses locator acme-quote.html with the price dataDef.
  2. The snapshot task again parses locator acme-quote.html with the snapshot dataDef.
  3. The bs task parses acme-bs.html with the bs dataDef.
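
The pairing of tasks with locators through the group attribute can be sketched in Python as below. This only illustrates the matching rule described above and makes no claim about how Gotz does it internally or in which order it runs the tasks. The `<root>` wrapper and the `expand_tasks` function are assumptions added to get one well-formed document, and the steps elements are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <fields> in the example snippets.
NS = {"xf": "http://codetab.org/xfields"}

# <root> is a hypothetical wrapper; the example shows these elements side by side.
XML = """
<root>
  <locators group="quote">
    <locator name="acme" url="/defs/examples/page/acme-quote.html" />
  </locators>
  <locators group="bs">
    <locator name="acme" url="/defs/examples/page/acme-bs.html" />
  </locators>
  <fields name="locator" class="org.codetab.gotz.model.Locator"
      xmlns="http://codetab.org/xfields">
    <tasks name="quote tasks" group="quote">
      <task name="snapshot task" dataDef="snapshot" />
      <task name="price task" dataDef="price" />
    </tasks>
    <tasks name="bs" group="bs">
      <task name="bs" dataDef="bs" />
    </tasks>
  </fields>
</root>
"""


def expand_tasks(root):
    """List (task name, dataDef, url) for every task and locator sharing a group."""
    urls_by_group = {}
    for locators in root.findall("locators"):
        urls = [loc.get("url") for loc in locators.findall("locator")]
        urls_by_group.setdefault(locators.get("group"), []).extend(urls)

    runs = []
    for tasks in root.findall("xf:fields/xf:tasks", NS):
        # A tasks element applies only to locators of its own group.
        for url in urls_by_group.get(tasks.get("group"), []):
            for task in tasks.findall("xf:task", NS):
                runs.append((task.get("name"), task.get("dataDef"), url))
    return runs


runs = expand_tasks(ET.fromstring(XML))
for name, data_def, url in runs:
    print(f"{name}: {url} with dataDef {data_def}")
```

Two locator groups and three task elements yield three runs, matching the list above.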

The output of all three tasks goes to output/data.txt. But there is a problem in the output data: the date for bs data is in MMM 'YY format, and the database throws an error when we try to push the data with DBAppender.
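
As a rough illustration of the kind of conversion involved, the Python sketch below parses a value in that month-and-two-digit-year form and prints it as a full date. The sample value `Dec '16`, the ISO target format and the `to_iso_date` function are all assumptions made for illustration. They are not taken from the example output and this is not how Gotz converts values.

```python
from datetime import datetime


def to_iso_date(value):
    """Parse a "MMM 'YY" style value such as "Dec '16" (hypothetical sample)."""
    # %b is the abbreviated month name, %y the two-digit year.
    # No day is present, so strptime defaults it to the first of the month.
    parsed = datetime.strptime(value, "%b '%y")
    return parsed.date().isoformat()


print(to_iso_date("Dec '16"))  # prints 2016-12-01
```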

We need to apply converters to change the format or adjust values, and the next chapter does exactly that.