Gotz uses a datadef to define data. A datadef contains axes, queries, scripts and members, which collectively define the data to be scraped from the HTML page.

The job.xml in Example 1 uses a simple dataDef which scrapes one data point, the price of the company’s share, from the defs/examples/page/acme-quote.html page.

The datadef snippet from defs/examples/jsoup/ex-1/job.xml is shown below.

<dataDef name="price">
    <axis name="col">
        <xf:fields>
            <xf:script script="configs.getRunDateTime()" />
        </xf:fields>
        <member name="date" />
    </axis>
    <axis name="row">
        <member name="Price" value="Price" />
    </axis>
    <axis name="fact">
        <xf:fields>
            <xf:query region="div#price_tick" field="*" />
        </xf:fields>
    </axis>
</dataDef>

It defines a dataDef named price with three axes.
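
To make the structure concrete, the following Python sketch parses the snippet with the standard library and lists each axis with its members and fields. It is only an illustration and not part of Gotz. The xf namespace URI used here (http://example.org/xf) is a placeholder assumed so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI assumed for this sketch only;
# the real URI is whatever the job.xml root element declares.
XF = "http://example.org/xf"

DATADEF = f"""
<dataDef name="price" xmlns:xf="{XF}">
    <axis name="col">
        <xf:fields>
            <xf:script script="configs.getRunDateTime()" />
        </xf:fields>
        <member name="date" />
    </axis>
    <axis name="row">
        <member name="Price" value="Price" />
    </axis>
    <axis name="fact">
        <xf:fields>
            <xf:query region="div#price_tick" field="*" />
        </xf:fields>
    </axis>
</dataDef>
"""


def summarize(datadef_xml):
    """Return {axis name: {"members": [...], "fields": [...]}}."""
    root = ET.fromstring(datadef_xml)
    summary = {}
    for axis in root.findall("axis"):
        members = [m.get("name") for m in axis.findall("member")]
        # Children of <xf:fields> are namespaced: strip "{uri}" from the tag.
        fields = [
            child.tag.split("}")[1]
            for child in axis.findall(f"{{{XF}}}fields/*")
        ]
        summary[axis.get("name")] = {"members": members, "fields": fields}
    return summary


SUMMARY = summarize(DATADEF)
for name, parts in SUMMARY.items():
    print(name, parts)
```

The summary shows at a glance that col carries a script and a member, row carries only a member, and fact carries only a query.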

 
 

Axis

In Gotz, data is defined by axes, which are similar in concept to those of a spreadsheet.

Figure: Gotz DataDef Axis

In a datadef, we define three axes: COL, ROW and FACT. The value we are interested in is called the Fact, which corresponds to the value held by a cell in a spreadsheet. The other two axes, Col and Row, say something about the Fact.

For example, in the price datadef, the Col axis is date and the Row axis is Price. If the price of the company's share is, say, 121.80, then the axis values are as below:

COL : 01-01-2018
ROW : Price
FACT : 121.80

From the combination of the three axes we can deduce that the price is 121.80 as of Jan 1st 2018.
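
The spreadsheet analogy can be sketched in a few lines of Python. This is only a way to picture the idea, not how Gotz stores data internally; the DataPoint type and the sheet dictionary are names invented for the illustration.

```python
from collections import namedtuple

# One scraped data point: two descriptive axes plus the fact they describe.
DataPoint = namedtuple("DataPoint", ["col", "row", "fact"])

point = DataPoint(col="01-01-2018", row="Price", fact="121.80")

# Like a spreadsheet, (col, row) addresses a cell and fact is its content.
sheet = {(point.col, point.row): point.fact}

print(sheet[("01-01-2018", "Price")])
```

Looking up the cell at column 01-01-2018 and row Price gives back the fact, just as reading a cell in a spreadsheet would.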

The concept of axis and fact is borrowed from the Multidimensional Expressions (MDX) language used in data warehousing, which will allow us to add more axes in the future to construct multidimensional data.

As of now, the only allowed axis names are col, row and fact.

 
 

Fields, Queries and Scripts

An axis can contain queries or scripts, which are defined within the <xf:fields> element. In a dataDef, only the fields element and its children carry the xf: prefix; all other elements are unprefixed. A later chapter on namespaces explains this in detail.

The fact axis defines the following query.

<axis name="fact">
    <xf:fields>
        <xf:query region="div#price_tick" field="*" />
    </xf:fields>
</axis>

<xf:query> has two attributes, region and field, which are the selectors used to query the data from the page. We explain how to construct them in the next chapter.

The col axis defines the following script.
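
As a rough intuition for what the region selector div#price_tick picks out, the Python sketch below collects the text of the div element whose id is price_tick. It is not the Gotz implementation. The page fragment is hypothetical (the real acme-quote.html is not reproduced here), the RegionText and query names are invented for the example, and returning the whole text of the region is an assumption about what the field selector yields.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, assumed for illustration only.
PAGE = """
<html><body>
  <div id="company">Acme Inc.</div>
  <div id="price_tick">121.80</div>
</body></html>
"""


class RegionText(HTMLParser):
    """Collect the text inside the first <tag id="element_id"> element."""

    def __init__(self, tag, element_id):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the matched region
        self.done = False   # True once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != self.tag:
            return
        if self.depth:
            self.depth += 1  # nested element with the same tag name
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1   # entering the region

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def query(page, tag, element_id):
    """Return the stripped text of the region, or "" if it is absent."""
    parser = RegionText(tag, element_id)
    parser.feed(page)
    return "".join(parser.chunks).strip()


price = query(PAGE, "div", "price_tick")
print(price)
```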

<axis name="col">
    <xf:fields>
        <xf:script script="configs.getRunDateTime()" />
    </xf:fields>
</axis>

The script gets its value using the Script Engine. Here we call the getRunDateTime() method on the configs object, which returns the date and time when the GotzEngine run started.

The fact axis must contain either a query or a script, while there is no such requirement for the col and row axes.
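
Because the script returns the time at which the run started, every call during a run should yield the same value. The Python sketch below illustrates that behaviour with a hypothetical stand-in for the configs object; the Configs class and its get_run_date_time method are invented for this example and are not the Gotz API.

```python
from datetime import datetime


class Configs:
    """Hypothetical stand-in for the configs object exposed to scripts."""

    def __init__(self):
        # Captured once, when the run starts.
        self._run_date_time = datetime.now()

    def get_run_date_time(self):
        # Always returns the start time, not the current time.
        return self._run_date_time


configs = Configs()
first = configs.get_run_date_time()
second = configs.get_run_date_time()
print(first == second)
```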

Member

In a datadef, a member holds the value returned by either a script or a query.

For the col axis, we define one member named date.

<axis name="col">
    <xf:fields>
        <xf:script script="configs.getRunDateTime()" />
    </xf:fields>
    <member name="date" />
</axis>

The date returned by the script is assigned to the member as its value.

The row axis doesn’t contain any query or script, so we define a member named Price and directly set its value to Price.

<axis name="row">
    <member name="Price" value="Price" />
</axis>

For the fact axis, there is no need to add a member because Gotz implicitly adds a default member to hold the fact value.
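
The three cases described above can be summarised in a small Python sketch. It is an illustration of the rules as stated in this chapter, not Gotz code; the resolve_members function is invented for the example, and the name "fact" given to the implicit default member is an assumption.

```python
def resolve_members(axis_name, members, field_value=None):
    """Return {member name: value} for one axis.

    members is a list of (name, explicit value or None) pairs;
    field_value is the result of the axis's script or query, if any.
    """
    if axis_name == "fact" and not members:
        # Assumed name for the implicit default member of the fact axis.
        members = [("fact", None)]
    resolved = {}
    for name, value in members:
        # An explicit value attribute wins; otherwise use the field result.
        resolved[name] = value if value is not None else field_value
    return resolved


col = resolve_members("col", [("date", None)], field_value="01-01-2018")
row = resolve_members("row", [("Price", "Price")])
fact = resolve_members("fact", [], field_value="121.80")
print(col, row, fact)
```

The col member takes the script result, the row member keeps its explicit value, and the fact axis receives a default member without any being declared.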

In the next chapter, we explain how to construct the query with selectors.