DataDef
Scoopi uses a dataDef to define data. A dataDef contains axes, queries, scripts and members, which collectively define the data to be scraped from the HTML page.
In this chapter, we go through the Example-1 job.yml to explain dataDef. This job.yml uses a simple dataDef which scrapes one data point, i.e. the price of the company share, from the defs/examples/fin/page/acme-snapshot.html page.
The dataDef snippet from defs/examples/fin/jsoup/ex-1/job.yml is as below:
dataDefs:
  price:
    axis:
      fact:
        query:
          region: "div#price_tick"
          field: "*"
      col:
        query:
          script: configs.getRunDateTime()
        members: [
          member: {name: date},
        ]
      row:
        members: [
          member: {name: Price, value: Price},
        ]
It defines a dataDef named price with three axes.
Axis
Data is defined by axes, which are similar in concept to those of a spreadsheet.

For a dataDef, we define three axes: FACT, COL and ROW. The data we are interested in is called the Fact, which is the same as the value held by a cell in a spreadsheet. The other two axes, Col and Row, say something about the Fact.
For example, in the price dataDef, the Col axis is the date and the Row axis is Price. If the price of the company share is, say, 121.80, then the axis values are as below:
FACT : 121.80
COL : 01-01-2018
ROW : Price
From the combination of the three axes, we deduce that the price as of Jan 1st 2018 is 121.80.
The concept of axis and fact is borrowed from the Multidimensional Expressions (MDX) language used in data warehousing, which allows us to add more axes in the future to construct multidimensional data. As of now, the only allowed axis names are fact, col and row.
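The spreadsheet analogy can be sketched in a few lines of Python. This is only an illustration of the concept, not Scoopi's internal model; the DataPoint class and its describe() method are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """One scraped value, addressed like a spreadsheet cell (hypothetical model)."""
    fact: str  # the value itself, e.g. the share price
    col: str   # what the column says about the fact, e.g. the date
    row: str   # what the row says about the fact, e.g. the item name

    def describe(self) -> str:
        # Combine the three axes into one readable statement.
        return f"{self.row} as of {self.col} is {self.fact}"


point = DataPoint(fact="121.80", col="01-01-2018", row="Price")
print(point.describe())  # Price as of 01-01-2018 is 121.80
```

In this picture, a further axis would simply be one more field that says something about the fact.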
Fields, Queries and Scripts
An axis can contain a query or a script.
The fact axis defines the following query.
fact:
  query:
    region: "div#price_tick"
    field: "*"
A query has two properties, region and field, which are the selectors used to query the data from the page. We explain how to construct the selectors in the next chapter. For now, we concentrate on the structure of the dataDef.
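To give a rough feel for what the region selector points at, the sketch below locates a div with id price_tick in a page fragment and returns its text, using only the Python standard library. The HTML fragment and the RegionText class are hypothetical, and how the field selector narrows the region further is left to the next chapter; this is not how Scoopi itself evaluates a query.

```python
from html.parser import HTMLParser


class RegionText(HTMLParser):
    """Collects the text inside the first <div> with the given id."""

    def __init__(self, div_id: str):
        super().__init__()
        self.div_id = div_id
        self.depth = 0      # greater than 0 while inside the target div
        self.done = False   # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the region
        elif tag == "div" and dict(attrs).get("id") == self.div_id:
            self.depth = 1       # entered the region

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    def text(self) -> str:
        return "".join(self.chunks).strip()


# Hypothetical page fragment, not the real acme-snapshot.html
html = '<html><body><div id="quote"><div id="price_tick"> 121.80 </div></div></body></html>'
parser = RegionText("price_tick")
parser.feed(html)
print(parser.text())  # 121.80
```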
The col axis defines the following script, which is placed under the query key.
col:
  query:
    script: configs.getRunDateTime()
A script gets its value using the Script Engine. Here we call the getRunDateTime() method on the configs object, which returns the date and time when the ScoopiEngine run started.
The row axis contains neither a script nor a query. The fact axis must contain either a query or a script, while there is no such requirement for the col and row axes.
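The rule for which axes need a query or a script can be expressed as a small check. The sketch below works on a plain dictionary shaped like the price dataDef; the function names and messages are hypothetical, and it assumes, as in the snippet, that the script sits under the query key. It is not Scoopi's own validation code.

```python
ALLOWED_AXES = {"fact", "col", "row"}


def has_query_or_script(axis: dict) -> bool:
    """True if the axis carries selectors (region and field) or a script.

    Assumption: both live under the `query` key, as in the price dataDef.
    """
    query = axis.get("query") or {}
    has_selectors = "region" in query and "field" in query
    return has_selectors or "script" in query


def check_axes(axes: dict) -> list:
    """Return a list of problems; an empty list means the axes pass."""
    problems = [f"unknown axis: {name}" for name in axes if name not in ALLOWED_AXES]
    # Only the fact axis is required to have a query or a script.
    if not has_query_or_script(axes.get("fact") or {}):
        problems.append("fact axis needs a query or a script")
    return problems


price_axes = {
    "fact": {"query": {"region": "div#price_tick", "field": "*"}},
    "col": {"query": {"script": "configs.getRunDateTime()"},
            "members": [{"member": {"name": "date"}}]},
    "row": {"members": [{"member": {"name": "Price", "value": "Price"}}]},
}
print(check_axes(price_axes))  # []
print(check_axes({"fact": {}, "page": {}}))
```

The second call reports both an unknown axis name and a fact axis without a query or script.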
Member
In a dataDef, a member holds the value returned by either a script or a query.
For the col axis, we define one member named date. The date returned by the script is assigned to the member as its value.
col:
  query:
    script: configs.getRunDateTime()
  members: [
    member: {name: date},
  ]
The row axis doesn’t contain any query or script, so we define a member named Price and directly assign Price as its value.
row:
  members: [
    member: {name: Price, value: Price},
  ]
For the fact axis, there is no need to add any member because Scoopi implicitly adds a default member to hold the fact value.
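The three member cases described in this section (value from a script or query, value given directly, and the implicit fact member) can be summarized in a short sketch. The resolve_members function is hypothetical, the default member name "fact" is an assumption since this chapter does not state the real name, and letting a directly assigned value take precedence over a query or script result is also an assumption.

```python
def resolve_members(axis_name: str, axis: dict, result=None) -> dict:
    """Map member names to values for one axis.

    `result` is the value produced by the axis's query or script, if any.
    """
    members = [entry["member"] for entry in axis.get("members", [])]
    if not members and axis_name == "fact":
        # Assumption: the implicit default member is named "fact" here.
        members = [{"name": "fact"}]
    # Assumption: a directly assigned value is used as is; otherwise the
    # member takes the query or script result.
    return {m["name"]: m.get("value", result) for m in members}


col_axis = {"query": {"script": "configs.getRunDateTime()"},
            "members": [{"member": {"name": "date"}}]}
row_axis = {"members": [{"member": {"name": "Price", "value": "Price"}}]}
fact_axis = {"query": {"region": "div#price_tick", "field": "*"}}

print(resolve_members("col", col_axis, "01-01-2018"))  # {'date': '01-01-2018'}
print(resolve_members("row", row_axis))                # {'Price': 'Price'}
print(resolve_members("fact", fact_axis, "121.80"))    # {'fact': '121.80'}
```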
In the next chapter, we explain how to construct the query with selectors.