Members
This chapter explores how to extract multiple values using members, the match property and dynamic queries.
Example 2 extracts the ten data points listed below from the page defs/examples/fin/page/acme-snapshot.html.
- MARKET CAP
- EPS (TTM)
- P/E
- P/C
- BOOK VALUE
- PRICE/BOOK
- DIV (%)
- DIV YIELD
- FACE VALUE
- INDUSTRY P/E
The relevant HTML snippet from the page is:
<div id="snapshot">
  <div>
    <div>
      <div>MARKET CAP</div>
      <div>382,642.57</div>
      <div></div>
    </div>
    <div>
      <div>P/E</div>
      <div>-</div>
      <div></div>
    </div>
    <div>
      <div>BOOK VALUE</div>
      <div>27.89</div>
      <div></div>
    </div>
    ....
The dataDef used to extract data from this page, defs/examples/fin/jsoup/ex-2/job.yml, is:
dataDefs:
  snapshot:
    axis:
      fact:
        query:
          region: "div#snapshot"
          field: "div:matchesOwn(^%{row.match}) + div"
      col:
        query:
          script: "document.getFromDate()"
        members: [
          member: {name: date},
        ]
      row:
        query:
          region: "div#snapshot"
          field: "div:matchesOwn(^%{row.match})"
        members: [
          member: { name: "MC", match: "MARKET CAP" },
          member: { name: "EPS", match: "EPS \\(TTM\\)" },
          member: { name: "PE", match: "P/E" },
          member: { name: "PC", match: "P/C" },
          member: { name: "BV", match: "BOOK VALUE" },
          member: { name: "PB", match: "PRICE/BOOK" },
          member: { name: "DIV", match: "DIV \\(%\\)" },
          member: { name: "DY", match: "DIV YIELD" },
          member: { name: "FV", match: "FACE VALUE" },
          member: { name: "IND PE", match: "INDUSTRY P/E" },
        ]
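Because the match values end up inside JSoup's :matchesOwn(regex) pseudo-selector, they are regular expressions. That is why the parentheses in EPS \\(TTM\\) and DIV \\(%\\) are escaped (in a YAML double-quoted string, \\ becomes a single backslash) and why the selector starts with ^. The Java sketch below checks a few patterns against the ten labels using java.util.regex. It assumes matchesOwn behaves like Matcher.find() on an element's own text; the class and method names are illustrative only and are not part of Scoopi.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch: which snapshot labels would each row pattern select?
// Assumes :matchesOwn(regex) runs Matcher.find() on the element's own text.
class MatchPatterns {

    // Labels of the ten data points on the snapshot page.
    static final List<String> LABELS = List.of(
            "MARKET CAP", "EPS (TTM)", "P/E", "P/C", "BOOK VALUE",
            "PRICE/BOOK", "DIV (%)", "DIV YIELD", "FACE VALUE", "INDUSTRY P/E");

    // Prefixes ^ as the row selector does, then returns every label the regex finds.
    static List<String> matchingLabels(String match) {
        Pattern pattern = Pattern.compile("^" + match);
        return LABELS.stream()
                .filter(label -> pattern.matcher(label).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The Java literal "EPS \\(TTM\\)" is the regex EPS \(TTM\), same as the YAML value.
        System.out.println(matchingLabels("EPS \\(TTM\\)")); // [EPS (TTM)]
        // Unescaped, (TTM) is a regex group, so the literal parentheses never match.
        System.out.println(matchingLabels("EPS (TTM)"));     // []
        // The ^ anchor keeps P/E from also matching INDUSTRY P/E.
        System.out.println(matchingLabels("P/E"));           // [P/E]
        // A bare prefix is ambiguous, hence the longer "DIV \\(%\\)" match.
        System.out.println(matchingLabels("DIV"));           // [DIV (%), DIV YIELD]
    }
}
```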
In this dataDef, the row axis defines multiple members, each with a name and a match property.
The previous example defined the price member as member: {name: "Price", value: "Price"}, using the value property, which assigns a value directly to the member without any query. The dataDef above instead uses the match property:
member: { name: "FV", match: "FACE VALUE" }
When the match property is defined, it can be accessed through the substitution variable %{<axisName>.match}. The row query uses it in its selector:
row:
  query:
    region: "div#snapshot"
    field: "div:matchesOwn(^%{row.match})"
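The substitution step can be sketched in Java. The helper below is hypothetical and is not Scoopi's implementation: it replaces every %{key} token in a query string with a value from a map keyed by axisName.property, so the single entry row.match fills both the row field and the fact field of the dataDef.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of %{axis.property} substitution; Scoopi's own code may differ.
class QuerySubstitution {

    // Matches tokens such as %{row.match}; group 1 captures the key "row.match".
    private static final Pattern TOKEN = Pattern.compile("%\\{([^}]+)\\}");

    // Replaces each %{key} with vars.get(key); tokens without a value are left unchanged.
    static String substitute(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps backslashes such as those in EPS \(TTM\) intact.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String rowField = "div:matchesOwn(^%{row.match})";
        String factField = "div:matchesOwn(^%{row.match}) + div";
        Map<String, String> bv = Map.of("row.match", "BOOK VALUE");
        System.out.println(substitute(rowField, bv));  // div:matchesOwn(^BOOK VALUE)
        System.out.println(substitute(factField, bv)); // div:matchesOwn(^BOOK VALUE) + div
        // Prints div:matchesOwn(^EPS \(TTM\))
        System.out.println(substitute(rowField, Map.of("row.match", "EPS \\(TTM\\)")));
    }
}
```

Matcher.quoteReplacement matters here because appendReplacement treats a backslash in the replacement as an escape; without it, the value EPS \(TTM\) would lose its backslashes and the resulting regex would no longer match the literal parentheses.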
When Scoopi processes the row axis, for each member defined in the axis it takes the raw query, replaces %{row.match} with the value of the member's match property and dispatches the query to JSoup. Once JSoup returns the content of the selected element, Scoopi assigns it to the member's value field.
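The per-member loop can be pictured with the following Java sketch. It is an assumption-laden model, not Scoopi's code: JSoup is replaced by a hypothetical stand-in that scans the own texts of the leaf divs from the snippet, only the regex part of the selector (^%{row.match}) is modelled, and a member declared with a value property, like the earlier Price member, is skipped because it needs no query.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of processing a row axis member by member.
// JSoup is replaced by a stand-in that scans the own text of the leaf divs.
class RowAxisSketch {

    // A member as declared in the dataDef; value is assigned during processing.
    static final class Member {
        final String name;
        final String match; // null when the member is declared with value instead
        String value;

        Member(String name, String match, String value) {
            this.name = name;
            this.match = match;
            this.value = value;
        }
    }

    // Stand-in for div:matchesOwn(regex): first own text the regex finds, or null.
    static String selectOwnText(String regex, List<String> ownTexts) {
        Pattern pattern = Pattern.compile(regex);
        for (String text : ownTexts) {
            if (pattern.matcher(text).find()) {
                return text;
            }
        }
        return null;
    }

    // For each member with a match: substitute it, run the query, store the result.
    static void process(String rawRegex, List<Member> members, List<String> ownTexts) {
        for (Member member : members) {
            if (member.match == null) {
                continue; // value property: assigned directly, no query
            }
            String regex = rawRegex.replace("%{row.match}", member.match);
            member.value = selectOwnText(regex, ownTexts);
        }
    }

    public static void main(String[] args) {
        // Own texts of the leaf divs in the HTML snippet, in document order.
        List<String> ownTexts = List.of(
                "MARKET CAP", "382,642.57", "",
                "P/E", "-", "",
                "BOOK VALUE", "27.89", "");
        List<Member> members = List.of(
                new Member("MC", "MARKET CAP", null),
                new Member("BV", "BOOK VALUE", null),
                new Member("Price", null, "Price"));
        process("^%{row.match}", members, ownTexts);
        for (Member m : members) {
            System.out.println(m.name + " = " + m.value);
        }
    }
}
```

Running main prints MC = MARKET CAP, BV = BOOK VALUE and Price = Price: the row axis members end up holding their label text, while the value-based member keeps the value it was declared with.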
Let's see how the member with match: "BOOK VALUE" is handled by each axis, given this HTML snippet:
<div>
  <div>BOOK VALUE</div>
  <div>27.89</div>
  <div></div>
</div>
When the row axis is processed, the selector selects the element <div>BOOK VALUE</div>, since its own text matches, and returns its content, which is simply "BOOK VALUE".
The fact axis uses a slightly modified selector:
fact:
  query:
    region: "div#snapshot"
    field: "div:matchesOwn(^%{row.match}) + div"
The fact selector is the same as the row selector but with a trailing + div. The + div (the CSS adjacent sibling combinator) selects the element immediately after the matched element, <div>27.89</div>, and returns its content, 27.89. The match property comes in handy when a fact and its row label have to be picked from different elements.
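The effect of + div can be modelled without JSoup. In the Java sketch below, which is a hypothetical illustration rather than Scoopi or JSoup code, each inner list holds the own texts of one parent's child divs from the snippet; the method finds the first child whose text the regex matches and returns the text of the child right after it.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of "div:matchesOwn(^%{row.match}) + div":
// each inner list holds the own texts of one parent's child divs, in order.
class FactAxisSketch {

    // Finds the first child whose own text the regex finds and returns the text
    // of the child right after it (the "+ div" step), or null if nothing matches.
    static String nextSiblingText(String regex, List<List<String>> parents) {
        Pattern pattern = Pattern.compile(regex);
        for (List<String> children : parents) {
            // The last child has no next sibling, so it is never a candidate.
            for (int i = 0; i + 1 < children.size(); i++) {
                if (pattern.matcher(children.get(i)).find()) {
                    return children.get(i + 1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<List<String>> snapshot = List.of(
                List.of("MARKET CAP", "382,642.57", ""),
                List.of("P/E", "-", ""),
                List.of("BOOK VALUE", "27.89", ""));
        // The row selector yields the label; the fact selector yields its next sibling.
        System.out.println(nextSiblingText("^BOOK VALUE", snapshot)); // 27.89
        System.out.println(nextSiblingText("^P/E", snapshot));        // -
    }
}
```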
When scraping a limited number of items, as in this example, it is convenient to define members directly with either the value or the match property. When scraping a large number of items, however, the dataDef becomes lengthy; to overcome this, Scoopi provides two more features: indexRange and breakAfter.
The next chapter describes indexRange and how to define a dataDef with it.