Filters

The data extracted in the previous example contains many junk items. Use filters to remove them from the output.

Filters are normally applied after parsing is over and the data has been created. Example 5 applies filters to items/item, as shown in the snippet below:

defs/examples/fin/jsoup/ex-5/job.yml

dataDefs:
  bs:
    query:
      block: "table:contains(Sources Of Funds)"
      selector: "tr:nth-child(%{item.index}) > td:nth-child(%{dim.year.index})"       
    items:  
      - item:
          name: item
          selector: "tr:nth-child(%{index}) > td:nth-child(1)"                        
          index: 5
          breakAfter:
            - "Book Value (Rs)"
          filters: 
            - filter: { type: value, pattern: "" }
            - filter: { type: value, pattern: "Sources Of Funds" }
            - filter: { type: value, pattern: "Application Of Funds" }                   

    dims:  
      - item:
          name: year
          selector: "tr:nth-child(1) > td:nth-child(%{index})"
          indexRange: 2-6

These filters remove the members whose item axis (item) value is:

  • blank
  • null
  • Sources Of Funds
  • Application Of Funds
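To make the effect of the four filters concrete, here is a small Python model of the item-axis filtering in Example 5. It is a sketch, not Scoopi's code: the name `is_filtered` and the sample values are hypothetical, and treating whitespace-only text as blank is an assumption.

```python
# Hypothetical model of the item-axis filters in Example 5; not Scoopi code.
ITEM_FILTER_PATTERNS = ["", "Sources Of Funds", "Application Of Funds"]


def is_filtered(value):
    """Return True if an item-axis value should be removed.

    Assumptions: null (None) and whitespace-only values are treated as
    blank, so the "" pattern catches them; the other patterns are
    compared as plain text.
    """
    normalized = "" if value is None else value.strip()
    return normalized in ITEM_FILTER_PATTERNS


# Sample item-axis values for illustration only.
values = ["Sources Of Funds", "Equity", "", None, "  ",
          "Application Of Funds", "Book Value (Rs)"]
kept = [v for v in values if not is_filtered(v)]
print(kept)  # ['Equity', 'Book Value (Rs)']
```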

Because these filters specify type: value, the pattern is applied to the axis value field. The type property can be either value or match; with type: match, the pattern is compared against the axis match field instead.

The pattern property can be plain text for a simple comparison or a regex for more complex pattern matching.
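The difference between type: value and type: match, and between plain-text and regex patterns, can be modelled roughly as follows. This Python sketch is hypothetical: the `Axis` class, `filter_matches` function and the rule "try plain text first, then a whole-field regex" are assumptions made for illustration, not Scoopi's documented behaviour.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Axis:
    # Assumption: each axis exposes a value field and a match field.
    value: Optional[str]
    match: Optional[str]


def filter_matches(axis, filter_type, pattern):
    """Return True if one filter matches one axis (hypothetical model).

    type "value" tests axis.value and type "match" tests axis.match.
    The pattern is tried as plain text first, then as a regex that must
    match the whole field; this rule is an assumption, not Scoopi's.
    """
    if filter_type not in ("value", "match"):
        raise ValueError(f"unknown filter type: {filter_type}")
    field = axis.value if filter_type == "value" else axis.match
    field = "" if field is None else field
    if field == pattern:
        return True
    try:
        return re.fullmatch(pattern, field) is not None
    except re.error:
        return False  # not a valid regex, and plain text did not match


year = Axis(value="Dec 16", match=None)  # hypothetical axis
print(filter_matches(year, "value", "Dec 16"))     # True (plain text)
print(filter_matches(year, "value", r"Dec \d\d"))  # True (regex)
print(filter_matches(year, "value", "Mar 16"))     # False
print(filter_matches(year, "match", "Dec 16"))     # False (match field is empty)
```

Trying plain text first in this sketch means a label such as Book Value (Rs), whose parentheses are regex metacharacters, still matches itself.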

When a filter matches any axis, the enclosing data item is removed from the output. For example, suppose the filter for the dim axis (year) is

dims:
  - item:
      name: year
      ...
      filters:
        - filter: { type: value, pattern: "Dec 16" }

and a data item has these axis values:

 dim : Dec 16
 item : Equity
 fact : 20.00

Because the dim axis value matches the pattern, the whole data item is removed from the data, even though no filter is specified for the other two axes.
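The rule that one matching axis removes the whole data item can be sketched in Python as below. The function names and the second data item are hypothetical, and only plain-text value patterns are modelled (regex is omitted for brevity).

```python
def axis_matches(value, patterns):
    """Plain-text check only (regex omitted); None counts as blank."""
    text = "" if value is None else value
    return text in patterns


def apply_filters(data_items, filters):
    """Keep only data items where no axis matches a filter for that axis.

    data_items: list of dicts mapping axis name (dim, item, fact) to value.
    filters: dict mapping axis name to a list of value patterns; axes
    without an entry are never checked.
    """
    return [
        record for record in data_items
        if not any(axis_matches(record.get(axis), patterns)
                   for axis, patterns in filters.items())
    ]


data = [
    {"dim": "Dec 16", "item": "Equity", "fact": "20.00"},
    {"dim": "Mar 16", "item": "Equity", "fact": "18.00"},  # hypothetical
]
# Only the dim (year) axis has a filter; item and fact are never checked.
print(apply_filters(data, {"dim": ["Dec 16"]}))
# [{'dim': 'Mar 16', 'item': 'Equity', 'fact': '18.00'}]
```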

The next chapter shows how to find a selector or XPath using the Google Chrome or Firefox browsers, or the Scoopi Query Analyzer tool.