Browser interaction with WebDriver

In the previous chapter, we scraped data from a page whose content is created by JavaScript. In this chapter, we go a step further and show how to interact with WebDriver to automate the browser.

Browser Scrolling and Ajax

Quote Example 2 scrapes quotations from the Scroll page, an infinite-scrolling pagination page that loads the next set of quotes through an Ajax call when we scroll to the bottom of the browser window.

The locator and task snippets from this example are shown below.

defs/examples/quote/jsoup/ex-2/job.yml

locatorGroups:
  quoteGroup:
    locators: [
       { name: quotes, url: "http://quotes.toscrape.com/scroll" }  
    ]

taskGroups:
  quoteGroup:
    quoteTask:
      dataDef: quote
      steps: 
        jsoupDefault:
          loader:
            class: "org.codetab.scoopi.step.extract.DomLoader"
            previous: seeder 
            next: parser
            plugins: [
              plugin: { 
                name: script, 
                class: "org.codetab.scoopi.plugin.script.BasicScript",
                script: "/defs/examples/quote/jsoup/ex-2/script.js",
                entryPoint: "execute", }                            
            ]

The quoteTask uses the jsoupDefault steps and overrides the loader step to use the DomLoader class, as we saw in the last chapter. In this example, however, we add a script plugin that calls the execute function defined in the file defs/examples/quote/jsoup/ex-2/script.js.

In this script, we place the Selenium WebDriver and JavaScript code that scrolls the window, as shown below.

defs/examples/quote/jsoup/ex-2/script.js

function execute(webDriver) {

    // map Selenium classes (other than WebDriver) to JavaScript variables
    var Select = Java.type('org.openqa.selenium.support.ui.Select');
    var By = Java.type('org.openqa.selenium.By');

    var pagesToScroll = 4;

    while (true) {

       // scroll to the bottom of the page to trigger the Ajax call
       webDriver
          .executeScript("window.scrollTo(0, document.body.scrollHeight)");

       // wait 500 ms so that the DOM is loaded
       webDriver
          .executeAsyncScript("window.setTimeout(arguments[arguments.length - 1], 500);");

       // stop once enough quotes are loaded (10 quotes per page)
       var eles = webDriver.findElements(By.className("quote"));
       if (eles.size() >= pagesToScroll * 10) {
          break;
       }
    }

}

When the DomLoader step calls the script's execute() function, it passes an instance of WebDriver. In the while loop, the WebDriver method executeScript() scrolls the browser window down, which triggers the page's Ajax call to fetch the quotes for the next page; the executeAsyncScript() method then waits 500 ms so that the DOM is loaded. After that, the findElements() method selects and returns the list of HTML elements with the class name quote. The while loop breaks when the list size reaches 40 or more (pagesToScroll * 10).
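The loop above keeps running if the page stops returning quotes before the target count is reached, for example when the site has fewer pages than pagesToScroll. A minimal sketch of a bounded variant follows. It is not part of the Scoopi example: the names scrollUntil and makeFakeDriver, the maxScrolls cap and the stub driver are assumptions introduced for illustration, and the locator is passed in as a parameter so that the loop can be tried outside Scoopi.

```javascript
// Hypothetical helper, not part of Scoopi: scroll until at least
// targetCount elements match the locator or maxScrolls is reached.
function scrollUntil(webDriver, locator, targetCount, maxScrolls) {
    var count = 0;
    for (var i = 0; i < maxScrolls; i++) {
        // scroll to the bottom to trigger the page's Ajax call
        webDriver.executeScript("window.scrollTo(0, document.body.scrollHeight)");
        // wait 500 ms; the callback passed as the last argument ends the async script
        webDriver.executeAsyncScript("window.setTimeout(arguments[arguments.length - 1], 500);");
        count = webDriver.findElements(locator).size();
        if (count >= targetCount) {
            break;
        }
    }
    return count;
}

// Stub standing in for WebDriver (an assumption for illustration):
// the page starts with up to 10 quotes and each scroll loads 10 more,
// until totalQuotes is reached.
function makeFakeDriver(totalQuotes) {
    var loaded = Math.min(10, totalQuotes);
    return {
        scrolls: 0,
        executeScript: function (script) {
            this.scrolls++;
            loaded = Math.min(loaded + 10, totalQuotes);
        },
        executeAsyncScript: function (script) { /* no real waiting in the stub */ },
        findElements: function (locator) {
            // mimic java.util.List, which exposes size()
            return { size: function () { return loaded; } };
        }
    };
}

var loadedCount = scrollUntil(makeFakeDriver(100), "quote", 40, 10);
```

Inside the execute() function, such a helper could be called as scrollUntil(webDriver, By.className("quote"), pagesToScroll * 10, 20), where the cap of 20 is an arbitrary choice. The returned count lets the caller detect that the target was not reached.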

The script engine can call any Java method of the WebDriver instance directly. To use any other Selenium class, such as Select or By, we need to map the class to a JavaScript variable with a Java.type() call, as done at the start of the script.
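As an illustration of how a mapped class might be used, the sketch below picks a dropdown option through Selenium's Select class. It is only a sketch under stated assumptions: the helper name chooseOption, the types parameter, the element id author and the option text are hypothetical, and the scroll page in this example has no dropdown. The Selenium classes are passed in through types so that stubs can replace them outside Scoopi.

```javascript
// Hypothetical helper, not part of Scoopi: choose a dropdown option by its visible text.
// `types` carries the Selenium classes; inside a Scoopi script it could be built as
//   { By: Java.type('org.openqa.selenium.By'),
//     Select: Java.type('org.openqa.selenium.support.ui.Select') }
function chooseOption(webDriver, types, elementId, visibleText) {
    // locate the <select> element by its id
    var element = webDriver.findElement(types.By.id(elementId));
    // wrap it in Select, which provides the option-picking methods
    var dropdown = new types.Select(element);
    dropdown.selectByVisibleText(visibleText);
    return dropdown;
}

// Stubs standing in for By, Select and WebDriver (assumptions for illustration),
// so the call sequence can be tried outside Scoopi.
var stubTypes = {
    By: { id: function (id) { return "id=" + id; } },
    Select: function (element) {
        this.element = element;
        this.chosen = null;
        this.selectByVisibleText = function (text) { this.chosen = text; };
    }
};
var stubDriver = {
    findElement: function (locator) { return { locator: locator }; }
};

var picked = chooseOption(stubDriver, stubTypes, "author", "Albert Einstein");
```

With the stubs, picked records which element was located and which option text was chosen; with the real classes, the same calls would go to Selenium's By.id(), the Select constructor and selectByVisibleText().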

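WebDriver has an API that is easy to explore and understand, with methods to select elements, navigate pages and execute page scripts. Refer to the Selenium WebDriver Documentation (https://www.seleniumhq.org/docs/03_webdriver.jsp#introducing-the-selenium-webdriver-api-by-example) to learn more about it.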

In the next section, we explain features to manage Scoopi.