If you run applications with Docker, the easiest way to install Gotz is to pull its image from DockerHub and run it straight away, with the added convenience that it comes with MariaDB preconfigured. If you are not using Docker, you can download a release from GitHub instead. Both options are explained here.

Install Gotz from Docker Image

Gotz releases are available as a Docker image on DockerHub. To run the Gotz container, Docker must be installed on the system; to run it with a database, Docker Compose is also needed. The download size is about 120MB for Gotz and 130MB for MariaDB.

The following command downloads the Gotz Docker image, creates a container named gotz and runs it.

docker run --name gotz codetab/gotz

It executes example 1 and writes one line of data to output/data.txt. However, we cannot view the output file or modify the conf files, as they are inside the container. To overcome this, we externalize these folders with the following commands.

mkdir gotz
cd gotz
docker cp gotz:/gotz/conf .
docker cp gotz:/gotz/output .
docker cp gotz:/gotz/docker .
docker cp gotz:/gotz/defs .
docker cp gotz:/gotz/logs .

First, we make a folder named gotz and then copy the conf, output, docker, defs and logs folders from the container into it. This allows us to modify the conf and defs files and to view the output file without logging into the container. We can now remove the container, as we are going to recreate it with a new set of parameters.

docker rm gotz

Let’s run example 11 to output more data. To do that, edit the conf/gotz.properties file, set the beanFile property to gotz.beanFile=/defs/examples/jsoup/ex-11/bean.xml and run Gotz.

docker run --name gotz --rm -p 9010:9010 -v "$PWD"/defs:/gotz/defs -v "$PWD"/conf:/gotz/conf -v "$PWD"/output:/gotz/output codetab/gotz

Above, we mount the externalized folders using the -v option. When the container runs, it uses the definitions from jsoup/ex-11 and, on completion, we should have a new data.txt file in the output folder with 281 lines of data.
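To confirm the run, the line count of the output file can be checked from the host. The following is a minimal sketch; the helper name count_output_lines is hypothetical and not part of Gotz, and it assumes the command is run from the gotz folder created earlier.

```shell
#!/bin/sh
# count_output_lines [FILE]: hypothetical helper, not part of Gotz.
# Reports how many lines FILE contains; defaults to the externalized
# output file used in this section.
count_output_lines() {
    file=${1:-output/data.txt}
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    # grep -c '' counts every line, including a last line without a newline
    echo "$file: $(grep -c '' "$file") lines"
}

# Run only when the default file is present in the current folder.
if [ -f output/data.txt ]; then
    count_output_lines
fi
```

The reported count can then be compared with the number of lines expected for the example that was run.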

Gotz comes with an Angular dashboard that displays internal metrics of the app; it can be accessed at http://localhost:9010 while Gotz is running.

[Figure: Gotz metrics dashboard]

Gotz with MariaDB

To use MariaDB as the datastore, we need Docker Compose, which runs Gotz and MariaDB in separate containers. First, move docker-compose.yml to the gotz folder.

cd gotz
mv docker/docker-compose.yml .

Next, edit conf/gotz.properties and set the useDatastore property to gotz.useDatastore=true. As Gotz needs to connect to the database running in another container, we also need to change the database connection URL. Edit conf/jdoconfig.properties and set ConnectionURL to javax.jdo.option.ConnectionURL=jdbc:mariadb://db:3306/gotz. Once the configuration is ready, start the database.

docker-compose up db

Docker downloads the latest MariaDB image and runs it as a container. On the first run, it also initializes the database, creates users and grants the required privileges. It creates a new folder named data, which houses the MariaDB data files. Once the database is up and running, stop it with Ctrl-C.

With that, the one-time setup is complete. From now on, we start Gotz with MariaDB using the following command.

docker-compose up --abort-on-container-exit
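If Gotz fails to reach the database, it may help to confirm that the two settings changed in this section are in place. The sketch below is only an illustration: the helper names has_line and check_datastore_conf are hypothetical, and the check assumes each property is written on its own line exactly as shown above, with no spaces around the equals sign.

```shell
#!/bin/sh
# has_line FILE LINE: succeed when FILE contains LINE as a whole line.
# Hypothetical helper, not part of Gotz.
has_line() {
    grep -Fqx -- "$2" "$1"
}

# check_datastore_conf [CONF_DIR]: verify the two settings this section
# changes. Returns 0 when both are present, 1 otherwise.
check_datastore_conf() {
    dir=${1:-conf}
    status=0
    has_line "$dir/gotz.properties" 'gotz.useDatastore=true' || {
        echo "gotz.useDatastore=true not found in $dir/gotz.properties" >&2
        status=1
    }
    has_line "$dir/jdoconfig.properties" \
        'javax.jdo.option.ConnectionURL=jdbc:mariadb://db:3306/gotz' || {
        echo "expected ConnectionURL not found in $dir/jdoconfig.properties" >&2
        status=1
    }
    return "$status"
}
```

From the gotz folder, calling check_datastore_conf conf prints a message for each setting that is missing and returns a non-zero status in that case.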

Install Gotz from GitHub

Alternatively, install Gotz either by downloading the release package, which contains all dependencies, or by building the source code with Maven. In this case, if we wish to run Gotz with datastore support, we have to manually install and configure a database such as MariaDB or HSQLDB. We explain the HSQLDB installation in a later chapter on persistence.

Download and install the Release package

Download the latest release zip file gotz-x.x.x-production.zip from GitHub Releases and extract the zip file to some location.

Download and build the Source

Alternatively, we can download the Gotz source code zip from GitHub. To build it, extract it somewhere and from the project root folder run

mvn package -DskipTests

Maven compiles the source, downloads the dependencies and packages the app as gotz-x.x.x-production.zip in the target folder. Extract target/gotz-x.x.x-production.zip to some location.

Download and install JRE 8 or above

Gotz requires JRE 8 or above to run. It is tested with both OpenJDK and Oracle Java SE.
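A quick way to see whether the installed Java meets this requirement is to read the version string printed by java -version. The following is a minimal sketch; the function names are hypothetical, and it assumes the version appears in double quotes in the output, as it does for common OpenJDK and Oracle builds.

```shell
#!/bin/sh
# java_major VERSION: print the major version for strings such as
# "1.8.0_292" (old scheme, major is 8) or "11.0.2" (new scheme, major is 11).
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.} ;;      # old scheme: drop the leading "1."
    esac
    v=${v%%[!0-9]*}            # keep only the leading digits
    echo "$v"
}

# installed_java_version: print the quoted version from `java -version`,
# which writes its report to stderr.
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(installed_java_version)")
    if [ "${major:-0}" -ge 8 ]; then
        echo "Java $major found"
    else
        echo "Java 8 or above is required" >&2
    fi
else
    echo "java not found on PATH" >&2
fi
```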

 
 

Quick start

Go to the extracted folder of gotz-x.x.x-production.zip. The directory structure is as below.

gotz-0.9.0-beta/
├── conf
│   ├── gotz.properties
│   ├── jdoconfig.properties
│   ├── log4j.properties
│   └── logback.xml
├── defs
│   └── examples
│       ├── jsoup
│       ├── htmlunit
│       └── page
├── gotz.bat
├── gotz.sh
└── lib
    ├── gotz-0.9.0-beta.jar
    └── ....

The application jar file, gotz-x.x.x.jar, is in the lib folder along with the other dependencies.

The conf folder holds the configuration files; the main configuration file is gotz.properties. By default, the following two properties are defined.

gotz.beanFile=/defs/examples/jsoup/ex-1/bean.xml
gotz.useDatastore=false

The property gotz.beanFile points to example 1, which is loaded when we run Gotz. The other property, gotz.useDatastore, is set to false, which allows us to run Gotz without setting up a database. In a later chapter, we show how to set up a database and use it to persist Gotz objects. Until then, leave it set to false.

Let’s run Gotz and check the installation.

cd gotz-x.x.x
./gotz.sh               # gotz.bat on Windows

It starts GotzEngine, loads the files defined in the defs/examples/jsoup/ex-1 folder and writes data to the output/data.txt file.

As we progress through the guide, we cover examples one by one. To load other examples, modify the gotz.beanFile property in conf/gotz.properties and run Gotz.
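Switching examples by hand means editing the same line each time, so a small helper can do it from the command line. This is a sketch only: set_property is a hypothetical function, not part of Gotz, and it assumes properties are written as key=value with no spaces around the equals sign.

```shell
#!/bin/sh
# set_property FILE KEY VALUE: hypothetical helper, not part of Gotz.
# Rewrites the line "KEY=..." in FILE to "KEY=VALUE", or appends the
# line when KEY is not present. Other lines are left unchanged.
set_property() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file in place
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example: switch to example 11 when run from the Gotz folder.
if [ -f conf/gotz.properties ]; then
    set_property conf/gotz.properties gotz.beanFile /defs/examples/jsoup/ex-11/bean.xml
fi
```

Values containing backslashes are not handled, since awk interprets escape sequences in -v assignments; the bean file paths used in this guide contain none.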

 
 

Examples

Gotz comes with a set of example definition files: ex-1 to ex-14. The examples use HTML pages from the examples/page folder and extract data from them. The pages contain financial data of a company, such as the Balance Sheet, Profit and Loss Account and share price. Each example builds on the previous one, so that the major portion of the definitions remains the same throughout this guide, which makes the concepts easier to follow.

Examples come in two flavors: JSoup, which uses selectors to query data, and HtmlUnit, which uses XPath queries.

This guide focuses on the JSoup examples, as JSoup is easy to use and light on memory. The HtmlUnit examples are the same as the JSoup ones but use XPath for queries.

In the next chapter, we start with Example 1.