Scoopi Cluster Containers

We can also run Scoopi Cluster as plain Docker containers.

Install Scoopi from Docker Image

Scoopi releases are available as a Docker image on DockerHub. To run the image you need Docker installed on the system. The following command pulls the Scoopi image, then creates and runs a container named scoopi.

docker run --name scoopi codetab/scoopi

It executes the quick-start example, which writes a single record to an output file. However, we can neither view the output file nor modify the conf files, as they are inside the container. We need to externalize these folders with the following commands.

mkdir scoopi
cd scoopi
docker cp scoopi:/scoopi/conf .
docker cp scoopi:/scoopi/output .
docker cp scoopi:/scoopi/docker .
docker cp scoopi:/scoopi/defs .
docker cp scoopi:/scoopi/logs .

Here, we make a folder named scoopi and then copy the conf, output, docker, defs and logs folders from the container to the scoopi folder. Now we can modify the conf and def files, and view the output file, without logging into the container. Next, remove the container, as we are going to recreate it with a new set of parameters.

docker rm scoopi

By default, Scoopi runs in solo mode. Change to cluster mode via the scoopi.cluster.enable config in the conf/scoopi.properties file. Optionally, change scoopi.defs.dir from the quick-start example to ex-13. The scoopi.properties file is in the conf directory we copied above.


scoopi.cluster.enable=true

scoopi.defs.dir=/defs/examples/fin/jsoup/ex-13
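Both edits can also be applied non-interactively with sed. A minimal sketch; it is run here against a demo copy of the file so the commands can be tried anywhere (in a real install, substitute conf/scoopi.properties for demo.properties; the default values written below are placeholders):

```shell
# Demo copy standing in for conf/scoopi.properties (placeholder defaults)
printf '%s\n' 'scoopi.cluster.enable=false' \
    'scoopi.defs.dir=/defs/examples/quickstart' > demo.properties

# Enable cluster mode
sed -i 's|^scoopi\.cluster\.enable=.*|scoopi.cluster.enable=true|' demo.properties

# Switch the defs dir to the ex-13 example
sed -i 's|^scoopi\.defs\.dir=.*|scoopi.defs.dir=/defs/examples/fin/jsoup/ex-13|' demo.properties

cat demo.properties
```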

Docker Compose

It is quite easy to run a Scoopi cluster with Docker Compose. The docker folder contains docker-cluster.yml, which boots up a Scoopi cluster with three servers. Change to the folder where you installed Scoopi from the Docker image and run


cd scoopi
cp docker/docker-cluster.yml .

docker-compose -f docker-cluster.yml up
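The shipped docker-cluster.yml is the authoritative file. Purely as an illustration, a three-server compose file along these lines (service names and exact options here are assumptions, mirroring the docker run commands shown later) could look like:

```yaml
version: "3"
services:
  scoopi-node-1: &node
    image: codetab/scoopi:latest
    environment:
      JAVA_OPTS: "-Dscoopi.cluster.config.file=/hazelcast-multicast.xml"
    volumes:
      - ./conf:/scoopi/conf
      - ./defs:/scoopi/defs
      - ./logs:/scoopi/logs
      - ./output:/scoopi/output
      - ./data:/scoopi/data
  scoopi-node-2: *node
  scoopi-node-3: *node
```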

In case Docker Compose is not available, we can bring up the cluster with the docker run command, as explained in the next section.

With Docker Run

Install Scoopi from the Docker image and update scoopi.properties as explained above. By default, three Scoopi containers have to be run to form the cluster. Open a terminal and run the following command. Wait till the cluster boots up and shows the message: wait for cluster quorum of 3 nodes, timeout 60.


NODE_NAME=scoopi-node-1
JAVA_OPTS="-Dscoopi.cluster.config.file=/hazelcast-multicast.xml"

docker run --name $NODE_NAME -d \
    -v $PWD/conf:/scoopi/conf -v $PWD/defs:/scoopi/defs \
    -v $PWD/logs:/scoopi/logs -v $PWD/output:/scoopi/output \
    -v $PWD/data:/scoopi/data \
    -e JAVA_OPTS="$JAVA_OPTS" codetab/scoopi:latest

Next, run the above command, changing NODE_NAME to scoopi-node-2. Start the third instance with metrics enabled, using the following command.


NODE_NAME=scoopi-node-3
JAVA_OPTS="-Dscoopi.cluster.config.file=/hazelcast-multicast.xml"
JAVA_OPTS+=" -Dscoopi.metrics.server.enable=true"

docker run --name $NODE_NAME -d \
    -v $PWD/conf:/scoopi/conf -v $PWD/defs:/scoopi/defs \
    -v $PWD/logs:/scoopi/logs -v $PWD/output:/scoopi/output \
    -v $PWD/data:/scoopi/data \
    -e JAVA_OPTS="$JAVA_OPTS" codetab/scoopi:latest

Once the third instance boots up, the cluster is formed; Scoopi proceeds to complete the quick-start example and the output folder will contain the scraped data.

For clients, we need to update the conf/hazelcast-client.xml file with the bridge network the Docker containers use to connect to each other. When you run a server instance, the IP:Port it uses is printed to the console at startup, something like: this member address: /172.17.0.2:5701. Update the network element in conf/hazelcast-client.xml as


    <network>
        <cluster-members>
                <address>172.17.0.2</address>
                <address>172.17.0.3</address>
                <address>172.17.0.4</address>
        </cluster-members>
    </network>
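The addresses for this list can be pulled from the startup logs rather than typed by hand. A small sketch, assuming each server prints a line of the form shown above; it is run here against a sample line, but in a real install you would pipe in docker logs scoopi-node-1 instead:

```shell
# Sample startup line standing in for the real container logs
line='this member address: /172.17.0.2:5701'

# Extract the bare IP: match the /IP: token, then strip the / and :
echo "$line" | grep -o '/[0-9.]*:' | tr -d '/:'    # prints 172.17.0.2
```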

The subnet may vary on your system; use the subnet of your Docker bridge network. Now run the client instance with


NODE_NAME=scoopi-node-4
JAVA_OPTS="-Dscoopi.cluster.mode=client"
JAVA_OPTS+=" -Dscoopi.cluster.config.file=/hazelcast-client.xml"

docker run --name $NODE_NAME -d \
    -v $PWD/conf:/scoopi/conf -v $PWD/defs:/scoopi/defs \
    -v $PWD/logs:/scoopi/logs -v $PWD/output:/scoopi/output \
    -v $PWD/data:/scoopi/data \
    -e JAVA_OPTS="$JAVA_OPTS" codetab/scoopi:latest

For more about configuring the Hazelcast network in various scenarios, refer to Configuring Hazelcast in non-orchestrated Docker environments.