Scoopi Cluster

Scoopi Cluster scales horizontally by distributing tasks across the cluster nodes. It is designed to run in a variety of environments: on a bare JVM, in Docker containers, or on container orchestration platforms such as Kubernetes. Scoopi Cluster uses Hazelcast IMDG, a fault-tolerant distributed in-memory computing platform, as its clustering library.

Let’s see Scoopi Cluster in action with multiple JVMs on a single host.


Download the latest release zip file from the GitHub Scoopi Releases page and extract it to a convenient location.

By default, Scoopi runs in solo mode. Switch to cluster mode via the scoopi.cluster.enable config in conf/. Optionally, change scoopi.defs.dir from the quickstart example to ex-13.
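
With those changes, the relevant configuration entries would look roughly like the sketch below. The exact file name under conf/ and the precise path to ex-13 are assumptions; check the files shipped in your release.

```properties
# enable cluster mode (solo mode is the default)
scoopi.cluster.enable=true

# optional: run example ex-13 instead of the quickstart example
# (the actual location of ex-13 in your release may differ)
scoopi.defs.dir=/defs/examples/ex-13
```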



By default, three instances of Scoopi must be running to form the cluster. Open two terminals, change to the folder where the Scoopi release zip was extracted, and run the following command in each terminal. Wait until each node boots up and shows the message: wait for cluster quorum of 3 nodes, timeout 60.

java -cp module/*:conf:. $JAVA_OPTS org.codetab.scoopi.Scoopi

Scoopi will not proceed further, as we have started only two instances. Open one more terminal, change to the Scoopi installation folder, and run the following command.

JAVA_OPTS="-Dscoopi.cluster.config.file=/hazelcast-tcp.xml "
java -cp module/*:conf:. $JAVA_OPTS org.codetab.scoopi.Scoopi

This starts the third instance with the metrics server enabled. Once it boots up, the cluster is formed; Scoopi proceeds to complete the example, and the output folder will contain the scraped data. The metrics server is optional, and you can just as well start the third instance with it disabled.
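
For orientation, a Hazelcast TCP/IP join configuration of the kind referenced by hazelcast-tcp.xml is typically a minimal network section that disables multicast and lists member addresses. The sketch below is illustrative only; the addresses are placeholders, and the actual file bundled with the release is authoritative.

```xml
<!-- minimal Hazelcast config using TCP/IP discovery instead of multicast;
     the member address is a placeholder for a single-host, multi-JVM setup -->
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <network>
    <join>
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member-list>
          <member>127.0.0.1</member>
        </member-list>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```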

Once you have the three-node cluster up and running, you can start any number of lightweight client instances with the following command.

JAVA_OPTS+=" -Dscoopi.cluster.config.file=/hazelcast-client.xml"
java -cp module/*:conf:. $JAVA_OPTS org.codetab.scoopi.Scoopi

Clients are lightweight Hazelcast instances that don’t hold any distributed data structures; instead, they fetch jobs from the servers and run them. Servers, on the other hand, hold and distribute jobs among the other servers in the cluster and also run scrape jobs.
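
A Hazelcast client configuration of the kind referenced by hazelcast-client.xml is usually little more than the list of server addresses the client should connect to. The sketch below assumes a recent Hazelcast schema and placeholder addresses and cluster name; the hazelcast-client.xml shipped with the release is the reference.

```xml
<!-- minimal Hazelcast client config: the client holds no data and simply
     connects to the listed server nodes (addresses are placeholders) -->
<hazelcast-client xmlns="http://www.hazelcast.com/schema/client-config">
  <cluster-name>dev</cluster-name>
  <network>
    <cluster-members>
      <address>127.0.0.1:5701</address>
      <address>127.0.0.1:5702</address>
      <address>127.0.0.1:5703</address>
    </cluster-members>
  </network>
</hazelcast-client>
```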