Dashboard

Scoopi Dashboard is a nice little Angular web app that displays useful information such as system, task and pool statistics.

The screenshot below shows the Scoopi dashboard.

[Figure: Scoopi Dashboard]

An embedded Jetty web server serves the dashboard, which is available at http://localhost:9010 while Scoopi is running. You can keep the dashboard open even after Scoopi finishes and reuse it for the next run. It stops when Scoopi terminates and refreshes itself once it detects that Scoopi is running again.
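The dashboard server is embedded in Scoopi, so a plain HTTP request to the dashboard URL can tell a script whether Scoopi is currently up. The Java sketch below illustrates this with the JDK's java.net.http client (Java 11+).

- The class and method names are hypothetical.
- It does not reproduce how the dashboard itself detects that Scoopi is running again.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class DashboardProbe {

    // Hypothetical helper: returns true if any HTTP response comes back from
    // the URL within the timeout. A refused connection or a timeout is
    // treated as "not running".
    static boolean isUp(String url, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
            return true;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost:9010"; // dashboard address from this page
        boolean up = isUp(url, Duration.ofSeconds(2));
        System.out.println(url + (up ? " is up" : " is down"));
    }
}
```

Any HTTP response, whatever its status code, counts as "up", because it proves the embedded server is listening.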

System Stats

It shows information such as uptime and JVM memory usage. The system load average is useful when adjusting pool sizes, as explained below.
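Figures similar to those in System Stats can be read from the JVM's standard management beans. This Java sketch is not Scoopi's code. It prints uptime, heap usage and the load average.

- It also divides the load average by the number of available processors. That per-CPU normalization is an assumption added here so the number is comparable across machines.
- On platforms that do not report a load average, getSystemLoadAverage() returns a negative value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.time.Duration;

public class SystemStats {

    // Load average per available processor; values near or above 1.0 mean
    // the CPUs are saturated. Returns -1 when no load average is available.
    static double loadPerCpu(double loadAverage, int cpus) {
        if (loadAverage < 0 || cpus <= 0) {
            return -1;
        }
        return loadAverage / cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Runtime rt = Runtime.getRuntime();

        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024);
        double load = os.getSystemLoadAverage(); // negative if unsupported
        int cpus = os.getAvailableProcessors();

        System.out.println("uptime      : " + Duration.ofMillis(uptimeMs));
        System.out.println("heap used   : " + usedMb + " MB of " + maxMb + " MB");
        System.out.println("load average: " + load + " on " + cpus + " CPUs");
        System.out.println("load per CPU: " + loadPerCpu(load, cpus));
    }
}
```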

Page Stats

The Page Stats table displays counts such as the number of pages fetched from the web, the number of pages parsed, and the number of pages whose existing data was reused without parsing them again.

Task Execution Time

Task execution time shows the minimum, maximum and mean time taken by tasks. The task to watch is the parser, as parsing a page is CPU-intensive.
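The JDK's LongSummaryStatistics computes the minimum, maximum and mean in a single pass, which illustrates what these columns summarize. The durations below are made-up parser timings, and the class is a hypothetical sketch rather than the dashboard's implementation.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

public class TaskTimes {

    // Summarises task durations (in milliseconds) into count, min, max and mean.
    static LongSummaryStatistics summarize(List<Long> durationsMs) {
        return durationsMs.stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical parser task durations in milliseconds.
        List<Long> parserMs = List.of(120L, 95L, 310L, 140L);
        LongSummaryStatistics s = summarize(parserMs);
        System.out.printf("parser: n=%d min=%d max=%d mean=%.1f ms%n",
                s.getCount(), s.getMin(), s.getMax(), s.getAverage());
    }
}
```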

Pools

The Pools table shows statistics that are useful for adjusting pool sizes. The default pool sizes are:

start - 4
seeder - 6
loader - 4
parser - 4
process - 4
converter - 4
appender - 2

These defaults are fine as long as there are fewer than 50 pages. When there are more pages, adjust the pool sizes for better performance.

When the pools are properly tuned, a spare thread picks up each task as soon as it is queued. Normally the gap between the queued and finished columns in the Tasks table is small. When the queued count is substantially higher than the finished count, many tasks are waiting in the queue to run.
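The queued/finished comparison can be expressed as a small check. This hypothetical Java sketch works as follows.

- The backlog is the queued count minus the finished count, which covers tasks that are running or still waiting.
- A pool is flagged when its backlog exceeds a few times its thread count. That factor is an assumed rule of thumb, not a Scoopi setting.

```java
public class PoolBacklog {

    // Tasks queued but not yet finished (running plus waiting).
    static long backlog(long queued, long finished) {
        return Math.max(0, queued - finished);
    }

    // Hypothetical rule of thumb: a pool looks undersized when its backlog
    // exceeds `factor` times its thread count, i.e. tasks are waiting rather
    // than being taken up by spare threads.
    static boolean looksUndersized(long queued, long finished, int poolSize, int factor) {
        return backlog(queued, finished) > (long) poolSize * factor;
    }

    public static void main(String[] args) {
        // Hypothetical counts as they might appear in the Tasks table.
        System.out.println(looksUndersized(5000, 4200, 4, 2)); // true: 800 unfinished
        System.out.println(looksUndersized(5000, 4995, 8, 2)); // false: keeping up
    }
}
```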

With more pages, you will notice tasks waiting in the queue, especially in the parser and start pools. For example, when Scoopi parses about 5000 pages on a low-end system with a Core 2 Duo processor, the optimal pool size is 6 for parser and 8 for start. These sizes are set in the conf/scoopi.properties configuration file as follows.

conf/scoopi.properties

scoopi.poolsize.start=8
scoopi.poolsize.parser=6
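This Java sketch shows how a default-plus-override lookup of these settings could work. It is an illustration only, not the way Scoopi itself loads its configuration.

- It assumes every pool follows the scoopi.poolsize.<pool> key pattern seen above. This document only shows the start and parser keys.
- The default sizes are taken from the list in the Pools section.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class PoolConfig {

    // Default pool sizes as listed in the Pools section.
    static final Map<String, Integer> DEFAULTS = new TreeMap<>(Map.of(
            "start", 4, "seeder", 6, "loader", 4, "parser", 4,
            "process", 4, "converter", 4, "appender", 2));

    // Loads the properties file if it exists; otherwise returns an empty set.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (Reader reader = Files.newBufferedReader(file)) {
                props.load(reader);
            }
        }
        return props;
    }

    // An override under the assumed key scoopi.poolsize.<pool> wins;
    // otherwise the default is used. `pool` must be a key of DEFAULTS.
    static int poolSize(Properties props, String pool) {
        String value = props.getProperty("scoopi.poolsize." + pool);
        return value == null ? DEFAULTS.get(pool) : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(Path.of("conf/scoopi.properties"));
        for (String pool : DEFAULTS.keySet()) {
            System.out.println(pool + " = " + poolSize(props, pool));
        }
    }
}
```

Run from the directory that contains conf/, it prints the effective size of every pool. With the two settings above, start resolves to 8, parser resolves to 6, and the rest keep their defaults.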

On such a system, when the parser pool size is greater than 6, the system load average shoots up and Scoopi's performance suffers. On a more powerful system with recent processors, or on a server-class machine, the pool size can be increased further as long as the load average stays within limits.
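The tuning procedure above can be written as a simple step function: raise the pool size while tasks queue up, and back off once the load average climbs too high. This is a hypothetical heuristic with the following assumptions.

- The document does not define the load limit, so the 1.0 load-per-CPU threshold used here is an assumption.
- The numbers in main are illustrative.

```java
public class PoolTuner {

    // Hypothetical heuristic: grow the pool by one thread while tasks are
    // backing up and load per CPU is below maxLoadPerCpu; shrink it by one
    // once load per CPU exceeds that limit; otherwise leave it unchanged.
    static int nextPoolSize(int current, boolean backlogGrowing,
                            double loadAverage, int cpus, double maxLoadPerCpu) {
        if (loadAverage < 0 || cpus <= 0) {
            return current; // load average not available on this platform
        }
        double loadPerCpu = loadAverage / cpus;
        if (loadPerCpu > maxLoadPerCpu && current > 1) {
            return current - 1;
        }
        if (backlogGrowing && loadPerCpu < maxLoadPerCpu) {
            return current + 1;
        }
        return current;
    }

    public static void main(String[] args) {
        // A dual-core machine with an assumed limit of 1.0 load per CPU.
        System.out.println(nextPoolSize(6, true, 1.2, 2, 1.0)); // 7: headroom left
        System.out.println(nextPoolSize(7, true, 2.6, 2, 1.0)); // 6: overloaded
    }
}
```

Changing the size by one step at a time mirrors the manual approach: observe the Tasks table and the load average after each change before adjusting again.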