Scoopi Dashboard is a nice little Angular web app that displays useful information such as system, task and pool stats.
(Screenshot of the Scoopi dashboard.)
An embedded Jetty web server serves the dashboard, and we can access it at
http://localhost:9010 while Scoopi is running. The dashboard can be kept
open even after Scoopi finishes and reused for the next run: it stops
updating when Scoopi terminates and refreshes itself once it detects that
Scoopi is running again.
The system stats show information such as uptime and JVM memory usage. The system load average is useful for adjusting pool sizes, as explained below.
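To make these figures concrete, here is a minimal Java sketch (not Scoopi's own code) that reads comparable values from the standard java.lang.management API: uptime, used heap memory, and the system load average divided by the number of cores. The class name SystemStats is hypothetical. Per the JDK documentation, getSystemLoadAverage() returns a negative value when the platform does not provide a load average (typically on Windows).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reads system stats similar to those on the dashboard.
public class SystemStats {

    // Load average divided by core count; -1.0 if the platform reports none.
    static double loadPerCore() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when unavailable
        if (load < 0) {
            return -1.0;
        }
        return load / os.getAvailableProcessors();
    }

    // Heap memory currently in use by this JVM, in bytes.
    static long usedMemoryBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.printf("uptime (ms): %d%n", ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("used memory (bytes): %d%n", usedMemoryBytes());
        System.out.printf("load average per core: %.2f%n", loadPerCore());
    }
}
```

A load per core that stays well above 1.0 usually means more runnable threads than cores, which is the signal the pool tuning below relies on.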
The Page Stats table displays the number of pages fetched from the web, the number of pages parsed, the existing data reused without parsing the pages, etc.
Task Execution Time
Task execution time provides insight into the minimum, maximum and mean time taken by tasks. The task to watch out for is the parser, as parsing pages takes a lot of CPU.
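The minimum, maximum and mean times shown in this table are ordinary summary statistics over recorded task durations. The following sketch is a hypothetical illustration using the JDK's LongSummaryStatistics, not Scoopi's implementation; the class TaskTimeStats and the stand-in "parse" workload in main are made up for the example.

```java
import java.util.LongSummaryStatistics;
import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;

// Hypothetical helper: times tasks and summarizes their execution times.
public class TaskTimeStats {

    // Runs a task and returns its wall-clock execution time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Minimum, maximum and mean over a set of recorded execution times.
    static LongSummaryStatistics summarize(long... elapsedMillis) {
        return LongStream.of(elapsedMillis).summaryStatistics();
    }

    public static void main(String[] args) {
        long[] times = new long[5];
        for (int i = 0; i < times.length; i++) {
            // Stand-in for a CPU-heavy parser task: split a large string.
            times[i] = timeMillis(() -> {
                String page = "x,".repeat(200_000);
                page.split(",");
            });
        }
        LongSummaryStatistics stats = summarize(times);
        System.out.printf("min=%d ms max=%d ms mean=%.1f ms%n",
                stats.getMin(), stats.getMax(), stats.getAverage());
    }
}
```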
The Pools table shows statistics that are useful for adjusting pool sizes. By default, the pool sizes are:

start - 4
seeder - 6
loader - 4
parser - 4
process - 4
converter - 4
appender - 2
These sizes are fine as long as there are fewer than 50 pages, but with more pages the pool sizes need to be adjusted for better performance.
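The default layout can be pictured as one fixed-size thread pool per task type. The sketch below is only an illustration built on java.util.concurrent; the class TaskPools and its methods are hypothetical, and Scoopi's internal pool setup may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: one fixed thread pool per task type.
public class TaskPools {

    // Default sizes as listed in the documentation.
    static Map<String, Integer> defaultSizes() {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("start", 4);
        sizes.put("seeder", 6);
        sizes.put("loader", 4);
        sizes.put("parser", 4);
        sizes.put("process", 4);
        sizes.put("converter", 4);
        sizes.put("appender", 2);
        return sizes;
    }

    // Total number of worker threads across all pools.
    static int totalThreads(Map<String, Integer> sizes) {
        return sizes.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Creates a fixed-size executor for each named pool.
    static Map<String, ExecutorService> createPools(Map<String, Integer> sizes) {
        Map<String, ExecutorService> pools = new LinkedHashMap<>();
        sizes.forEach((name, size) -> pools.put(name, Executors.newFixedThreadPool(size)));
        return pools;
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes = defaultSizes();
        Map<String, ExecutorService> pools = createPools(sizes);
        System.out.println("pools: " + pools.keySet() + ", total threads: " + totalThreads(sizes));
        pools.values().forEach(ExecutorService::shutdown); // release the threads
    }
}
```

With these defaults the pools hold 28 threads in total, which a low-end machine can handle for small runs but which may need rebalancing (more parser and start threads) for larger ones.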
When the pools are properly tuned, a task is picked up by a spare thread as soon as it is queued. Normally the gap between the queued and finished columns in the Tasks table is small; when the queued count is substantially higher than the finished count, many tasks are waiting in the queue to run.
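The queued-versus-finished gap can be reproduced with a plain ThreadPoolExecutor. In this hypothetical sketch, tasks block on a latch so they pile up in the queue; the threshold in isBacklogged (more waiting tasks than threads) is an assumption chosen for illustration, not a rule Scoopi applies.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of tasks backing up in an undersized pool.
public class BacklogCheck {

    // Assumed heuristic: flag a pool when more tasks wait than it has threads.
    static boolean isBacklogged(int queued, int poolSize) {
        return queued > poolSize;
    }

    // Submits `tasks` blocking jobs to a fixed pool of `threads` workers and
    // returns how many of them were left waiting in the queue.
    static int queuedWhileBlocked(int threads, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch gate = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Each task blocks until the gate opens, keeping its worker busy.
                pool.submit(() -> {
                    gate.await();
                    return null;
                });
            }
            return pool.getQueue().size();
        } finally {
            gate.countDown(); // release the blocked workers
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 2;
        int queued = queuedWhileBlocked(threads, 10);
        System.out.println("queued: " + queued + ", backlogged: " + isBacklogged(queued, threads));
    }
}
```

Here the first two tasks go straight to the two workers and the remaining eight wait, which is the situation the dashboard reveals as a large queued count with a small finished count.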
When there are many pages, you may notice tasks waiting in the queue,
especially in the parser and start pools. For example, when Scoopi parses
about 5000 pages on a low-end system with a Core 2 Duo processor, the
optimal pool size is 6 for the parser and 8 for start. They are set in the
conf/scoopi.properties configuration file as below.
When the parser pool size is greater than 6, the system load average shoots up and Scoopi's performance suffers. On a more powerful system with recent processors, or on a server-class machine, we can increase it further as long as the load average stays within limits.
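The tuning advice above can be expressed as a simple rule: grow a pool while tasks are backing up and the load average per core stays under a chosen limit, and shrink it when the limit is exceeded. This is a hypothetical heuristic, not something Scoopi does automatically; the class PoolTuner, the sample queued count, and the limit of 1.0 per core used in main are assumptions to adjust for your machine.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical heuristic for manual pool tuning; not part of Scoopi.
public class PoolTuner {

    // Suggests the next pool size from the backlog and the system load.
    static int suggestPoolSize(int current, int queued, double loadAverage,
                               int cores, double maxLoadPerCore) {
        if (loadAverage < 0) {
            return current; // load average unavailable: leave the size alone
        }
        double perCore = loadAverage / cores;
        if (perCore > maxLoadPerCore) {
            return Math.max(1, current - 1); // system overloaded: back off
        }
        if (queued > current) {
            return current + 1; // tasks are waiting and there is headroom
        }
        return current;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int queued = 40; // sample value, as read from the Pools table
        int next = suggestPoolSize(6, queued, os.getSystemLoadAverage(),
                os.getAvailableProcessors(), 1.0);
        System.out.println("suggested parser pool size: " + next);
    }
}
```

Changing the size one step at a time and watching the load average between runs mirrors the manual process described above.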