Thursday, 29 January 2009

SA1 Nagios Deployment Update

The EGEE SA1 Nagios bundle was released yesterday with significant updates.
  • GOCDB Integration: A list of sites can now be collected using the GOCDB's new API. In particular, all sites in a ROC or in a country can be monitored, extending the previous LDAP filter on sites.
  • GOCDB Downtimes: Downtimes entered in the GOCDB are now also pulled in and inserted as NAGIOS downtimes for your services.
  • HGSM Integration: HGSM is the South East Europe equivalent of the GOCDB.
  • NDOUtils Installed: NDOUtils sits behind NAGIOS and fills a MySQL database with NAGIOS's configuration and its metric and test results.
  • New SRM Tests: These mimic some of the logic of the existing SAM SRM tests and are their eventual replacement. In NAGIOS speak, we now have an active check that submits scripts and returns passive results for each of the lcg-cr, lcg-rep and lcg-del steps seen before.
  • NSCA Installed: This is mainly for the case where two nodes are used, a NAGIOS node and an NRPE-triggered UI: passive test results are then submitted back via NSCA from the NRPE-UI. Well, almost - Bug.
  • New BDII Checks: These are the checks taken directly from the gstat2 work but now running against your services.
  • New msg-to-queue Service: Running on a NAGIOS box, this subscribes to externally executed test results for your site or ROC from the ActiveMQ messaging system. Currently nothing is actually coming in, but much of the infrastructure is now there.
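
Two of the items above, downtimes and passive results, reach NAGIOS as simple text lines. The sketch below is not the bundle's own code; it only illustrates the standard formats involved: the tab-separated line that send_nsca reads on standard input, and the SCHEDULE_SVC_DOWNTIME external command. The function names, host name and service name are made up for illustration.

```python
import time


def passive_result_line(host, service, status, output):
    """Build one line of send_nsca input.

    The fields are host, service description, return code and plugin
    output, separated by tabs and terminated by a newline.
    """
    return "\t".join([host, service, str(status), output]) + "\n"


def downtime_command(host, service, start, end, author, comment, now=None):
    """Build a SCHEDULE_SVC_DOWNTIME line for the NAGIOS command file.

    start and end are Unix timestamps. The downtime is fixed (fixed=1)
    and not triggered by another downtime (trigger_id=0). Fields are
    separated by semicolons, so the comment must not contain one.
    """
    if now is None:
        now = int(time.time())
    fields = [
        "SCHEDULE_SVC_DOWNTIME",
        host,
        service,
        start,
        end,
        1,            # fixed downtime
        0,            # no triggering downtime
        end - start,  # duration in seconds
        author,
        comment,
    ]
    return "[%d] %s\n" % (now, ";".join(str(f) for f in fields))


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(passive_result_line("se01.example.org", "SRM-put", 0, "OK: file copied"), end="")
    print(downtime_command("se01.example.org", "SRM-put", 1233200000, 1233203600,
                           "gocdb", "scheduled maintenance"), end="")
```

The first line would typically be piped into send_nsca on the NRPE-UI, while the second would be written to the external command file on the NAGIOS node.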
As before, installation can still be done completely via YAIM, for either a site or a ROC. New packages can be followed for i386 or x86_64. And of course bugs and feedback are always welcome.

The update contains work from Emir, James, Laurence, Konstantin and myself.