[TriLUG] Net monitoring
30 May 2002 16:57:17 -0400
On Thu, 2002-05-30 at 14:51, Chris Knowles wrote:
> What are you using to do network monitoring? (By Network Monitoring, I
> mean, Is a host up, are all of its services up, page somebody if
> they're not.)
OpenNMS. I'm a big fan. They still have a lot of work to do, IMHO, but
the basic functionality you're asking about is there.
The caveat is that it doesn't work well with DHCP. That is, if a host
changes IPs from time to time, it is going to muck around with the
usefulness of this system. But if you use static IPs or DHCP
reservations, I am confident that you'll like it.
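For anyone curious what the "is a service up" check boils down to, here
is a rough Python sketch of a TCP connect test against a list of hosts.
It is not how OpenNMS does it internally; the `service_is_up` and `poll`
names, the inventory, and the 192.0.2.10 address are made up for
illustration. It also shows why changing addresses hurt: the inventory is
keyed by IP, so if DHCP hands the host a new address, the check reports
the service down even though the box is fine.

```python
import socket

def service_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the service as down.
        return False

# Hypothetical inventory keyed by (static) IP address -> TCP ports to check.
SERVICES = {
    "192.0.2.10": [22, 80],
}

def poll(services):
    """Return a list of (ip, port) pairs whose service did not answer."""
    down = []
    for ip, ports in services.items():
        for port in ports:
            if not service_is_up(ip, port):
                down.append((ip, port))
    return down
```

Paging somebody would hang off the list that `poll` returns; that part is
left out here.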
> It has to have a web interface to see what's happening, and preferably
> a way to modify its behavior. Also, ease of use is probably a
> pretty good thing.
Yep. When you first log into OpenNMS, you are presented with a summary
page that shows on the left side which hosts are "down" (each with a
clickable link to its detail page). In the middle you get a service
availability summary, broken down by service, with availability
percentages to three decimal places. From the front page you can also
get to a page with metrics for a specific host.
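Those percentages are presumably uptime divided by the reporting window.
A minimal sketch of that arithmetic, assuming a fixed window and
non-overlapping outage intervals in seconds (my assumption, not
OpenNMS's documented formula): a 300-second outage in a 24-hour window
gives 100 * (86400 - 300) / 86400, which rounds to 99.653%.

```python
def availability_percent(window_start, window_end, outages):
    """Percent of [window_start, window_end) the service was up, 3 decimals.

    Assumes `outages` is a list of non-overlapping (start, end) pairs in
    seconds; each outage is clipped to the window before it is counted.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must be positive")
    down = 0.0
    for start, end in outages:
        s = max(start, window_start)
        e = min(end, window_end)
        if e > s:
            down += e - s
    return round(100.0 * (window - down) / window, 3)

# 24-hour window with one 5-minute outage.
print(availability_percent(0, 86400, [(1000, 1300)]))
```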
Installing onto any RPM-based system is easy. Just make sure you have
a JDK installed first.
> Of course, stories of really good hacks, or amazing adventures in
> monitoring would be appreciated as well.
Not yet, but we're working on it. I'm learning the ins & outs of
OpenNMS and then will start hacking on it. I'm thinking along the lines
of glue code to make it work with some sort of PHP-based help desk
management software, so that when a user reports a problem, much of the
ticket is pre-populated from the OpenNMS database. Or it could open a
ticket automagically if a key service goes down, and automatically
update that ticket as the status changes. The idea is (1) to reduce
unnecessary manual data entry, removing one opportunity for user error
while also increasing IT staff productivity, and (2) to provide
management with accurate metrics on service availability and IT staff
response time.
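A rough Python sketch of the open-and-update-a-ticket idea. Everything
here is hypothetical: `TicketGlue`, `Ticket`, the event fields, and the
in-memory ticket store stand in for whatever the real help desk package
and OpenNMS event feed would provide; no real API of either is assumed.
Recording when each ticket opens and resolves gives one raw input for
the response-time metrics mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Ticket:
    ticket_id: int
    host: str
    service: str
    opened_at: float
    status: str = "open"
    resolved_at: Optional[float] = None
    notes: List[str] = field(default_factory=list)

class TicketGlue:
    """Hypothetical glue: opens a ticket when a key service goes down and
    updates that same ticket as the service's status changes."""

    def __init__(self, key_services):
        self.key_services = set(key_services)
        self.tickets: Dict[int, Ticket] = {}
        # (host, service) -> id of the currently open ticket
        self.open_by_service: Dict[Tuple[str, str], int] = {}
        self._next_id = 1

    def handle_event(self, host, service, status, timestamp):
        """Process one status event; return the affected ticket or None."""
        if service not in self.key_services:
            return None  # only key services get automatic tickets
        key = (host, service)
        ticket_id = self.open_by_service.get(key)
        if status == "down" and ticket_id is None:
            # New outage: open a ticket pre-filled from the event.
            ticket = Ticket(self._next_id, host, service, opened_at=timestamp)
            ticket.notes.append(f"{timestamp}: {service} on {host} went down")
            self.tickets[ticket.ticket_id] = ticket
            self.open_by_service[key] = ticket.ticket_id
            self._next_id += 1
            return ticket
        if ticket_id is not None:
            # Existing outage: append the status change to the open ticket.
            ticket = self.tickets[ticket_id]
            ticket.notes.append(f"{timestamp}: {service} on {host} is {status}")
            if status == "up":
                ticket.status = "resolved"
                ticket.resolved_at = timestamp
                del self.open_by_service[key]
            return ticket
        return None  # e.g. an "up" event with no open ticket
```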