[TriLUG] Odd Network problem

Jason Tower jason at cerient.net
Sun Feb 13 11:23:39 EST 2005


when nagios reports a host as down (can't ping), can you still ping that 
host by hand from a shell on the nagios box?  that's the first step - 
eliminate nagios itself as the culprit.

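assuming that isn't the issue, it sounds like an ARP problem of some 
sort.  either the nagios host isn't keeping track of its IP-to-MAC 
mappings (its ARP cache), or one of the switches isn't keeping its 
MAC-to-port table straight.  that would also explain why pinging the 
nagios box from the "unpingable" host fixes things: the ARP request that 
host sends refreshes both the nagios box's ARP cache and the switches' 
MAC tables.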

try running 'arp -a' on the nagios box while a remote host can't be 
pinged and see if the ARP table looks correct; if it does, make sure 
the MAC address listed is right for the corresponding IP.  if all that 
checks out, one of the switches is probably the culprit.  if they're 
managed switches you should be able to look at their MAC address 
tables; otherwise there's very little you can do except swap one out 
and see if that helps.
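if you'd rather not eyeball the table every time it happens, here's a 
minimal sketch of that check.  it assumes the nagios box is Linux, where 
the kernel ARP table (what 'arp -a' shows) can be read as text from 
/proc/net/arp, and it compares entries against an {ip: mac} inventory 
you'd fill in yourself.  the function names are hypothetical, not part 
of nagios or any existing tool.

```python
# Sketch: compare the nagios box's ARP cache against MACs you expect.
# Assumes a Linux host, where the kernel ARP table (what `arp -a` shows)
# is readable as text in /proc/net/arp. Function names are hypothetical.

ATF_COM = 0x2  # kernel flag bit: entry is complete (MAC was resolved)


def parse_proc_net_arp(text):
    """Return {ip: (mac, complete)} from the contents of /proc/net/arp."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        ip, flags, mac = fields[0], fields[2], fields[3]
        table[ip] = (mac.lower(), bool(int(flags, 16) & ATF_COM))
    return table


def check_arp(table, expected):
    """Return (ip, problem) pairs for entries that are missing, incomplete, or wrong."""
    problems = []
    for ip, want in expected.items():
        want = want.lower()
        entry = table.get(ip)
        if entry is None:
            problems.append((ip, "no ARP entry"))
        elif not entry[1]:
            problems.append((ip, "incomplete ARP entry"))
        elif entry[0] != want:
            problems.append((ip, f"MAC is {entry[0]}, expected {want}"))
    return problems


def report(expected, path="/proc/net/arp"):
    """Print any ARP problems for the hosts in `expected` ({ip: mac})."""
    with open(path) as f:
        for ip, problem in check_arp(parse_proc_net_arp(f.read()), expected):
            print(f"{ip}: {problem}")
```

run something like report({"192.168.1.10": "00:11:22:33:44:55"}) while a 
host is unpingable (that IP/MAC pair is just a placeholder).  an 
"incomplete" entry points at the nagios box not getting ARP replies; a 
wrong MAC points at a duplicate or stale mapping.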

are you cloning systems there?  MAC addresses can be duplicated or 
spoofed, on purpose or accidentally, and if a switch or host sees the 
same MAC on two different IPs it can get confused.
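one quick way to look for that is to group the ARP entries you've 
collected by MAC and flag any MAC that shows up on more than one IP.  
this is a sketch that assumes you've already built an {ip: mac} mapping, 
e.g. from /proc/net/arp or 'arp -a' output gathered on a few machines; 
the function name is hypothetical.

```python
# Sketch: flag MAC addresses that show up on more than one IP, which is
# what a cloned or spoofed MAC would look like in ARP tables.
# Input is a hypothetical {ip: mac} mapping, e.g. built from /proc/net/arp
# or from `arp -a` output collected on several machines.
from collections import defaultdict

INCOMPLETE = "00:00:00:00:00:00"  # Linux shows unresolved entries this way


def duplicate_macs(ip_to_mac):
    """Return {mac: sorted list of IPs} for every MAC seen on 2+ IPs."""
    by_mac = defaultdict(list)
    for ip, mac in ip_to_mac.items():
        mac = mac.lower()
        if mac == INCOMPLETE:
            continue  # unresolved entry, not a real MAC
        by_mac[mac].append(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}
```

a hit isn't proof by itself: a router doing proxy ARP, or one box with 
several IP aliases on a single NIC, will legitimately show the same MAC 
on more than one IP.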

good luck - jason

On Sunday 13 February 2005 01:06, Chris Knowles wrote:
> Got a weird one.
>
> (Oh, regarding that crashing box, further investigation pointed at
> the motherboard as a culprit.)
>
> I've got a Nagios server in place that's been happily warning us of
> doom and gloom for over a year.  It's one of the great success
> stories for Linux at our company.
>
> Until now.
>
> Starting this morning, it has been randomly unable to ping various
> boxes on our network.  That is, until you ping the nagios server from
> the "unpingable" server.   Then Nagios can ping that server all it
> wants.
>
> This is all local network, no routing involved.
>
> Any idears as to what could be causing this?  (This is a simple
> switched network, and other than this seems to be working fine.)
>
> Any help is appreciated.
>
> CJK
