[TriLUG] Odd Network problem
mutterc at nc.rr.com
Wed Feb 16 23:21:41 EST 2005
Chris Knowles wrote:
> Got a weird one.
> (Oh, regarding that crashing box, further investigation pointed at the
> motherboard as a culprit.)
> I've got a Nagios server in place that's been happily warning us of doom and
> gloom for over a year. It's one of the great success stories for Linux at
> our company.
> Until now.
> Starting this morning, it has been randomly unable to ping various boxes on
> our network. That is, until you ping the nagios server from the "unpingable"
> server. Then Nagios can ping that server all it wants.
> This is all local network, no routing involved.
> Any idears as to what could be causing this? (This is a simple switched
> network, and other than this seems to be working fine.)
> Any help is appreciated.
As Jason mentions, this is an ARP problem. The Nagios box is either not
sending ARP requests or not hearing the replies. When another box sends an
ARP request for the Nagios box, Nagios hears that request and replies, and
at the same time adds the requester to its own ARP cache, so from then on
it can send packets to that box. (I see
this kind of one-way pingability a lot in my day job of debugging
The best bet is to run the all-seeing, all-knowing Ethereal, or its
command-line cousin tcpdump, on both the Nagios box and the 'other' box
(you needn't even put the interfaces in promiscuous mode, as you're tracing
packets destined for the boxes in question). Then you can see what's going
wrong with those ARP packets.
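
With tcpdump, something along the lines of "tcpdump -n -e -p arp" on each
box should show only the ARP traffic (-p keeps the interface out of
promiscuous mode, -e prints the MAC addresses). If you would rather pick
the frames apart yourself, here is a rough Python sketch of what an ARP
packet looks like on the wire. It only handles plain IPv4-over-Ethernet
ARP; the function name parse_arp_frame and the addresses in the sample
frame are made up for illustration, and how you get the raw frame bytes
(a capture file, a raw socket) is left out.

```python
import socket
import struct

ARP_ETHERTYPE = 0x0806
OPCODES = {1: "request", 2: "reply"}


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp_frame(frame: bytes):
    """Decode an Ethernet frame carrying IPv4-over-Ethernet ARP.

    Returns a dict of the interesting fields, or None if the frame is
    too short or is not that kind of ARP packet.
    """
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP body
        return None
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ARP_ETHERTYPE:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    # Hardware type 1 = Ethernet, protocol type 0x0800 = IPv4.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    return {
        "op": OPCODES.get(oper, str(oper)),
        "sender_mac": format_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": format_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


# Hypothetical "who has 192.168.1.20?" request, built by hand.
sample_request = (
    b"\xff" * 6                                   # broadcast destination
    + bytes.fromhex("001122334455")               # source MAC
    + struct.pack("!H", ARP_ETHERTYPE)
    + struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)   # opcode 1 = request
    + bytes.fromhex("001122334455")               # sender MAC
    + socket.inet_aton("192.168.1.10")            # sender IP
    + b"\x00" * 6                                 # target MAC (unknown)
    + socket.inet_aton("192.168.1.20")            # target IP
)

print(parse_arp_frame(sample_request))
```

In the trace from the Nagios box, the thing to look for is whether a
request (opcode 1) for the unpingable address goes out at all, and whether
a matching reply (opcode 2) shows up on either side.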