[TriLUG] server up for 262 days

scott jacobs sjacobs at plurimus.com
Fri Nov 30 10:01:12 EST 2001


On Thu, Nov 29, 2001 at 09:59:12PM -0500, Andrew C. Oliver wrote:
> Just my little linux stability testimonial. 

And here's mine... :)

We had two boxes make it to 497 days, 2 hours, and 17 or 26 minutes,
depending on the box, i.e., they died within 9 minutes of each other
(within the +/- 5 minute range of our custom heartbeat monitoring).  Do
some math and you'll see that is about 2^32/100 seconds, the point at which
that particular kernel's jiffies counter (32 bits, 100 ticks per second)
rolls over.  The load on the boxes spiked dramatically and they went
down... down... down...  We don't know which particular program or kernel
piece couldn't handle the rollover, and it would take a lot of patience to
debug.  (Yes, we considered building a kernel with an uptime counter that
doesn't start at 0. ;) )  Instead, we'll just reboot before that point from
now on.  We've got 7 other boxes at 300+ days.  I think the boxes that died
were RH 5.2 boxes running 2.0.36.
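
For anyone who wants to check the math, here is a rough sketch in Python.
It assumes a 32-bit tick counter incremented 100 times per second, which is
what the 2^32/100 figure implies; the names HZ, COUNTER_BITS, and
rollover_uptime are made up for illustration and are not kernel
identifiers.

```python
# Assumptions (inferred from the "2^32/100" figure, not read from a kernel):
HZ = 100            # timer ticks per second
COUNTER_BITS = 32   # width of the tick counter


def rollover_uptime(hz=HZ, bits=COUNTER_BITS):
    """Return (days, hours, minutes, seconds) of uptime at which a
    counter of the given width, ticking hz times a second, wraps to 0."""
    total_seconds = (2 ** bits) // hz          # whole seconds until wrap
    days, rem = divmod(total_seconds, 86400)   # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


if __name__ == "__main__":
    print("%d days, %d hours, %d minutes, %d seconds" % rollover_uptime())
```

Under those assumptions the wrap lands at 497 days, 2 hours, 27 minutes,
and 52 seconds, which is close to both of the uptimes we recorded.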

> 
> So anyhow, once it rolls around to about a year I'll reboot it.
> 
> -Andy

You can push it longer than that. :)

scott


-- 
---------------------------------------------------------------------
scott jacobs                                     plurimus corporation
---------------------------------------------------------------------


