[TriLUG] Diagnosing RCU stall warnings

Brian Henning via TriLUG trilug at trilug.org
Mon Dec 7 12:27:58 EST 2015


Hi folks,

We had a server become unresponsive recently.  It still accepted TCP connections, but the underlying services never responded, and the console showed a long series of "self-detected stall" messages.

I found this link:
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

which explains what a stall warning means and lists some typical causes.  It refers to examining stack traces to find the offender, but I don't know where to find those stack traces now that the machine has been power-cycled.  I looked through a bunch of files in /var/log and found nothing useful.
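
For reference, here is a minimal sketch of the kind of search I have in mind, assuming the console messages were also written to a log file before the reset (which is not guaranteed on a hung machine).  It looks for the "self-detected stall" phrase and keeps the lines that follow each hit, since that is where a stack trace would be if one was logged.  The function names, the 20-line window, and the /var/log/kern.log path in the comment are my own assumptions, not anything taken from stallwarn.txt.

```python
from typing import Iterable, List

# Phrase seen in the console messages; everything else here is illustrative.
STALL_MARKER = "self-detected stall"


def find_stall_reports(lines: Iterable[str], context: int = 20) -> List[List[str]]:
    """Return each line containing STALL_MARKER plus up to `context` lines after it."""
    reports: List[List[str]] = []
    remaining = 0  # how many follow-up lines still belong to the latest report
    for raw in lines:
        line = raw.rstrip("\n")
        if STALL_MARKER in line:
            # A new warning starts a new report, even inside an earlier window.
            reports.append([line])
            remaining = context
        elif remaining > 0:
            reports[-1].append(line)
            remaining -= 1
    return reports


def scan_file(path: str, context: int = 20) -> List[List[str]]:
    """Scan one log file, e.g. /var/log/kern.log (assumed location, may differ)."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
    with open(path, errors="replace") as fh:
        return find_stall_reports(fh, context)
```

Calling scan_file() on each candidate log and printing the returned blocks should show whether any trace made it to disk; an empty list would suggest the messages only ever reached the console.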

The running kernel (4.1.9) is much newer than the one the installed Debian distribution shipped with (2.6.32), because we needed some newer features.  Could some outdated system utility be causing problems with the newer kernel?  We've had one or two kernel panics on the machine recently as well, but I don't have records of the cause(s).  Should I just rebuild the OS?

Thanks,
-Brian

