[TriLUG] System overload issues

Brian McCullough bdmc at buadh-brath.com
Fri May 24 11:28:24 EDT 2013


On Fri, May 24, 2013 at 09:56:56AM -0400, Bill Farrow wrote:
> On Fri, May 24, 2013 at 9:34 AM, Brian McCullough <bdmc at buadh-brath.com> wrote:
> > Frequently during the day, the system will become ( or the web sites will become )
> > non-responsive for periods ranging from one minute to well over an hour.
> 
> Have you thought about putting limits on processes to prevent them
> from taking the system to its knees?  I would start by looking at
> ulimit.  If you can prevent the system from becoming unresponsive,
> then you can start investigating which process is going haywire and
> hopefully fix it properly.
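Here is a minimal sketch of Bill's ulimit suggestion. It assumes a POSIX shell whose ulimit builtin supports -v, which bash and dash both do. The helper name run_limited and the 1 GiB figure are hypothetical and only for illustration. Limits are inherited by child processes, so for a daemon like httpd the limit would have to be set in whatever script starts it.

```shell
#!/bin/sh
# Hypothetical helper: run a command with its virtual memory capped.
# $1 is the limit in KiB (the unit ulimit -v uses); the rest is the command.
run_limited() {
    limit_kb=$1
    shift
    # The subshell keeps the lowered limit from leaking into the calling shell.
    ( ulimit -v "$limit_kb" && exec "$@" )
}

# Example: cap a child shell at 1 GiB (1048576 KiB) and have it report the limit.
run_limited 1048576 sh -c 'ulimit -v'
```

A process that hits the cap gets failed memory allocations instead of pushing the whole machine into swap. Whether it handles that gracefully depends on the program.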

Thank you, Bill.  I hadn't thought of ulimit, since I have only used
that to limit disk space ( if I remember correctly ) in the past.


> > Now, other things seem to be showing failure symptoms; for instance, bzip2, which
> > compresses the MySQL database backup, seems to take hours instead of minutes;
> 
> How big is the MySQL dump file that is being compressed?

I think it is about 7.5G; it compresses to 1.1G.  I am in the
process of unpacking one of the backups to confirm the original size.
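You can confirm the original size without writing 7.5G back to disk, which would add I/O on an already struggling machine. Streaming the decompressed data into wc -c counts the bytes as they pass. This is a minimal sketch; the helper name and the file name in the usage line are hypothetical. It still costs the CPU time of a full decompression, so it is best run off-peak.

```shell
#!/bin/sh
# Hypothetical helper: print the uncompressed size, in bytes, of a .bz2 file.
# bzip2 -dc decompresses to stdout, so nothing is written to disk.
uncompressed_size() {
    bzip2 -dc "$1" | wc -c
}

# Usage (hypothetical file name):
#   uncompressed_size mysql-backup.sql.bz2
```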

> Time how
> long it takes when the system is running normally, and compare with
> when the system is under load.
> 
> time bzip2 test-backup
> 
> 
> I'm going to second Ron Kelley's suggestion that it might be a bad
> hard drive.  Check dmesg and syslog for hard drive error messages.  
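Here is a rough sketch of that check. The patterns below are assumptions about typical Linux kernel disk-error wording, such as "I/O error" and "Medium Error". They are not an exhaustive list. The syslog path also varies by distribution: /var/log/messages on Red Hat style systems, /var/log/syslog on Debian style ones.

```shell
#!/bin/sh
# Hypothetical filter: pass through only lines that look like disk trouble.
# The patterns are assumptions, not a complete list of kernel disk errors.
disk_errors() {
    grep -Ei 'i/o error|medium error|ata[0-9.]+: .*(error|failed)'
}

# Usage:
#   dmesg | disk_errors
#   disk_errors < /var/log/messages
```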

I just took a look at dmesg ( I hadn't in a while, I guess ) and found
something that I think is MUCH more interesting.

My ( gut ) feeling has been that things are thrashing, and I see
something at the bottom of the current dmesg that suggests that that may
be ( part of ) the issue.

What I see is:


Swap cache: add 17613573, delete 17613356, find 25621613/26574285, race 41+1296
Free swap  = 0kB
Total swap = 4192888kB
Free swap:            0kB
2293760 pages of RAM
249431 reserved pages
311175 pages shared
585 pages swap cached
Out of memory: Killed process 21911, UID 48, (httpd).


There are more DMA statistics and CPU statistics prior to that, but the
"Free swap: 0kB" is a red flag to me.

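The page counts are easier to read as sizes. This small sketch does the arithmetic, assuming 4 KiB pages, the usual size on x86 and x86_64 (getconf PAGESIZE reports the real value in bytes).

```shell
#!/bin/sh
# Convert page counts from the OOM report into MiB.
# Assumption: 4 KiB pages; integer division rounds down.
page_kb=4

pages_to_mib() {
    echo $(( $1 * page_kb / 1024 ))
}

echo "RAM:      $(pages_to_mib 2293760) MiB"    # pages of RAM
echo "Reserved: $(pages_to_mib 249431) MiB"     # reserved pages
echo "Swap:     $(( 4192888 / 1024 )) MiB total, 0 MiB free"
```

Under that assumption, the machine has roughly 8.75 GiB of RAM plus 4 GiB of swap. The swap was completely used up by the time the out-of-memory killer removed an httpd process.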
Am I correct, and should I start by increasing swap space, or should I
work on reducing the need for it?
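The process the kernel killed was httpd, so one way to gauge the "reduce the need" side is to measure how much memory the httpd processes hold. This sketch assumes Linux procps ps (for -C) and that the web server's processes are named httpd. RSS counts shared pages once per process, so the total overstates real usage. The per-process average still gives a rough figure to weigh against available RAM when deciding how many Apache children the box can hold. Under the prefork MPM, MaxClients sets that number; whether this server uses prefork is an assumption.

```shell
#!/bin/sh
# Hypothetical filter: read one RSS value (KiB) per line and print
# "<count> <average KiB> <total KiB>".
rss_summary() {
    awk '{ sum += $1; n++ }
         END { if (n) printf "%d %d %d\n", n, sum / n, sum; else print "0 0 0" }'
}

# Usage (Linux procps ps; -o rss= prints RSS with no header line):
#   ps -C httpd -o rss= | rss_summary
```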


Brian



