[TriLUG] System overload issues

Max TenEyck Woodbury max at mtew.isa-geek.net
Fri May 24 10:25:01 EDT 2013


On 05/24/2013 09:56 AM, Bill Farrow wrote:
> On Fri, May 24, 2013 at 9:34 AM, Brian McCullough <bdmc at buadh-brath.com> wrote:
>> Frequently during the day, the system (or the web sites) will become
>> unresponsive for periods ranging from one minute to well over an hour.
>
> Have you thought about putting limits on processes to keep them
> from bringing the system to its knees?  I would start by looking at
> ulimit.  If you can prevent the system from becoming unresponsive,
> then you can start investigating which process is going haywire and
> hopefully fix it properly.
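The ulimit suggestion can also be applied per job rather than per shell. This is a minimal sketch that assumes Linux and Python 3. Python's standard `resource` module exposes the same kernel limits that the shell's `ulimit` sets: `RLIMIT_CPU` is what `ulimit -t` controls, and `RLIMIT_AS` is what bash's `ulimit -v` controls. The helper names `run_limited` and `_cap` and the limit values are illustrative assumptions. They do not come from the original thread.

```python
import resource
import subprocess
import sys


def _cap(limit, value):
    """Set soft and hard `limit` to `value`, never above the existing hard limit."""
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_limited(cmd, cpu_seconds=None, mem_bytes=None, timeout=None):
    """Run `cmd` with CPU-time and address-space caps (hypothetical helper).

    The caps are applied in the child between fork() and exec(), so they
    affect only the command being run, not this Python process.
    """
    def apply_limits():
        if cpu_seconds is not None:
            _cap(resource.RLIMIT_CPU, cpu_seconds)  # like `ulimit -t` (seconds)
        if mem_bytes is not None:
            _cap(resource.RLIMIT_AS, mem_bytes)     # like bash `ulimit -v`, but in bytes, not KiB
    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True, timeout=timeout)


if __name__ == "__main__":
    # The kernel stops a runaway loop after about 1 s of CPU time
    # instead of letting it hog the machine indefinitely.
    result = run_limited([sys.executable, "-c", "while True: pass"],
                         cpu_seconds=1, timeout=30)
    print("exit status:", result.returncode)  # negative means killed by a signal
```

You could wrap the nightly backup job this way, for example, so that it cannot starve the web server. The trade-off is that the job gets killed if it legitimately needs more than the cap.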
>
>> Now, other things seem to be showing failure symptoms; for instance, bzip2,
>> which compresses the MySQL database backup, seems to take hours instead of minutes;
>
> How big is the MySQL dump file that is being compressed?  Time how
> long it takes when the system is running normally, and compare that
> with how long it takes when the system is under load.
>
> time bzip2 test-backup
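The timing comparison can also be scripted so the numbers are easy to log over several days. This is a minimal sketch that assumes Python 3.8+. The standard `bz2` module uses the same libbzip2 library as the `bzip2` command, so it is a reasonable stand-in, but absolute timings will differ from the command-line tool. The function name `time_bzip2` is hypothetical. Like `time`, it reports wall-clock and CPU time separately. If wall-clock time balloons while CPU time stays roughly flat, the process is spending its time waiting (for the disk, or for a turn on a CPU) rather than compressing.

```python
import bz2
import time


def time_bzip2(path, chunk_size=1 << 20):
    """Compress `path` with bzip2 (level 9), discarding the output.

    Returns (wall_seconds, cpu_seconds, compressed_bytes).
    """
    compressor = bz2.BZ2Compressor(9)
    compressed = 0
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    with open(path, "rb") as f:
        # Stream in chunks so a large dump never has to fit in memory.
        while chunk := f.read(chunk_size):
            compressed += len(compressor.compress(chunk))
    compressed += len(compressor.flush())
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start,
            compressed)
```

For example, run `time_bzip2("test-backup")` once on a quiet system and again during a slowdown, and compare the two tuples.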
>
>
> I'm going to second Ron Kelley's suggestion that it might be a bad
> hard drive.  Check dmesg and syslog for hard drive error messages.  I
> had this happen on a RAID1 (mirror) system at work: it would normally
> run fine but grind to a snail's pace when it happened to read a bad
> block on one of the drives.  I was disappointed that Linux software
> RAID1 did not help in this situation.
>
> Bill
>
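Checking dmesg and syslog by eye is easy to put off, so a small filter can help. This is a minimal sketch that assumes Python 3. The error phrases below are typical Linux kernel wording for disk trouble: `end_request: I/O error` on older kernels, `blk_update_request: I/O error` on newer ones, and the libata `failed command` and `exception Emask` lines. The exact text varies by kernel version and driver, so treat the pattern list as a starting point rather than a complete catalog. The function name `scan_disk_errors` is hypothetical.

```python
import re
from collections import Counter

# Assumed, non-exhaustive kernel phrases that usually indicate disk trouble.
ERROR_RE = re.compile(r"I/O error|Medium Error|exception Emask|failed command|\bUNC\b")
# First disk-like name on the line: sdX/hdX (partition digits dropped) or a libata port.
DEVICE_RE = re.compile(r"\b(sd[a-z]+|hd[a-z]+|ata\d+)")


def scan_disk_errors(lines):
    """Count disk-error log lines per device (e.g. 'sdb', 'ata3')."""
    counts = Counter()
    for line in lines:
        if ERROR_RE.search(line):
            match = DEVICE_RE.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Passing an open log file works because it yields lines. Use `/var/log/syslog` on Debian-style systems or `/var/log/messages` on Red Hat-style ones. A device that keeps showing up in the counts is a good candidate for a closer look with SMART tools.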
Try running a temperature monitor on the disks.  I have seen cases
where very warm drives simply take forever to process requests without
actually going bad. In particular, I had one drive that ate its
machine's CPU until I put a cooler on it.  I still use the drive, but
not for operations that put it under heavy load for any length of time.


