[TriLUG] Ubuntu FakeRAID or Software

Sat Nov 6 12:29:29 EDT 2010

On Fri, Nov 5, 2010 at 10:32 PM, Rodney Radford <rradford at mindspring.com> wrote:
>
> As I mentioned earlier, when you create a new RAID, it starts off in a degraded state and is then rebuilt - possibly taking hours to complete.  It does this as it initially lists one of your drives as a spare, writing just to the other drives, and then incorporates the spare in during the rebuild.
>
> You can override this as shown in this link:
>
> http://serverfault.com/questions/43575/how-to-create-a-software-raid5-array-without-a-spare
>
> -----Original Message-----
>>From: Rodney Radford <rradford at mindspring.com>
>>Sent: Nov 5, 2010 9:56 PM
>>To: Triangle Linux Users Group General Discussion <trilug at trilug.org>
>>Subject: Re: [TriLUG] Ubuntu FakeRAID or Software
>>
>>
>>Did you examine /proc/mdstat before you rebooted? I suspect the array was already in degraded mode then, as the system was still building the RAID. So when you rebooted, it resumed the rebuild; you removed the drive while it was still being written to, so you get a "busy" error when you try to add it back.
>>
>>Repeat the steps again, but this time check /proc/mdstat and verify it is complete (all UUUU) before the reboot. If you do happen to reboot before it is rebuilt, just be patient and wait for it to complete before changing it.
>>
>>Depending on the drive speed, RAID size, and RAID tuning parameters, it can take several hours for the initial build to complete.
>>
>>Also, check out this link for info on how to decrease the RAID rebuild time. I played with this about a year ago and was able to reduce a RAID rebuild on a 2TB array from several hours down to less than 30 minutes.
>>
>>http://www.ducea.com/2006/06/25/increase-the-speed-of-linux-software-raid-reconstruction/
>>
>>Good luck..

Yes. After creating the RAID I ran mdadm --detail /dev/md0 and waited
until everything was in sync. It showed the array as clean and
everything sync'd. Only then did I format it and mount it as a test.
Then, when that all worked, I rebooted.
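
For anyone who wants to script the "wait until it's in sync" check
rather than eyeball it, here is a rough Python sketch. It is only an
illustration, not what I ran: the function name and the sample text are
made up, and the sample is a hypothetical degraded md0 rather than real
output from this machine.

```python
import re

# Hypothetical /proc/mdstat contents for a 4-disk RAID5 with one member
# missing. Illustrative only; not captured from the machine in this thread.
DEGRADED_SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde2[3] sdd2[2] sdc2[1]
      110101248 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

unused devices: <none>
"""


def array_is_fully_synced(mdstat_text, array="md0"):
    """Return True only if `array` shows every member up (all U) and
    no recovery, resync, or reshape is reported for it."""
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(array + " :"):
            continue
        # The stanza for one array runs until the next blank line.
        stanza = []
        for entry in lines[i:]:
            if not entry.strip():
                break
            stanza.append(entry)
        block = "\n".join(stanza)
        # Member status looks like [UUUU]; an underscore marks a missing disk.
        status = re.search(r"\[([U_]+)\]", block)
        if status is None:
            return False
        # A progress line such as "recovery = 5.0%" means it is still working.
        busy = re.search(r"(recovery|resync|reshape)\s*=", block)
        return "_" not in status.group(1) and busy is None
    # The array was not listed at all.
    return False
```

On a live system you would pass it the contents of /proc/mdstat, e.g.
array_is_fully_synced(open("/proc/mdstat").read()), and only reboot
once it returns True.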

I'm not seeing anything particularly wrong in dmesg:

[ 3100.990730] md: md0 stopped.
[ 3101.150132] md: bind<sdd2>
[ 3101.150364] md: bind<sde2>
[ 3101.150646] md: bind<sdc2>
[ 3101.208927] raid5: device sdc2 operational as raid disk 1
[ 3101.208931] raid5: device sde2 operational as raid disk 3
[ 3101.208935] raid5: device sdd2 operational as raid disk 2
[ 3101.209617] raid5: allocated 4282kB for md0
[ 3101.209664] 1: w=1 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
[ 3101.209669] 3: w=2 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
[ 3101.209673] 2: w=3 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
[ 3101.209677] raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
[ 3101.209697] RAID5 conf printout:
[ 3101.209700]  --- rd:4 wd:3
[ 3101.209703]  disk 1, o:1, dev:sdc2
[ 3101.209705]  disk 2, o:1, dev:sdd2
[ 3101.209708]  disk 3, o:1, dev:sde2
[ 3101.209749] md0: detected capacity change from 0 to 112743677952
[ 3101.209898]  md0: unknown partition table

This just shows me that it stopped md0 and bound every disk in the RAID
except sdb, which is the disk in question.
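
The same reading of the log can be done mechanically. Below is a small
Python sketch (the function name is my own invention) that takes dmesg
text in the format pasted above and reports which RAID slots never came
up. It assumes the "operational as raid disk N" and "active with X out
of Y" wording shown in this log; other kernel versions may word these
lines differently.

```python
import re

# Patterns taken from the wording of the dmesg lines quoted in this post.
OPERATIONAL = re.compile(r"raid5: device (\S+) operational as raid disk (\d+)")
ACTIVE = re.compile(r"active with (\d+) out of (\d+)")


def missing_raid_slots(dmesg_text):
    """Return the raid disk numbers that were not reported operational."""
    present = {int(slot) for _dev, slot in OPERATIONAL.findall(dmesg_text)}
    summary = ACTIVE.search(dmesg_text)
    if summary is None:
        raise ValueError("no 'active with X out of Y' line found")
    total = int(summary.group(2))
    return [slot for slot in range(total) if slot not in present]


# Lines copied from the dmesg output in this message.
LOG = """\
[ 3101.208927] raid5: device sdc2 operational as raid disk 1
[ 3101.208931] raid5: device sde2 operational as raid disk 3
[ 3101.208935] raid5: device sdd2 operational as raid disk 2
[ 3101.209677] raid5: raid level 5 set md0 active with 3 out of 4
devices, algorithm 2
"""

print(missing_raid_slots(LOG))  # prints [0]
```

Slot 0 is the only one missing, which lines up with sdb being the disk
that was never bound.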

I see basically the same in /var/log/messages. In /var/log/daemon.log
I see the following:

Nov  5 20:26:18 porsche mdadm[1229]: DeviceDisappeared event detected on md device /dev/md0
Nov  5 21:17:41 porsche mdadm[1229]: DegradedArray event detected on md device /dev/md0

I think I'm going to rebuild this machine from scratch and see if
anything different happens.

thanks,
Brian