I am not the first one with a stuck mdadm --grow, but I think my case is a bit different from the others: all of the member devices are marked faulty and the array state says FAILED:
root@linux:~# mdadm --detail /dev/md126
/dev/md126:
           Version : 1.2
     Creation Time : Tue Jan 26 11:57:52 2021
        Raid Level : raid5
        Array Size : 1953258496 (1862.77 GiB 2000.14 GB)
     Used Dev Size : 976629248 (931.39 GiB 1000.07 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Aug 19 14:56:31 2024
             State : active, FAILED, reshaping
    Active Devices : 0
    Failed Devices : 4
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 38% complete

    Number   Major   Minor   RaidDevice   State
       0       8      17         0        faulty   /dev/sdb1
       1       8      65         1        faulty   /dev/sde1
       3       8      49         2        faulty   /dev/sdd1
       4       8      33         3        faulty   /dev/sdc1
I created the RAID5 with three 1 TB SSD disks and used it entirely for LVM. Yesterday I added a fourth 1 TB SSD disk and ran the following commands:
mdadm --add /dev/md126 /dev/sdc1
mdadm --grow /dev/md126 --raid-devices=4
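(For reference, I have been watching the reshape with the usual generic checks; nothing below is specific to my setup apart from the md126 name:)

# overall reshape progress and speed
cat /proc/mdstat
# the "Reshape Status" line as reported by mdadm itself
mdadm --detail /dev/md126 | grep -i reshape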
At first there was no problem: the RAID5 stayed active and was accessible, if slow. About 4 hours later something must have happened. When I checked this morning, the mdadm status had not changed, but I had lost my RAID5 in LVM; the LVs are still mounted but somehow crippled.
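To show what I mean by "crippled", these are the generic LVM and device-mapper checks I would use; none of the names below are specific to my volume group:

# do the PVs, VGs and LVs still resolve to underlying devices?
pvs
vgs
lvs -a -o +devices
# state of the device-mapper targets backing the mounted LVs
dmsetup info -c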
With dmesg I get errors like:
[81591.695415] EXT4-fs (dm-5): I/O error while writing superblock
[81591.710467] EXT4-fs error (device dm-5): __ext4_get_inode_loc_noinmem:4617: inode #524289: block 2097184: comm ls: unable to read itable block
[81591.710488] Buffer I/O error on dev dm-5, logical block 0, lost sync page write
[81591.710495] EXT4-fs (dm-5): I/O error while writing superblock
[82806.711267] sd 0:0:0:0: [sdb] tag#8 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[82806.711279] sd 0:0:0:0: [sdb] tag#8 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
[82806.711333] sd 5:0:0:0: [sdc] tag#9 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[82806.711339] sd 5:0:0:0: [sdc] tag#9 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
[82806.711382] sd 4:0:0:0: [sdd] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[82806.711388] sd 4:0:0:0: [sdd] tag#11 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
[82806.711431] sd 1:0:0:0: [sde] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[82806.711436] sd 1:0:0:0: [sde] tag#21 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
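The DID_BAD_TARGET results make me wonder whether the drives are still reachable at all. A basic sanity check would be something like the following (assuming smartmontools is installed, which may not be the case everywhere):

# does the kernel still list the member disks?
lsblk -o NAME,SIZE,TYPE /dev/sdb /dev/sdc /dev/sdd /dev/sde
# SMART health summary of one member disk (repeat for each)
smartctl -H /dev/sdb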
Running mdadm --examine --scan gives no output at all; it just hangs. The current content of my mdadm.conf is:
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
ARRAY /dev/md126 level=raid5 num-devices=3 metadata=1.2 name=horus:0 UUID=b187df52:41d7a47e:98e7fa00:cae9bf67
devices=/dev/sda1,/dev/sdb1,/dev/sdc1
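Since --examine --scan hangs, I assume the next step is to examine each member device directly (device names taken from the --detail output above):

mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sdd1
mdadm --examine /dev/sde1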
Looking at it now, I notice that the device names are not correct; they must have changed at some point since 2021.
There has always been something strange about my RAID5: every once in a while it switched between /dev/md0 and /dev/md127, and now, after adding the 4th disk, it shows up as /dev/md126.
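If the stale devices= list in mdadm.conf is part of the problem, I guess the fix would be to regenerate the ARRAY line from the running array (identified by UUID rather than device names) once the array is healthy again, roughly:

# print a current ARRAY line and put it into mdadm.conf in place of the old one
mdadm --detail --scan
# on Debian/Ubuntu the initramfs would also need updating afterwards (assumption about the distro)
update-initramfs -u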
Is there a solution for this? Should I stop mdadm and restart the grow? Something else?
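To be explicit about what I mean by "stop and restart": I have not run anything yet, but what I had in mind is roughly the following, with the device names taken from the --detail output above; whether --force is safe in the middle of a reshape is exactly what I am unsure about:

mdadm --stop /dev/md126
mdadm --assemble --force /dev/md126 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
cat /proc/mdstat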