mdadm - Linux Software Raid
Main package: mdadm
Main service: mdmonitor (monitors RAID status)
System administration notes:
Usage notes:
RAID 0
# mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
RAID 1
# mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc --spare-devices=1 /dev/sdd
RAID 5
# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd --spare-devices=1 /dev/sde
# mkfs.ext4 /dev/md0
# mkdir /data01
# mount /dev/md0 /data01
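To keep the filesystem mounted across reboots, an /etc/fstab entry can be added. This is a sketch assuming the mount point /data01 from the example above; `nofail` keeps the boot from hanging if the array is unavailable, and using the filesystem UUID from `blkid /dev/md0` instead of the device name is more robust, since md device numbering can change between boots.

```
/dev/md0    /data01    ext4    defaults,nofail    0 2
```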
RAID configuration file: /etc/mdadm.conf
Rescan arrays and regenerate the configuration file:
# mdadm --verbose --detail --scan > /etc/mdadm.conf
Configure alert e-mail notification
Add this line to /etc/mdadm.conf (NOTE: only one address can be set):
MAILADDR user@my.company.com
Show overall RAID status
cat /proc/mdstat
Show /dev/md0 details
# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Nov  3 06:03:03 2014
     Raid Level : linear
     Array Size : 4194288 (4.00 GiB 4.29 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Nov  3 06:03:03 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

       Rounding : 0K

           Name : localhost.localdomain:0  (local to host localhost.localdomain)
           UUID : a50ac9f2:62646d92:725255bd:7f9d30e3
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
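For scripted health checks, single fields can be pulled out of the `mdadm --detail` report with awk, since each field uses a " : " separator. A minimal sketch, run here against a captured sample so it works without a real array; on a live system, replace the here-doc with `mdadm --detail /dev/md0`:

```shell
# Hypothetical helper: emit a captured sample of `mdadm --detail` output.
detail() {
cat <<'EOF'
/dev/md0:
        Version : 1.2
     Raid Level : raid1
          State : clean
 Active Devices : 2
 Failed Devices : 0
EOF
}

# Split each line on " : " and pick the value column.
state=$(detail | awk -F' : ' '/^ *State / {print $2}')
failed=$(detail | awk -F' : ' '/^ *Failed Devices / {print $2}')
echo "state=$state failed=$failed"    # prints: state=clean failed=0
```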
Replace a disk
// Mark the disk as failed
mdadm /dev/raid127 --fail /dev/sdc
// Remove the disk
mdadm /dev/raid127 --remove /dev/sdc
// Add the disk
mdadm /dev/raid127 --add /dev/sdc
/proc/mdstat
# awk '/^md/ {printf "%s: ", $1}; /blocks/ {print $NF}' </proc/mdstat
md126: [UU]
md127: [UUUUU]
# watch -t 'cat /proc/mdstat'
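The same /proc/mdstat format can drive a degraded-array check: a `_` inside the `[UU...]` status string marks a missing or failed member. A sketch using a sample string that mimics /proc/mdstat; in production, read the real file instead:

```shell
# Sample mimicking /proc/mdstat (md127 has one failed member: [UU_UU]).
mdstat_sample='md126 : active raid1 sda2[0] sdb2[1]
      1049536 blocks super 1.0 [2/2] [UU]
md127 : active raid6 sda3[0] sdb3[1] sde1[4] sdd1[3]
      2809500672 blocks super 1.2 [5/4] [UU_UU]'

# Remember the current md device, then report it if its status
# string (the last field of the "blocks" line) contains a '_'.
degraded=$(printf '%s\n' "$mdstat_sample" |
  awk '/^md/ {dev=$1} /blocks/ && $NF ~ /_/ {print dev}')
echo "degraded: ${degraded:-none}"    # prints: degraded: md127
```

Pipe `cat /proc/mdstat` into the same awk program for a live check.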
Check status with iotop
# iotop -a -p $(sed 's, , -p ,g' <<<`pgrep "_raid|_resync|jbd2"`)
Check status with iostat
# iostat -dmy 1 /dev/md127
# iostat -dmy 1 /dev/md126
mdmonitor alert mail test
# mdadm --monitor --scan --oneshot --test
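Besides the always-on mdmonitor service, `--oneshot` can be run periodically from cron: it checks the arrays once and reports any that are degraded (without `--test`, mail is only sent when something is actually wrong). A hypothetical crontab entry, assuming mdadm lives in /sbin:

```
# Check all arrays once a day at 06:00 and mail if any is degraded
0 6 * * * /sbin/mdadm --monitor --scan --oneshot
```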
Content of the alert notification mail
This is an automatically generated mail message from mdadm
running on plinux.localdomain
A Fail event had been detected on md device /dev/md/pv00.
It could be related to component device /dev/sdc1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [raid6] [raid5] [raid4]
md126 : active raid1 sda2[0] sdb2[1]
1049536 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid6 sda3[0] sdb3[1] sdc1[2](F) sde1[4] sdd1[3]
2809500672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
Stop and permanently remove
# mdadm --stop /dev/md0
# mdadm --remove /dev/md0
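If the member disks will be reused, also wipe their RAID superblocks so the old array metadata is not detected and re-assembled on a later boot, and delete the corresponding ARRAY line from /etc/mdadm.conf. Device names here follow the RAID 0 example above:

```
# mdadm --zero-superblock /dev/sdb /dev/sdc
```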
Start the RAID
# mdadm --assemble /dev/md0
NOTE: The assemble command reads /etc/mdadm.conf to start the array. If you did not save your configuration in mdadm.conf before stopping the array, this command will fail. You can recreate mdadm.conf with:
# mdadm --examine --scan > /etc/mdadm.conf
Edit /etc/mdadm.conf and add:
PROGRAM /path/to/raid-event.sh
Contents of /path/to/raid-event.sh:
#!/bin/bash
#
# mdadm RAID health check
#
# mdadm passes the event name via $1 and the device via $2
#
# Set variables to readable values
event=$1
device=$2

# Check the event and build an appropriate message
if [ "$event" = "Fail" ]; then
    message="A failure has been detected on device $device"
elif [ "$event" = "FailSpare" ]; then
    message="A failure has been detected on spare device $device"
elif [ "$event" = "DegradedArray" ]; then
    message="A Degraded Array has been detected on device $device"
elif [ "$event" = "TestMessage" ]; then
    message="A Test Message has been generated on device $device"
fi

output="/tmp/my.log"
echo "event=$event"   >  "$output"
echo "device=$device" >> "$output"
echo "$message"       >> "$output"
echo "EOF"            >> "$output"
See also: smartctl - Test If Linux Server SCSI / SATA / SSD Hard Disk Going Bad
[Image: RAID rebuilding (mdadm-rebuild.png)]