FAQ
Q: The node service fails to start
Application rabbit exited with reason: {{could_not_write_file,"/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq982/cluster_nodes.config",enospc},{rabbit,start,[normal,[]]}}
Solution:
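The disk is probably full: enospc in the error means no space is left on the device. Confirm with df and du, then remove the affected node's data subdirectories under /var/lib/rabbitmq/mnesia. Removing them discards that node's local data.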
[root@tpeeaprmq981 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.8G 0 1.8G 0% /dev
tmpfs 1.8G 4.0K 1.8G 1% /dev/shm
tmpfs 1.8G 24M 1.8G 2% /run
tmpfs 1.8G 0 1.8G 0% /sys/fs/cgroup
/dev/mapper/rootvg-rootlv 9.0G 4.0G 5.1G 45% /
/dev/sda2 1014M 344M 671M 34% /boot
/dev/sda1 599M 5.8M 594M 1% /boot/efi
/dev/mapper/rootvg-mqdatalv 5.0G 5.0G 20K 100% /var/lib/rabbitmq
/dev/mapper/rootvg-homelv 507M 30M 478M 6% /home
/dev/mapper/rootvg-worktmp 507M 46M 462M 9% /worktmp
/dev/mapper/rootvg-optlv 2.0G 997M 1.1G 49% /opt
tmpfs 364M 0 364M 0% /run/user/0
[root@tpeeaprmq981 ~]#
[root@tpeeaprmq981 ~]#
[root@tpeeaprmq981 ~]# du -csh /var/lib/rabbitmq/mnesia/*
204K /var/lib/rabbitmq/mnesia/rabbit@rmq981
4.0K /var/lib/rabbitmq/mnesia/rabbit@rmq981-feature_flags
0 /var/lib/rabbitmq/mnesia/rabbit@rmq981-plugins-expand
300K /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq98
5.0G /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981
4.0K /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags
0 /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand
4.0K /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq98-feature_flags
0 /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq98-plugins-expand
5.0G total
[root@tpeeaprmq981 ~]# rm -rf /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981
[root@tpeeaprmq981 ~]# rm /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags
rm: remove regular file '/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags'? y
[root@tpeeaprmq981 ~]#
[root@tpeeaprmq981 ~]# rm /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand
rm: cannot remove '/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand': Is a directory
[root@tpeeaprmq981 ~]# rm -rf /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand
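Before deleting anything, it can help to confirm which filesystem is full and which node directory is using the space. The following is a minimal sketch of the df/du checks from the transcript above, not part of the original procedure. The function names usage_pct and largest_dirs and the 90 percent threshold are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of RabbitMQ.

# Print the Use% of the filesystem that holds the given path, as a bare number.
usage_pct() {
    # df -P prints one line per filesystem; field 5 is the capacity, e.g. "100%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List the entries directly under a directory, largest first (sizes in KiB).
# The second argument limits the number of lines and defaults to 5.
largest_dirs() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}

MNESIA_DIR=/var/lib/rabbitmq/mnesia   # path taken from the transcript above
THRESHOLD=90                          # assumed alert threshold; adjust as needed

# Only report when the directory exists and its filesystem is nearly full.
if [ -d "$MNESIA_DIR" ]; then
    pct=$(usage_pct "$MNESIA_DIR")
    if [ "$pct" -ge "$THRESHOLD" ]; then
        echo "Filesystem holding $MNESIA_DIR is ${pct}% full; largest entries:"
        largest_dirs "$MNESIA_DIR"
    fi
fi
```

The sketch only reports; it deliberately leaves the rm step to the operator, since deleting a node directory throws away that node's local state.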
Q: The node cannot join the cluster
[error] Node rabbit@tpeeaprmq98 thinks it's clustered with node rabbit@tpeeaprmq982, but rabbit@tpeeaprmq982 disagrees
Solution:
On node rabbit@tpeeaprmq98, run rabbitmqctl cluster_status. If node rabbit@tpeeaprmq982 is listed, forcibly remove it with the following command.
# On the node rabbit@tpeeaprmq98
rabbitmqctl forget_cluster_node rabbit@tpeeaprmq982
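The check-then-forget step above can be sketched as a small script. This is a sketch under stated assumptions: node_listed and forget_if_listed are hypothetical helper names, and the match is a plain text search of the cluster_status output. Whole-word matching matters here because rabbit@tpeeaprmq98 is a prefix of rabbit@tpeeaprmq981 and rabbit@tpeeaprmq982.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of RabbitMQ.

# Succeed if the given node name appears as a whole word in the
# cluster_status text read from stdin.
node_listed() {
    # -F: fixed string, -w: whole word, so "rabbit@tpeeaprmq98" does not
    # match inside "rabbit@tpeeaprmq982".
    grep -qwF -- "$1"
}

# Forget a stale node only if the local node still lists it.
forget_if_listed() {
    stale_node=$1
    if rabbitmqctl cluster_status | node_listed "$stale_node"; then
        rabbitmqctl forget_cluster_node "$stale_node"
    else
        echo "$stale_node is not listed; nothing to forget"
    fi
}

# Usage, on node rabbit@tpeeaprmq98:
#   forget_if_listed rabbit@tpeeaprmq982
```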
Q: Network partition detected
The web UI shows a warning message, and running rabbitmqctl cluster_status on a node reports Network Partitions:
Network Partitions
Node rabbit@tpeeaprmq98 cannot communicate with rabbit@tpeeaprmq982
Node rabbit@tpeeaprmq981 cannot communicate with rabbit@tpeeaprmq982
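Cause: tpeeaprmq982 went offline unexpectedly because of a hardware or network failure. When the node regained network connectivity, the cluster raised a Network Partition event (also known as split-brain) and stopped replicating quorum queue data, so it must be repaired as soon as possible.
Solution: repair the network partition
Restart the service on the affected node, tpeeaprmq982: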
rabbitmqctl stop
systemctl start rabbitmq-server
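To decide which node to restart, and to confirm afterwards that the partition is gone, the cluster_status output can be filtered for the "cannot communicate with" lines shown earlier. A minimal sketch, assuming the text format from the sample output above; partitioned_nodes and restart_rabbit are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of RabbitMQ.

# Print each node that is named as unreachable in cluster_status text on stdin.
partitioned_nodes() {
    # Matches lines like: "Node rabbit@a cannot communicate with rabbit@b"
    # and prints the last field (the unreachable node), de-duplicated.
    awk '/cannot communicate with/ { print $NF }' | sort -u
}

# Restart the local RabbitMQ service with the same two commands as above.
restart_rabbit() {
    rabbitmqctl stop
    systemctl start rabbitmq-server
}

# Usage:
#   rabbitmqctl cluster_status | partitioned_nodes   # on a healthy node
#   restart_rabbit                                   # on the reported node
#   rabbitmqctl cluster_status | partitioned_nodes   # should now print nothing
```

Taking the last field as the node to restart is a heuristic that fits the sample output, where both healthy nodes report the same peer. In a more even split, each side reports the other, and the operator still has to choose which side to restart.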