FAQ

Q: The node service fails to start

Application rabbit exited with reason: {{could_not_write_file,"/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq982/cluster_nodes.config",enospc},{rabbit,start,[normal,[]]}}

Solution:

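The disk is probably full: enospc in the error means no space is left on the device. Check usage with df and du, then remove the node's data subdirectories under /var/lib/rabbitmq/mnesia. Removing them discards that node's local data.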

[root@tpeeaprmq981 ~]# df -h
Filesystem                   Size  Used Avail Use% Mounted on
devtmpfs                     1.8G     0  1.8G   0% /dev
tmpfs                        1.8G  4.0K  1.8G   1% /dev/shm
tmpfs                        1.8G   24M  1.8G   2% /run
tmpfs                        1.8G     0  1.8G   0% /sys/fs/cgroup
/dev/mapper/rootvg-rootlv    9.0G  4.0G  5.1G  45% /
/dev/sda2                   1014M  344M  671M  34% /boot
/dev/sda1                    599M  5.8M  594M   1% /boot/efi
/dev/mapper/rootvg-mqdatalv  5.0G  5.0G   20K 100% /var/lib/rabbitmq
/dev/mapper/rootvg-homelv    507M   30M  478M   6% /home
/dev/mapper/rootvg-worktmp   507M   46M  462M   9% /worktmp
/dev/mapper/rootvg-optlv     2.0G  997M  1.1G  49% /opt
tmpfs                        364M     0  364M   0% /run/user/0
[root@tpeeaprmq981 ~]#
[root@tpeeaprmq981 ~]#
[root@tpeeaprmq981 ~]# du -csh /var/lib/rabbitmq/mnesia/*
204K    /var/lib/rabbitmq/mnesia/rabbit@rmq981
4.0K    /var/lib/rabbitmq/mnesia/rabbit@rmq981-feature_flags
0       /var/lib/rabbitmq/mnesia/rabbit@rmq981-plugins-expand
300K    /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq98
5.0G    /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981
4.0K    /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags
0       /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand
4.0K    /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq98-feature_flags
0       /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq98-plugins-expand
5.0G    total

[root@tpeeaprmq981 ~]# rm -rf /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981
[root@tpeeaprmq981 ~]# rm /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags
rm: remove regular file '/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags'? y
[root@tpeeaprmq981 ~]#
[root@tpeeaprmq981 ~]# rm /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand
rm: cannot remove '/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand': Is a directory
[root@tpeeaprmq981 ~]# rm -rf /var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-plugins-expand
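
The manual df check above can be sketched as a small helper that flags nearly full filesystems. This is a minimal sketch, assuming df output in the six-column layout shown in the session; full_mounts is a hypothetical helper name and the 90 percent default is an arbitrary choice, neither comes from RabbitMQ.

```shell
# full_mounts [THRESHOLD]
# Hypothetical helper: reads df-style output on stdin and prints the mount
# point of every filesystem whose Use% is at or above THRESHOLD (default 90).
# Assumes six columns with Use% in column 5 and the mount point in column 6,
# so mount points containing spaces are not handled.
full_mounts() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "100%" -> "100"
            if (use + 0 >= t) print $6
        }'
}

# Usage on a live host (commented out so that sourcing this file runs nothing):
#   df -P | full_mounts 90
#   du -sh /var/lib/rabbitmq/mnesia/* | sort -h   # then find the largest subdirectory
```

Fed the df output from the session above, the helper would print only /var/lib/rabbitmq, the one filesystem at 100%.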

Q: Unable to join the cluster

[error] Node rabbit@tpeeaprmq98 thinks it's clustered with node rabbit@tpeeaprmq982, but rabbit@tpeeaprmq982 disagrees

Solution:

On node rabbit@tpeeaprmq98, run rabbitmqctl cluster_status. If node rabbit@tpeeaprmq982 is listed, run the following command to forcibly remove it.

# On the node rabbit@tpeeaprmq98
rabbitmqctl forget_cluster_node rabbit@tpeeaprmq982
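
The check-then-forget step can be sketched as a guard around the command above. This is a minimal sketch: node_listed is a hypothetical helper name, and it assumes the node name appears verbatim somewhere in the cluster_status text. Whole-word matching matters here because rabbit@tpeeaprmq98 is a prefix of rabbit@tpeeaprmq982. Note that forget_cluster_node expects the node being removed to be offline.

```shell
# node_listed NODE
# Hypothetical helper: succeeds if NODE appears as a whole word in the text
# read from stdin. -F treats NODE as a fixed string, -w requires word
# boundaries, -q suppresses output.
node_listed() {
    grep -Fqw -- "$1"
}

# Usage on rabbit@tpeeaprmq98 (commented out so that sourcing this file runs nothing):
#   if rabbitmqctl cluster_status | node_listed rabbit@tpeeaprmq982; then
#       rabbitmqctl forget_cluster_node rabbit@tpeeaprmq982
#   fi
```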

Q: Network partition detected

The web UI shows a warning message:

network-partitions.png

Running rabbitmqctl cluster_status on a node reports Network Partitions:

Network Partitions

Node rabbit@tpeeaprmq98 cannot communicate with rabbit@tpeeaprmq982
Node rabbit@tpeeaprmq981 cannot communicate with rabbit@tpeeaprmq982

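Cause: tpeeaprmq982 went offline unexpectedly because of a hardware or network failure. When the node regained network connectivity, the cluster raised a network partition event (also known as split-brain) and stopped replicating quorum queue data, so the partition must be repaired as soon as possible.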

Solution: repair the network partition

Restart the RabbitMQ service on the affected node, tpeeaprmq982:

rabbitmqctl stop
systemctl start rabbitmq-server

network-partitions-2.png
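
After the restart, the partition report can be rechecked from the command line instead of the web UI. This is a minimal sketch: has_partition is a hypothetical helper name, and it assumes a partition is reported with the "cannot communicate with" wording shown earlier. Other RabbitMQ versions may word it differently.

```shell
# has_partition
# Hypothetical helper: succeeds if the cluster_status text on stdin contains
# a "Node X cannot communicate with Y" line, as in the output shown above.
has_partition() {
    grep -q 'cannot communicate with'
}

# Usage on any cluster node (commented out so that sourcing this file runs nothing):
#   if rabbitmqctl cluster_status | has_partition; then
#       echo "partition still reported"
#   else
#       echo "no partition reported"
#   fi
```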