RabbitMQ Cluster
- Clustering Guide — RabbitMQ
- RabbitMQ Learning III: RabbitMQ Clustering and Load Balancing
- How to Set Up the RabbitMQ Cluster on Ubuntu/Debian Linux
- Clustering Guide (vmware.com)
- RabbitMQ 集群搭建 (with HAProxy)
- RabbitMQ集群架构模式 | guaosi的博客
For Windows only
A few things to know about RabbitMQ clusters
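- Classic queues live on only one node of the cluster by default, although messages can still be published and consumed through any other node. For a queue to be replicated across nodes, it must be declared as a quorum queue.
- Every node in a cluster is a peer; all nodes are equal, with no primary or secondary role.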
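- Nodes in a cluster authenticate to each other with a shared cookie. The cookie file is /var/lib/rabbitmq/.erlang.cookie.
- A cluster should have an odd number of nodes, such as 3, 5, or 7. With an odd count, the surviving nodes can still form a clear majority and reach consensus when a node goes out of service.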
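- If all node hosts are restarted and only one node comes back up, that node cannot start serving; service resumes only after a second node host reconnects.
- Run the cluster on a LAN, not across a WAN. Nodes of the same cluster must stay connected to each other; if any node loses network connectivity for more than 60 seconds, the cluster hits a network partition, also known as split-brain. A partition can leave the cluster unable to work; for the recovery procedure, see: Clustering and Network Partitions
- Settings are kept in sync across all nodes of a cluster, including users and credentials, queues, exchanges, routing keys, and other settings.
- Per-node plugin settings are not synchronized.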
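The odd-node recommendation in the list above follows from majority arithmetic: a cluster of n nodes needs floor(n/2) + 1 nodes to agree, so it tolerates n minus that many failures. A minimal sketch of the calculation; the function names majority and tolerated_failures are illustrative and not part of RabbitMQ.

```shell
# Illustrative helpers (hypothetical names, not RabbitMQ commands).

# majority N: smallest number of nodes that forms a majority of N.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: how many nodes may fail while a majority remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

Both tolerated_failures 3 and tolerated_failures 4 print 1, which is why a fourth node adds no extra fault tolerance over three.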
Lab nodes
- tpeeaprmq98 (node01)
- tpeeaprmq981 (node02)
- tpeeaprmq982 (node03)
/etc/hosts:
10.14.2.51 tpeeaprmq98
10.4.1.33 tpeeaprmq981
10.4.1.34 tpeeaprmq982
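Because nodes address each other by hostname, it may help to confirm that every node name has an entry in the hosts file on each machine before clustering. The sketch below only parses a hosts-style file; the function names hosts_has_entry and check_cluster_hosts are hypothetical, and it does not test DNS or actual reachability.

```shell
# hosts_has_entry FILE NAME: succeed if FILE maps some address to NAME.
# Comment lines and trailing "# ..." comments are ignored.
hosts_has_entry() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}

# check_cluster_hosts FILE NAME...: print each NAME missing from FILE
# and return non-zero if any name is missing.
check_cluster_hosts() {
    file=$1; shift
    missing=0
    for name in "$@"; do
        if ! hosts_has_entry "$file" "$name"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For the lab nodes above, a call could look like: check_cluster_hosts /etc/hosts tpeeaprmq98 tpeeaprmq981 tpeeaprmq982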
Install RabbitMQ
RabbitMQ must be installed on every node host.
Sync the cookie file
scp /var/lib/rabbitmq/.erlang.cookie root@tpeeaprmq981:/var/lib/rabbitmq/
scp /var/lib/rabbitmq/.erlang.cookie root@tpeeaprmq982:/var/lib/rabbitmq/
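After copying, it is worth checking the cookie on each node: RabbitMQ expects the file to be accessible only by its owner, and the content must be identical on every node. A minimal sketch, assuming GNU stat (Linux); cookie_mode_ok and same_cookie are hypothetical helper names.

```shell
# cookie_mode_ok FILE: succeed if FILE has mode 400 or 600,
# i.e. it is accessible by its owner only.
# Assumes GNU stat (the -c option); returns 2 if stat fails.
cookie_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 2
    [ "$mode" = "400" ] || [ "$mode" = "600" ]
}

# same_cookie FILE_A FILE_B: succeed if both files are byte-identical.
same_cookie() {
    cmp -s "$1" "$2"
}
```

On a node, cookie_mode_ok /var/lib/rabbitmq/.erlang.cookie checks the permissions; the file should also be owned by the user that runs RabbitMQ, which this sketch does not verify.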
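Create a cluster
Start the service on all nodes in detached mode
Before creating a new cluster, every node must first leave any old cluster it belongs to. The commands below start RabbitMQ in the background on each node; the -detached option runs the server as a daemon and does not by itself remove a node from a cluster. The rabbitmqctl reset step used when joining is what clears old cluster membership.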
# On Node01
rabbitmq-server -detached
# On Node02
rabbitmq-server -detached
# On Node03
rabbitmq-server -detached
Verify the cluster status
[root@tpeeaprmq98 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq98 ...
Basics
Cluster name: rabbit@tpeeaprmq98
Disk Nodes
rabbit@tpeeaprmq98
Running Nodes
rabbit@tpeeaprmq98
Versions
rabbit@tpeeaprmq98: RabbitMQ 3.10.7 on Erlang 25.0.4
Maintenance status
Node: rabbit@tpeeaprmq98, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@tpeeaprmq98, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq98, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq98, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq98, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: disabled
Flag: empty_basic_get_metric, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
[root@tpeeaprmq981 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq981 ...
Basics
Cluster name: rabbit@tpeeaprmq981
Disk Nodes
rabbit@tpeeaprmq981
Running Nodes
rabbit@tpeeaprmq981
Versions
rabbit@tpeeaprmq981: RabbitMQ 3.10.7 on Erlang 25.0.4
Maintenance status
Node: rabbit@tpeeaprmq981, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@tpeeaprmq981, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq981, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq981, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq981, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
[root@tpeeaprmq982 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq982 ...
Basics
Cluster name: rabbit@tpeeaprmq982
Disk Nodes
rabbit@tpeeaprmq982
Running Nodes
rabbit@tpeeaprmq982
Versions
rabbit@tpeeaprmq982: RabbitMQ 3.10.7 on Erlang 25.0.4
Maintenance status
Node: rabbit@tpeeaprmq982, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@tpeeaprmq982, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq982, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq982, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq982, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
Creating a Cluster
Join node02 and node03 to node01.
# On Node02
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@tpeeaprmq98
rabbitmqctl start_app
# On Node03
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@tpeeaprmq98
rabbitmqctl start_app
The following messages may appear when joining the cluster; they can be ignored.
[root@tpeeaprmq981 ~]# rabbitmqctl join_cluster rabbit@tpeeaprmq98
Clustering node rabbit@tpeeaprmq981 with rabbit@tpeeaprmq98
15:10:18.438 [warning] Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags`:
15:10:18.438 [warning] Feature flags: - list of previously disabled feature flags now marked as such: [:maintenance_mode_status]
15:10:18.561 [warning] Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags`:
15:10:18.561 [warning] Feature flags: - list of previously enabled feature flags now marked as such: [:maintenance_mode_status]
15:10:18.598 [error] Failed to create a tracked connection table for node :rabbit@tpeeaprmq981: {:node_not_running, :rabbit@tpeeaprmq981}
15:10:18.598 [error] Failed to create a per-vhost tracked connection table for node :rabbit@tpeeaprmq981: {:node_not_running, :rabbit@tpeeaprmq981}
15:10:18.599 [error] Failed to create a per-user tracked connection table for node :rabbit@tpeeaprmq981: {:node_not_running, :rabbit@tpeeaprmq981}
Verify the cluster status
The cluster status should be the same on every node; only the "Cluster status of node ..." header line reflects the node on which the command was run.
[root@tpeeaprmq98 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq98 ...
Basics
Cluster name: rabbit@tpeeaprmq98
Disk Nodes
rabbit@tpeeaprmq98
rabbit@tpeeaprmq981
rabbit@tpeeaprmq982
Running Nodes
rabbit@tpeeaprmq98
rabbit@tpeeaprmq981
rabbit@tpeeaprmq982
Versions
rabbit@tpeeaprmq98: RabbitMQ 3.10.7 on Erlang 25.0.4
rabbit@tpeeaprmq981: RabbitMQ 3.10.7 on Erlang 25.0.4
rabbit@tpeeaprmq982: RabbitMQ 3.10.7 on Erlang 25.0.4
Maintenance status
Node: rabbit@tpeeaprmq98, status: not under maintenance
Node: rabbit@tpeeaprmq981, status: not under maintenance
Node: rabbit@tpeeaprmq982, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@tpeeaprmq98, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq98, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq98, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq98, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@tpeeaprmq981, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq981, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq981, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq981, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@tpeeaprmq982, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq982, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq982, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq982, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
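To check the result in a script instead of by eye, the "Running Nodes" section can be extracted from the text output. A minimal sketch that assumes the plain-text layout shown above (section title on its own line, one node name per line); running_nodes and expect_running are hypothetical helper names, and the layout may differ in other RabbitMQ versions.

```shell
# running_nodes: read `rabbitmqctl cluster_status` text on stdin and
# print the node names listed under "Running Nodes", one per line.
running_nodes() {
    awk '
        /^Running Nodes/ { in_section = 1; next }
        in_section && /^[^ @]+@[^ @:,]+$/ { print; next }
        in_section && NF > 0 { in_section = 0 }
    '
}

# expect_running COUNT: succeed if exactly COUNT nodes are listed
# as running in the status text read from stdin.
expect_running() {
    [ "$(running_nodes | wc -l | tr -d ' ')" = "$1" ]
}
```

For this three-node lab the check could be run as: rabbitmqctl cluster_status | expect_running 3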
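Node management
It is recommended not to enable automatic start at boot. Otherwise, after a system reboot the node rejoins its existing cluster without any manual verification, which may affect the cluster's live service.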
systemctl disable rabbitmq-server
Restart a node
# Recommended: use systemd
systemctl stop rabbitmq-server
systemctl start rabbitmq-server
# Using rabbitmqctl + systemd
rabbitmqctl stop
systemctl start rabbitmq-server
# Using rabbitmqctl
# Stop the node
rabbitmqctl stop
# Start the node
rabbitmq-server -detached
# Verify that the node is running (not still waiting for schema table sync)
rabbitmq-diagnostics check_running
# Force the node to boot without waiting for its peers
# (run while the node is stopped, then start it again)
rabbitmqctl force_boot
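A restarted node can take a while to finish booting, so a script may want to poll the health check instead of running it once. A minimal sketch of a generic retry loop; wait_for is a hypothetical helper name, and the attempt count and delay are arbitrary values to tune.

```shell
# wait_for ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

For example, wait_for 30 2 rabbitmq-diagnostics check_running retries the check for up to about a minute before giving up.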
Add a new node
NOTE: After a node joins the cluster, the related authentication settings, queues, exchanges, and other settings are synchronized to the new node.
# Find the path of the Erlang cookie file
rabbitmq-diagnostics erlang_cookie_sources
# Copy the cookie from any node of the cluster to the new node
scp /var/lib/rabbitmq/.erlang.cookie root@<new-node>:/var/lib/rabbitmq/
# On the new node: join the cluster through an existing member.
# join_cluster takes the node name of a current member, e.g. rabbit@tpeeaprmq98
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster <existing-node>
rabbitmqctl start_app
# Alternatively, join it as a RAM node with the following command
rabbitmqctl join_cluster <existing-node> --ram
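The four join steps can be wrapped in one function so that a failure at any step stops the sequence. A minimal sketch to run on the new node; join_cluster_node is a hypothetical helper name, and the RABBITMQCTL variable is an assumption introduced here only to allow a dry run.

```shell
# join_cluster_node EXISTING_NODE [--ram]: run on the NEW node.
# EXISTING_NODE is the node name of any current cluster member.
# RABBITMQCTL (hypothetical override) replaces the rabbitmqctl binary,
# e.g. RABBITMQCTL=echo prints the commands instead of running them.
join_cluster_node() {
    ctl=${RABBITMQCTL:-rabbitmqctl}
    if [ -z "${1:-}" ]; then
        echo "usage: join_cluster_node <existing-node> [--ram]" >&2
        return 2
    fi
    # Each step runs only if the previous one succeeded.
    "$ctl" stop_app &&
    "$ctl" reset &&
    "$ctl" join_cluster "$@" &&
    "$ctl" start_app
}
```

A dry run such as RABBITMQCTL=echo join_cluster_node rabbit@tpeeaprmq98 prints the arguments of each step in order; note that the reset step wipes the new node's existing data when run for real.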
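Remove a node
NOTE: After a node leaves the cluster, its authentication settings, queues, exchanges, and other settings are wiped from that node.
# Graceful removal
# Run on the node being removed
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
rabbitmqctl cluster_status
# Forced removal (the node being removed must be offline)
# Run on another node of the cluster
rabbitmqctl forget_cluster_node <node-name>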
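Rebalance queues across nodes
Once a quorum queue is declared on any node, it is automatically replicated to the other nodes. One node becomes the queue's leader and the rest become followers.
A queue's workload is handled mainly by its leader node. If the leader node stops serving, one of the remaining followers is elected as the new leader.
The leaders of all quorum queues should be spread evenly across the nodes so that the load is shared by every node. Rebalancing is worth doing in these situations: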
- After restarting a node
- After joining a node
NOTE: The current leader node of each queue is shown in the web UI.
A queue's leader node and the node hosting a client's channel are not necessarily the same; they sometimes differ.
rabbitmq-queues rebalance all
rabbitmq-queues rebalance "all" --vhost-pattern "itp_server" --queue-pattern ".*"
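To judge whether a rebalance is needed, or whether it helped, leaders can be counted per node from the command line. A minimal sketch that assumes tab-separated "queue name, leader node" lines on stdin; it further assumes that rabbitmqctl list_queues accepts the leader column and the --quiet and --no-table-headers options in your version, which should be verified before relying on it. leaders_per_node is a hypothetical helper name.

```shell
# leaders_per_node: read tab-separated "queue<TAB>leader" lines on stdin
# and print "<count> <node>" for every leader node, sorted by node name.
# Lines with an empty leader column (e.g. non-quorum queues) are skipped.
leaders_per_node() {
    awk -F '\t' '
        NF >= 2 && $2 != "" { count[$2]++ }
        END { for (node in count) print count[node], node }
    ' | sort -k 2
}
```

A possible invocation, under the assumptions above: rabbitmqctl list_queues --quiet --no-table-headers name leader | leaders_per_node. Roughly equal counts per node suggest the leaders are already balanced.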