RabbitMQ Cluster

For Windows only

A few things to know about RabbitMQ clusters
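  • By default, a classic queue lives on only one node of the cluster, although messages can still be published and consumed through any other node. To replicate a queue across nodes, declare it as a quorum queue.
  • Every node in a cluster is a peer. All nodes are equal; there is no primary or secondary role.
  • Nodes in a cluster authenticate to each other with a shared cookie. The cookie file path is /var/lib/rabbitmq/.erlang.cookie
  • A cluster should have an odd number of nodes, for example 3, 5, or 7. With an odd count, the surviving nodes can still form a majority and reach consensus when a node goes out of service.
  • When all node hosts are restarted and only one node is up, that node cannot start the service normally; the service resumes only after a second node comes back online.
  • Deploy a cluster on a LAN, not across a WAN. Nodes of the same cluster must stay connected to each other: once any node loses its network connection for more than 60 seconds, the cluster runs into a network partition, also known as a split-brain event. A partition can leave the cluster unable to work; for the recovery procedure, see: Clustering and Network Partitions
  • Settings such as user accounts, queues, exchanges, routing keys, and other definitions are kept in sync across all nodes of the same cluster.
  • Node plugin settings are not synchronized.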
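
The odd-node recommendation above comes down to simple majority arithmetic. The sketch below is only an illustration; the function names quorum_size and tolerated_failures are made up for this page and are not part of RabbitMQ.

```shell
# Majority (quorum) needed in a cluster of N nodes: floor(N / 2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that may fail while a majority is still available.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare odd and even cluster sizes.
for n in 3 4 5 6 7; do
    echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A 4-node cluster tolerates the same single failure as a 3-node cluster, and 6 nodes tolerate the same two failures as 5, so the extra even node adds cost without adding fault tolerance.
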
Lab nodes
  1. tpeeaprmq98 (node01)
  2. tpeeaprmq981 (node02)
  3. tpeeaprmq982 (node03)

/etc/hosts:

10.14.2.51      tpeeaprmq98
10.4.1.33       tpeeaprmq981
10.4.1.34       tpeeaprmq982
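
Since the nodes address each other by hostname, it may help to confirm that every host carries all three entries. This is a minimal sketch; missing_hosts is a hypothetical helper name, and it only inspects a hosts-format file, not DNS.

```shell
# Print every hostname from the argument list that has no entry in the
# given hosts-format file (an IP address followed by one or more names).
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v want="$name" '
            /^[[:space:]]*#/ { next }   # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$hosts_file" || echo "$name"
    done
}

# Report which of the three lab nodes are missing from /etc/hosts.
if [ -r /etc/hosts ]; then
    missing_hosts /etc/hosts tpeeaprmq98 tpeeaprmq981 tpeeaprmq982
fi
```

No output means all three names are present on that host.
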
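Install RabbitMQ

Install the RabbitMQ server package on every node host.

Nodes authenticate to each other with the Erlang cookie file

The cookie file is usually created automatically the first time the RabbitMQ service starts. You may replace its content with a more complex value, and the file permissions must be 0600. Every node in the same cluster must have an identical cookie file.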

scp /var/lib/rabbitmq/.erlang.cookie root@tpeeaprmq981:/var/lib/rabbitmq/
scp /var/lib/rabbitmq/.erlang.cookie root@tpeeaprmq982:/var/lib/rabbitmq/

The default path is /var/lib/rabbitmq/.erlang.cookie. To see which cookie file the node and the CLI tools actually use, run:

rabbitmq-diagnostics erlang_cookie_sources
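
After copying the cookie, its permissions and content can be checked on each node. The sketch below assumes Linux with GNU coreutils (stat -c, sha256sum); cookie_mode_ok and cookie_digest are hypothetical helper names, and the rabbitmq:rabbitmq owner in the usage comment is an assumption about how the package was installed.

```shell
# Succeed only if the permission bits of the file are exactly 600.
# Assumes GNU stat; BSD and macOS stat use different flags.
cookie_mode_ok() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

# Print a SHA-256 digest of the cookie so that nodes can be compared
# without displaying the secret itself.
cookie_digest() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Hypothetical usage on each node, as root:
#   chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie   # assumed owner
#   chmod 600 /var/lib/rabbitmq/.erlang.cookie
#   cookie_mode_ok /var/lib/rabbitmq/.erlang.cookie && cookie_digest /var/lib/rabbitmq/.erlang.cookie
```

The digest printed on node01, node02, and node03 should be identical; a mismatch means the nodes will refuse to authenticate to each other.
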

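Create a new cluster

Start the service on all nodes

Before a new cluster is created, every node must be running on its own and must no longer belong to an old cluster. The commands below only start each node in the background (detached mode); a node leaves its old cluster when it is reset with rabbitmqctl reset in the "Creating a Cluster" step.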

# On Node01
rabbitmq-server -detached

# On Node02
rabbitmq-server -detached

# On Node03
rabbitmq-server -detached

Verify the cluster status

[root@tpeeaprmq98 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq98 ...
Basics

Cluster name: rabbit@tpeeaprmq98

Disk Nodes

rabbit@tpeeaprmq98

Running Nodes

rabbit@tpeeaprmq98

Versions

rabbit@tpeeaprmq98: RabbitMQ 3.10.7 on Erlang 25.0.4

Maintenance status

Node: rabbit@tpeeaprmq98, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@tpeeaprmq98, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq98, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq98, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq98, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: disabled
Flag: empty_basic_get_metric, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
[root@tpeeaprmq981 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq981 ...
Basics

Cluster name: rabbit@tpeeaprmq981

Disk Nodes

rabbit@tpeeaprmq981

Running Nodes

rabbit@tpeeaprmq981

Versions

rabbit@tpeeaprmq981: RabbitMQ 3.10.7 on Erlang 25.0.4

Maintenance status

Node: rabbit@tpeeaprmq981, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@tpeeaprmq981, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq981, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq981, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq981, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
[root@tpeeaprmq982 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq982 ...
Basics

Cluster name: rabbit@tpeeaprmq982

Disk Nodes

rabbit@tpeeaprmq982

Running Nodes

rabbit@tpeeaprmq982

Versions

rabbit@tpeeaprmq982: RabbitMQ 3.10.7 on Erlang 25.0.4

Maintenance status

Node: rabbit@tpeeaprmq982, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@tpeeaprmq982, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq982, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq982, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq982, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
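
Before clustering, each node should list only itself under "Running Nodes". A sketch for pulling one section out of the status text follows; status_section is a hypothetical helper, and it assumes plain-text output laid out as shown on this page (if the output carries terminal formatting codes, they would need to be stripped first).

```shell
# Print the entries of one section (for example "Running Nodes") from
# cluster_status text supplied on stdin. A section is a title line,
# a blank line, the entries, and then another blank line.
status_section() {
    awk -v title="$1" '
        $0 == title           { in_section = 1; next }
        in_section && NF == 0 { if (seen) exit; next }
        in_section            { seen = 1; print }
    '
}

# Hypothetical usage on a live node:
#   rabbitmqctl cluster_status | status_section "Running Nodes"
```

On a standalone node this should print a single line with the node's own name.
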

Creating a Cluster

Join node02 and node03 to node01.

# On Node02
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@tpeeaprmq98
rabbitmqctl start_app

# On Node03
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@tpeeaprmq98
rabbitmqctl start_app
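
The four steps above can be wrapped so that a failed step stops the sequence. This is only a sketch; join_cluster_node is a hypothetical function name, while the rabbitmqctl subcommands are the ones already used on this page.

```shell
# Join the local node to the cluster that the seed node belongs to.
# Each step runs only if the previous one succeeded, so a failed
# stop_app or reset is never followed by a join attempt.
# Note: reset wipes the local node's existing data and definitions.
join_cluster_node() {
    seed=$1
    rabbitmqctl stop_app &&
    rabbitmqctl reset &&
    rabbitmqctl join_cluster "$seed" &&
    rabbitmqctl start_app
}

# Hypothetical usage on node02 and node03:
#   join_cluster_node rabbit@tpeeaprmq98
```
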

The following messages may appear while joining the cluster; they can be ignored.

[root@tpeeaprmq981 ~]# rabbitmqctl join_cluster rabbit@tpeeaprmq98
Clustering node rabbit@tpeeaprmq981 with rabbit@tpeeaprmq98

15:10:18.438 [warning] Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags`:

15:10:18.438 [warning] Feature flags:   - list of previously disabled feature flags now marked as such: [:maintenance_mode_status]

15:10:18.561 [warning] Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@tpeeaprmq981-feature_flags`:

15:10:18.561 [warning] Feature flags:   - list of previously enabled feature flags now marked as such: [:maintenance_mode_status]

15:10:18.598 [error] Failed to create a tracked connection table for node :rabbit@tpeeaprmq981: {:node_not_running, :rabbit@tpeeaprmq981}

15:10:18.598 [error] Failed to create a per-vhost tracked connection table for node :rabbit@tpeeaprmq981: {:node_not_running, :rabbit@tpeeaprmq981}

15:10:18.599 [error] Failed to create a per-user tracked connection table for node :rabbit@tpeeaprmq981: {:node_not_running, :rabbit@tpeeaprmq981}

Verify the cluster status

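Every node should report the same cluster status. Only the first line, "Cluster status of node ...", shows the name of the node where the command was run; the cluster name is shared by all members.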

[root@tpeeaprmq98 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@tpeeaprmq98 ...
Basics

Cluster name: rabbit@tpeeaprmq98

Disk Nodes

rabbit@tpeeaprmq98
rabbit@tpeeaprmq981
rabbit@tpeeaprmq982

Running Nodes

rabbit@tpeeaprmq98
rabbit@tpeeaprmq981
rabbit@tpeeaprmq982

Versions

rabbit@tpeeaprmq98: RabbitMQ 3.10.7 on Erlang 25.0.4
rabbit@tpeeaprmq981: RabbitMQ 3.10.7 on Erlang 25.0.4
rabbit@tpeeaprmq982: RabbitMQ 3.10.7 on Erlang 25.0.4

Maintenance status

Node: rabbit@tpeeaprmq98, status: not under maintenance
Node: rabbit@tpeeaprmq981, status: not under maintenance
Node: rabbit@tpeeaprmq982, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@tpeeaprmq98, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq98, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq98, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq98, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@tpeeaprmq981, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq981, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq981, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq981, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@tpeeaprmq982, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@tpeeaprmq982, interface: [::], port: 1883, protocol: mqtt, purpose: MQTT
Node: rabbit@tpeeaprmq982, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@tpeeaprmq982, interface: [::], port: 15690, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
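
Since a network partition is the main failure mode of a cluster, the "Network Partitions" section is worth checking on its own. A small sketch, assuming the plain-text layout shown above and a grep that supports -A (GNU and BSD grep do); no_partitions_reported is a hypothetical helper name.

```shell
# Succeed only if the "Network Partitions" section of the cluster_status
# text on stdin reads "(none)". Missing section counts as failure, so an
# empty or broken status is never mistaken for a healthy cluster.
no_partitions_reported() {
    grep -A 2 '^Network Partitions$' | grep -qxF '(none)'
}

# Hypothetical usage on any cluster member:
#   rabbitmqctl cluster_status | no_partitions_reported || echo "partition detected or status unavailable" >&2
```
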
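Node management

Disabling automatic start at boot is recommended. Otherwise, after a system reboot the node rejoins its cluster without any manual check, which may affect the cluster's live service.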

systemctl disable rabbitmq-server

Restart a node

# Recommended: use systemd
systemctl stop rabbitmq-server
systemctl start rabbitmq-server

# Using rabbitmqctl + systemd
rabbitmqctl stop
systemctl start rabbitmq-server

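# Using rabbitmqctl only
# Stop the node
rabbitmqctl stop
# Start the node in the background
rabbitmq-server -detached
# Check that the node has fully booted (this fails while it is still waiting for schema table sync from its peers)
rabbitmq-diagnostics check_running

# Force boot: run while the node is stopped so that its next start does not wait for the other cluster members
rabbitmqctl force_boot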
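
A restarted node can take a while to finish booting, so a check that runs immediately after the start command may fail. The sketch below retries a check with a bounded number of attempts; wait_until is a hypothetical helper, not a RabbitMQ command. rabbitmqctl await_startup is a built-in alternative that blocks until the node has started.

```shell
# Retry a command once per second until it succeeds or the attempt
# budget is used up. Returns 0 on success and 1 when attempts run out.
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Hypothetical usage after "systemctl start rabbitmq-server":
#   wait_until 60 rabbitmq-diagnostics check_running || echo "node did not come up" >&2
```
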

Add a new node

NOTE: After a node joins the cluster, credentials, queues, exchanges, and other settings are synchronized to the new node.

Remove a node

NOTE: After a node leaves the cluster, the credentials, queues, exchanges, and other settings on that node are wiped.

# Remove a node gracefully
# Run on the node that is leaving
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
rabbitmqctl cluster_status

# Force-remove a node
# Run on another node of the cluster (the node being removed must be stopped)
rabbitmqctl forget_cluster_node <node-name>

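Rebalance queues across nodes

Once a quorum queue is declared on any node, it is replicated to the other nodes automatically. One node is picked as the leader node and the rest are follower nodes.

The leader node carries the queue's regular workload. If the leader node stops serving, the system picks one of the existing follower nodes as the new leader.

Quorum queue leaders should be spread evenly across all nodes so that the load is shared by every node. Consider rebalancing:

  • After restarting a node
  • After joining a node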

NOTE: The current leader node of each queue is shown in the Web UI.

A queue's leader node and the node that a client channel is connected to are not necessarily the same.

rabbitmq-queues rebalance all
rabbitmq-queues rebalance "all" --vhost-pattern "itp_server" --queue-pattern ".*"
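
To judge whether a rebalance is needed, or whether it helped, the leaders can be counted per node. This sketch only processes tab-separated "queue name, leader node" lines from stdin; leaders_per_node is a hypothetical helper. Producing that input with rabbitmqctl list_queues and its name and leader columns is an assumption to confirm with rabbitmqctl help list_queues on your version.

```shell
# Count queue leaders per node from "queue-name<TAB>leader-node" lines
# on stdin. Prints "<count> <node>" sorted by node name.
leaders_per_node() {
    awk -F '\t' 'NF >= 2 { count[$2]++ } END { for (n in count) print count[n], n }' | sort -k2
}

# Hypothetical usage (column names and flags are assumptions, see above):
#   rabbitmqctl list_queues --quiet --no-table-headers name leader | leaders_per_node
```

Roughly equal counts per node suggest the leaders are already balanced.
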