vSphere Monitoring

Method #1: Telegraf + InfluxDB

Install Telegraf

Download: https://portal.influxdata.com/downloads/

yum localinstall telegraf-1.18.3-1.x86_64.rpm

Configure Telegraf

Create a configuration file

telegraf config > /etc/telegraf/telegraf-vmware.conf

vi /etc/telegraf/telegraf-vmware.conf
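The unfiltered `telegraf config` dump lists every plugin, so the file is long to edit. A smaller starting point can be generated by filtering for only the plugins used in this note. This is a minimal sketch, assuming `telegraf` is on the PATH. The scratch output path is a hypothetical choice so that nothing under /etc is overwritten.

```shell
# Generate a trimmed sample config containing only the vsphere input
# and the InfluxDB 1.x output. The output path is a hypothetical scratch file.
out="${TMPDIR:-/tmp}/telegraf-vmware.sample.conf"

if command -v telegraf >/dev/null 2>&1; then
  telegraf --input-filter vsphere --output-filter influxdb config > "$out"
  status="generated $out"
else
  status="telegraf not installed, nothing generated"
fi
echo "$status"
```

For InfluxDB 2.x, the output filter would be `influxdb_v2` instead.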

Log file

...
[agent]
...
  logfile = "/var/log/telegraf/telegraf-vmware.log"
...
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = true

Output for InfluxDB 1.x

# Configuration for sending metrics to InfluxDB 1.x
[[outputs.influxdb]]
    urls = ["http://10.10.2.209:8086"]
    database = "vmware"
    timeout = "0s"
    username = "admin"
    password = "dba4mis"
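    ## Must be the name of an existing retention policy, not a duration.
    ## The 200d duration is set on "autogen" in the Configure InfluxDB step.
    retention_policy = "autogen"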

Output for InfluxDB 2.x

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ##   ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  urls = ["http://127.0.0.1:8086"]

  ## Token for authentication.
  token = "Your-Token"

  ## Organization is the name of the organization you wish to write to.
  organization = "Your-Org-Name"

  ## Destination bucket to write into.
  bucket = "Your-Bucket-Name"
  
  ## Timeout for HTTP messages.
  timeout = "5s"

Input

Reference example: Telegraf: VMware vSphere Input Plugin

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################


## Realtime instance
[[inputs.vsphere]]
  interval = "60s"

  ## List of vCenter URLs to be monitored. These three lines must be uncommented
  ## and edited for the plugin to work.
  vcenters = [ "https://vcenter-server-ip/sdk" ]
  username = "admin@vsphere.local"
  password = "ThisPassword"

  # Exclude all historical metrics
  datastore_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]
  resourcepool_metric_exclude = ["*"]

  #max_query_metrics = 256
  #timeout = "60s"
  insecure_skip_verify = true
  force_discover_on_init = true

  collect_concurrency = 5
  discover_concurrency = 5


## Historical instance
[[inputs.vsphere]]
  interval = "300s"

  vcenters = [ "https://vcenter-server-ip/sdk" ]
  username = "admin@vsphere.local"
  password = "ThisPassword"

  host_metric_exclude = ["*"] # Exclude realtime metrics
  vm_metric_exclude = ["*"] # Exclude realtime metrics

  insecure_skip_verify = true
  force_discover_on_init = true
  max_query_metrics = 256
  collect_concurrency = 3

Configure systemd

cp /usr/lib/systemd/system/telegraf.service /usr/lib/systemd/system/telegraf-vmware.service
sed -i 's/telegraf.conf/telegraf-vmware.conf/g' /usr/lib/systemd/system/telegraf-vmware.service
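The sed rewrite can be tried on a throwaway copy before touching the real unit file. The sketch below uses a shortened unit text that is an assumed sample, not the packaged file verbatim. It also escapes the dot so that only a literal `telegraf.conf` matches.

```shell
# Dry run of the rewrite on a scratch file (assumed, shortened unit content).
tmp_unit="$(mktemp)"
cat > "$tmp_unit" <<'EOF'
[Service]
EnvironmentFile=-/etc/default/telegraf
ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS
EOF

# Same substitution as above, with the dot escaped.
sed -i 's/telegraf\.conf/telegraf-vmware.conf/g' "$tmp_unit"

exec_line="$(grep '^ExecStart=' "$tmp_unit")"
echo "$exec_line"
rm -f "$tmp_unit"
```

If the real unit also passes `-config-directory /etc/telegraf/telegraf.d`, as in this sample, both services would load any drop-in files from that shared directory.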

Start Telegraf

systemctl daemon-reload
systemctl start telegraf-vmware
systemctl enable telegraf-vmware

Configure InfluxDB

Set the retention policy

[root@mm-mon ~]# influx -username admin -password dba4mis
Connected to http://localhost:8086 version 1.8.5
InfluxDB shell version: 1.8.5
> show retention policies on vmware
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
> alter retention policy "autogen" on "vmware" duration 200d shard duration 1d
> show retention policies on vmware
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 4800h0m0s 24h0m0s            1        true
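The second listing shows hours rather than days because the shell prints durations in h/m/s form. The arithmetic can be checked with a short helper. The function name is hypothetical and only illustrates the conversion.

```shell
# Convert a day count to the h/m/s form shown by SHOW RETENTION POLICIES.
days_to_influx_duration() {
  echo "$(( $1 * 24 ))h0m0s"
}

days_to_influx_duration 200   # retention duration: 200d
days_to_influx_duration 1     # shard group duration: 1d
```

This matches the 4800h0m0s and 24h0m0s values in the output above.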

Configure Grafana

Add a datasource for InfluxDB
- Name: VMware
- Type: InfluxDB
- Database: vmware
- Username: <InfluxDB Credential>
- Password: <InfluxDB Credential>
Import the dashboards

FAQ

Q: VMs added later do not appear in the dashboard.

A: First confirm that InfluxDB has received data for the new VM. If it has, go to Dashboard Settings > Variables > virtualmachine and run Update, then check that the new VM name appears under Preview of values.

Check InfluxDB

# Check all current VM names
select DISTINCT("vmname") from (select "ready_summation","vmname" from "vsphere_vm_cpu" WHERE time > now() - 10m)
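The same check can be run without opening the interactive shell. This is a minimal sketch for the InfluxDB 1.x `influx` CLI. The function names and the `INFLUX_USER` / `INFLUX_PASS` variables are hypothetical, and the look-back window is a parameter so a longer range can be tried when a VM was added recently.

```shell
# Build the InfluxQL query for a given look-back window (default 10m).
vm_name_query() {
  window="${1:-10m}"
  printf 'select DISTINCT("vmname") from (select "ready_summation","vmname" from "vsphere_vm_cpu" WHERE time > now() - %s)\n' "$window"
}

# Run the query non-interactively (not invoked here; needs a reachable InfluxDB).
list_recent_vms() {
  influx -username "$INFLUX_USER" -password "$INFLUX_PASS" \
         -database vmware -execute "$(vm_name_query "$1")"
}

# Show the query that would be sent for a one-hour window.
vm_name_query 1h
```

Usage on the monitoring host would be something like `list_recent_vms 1h`.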

Q: Telegraf error message

[inputs.vsphere] Error in plugin: while collecting vm: ServerFaultCode: A specified parameter was not correct: querySpec[0].endTime

A: Make sure the configuration includes the following parameter

force_discover_on_init = true
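A quick way to confirm this for a config with several `[[inputs.vsphere]]` blocks is to compare counts. The sketch below is a rough check only: it compares the number of blocks with the number of uncommented `force_discover_on_init = true` lines and does not verify that each block carries its own copy. The function name is hypothetical.

```shell
# Succeeds when the file has at least one vsphere block and as many
# uncommented force_discover_on_init = true lines as blocks.
check_force_discover() {
  conf="$1"
  blocks="$(grep -c '^\[\[inputs\.vsphere\]\]' "$conf" || true)"
  flags="$(grep -Ec '^[[:space:]]*force_discover_on_init[[:space:]]*=[[:space:]]*true' "$conf" || true)"
  echo "vsphere blocks: $blocks, force_discover_on_init = true lines: $flags"
  [ "$blocks" -gt 0 ] && [ "$blocks" -eq "$flags" ]
}

# Usage: check_force_discover /etc/telegraf/telegraf-vmware.conf
```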

Q: Issues with the "VMware vSphere - Overview" dashboard

The vCenter CPU/RAM panels show no graphs.

A: Edit the panel > Flux language syntax

Replace <vcenter-name> with the actual VM name.

from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "vsphere_vm_cpu")
  |> filter(fn: (r) => r["_field"] == "usage_average")
  |> filter(fn: (r) => r["vmname"] == "<vcenter-name>_vCenter")
  |> group(columns: ["vmname"])
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

The Cluster drop-down does not show the cluster names correctly.

A: Edit the dashboard > Variables > clustername > Flux language syntax

from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "vsphere_host_cpu")
  |> filter(fn: (r) => r["clustername"] != "")
  |> filter(fn: (r) => r["vcenter"] == "${vcenter}")
  |> keep(columns: ["clustername"])
  |> distinct(column: "clustername")
  |> group()

Method #2: SexiGraf

Official: http://www.sexigraf.fr/quickstart/
Base OS: Ubuntu 16.04.6 LTS

Download the OVA appliance

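vCenter/vSphere credential for monitoring only

vCenter Web Client > Menu > Administration > Single Sign On: Users and Groups > Add

Username: winmon
Password: xxxx
Confirm password: xxxx

vCenter Web Client > Menu > Hosts and Clusters > Permissions > Add Permission

User: vsphere.local, search for winmon
Role: Read-only
Propagate to children: checked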

Deploy the OVA to vCenter/ESXi

Deployment to ESXi 6.5 fails with the following error message:

Line 163: Unable to parse 'tools.syncTime' for attribute 'key' on element 'Config'.

Workaround: extract the OVA with OVF Tool first, then edit the OVF file.

# Before
<vmw:Config ovf:required="true"  vmw:key="tools.syncTime" vmw:value="true"/>

# After
<vmw:Config ovf:required="false"  vmw:key="tools.syncTime" vmw:value="true"/>

Save the file, then deploy again.
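The OVF edit can also be scripted. The sketch below runs the substitution on a scratch file. The second `vmw:Config` line (`example.otherKey`) is a made-up entry that only shows other keys are left alone, and the file names in the comments are hypothetical.

```shell
# An OVA is a tar archive, so either of these should unpack it
# (hypothetical file names, shown as comments only):
#   ovftool appliance.ova appliance.ovf
#   tar -xf appliance.ova

work="$(mktemp -d)"
cat > "$work/sample.ovf" <<'EOF'
<vmw:Config ovf:required="true"  vmw:key="tools.syncTime" vmw:value="true"/>
<vmw:Config ovf:required="true"  vmw:key="example.otherKey" vmw:value="true"/>
EOF

# Flip ovf:required only on the tools.syncTime line.
sed -i '/vmw:key="tools.syncTime"/ s/ovf:required="true"/ovf:required="false"/' "$work/sample.ovf"

patched="$(grep 'tools.syncTime' "$work/sample.ovf")"
untouched="$(grep 'example.otherKey' "$work/sample.ovf")"
echo "$patched"
rm -rf "$work"
```

If the extracted files include a .mf manifest, its checksum for the OVF no longer matches after the edit. Removing or regenerating the manifest before redeploying is a common workaround. Whether this appliance ships a manifest is an assumption, not something recorded in these notes.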

First Boot of the VM

1. SSH Credential: root / Sex!Gr@f

2. Configure the IP address manually by editing /etc/network/interfaces.
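A static stanza in ifupdown syntax might look like the sketch below. The interface name and every address are hypothetical placeholders, so check the real interface name with `ip link` first. The stanza is written to a scratch file for review rather than directly to /etc/network/interfaces.

```shell
# Write a sample static stanza to a scratch file. All values are placeholders.
stanza_file="$(mktemp)"
cat > "$stanza_file" <<'EOF'
auto eth0
iface eth0 inet static
    address 192.168.21.50
    netmask 255.255.255.0
    gateway 192.168.21.254
    dns-nameservers 192.168.21.86
EOF
cat "$stanza_file"
```

After copying the reviewed stanza into /etc/network/interfaces, a reboot is the simplest way to apply it.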

3. Configure the hostname

hostnamectl set-hostname esx-mon

4. Configure the timezone and time server

timedatectl set-timezone Asia/Taipei

vi /etc/ntp.conf

#pool 0.ubuntu.pool.ntp.org iburst
#pool 1.ubuntu.pool.ntp.org iburst
#pool 2.ubuntu.pool.ntp.org iburst
#pool 3.ubuntu.pool.ntp.org iburst

# Use Ubuntu's ntp server as a fallback.
#pool ntp.ubuntu.com

# Added the local time server
server 192.168.21.86 prefer iburst

Restart ntpd

systemctl stop ntp
systemctl start ntp

# Check the timeserver
ntpq -p
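In the `ntpq -p` listing, the peer that ntpd currently uses as its sync source is marked with a leading `*`. The sketch below extracts that peer. The sample text is hand-written in the ntpq layout, not captured from the appliance, and the function name is hypothetical.

```shell
# Print the peer marked with "*" (the current sync source).
selected_peer() {
  awk 'NR > 2 && substr($0, 1, 1) == "*" { print substr($1, 2) }'
}

# Hand-written sample in the ntpq -p layout (illustrative values only).
sample='     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.21.86   .LOCL.           1 u   33   64  377    0.412    0.105    0.051'

printf '%s\n' "$sample" | selected_peer

# On the appliance: ntpq -pn | selected_peer
```

An empty result shortly after a restart usually just means ntpd has not selected a peer yet.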