Nagios

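Nagios is a monitoring program for computer systems and networks. It checks hosts and services and alerts users when a problem occurs and again when it is resolved. It is open-source software released under GPLv2 and is free to obtain and use. Nagios was originally named NetSaint; it was developed by Ethan Galstad, who has maintained it ever since.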

PNP4Nagios

Official homepage: https://www.pnp4nagios.org/

NOTE: Nagios must be installed and configured beforehand.

Installation

 

# Dependency packages
yum install gcc perl-Time-HiRes rrdtool-perl make

tar xzf pnp4nagios-0.6.21.tar.gz
cd pnp4nagios-0.6.21
./configure 

If the Nagios system account and group are not the default nagios, pass these parameters:

./configure --with-nagios-user=icinga --with-nagios-group=icinga

If the following summary appears, configure has completed.

*** Configuration summary for pnp4nagios-0.6.21 03-24-2013 ***

General Options:
------------------------- -------------------
Nagios user/group: nagios nagios
Install directory: /usr/local/pnp4nagios
HTML Dir: /usr/local/pnp4nagios/share
Config Dir: /usr/local/pnp4nagios/etc
Location of rrdtool binary: /usr/bin/rrdtool Version 1.3.8
RRDs Perl Modules: FOUND (Version 1.3008)
RRD Files stored in: /usr/local/pnp4nagios/var/perfdata
process_perfdata.pl Logfile: /usr/local/pnp4nagios/var/perfdata.log
Perfdata files (NPCD) stored in: /usr/local/pnp4nagios/var/spool

Web Interface Options:
------------------------- -------------------
HTML URL: http://localhost/pnp4nagios
Apache Config File: /etc/httpd/conf.d/pnp4nagios.conf


Review the options above for accuracy. If they look okay,
type 'make all' to compile.

Compile and install

make all
make fullinstall

Configure PNP4Nagios

Edit /etc/httpd/conf.d/pnp4nagios.conf

...
AuthUserFile /etc/nagios/htpasswd.users <-- change this line to match the Nagios configuration

... 

Open the home page: http://xxx.xxx.xxx.xxx/pnp4nagios/

If the page shows no errors, rename the following file:

mv /usr/local/pnp4nagios/share/install.php /usr/local/pnp4nagios/share/install.php.xxx

Configure Nagios

Edit /etc/nagios/nagios.cfg

process_performance_data=1

service_perfdata_command=process-service-perfdata
host_perfdata_command=process-host-perfdata 
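
Before restarting Nagios, it can help to confirm that the three directives above are actually present. The following is a minimal sketch, not part of PNP4Nagios: the function name check_perfdata_cfg is hypothetical, and the file to check is passed as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios or PNP4Nagios):
# report whether the three perfdata directives are set in a
# nagios.cfg-style file. Returns 0 only if all three are found.
check_perfdata_cfg() {
    cfg=$1
    rc=0
    for kv in \
        'process_performance_data=1' \
        'service_perfdata_command=process-service-perfdata' \
        'host_perfdata_command=process-host-perfdata'
    do
        # Match the whole line, tolerating trailing whitespace only.
        if grep -Eq "^${kv}[[:space:]]*\$" "$cfg"; then
            echo "ok:      $kv"
        else
            echo "missing: $kv"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_perfdata_cfg /etc/nagios/nagios.cfg prints one line per directive. Nagios's own syntax check (nagios -v followed by the path to nagios.cfg) is still worth running afterwards, since this sketch only looks for these three lines.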

Edit /etc/nagios/objects/commands.cfg

define command {
       command_name    process-host-perfdata
       command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}


define command {
       command_name    process-service-perfdata
       command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl
}

Edit /etc/nagios/objects/templates.cfg

In generic-host and generic-service, set process_perf_data to 0. Otherwise, with the default value of 1, every host and service has this feature enabled automatically.

define host {
   name                            generic-host
   ...
   process_perf_data 0
   ...
}

define service {
   name                            generic-service
   ...
   process_perf_data 0
   ...
}

Enable graphs for specific hosts or services

Edit /etc/nagios/objects/MES-servers.cfg and add process_perf_data 1 to the host or service definition.

Note: MES-servers.cfg is the file name used in the author's environment.

define host {
   use                            generic-host
   host_name                      ap1
   ...
   process_perf_data              1
   action_url                     /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
   ...
}

define service {
   use                            generic-service
   host_name                      ap1
   service_description            PING 
   ...
   process_perf_data              1
   action_url                     /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
   ...
}

Real-world example: monitoring DB2 tablespace usage

MES-servers.cfg:

define service{
        use                        generic-service
        host_name                  bdb1
        service_description        MMDB_MMTBS01
        contact_groups             adm-alang
        notifications_enabled      0
        check_command              check_db2_tbs_usage!-d MMDB -t MMTBS01 -u istflr -p istflr
        max_check_attempts         1
        normal_check_interval      60
        process_perf_data          1
        action_url                 /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

commands.cfg:

# 'check_db2_tbs_usage' command definition
define command{
        command_name    check_db2_tbs_usage
        command_line    sudo -u db2inst ksh -c "~/bin/db2_check_tbs_usage.sh $ARG1$ "
        }

~db2inst/bin/db2_check_tbs_usage.sh:

#!/bin/ksh
##############################################################
# Author: Felipe Alkain de Souza
#
# Script Name: db2_check_tbs_usage.sh
#
# Functionality: This script checks DB2 tablespace utilization
#
# Usage: ./db2_check_tbs_usage.sh -d <database_name> -t <tbs_name> -u <db_user> -p <db_pass>
#
# Requisite settings:
# - visudo
#   #Defaults    requiretty <== comment out this line
#   nagios  ALL=(ALL)       NOPASSWD: ALL
#
# - Create DB2 catalog for the DBs that are monitored.
#
#
# Update:
# 2013/9/17   by A-Lang
#
##############################################################

. $HOME/sqllib/db2profile

### Nagios RCs Variables
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

Usage() { echo "Usage: $0 [-d <database_name>] [-t <tbs_name>] [-u <db_user>] [-p <db_pass>]"; }

ToUpper() {
    echo "$1" | tr "[:lower:]" "[:upper:]"
}

GetOut(){
    db2 terminate > /dev/null 2>&1
    sleep 2
    exit $1
}

while getopts ":d:t:u:p:" o; do
    case "$o" in
        d)
            d=$OPTARG
            ;;
        t)
            t=$OPTARG
            ;;
        u)
            u=$OPTARG
            ;;
        p)
            p=$OPTARG
            ;;
        \?)
            echo "Invalid option: -$OPTARG"
            Usage
            ;;
        :)
            echo "Option -$OPTARG requires an argument."
            Usage
            ;;
    esac
done

if [ $OPTIND -ne 9 ]; then
    echo "Invalid options entered."
    Usage
    exit $STATE_UNKNOWN
fi


DB_NAME=$(ToUpper $d)
DB_TBS=$(ToUpper $t)
DB_USER=$u
DB_PASS=$p

db2 terminate > /dev/null 2>&1
db2 connect to $DB_NAME user $DB_USER using $DB_PASS > /dev/null 2>&1

if [ $? -ne 0 ]
then
    echo "DB2 CRITICAL - The database $DB_NAME did not connect!"
    GetOut $STATE_CRITICAL
fi


TBS_USAGE=`db2 -x "select '      ' CONCAT (SUBSTR(CHAR(DECIMAL(USED_PAGES, 10, 2)/ \
DECIMAL(TOTAL_PAGES,10,2)*100),9,5)) CONCAT '%' as PERCENT_USED \
from table (snapshot_tbs_cfg('${DB_NAME}', 0)) as t \
where TABLESPACE_TYPE=0 and TABLESPACE_NAME='${DB_TBS}'" | sed -e 's/%//g' -e 's/ //g'`
#echo $TBS_USAGE

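# An empty result means the query failed or the tablespace was not found.
# ($? is not checked here: it would only reflect the exit status of sed.)
if [ -z "$TBS_USAGE" ]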
then
    echo "Unknown Tablespace $DB_TBS !"
    GetOut $STATE_UNKNOWN
fi


PERF_DATA="|'Disk Utilization'=${TBS_USAGE}%;90;95;"

if [ $TBS_USAGE -lt 90 ]; then
    echo "TABLESPACE OK - The database $DB_NAME is healthy; the used disk space of the tablespace $DB_TBS is ${TBS_USAGE}%. $PERF_DATA"
    GetOut $STATE_OK

# -ge rather than -gt, so that exactly 90% is WARNING and does not fall through to CRITICAL
elif [ $TBS_USAGE -ge 90 -a $TBS_USAGE -lt 95 ]; then
    echo "TABLESPACE WARNING - The used disk space of the tablespace $DB_TBS is ${TBS_USAGE}%, crossing the threshold. $PERF_DATA"
    GetOut $STATE_WARNING

else
    echo "TABLESPACE CRITICAL - The used disk space of the tablespace $DB_TBS is ${TBS_USAGE}%, crossing the threshold. $PERF_DATA"
    GetOut $STATE_CRITICAL

fi


#db2 terminate
db2 terminate > /dev/null 2>&1
sleep 1
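
The threshold logic in the script above can be pulled out into a small function, which makes it easy to exercise without a DB2 connection. This is a sketch under stated assumptions: the function name classify_usage is hypothetical, the usage value is assumed to be a whole-number percentage, and the 90/95 defaults are taken from the script.

```shell
#!/bin/sh
# Nagios plugin return codes, as in the script above.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# classify_usage <percent> [warn] [crit]
# Hypothetical helper: prints the state name and returns the matching
# Nagios code. Assumes <percent> is a non-negative integer.
classify_usage() {
    usage=$1
    warn=${2:-90}
    crit=${3:-95}
    case $usage in
        # Empty input or anything containing a non-digit is rejected.
        ''|*[!0-9]*) echo UNKNOWN; return $STATE_UNKNOWN ;;
    esac
    if [ "$usage" -lt "$warn" ]; then
        echo OK; return $STATE_OK
    elif [ "$usage" -lt "$crit" ]; then
        echo WARNING; return $STATE_WARNING
    else
        echo CRITICAL; return $STATE_CRITICAL
    fi
}
```

Because the first branch already covers everything below the warning threshold, the second branch only needs the upper bound, so a value sitting exactly on a threshold always lands in the higher state.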

Configure the popup display (optional)

Copy status-header.ssi from the pnp4nagios source tree:

cp <pnp4nagios source directory>/contrib/ssi/status-header.ssi /usr/share/nagios/ssi

NOTE:

This file must not have execute permission.

The /usr/share/nagios/ssi directory may differ depending on the installed Nagios version.
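
The permission requirement in the note can be applied and verified in one step. This is only a sketch: the function name ensure_not_executable is hypothetical, and the file path is passed in because the SSI directory varies between installations.

```shell
#!/bin/sh
# Hypothetical helper: remove every execute bit from a file and
# confirm that the file is no longer executable.
ensure_not_executable() {
    f=$1
    chmod a-x "$f" || return 1
    if [ -x "$f" ]; then
        echo "still executable: $f"
        return 1
    fi
    echo "not executable: $f"
}
```

With the path used above, the call would be ensure_not_executable /usr/share/nagios/ssi/status-header.ssi.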

Edit /etc/nagios/objects/MES-servers.cfg and change action_url:

define host {
   use                            generic-host
   host_name                      ap1
   ...
   process_perf_data              1
   action_url                     /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=_HOST_
   ...
}

define service {
   use                            generic-service
   host_name                      ap1
   service_description            PING 
   ...
   process_perf_data              1
   action_url                     /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
   ...
}

Customizing Performance Data

Performance Data format

label=value[UOM];[warning-range];[critical-range];[min];[max]
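
A custom plugin can build a string in this format with a single printf. The sketch below uses a hypothetical function name, format_perfdata, and always wraps the label in single quotes so that labels containing spaces, such as the one used in the DB2 script, stay intact.

```shell
#!/bin/sh
# format_perfdata <label> <value> <uom> <warn> <crit> [min] [max]
# Hypothetical helper: emits one metric as
# 'label'=value[UOM];warn;crit;min;max
# Fields that are not supplied are left empty.
format_perfdata() {
    printf "'%s'=%s%s;%s;%s;%s;%s\n" \
        "$1" "$2" "$3" "$4" "$5" "${6:-}" "${7:-}"
}

format_perfdata 'Disk Utilization' 42 % 90 95
# prints: 'Disk Utilization'=42%;90;95;;
```

A plugin would append this after a | character at the end of its status line.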

Sample HTTP check output

HTTP OK: HTTP/1.1 200 OK - 46869 bytes in 0.294 second response time | time=0.294561s;;;0 size=46869B;;;0

Tip: Everything after the | symbol is the Performance Data.
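
To see how a plugin's output splits into status text and metrics, the part after the | can be extracted and broken into label/value pairs. This is a rough sketch with a hypothetical function name, parse_perfdata. It assumes labels contain no spaces, so quoted labels such as 'Disk Utilization' are not handled.

```shell
#!/bin/sh
# parse_perfdata "<plugin output>"
# Hypothetical helper: prints "label value" for each metric found
# after the | symbol. Returns 1 if the output carries no perfdata.
parse_perfdata() {
    output=$1
    case $output in
        *'|'*) perf=${output#*'|'} ;;   # keep everything after the first |
        *)     return 1 ;;
    esac
    # Metrics are separated by whitespace; rely on word splitting.
    for item in $perf; do
        label=${item%%=*}       # text before the first =
        rest=${item#*=}         # value[UOM];warn;crit;min;max
        value=${rest%%;*}       # drop thresholds and min/max
        echo "$label $value"
    done
}

parse_perfdata 'HTTP OK: HTTP/1.1 200 OK - 46869 bytes in 0.294 second response time | time=0.294561s;;;0 size=46869B;;;0'
# prints:
# time 0.294561s
# size 46869B
```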

More details on the Performance Data format: