Sunday, August 17, 2014

Using Logstash to process Nagios performance data and send it to Graphite

Nagios is a very powerful tool that lets you monitor various parts of your infrastructure, and it collects a lot of information that can be used to learn more about it.

Logstash is a tool to process and pipe all kinds of events and logs. It has many output plugins, one of which is Graphite.

I found it difficult to find specific examples of doing exactly this by googling, so I am putting my results here in the hope that they help others adopt these tools more easily.

The Nagios Section

The Nagios-specific configuration I use to process the performance data that Nagios collects is:

process_performance_data=1

host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata


host_perfdata_file=/var/log/nagios/host-perfdata.out
service_perfdata_file=/var/log/nagios/service-perfdata.out

host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$


These settings enable Nagios to write out host and service performance data. They create the /var/log/nagios/service-perfdata.out file (and its host equivalent) with the fields described in the template.

A sample line for the Current Load service looks like this:

1408303233      localhost       Current Load    OK      1       HARD    0.004   0.103   OK - load average: 0.59, 0.51, 0.54     load1=0.590;5.000;10.000;0; load5=0.510;4.000;6.000;0; load15=0.540;3.000;4.000;0;
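To confirm that Nagios is actually writing this data, you can watch the perfdata file from the configuration above (adjust the path if yours differs):

tail -f /var/log/nagios/service-perfdata.out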

We can use Logstash to pick up this data, parse it into fields, and send the appropriate fields to Graphite.

The Logstash Section
An example Logstash configuration looks like this:

input {
        file {
                type => "serviceperf"
                path => "/var/log/nagios/service-perfdata.out"
        }
}
filter {
        if [type] == "serviceperf" {
                grok {
                        match => [ "message" , "%{NUMBER:timestamp}\t%{HOST:server}\tCurrent Load\t%{WORD:state}\t%{GREEDYDATA} load average: %{NUMBER:load_avg_1m}, %{NUMBER:load_avg_5m}, %{NUMBER:load_avg_15m}"]
                        add_tag => ["cpu"]
                }
                date {
                        match => [ "timestamp", "UNIX" ]
                }
        }
}
output {
        if  "cpu" in [tags] {
                graphite {
                        host => "localhost"
                        port => 2003
                        metrics => [ "%{server}.load_avg_1m","%{load_avg_1m}",
                                "%{server}.load_avg_5m","%{load_avg_5m}",
                                "%{server}.load_avg_15m","%{load_avg_15m}"]
                        }
        }
}



The configuration has three sections:
1. The input section reads the data that is generated by Nagios.
2. The filter section matches each plain line of text and converts it into fields (key/value pairs), which we use to capture the desired values.
3. The output section sends the captured values, keyed by server name, through carbon-cache to Graphite.
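To try the configuration out, you can run Logstash in the foreground against this file. The exact invocation depends on your Logstash version and install method; with a 1.x tarball it is roughly as follows (nagios-perfdata.conf is just my name for the file above; the packaged service normally picks up anything placed in /etc/logstash/conf.d):

bin/logstash agent -f nagios-perfdata.conf --configtest
bin/logstash agent -f nagios-perfdata.conf

While building the grok pattern it also helps to temporarily add a stdout output so you can see the parsed fields before anything reaches Graphite; remove it once the fields look right:

output {
        stdout { codec => rubydebug }
}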

I am not including the setup of the individual components in this blog post, as each of them could be a whole other post. If I get questions in the comments I might write about those.

A good tool for building grok filter definitions is the Grok Debugger. You basically paste in the line you want to process and build up the filter that meets your needs. In this example I wanted to extract the load average fields from the Nagios check and plot them in Graphite, which looks like the image below.

Sample graphite graph
I will try to write about how to set up each component in the future.

Sunday, February 9, 2014

Two node PBX In A Flash (PIAF) Cluster using heartbeat and drbd

Install the elrepo and epel repositories on both PBX In A Flash nodes.

# wget http://epel.mirror.freedomvoice.com/6/i386/epel-release-6-8.noarch.rpm
# yum install epel-release-6-8.noarch.rpm

# wget http://www.elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
# yum install elrepo-release-6-5.el6.elrepo.noarch.rpm

Install DRBD

# yum install kmod-drbd83 drbd83-utils

Install heartbeat

# yum install heartbeat

Assuming /dev/sdb is the disk to be mirrored on both nodes.

Create the file /etc/drbd.d/disk1.res with the following contents:

resource disk1 {
        startup {
                wfc-timeout 30;
                outdated-wfc-timeout 20;
                degr-wfc-timeout 30;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret sync_disk;
        }
        syncer {
                rate 100M;
                verify-alg sha1;
        }
        on node1 {
                device /dev/drbd0;
                disk /dev/sdb;
                address ip-of-node1:7789;
                meta-disk internal;
        }
        on node2 {
                device /dev/drbd0;
                disk /dev/sdb;
                address ip-of-node2:7789;
                meta-disk internal;
        }
}
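The same resource file has to exist on both nodes, and the names in the on node1 / on node2 sections must match each machine's hostname (the output of uname -n). Assuming node2 is reachable by that name, copy it over with something like:

# scp /etc/drbd.d/disk1.res root@node2:/etc/drbd.d/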



Create the meta data (on both nodes):

# drbdadm create-md disk1

Start drbd (on both nodes):

# service drbd start

On the node you want to make primary, node1 in this case:
# drbdadm -- --overwrite-data-of-peer primary disk1

To monitor the sync you can use:
# watch "cat /proc/drbd"

Once the disks are synced you should see something like this in /proc/drbd:

version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build32R6, 2013-09-27 15:59:12
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:144600 nr:428 dw:145028 dr:32986 al:47 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0


Now create the filesystem (on the primary node):
# mkfs.ext4 /dev/drbd0
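If you want to sanity-check the new filesystem before handing it over to heartbeat, you can mount it by hand on the primary node and unmount it again (heartbeat will manage the real mount on /disk1 later):

# mount /dev/drbd0 /mnt
# df -h /mnt
# umount /mnt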

For some reason heartbeat did not work with ucast for me, so I used mcast. Create your heartbeat configuration in /etc/ha.d/ha.cf on both nodes as follows:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 300ms
deadtime 4
warntime 2
initdead 10
udpport 694
mcast eth0 239.1.2.3 694 1 0
#bcast eth0
auto_failback on
node node1 node2





Create the haresources file:
# cat /etc/ha.d/haresources
node1 drbddisk::disk1 Filesystem::/dev/drbd0::/disk1::ext4 mysqld
node1 cluster-ip/24/eth0/cluster-ip-broadcast-ip IPsrcaddr::cluster-ip asterisk httpd


Explanation
The first line makes the drbd disk primary on node1, mounts it on /disk1 and starts MySQL.
The second line creates the cluster IP, sends all outbound traffic through it (IPsrcaddr) and starts asterisk (via amportal) and httpd.

Create an authkeys file with permission 600 on both nodes. First generate a random key:
# dd if=/dev/urandom count=4 2>/dev/null | md5sum | cut -c1-32

Then put the generated string after the sha1 keyword in /etc/ha.d/authkeys:

# cat > /etc/ha.d/authkeys
auth 1
1 sha1 generated-key-from-above
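Heartbeat expects authkeys to be identical on both nodes and readable only by root, so set the 600 permission mentioned above and copy the file over (assuming node2 is reachable by that name):

# chmod 600 /etc/ha.d/authkeys
# scp -p /etc/ha.d/authkeys root@node2:/etc/ha.d/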

Create symlinks to mysqld and asterisk in /etc/ha.d/resource.d
# cd /etc/ha.d/resource.d
# ln -s /etc/init.d/mysqld
# ln -s /usr/local/sbin/amportal asterisk


Create a /disk1 mount point on both nodes so that the drbd disk can be mounted when heartbeat starts:
# mkdir /disk1

# service heartbeat start 


The above command will mount the drbd disk on /disk1, bring up the cluster IP and start mysqld, asterisk and httpd.
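A few quick checks that everything came up on node1 (commands and expected results based on the configuration above):

# cat /proc/drbd        # Primary/Secondary, UpToDate/UpToDate
# df -h /disk1          # /dev/drbd0 mounted on /disk1
# ip addr show eth0     # the cluster IP should be listed
# service mysqld status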


Stop mysqld and asterisk, then move the data onto the drbd disk (a command sketch follows this list):
Move /var/lib/mysql to /disk1/var/mysql
Move /var/lib/asterisk to /disk1/var/asterisk
Move /etc/asterisk to /disk1/etc/asterisk
Move /usr/lib/asterisk to /disk1/usr/asterisk
Move /tftpboot to /disk1/tftpboot
Move /var/spool/asterisk to /disk1/var/spool
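On node1, with /disk1 mounted, those moves look roughly like this; the intermediate directories are my assumption based on the symlinks created below:

# service mysqld stop
# amportal stop
# mkdir -p /disk1/var/spool /disk1/etc /disk1/usr
# mv /var/lib/mysql /disk1/var/mysql
# mv /var/lib/asterisk /disk1/var/asterisk
# mv /etc/asterisk /disk1/etc/asterisk
# mv /usr/lib/asterisk /disk1/usr/asterisk
# mv /tftpboot /disk1/tftpboot
# mv /var/spool/asterisk /disk1/var/spool/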


On node2, stop mysqld and asterisk and delete /var/lib/mysql, /var/lib/asterisk, /etc/asterisk, /usr/lib/asterisk, /tftpboot and /var/spool/asterisk.


Now create symlinks (on both nodes) to the corresponding folders on /disk1. For example:
# cd /var/lib
# ln -s /disk1/var/mysql
# ln -s /disk1/var/asterisk
# cd /usr/lib
# ln -s /disk1/usr/asterisk
# cd /etc
# ln -s /disk1/etc/asterisk
# cd /
# ln -s /disk1/tftpboot
# cd /var/spool
# ln -s /disk1/var/spool/asterisk






If you have any other folders that need to be synced between the two nodes, just move them to /disk1 and create symlinks on both nodes pointing to them.

Disable mysqld and asterisk startup on boot using
# chkconfig mysqld off
# chkconfig asterisk off

Enable drbd to be started on boot on both nodes
# chkconfig drbd on

PIAF uses the /etc/rc.local file to start asterisk, so comment out the line that calls /usr/local/sbin/amportal (on both nodes).
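One way to do that on both nodes, assuming the line in rc.local starts with that exact path:

# sed -i 's|^/usr/local/sbin/amportal|# &|' /etc/rc.local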

Start heartbeat on the second node. You now have a two-node PBX In A Flash cluster that works off the same database and the same asterisk configuration. All you have to do is use the cluster IP to manage the instance.

Thursday, January 16, 2014

Counting the total allocated disk space on a remote or local server.

If you are trying to count the total disk space allocated to a server, local or remote, the following is useful:

ssh user@server df -k | grep -v Filesystem | awk '{print $2}' | paste -sd+ | bc

The output will be the total size of your disks in kilobytes (1K blocks).
The grep -v Filesystem removes the header line from the df -k output.
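If you prefer to let awk do the summing (and convert to gigabytes at the same time), an equivalent one-liner is:

ssh user@server df -k | grep -v Filesystem | awk '{sum+=$2} END {printf "%.1f GB\n", sum/1024/1024}'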

If you are looking for a particular volume, you can grep for it before the awk. For example, to find the total allocated space corresponding to a particular logical volume, say logVol1, you can use the following:

ssh user@server df -k | grep logVol1 | awk '{print $2}' | paste -sd+ | bc

If you want to do it on the local server, drop the ssh part.
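For the local case the same pipeline is simply:

df -k | grep -v Filesystem | awk '{print $2}' | paste -sd+ | bc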