Nagios is a very powerful tool that lets you monitor various parts of your infrastructure. It collects a lot of information which can be used to learn more about your infrastructure.
Logstash is a tool to process and pipe all types of events and logs. It has many output plugins, one of which is Graphite.
I found it difficult to find specific examples of doing just this by googling, so I thought I would put my results here in the hope that they help others adopt these tools more easily.
The Nagios Section
These are the Nagios-specific settings I used (in the main nagios.cfg file) to process the performance data that Nagios collects:
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file=/var/log/nagios/host-perfdata
service_perfdata_file=/var/log/nagios/service-perfdata
host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
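These settings make Nagios process host and service performance data. Each service result is written to service_perfdata_file in the layout given by service_perfdata_file_template, and the process-service-perfdata command is also run for it. Logstash below reads /var/log/nagios/service-perfdata.out. The sample line below matches the format written by the process-service-perfdata command in the sample commands.cfg that ships with Nagios (which appends to a service-perfdata.out file), not the template above: it has no [SERVICEPERFDATA] prefix and also includes the service state, attempt number and state type.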
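To make the template concrete, here is a minimal Python sketch that splits one line written with the service_perfdata_file_template above into fields named after its macros. It assumes the \t in the template ends up as a real tab in the file and that the plugin output contains no tabs; parse_template_line and the example values are hypothetical. It applies only to lines that follow the template; the Current Load sample below uses a different layout.

```python
# Field names in the order they appear in service_perfdata_file_template,
# after the literal "[SERVICEPERFDATA]" tag at the start of each line.
TEMPLATE_FIELDS = [
    "TIMET",
    "HOSTNAME",
    "SERVICEDESC",
    "SERVICEEXECUTIONTIME",
    "SERVICELATENCY",
    "SERVICEOUTPUT",
    "SERVICEPERFDATA",
]


def parse_template_line(line):
    """Split one templated service perfdata line into a dict keyed by macro name.

    Assumes the plugin output itself contains no tab characters.
    """
    parts = line.rstrip("\n").split("\t")
    if parts[0] != "[SERVICEPERFDATA]":
        raise ValueError("not a [SERVICEPERFDATA] line")
    values = parts[1:]
    if len(values) != len(TEMPLATE_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(TEMPLATE_FIELDS), len(values))
        )
    return dict(zip(TEMPLATE_FIELDS, values))


# Hypothetical line, built from the values of the Current Load sample.
example = "\t".join([
    "[SERVICEPERFDATA]", "1408303233", "localhost", "Current Load",
    "0.004", "0.103", "OK - load average: 0.59, 0.51, 0.54",
    "load1=0.590;5.000;10.000;0; load5=0.510;4.000;6.000;0; load15=0.540;3.000;4.000;0;",
])
fields = parse_template_line(example)
print(fields["SERVICEDESC"], fields["TIMET"])  # Current Load 1408303233
```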
A sample line for the Current Load service looks like this (the fields are tab-separated):
1408303233 localhost Current Load OK 1 HARD 0.004 0.103 OK - load average: 0.59, 0.51, 0.54 load1=0.590;5.000;10.000;0; load5=0.510;4.000;6.000;0; load15=0.540;3.000;4.000;0;
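The last tab-separated field of that line is the plugin's performance data, in the standard Nagios plugin format label=value[UOM];warn;crit;min;max, so the load averages are available there as well as in the human-readable output text. As a minimal Python sketch (parse_perfdata is a hypothetical helper, and it assumes simple unquoted labels without spaces), the field can be turned into numbers like this:

```python
import re

# A numeric value followed by an optional unit of measure, e.g. "0.590", "12%", "0.5ms".
VALUE_RE = re.compile(r"^([-+]?\d*\.?\d+)([A-Za-z%]*)$")
SLOT_NAMES = ["warn", "crit", "min", "max"]


def _number_or_raw(raw):
    """Convert a warn/crit/min/max slot to float; keep ranges like '10:20' as text."""
    if raw == "":
        return None
    try:
        return float(raw)
    except ValueError:
        return raw


def parse_perfdata(perfdata):
    """Parse space-separated 'label=value[UOM];warn;crit;min;max' items.

    Assumes unquoted labels without spaces; quoted labels are not handled.
    """
    result = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        slots = rest.split(";")
        match = VALUE_RE.match(slots[0])
        if match is None:
            continue  # skip items this sketch does not understand
        entry = {"value": float(match.group(1)), "uom": match.group(2)}
        # Pad so that missing trailing slots become None.
        padded = slots[1:] + [""] * len(SLOT_NAMES)
        for name, raw in zip(SLOT_NAMES, padded):
            entry[name] = _number_or_raw(raw)
        result[label] = entry
    return result


sample = ("load1=0.590;5.000;10.000;0; "
          "load5=0.510;4.000;6.000;0; "
          "load15=0.540;3.000;4.000;0;")
data = parse_perfdata(sample)
print(data["load1"]["value"], data["load1"]["crit"])  # 0.59 10.0
```

The Logstash configuration in this post matches the "load average:" text in the plugin output instead; parsing the performance data is one alternative if you also want the warning and critical thresholds.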
We can use Logstash to pick up this data, parse it into fields, and send the relevant fields to Graphite.
The Logstash Section
An example Logstash configuration looks like this:
input {
  # Read the file that Nagios appends service performance data to
  file {
    type => "serviceperf"
    path => "/var/log/nagios/service-perfdata.out"
  }
}
filter {
  if [type] == "serviceperf" {
    # Only lines for the Current Load service match and get the "cpu" tag
    grok {
      match => [ "message" , "%{NUMBER:timestamp}\t%{HOST:server}\tCurrent Load\t%{WORD:state}\t%{GREEDYDATA} load average: %{NUMBER:load_avg_1m}, %{NUMBER:load_avg_5m}, %{NUMBER:load_avg_15m}"]
      add_tag => ["cpu"]
    }
    # Use the Nagios timestamp (seconds since the epoch) as the event time
    date {
      match => [ "timestamp", "UNIX" ]
    }
  }
}
output {
  if "cpu" in [tags] {
    # Send each load average to carbon-cache, prefixed with the server name
    graphite {
      host => "localhost"
      port => 2003
      metrics => [ "%{server}.load_avg_1m","%{load_avg_1m}",
                   "%{server}.load_avg_5m","%{load_avg_5m}",
                   "%{server}.load_avg_15m","%{load_avg_15m}"]
    }
  }
}
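To see what the grok and date filters produce for the sample line, here is a rough Python equivalent. The regular expressions are simplified approximations of the grok NUMBER, HOST, WORD and GREEDYDATA patterns, not their exact definitions, and parse_current_load is a hypothetical helper; Logstash itself does not run anything like this.

```python
import re
from datetime import datetime, timezone

# Simplified stand-ins for the grok patterns used in the filter above.
# They approximate, and are not identical to, the real grok definitions.
NUMBER = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"
HOST = r"[0-9A-Za-z][0-9A-Za-z.-]*"
WORD = r"\w+"

# Same structure as the grok match: tab-separated timestamp, host, service
# name and state, then anything (GREEDYDATA) up to the "load average:" text.
CURRENT_LOAD_RE = re.compile(
    rf"(?P<timestamp>{NUMBER})\t(?P<server>{HOST})\tCurrent Load\t(?P<state>{WORD})\t"
    rf".* load average: (?P<load_avg_1m>{NUMBER}), "
    rf"(?P<load_avg_5m>{NUMBER}), (?P<load_avg_15m>{NUMBER})"
)


def parse_current_load(line):
    """Mimic the grok + date filters for one line; return None if it does not match."""
    match = CURRENT_LOAD_RE.search(line)  # unanchored, like grok's default
    if match is None:
        return None  # Logstash would tag such an event with _grokparsefailure
    event = match.groupdict()
    event["tags"] = ["cpu"]  # add_tag => ["cpu"]
    # date { match => [ "timestamp", "UNIX" ] }: seconds since the epoch.
    event["@timestamp"] = datetime.fromtimestamp(float(event["timestamp"]), tz=timezone.utc)
    return event


line = "\t".join([
    "1408303233", "localhost", "Current Load", "OK", "1", "HARD", "0.004", "0.103",
    "OK - load average: 0.59, 0.51, 0.54",
    "load1=0.590;5.000;10.000;0; load5=0.510;4.000;6.000;0; load15=0.540;3.000;4.000;0;",
])
event = parse_current_load(line)
print(event["server"], event["load_avg_1m"], event["load_avg_15m"])  # localhost 0.59 0.54
```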
The configuration has three sections:
1. The input section reads the data that Nagios writes to /var/log/nagios/service-perfdata.out.
2. The filter section uses grok to match the plain line of text and split it into named fields (key/value pairs), from which we capture the values we want. The date filter then sets the event time from the Nagios timestamp.
3. The output section sends the captured values to Graphite through carbon-cache, using the server name as the prefix of each metric name.
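What reaches carbon-cache on port 2003 is Carbon's plaintext protocol: one line per metric, made of the metric path, the value and a Unix timestamp. The Python sketch below reproduces those lines for the Current Load sample. carbon_lines and send_to_carbon are hypothetical helpers, the carbon-cache address is assumed to be localhost:2003 as in the config, and the timestamp stands in for the event time set by the date filter (assuming the graphite output uses the event's own timestamp).

```python
import socket


def carbon_lines(server, values, timestamp):
    """Format metrics in Carbon's plaintext protocol, one '<path> <value> <timestamp>' line each.

    The metric path is '<server>.<field>', mirroring "%{server}.load_avg_1m" in the config.
    """
    return "".join(
        "%s.%s %s %d\n" % (server, name, value, timestamp)
        for name, value in values.items()
    )


def send_to_carbon(payload, host="localhost", port=2003):
    """Send a plaintext payload to a carbon-cache listener (assumed at localhost:2003)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


payload = carbon_lines(
    "localhost",
    {"load_avg_1m": "0.59", "load_avg_5m": "0.51", "load_avg_15m": "0.54"},
    1408303233,
)
print(payload, end="")
# localhost.load_avg_1m 0.59 1408303233
# localhost.load_avg_5m 0.51 1408303233
# localhost.load_avg_15m 0.54 1408303233
```

Because the metric path is built from %{server}, Graphite treats every dot as a level separator, so a fully qualified host name such as web1.example.com (a hypothetical example) would end up as nested nodes rather than a single node; replacing the dots in the host name is one way to avoid that.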
I am not covering the setup of the individual components in this post, as each of them could be a whole post of its own. If I get questions in the comments, I might write about that.
A good tool I used to build the grok filter definitions is the Grok Debugger. You paste in the line you want to process and build up the filter until it matches what you need. In this example I wanted to extract the load average fields from the Nagios check and plot them in Graphite, which looks like the image below.
I will try to write about how to set up each component in the future.
[Image: Sample Graphite graph]