!!! ELK
__ELK__ stands for Elasticsearch, Logstash and Kibana.
[{TableOfContents }]
!! Resources
* [elk-docker instructions|http://elk-docker.readthedocs.org/]
* [filebeat (formerly logstash-forwarder)|https://www.elastic.co/guide/en/beats/filebeat/current/index.html]
* [logstash grok patterns|https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns]
* ...
!! Installing Elasticsearch
First pull the docker image and run it:
%%collapsebox
__download and run the docker image__
{{{
docker pull sebp/elk
.....
Digest: sha256:8e250160ac22d339e57ba20768137dbeca2187c94082959220569a9318f85134
Status: Downloaded newer image for sebp/elk:latest
metskem@athena:~$ docker run -p 5601:5601 -p 9200:9200 -p 5000:5000 -it --name elk sebp/elk
* Starting Elasticsearch Server sysctl: setting key "vm.max_map_count": Read-only file system
[ OK ]
logstash started.
waiting for Elasticsearch to be up (1/30)
waiting for Elasticsearch to be up (2/30)
waiting for Elasticsearch to be up (3/30)
waiting for Elasticsearch to be up (4/30)
waiting for Elasticsearch to be up (5/30)
waiting for Elasticsearch to be up (6/30)
waiting for Elasticsearch to be up (7/30)
waiting for Elasticsearch to be up (8/30)
* Starting Kibana4 [ OK ]
[2015-11-14 14:42:40,076][INFO ][node ] [Ardroman] initialized
[2015-11-14 14:42:40,077][INFO ][node ] [Ardroman] starting ...
[2015-11-14 14:42:40,141][WARN ][common.network ] [Ardroman] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.1.56}
[2015-11-14 14:42:40,141][INFO ][transport ] [Ardroman] publish_address {172.17.1.56:9300}, bound_addresses {[::]:9300}
[2015-11-14 14:42:40,197][INFO ][discovery ] [Ardroman] elasticsearch/SGBOqCisRoK5aXakkplosQ
[2015-11-14 14:42:43,259][INFO ][cluster.service ] [Ardroman] new_master {Ardroman}{SGBOqCisRoK5aXakkplosQ}{172.17.1.56}{172.17.1.56:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2015-11-14 14:42:43,335][WARN ][common.network ] [Ardroman] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.1.56}
[2015-11-14 14:42:43,336][INFO ][http ] [Ardroman] publish_address {172.17.1.56:9200}, bound_addresses {[::]:9200}
[2015-11-14 14:42:43,336][INFO ][node ] [Ardroman] started
[2015-11-14 14:42:43,337][INFO ][gateway ] [Ardroman] recovered [0] indices into cluster_state
[2015-11-14 14:42:55,965][INFO ][cluster.metadata ] [Ardroman] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [config]
[2015-11-14 14:45:40,093][INFO ][cluster.metadata ] [Ardroman] [logstash-2015.11.14] creating index, cause [auto(bulk api)], templates [logstash], shards [5]/[1], mappings [logs, _default_]
[2015-11-14 14:45:40,357][INFO ][cluster.metadata ] [Ardroman] [logstash-2015.11.14] update_mapping [logs]
[2015-11-14 14:46:53,017][INFO ][cluster.metadata ] [Ardroman] [.kibana] create_mapping [index-pattern]
[2015-11-14 14:47:42,004][INFO ][cluster.metadata ] [Ardroman] [.kibana] update_mapping [config]
[2015-11-14 14:48:50,680][INFO ][cluster.metadata ] [Ardroman] [.kibana] create_mapping [dashboard]
}}}
%%
The docker image also runs logstash, which we don't need for now (see below); we will send the logs directly to Elasticsearch.
Then install filebeat on an active (web)server to get some real log data to process:
%%collapsebox
__download and install filebeat__
{{{
root@apollo:~# curl -L -O https://download.elastic.co/beats/filebeat/filebeat_1.0.0-rc1_i386.deb
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3390k 100 3390k 0 0 1443k 0 0:00:02 0:00:02 --:--:-- 1443k
root@apollo:~# dpkg -i filebeat_1.0.0-rc1_i386.deb
Selecting previously unselected package filebeat.
(Reading database ... 190915 files and directories currently installed.)
Preparing to unpack filebeat_1.0.0-rc1_i386.deb ...
Unpacking filebeat (1.0.0~rc1) ...
Setting up filebeat (1.0.0~rc1) ...
Processing triggers for ureadahead (0.100.0-16) ...
root@apollo:~# dpkg --listfiles filebeat
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/filebeat
/usr/share/doc/filebeat/changelog.Debian.gz
/usr/bin
/usr/bin/filebeat-god
/usr/bin/filebeat
/etc
/etc/filebeat
/etc/filebeat/filebeat.template.json
/etc/filebeat/filebeat.yml
/etc/init.d
/etc/init.d/filebeat
root@apollo:~#
}}}
%%
Then edit the /etc/filebeat/filebeat.yml file: set ''paths'' to /var/log/apache2/access.log, the scan frequency to 3s and ''hosts'' to "athena:9200".
Next, load the index template into Elasticsearch:
{{{
root@apollo:/etc/filebeat# curl -XPUT 'http://athena:9200/_template/filebeat?pretty' -d@/etc/filebeat/filebeat.template.json
{
"acknowledged" : true
}
root@apollo:/etc/filebeat#
}}}
And start filebeat:
{{{
root@apollo:/etc/filebeat# /etc/init.d/filebeat start
root@apollo:/var/log# ps -ef|grep filebeat|grep -v grep
root 6672 1 0 16:18 pts/1 00:00:00 /usr/bin/filebeat-god -r / -n -p /var/run/filebeat.pid -- /usr/bin/filebeat -c /etc/filebeat/filebeat.yml
root 6673 6672 4 16:18 pts/1 00:00:04 /usr/bin/filebeat -c /etc/filebeat/filebeat.yml
}}}
Finally, add an extra {{filebeat-*}} index pattern in Kibana, basically a copy of the default {{logstash-*}} one: Settings ==> Indices ==> Create new.
This step is not documented in the [filebeat getting started guide|https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-getting-started.html].
The net result of the above actions is that we do get data into Elasticsearch, but each log line is stored as a single field called ''message''. \\
What we want is for the apache log file to be parsed, with all its fields (clientip, request, response code and so on) stored as separate fields in Elasticsearch.\\
I spent several hours trying to find out how to do this with filebeat but could not find it; I guess it has something to do with filebeat.template.json.\\
Anyway, I continued with classic logstash; see the next chapter.
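To show what "parsed" means here, a minimal Python sketch that splits one apache access log line into named fields, roughly what the grok filter in the next chapter does. Assumptions: the line is in apache's standard "combined" format, the regex is a hand-written approximation of grok's COMBINEDAPACHELOG pattern (see the grok patterns link under Resources) rather than logstash itself, and the sample line is made up.

{{{
import re

# Hand-written approximation of grok's COMBINEDAPACHELOG pattern
# (assumption: apache's standard "combined" LogFormat).
# Each named group becomes a field, like %{IPORHOST:clientip} in grok.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>[^"]*))" '
    r'(?P<response>\d{3}) (?:(?P<bytes>\d+)|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # Skip groups that did not take part in the match (e.g. rawrequest, or
    # bytes when apache logged "-"). Unlike grok's QS, referrer and agent
    # are returned without their surrounding quotes.
    return {name: value for name, value in m.groupdict().items() if value is not None}


sample = ('10.0.0.5 - - [14/Nov/2015:16:18:02 +0100] '
          '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
fields = parse_line(sample)
}}}

Here {{fields}} holds clientip "10.0.0.5", verb "GET", request "/index.html", response "200" and so on; all values are still strings, which is why some of the configs below convert fields with mutate.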
!! Installing (classic) logstash
I first installed the [logstash deb|https://download.elastic.co/logstash/logstash/packages/debian/logstash_2.0.0-1_all.deb] and then created a logstash config file. Below are three examples, each parsing a different access log format:
! logstash.conf examples
%%collapsebox
__logstash.conf (apache2 access log, file input)__
%%prettify
{{{
input {
  file {
    path => "/var/log/apache2/access.log"
    type => "apache2"
  }
}
filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} \[%{HTTPDATE:timestamp}\] %{NUMBER:timetaken} \"%{IPORHOST:vhost}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}" }
    add_field => [ "received_at", "%{@timestamp}" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "/etc/logstash/GeoLiteCity.dat"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
}
output {
  elasticsearch { hosts => ["athena:9200","10.0.0.162:9200"] }
}
}}}
%%
%%
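The date filter in this config replaces @timestamp (by default the time logstash read the line) with the time from the log line itself. A minimal Python sketch of that conversion, assuming that Python's strptime format "%d/%b/%Y:%H:%M:%S %z" is the equivalent of the Joda pattern "dd/MMM/yyyy:HH:mm:ss Z", that month abbreviations are English (the default C locale), and that @timestamp is written as a UTC ISO 8601 string:

{{{
from datetime import datetime, timezone

# Assumed Python equivalent of the Joda pattern "dd/MMM/yyyy:HH:mm:ss Z".
HTTPDATE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def to_timestamp(httpdate):
    """Convert an apache HTTPDATE string to a UTC string like logstash's @timestamp."""
    parsed = datetime.strptime(httpdate, HTTPDATE_FORMAT)  # timezone-aware
    utc = parsed.astimezone(timezone.utc)
    # Millisecond precision with a trailing Z; apache logs whole seconds only.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.000Z")
}}}

For example, {{to_timestamp("14/Nov/2015:16:18:02 +0100")}} gives "2015-11-14T15:18:02.000Z": the +0100 offset is taken off to get UTC.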
%%collapsebox
__logstash.conf (websphere access log, stdin input)__
%%prettify
{{{
input { stdin { } }
filter {
  grok {
    match => { "message" => "\"(?:%{IPORHOST:clientip}|-)\" %{IPORHOST:vhost} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:timetaken} %{QS:sessionid} %{QS:remove1} %{QS:referrer} %{QS:useragent}" }
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "source", "websphere" ]
    # remove_field takes field names; "%{remove1}" would expand to the field's value
    remove_field => [ "remove1" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    convert => {
      "bytes" => "integer"
      "timetaken" => "integer"
    }
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "/etc/logstash/GeoLiteCity.dat"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  useragent {
    source => "useragent"
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  # stdout { codec => rubydebug }
}
}}}
%%
%%
%%collapsebox
__logstash.conf (statics access log, stdin input)__
%%prettify
{{{
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{IPORHOST:vhost} %{NOTSPACE:remove1} %{NOTSPACE:remove2} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:timetaken} %{NOTSPACE:remove3} %{NUMBER:keepalivenr} %{QS:referrer} %{QS:useragent} (?:%{PATH:filename}|-)" }
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "source", "statics" ]
    # remove_field takes field names; "%{removeN}" would expand to the field's value
    remove_field => [ "remove1", "remove2", "remove3" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    convert => {
      "bytes" => "integer"
      "timetaken" => "integer"
      "keepalivenr" => "integer"
    }
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "/etc/logstash/GeoLiteCity.dat"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  useragent {
    source => "useragent"
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  # stdout { codec => rubydebug }
}
}}}
%%
%%
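What the mutate and remove_field settings in the websphere and statics configs do to a parsed event can be sketched in Python. Note that remove_field takes field names and expands %{...} references first, so "%{remove1}" would remove a field named after the value of remove1; listing the bare names remove1, remove2 and remove3 drops the placeholder fields themselves. The function name and the sample event are hypothetical, and the sketch assumes whole-number values:

{{{
def post_process(event, int_fields=("bytes", "timetaken"), drop=("remove1",)):
    """Sketch of mutate { convert => ... "integer" } plus remove_field.

    event: dict of field name -> string, as produced by grok (hypothetical).
    Returns a new dict and leaves the input unchanged.
    """
    out = dict(event)
    for name in drop:
        # remove_field works on field names, so drop the key itself.
        out.pop(name, None)
    for name in int_fields:
        if name in out:
            # Fields grok did not set (e.g. bytes logged as "-") stay absent.
            out[name] = int(out[name])
    return out


websphere_event = {"clientip": "10.0.0.5", "bytes": "2326", "timetaken": "56", "remove1": "-"}
cleaned = post_process(websphere_event)
}}}

For the statics config, call it with {{int_fields=("bytes", "timetaken", "keepalivenr")}} and {{drop=("remove1", "remove2", "remove3")}}.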
Also, first download the [GeoLiteCity DB|http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz] and gunzip it into /etc/logstash, since the geoip filters above expect /etc/logstash/GeoLiteCity.dat.
Add the user {{logstash}} to the {{adm}} group so that it can read the log files (for example with {{usermod -a -G adm logstash}}), then restart logstash with {{/etc/init.d/logstash restart}}. And there we have a {{logstash-*}} index in Elasticsearch with all the requested fields, hurray!
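The two add_field lines in the geoip filters build the coordinates field as an array with the longitude first and the latitude second. That is the order Elasticsearch expects for the array form of a geo_point (GeoJSON order), and it is easy to get the wrong way round. A minimal Python sketch of the resulting value; the function name and lookup values are made up, and because add_field produces strings, the sketch converts them to floats and checks their range:

{{{
def geo_point(lookup):
    """Return the coordinates as longitude, latitude (GeoJSON order).

    lookup: dict with "latitude" and "longitude" keys, like the geoip target
    (values may be strings, as add_field produces them).
    """
    lon = float(lookup["longitude"])
    lat = float(lookup["latitude"])
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range: %r" % ((lon, lat),))
    return [lon, lat]
}}}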
!! Elasticsearch URLs
! Delete indices
{{{
curl -XDELETE 'http://localhost:9200/logstash-*/'
}}}
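The wildcard above deletes every daily logstash index at once. By default the elasticsearch output of logstash writes to one index per day, named logstash-YYYY.MM.dd after the event's @timestamp in UTC (like the logstash-2015.11.14 index in the log at the top), so a single day can be deleted instead. A minimal Python sketch with hypothetical helper names; the host is an assumption and the code only builds the request, which {{urllib.request.urlopen(req)}} would then send:

{{{
from datetime import datetime, timezone
from urllib.request import Request


def daily_index(ts, prefix="logstash"):
    """Daily index name for a timezone-aware timestamp, e.g. logstash-2015.11.14."""
    # The date part comes from the UTC time, not the local time.
    return "%s-%s" % (prefix, ts.astimezone(timezone.utc).strftime("%Y.%m.%d"))


def delete_index_request(index, host="http://localhost:9200"):
    """Build (but do not send) the DELETE request for one index."""
    return Request("%s/%s" % (host, index), method="DELETE")


req = delete_index_request(daily_index(datetime(2015, 11, 14, 14, 45, 40, tzinfo=timezone.utc)))
}}}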
! Index a document
{{{
curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2015-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
}}}
This adds (indexes) a document of type {{tweet}} to the {{twitter}} index, with document id 1.
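The same call from Python, as a minimal standard-library sketch with hypothetical helper names: it builds the PUT request for document 1 and a GET request on the same URL to read it back. The Content-Type header is an addition; the Elasticsearch version used here does not need it, but newer versions require it for requests with a body. Nothing is sent until {{urllib.request.urlopen(put_req)}} is called.

{{{
import json
from urllib.request import Request

ES = "http://localhost:9200"  # assumption: same host as the curl example


def index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) the PUT request that indexes one document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        "%s/%s/%s/%s" % (ES, index, doc_type, doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(index, doc_type, doc_id):
    """Build (but do not send) the GET request that fetches the document back."""
    return Request("%s/%s/%s/%s" % (ES, index, doc_type, doc_id), method="GET")


tweet = {
    "user": "kimchy",
    "post_date": "2015-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}
put_req = index_request("twitter", "tweet", 1, tweet)
get_req = get_request("twitter", "tweet", 1)
}}}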