Dirty Harry Wiki: Comment: ELK

ELK#

ELK: Elasticsearch, Logstash, Kibana.

ELK
Resources
Installing Elasticsearch
Installing (classic) logstash
logstash.conf examples
Elasticsearch URLs
Delete an index
Index a document

Resources#

Installing Elasticsearch#

First pull the docker image and run it:

download and run the docker image

docker pull sebp/elk
.....
Digest: sha256:8e250160ac22d339e57ba20768137dbeca2187c94082959220569a9318f85134
Status: Downloaded newer image for sebp/elk:latest
metskem@athena:~$ docker run -p 5601:5601 -p 9200:9200 -p 5000:5000 -it --name elk sebp/elk
 * Starting Elasticsearch Server
sysctl: setting key "vm.max_map_count": Read-only file system
                                                                         [ OK ]
logstash started.
waiting for Elasticsearch to be up (1/30)
waiting for Elasticsearch to be up (2/30)
waiting for Elasticsearch to be up (3/30)
waiting for Elasticsearch to be up (4/30)
waiting for Elasticsearch to be up (5/30)
waiting for Elasticsearch to be up (6/30)
waiting for Elasticsearch to be up (7/30)
waiting for Elasticsearch to be up (8/30)
 * Starting Kibana4                                                      [ OK ]
[2015-11-14 14:42:40,076][INFO ][node                     ] [Ardroman] initialized
[2015-11-14 14:42:40,077][INFO ][node                     ] [Ardroman] starting ...
[2015-11-14 14:42:40,141][WARN ][common.network           ] [Ardroman] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.1.56}
[2015-11-14 14:42:40,141][INFO ][transport                ] [Ardroman] publish_address {172.17.1.56:9300}, bound_addresses {[::]:9300}
[2015-11-14 14:42:40,197][INFO ][discovery                ] [Ardroman] elasticsearch/SGBOqCisRoK5aXakkplosQ
[2015-11-14 14:42:43,259][INFO ][cluster.service          ] [Ardroman] new_master {Ardroman}{SGBOqCisRoK5aXakkplosQ}{172.17.1.56}{172.17.1.56:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2015-11-14 14:42:43,335][WARN ][common.network           ] [Ardroman] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.1.56}
[2015-11-14 14:42:43,336][INFO ][http                     ] [Ardroman] publish_address {172.17.1.56:9200}, bound_addresses {[::]:9200}
[2015-11-14 14:42:43,336][INFO ][node                     ] [Ardroman] started
[2015-11-14 14:42:43,337][INFO ][gateway                  ] [Ardroman] recovered [0] indices into cluster_state
[2015-11-14 14:42:55,965][INFO ][cluster.metadata         ] [Ardroman] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [config]
[2015-11-14 14:45:40,093][INFO ][cluster.metadata         ] [Ardroman] [logstash-2015.11.14] creating index, cause [auto(bulk api)], templates [logstash], shards [5]/[1], mappings [logs, _default_]
[2015-11-14 14:45:40,357][INFO ][cluster.metadata         ] [Ardroman] [logstash-2015.11.14] update_mapping [logs]
[2015-11-14 14:46:53,017][INFO ][cluster.metadata         ] [Ardroman] [.kibana] create_mapping [index-pattern]
[2015-11-14 14:47:42,004][INFO ][cluster.metadata         ] [Ardroman] [.kibana] update_mapping [config]
[2015-11-14 14:48:50,680][INFO ][cluster.metadata         ] [Ardroman] [.kibana] create_mapping [dashboard]

The Docker image also runs Logstash, which we don't need for now (see further down); we will send the logs directly to Elasticsearch.

Then install Filebeat on an active (web) server to get some real log data to process:

download and install filebeat

root@apollo:~# curl -L -O https://download.elastic.co/beats/filebeat/filebeat_1.0.0-rc1_i386.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3390k  100 3390k    0     0  1443k      0  0:00:02  0:00:02 --:--:-- 1443k
 
root@apollo:~# dpkg -i filebeat_1.0.0-rc1_i386.deb 
Selecting previously unselected package filebeat.
(Reading database ... 190915 files and directories currently installed.)
Preparing to unpack filebeat_1.0.0-rc1_i386.deb ...
Unpacking filebeat (1.0.0~rc1) ...
Setting up filebeat (1.0.0~rc1) ...
Processing triggers for ureadahead (0.100.0-16) ...

root@apollo:~# dpkg --listfiles filebeat
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/filebeat
/usr/share/doc/filebeat/changelog.Debian.gz
/usr/bin
/usr/bin/filebeat-god
/usr/bin/filebeat
/etc
/etc/filebeat
/etc/filebeat/filebeat.template.json
/etc/filebeat/filebeat.yml
/etc/init.d
/etc/init.d/filebeat
root@apollo:~#

Then edit /etc/filebeat/filebeat.yml: set paths to /var/log/apache2/access.log, the frequency to 3s and hosts to "athena:9200".
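For orientation, the edited parts of filebeat.yml might look roughly like the sketch below. It assumes the nested layout of the Filebeat 1.x sample config and that "frequency" refers to the scan_frequency setting; neither is confirmed by the notes above. The sketch writes to a scratch file named filebeat.yml.example (a hypothetical name) rather than touching /etc/filebeat.

```shell
# Write a sketch of the edited settings to a scratch file (not to /etc/filebeat).
cat > filebeat.yml.example <<'EOF'
filebeat:
  prospectors:
    -
      paths:
        - /var/log/apache2/access.log
      input_type: log
      # Assumption: "frequency" in the text means scan_frequency.
      scan_frequency: 3s
output:
  elasticsearch:
    hosts: ["athena:9200"]
EOF
```

Compare it with the commented sample in /etc/filebeat/filebeat.yml before copying anything over.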

Next load the index template in Elasticsearch.

root@apollo:/etc/filebeat# curl -XPUT 'http://athena:9200/_template/filebeat?pretty' -d@/etc/filebeat/filebeat.template.json
{
  "acknowledged" : true
}
root@apollo:/etc/filebeat#

And start filebeat:

root@apollo:/etc/filebeat# /etc/init.d/filebeat start
root@apollo:/var/log# ps -ef|grep filebeat|grep -v grep
root      6672     1  0 16:18 pts/1    00:00:00 /usr/bin/filebeat-god -r / -n -p /var/run/filebeat.pid -- /usr/bin/filebeat -c /etc/filebeat/filebeat.yml
root      6673  6672  4 16:18 pts/1    00:00:04 /usr/bin/filebeat -c /etc/filebeat/filebeat.yml

Finally (this is not in the Filebeat documentation), add an extra filebeat-* index pattern in Kibana, basically a copy of the default logstash-* one: Settings ==> Indices ==> Create new.

The net result of the above actions is that we do get data in Elasticsearch, but every log line is stored as a single field called message.
What we want is for the Apache log file to be parsed, so that all fields (clientip, request, response code and so on) are stored separately in Elasticsearch.
I spent several hours trying to find out how to do this with Filebeat but could not find it; I guess it must have something to do with filebeat.template.json.
Anyway, I continued with the classic Logstash, see the next chapter.
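To make the goal concrete, here is a rough shell sketch of what "parsed into fields" means for a single log line. It uses sed rather than Logstash's grok, the sample line is hypothetical (standard Apache "combined" format, not taken from the real server), and parse_line is just an illustrative name.

```shell
# Hypothetical access-log line in Apache "combined" format.
line='203.0.113.7 - - [14/Nov/2015:14:45:39 +0100] "GET /index.html HTTP/1.1" 200 1234 "-" "curl/7.35.0"'

# Split one line into named fields, similar in spirit to what a grok filter does.
parse_line() {
  printf '%s\n' "$1" | sed -E 's/^([^ ]+) [^ ]+ [^ ]+ \[([^]]+)\] "([A-Z]+) ([^ ]+)[^"]*" ([0-9]+) ([0-9]+|-).*$/clientip=\1 timestamp="\2" verb=\3 request=\4 response=\5 bytes=\6/'
}

parse_line "$line"
```

Instead of one message field, each of these values would end up as its own field in Elasticsearch.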

Installing (classic) logstash#

I first installed the Logstash deb package and then created the following Logstash config file:

logstash.conf examples #

logstash.conf

input {
  file {
    path => "/var/log/apache2/access.log"
    type => "apache2"
  }
}

filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} \[%{HTTPDATE:timestamp}\] %{NUMBER:timetaken} \"%{IPORHOST:vhost}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}" }
    add_field => [ "received_at", "%{@timestamp}" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
     source => "clientip"
     target => "geoip"
     database => "/etc/logstash/GeoLiteCity.dat"
     add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
     add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
   }
}

output {
  elasticsearch { hosts => ["athena:9200","10.0.0.162:9200"] }
}

logstash.conf

input { stdin { } }

filter {
  grok {
    match => { "message" => "\"(?:%{IPORHOST:clientip}|-)\" %{IPORHOST:vhost} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:timetaken} %{QS:sessionid} %{QS:remove1} %{QS:referrer} %{QS:useragent}" }
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "source", "websphere" ]
    remove_field => [ "remove1" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    convert => { "bytes" => "integer" }
    convert => { "timetaken" => "integer" }
  }
  geoip {
     source => "clientip"
     target => "geoip"
     database => "/etc/logstash/GeoLiteCity.dat"
     add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
     add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
   }
   useragent {
     source => "useragent"
   }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  # stdout { codec => rubydebug }
}

logstash.conf

input { stdin { } }

filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{IPORHOST:vhost} %{NOTSPACE:remove1} %{NOTSPACE:remove2} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:timetaken} %{NOTSPACE:remove3} %{NUMBER:keepalivenr} %{QS:referrer} %{QS:useragent} (?:%{PATH:filename}|-)" }
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "source", "statics" ]
    remove_field => [ "remove1","remove2","remove3" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    convert => { "bytes" => "integer" }
    convert => { "timetaken" => "integer" }
    convert => { "keepalivenr" => "integer" }
  }
  geoip {
     source => "clientip"
     target => "geoip"
     database => "/etc/logstash/GeoLiteCity.dat"
     add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
     add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
   }
   useragent {
     source => "useragent"
   }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  # stdout { codec => rubydebug }
}
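Before restarting Logstash with one of these configs, it can help to check the syntax first. The sketch below assumes a Logstash 2.x package install with the binary at /opt/logstash/bin/logstash and the --configtest flag; both the path and the example config location are assumptions, and check_config is a hypothetical helper name.

```shell
# Assumed location for a Debian package install of Logstash 2.x; adjust as needed.
LOGSTASH_BIN=${LOGSTASH_BIN:-/opt/logstash/bin/logstash}

# Check the syntax of a config file without starting the pipeline.
check_config() {
  if [ ! -x "$LOGSTASH_BIN" ]; then
    echo "logstash binary not found at $LOGSTASH_BIN" >&2
    return 127
  fi
  "$LOGSTASH_BIN" --configtest -f "$1"
}

# Examples (commented out so the sketch has no side effects):
# check_config /etc/logstash/conf.d/logstash.conf
# head -n 1 access.log | "$LOGSTASH_BIN" -f logstash.conf
```

For the stdin-based configs, uncommenting the stdout { codec => rubydebug } line prints each parsed event, which makes it easy to see whether the grok pattern matched.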

Before starting Logstash, download a "GeoLiteCity DB" and unzip it to /etc/logstash (the geoip filter above reads /etc/logstash/GeoLiteCity.dat).

Make the user logstash a member of the adm group (so it can read the log files) and restart with /etc/init.d/logstash restart. And there we have a logstash-* index in Elasticsearch with all requested fields, hurray!

Elasticsearch URLs#

Delete an index#

curl -XDELETE 'http://localhost:9200/logstash-*/'

Index a document#

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2015-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

This adds (indexes) a document in the twitter index, with type tweet and document ID 1.
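To check that the document arrived, it can be fetched back by index, type and ID. A minimal sketch, assuming the same localhost:9200 endpoint as above; ES, doc_url and get_doc are hypothetical names introduced here for illustration.

```shell
# Assumed base URL, same host and port as in the commands above.
ES=${ES:-http://localhost:9200}

# Build the URL of a single document: index, type, id.
doc_url() {
  printf '%s/%s/%s/%s?pretty\n' "$ES" "$1" "$2" "$3"
}

# Fetch the document (needs a running Elasticsearch, so it is not called here).
get_doc() {
  curl -s -XGET "$(doc_url "$1" "$2" "$3")"
}

doc_url twitter tweet 1
# get_doc twitter tweet 1
```

Listing all indices with curl 'http://localhost:9200/_cat/indices?v' is also a quick way to see what exists before deleting anything.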
