ELK#
ELK: Elasticsearch, Logstash, Kibana.
Resources#
Installing Elasticsearch#
First pull the docker image and run it:
    docker pull sebp/elk
    .....
    Digest: sha256:8e250160ac22d339e57ba20768137dbeca2187c94082959220569a9318f85134
    Status: Downloaded newer image for sebp/elk:latest
    metskem@athena:~$ docker run -p 5601:5601 -p 9200:9200 -p 5000:5000 -it --name elk sebp/elk
     * Starting Elasticsearch Server
    sysctl: setting key "vm.max_map_count": Read-only file system
       [ OK ]
    logstash started.
    waiting for Elasticsearch to be up (1/30)
    waiting for Elasticsearch to be up (2/30)
    waiting for Elasticsearch to be up (3/30)
    waiting for Elasticsearch to be up (4/30)
    waiting for Elasticsearch to be up (5/30)
    waiting for Elasticsearch to be up (6/30)
    waiting for Elasticsearch to be up (7/30)
    waiting for Elasticsearch to be up (8/30)
     * Starting Kibana4 [ OK ]
    [2015-11-14 14:42:40,076][INFO ][node ] [Ardroman] initialized
    [2015-11-14 14:42:40,077][INFO ][node ] [Ardroman] starting ...
    [2015-11-14 14:42:40,141][WARN ][common.network ] [Ardroman] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.1.56}
    [2015-11-14 14:42:40,141][INFO ][transport ] [Ardroman] publish_address {172.17.1.56:9300}, bound_addresses {[::]:9300}
    [2015-11-14 14:42:40,197][INFO ][discovery ] [Ardroman] elasticsearch/SGBOqCisRoK5aXakkplosQ
    [2015-11-14 14:42:43,259][INFO ][cluster.service ] [Ardroman] new_master {Ardroman}{SGBOqCisRoK5aXakkplosQ}{172.17.1.56}{172.17.1.56:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
    [2015-11-14 14:42:43,335][WARN ][common.network ] [Ardroman] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.1.56}
    [2015-11-14 14:42:43,336][INFO ][http ] [Ardroman] publish_address {172.17.1.56:9200}, bound_addresses {[::]:9200}
    [2015-11-14 14:42:43,336][INFO ][node ] [Ardroman] started
    [2015-11-14 14:42:43,337][INFO ][gateway ] [Ardroman] recovered [0] indices into cluster_state
    [2015-11-14 14:42:55,965][INFO ][cluster.metadata ] [Ardroman] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [config]
    [2015-11-14 14:45:40,093][INFO ][cluster.metadata ] [Ardroman] [logstash-2015.11.14] creating index, cause [auto(bulk api)], templates [logstash], shards [5]/[1], mappings [logs, _default_]
    [2015-11-14 14:45:40,357][INFO ][cluster.metadata ] [Ardroman] [logstash-2015.11.14] update_mapping [logs]
    [2015-11-14 14:46:53,017][INFO ][cluster.metadata ] [Ardroman] [.kibana] create_mapping [index-pattern]
    [2015-11-14 14:47:42,004][INFO ][cluster.metadata ] [Ardroman] [.kibana] update_mapping [config]
    [2015-11-14 14:48:50,680][INFO ][cluster.metadata ] [Ardroman] [.kibana] create_mapping [dashboard]

The docker image also runs logstash, which we don't need for now (see further); we will send the processed logs directly to elasticsearch.
Then install filebeat on an active (web) server to get some real log data to process:
    root@apollo:~# curl -L -O https://download.elastic.co/beats/filebeat/filebeat_1.0.0-rc1_i386.deb
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 3390k  100 3390k    0     0  1443k      0  0:00:02  0:00:02 --:--:-- 1443k
    root@apollo:~# dpkg -i filebeat_1.0.0-rc1_i386.deb
    Selecting previously unselected package filebeat.
    (Reading database ... 190915 files and directories currently installed.)
    Preparing to unpack filebeat_1.0.0-rc1_i386.deb ...
    Unpacking filebeat (1.0.0~rc1) ...
    Setting up filebeat (1.0.0~rc1) ...
    Processing triggers for ureadahead (0.100.0-16) ...
    root@apollo:~# dpkg --listfiles filebeat
    /.
    /usr
    /usr/share
    /usr/share/doc
    /usr/share/doc/filebeat
    /usr/share/doc/filebeat/changelog.Debian.gz
    /usr/bin
    /usr/bin/filebeat-god
    /usr/bin/filebeat
    /etc
    /etc/filebeat
    /etc/filebeat/filebeat.template.json
    /etc/filebeat/filebeat.yml
    /etc/init.d
    /etc/init.d/filebeat
    root@apollo:~#

Then edit the /etc/filebeat/filebeat.yml file: set paths to /var/log/apache2/access.log, frequency to 3s and hosts: "athena:9200".
Next load the index template in Elasticsearch.
    root@apollo:/etc/filebeat# curl -XPUT 'http://athena:9200/_template/filebeat?pretty' -d@/etc/filebeat/filebeat.template.json
    {
      "acknowledged" : true
    }
    root@apollo:/etc/filebeat#

And start filebeat, for example via the init script that the package installed (/etc/init.d/filebeat start).
Finally (this step is not documented at filebeat), add an extra filebeat-* index pattern in Kibana, basically a copy of the default logstash-* one: Settings ==> Indices ==> Create new.
The net result of the above actions is that we do get data in elasticsearch, but every log line is stored as a single field called message.
What we want is for the apache logfile to be parsed, so that all fields (clientip, request, response code and so on) are stored separately in elasticsearch.
I spent several hours trying to find out how this should be done with filebeat, but could not find it; I guess it must have something to do with filebeat.template.json.
Anyway, I continued with the classic logstash, see the next chapter.
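To make concrete what "parsed into fields" means here, the following is a minimal Python sketch. It is not something filebeat or logstash runs. It assumes a log line in the common Apache "combined" layout; the sample line, the regular expression and the function name parse_access_line are made up for illustration, and only the field names (clientip, verb, request, response, bytes) follow the ones used on this page.

```python
import re

# Hypothetical pattern for an Apache "combined" style access log line.
ACCESS_LINE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = ACCESS_LINE.match(line)
    return match.groupdict() if match else None


# Made-up sample line, only for illustration.
sample = ('192.0.2.10 - - [14/Nov/2015:14:45:40 +0100] '
          '"GET /index.html HTTP/1.1" 200 5120 "-" "curl/7.35.0"')

fields = parse_access_line(sample)
print(fields["clientip"], fields["request"], fields["response"])
```

Each named group ends up as its own key, which is the shape we would like the documents in elasticsearch to have instead of one big message field.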
Installing (classic) logstash#
I first installed the logstash deb, and next created the following logstash config file:
logstash.conf examples#
    # Example 1: apache2 access log, read from file
    input {
      file {
        path => "/var/log/apache2/access.log"
        type => "apache2"
      }
    }
    filter {
      grok {
        match => { "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} \[%{HTTPDATE:timestamp}\] %{NUMBER:timetaken} \"%{IPORHOST:vhost}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}" }
        add_field => [ "received_at", "%{@timestamp}" ]
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
      geoip {
        source => "clientip"
        target => "geoip"
        database => "/etc/logstash/GeoLiteCity.dat"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
      }
    }
    output {
      elasticsearch {
        hosts => ["athena:9200","10.0.0.162:9200"]
      }
    }

    # Example 2: websphere access log, read from stdin
    input {
      stdin { }
    }
    filter {
      grok {
        match => { "message" => "\"(?:%{IPORHOST:clientip}|-)\" %{IPORHOST:vhost} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:timetaken} %{QS:sessionid} %{QS:remove1} %{QS:referrer} %{QS:useragent}" }
        add_field => [ "received_at", "%{@timestamp}" ]
        add_field => [ "source", "websphere" ]
        # remove_field takes field names; "%{remove1}" would expand to the field's value instead
        remove_field => [ "remove1" ]
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
      mutate {
        convert => { "bytes" => "integer" }
        convert => { "timetaken" => "integer" }
      }
      geoip {
        source => "clientip"
        target => "geoip"
        database => "/etc/logstash/GeoLiteCity.dat"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
      }
      useragent {
        source => "useragent"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
      # stdout { codec => rubydebug }
    }

    # Example 3: statics access log, read from stdin
    input {
      stdin { }
    }
    filter {
      grok {
        match => { "message" => "%{IPORHOST:clientip} %{IPORHOST:vhost} %{NOTSPACE:remove1} %{NOTSPACE:remove2} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:timetaken} %{NOTSPACE:remove3} %{NUMBER:keepalivenr} %{QS:referrer} %{QS:useragent} (?:%{PATH:filename}|-)" }
        add_field => [ "received_at", "%{@timestamp}" ]
        add_field => [ "source", "statics" ]
        # remove_field takes field names, not "%{...}" references
        remove_field => [ "remove1", "remove2", "remove3" ]
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
      mutate {
        convert => { "bytes" => "integer" }
        convert => { "timetaken" => "integer" }
        convert => { "keepalivenr" => "integer" }
      }
      geoip {
        source => "clientip"
        target => "geoip"
        database => "/etc/logstash/GeoLiteCity.dat"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
      }
      useragent {
        source => "useragent"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
      # stdout { codec => rubydebug }
    }

Also, first download a "GeoLiteCity DB" and unzip it to /etc/logstash.
Make the user logstash a member of the adm group (so it can read the logfiles) and restart logstash with /etc/init.d/logstash restart. And there we have a logstash-* index in elasticsearch with all requested fields, hurray!
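As a rough illustration of what the date and mutate steps in the configs above do to a parsed event, here is a small Python sketch. It is an analogy, not logstash code: the strptime format is my assumed equivalent of the dd/MMM/yyyy:HH:mm:ss Z pattern, and the event dict and the normalise function are hypothetical.

```python
from datetime import datetime

# Assumed strptime equivalent of the "dd/MMM/yyyy:HH:mm:ss Z" date pattern.
# Note: %b expects English month names unless the locale is changed.
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def normalise(event):
    """Return a copy of event with a parsed timestamp and integer fields."""
    result = dict(event)
    result["@timestamp"] = datetime.strptime(event["timestamp"], APACHE_TIME_FORMAT)
    for name in ("bytes", "timetaken"):
        value = event.get(name)
        # A "-" (no value logged) or a missing field is left out instead of converted.
        if value is None or value == "-":
            result.pop(name, None)
        else:
            result[name] = int(value)
    return result


# Hypothetical event, shaped like the output of the grok step.
event = {"timestamp": "14/Nov/2015:14:45:40 +0100", "bytes": "5120", "timetaken": "-"}
parsed = normalise(event)
print(parsed["@timestamp"].isoformat(), parsed["bytes"])
```

Storing the numeric fields as integers rather than strings is what lets them be summed and averaged later on.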
Elasticsearch URLs#
Delete an index#
Index a document#
    curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
        "user" : "kimchy",
        "post_date" : "2015-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

This adds and indexes a document in the twitter index, with document id 1.
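The same request can be prepared from Python with only the standard library. This is a sketch under assumptions: it reuses the URL and document from the curl call above, the helper name build_index_request is made up, and the Content-Type header is something I add here that the curl call does not send. The request is only constructed, not sent; sending it would need urllib.request.urlopen(req) and a reachable Elasticsearch.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: this header is not part of the original curl call.
        headers={"Content-Type": "application/json"},
    )


document = {
    "user": "kimchy",
    "post_date": "2015-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}
req = build_index_request("http://localhost:9200", "twitter", "tweet", 1, document)
print(req.get_method(), req.full_url)
```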