Cluster logstash/Elasticsearch/Kibana
Overview
- Logstash is an event and log management application. It is used to collect, process, and store events for later use.
- Redis is a key-value store and cache, comparable to a NoSQL database. In this setup it is used as a message broker / queue feeding Logstash.
- Elasticsearch is an open-source indexing and search engine.
- Kibana is the front end for presenting and visualizing the data.
Sources
At the time of writing, the sources are available in the following versions and at the following links:
Infrastructure
Server | Role | VLAN | User | Group
---|---|---|---|---
logstashRedis1 | Logstash (shipper) and Redis | - | root | root
logstashRedis2 | Logstash (shipper) and Redis | - | root | root
logstashElastic3 | Logstash (indexer) and Elasticsearch | - | - | -
logstashElastic4 | Logstash (indexer) and Elasticsearch | - | - | -
kibana5 | Kibana | - | root | root
log-unix.tuxunix.fr | VIP | - | root | root
elastic.tuxunix.fr | VIP | - | root | root
- The VIP log-unix.tuxunix.fr forwards UDP traffic only, round-robin to logstashRedis1 and logstashRedis2 on port 514.
- The VIP elastic.tuxunix.fr forwards TCP traffic to logstashElastic3 and logstashElastic4 in active/backup mode on port 9200.
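A quick way to sanity-check both VIPs from any client, once the stack is up (a sketch assuming nc and curl are installed; the hand-built syslog test message is just an example):

#> echo "<13>$(date '+%b %e %H:%M:%S') $(hostname) test: hello ELK" | nc -u -w1 log-unix.tuxunix.fr 514
#> curl 'http://elastic.tuxunix.fr:9200/_cluster/health?pretty'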
Logstash (Shipper)
- Installed version: 2.4.0
#> cd logstash/ && mkdir conf.d config logs tmp sources
Installing Java and Logstash (shipper)
#> yum localinstall jdk-8-linux-x64.rpm
#> cd logstash/ && tar xvzf logstash-2.4.0.tar.gz && ln -s logstash-2.4.0 current
- in_unix
The result is as follows:

input {
  syslog {
    type => "unixlog"
    port => 514
  }
}
- out_log
The result is as follows:

output {
  if [type] == "unixlog" {
    redis {
      data_type => "list"    # string, one of ["list", "channel"] (optional)
      host => ["logstashElastic3","logstashElastic4"]    # array (optional), default: ["127.0.0.1"]
      key => "unix"          # string (optional)
      shuffle_hosts => true  # boolean (optional), default: true
      #workers => 3          # number (optional), default: 1
    }
  }
}
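To verify that events actually reach the broker, you can check the length of the Redis list named by the key => "unix" setting above (a quick check with redis-cli, which ships with Redis; the value should rise and fall as the indexers consume it):

#> redis-cli -h logstashRedis1 llen unix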
- logstash.conf
#> echo '###############################
# Default settings for logstash
###############################

# Override Java location
#JAVACMD=/usr/bin/java

LS_USER=root
LS_GROUP=root

# Nice level
LS_NICE=0' >logstash.conf
Redis
- Installed version: 3.2.3
Redis configuration:
cat /soft/redis/config/redis.conf
The result is as follows:

protected-mode no
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile /var/run/redis_6379.pid
loglevel notice
logfile logs/redis.log
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /soft/redis/data/
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
maxclients 10000
maxmemory 32768mb
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
System configuration for Redis
A few system parameters had to be changed for Redis:
- Increase the max number of open files in '/etc/security/limits.conf':

# End of file
root hard nofile 15032
root soft nofile 15032

- Set the "overcommit_memory" parameter in '/etc/sysctl.conf':
vm.overcommit_memory = 1
- Apply immediately:
sysctl vm.overcommit_memory=1
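To confirm the kernel picked up the new value (it should print 1):

#> cat /proc/sys/vm/overcommit_memory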
Logstash (Indexer)
Configuration
- conf.d/in_unix:
The result is as follows:

input {
  redis {
    host => "logstashRedis1"
    data_type => "list"
    type => "redis-input"
    key => "unix"
    threads => 3 # number (optional), default: 1
  }
  redis {
    host => "logstashRedis2"
    data_type => "list"
    type => "redis-input"
    key => "unix"
    threads => 3 # number (optional), default: 1
  }
}

filter {
  if [type] == "unixlog" {
    mutate {
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "ip", "%{host}" ]
      add_field => [ "hostname", "%{host}" ]
    }
    dns {
      reverse => [ "hostname" ]
      action => "replace"
    }
    syslog_pri { }
    date {
      match => [ "my_syslog_timestamp", "yyyy MMM d HH:mm:ss.SSS zzz", "MMM d yyyy HH:mm:ss.SSS zzz", "MMM d HH:mm:ss" ]
      timezone => "Europe/Paris"
      locale => "en"
    }
    mutate {
      remove_tag => [ "_grokparsefailure", "_grokparsefailure_sysloginput" ]
    }
    ################## DOMAIN ####################
    grok {
      match => { "hostname" => "(?<namehost>^[^.]*)(?<domainname>.*)" }
    }
    ################# PROGRAM #######################
    if [program] == "CROND" {
      grok {
        patterns_dir => "conf.d/unix_pattern"
        match => { "message" => "\(%{USER:user_cron}\) %{CRON_ACTION:action_cron} \(%{DATA:message_cron}\)" }
      }
    }
    if [program] == "su" or [program] == "sshd" {
      grok {
        patterns_dir => "conf.d/unix_pattern"
        match => { "message" => "(?=%{GREEDYDATA:message})%{WORD:pam_module}\(%{DATA:pam_caller}\): session %{WORD:pam_session_state} for user %{USERNAME:username}(?: by %{GREEDYDATA:pam_by})?" }
      }
    }
    if [program] == "sshd" {
      grok {
        match => { "message" => [
          "Invalid user %{USERNAME:username_auth} from %{IP:remote_addr_auth}",
          "Failed password for %{USERNAME:username_auth} from %{IP:remote_addr_auth} port %{POSINT:port_auth} ssh2",
          "Failed %{WORD:login_method_auth} for invalid user %{USERNAME:username_auth} from %{IP:remote_addr_auth} port %{POSINT:port_auth} ssh2",
          "pam_unix(sshd:auth): authentication failure; logname= uid=%{POSINT:uid_auth} euid=%{POSINT:euid_auth} tty=ssh ruser= rhost=%{IPORHOST:remote_addr_auth}(?: user=%{USERNAME:username_auth})?",
          "PAM %{POSINT} more authentication failures; logname= uid=%{POSINT:uid_auth} euid=%{POSINT:euid_auth} tty=ssh ruser= rhost=%{IPORHOST:remote_addr_auth}(?: user=%{USERNAME:username_auth})?",
          "Did not receive identification string from %{IPORHOST:remote_addr_auth}"
        ] }
        add_field => [ "failure_authentication", "true" ]
      }
    }
    if [program] == "ossec" {
      grok {
        patterns_dir => "conf.d/unix_pattern"
        match => { "message" => "%{OSSECLOGCOLLECTOR}" }
      }
    }
    ################# OOM KILLER #################
    # if [message] =~ "Out of memory" {
    grok {
      patterns_dir => "conf.d/unix_pattern"
      match => { "message" => "%{INT:oom_pid} \(%{DATA:oom_processname}\) score %{INT:oom_score}" }
    }
    # }
    # geo location
    # geoip {
    #   source => "hostname"
    #   target => "geoip"
    #   #add_field => "%{host}"
    #   database => "current/vendor/bundle/jruby/1.9/gems/logstash-filter-geoip-2.0.7/vendor/GeoLiteCity-2013-01-18.dat"
    #   add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    #   add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    # }
    # mutate {
    #   convert => [ "[geoip][coordinates]", "float"]
    # }
  }
}

filter {
  if [type] == "syslog" {
    mutate {
      remove_tag => [ "_grokparsefailure" ]
    }
  }
}
- conf.d/out_file:
The result is as follows:

output {
  file {
    # codec => plain          # codec (optional), default: "plain"
    # flush_interval => ...   # number (optional), default: 2
    # gzip => true            # boolean (optional), default: false
    # max_size => 2048        # string (optional)
    # message_format => ...   # string (optional)
    dir_mode => 0750
    file_mode => 0640
    path => "/archive/log/%{+YYYY-MM-dd}/%{hostname}_%{+YYYY-MM-dd}.txt"
    # workers => ...          # number (optional), default: 1
  }
}
- conf.d/out_elasticsearch:
The result is as follows:

output {
  if [type] == "unixlog" {
    elasticsearch {
      hosts => ["logstashElastic3:9200","logstashElastic4:9200"] # string (optional)
      index => "unix-%{+YYYY.MM.dd}" # string (optional), default: "logstash-%{+YYYY.MM.dd}"
      workers => 3 # number (optional), default: 1
    }
  }
}
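Once the indexer is running, a quick way to confirm that the daily indices are being created and fed is the _cat API (quoting the URL keeps the shell from expanding the *):

#> curl 'http://logstashElastic3:9200/_cat/indices/unix-*?v'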
Elasticsearch
- Installed version: 2.4.0
Installing Elasticsearch
#> tar xvzf sources/elasticsearch-2.4.0.tar.gz && ln -s elasticsearch-2.4.0 current
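Once the node is configured (see below) and started, a plain HTTP request is a minimal smoke test; this assumes the standard launcher script, with -d to daemonize:

#> elasticsearch/current/bin/elasticsearch -d
#> curl 'http://localhost:9200/'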
- elasticsearch/current/config/elasticsearch.yml:
The result is as follows:

##################### Elasticsearch Configuration #####################
#
#
cluster.name: elastic
node.name: "logstashElastic3"
network.host: x.x.x.x
# Set a custom port for HTTP:
http.port: 9200
path.conf: elasticsearch/config
path.data: elasticsearch/data
path.logs: elasticsearch/logs
path.plugins: elasticsearch/current/plugins
path.repo: ["elasticsearch/tmp/snapBase/"]
bootstrap.mlockall: true
indices.memory.index_buffer_size: 40%
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["logstashElastic3", "logstashElastic4:9300"]
cluster.routing.allocation.disk.watermark.low: "5g"
cluster.routing.allocation.disk.watermark.high: "3g"
cluster.routing.allocation.node_concurrent_recoveries: "15"
indices.recovery.max_bytes_per_sec: "200mb"
indices.recovery.concurrent_streams: "15"
threadpool.search.size: "2000"
- /soft/elasticsearch/current/config/logging.yml:
The result is as follows:

logger:
  # log action execution errors for easier debugging
  action: DEBUG
  # reduce the logging for aws, too much is logged under the default INFO
  com.amazonaws: WARN

  # gateway
  #gateway: DEBUG
  #index.gateway: DEBUG

  # peer shard recovery
  #indices.recovery: DEBUG

  # discovery
  #discovery: TRACE

  index.search.slowlog: TRACE, index_search_slow_log_file
  index.indexing.slowlog: TRACE, index_indexing_slow_log_file

additivity:
  index.search.slowlog: false
  index.indexing.slowlog: false

appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  index_search_slow_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_search_slowlog.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  index_indexing_slow_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
- Elasticsearch tuning:

echo 'elasticsearch - nofile 65536
elasticsearch - memlock unlimited
root - memlock unlimited' >>/etc/security/limits.conf
- elasticsearch/elasticsearch.conf:
The result is as follows:

# Directory where the Elasticsearch binary distribution resides
ES_HOME=elasticsearch/current

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=30720m

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=
ES_JAVA_OPTS="-Xss256m -Xms256m"

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=elasticsearch/logs

# Elasticsearch data directory
DATA_DIR=base/elasticsearch/data

# Elasticsearch work directory
WORK_DIR=elasticsearch

# Elasticsearch conf directory
CONF_DIR=elasticsearch/config

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE=elasticsearch/config/elasticsearch.yml

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=elk

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true
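With bootstrap.mlockall enabled and MAX_LOCKED_MEMORY raised, it is worth confirming after a restart that the heap really is locked in memory; in ES 2.x the node info API reports it ("mlockall" : true in the process section):

#> curl 'http://localhost:9200/_nodes/process?pretty'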
Plugin
head-master
This plugin is a web front end for browsing and interacting with an Elasticsearch cluster.
- Download the latest version here [1]
- Install the plugin:
#> elasticsearch/current/bin/plugin install file:/tmp/elasticsearch-head-master.zip
- Access via a browser:
http://elastic.tuxunix.fr:9200/_plugin/head/
List the indices:
http://elastic.tuxunix.fr:9200/_cat/indices?v
Kibana
- Installed version: 4.6.1
Installing Kibana:

#> cd /kibana-unix/ && tar xvzf sources/kibana-4.3.0-linux-x64.tgz && ln -s kibana-4.3.0-linux-x64 current
#> cd /etc/httpd/conf.d && mv ssl.conf kibana.conf
- /etc/httpd/conf.d/kibana.conf:
The result is as follows:

# LoadModule ssl_module modules/mod_ssl.so
Listen 443
#Listen 80

SSLPassPhraseDialog builtin
SSLSessionCache shmcb:/var/cache/mod_ssl/scache(512000)
SSLSessionCacheTimeout 300
SSLMutex default
SSLRandomSeed startup file:/dev/urandom 256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin

##
## SSL Virtual Host Context
##
<VirtualHost x.x.x.x:443>
    ServerName kibana.tuxunix.fr:443

    <Directory /kibana-unix/current>
        Options FollowSymLinks
        AllowOverride All
    </Directory>

    ProxyPass / http://localhost:5601/
    ProxyPassReverse / http://localhost:5601/
    ProxyPreserveHost On

    DocumentRoot kibana-unix/current/
    ErrorLog kibana-unix/logs/error_log
    CustomLog kibana-unix/logs/access_log combined
    RewriteLog kibana-unix/logs/rewrite.log
    LogLevel warn

    SSLEngine on
    SSLProtocol all -SSLv2
    SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
    SSLCertificateFile /etc/pki/tls/certs/localhost.crt
    SSLCertificateKeyFile /etc/pki/tls/private/localhost.key

    <Files ~ "\.(cgi|shtml|phtml|php3?)$">
        SSLOptions +StdEnvVars
    </Files>
    <Directory "/var/www/cgi-bin">
        SSLOptions +StdEnvVars
    </Directory>

    SetEnvIf User-Agent ".*MSIE.*" \
        nokeepalive ssl-unclean-shutdown \
        downgrade-1.0 force-response-1.0

    CustomLog logs/ssl_request_log \
        "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"
</VirtualHost>
- /kibana-unix/current/config/kibana-unix.yml:
The result is as follows:

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.url: "http://elastic.tuxunix.fr:9200"
elasticsearch.preserveHost: true
kibana.index: ".kibana-unix"
elasticsearch.requestTimeout: 300000
elasticsearch.shardTimeout: 0
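Before putting Apache in front of it, you can check that Kibana answers locally; Kibana 4 serves a status page, so an HTTP 200 here is a good sign:

#> curl -sI http://localhost:5601/status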
- Access via a browser:
https://kibana.tuxunix.fr/
Kibana plugin (logtrail)

#> ./bin/kibana plugin -i logtrail -u file:///root/logtrail-4.x-0.1.7.tar.gz
#> /etc/init.d/kibana restart
Client configuration (rsyslog)
- /etc/rsyslog.conf:
The result is as follows:

# rsyslog v5 configuration file

# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html

#### MODULES ####

$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imklog   # provides kernel logging support (previously done by rklogd)
#$ModLoad immark  # provides --MARK-- message capability

# Provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514

# Provides TCP syslog reception
#$ModLoad imtcp
#$InputTCPServerRun 514

#### GLOBAL DIRECTIVES ####

# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

# File syncing capability is disabled by default. This feature is usually not required,
# not useful and an extreme performance hit
#$ActionFileEnableSync on

# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf

#### RULES ####

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none /var/log/messages

# The authpriv file has restricted access.
authpriv.* /var/log/secure

# Log all the mail messages in one place.
mail.* -/var/log/maillog

# Log cron stuff
cron.* /var/log/cron

# Everybody gets emergency messages
*.emerg *

# Save news errors of level crit and higher in a special file.
uucp,news.crit /var/log/spooler

# Save boot messages also to boot.log
local7.* /var/log/boot.log

# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
#$WorkDirectory /var/lib/rsyslog # where to place spool files
#$ActionQueueFileName fwdRule1   # unique name prefix for spool files
#$ActionQueueMaxDiskSpace 1g     # 1gb space limit (use as much as possible)
#$ActionQueueSaveOnShutdown on   # save messages to disk on shutdown
#$ActionQueueType LinkedList     # run asynchronously
#$ActionResumeRetryCount -1      # infinite retries if host is down
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
*.* @log-unix.tuxunix.fr:514
# ### end of the forwarding rule ###
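After restarting rsyslog on the client, a simple end-to-end test is to emit a local message and look for it in Kibana a few seconds later (the message text is just an example):

#> service rsyslog restart
#> logger -p user.info "ELK pipeline test"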
Troubleshooting
high disk watermark
- Cluster in warning state, shards are no longer being allocated; you will see this error in the logs:

[2016-12-15 08:00:04,015][WARN ][cluster.routing.allocation.decider] [logstashElastic3] high disk watermark [90%] exceeded on [mRNmcLoESx2sdrR-VIyTVw][logstashElastic3][/elasticsearch/data/gdl-elastic/nodes/0] free: 7.8gb[9.7%], shards will be relocated away from this node
[2016-12-15 08:19:07,329][INFO ][cluster.routing.allocation.decider] [logstashElastic4] low disk watermark [85%] exceeded on [eVlpWLHJTvqzap3o_ghPNA][logstashElastic4][/elasticsearch/data/gdl-elastic/nodes/0] free: 9.1gb[11.4%], replicas will not be assigned to this node

The problem is disk space: before allocating a shard, Elasticsearch checks that enough space is available, based on the used percentage of the mounted disk. The thresholds can be re-evaluated:
# /etc/elasticsearch/elasticsearch.yml
cluster.routing.allocation.disk.threshold_enabled: True
cluster.routing.allocation.disk.watermark.low: 30gb
cluster.routing.allocation.disk.watermark.high: 20gb
Dynamically:

curl -XPUT logstashElastic3:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.disk.watermark.low" : "97%",
    "cluster.routing.allocation.disk.watermark.high" : "99%"
  }
}'
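Before (or after) loosening the thresholds, the _cat/allocation endpoint gives a per-node view of disk usage and shard counts, which makes it easy to spot which node is short on space:

#> curl 'http://logstashElastic3:9200/_cat/allocation?v'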
Once disk space has been increased and/or the parameters adjusted, the cluster catches up on its shard backlog and returns to green.
To check (or via the head plugin), "active_shards_percent" should climb until it reaches 100%:

http://nomDuServer:9200/_cat/health?v

epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1481794698 10:38:18  elastic yellow 2          2         1691   858 0    0    25       0             -                  98.5%
WARN monitor.jvm
Log:

[WARN ][monitor.jvm] [nameServer] [gc][old][1409294][4628] duration [16.9s], collections [1]/[16.9s], total [16.9s]/[23.3h], memory [29.2gb]->[27.1gb]/[29.7gb], all_pools {[young] [2.1gb]->[13.9mb]/[2.1gb]}{[survivor] [0b]->[0b]/[274.5mb]}{[old] [27.1gb]->[27.1gb]/[27.3gb]}
Le "ES_HEAP_SIZE" de java est pratiquement atteint, vérifier avec la commande ci-dessous ce qu'il reste en mémoire pour définir s'il y a besoin d'ajouter de la mémoire ou non.
#> http://nameServer:9200/_nodes/stats/jvm?pretty
I ran into this after an extremely heavy Kibana search: the document cache had grown very large. The solution is to clear the cache in use, with this command:

#> curl -XPOST 'http://nameServer:9200/_cache/clear'
{"_shards":{"total":1624,"successful":1624,"failed":0}}
To cap the memory used per search, you can set the following parameters in the ES config:

indices.fielddata.cache.size: 20%
indices.breaker.fielddata.limit: 40%
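To see which fields are actually holding fielddata in memory, and how much, the _cat/fielddata API gives a per-node breakdown (a quick diagnostic available in ES 2.x):

#> curl 'http://nameServer:9200/_cat/fielddata?v'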
Many OutOfMemory errors
Log:

[DEBUG][action.bulk] [xxxxxx] failed to execute [BulkShardRequest to [xxxxx-2017.03.23] containing [15] requests] on [[xxxx-2017.03.23][4]]
[xxx-2017.03.23][[xxxxx-2017.03.23][4]] EngineClosedException[CurrentState[CLOSED] Closed]; nested: OutOfMemoryError[unable to create new native thread];
- Increase nproc:
#> echo "userElastic soft nproc unlimited" >>/etc/security/limits.d/90-nproc.conf
Log:

[2017-03-23 17:20:39,959][WARN ][threadpool] [xxxxxx] failed to run [threaded] org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner@7898d084
java.lang.OutOfMemoryError: unable to create new native thread
- Increase the search threadpool size:

curl -XPUT serverName:9200/_cluster/settings -d '{
  "persistent" : {
    "threadpool.search.size" : 2000
  }
}'
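To watch whether the search pool is still queueing or rejecting work after the change, the _cat/thread_pool endpoint shows active/queue/rejected counters per node:

#> curl 'http://serverName:9200/_cat/thread_pool?v'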
Optimizing Elasticsearch searches
#> curl -XPOST 'http://localhost:9200/_optimize'
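A more aggressive variant merges each shard down to a single segment, which speeds up searches on indices that no longer receive writes (e.g. past daily indices). It is I/O-intensive, so it is best run off-peak and against a specific index; the index name below is just an example:

#> curl -XPOST 'http://localhost:9200/unix-2017.05.30/_optimize?max_num_segments=1'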
Redefine the "refresh_interval" value:

This index setting controls how often (here, in seconds) Elasticsearch refreshes the index, i.e. how quickly newly indexed documents become visible to searches; raising it reduces refresh overhead during heavy indexing.

#> curl -XPUT "pplp0008:9200/unix-*/_settings" -d '{
  "index" : {
    "refresh_interval" : "5s"
  }
}'