Logstash is a nice tool for processing your logs. I love it for its flexibility and variety of workflows, but this variety has downsides.
When you first try it out, everything seems to work fast, but the real fun begins when you start processing huge amounts of logs from many files. At that point you will probably Google how to speed up Logstash and find suggestions to increase the number of workers to make better use of the CPU. Great, now we have multithreading. Unfortunately, now we also have thread safety issues. Or at least one big issue with multiline logs.
There are actually two issues around multiline processing in Logstash: https://github.com/logstash-plugins/logstash-filter-multiline/issues/12 and https://github.com/logstash-plugins/logstash-input-file/issues/44. The first means that you cannot use the multiline filter as soon as you enable more than one worker. The second means that each file requires its own input configuration. In our system that means hundreds of files. This does not scale at all.
Those issues will probably be resolved at some point, but until then, the following setup can boost the performance of your Logstash.
The trick is to split processing into two phases: first join the multiline entries, then parse them. This can be achieved by setting up two Logstash instances. The first takes input from files, processes it with the multiline filter and sends the result to Redis. The second takes input from Redis, applies the rest of the filters and sends the output to Elasticsearch. Because of the issues mentioned above, the first instance is limited to one worker. The second can scale by adding more workers. Redis serves as a buffer between them.
In our case, throughput more than tripled, from 60k to 200k log entries per minute, even without adding more workers to the second instance. We can also now add more rules for parsing logs. Unfortunately, it looks like multiline processing is still the bottleneck, and we will most probably have to introduce more multiline processing instances and split responsibilities between them.
Sample Logstash configuration:
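A minimal sketch of the two instances could look like the following. The file path, the Redis host and key, the multiline and grok patterns, and the Elasticsearch address are placeholders for illustration only, and option names can differ between Logstash versions (for example, older elasticsearch outputs use `host` rather than `hosts`), so check them against the version you run.

```
# ---- first instance (hypothetical file name: shipper.conf) ----
# Reads files, joins multiline entries, pushes the result to Redis.
# Run with a single worker because the multiline filter is not thread safe.
input {
  file {
    path => "/var/log/myapp/*.log"        # assumed path
  }
}
filter {
  multiline {
    # Assumption: every new log entry starts with an ISO8601 timestamp;
    # any line that does not is appended to the previous entry.
    pattern => "^%{TIMESTAMP_ISO8601}"
    negate  => true
    what    => "previous"
  }
}
output {
  redis {
    host      => "127.0.0.1"              # assumed Redis host
    data_type => "list"
    key       => "logstash"               # assumed key, must match the second instance
  }
}

# ---- second instance (hypothetical file name: indexer.conf) ----
# Reads joined entries from Redis, parses them, sends them to Elasticsearch.
# This one can be started with as many workers as the CPU allows.
input {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logstash"
  }
}
filter {
  grok {
    # (?m) lets the pattern match across the newlines inside a joined entry.
    match => { "message" => "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  date {
    match => [ "timestamp", "ISO8601" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]           # assumed address
  }
}
```

With files named as above, the first instance would be started with something like `bin/logstash -f shipper.conf -w 1` and the second with `bin/logstash -f indexer.conf -w 4`, where the worker count for the second one is whatever your hardware can handle.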