Friday, August 28, 2015

Handling multi line files in Logstash

Logstash is a nice tool for processing your logs. I love it for its flexibility and variety of workflows, but this variety has downsides.

When you first try it out, everything seems to work fast, but the real fun begins when you start processing huge amounts of logs from many files. At that point you will probably Google how to speed up Logstash and find suggestions to increase the number of workers to utilize the CPU. Great, now we have multithreading. Unfortunately, now we also have thread safety issues. Or at least one big issue with multiline logs.

There are actually two issues around multiline processing in Logstash: https://github.com/logstash-plugins/logstash-filter-multiline/issues/12 and https://github.com/logstash-plugins/logstash-input-file/issues/44. The first means that you cannot use the multiline filter once you enable more than one worker. The second means that each file requires its own input configuration. In our system that means hundreds of files, which does not scale at all.

These issues will probably be resolved at some point, but until then the following setup can boost the performance of your Logstash.

The trick is to split processing into two phases: first join multiline entries, then parse them. This can be achieved by setting up two Logstash instances. The first takes input from files, processes it with the multiline filter and sends the result to Redis. The second takes input from Redis, applies the rest of the filters and sends output to Elasticsearch. Because of the issues mentioned above, the first instance is limited to one worker. The second can scale by adding more workers. Redis serves as a buffer between them.
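The two-phase split can be modelled in plain Java to show why only the joining step has to stay single-threaded. This is just a sketch and not Logstash code: the class and method names are made up, an in-memory queue plays the role of Redis, and it assumes that a continuation line is one that starts with whitespace (as stack trace frames usually do).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TwoPhasePipeline {

    // Phase 1: order-dependent, so it runs on a single thread.
    // Assumption: a line starting with whitespace continues the previous entry.
    static List<String> joinMultiline(List<String> rawLines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty() && Character.isWhitespace(line.charAt(0));
            if (continuation && current != null) {
                current.append('\n').append(line);
            } else {
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    // Phase 2: each complete entry is parsed on its own, so order does not matter.
    // "Parsing" here only extracts the first token as the log level.
    static String parseLevel(String entry) {
        int space = entry.indexOf(' ');
        return space < 0 ? entry : entry.substring(0, space);
    }

    static List<String> run(List<String> rawLines, int workers) {
        // The queue stands in for Redis: a buffer between the two phases.
        BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
        buffer.addAll(joinMultiline(rawLines)); // single-threaded phase 1

        List<String> parsed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String entry;
                // The buffer is fully filled before the workers start,
                // so an empty poll means there is nothing left to do.
                while ((entry = buffer.poll()) != null) {
                    parsed.add(parseLevel(entry));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while parsing", e);
        }
        return new ArrayList<>(parsed); // order of results is not guaranteed
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "ERROR Something failed",
                "    at com.example.Foo.bar(Foo.java:10)",
                "    at com.example.Main.main(Main.java:5)",
                "INFO Started");
        System.out.println(joinMultiline(lines).size() + " entries joined");
        System.out.println("levels: " + run(lines, 4));
    }
}
```

Once entries are complete, the parsing workers never need to see neighbouring lines, which is why the second stage can be scaled freely while the first cannot.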

In our case, throughput improved more than 3 times, from 60k to 200k log entries per minute, even without adding more workers to the second instance. We can now also add more rules for parsing logs. Unfortunately, it looks like multiline processing is still the bottleneck, and most probably we will have to introduce more multiline processing instances and split their responsibilities.

Sample Logstash configuration:


Integration tests in SpringBoot without external dependencies

Target - test all layers of a Spring Boot REST application in isolation from external components.

Complexity - a typical application uses a database and makes calls to remote services.

Solutions:

a) Use spring-test support for integration tests. It actually starts the whole app for you on a random port. At the same time, all components of the app can be wired into the test for additional manipulation.

b) Use RestAssured to make calls to our application.

c) Use RestTemplate to make remote calls and MockRestServiceServer from spring-test to mock them.

d) Use an in-memory database. HSQLDB simulates a real database pretty well.

e) Use a separate Spring profile to tweak the app configuration.


Here is a small example putting it all together:


Sunday, August 9, 2015

Hidden exceptions

This is a story about the consequences of eliminating checked exceptions in Groovy. All observations were made on a relatively big project with around 60 developers.

Observation 1 - catch them all :)

In some cases it was Exception, in some Throwable (as if you could recover from an OutOfMemoryError). As a result, non-recoverable exceptions are often treated as recoverable and vice versa. They are logged at the same level and escape the bug tracking system.
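For contrast, here is a minimal Java sketch of the opposite habit: catch only the exception you know how to recover from and let everything else propagate. The loader and its names are hypothetical and exist only for illustration.

```java
import java.io.IOException;

public class CatchNarrowly {

    // Hypothetical loader: fails with a checked, recoverable exception.
    static String load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("no such resource");
        }
        return "content of " + name;
    }

    // Only IOException is treated as recoverable. There is deliberately no
    // catch (Exception) or catch (Throwable): a NullPointerException or an
    // OutOfMemoryError is not something this method can fix, so it propagates.
    static String loadOrDefault(String name, String fallback) {
        try {
            return load(name);
        } catch (IOException e) {
            // Expected failure: fall back and carry on.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadOrDefault("report.txt", "n/a"));
        System.out.println(loadOrDefault("", "n/a"));
        try {
            loadOrDefault(null, "n/a"); // a bug in the caller
        } catch (NullPointerException e) {
            System.out.println("bug surfaced instead of being swallowed");
        }
    }
}
```

With a catch-all in `loadOrDefault`, the `null` argument would silently produce the fallback value and the bug would never reach a log at the right level.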

Observation 2 - invent complex return types


If exceptions can't be used as part of a class API, developers start inventing complex return types that wrap the value together with a response status. This had several consequences:

1) Propagating this status is a pain. Just imagine calling several methods and checking whether there was an issue after each call.

2) Many transaction definitions were broken. Transactions were handled by Spring and were supposed to be rolled back on exception, but the method just returned an error code and threw no exception.

3) No stack traces. Often this was a result of "catch them all -> return error code". The only way to figure out the real cause was to debug.
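The transaction problem can be shown with a toy model in Java. This is not Spring's transaction management, just a made-up stand-in with hypothetical names that follows the same basic rule: commit when the work returns normally, roll back when it throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ErrorCodeVsException {

    // Hypothetical status-wrapping return type of the kind described above.
    static final class Result {
        final boolean ok;
        final String value;

        Result(boolean ok, String value) {
            this.ok = ok;
            this.value = value;
        }
    }

    // Toy transaction: writes become visible only if the work returns normally.
    static final class ToyTransaction {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        void write(String row) {
            pending.add(row);
        }

        <T> T execute(Supplier<T> work) {
            try {
                T result = work.get();
                committed.addAll(pending); // normal return means commit
                return result;
            } finally {
                pending.clear(); // after an exception nothing was committed
            }
        }
    }

    // Error-code style: the failure looks like a normal return to the transaction.
    static Result saveWithErrorCode(ToyTransaction tx, String row, boolean fail) {
        tx.write(row);
        if (fail) {
            return new Result(false, "validation failed");
        }
        return new Result(true, row);
    }

    // Exception style: the failure is visible to the transaction and has a stack trace.
    static String saveWithException(ToyTransaction tx, String row, boolean fail) {
        tx.write(row);
        if (fail) {
            throw new IllegalStateException("validation failed: " + row);
        }
        return row;
    }

    public static void main(String[] args) {
        ToyTransaction first = new ToyTransaction();
        Result result = first.execute(() -> saveWithErrorCode(first, "order-1", true));
        System.out.println("ok=" + result.ok + ", committed=" + first.committed);

        ToyTransaction second = new ToyTransaction();
        try {
            second.execute(() -> saveWithException(second, "order-1", true));
        } catch (IllegalStateException e) {
            System.out.println("rolled back, committed=" + second.committed);
        }
    }
}
```

In the error-code variant the half-finished write is committed even though the caller receives a failure status, and every caller up the chain has to remember to check `ok`. In the exception variant the write is discarded and the stack trace points at the line that failed.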

In conclusion, I would advise becoming friends with exceptions and trying to avoid cowboy-style languages. There are definitely areas where Groovy rocks, like scripting, but don't try to push it everywhere. Keep in mind that coding is easy and does not take much time. Debugging does.