Friday, August 28, 2015

Handling multi line files in Logstash

Logstash is a nice tool for processing your logs. I love it for its flexibility and variety of workflows, but this variety has downsides.

When you first try it out, everything seems to work fast, but the real work begins when you start processing huge amounts of logs from many files. At that point you will probably Google how to speed up Logstash and find suggestions to increase the number of workers to utilize the CPU. Great, now we have multithreading. Unfortunately, now we also have thread safety issues. Or at least one big issue with multiline logs.

There are actually two issues around multiline processing in Logstash: https://github.com/logstash-plugins/logstash-filter-multiline/issues/12 and https://github.com/logstash-plugins/logstash-input-file/issues/44. The first means that you cannot use the multiline filter once you enable more than one worker. The second means that each file requires its own input configuration. In our system that means hundreds of files, which does not scale at all.

These issues will probably be resolved at some point, but until then the following setup can boost the performance of your Logstash.

The trick is to split processing into two phases: first join multiline entries, then parse them. This can be achieved by setting up two Logstash instances. The first takes input from files, processes it with the multiline filter and sends the result to Redis. The second takes input from Redis, applies the rest of the filters and sends output to Elasticsearch. Because of the issues mentioned above, the first instance is limited to 1 worker. The second can scale by adding more workers. Redis serves as a buffer between them.

In our case, throughput more than tripled, from 60k to 200k log entries per minute, even without adding more workers to the second instance. We can also add more rules for parsing logs now. Unfortunately, it looks like multiline processing is still the bottleneck, and most probably we will have to introduce more multiline processing instances and split their responsibilities.

Sample Logstash configuration for the first instance (logstash1.conf):


input {
  file {
    path => [
      "/syslog/something*/*.log"
    ]
    exclude => ["*.gz"]
    type => "service_syslog"
    start_position => "beginning"
  }
}
filter {
  multiline {
    pattern => "(^%{SYSLOG_PREFIX})"
    negate => true
    what => "previous"
  }
}
output {
  redis {
    host => ["redis"]
    data_type => "list"
    key => "logstash"
  }
}

Configuration for the second instance (logstash2.conf):

input {
  redis {
    host => "redis"
    key => "logstash"
    data_type => "list"
  }
}
filter {
  # all kinds of parsing
}
output {
  elasticsearch {
    host => "es"
    protocol => "transport"
  }
}

Integration tests in Spring Boot without external dependencies

Target - test all layers of a Spring Boot REST application in isolation from external components.

Complexity - a typical application uses a database and makes calls to remote services.

Solutions:

a) Use spring-test support for integration tests. It starts the whole app for you on a random port. At the same time, any component of the app can be wired into the test for additional manipulation.

b) Use RestAssured to make calls to our application

c) Use RestTemplate to make remote calls and MockRestServiceServer from spring-test to mock them

d) Use an in-memory database. HSQLDB does the simulating job pretty well.

e) Use a separate Spring profile to tweak app configuration


Here is a small example putting it all together:


import com.jayway.restassured.RestAssured
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.beans.factory.annotation.Value
import org.springframework.boot.test.SpringApplicationConfiguration
import org.springframework.boot.test.WebIntegrationTest
import org.springframework.test.context.ActiveProfiles
import org.springframework.test.web.client.MockRestServiceServer
import org.springframework.web.client.RestTemplate
import spock.lang.Specification

import static com.jayway.restassured.RestAssured.given
import static com.jayway.restassured.http.ContentType.JSON
import static org.springframework.http.HttpMethod.*
import static org.springframework.http.HttpStatus.CREATED
import static org.springframework.http.MediaType.APPLICATION_JSON
import static org.springframework.test.web.client.match.MockRestRequestMatchers.*
import static org.springframework.test.web.client.response.MockRestResponseCreators.withSuccess

@SpringApplicationConfiguration(classes = Application)
@WebIntegrationTest(["server.port=0"]) // will actually start the app on a random port
@ActiveProfiles(["integration"]) // adapt some configuration for tests
class TrustlyPayoutSpec extends Specification {

    @Value('${local.server.port}')
    int port // random port chosen by spring-test

    @Autowired
    Repository repository // we can wire any repository to run DB checks

    @Autowired
    RestTemplate remoteResourceRestTemplate // wire RestTemplate to instrument it with the mock server

    MockRestServiceServer mockRemoteResource

    def setup() {
        mockRemoteResource = MockRestServiceServer.createServer(remoteResourceRestTemplate) // instrument RestTemplate
        RestAssured.port = port // point RestAssured at the random port
    }

    def "Some integration scenario"() {
        expect:
        // stub calls to the remote resource using the mock server
        mockRemoteResource.expect(requestTo("https://example.com"))
                .andExpect(method(PUT))
                .andExpect(jsonPath('$.state').value("TRANSFERRED"))
                .andRespond(withSuccess(
                        """ {"status": "OK"} """
                        , APPLICATION_JSON))

        // call our service using RestAssured
        given().contentType(JSON).body(""" { "name": "John Doe" } """)
                .when().post("/user")
                .then().statusCode(CREATED.value())

        Thread.sleep(5000) // give some time for JMS messaging and async processing
        User user = repository.findByName("John Doe") // check data in DB
        // some assertions on DB data
        mockRemoteResource.verify()
    }
}
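
The fixed `Thread.sleep(5000)` in the spec above always costs five seconds and can still be too short on a slow machine. A common alternative is to poll for the expected state with a timeout. The following is a minimal sketch in plain Java, not part of the original test; the `Poll` class and its `await` method are hypothetical names introduced only for illustration.

```java
import java.util.function.BooleanSupplier;

public class Poll {

    /**
     * Re-checks the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition became
     * true in time, false on timeout or if the thread was interrupted.
     */
    public static boolean await(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected state
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for callers
                return false;
            }
        }
    }
}
```

In a test, the condition would typically be a repository lookup, so the test continues as soon as the asynchronous processing has written its data. Spock also ships `spock.util.concurrent.PollingConditions`, which serves the same purpose.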

Sunday, August 9, 2015

Hidden exceptions

This is a story about the consequences of eliminating checked exceptions in Groovy. All observations were made on a relatively big project with around 60 developers.

Observation 1 - catch them all :)

In some cases the code caught Exception, in others Throwable (as if you could recover from OutOfMemoryError). As a result, non-recoverable exceptions are often treated as recoverable and vice versa. They are logged at the same level and escape the bug tracking system.
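
To make the contrast concrete, here is a small sketch in plain Java of the opposite habit: catch only the failure you know how to recover from and let everything else propagate. The class `SelectiveCatch` and the method `loadOrDefault` are hypothetical names, and treating an I/O failure as the recoverable case is an assumption made for the example.

```java
import java.io.UncheckedIOException;
import java.util.function.Supplier;

public class SelectiveCatch {

    /**
     * Runs the loader and falls back to a default only when an I/O
     * failure occurs. Any other exception or error is not caught here.
     */
    public static String loadOrDefault(Supplier<String> loader, String fallback) {
        try {
            return loader.get();
        } catch (UncheckedIOException e) {
            // Recoverable by assumption: the resource was unreachable, so use the fallback.
            return fallback;
        }
        // OutOfMemoryError, IllegalStateException and the like propagate to the caller,
        // where they can be logged at a higher level and reported as bugs.
    }
}
```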

Observation 2 - invent complex return types


If exceptions can't be used as part of a class API, developers start inventing complex return types that wrap the value together with a response status. This had several consequences:

1) Propagating this status is a pain. Just imagine calling several methods and checking after each call whether there was an issue.
class ResultPropagation {
    // named execute() because 'do' is a reserved word in Groovy
    Result execute() {
        Result result = doSomething()
        if (!result.isSuccess) {
            return result
        }
        result = doSomethingElse()
        if (!result.isSuccess) {
            return result
        }
        ...
        return result
    }
}




2) Many transaction definitions were broken. Transactions were managed by Spring and were supposed to be rolled back on an exception, but the method just returns an error code and throws no exception.
class BrokenTransactionDefinition {
    @Transactional
    Result runInTransaction() {
        try {
            doSomething()
        } catch (Exception e) {
            // the exception is swallowed here, so Spring sees a normal return and commits
            return Result.failure()
        }
    }
}
3) No stack traces. Often this was a result of "catch them all" -> return error code. The only way to figure out the real cause is to debug.
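
For comparison, here is a sketch in plain Java of the same kind of flow built on exceptions. It is an illustration under assumptions, not code from the project: `ExceptionPropagation`, `ProcessingException`, `parseAmount` and `applyFee` are hypothetical names. The steps need no status checks between them, and the translated exception keeps the original one as its cause, so the stack trace survives.

```java
public class ExceptionPropagation {

    /** Hypothetical domain exception that wraps the low-level cause. */
    public static class ProcessingException extends RuntimeException {
        public ProcessingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static int parseAmount(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Translate to a domain exception, but pass the original along as the cause.
            throw new ProcessingException("Invalid amount: '" + raw + "'", e);
        }
    }

    static int applyFee(int amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("Amount must be positive: " + amount);
        }
        return amount - 1; // hypothetical flat fee of 1
    }

    /** No checks after each call: a failure in any step propagates by itself. */
    public static int process(String raw) {
        int amount = parseAmount(raw);
        return applyFee(amount);
    }
}
```

Because the failure leaves the method as an unchecked exception, a Spring `@Transactional` method around such a call would be rolled back under Spring's default rollback rules.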

In conclusion, I would advise becoming friends with exceptions and trying to avoid cowboy-style languages. There are definitely areas where Groovy rocks, like scripting, but don't try to push it everywhere. Keep in mind that coding is easy and does not take much time. Debugging does.