General programming blog: August 2015

Friday, August 28, 2015

Handling multi line files in Logstash

Logstash is a nice tool for processing logs. I love it for its flexibility and the variety of workflows it supports, but this variety has downsides.

When you first try it out, everything seems to work fast, but the real challenges begin when you start processing huge amounts of logs from many files. At that point you will probably google how to speed up Logstash and find suggestions to increase the number of workers to utilize the CPU. Great, now we have multithreading. Unfortunately, we now also have thread-safety issues, or at least one big issue with multiline logs.

There are actually two issues around multiline processing in Logstash: https://github.com/logstash-plugins/logstash-filter-multiline/issues/12 and https://github.com/logstash-plugins/logstash-input-file/issues/44. The first means that you cannot use the multiline filter as soon as you enable more than one worker. The second means that each file requires its own input configuration. In our system that means hundreds of files, which does not scale at all.

Those issues will probably be resolved at some point, but until then, the following setup can boost the performance of your Logstash.

The trick is to split processing into two phases: first join multiline entries, then parse them. This can be achieved by setting up two Logstash instances. The first takes input from the files, processes it with the multiline filter and sends the result to Redis. The second takes input from Redis, applies the rest of the filters and sends the output to Elasticsearch. Because of the issues mentioned above, the first instance is limited to one worker, while the second can scale by adding more workers. Redis serves as a buffer between them.
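To make the split concrete, here is a minimal in-process Java sketch of the same idea; it is not Logstash itself. A single thread joins lines using the rule the multiline filter applies with `negate => true` and `what => "previous"`, a `LinkedBlockingQueue` stands in for Redis, and several worker threads handle complete entries. The `ENTRY_START` regex (a stand-in for `%{SYSLOG_PREFIX}`), the `<end>` marker and the `parse` method are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.regex.Pattern;

class TwoPhasePipeline {

    // Hypothetical stand-in for %{SYSLOG_PREFIX}: "Mon dd hh:mm:ss " at line start.
    static final Pattern ENTRY_START =
            Pattern.compile("^[A-Z][a-z]{2} +\\d{1,2} \\d{2}:\\d{2}:\\d{2} ");

    // Hypothetical end-of-input marker; one copy is sent per worker.
    static final String END = "<end>";

    // Phase 1, single thread: a line that does NOT match ENTRY_START is appended
    // to the previous entry (negate => true, what => "previous").
    static void joinLines(List<String> lines, BlockingQueue<String> buffer, int workers)
            throws InterruptedException {
        StringBuilder current = null;
        for (String line : lines) {
            if (current == null || ENTRY_START.matcher(line).find()) {
                if (current != null) {
                    buffer.put(current.toString()); // previous entry is complete
                }
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line); // continuation line
            }
        }
        if (current != null) {
            buffer.put(current.toString());
        }
        for (int i = 0; i < workers; i++) {
            buffer.put(END);
        }
    }

    // Hypothetical placeholder for "all kinds of parsing": count lines per entry.
    static int parse(String entry) {
        return entry.split("\n").length;
    }

    // Wires both phases together; phase 2 workers only ever see complete
    // entries, so they share no joining state.
    static List<Integer> run(List<String> lines, int workers) {
        BlockingQueue<String> buffer = new LinkedBlockingQueue<>(); // stands in for Redis
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            futures.add(pool.submit(() -> {
                for (String e = buffer.take(); !e.equals(END); e = buffer.take()) {
                    results.add(parse(e));
                }
                return null;
            }));
        }
        try {
            joinLines(lines, buffer, workers); // phase 1 runs on this single thread
            for (Future<?> f : futures) {
                f.get(); // wait for every phase 2 worker to finish
            }
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdownNow();
        }
        List<Integer> sorted = new ArrayList<>(results); // completion order varies
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Aug 28 10:15:00 app: start",
                "Aug 28 10:15:01 app: java.lang.RuntimeException: boom",
                "\tat Foo.bar(Foo.java:10)",
                "Aug 28 10:15:02 app: done");
        System.out.println(run(lines, 3)); // prints [1, 1, 2]
    }
}
```

Only the joining step has to be sequential per input; everything downstream of the buffer can be spread across as many workers as the CPU allows.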

In our case, throughput improved more than threefold, from 60k to 200k log entries per minute, even without adding more workers to the second instance. We can now also add more parsing rules. Unfortunately, it looks like multiline is still the bottleneck, and we will most probably have to introduce more multiline processing instances and split the responsibilities between them.

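Sample Logstash configuration of the first instance (logstash1.conf), which reads the files, joins multiline entries and sends them to Redis: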

	input {
	  file {
	    path => [
	      "/syslog/something/.log"
	    ]
	    exclude => ["*.gz"]
	    type => "service_syslog"
	    start_position => "beginning"
	  }
	}

	filter {
	  # lines that do not start with SYSLOG_PREFIX are appended to the previous event
	  multiline {
	    pattern => "(^%{SYSLOG_PREFIX})"
	    negate => true
	    what => "previous"
	  }
	}

	output {
	  # the Redis list acts as the buffer between the two instances
	  redis {
	    host => ["redis"]
	    data_type => "list"
	    key => "logstash"
	  }
	}

Configuration of the second instance (logstash2.conf), which reads from Redis, parses entries and sends them to Elasticsearch:

	input {
	  # reads complete (already joined) entries from the same Redis list
	  redis {
	    host => "redis"
	    key => "logstash"
	    data_type => "list"
	  }
	}

	filter {
	  # all kinds of parsing go here
	}

	output {
	  elasticsearch {
	    host => "es"
	    protocol => "transport"
	  }
	}


Integration tests in SpringBoot without external dependencies

Target - test all layers of a SpringBoot REST application in isolation from external components.

Complexity - a typical application uses a database and makes calls to remote services.

Solutions:

a) Use spring-test support for integration tests. It actually starts the whole app for you on a random port, and all components of the app can be wired into the test for additional manipulation.

b) Use RestAssured to make calls to our application.

c) Use RestTemplate to make remote calls, and MockRestServiceServer from spring-test to mock them.

d) Use an in-memory database. HSQLDB does the simulating job pretty well.

e) Use a separate Spring profile to tweak the app configuration.

Here is a small example putting it all together:

	import com.jayway.restassured.RestAssured
	import org.springframework.beans.factory.annotation.Autowired
	import org.springframework.beans.factory.annotation.Value
	import org.springframework.boot.test.SpringApplicationConfiguration
	import org.springframework.boot.test.WebIntegrationTest
	import org.springframework.test.context.ActiveProfiles
	import org.springframework.test.web.client.MockRestServiceServer
	import org.springframework.web.client.RestTemplate
	import spock.lang.Specification

	import static com.jayway.restassured.RestAssured.given
	import static com.jayway.restassured.http.ContentType.JSON
	import static org.springframework.http.HttpMethod.PUT
	import static org.springframework.http.HttpStatus.CREATED
	import static org.springframework.http.MediaType.APPLICATION_JSON
	import static org.springframework.test.web.client.match.MockRestRequestMatchers.*
	import static org.springframework.test.web.client.response.MockRestResponseCreators.withSuccess

	@SpringApplicationConfiguration(classes = Application)
	@WebIntegrationTest(["server.port=0"]) // actually starts the app on a random port
	@ActiveProfiles(["integration"]) // adapts some configuration for tests
	class TrustlyPayoutSpec extends Specification {

	    @Value('${local.server.port}')
	    int port // random port chosen by spring-test

	    @Autowired
	    Repository repository // any repository can be wired in to run DB checks

	    @Autowired
	    RestTemplate remoteResourceRestTemplate // the RestTemplate to instrument with the mock server

	    MockRestServiceServer mockRemoteResource

	    def setup() {
	        mockRemoteResource = MockRestServiceServer.createServer(remoteResourceRestTemplate) // instrument RestTemplate
	        RestAssured.port = port // point RestAssured at the running app
	    }

	    def "Some integration scenario"() {
	        given: "calls to the remote resource are stubbed with the mock server"
	        mockRemoteResource.expect(requestTo("https://example.com"))
	                .andExpect(method(PUT))
	                .andExpect(jsonPath('$.state').value("TRANSFERRED"))
	                .andRespond(withSuccess(""" {"status": "OK"} """, APPLICATION_JSON))

	        when: "our service is called using RestAssured"
	        given().contentType(JSON).body(""" { "name": "John Doe" } """)
	                .when().post("/user")
	                .then().statusCode(CREATED.value())

	        and: "JMS messaging and async processing get some time to finish"
	        Thread.sleep(5000)

	        then: "the data in the DB is checked"
	        User user = repository.findByName("John Doe")
	        // some assertions on DB data

	        and: "the remote resource was called as expected"
	        mockRemoteResource.verify()
	    }
	}


Sunday, August 9, 2015

Hidden exceptions

This is a story about the consequences of eliminating checked exceptions in Groovy. All observations were made on a relatively big project with around 60 developers.

Observation 1 - catch them all :)

Developers caught everything: in some cases Exception, in others even Throwable (as if you could recover from an OutOfMemoryError). As a result, non-recoverable exceptions were often treated as recoverable and vice versa. They were logged at the same level and escaped the bug tracking system.
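A small Java sketch, with hypothetical method names, of why the catch-all habit hurts: catching `Throwable` turns even a (here simulated) `OutOfMemoryError` into an ordinary fallback value, while a narrow catch recovers only from the failure it expects and lets the error propagate.

```java
import java.util.function.Supplier;

class CatchAllDemo {

    // Catch-all style: any Throwable, Errors included, becomes a fallback value.
    static String catchAll(Supplier<String> action) {
        try {
            return action.get();
        } catch (Throwable t) {
            return "fallback"; // an OutOfMemoryError ends up here too
        }
    }

    // Narrow style: only a failure the caller knows how to handle is recovered
    // (IllegalArgumentException is just an example); Errors propagate.
    static String catchSpecific(Supplier<String> action) {
        try {
            return action.get();
        } catch (IllegalArgumentException e) {
            return "fallback";
        }
    }

    public static void main(String[] args) {
        // Simulated fatal error: thrown directly, no memory is actually exhausted.
        Supplier<String> fatal = () -> {
            throw new OutOfMemoryError("simulated");
        };
        System.out.println(catchAll(fatal)); // prints "fallback": the error is hidden
        try {
            catchSpecific(fatal);
        } catch (OutOfMemoryError e) {
            System.out.println("propagated: " + e.getMessage()); // prints "propagated: simulated"
        }
    }
}
```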

Observation 2 - invent complex return types

If exceptions can't be part of a class's API, developers start inventing complex return types that wrap a value together with a response status. This had several consequences:

1) Propagating this status is a pain. Just imagine calling several methods and checking whether there was an issue after each call.

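	class ResultPropagation {
	    Result process() { // "do" is a reserved word in Groovy, so it can't be the method name
	        Result result = doSomething()
	        if (!result.isSuccess) {
	            return result // stop and hand the failure to the caller
	        }

	        result = doSomethingElse()
	        if (!result.isSuccess) {
	            return result
	        }

	        // ... and the same check after every further call

	        return result
	    }
	}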


2) Many transaction definitions were broken. Transactions were managed by Spring and were supposed to be rolled back on an exception, but the method just returned an error code, so no exception ever reached Spring and the transaction was committed.

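	import org.springframework.transaction.annotation.Transactional

	class BrokenTransactionDefinition {

	    @Transactional
	    Result runInTransaction() {
	        try {
	            doSomething()
	        } catch (Exception e) {
	            // the exception is swallowed: Spring sees a normal return and commits
	            return Result.failure()
	        }
	    }
	}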


3) No stack traces. Often this was the result of "catch them all -> return error code". The only way to figure out the real cause was to debug.
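For contrast, here is a hypothetical Java sketch of the same situation with and without exceptions. `inTransaction` is not Spring's API; it only mimics the default `@Transactional` rule of rolling back on unchecked exceptions and errors and committing on a normal return. The error-code version commits even though a step failed and drops the stack trace, while the exception version needs no status checks, triggers a rollback and keeps the stack trace inside the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class ExceptionsVsResults {

    // Hypothetical status-carrying type, like the Result class in the examples above.
    static final class Result {
        final boolean isSuccess;

        private Result(boolean isSuccess) {
            this.isSuccess = isSuccess;
        }

        static Result success() { return new Result(true); }

        static Result failure() { return new Result(false); }
    }

    // Records each decision made by the hypothetical transaction wrapper.
    static final List<String> txLog = new ArrayList<>();

    // Hypothetical stand-in for @Transactional (not Spring's API): rolls back on
    // unchecked exceptions and errors, commits on a normal return.
    static <T> T inTransaction(Supplier<T> work) {
        try {
            T value = work.get();
            txLog.add("commit");
            return value;
        } catch (RuntimeException | Error e) {
            txLog.add("rollback");
            throw e;
        }
    }

    static void failingStep() {
        throw new IllegalStateException("step failed");
    }

    // Error-code style: the failure is turned into a value, so the wrapper commits.
    static Result withErrorCode() {
        return inTransaction(() -> {
            try {
                failingStep();
                return Result.success();
            } catch (RuntimeException e) {
                return Result.failure(); // stack trace is lost here
            }
        });
    }

    // Exception style: no status checks; the failure reaches the wrapper.
    static void withException() {
        inTransaction(() -> {
            failingStep();
            return null;
        });
    }

    public static void main(String[] args) {
        System.out.println(withErrorCode().isSuccess + " " + txLog); // prints "false [commit]"
        try {
            withException();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " " + txLog); // prints "step failed [commit, rollback]"
        }
    }
}
```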

In conclusion, I would advise becoming friends with exceptions and trying to avoid cowboy-style languages. There are definitely areas where Groovy rocks, like scripting, but don't try to push it everywhere. Keep in mind that coding is easy and does not take much time. Debugging does.