Introducing Pynesis - a checkpointing abstraction library for AWS Kinesis

This is a long overdue post about a library we open sourced earlier this year. TL;DR: it's called Pynesis (pronounced like Kinesis but with a P, despite some other suggestions) and, as you might guess, it is a pure Python high-level library for Amazon Kinesis.

The long story.

Earlier this year we started to rebuild a major, quite important part of ticketea, a project that we are still working hard on. It was actually the subject of a talk that my teammate Kartones gave at PyConES 2017.

For this project, we needed to share the ticket information stored in our core system with a new service being written. We evaluated several options and eventually decided to publish the ticket information and related changes to a Kinesis stream, so that not only our new service could access this data, but also any other service in the future.

Given that we mostly use Python in the backend for new services, we started to investigate how to use Kinesis with Python. To our surprise, AWS recommends using a Java library, the KCL multilang daemon, which either calls your Python script or has to be wrapped in a rather cumbersome way. KCL keeps its state in a DynamoDB table.

This didn't sound too appealing to us, so we kept looking and finally ended up using the low-level API provided by Boto. We wrapped it in a set of our own classes so that we could test this feature easily, run a mock in development, and abstract the persistence of the SequenceIds (the id of the latest processed event in the stream).
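To make the idea more concrete, here is a minimal sketch of that kind of wrapper. It is not the actual Pynesis code: `InMemoryCheckpointer`, `FakeKinesisClient` and `consume` are hypothetical names made up for this post. The only assumption about the real client is that it exposes `get_shard_iterator` and `get_records` with the same keyword arguments and response keys as the boto3 Kinesis client, where what we call SequenceIds appear as `SequenceNumber`.

```python
from typing import Dict, Iterator, List, Optional


class InMemoryCheckpointer:
    """Hypothetical checkpoint store: last processed sequence number per shard.

    A production version could keep the same two methods and persist the
    value in a database or cache instead of a dict.
    """

    def __init__(self) -> None:
        self._positions: Dict[str, str] = {}

    def get(self, shard_id: str) -> Optional[str]:
        return self._positions.get(shard_id)

    def save(self, shard_id: str, sequence_number: str) -> None:
        self._positions[shard_id] = sequence_number


class FakeKinesisClient:
    """Hypothetical development mock of the two client calls used below."""

    def __init__(self, records: List[dict]) -> None:
        self._records = records

    def get_shard_iterator(self, StreamName, ShardId, ShardIteratorType,
                           StartingSequenceNumber=None):
        if ShardIteratorType == "AFTER_SEQUENCE_NUMBER":
            sequence_numbers = [r["SequenceNumber"] for r in self._records]
            start = sequence_numbers.index(StartingSequenceNumber) + 1
        else:  # "TRIM_HORIZON": start from the oldest available record
            start = 0
        return {"ShardIterator": str(start)}

    def get_records(self, ShardIterator, Limit=100):
        start = int(ShardIterator)
        batch = self._records[start:start + Limit]
        return {"Records": batch, "NextShardIterator": str(start + len(batch))}


def consume(client, checkpointer, stream: str, shard_id: str,
            limit: int = 100) -> Iterator[dict]:
    """Yield records not seen yet, checkpointing each one after it is handled."""
    last_seen = checkpointer.get(shard_id)
    if last_seen is None:
        response = client.get_shard_iterator(
            StreamName=stream, ShardId=shard_id,
            ShardIteratorType="TRIM_HORIZON")
    else:
        response = client.get_shard_iterator(
            StreamName=stream, ShardId=shard_id,
            ShardIteratorType="AFTER_SEQUENCE_NUMBER",
            StartingSequenceNumber=last_seen)
    iterator = response["ShardIterator"]

    while iterator:
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        if not response["Records"]:
            # Sketch only: stop once caught up. A long-running consumer
            # would sleep and poll again instead.
            break
        for record in response["Records"]:
            yield record
            # Runs only when the caller asks for the next record, so a
            # record is checkpointed after it has been processed.
            checkpointer.save(shard_id, record["SequenceNumber"])
        iterator = response.get("NextShardIterator")
```

Because the consumer only depends on those two calls and on the checkpointer's `get` and `save`, tests and development environments can pass in the fake client, while production code passes a real Kinesis client and a persistent checkpointer.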

Soon after this, we needed to consume these events from another service, so instead of rewriting or copying the whole Kinesis access layer, we extracted it, added documentation, made it a package and open sourced it.

Pynesis has been used in production at ticketea for several months. Although it's probably far from offering the performance, throughput, concurrency and reliability of the official Java-based solution, it's been good enough for our day-to-day usage, and it has greatly simplified our deployment stack, development environments, and testing. We also don't need to mix Java and Python in the same project anymore.

If you are interested, check out the GitHub repository, and feel free to contact us if you want to use it and have any concerns. Oh, and if you are into podcasts, Pynesis was also mentioned in Episode 48 of the Python Bytes podcast.

Of course, if you want to join our amazing team, we're hiring!