Services, Deployments and Fun with Kafka War Stories
18:00 - 18:30 - Mingling
18:30 - 19:30 - From Backend to Frontend in (a few) Seconds - Barak Luzon at Taboola
19:30 - 20:15 - The weird journey of fixing JVM out-of-memory crashes of some Kafka brokers - Gaash Hazan at Taboola
Title: From Backend to Frontend in (a few) Seconds
Taboola services rely heavily on having fresh data. Whether it's personalized items to recommend, campaign-related data, configurations, or kill switches, every relevant piece of information must propagate quickly and reliably to all services.
Maintaining fresh information across nearly 700 frontend servers spread over 7 time zones (and twice as many backend servers) means you are always fighting the endless battle between how fresh your data is and how much load you create by fetching that data. With the frontend servers handling over 500k HTTP requests per second, we have very few resources to spare.
As Taboola continues to grow, we find ourselves “tweaking” caches more and more. Instead, we decided to take another path.
Join us to hear how we combined MySQL's internal replication mechanism with our know-how in distributed Kafka to achieve near-real-time data propagation from our centralized backend DB to ALL services within seconds, while reducing the load on our services and DBs.
Barak Luzon is a Fullstack Engineer at Taboola.
“Trying is the first step towards failure” said the great Homer Simpson and I would add that “Failing is the first step towards success”.
I’ve been around software since 2006, in various companies and positions, from a C4 system for intercepting rockets through e-commerce to ad tech.
I'm always keen to learn new technologies and test them to see how far to the edge I can take them. I practice this passion by day at Taboola with our team of rockstars, while by night I spend time on my second passion - brewing my own beer.
Title: The weird journey of fixing JVM out-of-memory crashes of some Kafka brokers
Taboola has over a dozen Kafka clusters spread around the world. More than 100 servers relay 30+ TB/day between Taboola's edge and core data centers. Everything from click events, through configuration changes, down to debug logs and operational metrics flows through Kafka topics. This means that our Kafka services must be highly reliable.
Our big Kafka clusters are highly stable, but some of our smaller clusters experience JVM out-of-memory crashes once every few days. The obvious fix, increasing JVM memory, made things even worse.
Join us for a low-level journey through troubleshooting this weird problem, ending with a non-intuitive solution.
Gaash Hazan - Backend Developer at Taboola