Spark Serving Globally and Ingestion of IoT Big Data

Hosted By
shlomi h. and Demi B.

Details

18:00 - 18:30 - Mingling
18:30 - 19:15 - UserDB: Distributing and serving tons of random access data globally - with low latency and cost - Elad Rosenheim VP R&D @ Dynamic Yield
19:15 - 20:00 - An Ingestion and Analytics Architecture for the Internet of Things applied to Madrid Transportation - Dr. Paula Ta-Shma, Research Staff Member @ IBM Cloud Security & Analytics group

Title:
UserDB: Distributing and serving tons of random access data globally - with low latency and cost

Abstract:
At Dynamic Yield, our Spark jobs generate multiple pieces of information every day about each user ID we encounter - feature vectors for prediction, segments based on our own behavioral tracking and onboarded CRM data, personalized recommendations and more. This amounts to roughly 250GB a day, which needs to be accessed with at most a few milliseconds of latency (or preferably much less), in multiple regions.

In this talk, I’ll explain what limits we’ve hit with the technologies we’ve used (Redis, HBase) and how we’ve rolled our own (relatively) frugal solution based on simple battle-tested components: LMDB, S3, LZ4 compression and some Redis (coz you gotta have Redis in it, it’s like Monosodium Glutamate). Rather than boasting about how nice our yet-another-proprietary-approach is, I’ll try to focus on the journey: things I’ve tried, things I’ve learned, and the appropriate Zen teachings...

Bio:
Elad Rosenheim is VP R&D at Dynamic Yield, a platform for personalization in web, apps and e-mail. He joined the company as one of its first employees and has built and led the Machine Learning and Big Data teams (before transforming into yet another manager...). Prior to that, Elad worked on C4I systems for the IDF, and on IaaS development and deployment automation at SAP. Always interested in performance, scale and elegant design.

Title:
An Ingestion and Analytics Architecture for the Internet of Things applied to Madrid Transportation

Abstract:
IoT data will arguably become the Biggest Big Data, possibly overtaking media and entertainment, social media and enterprise data. How can we make effective use of this vast ocean of data? How should the plethora of Big Data processing tools be pieced together in a seamless way to solve real world IoT use cases? This talk is about our smart cities work with Madrid Council, involving monitoring traffic sensors located throughout Madrid to learn their behavior patterns and react in real time accordingly. We will present an ingestion and analytics architecture for IoT based on open source frameworks, and also discuss the extensions we made along the way. We'll wrap up with a demonstration of our solution on the IBM Bluemix platform, where the underlying code is open source and can be adapted to other IoT use cases.

Bio:
Dr. Paula Ta-Shma is a Research Staff Member in the IBM Cloud Security & Analytics group. She is currently working on cloud storage infrastructure for the Internet of Things, and leads several related research efforts. She led IBM efforts in the EU-funded COSMOS project as well as various other research projects such as Continuous Data Protection. Her work has been presented at multiple industry conferences, including the Apache Spark Summit, the OpenStack Summit, IBM Insight and IBM InterConnect, as well as academic conferences such as FAST. She holds M.Sc. and Ph.D. degrees in computer science from the Hebrew University of Jerusalem.

Big Things
Atrium Tower, floor 32 (Habursa Ramat Gan) · Tel Aviv-Yafo