• From Legacy to Spark: What's New in Spark 2.0

    Rise Tel Aviv FinTech and Cyber

    Arrival Instructions: Directions to Rise (https://drive.google.com/file/d/0B5zFqZPyoCYZeU5xZmVYWk9MNDg/view)
    Schedule:
    • 18:00 - 18:30 - Gathering
    • 18:30 - 19:15 - Patterns of Data Ingestion: Moving a Legacy Application to Spark - Anand Khandelwal, Delivery Head of Market Risk - Data & Reporting, Barclays
    • 19:15 - 19:30 - Break for snacks and mingling
    • 19:30 - 20:15 - What's New in Spark 2.0 - Shimi Bandiel, CTO, Trainologic
    • 20:15 - 20:45 - Networking
    Meetup co-hosted with:
    • Rise (http://www.meetup.com/Rise-Tel-Aviv-FinTech-and-Cyber/)
    • Barclays (https://www.home.barclays/about-barclays/around-the-world/israel.html)
    Sponsored by Taboola

  • Shuffling Spark with Kafka, Standalone Spark approach

    Taboola Offices Rooftop

    A joint meetup between the Israel Spark Meetup and the HadoopIsrael Meetup.
    • 18:00 - 18:30 - Mingling
    • 18:30 - 19:15 - David Gruzman - “Kafka architecture, the place of Kafka Streaming, and using Kafka as Spark's shuffle engine”
      We will dive into Kafka's architecture and try to understand together what Kafka streaming is and when it should be used. In addition, we will share our experience of using Kafka to accelerate our Spark application, along with a few words about the system in which this acceleration was used.
    • 19:15 - 19:30 - Break
    • 19:30 - 20:15 - Alon Torres (DevOps Engineer, Totango) & Romi Kuntsman (Senior Big Data Engineer, Totango) - “Standalone Spark for Stability and Performance”
      After initially trying AWS EMR and YARN with lackluster results, we decided to move to a manually fine-tuned Spark Standalone setup on AWS EC2. We'll share our experience with controlling Spark components separately, using Chef, autoscaling groups, log integration, and more. Since moving to this architecture, the days of cluster instability are long gone, and our server utilization is excellent.

  • Apache Spark in the Cloud, Fighting World Hunger

    Taboola Offices

    • 17:30 - 18:00 - Mingling
    • 18:00 - 18:45 - Vadim Solovey (Google Developer Expert and Authorized Trainer @ DoIT) - “Google Dataproc - the new way of running Spark on Google Cloud”
      Google's new managed Hadoop MapReduce, Spark, Pig, and Hive service, designed to process large datasets at extreme scale and an unprecedented price.
    • 18:45 - 19:00 - Break
    • 19:00 - 19:45 - Noam Barkai (Software Developer @ NRGene) - "Using Spark to fight world hunger"
      How Spark can assist in breeding better crops to help feed a growing world population.

  • Apache Spark: Spark Streaming, Dataframes, Zeppelin and more

    • 17:30 - 18:00 - Mingling
    • 18:00 - 18:45 - Richard Grossman (System Architect @ Inneractive) - “How Inneractive succeeded in processing more than 1 billion events per day”
      Richard will tell us how they use Spark Streaming, Kafka, Parquet, and other cutting-edge technologies to handle their big data challenge.
    • 18:45 - 19:00 - Beer & coffee break
    • 19:00 - 19:15 - Ruthy Goldberg, Tal Sliwowicz (Taboola R&D) - "Spark Summit highlights"
      The recent Spark Summit last month was very interesting. We will take a few minutes to go over the highlights and point to some interesting talks that are worth watching.
    • 19:15 - 19:50 - Ruthy Goldberg, Tal Sliwowicz (Taboola R&D) - "Using Spark and Cassandra together for data analysis with Data Frames and Zeppelin"
      In the previous meetup we told the story of Newsroom, a product used for real-time analytics by home page editors. We use Cassandra to collect all the data for Newsroom. Unfortunately, data in Cassandra is very hard for human analysts to work with. Therefore, we created a new framework* that quickly and efficiently loads any data from Cassandra into Spark Data Frames. Our analysts were given access to it through Apache Zeppelin, and in this talk we will share what we did and our experience with Data Frames and Zeppelin.
      *We are planning to open-source this framework.

  • Apache Spark: How it's being used in production

    Taboola Offices

    • 17:00 - 17:30 - Mingling
    • 17:30 - 18:15 - Demi Ben-Ari (Senior Software Engineer @ Windward) - “Spark in the Maritime Domain”
      We will show the use case of implementing a data pipeline in the maritime domain at Windward via Spark applications. The process involved converting a monolithic application into a fully distributed and scalable one. We'll talk about all the tools and the process of taking an idea and developing Spark applications around it, and will show the development of an application end to end, from DevOps to our way of thinking about application development, including use cases and the lessons learned at Windward Ltd. I hope the talk will give you some practical tools for "Spark"ing your way around.
    • 18:15 - 18:25 - Coffee break
    • 18:30 - 19:15 - Tzach Zohar (Architect @ Kenshoo) - “Spark your legacy - real-world lessons from distributing an 8-year-old monolith”
      We're all here because we understand the potential of Spark for heavyweight distributed processing. But how does one migrate an 8-year-old, single-server, MySQL-based legacy system to such shiny new frameworks? How do you accurately preserve the behavior of a system that consumes gigabytes of data every day, hides numerous undocumented implicit gotchas, and changes constantly, while shifting to brand-new development paradigms? In this talk I'll present Kenshoo's attempt at this challenge, where we migrated a legacy aggregation system to Spark. Our solutions include heavy use of metrics and Graphite for analyzing production data; a "local-mode" client enabling reuse of legacy test suites; data validation using side-by-side execution; and maximum code reuse through refactoring and composition. Some of these solutions use Spark-specific characteristics and features.
    • 19:15 - 19:25 - Beer break
    • 19:30 - 20:15 - Ruthy Goldberg, Tal Sliwowicz (Taboola R&D) - “Spark Magic - building and deploying a high-scale product in 4 months”
      Taboola's R&D was tasked with building a new real-time, high-scale platform for home page optimization (“Newsroom”). We had a mission to design, develop, and deploy a full-blown real-time data analysis production system within 4 months of an alpha. We were able to achieve our goals and more using Spark and Cassandra. We now have many live production publishers using it exclusively (weather.com, TheBlaze, Tribune, CollegeHumor, and many others) and usage is growing. In this talk we would like to share our real-world experience of building and deploying Spark and Cassandra applications.