• Slice & DAIS 2021 - Data and AI Summit Highlights 🚀

    Online event

    Hi all,

    My name is Frank. I am a Developer Advocate for Databricks, based in Munich with a focus on Europe, the Middle East, and all of Africa. My goal is to support the community (that is you guys!) with all kinds of tech-focused events.

    Data and AI Summit (the former Spark Summit) will take place in the last week of May 2021 with keynotes, AMAs ("ask me anything" expert sessions), free training, online meetups, and lots of amazing product announcements. Expect new open source, improvements for Delta.io, MLflow, and SQL.

    A colleague of mine, Matt, and I decided to kickstart a new event covering the key highlights from Data and AI Summit (DAIS). We named the event Slice & DAIS (because we thought that was funny). Meetup groups in Cape Town, Johannesburg, Dubai, Istanbul, Milan, Barcelona, Munich, Berlin, St. Petersburg, and Moscow will join us online.

    event date: June 17th, 2021
    time: 18:30 Berlin time / 17:30 London time
    location: online, live streaming
    level: L250
    (on a scale from L100 "marketing product flyer 1st paragraph" to L400 "live coding for the audience that already has experience with product")

    * AI/ML news, Matt Thomson, Databricks UK, 20 mins
    * Lakehouse news, Frank Munz, Databricks GER, 20 mins
    * Community session: "Degrading Performance? You Might be Suffering From the Small Files Syndrome". Adi Polak, DAIS presenter and Sr. Software Engineer and Developer Advocate in the Azure Engineering organization at Microsoft, offered us a slightly shortened version of her session from DAIS. Approx. 30 mins.
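
    For readers new to the small files problem named in the community session title: one common remedy is to compact many small files into fewer, larger ones. The sketch below is not taken from the talk. It is a minimal, pure-Python illustration of planning such a compaction, and the function name `plan_compaction`, the file names, and the sizes are all hypothetical.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group files into batches whose combined size stays at or below target_bytes.

    A file that is larger than target_bytes on its own ends up in a batch by itself.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    # Iterate in name order so the resulting plan is deterministic.
    for name in sorted(file_sizes):
        size = file_sizes[name]
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical example: four small files, target batch size of 100 bytes.
example_sizes = {"part-a": 40, "part-b": 40, "part-c": 40, "part-d": 10}
plan = plan_compaction(example_sizes, target_bytes=100)
print(plan)
```

    Each batch in the plan would then be rewritten as one larger file. Real table formats and engines ship their own compaction tooling, so treat this only as a picture of the idea.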

    Don't miss out on the first Slice & DAIS event ever ;-). There will also be swag. Real swag, not pandemic, virtual swag!

    We want this meetup group to be community-driven. So ping us if you'd like to present at a future event. All levels are welcome! We strive to provide a safe environment where everyone feels welcome. Topics can be Databricks or open source. Databricks is multi-cloud. We love to hear about your implementation on AWS, Azure or GCP. Emphasis will be on future in-person events once things are back to normal.

  • Apache Spark + AI Munich - Metadata Driven Development & Wide Tables #Feb2020

    In this Meetup, we are very happy to have Adidas talk about their Metadata Driven Development for Data Lake Environments and hear Data Reply talk about Pivoting wide tables in Spark, which is a common problem to solve.

    This event will be hosted by Data Reply together with two great speakers from the community:

    * Jorge Cespedes (Adidas)

    * Sadik Bakiu (Data Reply)

    Talk 1: Metadata Driven Development for Data Lake Environments

    Talk 2: Pivoting wide tables with Spark


    Metadata Driven Development for Data Lake Environments

    M3D stands for Metadata Driven Development and is a cloud and platform-agnostic framework for the automated creation, management, and governance of metadata and data flows from multiple sources to multiple target systems. It is a tool comprised of two components: m3d-API and m3d-engine. At Adidas the framework is used for the creation of data lake environments, management and governance of metadata, data flows from multiple sources and algorithms as data frame transformations. It is feeding 500+ tables and over 900 views.
    M3D was open-sourced last year and is available to the community in the Adidas GitHub account. It is constantly getting new features and improvements from internal teams.

    Jorge is the tech lead for the Adidas M3D framework, working on making M3D ready for feeding their fast data analytics platform, and has been actively contributing to the M3D engine since he joined the data engineering team at Adidas almost 2 years ago. He has been working with the Hadoop ecosystem for 5 years and has contributed to open source projects such as Impala and Apache Kudu. Recently, he started exploring the AI world, experimenting with LSTMs for NLP.

    Pivoting Wide Tables with Spark

    Pivoting is a very computation-intensive operation. When the resulting table contains more than 10k columns, it becomes especially demanding. In this talk, we will dive into why the current Spark implementation is inefficient in these cases, the problems faced during development, and eventually how to implement this operation in a performant way.
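
    To make the operation concrete, here is a minimal pure-Python sketch of a pivot from long format (key, column, value) to wide format, summing duplicate cells. It is not the implementation discussed in the talk, and the function name `pivot_long_to_wide` and the sample rows are hypothetical. In Spark, the corresponding DataFrame call is `groupBy(...).pivot(...).agg(...)`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, int]


def pivot_long_to_wide(rows: List[Row]) -> Tuple[List[str], Dict[str, List[Optional[int]]]]:
    """Turn (key, column, value) rows into one wide row per key.

    Values that share the same key and column are summed.
    Cells with no input rows are filled with None.
    """
    # First pass: discover the distinct output columns.
    columns = sorted({column for _, column, _ in rows})
    # Second pass: aggregate values per (key, column) cell.
    cells: Dict[str, Dict[str, int]] = defaultdict(dict)
    for key, column, value in rows:
        cells[key][column] = cells[key].get(column, 0) + value
    wide = {key: [values.get(c) for c in columns] for key, values in cells.items()}
    return columns, wide


# Hypothetical sample data: two users, two metrics.
sample_rows = [("u1", "clicks", 1), ("u1", "views", 2), ("u2", "clicks", 3), ("u1", "clicks", 4)]
columns, wide = pivot_long_to_wide(sample_rows)
print(columns)
print(wide)
```

    The sketch also hints at why very wide results are costly: every output row carries one cell per distinct column, and the set of columns has to be known before the wide rows can be laid out.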

    Sadik is a data engineer at Data Reply focused on helping customers develop their analytical applications with Spark and other distributed computation frameworks. Sadik also specializes in developing scalable data processing pipelines on all major cloud providers.


    • 2 talks (each ca. 40 min incl. discussion)

    • Networking, food & drinks

    • Language: English

    • There will be photos taken

    • A list of registered users will be provided to the host

    • Please bring your ID

  • Fintech Geekout with Wirecard - Apache Spark Applications in Fintech

    Fintech GeekOut Munich


    6:30 pm Food & Mingle Time
    7:00 pm Introduction
    7:05 pm Spark Payment Insights by David McKennell & Thomas Mambrini / Wirecard
    7:40 pm Break
    7:55 pm Distributed Portfolio Risk Management using Spark by Nima Nooshi / Databricks
    8:30 pm Cake & Networking
    9:00 pm Official End

    For details go here: https://www.meetup.com/Fintech-GeekOut-Munich/events/267743292/

  • Women in Data & AI - Meetup

    Microsoft Germany GmbH


    RSVP here -> https://pages.databricks.com/201912-EU-EV-MSFT-WomenInData-Munich_01.RegistrationPage.html

    Date: Wednesday, 11th December 2019
    Time: 18:00 - 21:30
    Location: Microsoft Deutschland GmbH, Walter-Gropius-Straße 5, 80807 München

    Databricks and Microsoft would like to invite you to a special Drinks and Data event on 11th December at Microsoft Deutschland GmbH. We are aiming to help connect and empower women in Data Science, AI and ML, and their allies.

    Our tech talks will focus on technical themes and topics to give an in-depth view of our speakers' cutting-edge research.

    Enjoy drinks and appetizers, network with your peers and meet with industry leaders to discuss how data and machine learning are driving innovation across the entire ecosystem.

    The aim behind these meetups is to encourage more diversity in the world of tech with the help of our inspirational speakers and fantastic co-hosts.

    18:00 - Reception & Drinks - Networking, Drinks & Pizza
    19:00 - Welcome & Introduction
    19:15 - Talk #1: Usage of AI to create a new customer relationship built on trust by Sarah Rojewski
    19:45 - Talk #2: Find Your Balance – A Guide to Effectively Dealing with Imbalanced Datasets by Julia Kraus
    20:15 - Break and Networking
    20:30 - Talk #3: BI is not a given – An experience report of bringing data in a medium-sized company by Ilka Landsmann-Kropp
    21:00 - Talk #4: Achieving integration through education - How Microsoft is teaching refugees about AI by Isabel Grund
    21:30 - End


  • Apache Spark + AI Munich - Car Classification + PySpark #Dec2019

    Globe Business College Munich

    We are delighted to announce our car-focused event this December in Munich.

    This event will be hosted by Data Insights together with two great speakers from the community:

    * Dr. Evan Eames (Data Insights)

    * Jannis Bergbrede (Inovex)

    Talk 1: Car Classification Using a Deep Convolutional Neural Network
    (This talk assumes a basic-to-intermediate understanding of Neural Networks)

    Talk 2: A Case for Isolated Virtual Environments with PySpark


    Car Classification Using a Deep Convolutional Neural Network Abstract

    In this talk, we will first go through a number of real use-case examples involving Convolutional Neural Networks (CNNs), and then walk through the development, training, and eventual deployment of a heavy-duty, full-scale CNN, in this case used to accomplish car model classification. Finally, we will discuss how this CNN can be easily reworked to accomplish a wide variety of other computer vision tasks.

    Dr. Evan Eames completed his PhD in Computational Astrophysics, in which he worked with early-universe full-numerical simulations. He now designs Deep Learning applications, with a focus on industry application. In his free time he's also tinkering with some interesting new ML architectures.
    You can see some of his projects on his GitHub: https://github.com/EvanEames

    A Case for Isolated Virtual Environments with PySpark

    When deploying and using PySpark applications in production, it is often necessary to use Python libraries like pandas, numpy or custom packages on the worker nodes of a Spark cluster. Since the cluster is often shared among many users, managing global packages and their versions quickly becomes a hopeless endeavor. Based on practical experience at Germany's largest car market, I will motivate the use of isolated virtual environments for individual jobs. Further, I will talk about best practices on how to build and distribute these environments, leveraging conda requirement definitions and spark-submit functionality.
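
    As a rough illustration of the idea (not the speaker's setup), the sketch below assembles a `spark-submit` command that ships a packed environment archive with a single job. It assumes YARN cluster mode and an archive that was already built, for example with a tool such as conda-pack. The function name `build_spark_submit`, the archive name, and the application file are hypothetical.

```python
from typing import List


def build_spark_submit(app: str, env_archive: str, alias: str = "environment") -> List[str]:
    """Build a spark-submit argument list that runs `app` inside a shipped environment.

    Assumption: YARN cluster mode. The archive is unpacked on each node under
    the directory name given by `alias`.
    """
    python_path = f"./{alias}/bin/python"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Distribute the packed environment and unpack it as ./<alias> on each node.
        "--archives", f"{env_archive}#{alias}",
        # Point the driver (application master) and the executors at the shipped interpreter.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_path}",
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python_path}",
        app,
    ]


# Hypothetical file names, for illustration only.
command = build_spark_submit("job.py", "pyspark_env.tar.gz")
print(" ".join(command))
```

    Because the environment travels with the job, two jobs on the same shared cluster can use different versions of the same library without touching globally installed packages.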

    Jannis Bergbrede completed his Master's in Business Informatics at the University of Mannheim. Now he develops big data applications at inovex while focussing on building Spark data pipelines and bringing Machine Learning use cases to production. He loves the outdoors and enjoys going on hikes.


    • 2 talks (each ca. 40 min incl. discussion)

    • Networking, food & drinks

    • Language: English

    • There will be photos taken

    • A list of registered users will be provided to the host

    • Please bring your ID

  • Apache Spark + AI Munich Meetup - Graph July 2019


    We are delighted to announce our graph-focused event this summer in Munich.

    This event will be hosted by b.telligent together with two great speakers from the graph community:

    * Dr. Andreas Hopfgartner & Dr. Sebastian Petry (b.telligent)

    * Martin Junghanns (Neo4j)

    Talk 1: Graphs are everywhere!? - Not yet, but maybe soon.
    Talk 2: Extending Apache Spark Graph for the Enterprise with Morpheus and Neo4j


    Extending Apache Spark Graph for the Enterprise with Morpheus and Neo4j

    Apache Spark 3.0 introduces a new module: Spark Graph. Apache Spark Graph adds the popular query language Cypher, its accompanying Property Graph Model and graph algorithms to the data science toolbox. Graphs have a plethora of useful applications in recommendations, fraud detection, and research.
    Morpheus is an open-source library that is API compatible with Spark Graph and extends its functionality by:

    * A Property Graph catalog to manage multiple Property Graphs and Views

    * Property Graph Data Sources that connect Spark Graph to Neo4j and SQL databases

    * Extended Cypher capabilities including multiple graph support and graph construction

    * Built-in support for the Neo4j Graph Algorithms library

    In this talk, we will walk you through the new Spark Graph module and demonstrate how we extend it with Morpheus to support enterprise users to integrate Spark Graph in their existing Spark and Neo4j installations. We will demonstrate how to explore data in Spark, use Morpheus to transform data into a Property Graph, and then build a Graph Solution in Neo4j.

    Martin Junghanns is part of the Morpheus Engineering team at Neo4j and one of the Apache Spark Graph contributors. He has a research background in distributed graph analytics; his main interests are query engines, graph algorithms, and functional programming languages. Martin holds an MSc Computer Science degree from the University of Leipzig.

    Graphs are everywhere!? - Not yet, but maybe soon.

    In an increasingly interconnected world, mathematical models are often sought to describe these networks and to make use of network structures. One such model is the graph.
    We first define and describe graphs starting from the network idea. We then transfer this mathematical model to applications outside of networks. Finally, we present a practical example of graphs in use.

    Graphs are everywhere - maybe not, but they could be much more common. "Use Graphs - use connections"

    Dr. Sebastian Petry
    * Development and optimization of analytical methods and tools
    * Project management
    * Leadership
    * Different analytic projects with data science and statistical methods

    Dr. Andreas Hopfgartner
    * Physicist with focus on modeling, algorithms and signal processing
    * Coverage of the complete data science stack with a focus on enterprise IoT
    * Sensors, constrained devices, Embedded Systems / Bus & Networks
    * Signal Processing
    * Profound experience in R&D, engineering, software development


    • 2 talks (each ca. 40 min incl. discussion)

    • Networking, food & drinks

    • Language: English

    • There will be photos taken

    • A list of registered users will be provided to the host

    • Please bring your ID

  • Apache Spark + AI Munich Meetup - May 2019

    Microsoft Germany GmbH

    It is time for our first Meetup of 2019.

    Microsoft will host our event this time with speakers from:

    * DataSentics (Petr Bednařík)

    * Databricks (Bernhard Walter)


    Petr Bednařík: AI-Driven Digital Customer Engagement powered by Apache Spark - Using ML to Analyse 100s of GBs of Ad Data and Microtarget Ads (40 min)

    Petr Bednařík is an experienced data science architect and founder of DataSentics, a machine learning and cloud data engineering boutique focusing on use cases in finance and retail using Spark and Databricks.

    Bernhard Walter: MLflow in Action (40 min)

    MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It allows tracking of experiments, packaging of ML code in a reusable, reproducible form and managing and deploying models from a variety of ML libraries to a variety of model serving platforms.
    This talk will demonstrate building and managing models in several ways:
    - Model building in Spark notebooks in the cloud using MLflow to track hyperparameter tuning.
    - Model building from versioned code via the command line in local or remote Python environments managed by MLflow.
    - Inference on different platforms using models stored by MLflow in different flavours.

    With ten years of enterprise background as IT and Enterprise Architect at a global telecommunications provider, Bernhard spent the last seven years helping companies with their digital transformation: starting with API management as an enabler for bimodal IT, he now helps companies on their journey to becoming data-driven. His main area of expertise is around distributed computing and advanced analytics on premises and in the cloud. He has a PhD in statistics and is a regular speaker at masterclasses and events teaching advanced topics around Spark and Machine Learning.


    • 2 talks (each ca. 40 min incl. discussion)

    • Networking, food & drinks

    • Language: English

    • There will be photos taken

    • A list of registered users will be provided to the host

    • Please bring your ID

  • Apache Spark Munich @gutefrage.net GmbH

    gutefrage.net GmbH

    This is our first Apache Spark Meetup in Munich. In this Meetup we will explain some Spark basics and show a small live demo in the first talk by Danny. The second talk will be about implementing an ML-based answer scoring with Spark and MLlib.

    Between and after the talks there will be time for drinks, food and conversations.

    ! The talks will be in German !


    "Spark Basics - RDD, SQL, MLlib, GraphX" - Danny Linden

    "With the rapid adoption of Apache Spark—one of the most active Apache projects today—and the need for programs to solve the world's greatest problems, distributed computing has resurfaced as a hot commodity that can take your career to the next level. More importantly, Spark opens the door to some really cool and impactful applications. Spark is a leap forward in distributed computing, allowing you to perform faster and more complex analyses on your Hadoop cluster and in the cloud. This presentation will give a short introduction to basic Spark concepts such as RDDs, transformations, actions, and executors. We will also cover recent developments in the Spark community with DataFrames, SQL on Spark, GraphX."

    "Speed Up Your Spark Job" - Christian Dedié

    gutefrage.net is using Spark extensively for Machine Learning, BI and realtime processing of user behavior. This talk is about our learnings and pitfalls when implementing an ML-based answer scoring (ordering) for all questions on gutefrage.net. Examples include improving the reliability and throughput of Spark jobs with read/write access to relational data sources, and optimizing HDFS-based data structures for best performance. Starting with DataFrames and Spark SQL, we experienced some major improvements when implementing the same functionality based on RDDs.

    About Christian Dedié:
    Christian Dedié has 20 years of experience as a software engineer. He's a passionate Scala developer and Continuous Delivery advocate. In recent years he has focused on big data projects using Polyglot Persistence and Machine Learning. He is co-founder of the open source project "Flyway - Database Migrations Made Easy".

  • First Spark-Munich Meetup @ "Big Data Munich"


    Hi Spark-Munich members,

    We are happy to announce our first Meetup date. The "Big Data Munich" Meetup Group ( https://www.meetup.com/de/Big-Data-Munich/events/226100767/ ) has invited us as a guest group at the Big Data Munich Meetup.

    None other than Sean Owen, one of the leading Spark developers at Cloudera (the first, and one of the leading, providers and supporters of the Apache Hadoop stack), will join the Meetup and talk about "A taste of random decision forests on Apache Spark".

    I am pleased that Cloudera is sending one of their best Spark developers, and I'm looking forward to great upcoming Meetups pairing informative talks with community exchange.

    Additionally, there are going to be two more great talks. The whole agenda is as follows:

    7:00 - 7:15 PM: Drinks & Networking

    7:15 - 7:35 PM: Christian Löhnert, Pre-Sales Consultant at ConSol* Consulting & Solutions Software GmbH

    "Where are the users? - A (simple) story about getting started with Big Data"

    7:35 - 8:05 PM: Sean Owen, Director Data Science at Cloudera

    "A taste of random decision forests on Apache Spark"

    8:05 - 8:15 PM: Matthias Korn, Technical Consultant at Data Virtuality

    "Beyond the Data Lake"

    Thanks to all of you so far; I'm looking forward to seeing you all on the 12th of November!

    Our first dedicated "Spark-Munich Kick-Off" Meetup will be held in the first week of December. The exact date will be announced as soon as possible.