• Data Science Meetup Hamburg

    Neustädter Neuer Weg 22

    !!! New Location @ better.group !!!

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    RLgraph: Robust, incrementally testable reinforcement learning
    by Kai Fricke, Postdoc @ Helmut-Schmidt-Universität Hamburg

    In this talk, we will introduce RLgraph, a modular reinforcement learning library. Utilizing a strict separation of concerns, RLgraph makes it easy to build, test and debug reinforcement learning algorithms, or to just use well-tested off-the-shelf algorithms for optimization problems. This talk will introduce the library, discuss the challenges we faced implementing it, and touch on how you can extend it to fit your needs.

    === Talk 2 ===
    Fabian Braun, Algorithm Engineer at MOIA, on "Features beat algorithms - Improving Card Fraud Detection through Suspicious Pattern Discovery"

    In this talk we will introduce the topic of credit card fraud from a data science perspective. Then we show how frequent pattern mining can be used to improve card fraud detection. According to our hypothesis, fraudsters use stolen credit card data at specific, recurring sets of shops. We exploit this behavior to identify fraudulent transactions. In a first step we show how suspicious patterns can be identified from known compromised cards. Then we define new attributes which capture the suspiciousness of a transaction indicated by known suspicious patterns. Finally, a non-linear classifier is used to assess the predictive power gained through those new features. The new attributes lead to a significant performance improvement compared to state-of-the-art aggregated transaction features. Our results are verified on real transaction data provided by our industrial partner.

    === Networking ===
    === Closing ===
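The pattern-mining idea from Talk 2 above can be illustrated with a toy sketch. This is not the speaker's implementation: the function names, the shop labels, the pattern size of two, and the support threshold are all assumptions made for illustration. It counts shop pairs that recur across known compromised cards, then scores a card by the support of the known patterns its history contains.

```python
from collections import Counter
from itertools import combinations


def mine_suspicious_patterns(compromised_histories, min_support=2, size=2):
    """Return shop sets of `size` seen on at least `min_support` compromised cards."""
    counts = Counter()
    for shops in compromised_histories:
        # Deduplicate and sort so each shop set is counted once per card.
        for combo in combinations(sorted(set(shops)), size):
            counts[combo] += 1
    return {frozenset(c): n for c, n in counts.items() if n >= min_support}


def suspiciousness(card_history, patterns):
    """Hypothetical feature: summed support of patterns contained in the history."""
    visited = set(card_history)
    return sum(n for pattern, n in patterns.items() if pattern <= visited)


# Made-up shop histories of cards known to be compromised.
compromised = [
    ["shop_a", "shop_b", "shop_c"],
    ["shop_a", "shop_b", "shop_d"],
    ["shop_b", "shop_e"],
]
patterns = mine_suspicious_patterns(compromised)
score_hit = suspiciousness(["shop_x", "shop_a", "shop_b"], patterns)
score_miss = suspiciousness(["shop_x", "shop_e"], patterns)
```

A score like this would then be one extra input column for whatever classifier is trained on the transactions.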

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Marc Päpper, Machine Learning Engineer, on "Help Pacman beat the ghosts with deep Q learning"

    Did you ever want to defeat a computer game by only watching the screen? You can, using a software agent! However, the challenge is this: given only the image pixels of the computer screen, the agent needs to figure out how to optimally play the game. In this talk I will walk you step by step, in depth, through the deep Q learning algorithm, which uses neural networks to learn a Q policy that represents the optimal action given the current situation. The talk will feature code samples in Python and put all the pieces together in a live demo showing an agent learning to master a game.

    === Talk 2 ===
    "Data hacking, from fast prototyping to production systems in order to personalize Jimdo websites"
    by Michael Schneider, Data Scientist @Jimdo

    A tool stack overview plus hands-on data tips on building non-blocking, near-real-time data products in Kotlin in order to personalize websites at Jimdo. From data sourcing using Apache Drill to rapid prototyping, addressing the importance of machine learning evaluation, production problems and finally A/B testing with live users. Embracing the data-driven culture of learning and failing fast.

    === Networking ===
    === Closing ===
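As a rough illustration of the Q-learning update that Talk 1 above builds on, here is a minimal tabular sketch. It is an assumption-laden toy, not the speaker's code: instead of a neural network reading screen pixels, it uses a lookup table on a made-up five-cell corridor, and the environment, hyperparameters and names are all hypothetical.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def step(state, action):
    """Toy environment: reward 1 only when the last cell is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability EPSILON, else exploit.
            explore = rng.random() < EPSILON
            action = rng.choice(ACTIONS) if explore else greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In deep Q learning the table `q` is replaced by a neural network that maps an observation to one value per action, while the update target keeps the same form.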

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Sven Giesselbach, Data Scientist @ Fraunhofer IAIS, with his NIPS paper on transfer learning: "Corresponding Projections for Orphan Screening"

    We propose a novel transfer learning approach for orphan screening called corresponding projections. In orphan screening the learning task is to predict the binding affinities of compounds to an orphan protein, i.e., one for which no training data is available. The identification of compounds with high affinity is a central concern in medicine since it can be used for drug discovery and design. Given a set of prediction models for proteins with labelled training data and a similarity between the proteins, corresponding projections constructs a model for the orphan protein from them such that the similarity between models resembles the one between proteins. Under the assumption that the similarity resemblance holds, we derive an efficient algorithm for kernel methods. We empirically show that the approach outperforms the state-of-the-art in orphan screening.

    === Talk 2 ===
    Sebastian Niehaus, Data Scientist @ AICURA medical

    We present a reinforcement learning approach for detecting objects within an image. Our approach performs a step-wise deformation of a bounding box with the goal of tightly framing the object. It uses a hierarchical tree-like representation of predefined region candidates, which the agent can zoom in on. This reduces the number of region candidates that must be evaluated so that the agent can afford to compute new feature maps before each step to enhance detection quality. We compare an approach that is based purely on zoom actions with one that is extended by a second refinement stage to fine-tune the bounding box after each zoom step. We also improve the fitting ability by allowing for different aspect ratios of the bounding box.

    === Networking ===
    === Closing ===

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Distributed Machine Learning
    by Tim Wirtz, Senior Data Scientist @Fraunhofer IAIS

    Efficient Decentralized Deep Learning by Dynamic Model Averaging

    We propose an efficient protocol for decentralized training of deep neural networks from distributed data sources. The proposed protocol handles different phases of model training equally well and adapts quickly to concept drifts. This leads to a reduction of communication by an order of magnitude compared to periodically communicating state-of-the-art approaches. Moreover, we derive a communication bound that scales well with the hardness of the serialized learning problem. The reduction in communication comes at almost no cost, as the predictive performance remains virtually unchanged. Indeed, the proposed protocol retains loss bounds of periodically averaging schemes. An extensive empirical evaluation validates a major improvement of the trade-off between model performance and communication, which could be beneficial for numerous decentralized learning applications, such as autonomous driving, or voice recognition and image classification on mobile phones.

    === Talk 2 ===
    Challenges and Pitfalls in Attribution Modelling
    by Sascha Netuschil, Team Lead Data Analytics, bonprix

    Attribution modelling is one of the most fundamental parts of (digital) marketing. It is also a fascinating data science task with various challenges and pitfalls, many of which only become apparent once you have delved deep into the data. In this talk I want to share my experiences with attribution modelling from a data science perspective and point out critical issues as well as possible approaches to solving them.

    === Networking ===
    === Closing ===
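The abstract of Talk 1 above does not spell out the protocol, so the following is only a hedged sketch of one way "average only when needed" can work. It assumes that learners share a reference model and synchronize only when some local model has drifted from it by more than a threshold; the names `dynamic_averaging_round`, `reference` and `delta` are hypothetical, and models are plain lists of weights.

```python
def average(models):
    """Element-wise mean of equally sized weight vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def sq_dist(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dynamic_averaging_round(models, reference, delta):
    """Average all local models only if one drifted more than delta from the reference.

    Returns (models, reference, synced). Without a sync, nothing is communicated.
    """
    if all(sq_dist(m, reference) <= delta for m in models):
        return models, reference, False
    avg = average(models)
    # Every learner restarts from the average, which becomes the new reference.
    return [list(avg) for _ in models], avg, True


# Small drift: no communication needed.
quiet = dynamic_averaging_round([[0.1, 0.0], [0.0, 0.1]], [0.0, 0.0], delta=0.5)

# Large drift: the models are averaged and the reference is updated.
synced = dynamic_averaging_round([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], delta=0.5)
```

Periodic averaging would synchronize every few rounds regardless of drift; the conditional check is what saves communication when the local models barely move.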

  • Data Science Meetup Hamburg

    Location visible to members

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    The AI universe speaks Python - Lessons learned from a Coursera Instructor
    by Romeo Kienzler (Global Chief Data Scientist, DeepLearning/AI Engineer, Watson IoT, Associate Professor Artificial Intelligence FH Berne)

    As a Sun Certified Java Programmer and Developer, and after working in 100s of JEE/WebSphere projects for IBM, I never could imagine using anything other than Java. Even when planning for the first Coursera course - "Fundamentals of Scalable DataScience" - we had tons of discussions about whether we should use Scala, R or Python for it. Now Python is an obvious choice for data science, and we are using it for our "Advanced Machine Learning and Signal Processing" course as well as for the "Applied AI using DeepLearning" course. So in this talk I'll give you a summary of how to use Python and Apache Spark on top of different machine learning and deep learning frameworks like SparkML, SystemML, Keras and DeepLearning4J to build scalable brains for AI. I'll finish with a summary of Fabric for DeepLearning and the Docker/Kubernetes based Open IBM Model Asset Exchange.

    === Talk 2 ===
    Snakes on a plane: Ship your Python on enterprise machines
    by Max Pumperla

    Max is a Deep Learning engineer at SF-based skymind.ai, bringing Keras and TensorFlow to the JVM, and co-founder of Deep Learning platform aetros.ai. He is the author of "Deep Learning and the Game of Go" at Manning and Coursera instructor for "Applied AI with Deep Learning". Max has been a Keras contributor since day one, and is a Deeplearning4J core developer and maintainer of Hyperopt.

    Data scientists want Python for experimentation, engineers want production-grade systems. This can create friction between departments and often leads to suboptimal solutions. In this talk we show how to access Deeplearning4J (DL4J) directly from Python, and discuss how to import some of your favorite frameworks into DL4J. This approach narrows the gap between science and engineering and brings Deep Learning models to production more easily. We close by giving a demo of real-time object detection with YOLO, using Skymind's intelligence layer (SKIL).

    === Networking ===
    === Closing ===

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Dude, where's my Database? Productive Data Pipelines in Python
    by Malte Harder, Senior Data Scientist at Blue Yonder

    At Blue Yonder we use Python (almost exclusively) to operate data pipelines for our customers in the retail world. To achieve this, we leverage open source technologies such as Apache Arrow and Apache Parquet as well as Dask and Pandas. In this talk I will describe the challenges we faced scaling our solution from a single node to a distributed system running on hundreds of nodes, and how this changed the role of our database.

    === Talk 2 ===
    Machine Learning and Artificial Intelligence for Business Applications - An Overview of Current Developments
    by Prof. Dr. Martin Spindler

    Machine Learning (ML) and Artificial Intelligence (AI) are increasingly used for economic and business-related problems. The talk will give an overview of current developments in ML and AI, in particular the importance of causal inference for many practical questions and how it can be used in practice. Classical examples are demand estimation and dynamic pricing. The focus will be on both the underlying concepts and principles and their application to business-related problems, using examples from logistics, marketing, and health care.

    === Networking ===
    === Closing ===

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Data Story Telling
    by Simon Nehls, Senior Consultant Data Science @holisticon.de

    We are getting better in every aspect of data science. We are automating feature engineering, we are constantly improving model performance, etc. However, when it comes to talking about all this, we seem to be stuck at an early stage! Data Story Telling might be a solution. It can help us to clearly communicate our objectives and explain our results 😉 This short primer will show a few basic elements of a good data story.

    === Talk 2 ===
    Deduplication of Images from Social Media Sources
    by Stefan Schadwinkel, Senior Data Engineer @Jimdo

    This talk will focus on practical issues when pooling user images from different social media sources: users might have uploaded the same image to multiple providers. We will look at challenges that arise particularly in that setting, in comparison to more classical use cases like searching for copyright infringement. To tackle those challenges, we will look at different image processing methods and ways to solve the task in computationally efficient ways. And yes: there's data science!

    === Networking ===
    === Closing ===

  • Data Science Meetup Hamburg

    wework

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Pipeline Testing with Great Expectations

    Every data scientist has their scary stories of how a change in data broke their metrics dashboard, report or ML model. This talk introduces Great Expectations - an open source library for testing data pipelines: https://github.com/great-expectations/great_expectations/

    Eugene Mandel is Head of Product at Superconductive Health. Passionate about making Data Science and Machine Learning into real products. Previously: Data Science lead - Directly, Data Science team - Jawbone, CTO/co-founder - Qualaroo, co-founder - Jaxtr

    === Talk 2 ===
    Insights and highlighting trends on experimental designs from KDD 2018
    by Michael Schneider

    I cover the topics of: Optimizing experimental designs and going beyond vanilla a/b testing. The use of machine learning to optimize experiments. Method for a stable FDR while searching for heterogeneous subgroups.

    SIGKDD promotes basic research and development in KDD, adoption of "standards" in the market in terms of terminology, evaluation, methodology and interdisciplinary education among KDD researchers, practitioners, and users.

    === Networking ===
    === Closing ===

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30 ===
    === Networking ===
    === Small intro ===
    === Break & Networking ===

    === Talk 1 ===
    Gregor Kasieczka on "Deep Learning in Particle Physics"

    Deep learning has recently gained attention for solving numerous problems - from image classification to strategies for games like Go or Poker - outperforming humans in many of these tasks. This talk will discuss how these techniques can be used to look for new elementary particles with the Large Hadron Collider (LHC) at CERN in Geneva.

    Gregor Kasieczka is a Junior Professor at the University of Hamburg and leads an Emmy Noether research group searching for exotic new particles produced in proton-proton collisions at the LHC. A main focus of his research is the creative application of machine learning techniques to particle physics.

    === Talk 2 ===
    Christian Geier (data science consultant for Ginkgo Analytics) on "Data Science meets Neuroscience" - Trying to understand the epileptic process and predicting epileptic seizures in humans

    This talk will highlight some of the unique challenges of dealing with EEG recordings from the human brain and provide insights into recent progress in seizure prediction. Furthermore, an overview of the tools and techniques used, drawn from physics, digital signal analysis and computer science, will be given.

    Christian Geier wrote his dissertation in physics at Bonn University on the importance of nodes in complex networks, focusing on the human epileptic brain as a particularly rich and interesting example.

    === Networking ===
    === Closing ===

  • Data Science Meetup Hamburg

    jimdo

    === Doors open @ 6:30pm ===
    === Networking ===
    === Small meetup intro ===
    === A note from Dr. Ulrich Fricke (Head of Department Information Management and Technology) our partner eventim ===
    === Break & Networking ===

    === Talk 1 ===
    John Langford from Microsoft Research, New York

    Some background information:
    (Vowpal Wabbit [VW] related)
    https://en.wikipedia.org/wiki/Vowpal_Wabbit
    http://hunch.net/~vw/
    https://github.com/JohnLangford/vowpal_wabbit/wiki
    (cool paper)
    https://arxiv.org/pdf/1503.02834.pdf
    (more recent cool paper)
    https://openreview.net/pdf?id=HJNMYceCW

    === Networking ===

    === Talk 2 ===
    One month of GDPR. How is it affecting you? Share your experience session.

    === Networking ===
    === Closing ===

    === Next Meetup ===
    https://www.meetup.com/Hamburg-Data-Science-Meetup/events/jjcpspyxkbpc/
