• Data+Analytics meetup

    Grand Parade part of William Hill

    Let's meet at the Data+Analytics meetup! Don't forget to grab your ticket here -> https://dataplusanalytics.evenea.pl/

    AGENDA:

    💢 6:00 PM - 6:15 PM Start & Networking

    💢 6:15 PM - 6:30 PM Intro by Grand Parade Data Team

    💢 6:30 PM - 6:50 PM Keeping customers safe - our responsibility as an operator, and our challenges with data - by Wayne Field
    Abstract: As the popularity of online betting and gaming increases, especially with a younger audience, it's massively important that we use our data responsibly to identify and interact with those customers we think may be at risk of harm.

    💢 7:00 PM - 7:20 PM Applications of Data Science in eCommerce from a gaming perspective - by Piotr Smolinski
    Abstract: We have been enjoying the digital transformation for the last few years and, as crazy as it seems, we still haven't reached the top of the "AI pyramid"! This talk will review the ways in which William Hill applies machine learning to various areas of the business, from marketing, through website personalization, to various product enhancements. We will touch on what we have already delivered, discuss the roadmap and explain the key role our new Smart Data Platform plays in turning William Hill into a true "AI leader" in online betting.

    💢 7:30 PM - 7:50 PM Design best practices for MSSQL based warehouse. Challenges and lessons learned from the last 5 years - by Alan Christie
    Abstract: Sound database design and implementation are the cornerstone of a successful data platform. Several factors centred on the MSSQL stack play a key role in enabling this.

    💢 7:50 PM - 8:00 PM Quiz

    💢 8:00 PM - 9:00 PM Networking

    Registration kicks off on March 29th, 10:00 AM. Stay tuned for more info by following the event page. Save the date and see you at Kotlarska 11! 🙌

  • Streaming topic model with Apache Flink + Using IoT Analytics To Save The Planet

    Title: Streaming topic model with Apache Flink + Using IoT Analytics To Save The Planet
    When: Fri, Mar 1st 2019, 6PM-9PM
    Where: Grand Parade part of William Hill Office, Kotlarska 11 (Main entrance is from Kotlarska street, alongside the main route)

    Agenda:
    6:00 PM - Networking
    6:30 PM - DataKRK update and intro by Grand Parade
    6:45 PM - Streaming topic model training and inference with Apache Flink
    7:30 PM - Short break
    7:45 PM - Using IoT Analytics To Save The Planet
    8:30 PM - Q&A, Quiz, Networking

    # First talk: Streaming topic model training and inference with Apache Flink by Suneel Marthi and Jörn Kottmann

    ## Abstract
    How to use stateful stream processing and Flink’s Dynamic processing capabilities to continuously train topic models from unlabelled text and use such models to extract topics from the data itself.

    Analyzing streams of text data to extract topics is an important task for getting useful insights to be leveraged in subsequent workflows. In this talk, we discuss a new approach to streaming topic modeling and also look at other implementations, like Online LDA, that leverage Apache Flink stateful streaming. We illustrate how to use Flink’s Dynamic processing capabilities to continuously train topic models from unlabelled text and use such models to extract topics from the data itself. Such topic models will be built leveraging distributed representations of words and documents.

    ## Bio:
    Suneel is a member of the Apache Software Foundation and a committer and PMC member on Apache Mahout, Apache OpenNLP and Apache Stream. He presently works as a Principal Technologist – AI/ML at Amazon Web Services. He has previously presented at Flink Forward, Hadoop Summit Europe, Berlin Buzzwords, Machine Learning Conference and Apache Big Data. He’s based out of Dulles, Virginia, in the Washington DC Metro area.

    Jörn is a member of the Apache Software Foundation. He has contributed to Apache OpenNLP for 13 years and is PMC Chair and committer of the project. In his day jobs he has used OpenNLP to process large document collections and streams, often in combination with Apache UIMA, where he is a PMC member and committer as well.

    # Second talk: Using IoT Analytics To Save The Planet :)

    ## Abstract:
    It's easy and cheap to find patterns in data using modern analytics tools and techniques. The bigger question is whether analytics can be used to change the "offline" behaviour of people, hopefully for the better. It was observed that drivers with the teliaSense product installed in their car who looked at the "Eco-Drive" feature in the corresponding mobile app more often used their car in a more eco-friendly manner, idling for a shorter duration per km driven. Subsequently, my team ran an experiment in which we exposed users to the Eco-Drive feature by making it more prominent, and this "caused" a subset of users to start idling less per km driven compared to both the wider population of drivers and their own previous idle time.

    This interdisciplinary project merges behavioral psychology and economics, statistics, and IoT analytics. The results carry implications for tackling a wide array of challenges for us as a business, such as accident prevention and lowering of carbon emissions from transport. Furthermore, the wider learnings from such an experiment can be abstracted and applied to a number of domains, such as healthcare, finance and public services.

    ## Bio:
    Aru leads the analytics team at Tantalum, a connected cars platform based in Stockholm. He holds undergrad degrees in Computer Science and Political Science from Grinnell College (USA), and went to graduate school in Economics at The Graduate Institute of International Studies (Switzerland). Prior to Tantalum, he worked in the statistics/analytics teams at the United Nations, Yahoo and Truecaller. Outside work, he enjoys reading about psychology/tech/politics, watching cricket, cooking, playing tennis and running.

  • Ticket Raffle - Big Data Technology Warsaw Summit 2019

    Dear DataKRKers,

    As you already know, we are helping our friends from GetInData with Big Data Technology Warsaw Summit 2019 - the go-to Big Data event in Poland and Central Europe. We are running a ticket raffle in which one free ticket will be handed to a lucky winner. The rules are as simple as they get - just join this event and on Thursday we'll make a draw. For the rest of us, the promo code is still available until February 17th.

    Please find the message from the organisers below:

    --

    You don't have to travel to London, Amsterdam or Madrid. This is a once-a-year opportunity: a large international conference on Big Data technologies takes place on February 27th in Warsaw: Big Data Technology Warsaw Summit 2019! Most of the speakers are coming from abroad, and all of them are outstanding professionals: engineers, architects, developers. They will share experience gained on exceptionally interesting projects at companies such as Netflix, Slack, Roche, Booking.com, Twitter, ING, Philip Morris International, Data Artisans, Oath, Klarna, AWS, Google, Zalando, Adform, Tink, XCaliber, Trivadis and others. https://bigdatatechwarsaw.eu/speakers

    The conference is as practical and technology-focused as possible. Big Data Technology Warsaw Summit brings together more than 500 participants, all of them big data professionals. It's the place to be!

    If you hurry, you can also take part in an additional day of workshops (the last free places are available), choosing from:
    - Hadoop Ecosystem Basics
    - From small data in Python to big data model in Apache Spark
    - Big Data on Kubernetes
    - Big Data on Google Cloud

    Join the conference participants and take advantage of the registration promotion until February 17th, 2019! With the promo code DataKRK you get an additional 10% discount! https://bigdatatechwarsaw.eu/registration/

    See you there!

  • Cutting edge improv. in sentiment analysis and Challenges of Productizing NLP

    Title: Cutting edge improvements in sentiment analysis and The Challenges of Productizing Natural Language Processing
    When: Wed, 30 Jan 2018. 6PM - 9PM
    Where: Qualtrics Office, Pawia 9, HighFive Two, 4th floor (please wait at the reception downstairs for a Qualtrics employee)

    Agenda:
    6:00 PM - 6:30 PM - Networking
    6:30 PM - 7:15 PM - Cutting edge improvements in sentiment analysis
    7:15 PM - 7:30 PM - Short break
    7:30 PM - 8:15 PM - The Challenges of Productizing Natural Language Processing
    8:15 PM - 9:00 PM - Q&A, Networking
    * Snacks and drinks will be served throughout the event!

    Speaker: Jamie Morningstar, Qualtrics
    Abstract: Natural Language Processing is hard work - and making a product that’s reliable, multi-purpose, accurate, and intelligible enough to be used by tens of thousands of self-service clients across millions of datasets? That’s the daunting challenge that the Qualtrics Text iQ team took on three years ago. Come learn from our mistakes and successes in productizing a commercial text analysis tool. We’ll talk about some lucky breaks, good assumptions, and dramatic missteps we’ve made along the way. We’ll discuss product definition, build/buy analysis, data platform constraints, tuning challenges, algorithm choices, and more.
    Bio: Jamie is the engineering manager over all NLP and ML teams at Qualtrics. She is based in Utah, USA and has 15 years of experience in software development, engineering management, and product management across several different cloud-based products. Jamie founded Qualtrics Text iQ three years ago and has grown the product to serve over 35k users analyzing millions of documents monthly.

    Speaker: Felix Fang, Qualtrics
    Abstract: Sentiment analysis is a common NLP task in text analytics. The results are useful for many downstream tasks such as customer review analysis and stock price predictions. In this talk we will discuss how we improve sentiment analysis at Qualtrics using state-of-the-art techniques such as transfer learning with pretrained deep language models and question-answer pair training on user feedback data.
    Bio: Felix is a machine learning engineer and team lead of the ML/AI lab at Qualtrics. He works at the Seattle office and primarily focuses on building and productionizing various ML models for NLP tasks such as sentiment analysis, aspect extraction and text summarization. Felix is an ML enthusiast and organizes events such as ML reading groups to review state-of-the-art work in academia and industry.

  • Tensorflow-accelerated Genetic Algorithms & StreamSets

    Aon Office (Dragon 1 room)

    Speaker: Andrew Morgan, Aon
    Abstract: Andrew will introduce ideas around Tensorflow accelerated Genetic Algorithms, reviewing an introductory use case for classification and regression using Karoo_GP, a tool for GPU accelerated Evolutionary Algorithms in Python. He’ll explain the problem, the ideas and methods investigated, the theory behind the tools, and give a working demonstration. An informal review of the comparative results against other popular algorithms is included, as well as an explanation of how easy it is to deploy the Karoo_GP trained models to Apache Spark and other SQL enabled technologies.
    Bio: Andrew is an experienced big-data scientist and platform engineer, and author of Mastering Spark for Data Science. He works in London, Dublin and Krakow as Head of Data Services for Aon, and directs a large data engineering team.

    ---------------------------------------------------------------------------------------------

    Speaker: Cristian Varela, Aon
    Abstract: Your boss needs a new system with hundreds of high-performing data pipelines processing real-time data from all sorts of heterogeneous sources. And you’ve guessed right: you’re charged with developing it from scratch, with very little budget and a small team of data engineers. If that wasn’t enough, you also have a very tight deadline. Impossible, I hear? Fret not! StreamSets have open sourced their Data Collector, which enables you to develop and continuously run streaming pipelines in minutes (forget about scheduling nightmares) as well as monitor their throughput and performance in a single integrated interface. In this talk we’ll introduce StreamSets’ open source Data Collector and develop a pipeline to consume and analyse real-time streaming data while getting familiar with the product's features and capabilities. We’ll also discuss common patterns and some tips & tricks, introduce the concept of a MicroService pipeline and create a reusable data service.
    Bio: Cristian is a Snr. Data Architect in the Aon Centre for Innovation and Analytics in Dublin with over 20 years under his belt working in technical roles and data related initiatives. His experience spans many industries including Pharmaceutical, Government, Gambling / Gaming and more recently Insurance & FinTech, where he has spent the last 3 years designing Data Intensive Applications and assisting with the implementation of a Hadoop based Enterprise Wide Analytics Platform. He has a personal interest in everything Python and Home Automation systems. In his own time he enjoys getting the soldering iron out and playing with circuits and Arduinos when he's not training for a triathlon.

  • [Krakow Apache Kafka] Stream processing and build streaming data pipelines

    Kraków Apache Kafka Meetup by Confluent is organising an event supported by DataKRK (main page: https://www.meetup.com/pl-PL/Krakow-Kafka/events/252830549/ ). As it is Big Data related, we are more than happy to share it on DataKRK too. Below is all the information provided by the organisers:

    --

    Join us for an Apache Kafka meetup on July 25th from 6pm, hosted by VirtusLab at The Stage in Krakow. The address, agenda and speaker information can be found below. See you there!

    Agenda:
    6:00pm: Doors open
    6:00pm - 6:30pm: Drinks and Networking
    6:30pm - 7:15pm: Robin Moffatt, Confluent
    7:15pm - 7:45pm: Additional Q&A & Networking

    Speaker: Robin Moffatt, Confluent
    Abstract: Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again! Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform, providing low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMS, NoSQL, and object stores is simple with Kafka Connect, which is part of Apache Kafka. KSQL is the open-source SQL streaming engine for Apache Kafka, and makes it possible to build stream processing applications at scale, written using a familiar SQL interface.
    In this talk we’ll explain the architectural reasoning for Apache Kafka and the benefits of real-time integration, and we’ll build a streaming data pipeline using nothing but our bare hands, Kafka Connect, and KSQL. Gasp as we filter events in real time! Be amazed at how we can enrich streams of data with data from RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!
    Bio: Robin is a Developer Advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle ACE Director and Developer Champion. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at http://cnfl.io/rmoff and http://rmoff.net/ (and previously http://ritt.md/rmoff ) and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.

    --------

    Don't forget to join our Community Slack Team (https://slackpass.io/confluentcommunity)!
    If you would like to speak at or host our next event, please let us know! [masked]
    NOTE: We are unable to cater for any attendees under the age of 18. Please do not sign up for this event if you are under 18.

  • Data Science Tools of Trade

    Pauza In Garden

    1) Data Science at PMI - The Tools of The Trade
    Abstract: Data Science is not a one man show. It is a team effort that requires every team member to master the tools of the trade. This is extremely important for effectively putting data science to work in a global organization. In this talk we would like to share the best practices and tools for starting, developing and shipping data science products inside PMI. They are currently in use by 30+ data scientists across four different locations, where PMI's data science labs were established in 2017. If you're interested in how Python, Jupyter notebooks, Docker, AWS, Hadoop ecosystem, Artifactory, Jenkins, Atlassian suite, etc. are set up to support our collaborative work on building predictive models, this talk is for you.
    About the presenter: Maciej is a Best Practices Ambassador and Data Scientist at Philip Morris International, passionate about Machine Learning and Big Data. In his free time he is a motorcyclist, a pilot and a sailor. This time he would like to present to you The Tools of The Trade in Philip Morris International.

    2) You! Just ping us with the subject of a talk or lightning talk and we'll be more than happy to help.

    See you there!

  • Data.Sphere Krakow 2018 - First Edition

    Opera Krakowska

    Join us at the FIRST DATA.SPHERE conference in KRAKÓW!
    15-17 April, 2018; Opera Krakowska (workshop day + 2x speaker days)
    http://data.sphere.it

    DataSphere is the first edition of a Krakow conference devoted to applications of Data and the technologies, concepts, and trends that offer meaningful insights into data, whether large or small. From technical details to concrete business use cases, no fluff.
    NOTE: tickets needed (see below)
    -------------------------
    Where and When
    15-17 April 2018, Krakow, Opera Krakowska
    -------------------------
    Why attend
    - get in touch with people behind data application use cases
    - get up to date with the latest data tooling zoo
    - get touched by what AI can do for you
    -------------------------
    Whom you’ll be able to meet
    A diverse range of speakers coming from:
    - well established tech companies (e.g. IBM, Redhat, GE, ...),
    - research bodies (e.g. CERN, ...),
    - startup/r&d scene (e.g. 2040.io, 9livesdata, AltoCloud, AirHelp, Craftinity, ...)
    - and academia
    -------------------------
    For a special 20% datakrk discount, use the code: [masked]20
    https://www.eventbrite.co.uk/e/datasphere-2018-tickets-40190269177

    See you there!

  • Data.Sphere Workshop: Sales forecasting with Keras and Tensorflow

    (NOTE: You must register via the eventbrite link below)

    BACKGROUND: For those of you who didn't have a chance to attend the workshop at one of the previous dataKrk events, I will be hosting it as part of the Data.Sphere conference workshop day in our offices at Podwale 3.

    CONTENT: This beginner-level workshop will guide you through implementing a sales forecasting neural network model based on an openly available dataset. We will go through some basic neural network theory and apply a systematic model improvement approach to get from a simple model to a more complex one, trying out various tricks along the way.

    NOTE: You will need to be registered for the conference (https://data.sphere.it) to attend, but don't worry - we have some good discounts for dataKrk members - just let us know before registering.

    REGISTRATION: https://salesforecastingvl.eventbrite.co.uk/

  • DataSphere ticket raffle!

    Hey DataKRKers,

    We have been given 3 free tickets by the DataSphere conference organizers and we're holding a raffle to give them out. The conference takes place in just 3 weeks, Apr 15-17th, in Kraków, and it's a great opportunity for anyone working with data to listen to some great talks in the field.

    The raffle rules are really simple:
    1) To show that you are interested, just join this meeting
    2) We are going to draw the lucky three participants at noon on Thursday, Mar 29th
    3) That's it!

    The promo code for the rest of us is still available, just use `sphereit-datakrk` to join DataSphere right now.

    Thank you and see you there!