As usual, we will have two talks. This time they are both related to the Hadoop ecosystem. We will start with Christoph Wille talking about Big Data Applications running on Azure HDInsight, the Big Data PaaS offering by Microsoft that relies solely on open-source technologies. Then Christoph Körner will give an Introduction to Apache Hive, one of the most widely used SQL engines for Hadoop. Here are the detailed abstracts:
Azure HDInsight
When looking at Google Cloud, AWS or Azure you will find a wide array of PaaS big data services - from storage to processing. However, what if you want to stick with tried-and-true Hadoop (and there are very good reasons for that)? Well, you could roll your own Hadoop cluster in the cloud of your choice using IaaS - or you could turn to HDInsight (https://azure.microsoft.com/en-us/services/hdinsight/), which offers you a fully managed Hadoop experience based on the Hortonworks distribution. Spark, Storm, Kafka - you name it, you got it.
About the speaker:
Christoph Wille is an MVP for Azure and publishes the code of his projects on https://github.com/christophwille (includes Windows as well as Azure projects). He is an independent consultant, supporting companies in everything Windows and Web (he was an ASP.NET MVP for more than ten years). Aside from working on his own projects, he is also involved in other OSS projects, such as ILSpy and Refactoring Essentials for Visual Studio. You can reach him via https://twitter.com/WilleChristoph/.
Distributed Machine Learning on Apache Hive
Apache Hive with Tez execution provides a fast, scalable and reliable infrastructure for processing large amounts of batch data on a distributed Hadoop cluster. However, scaling traditional data mining tasks, training large machine learning models and deploying the trained models all in the same environment is not a simple task. Christoph will present Hivemall, an Apache (incubating) project for distributed ML on Hadoop that handles preprocessing, distributed training and deployment directly within Hive (and Spark).
About the speaker:
Christoph Körner is an MSc student in Computer Science (Visual Computing) and Big Data Tech Lead at T-Mobile Austria.
18:45 - Get together with a few drinks
19:00 - "Azure HDInsight" (Christoph Wille)
19:45 - "Distributed Machine Learning on Apache Hive" (Christoph Körner)
20:30 - "Stuff we recently found and we think is really cool" (The Organizers)
20:40 - Food, more drinks & chatting.
Looking forward to seeing you at the Novomatic Forum!
This event is sponsored by NOVOMATIC
NOVOMATIC is the leading provider of gaming technology and casino equipment in Europe. As such, we are constantly applying cutting-edge technologies to develop the most innovative solutions in the industry. Sponsoring the Modern Data Science Tools Meetup is part of our effort to contribute to the creation of a strong Data Science and Machine Learning community in and around Vienna.