Big Data & Real Time Analytics at idealo.de
Welcome to the next round, with two talks on the agenda:
• Talk 1: Kai Wähner, "How to Apply Big Data Analytics and Machine Learning to Real Time Processing"
• Talk 2: Nico Ring (HPI), Lawrence Benson (HPI), Martin Gerlach (idealo), "Moving from DIY clusters to Spark for Data Processing"
We are very pleased to have idealo.de as our host this time!
Your Big Data Beers Team
Abstract Talk II:
The idealo price comparison platform processes large amounts of e-commerce data provided by registered online shops. Over the past 15 years, the data volume has grown steadily, and so has the need for faster processing.
To consolidate the current heterogeneous architecture and improve speed and scalability, idealo and the Hasso Plattner Institute (HPI) are evaluating a new approach in a joint project, using the stateful streaming capabilities of Spark. Tasks that have to be performed on the data include normalization, deduplication, product matching and classification.
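As a rough illustration of two of the tasks named above, normalization and deduplication over a stream with state kept between records, here is a minimal sketch in plain Python. It does not use Spark and is not the approach from the talk; the Offer fields, the normalization rule and the "same shop, same title, same price" duplicate criterion are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Offer:
    # Hypothetical offer record; the real idealo schema is not described in the abstract.
    shop_id: str
    title: str
    price_cents: int


def normalize(offer: Offer) -> Offer:
    # Lowercase the title and collapse whitespace so that trivially
    # different spellings of the same title compare equal.
    clean_title = " ".join(offer.title.lower().split())
    return Offer(offer.shop_id, clean_title, offer.price_cents)


def deduplicate(offers: Iterable[Offer]) -> Iterator[Offer]:
    # State carried across the stream: last price seen per (shop, title).
    last_price: Dict[Tuple[str, str], int] = {}
    for raw in offers:
        offer = normalize(raw)
        key = (offer.shop_id, offer.title)
        if last_price.get(key) == offer.price_cents:
            continue  # unchanged repeat of an offer we already emitted
        last_price[key] = offer.price_cents
        yield offer
```

Calling list(deduplicate(stream)) on an iterable of Offer objects yields each offer once per price change. In a distributed engine the per-key dictionary would be replaced by the engine's managed state, which is the part the abstract refers to as stateful streaming.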
Abstract Talk I:
"Big Data" has gained a lot of momentum recently. Vast amounts of operational data are collected and stored in Hadoop and other platforms, where historical analysis is then conducted. Business Intelligence tools and distributed statistical computing are used to find new patterns in this data and to gain insights and knowledge that can then be leveraged for promotions, up- and cross-sell campaigns, improved customer experience or fraud detection.
One of the key challenges in such environments is to quickly turn these newfound insights and patterns into action while processing operational business data in real time. This is necessary to keep customers happy, increase revenue, optimize margins or prevent fraud when it matters most. "Fast Data" provides a stream processing approach that automates decisions and initiates actions in real time, based on the statistical insights obtained from Big Data platforms.
This session uses real-world use cases and success stories to explain the concepts behind stream processing and its relation to Hadoop, Spark, and other big data platforms. It discusses a flexible solution architecture that combines the speed of fast data decisioning with the intelligence obtained from big data analysis. We will zoom in on implementation patterns, best practices and pitfalls for building a closed-loop system: capturing and storing big data, analyzing historical data to find insights, encoding these insights in statistical and mathematical models and algorithms, and deploying these models to a real-time processing system that turns the insights into action.
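The closed loop described above can be sketched in a few lines of plain Python: a batch step derives a simple model from historical data, and a streaming step applies that model to each incoming event. This is only an illustration under stated assumptions, not the architecture or tooling from the talk. The "model" here is a single cutoff (mean plus k standard deviations), and the function names and the choice of transaction amounts as input are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, Iterator, List, Tuple


def fit_threshold(historical_amounts: List[float], k: float = 3.0) -> float:
    # Batch step: learn a cutoff from historical data.
    # Uses the sample standard deviation, so at least two values are required.
    return mean(historical_amounts) + k * stdev(historical_amounts)


def score_stream(
    amounts: Iterable[float], threshold: float
) -> Iterator[Tuple[float, bool]]:
    # Streaming step: apply the fitted cutoff to each event as it arrives
    # and flag events that exceed it.
    for amount in amounts:
        yield amount, amount > threshold
```

A typical use would be threshold = fit_threshold(history) followed by iterating over score_stream(live_events, threshold). Closing the loop means refitting the threshold periodically as new history accumulates and redeploying it to the streaming step.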
A live demonstration illustrates how a developer can leverage different technologies, frameworks and products to implement such a closed-loop approach, including big data analytics, machine learning, stream processing, model fitness tracking and human oversight. The audience will learn how to choose the right tool for the right job and how to combine them. The live demonstration is built on technologies and frameworks such as Apache Hadoop (HDFS, Hive, HBase, Flume, ZooKeeper), Apache Spark (MLlib, Spark SQL, SparkR), stream processing (Apache Storm, TIBCO StreamBase), and statistical platforms and formats such as the R-based TERR, PMML, H2O’s Sparkling Water and Spark’s MLlib.
Kai Wähner works as Technical Lead at TIBCO. His main areas of expertise are Integration, Big Data, Analytics, SOA, Microservices, BPM, Cloud Computing, Java EE and Enterprise Architecture Management. He speaks at international IT conferences such as JavaOne, ApacheCon and OOP, writes articles for professional journals, and shares his experiences with new technologies on his blog (www.kai-waehner.de/blog). Contact: [masked] or Twitter: @KaiWaehner. More details and references (presentations, articles, blog posts) are available on his website: www.kai-waehner.de