Past Meetup

DataKRK #26: Online incremental learning on streams


90 people went



IMPORTANT: we kindly ask you to register on Eventbrite (ABB policy: registration is mandatory).
There are still some places available on Eventbrite, so even if you are on the waiting list on Meetup, you can still come as long as you register on Eventbrite.

• What we'll do
We have a special guest this time: Christophe Salperwyck from ABB Kraków, who will describe the stream mining approach:

Statistical learning provides numerous algorithms for building predictive models from past observations. These techniques have proved their ability to deal with large-scale, realistic problems. However, new domains generate more and more data that are visible only once and need to be processed sequentially. These volatile data, known as data streams, come from telecommunication network management, social networks, ad servers, web mining... The challenge is to build new algorithms able to learn under these constraints.

First, the data stream context and its constraints will be presented. The rest of the presentation is in three parts:
1. “Concept drift”: how to deal with distribution changes in streams
2. Stream summaries: how to keep a summary of the past data distribution with a low CPU/memory footprint
3. Online classifiers: how to build online classifiers on data streams (naive Bayes and decision tree classifiers will be presented)
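
To give a flavour of the third part, here is a minimal sketch in Python of an incremental naive Bayes classifier for categorical features, evaluated in a test-then-train fashion (each example is first used for prediction, then for learning). It is only an illustration and not the speaker's code or the MOA API: the class name, the method names `learn_one` / `predict_one`, the Laplace smoothing and the toy stream are all assumptions made for this example.

```python
from collections import defaultdict
import math


class OnlineNaiveBayes:
    """Incremental naive Bayes for categorical features (illustrative sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing              # Laplace smoothing constant (assumed)
        self.n_seen = 0                         # number of examples learned so far
        self.class_counts = defaultdict(int)    # label -> count
        self.feature_counts = defaultdict(int)  # (label, index, value) -> count
        self.values_seen = defaultdict(set)     # index -> distinct values observed

    def learn_one(self, x, y):
        """Update the counts with one labelled example; the example is not stored."""
        self.n_seen += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1
            self.values_seen[i].add(v)

    def predict_one(self, x):
        """Return the label with the highest log-posterior, or None before any learning."""
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_seen)  # log prior
            for i, v in enumerate(x):
                n_values = max(len(self.values_seen[i]), 1)
                num = self.feature_counts.get((label, i, v), 0) + self.smoothing
                den = count + self.smoothing * n_values
                score += math.log(num / den)       # log likelihood of feature i
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def prequential_accuracy(model, stream):
    """Test-then-train: predict on each example first, then learn from it."""
    correct = total = 0
    for x, y in stream:
        pred = model.predict_one(x)
        if pred is not None:
            correct += int(pred == y)
            total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Toy stream (made up for this example): two repeating patterns.
toy_stream = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")] * 20
toy_model = OnlineNaiveBayes()
toy_accuracy = prequential_accuracy(toy_model, toy_stream)
print(f"prequential accuracy on the toy stream: {toy_accuracy:.3f}")
```

The model only keeps counters, so memory stays bounded by the number of distinct (class, feature, value) combinations rather than by the length of the stream. Handling concept drift would need something extra, for example decaying or windowed counts, which this sketch leaves out.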

Supervised learning/classification will be introduced briefly, but it is preferable to have some basic prior knowledge. The most complicated formula will be the naive Bayes one, so no worries about the mathematical part :-).

If there is enough time, I might present MOA (machine learning tool/library on data streams) and how to add new methods inside:
and also 2 projects where Flink and MOA are used together (
- train/test
- online bagging