Large scale weak supervision & machine translation (tentative)

Hosted By
Przemek M. and Natalia R.

Details

Schedule:

18:00 Networking
18:30 Large Scale Weak Supervision with Snorkel and Apache Beam by Suneel Marthi
19:30 Break
19:45 Machine Translation (tentative)

Note: "Scalable recommendations in a hybrid environment" by Mikolaj had to be postponed at the last minute due to illness. We will probably have another talk instead, on machine translation, but this will be confirmed just before the event.

  1. Large Scale Weak Supervision with Snorkel and Apache Beam

The advent of Deep Learning models has led to a massive growth of real-world machine learning. These models rely on massive hand-labeled training datasets, and producing those datasets is a bottleneck in developing and modifying machine learning models.

Most large-scale Machine Learning systems today, such as Google’s DryBell, use some form of Weak Supervision to construct lower-quality, large-scale training datasets that can be used to continuously retrain and deploy models in a real-world scenario.

The challenge with continuous retraining is that one needs to maintain prior state (e.g., the labeling functions in the case of Weak Supervision, or a pre-trained model like BERT or Word2Vec for Transfer Learning) that is shared across multiple streams while the model is continuously updated. Apache Beam’s Stateful Stream processing capabilities are a perfect match for this, and they also support scalable Weak Supervision.
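As a rough illustration of the weak supervision idea described above, here is a minimal sketch in plain Python. It is not taken from the talk: the labeling functions, the label constants and the `weak_label` helper are all hypothetical, and a simple majority vote stands in for the learned label model that Snorkel itself uses to combine votes.

```python
from collections import Counter

# Hypothetical label constants for a toy spam/ham task (assumption, not from the talk).
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the text contains a URL, otherwise abstain."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the text mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text):
    """Vote HAM for short messages that contain a greeting, otherwise abstain."""
    is_short = len(text.split()) <= 4
    return HAM if is_short and "hello" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]


def weak_label(text):
    """Combine the noisy votes by majority; abstain on ties or when nobody votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for message in ["Claim your prize at http://example.com", "hello there"]:
        print(message, "->", weak_label(message))
```

Each labeling function is cheap to write and individually unreliable; the training set comes from applying many of them to unlabeled data and aggregating their votes. In a streaming setting, the set of labeling functions is the kind of prior state the abstract refers to.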

The audience will come away with a better understanding of how Weak Supervision with Apache Beam’s stateful stream processing can be used to accelerate the labeling of training data and to train and update machine learning models in real time.

Bio:

Suneel is a Member of the Apache Software Foundation and is a Committer and PMC member on Apache Mahout, Apache OpenNLP and Apache Streams. He presently works as a Principal Technologist – AI/ML at Amazon Web Services. He has previously presented at Flink Forward, Hadoop Summit Europe, Berlin Buzzwords, Machine Learning Conference and Apache Big Data. He is based in Dulles, Virginia, in the Washington DC Metro area.

  2. Machine translation (tentative)

DataKRK
Community Hub Kraków
Podwale 3 · Kraków