Floor 34, not floor 26.
This is an introduction to reinforcement learning, intended as a starting point for readers who already have a basic machine learning background and are comfortable with probability and statistics.
No background in AI or TensorFlow is expected for the first part.
However, you do need to know probability and statistics; in particular, some exposure to Bayesian inference would be advantageous.
Session #1: Multi-Armed Bandits Using Thompson Sampling, Shlomo Kashani, Chief Data Scientist.
Shlomo will introduce us to multi-armed bandit (MAB) problems and present a Bayesian approach to solving them. We will discuss how to efficiently identify the best option among several choices, each with a fixed but unknown success rate.
• Intro to MAB
• The Binomial Distribution
• The Beta Distribution
• An intro to Bayesian statistics (prior, evidence, posterior, posterior predictive distribution)
• The Beta-Binomial Distribution
• Thompson Sampling
• Bandits with Bernoulli rewards
• Python Demo
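As a taste of what the demo covers, here is a minimal sketch (not the session's actual code) of Thompson Sampling for Bernoulli bandits: each arm keeps a Beta posterior over its unknown success rate, and every round we sample from each posterior and pull the arm with the highest sample. The arm count, rates, and round count below are illustrative assumptions.

```python
import random

def thompson_sampling(true_rates, n_rounds=10000, seed=0):
    """Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior;
    every round we draw one sample per arm and pull the argmax.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # One draw from each arm's Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the (unknown) true rate.
        reward = 1 if rng.random() < true_rates[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because sampling from the posterior naturally balances exploration and exploitation, pulls concentrate on the best arm as evidence accumulates.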
Session #2: RL and the Kullback–Leibler Divergence from the Eyes of a Physicist, Natan Katz, Senior ML Architect, NICE.
Following a very successful session on Variational Inference at the previous meeting, Natan will introduce the KL divergence and prove its key properties, explaining its theoretical roots in physics and how they relate to machine learning.
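As a small concrete warm-up for the quantity Natan will discuss (the session's treatment is more general), here is the KL divergence between two Bernoulli distributions; note that it is non-negative, zero only when the distributions match, and not symmetric:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D(P || Q) between Bernoulli(p) and Bernoulli(q), in nats.

    D(P || Q) = p*log(p/q) + (1-p)*log((1-p)/(1-q)),
    defined for 0 < p < 1 and 0 < q < 1.
    """
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```

For example, `kl_bernoulli(0.1, 0.5)` and `kl_bernoulli(0.5, 0.1)` differ, which is why the direction of the divergence matters so much in variational inference.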
Session #3: RL Theory for Data Scientists, Nir Ben Zvi.
This talk explains what Markov processes are, continues to Markov decision processes and the Bellman equations, and then covers Q-learning. There will also be a general discussion of RL techniques. Gradient-based RL, namely the vanilla Policy Gradients method, will be explained in some detail.
An example from the Recurrent Models of Visual Attention paper will also be presented.
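For readers who want to experiment before the talk, here is a minimal tabular Q-learning sketch on a toy chain MDP (an illustration only, not material from the session; the environment and hyperparameters are assumptions). The update inside the loop is the standard Bellman-backup form Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a small deterministic chain MDP.

    States 0..n_states-1; actions 0 (left) and 1 (right). Reaching the
    rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy action selection (ties broken toward "right").
            if rng.random() < epsilon:
                a = rng.choice([0, 1])
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Bellman update toward the bootstrapped target.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After training, every non-terminal state should prefer moving right, and the value of "right" decays geometrically (by γ per step) with distance from the goal.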
Nir is a research scientist at Amazon working on computer vision. Before that, he was an MSc student at the Hebrew University of Jerusalem, focusing on style transfer using classical methods.
Google Analytics' discussion of their implementation of Bayesian multi-armed bandits:
"Analysis of Thompson Sampling for the Multi-armed Bandit Problem", from Microsoft Research India, which establishes logarithmic cumulative regret: