PyMC at the Bayesian Mixer


The PyMC team is meeting in London, and we are using this opportunity to welcome Thomas Wiecki and Colin Carroll for two talks, preceded by a brief introduction from Chris Fonnesbeck.

Statistics and Machine Learning: Don’t Mind the Gap

Thomas Wiecki, VP of Data Science at Quantopian and core PyMC contributor

Abstract: What are the differences between machine learning and statistics? Answers to this question vary as widely as the array of tools employed by the two disciplines. Although the two share more similarities than differences, their cultures and roots, as well as the language they use, are quite different. More recently, however, we have seen a healthy cross-pollination in which each field has started to adopt ideas from the other. In this talk, we will look at the ideas the two disciplines have developed and identify those that have already crossed the chasm and those that are still grounded in one of the two fields. Some examples of such concepts are informative priors, neural networks, uncertainty, regularization, and hierarchical models. We will see how we can combine these various ideas to give us a rich toolbox for solving wide-ranging data science problems. For this recombination to be possible, we need a highly versatile and powerful framework. As I will show, probabilistic programming using PyMC3 allows us to do both machine learning and statistics, and to blend freely between them, taking the best ideas for the problem at hand. Specific examples include hierarchical Bayesian neural networks with informed priors to achieve higher accuracy, and uncertainty around predictions to make better decisions.

Tidy and beautiful: Visualizing Bayesian models with xarray and ArviZ

Colin Carroll, data scientist at Freebird Inc and core PyMC contributor

Abstract: ArviZ is a new library for visualization and criticism of Bayesian models. We will show how ArviZ uses xarray to provide an intuitive way to store and query the high-dimensional objects that Bayesian inference produces. This will be a beautiful visual tour of the Python probabilistic programming landscape (including PyMC3, PyStan, CmdStan, emcee, Pyro, and TensorFlow Probability), using ArviZ to visualize parameters and diagnose problems with sampling.