Finally, here it is! It took a while to arrange, but we're very pleased to announce that the first RAPIDS London meetup will take place on the 22nd of November, hosted by the kind people over at SAGE Publishing.
We're still finalising the speaker details, but for now, please find the schedule for the evening below.
18:30 - Arrive, drinks, pizza
19:00 - Steph Locke from Locke Data
19:25 - Samir Talwar from prodo.ai
19:50 - 10-minute break
20:00 - Mark Coleman from dotscience
20:25 - Final drink, networking
21:00 - Home time
==Steph Locke, Locke Data==
Steph's talk answers the question, "How do we get our R code into production more quickly?"
There’s no point being a data scientist if your work never makes it to production. This session explores a solution for getting your code live, scalable, and easily managed.
The session covers models, dashboards, and other products built in R, and looks at how Docker containers can make managing dependencies a breeze, let your code be hosted anywhere, and keep it working in high-scale systems.
Steph is the founder of a consultancy in the UK. Her talks, blog posts, conferences, and business all have one thing in common – they help people get started with data science. Steph holds the Microsoft MVP award for her community contributions. In her spare time, Steph plays board games with her husband and takes copious pictures of her doggos.
==Samir Talwar, prodo.ai==
We care about our data pipelines, so we built one
At Prodo.AI, we process a lot of data to train our ML models. GPUs are expensive, so we'd rather not do that more often than we have to.
Plz (https://github.com/prodo-ai/plz) is our solution. It manages running experiments so we don't have to. Given some data, it'll spin up an AWS EC2 machine, run some code, capture the output, and shut everything down, saving money on AWS. Minimal cost, minimal fuss.
Our next step is to manage the entire pipeline, funnelling data between experiments and data preparation scripts to completely automate the flow of data. We should always have the latest versions of our models, know how good they are, never use the wrong version by accident, and never run the same job twice. To achieve this boost in productivity, we need to be able to reproduce any given state of our data and our code at any time.
Come find out how it works, the considerations we had to make to accomplish this, and what you can do to get the same benefits.
Samir Talwar is an expert in all programming disciplines, including, but not limited to:
dysfunctional, objectifying, argumentative, illogical, headache-oriented architecture, fragile software, and bullet points.
He has received several awards in the field of software design and development, and has been praised endlessly by critics. One 22-year-old colleague stated he was "hands-down the most bearded developer he'd ever seen". Clients refer to him as "Oh, it's you again". When his superiors were asked to comment, they showered him with compliments such as "Who the hell is Samir?"
==Mark Coleman, dotscience==
Because it is more complex and has far more moving parts, Data Science & AI is where Software Development was in 1999: people are emailing and Slacking notebooks to each other due to a lack of appropriate tooling. There are few CI/CD pipelines, and model health monitoring is scarce. Much of what could be automated is still done by hand, and teams are siloed. This causes problems for both productivity (it's hard to collaborate) and reproducibility (which affects governance and compliance).
Mark presents a proposal for an architecture and a set of open-source tools to solve both the collaboration and the governance problems in Data Science & AI.
Mark is VP of Marketing at dotscience and the marketing chairperson for the Cloud Native Computing Foundation.