Jared Ledvina is a Cloud Operations Engineer at Palantir and has spent the last year helping grow their infrastructure from 12,000 to over 20,000 EC2 instances. In this talk, he'll discuss managing over 50 ephemeral production Kubernetes clusters, future plans to support Azure, and how Palantir engineers could have avoided multiple production outages.
If possible, please check out the following beforehand:
Introducing Rubix: Kubernetes at Palantir - https://medium.com/palantir/introducing-rubix-kubernetes-at-palantir-ab0ce16ea42e
Spark scheduling in Kubernetes - https://medium.com/palantir/spark-scheduling-in-kubernetes-4976333235f3
If there are any topics or questions you'd like answered, please reach out to @jaredl on the MadeInA2 Slack! (#orchestructure channel!)
You can also catch him on Twitter: https://twitter.com/geekatcomputers and GitHub: https://github.com/jaredledvina