This first meetup will deal with the end-to-end view of the pipeline. The three speakers will describe the particularities of their stacks, the challenges they faced, and the technologies they used to make their products usable in the "real world".
Topics covered will be versioning, automation of production and testing, and multi-platform technologies.
Speakers will be:
Laura Calem (Heuritech)
Title: How to deploy a new model in production without overhead
Abstract: Putting a new model in production can sometimes be a really time-consuming task that nobody wants to do. And most likely this is a process you will need to repeat a lot. In a fast-paced industry where new tech and new data are arriving in a constant flow, being able to put a new model in production fast is critical. With this talk you will get a glimpse of how this "dev to prod" transition is done in a real-world environment, and which design choices can be made so that the overhead of deployment is reduced to a minimum.
Renaud Allioux (Earthcube)
Title: Traceability, reproducibility and scalability with hundreds of AI services
Abstract: At Earthcube our job is to monitor sensitive sites around the world using AI and satellite imagery. In order to reach the best performance, each site has its own unique combination of pre/postprocessing, segmenter, detector or classifier, trained on a specific dataset and hyperparameter combination. The result is an exponential inflation in the number of services as we increase the number of sites monitored.
In this talk we will describe how we designed a simple but efficient micro-service architecture, scalable to thousands of sites, while ensuring complete traceability and reproducibility thanks to extensive use of Docker, versioning and cloud processing.
Vincent Delaitre (Deepomatic)
Title: Running deep learning models on heterogeneous hardware
Abstract: Running different frameworks like Caffe or TensorFlow on heterogeneous hardware, ranging from FPGAs to GPUs, can quickly become a nightmare. Meet ONNX and runtimes. ONNX is a model exchange format supported by most deep learning frameworks and understood by hardware vendors' runtimes, which are designed to speed up inference on production hardware, like TensorRT (NVIDIA) or OpenVINO (Intel). After presenting a typical framework-agnostic network deployment architecture, we will describe how runtimes work and finish with a glimpse of the possibilities that the related library, Facebook's "Tensor Comprehensions", will offer.
Of course, food and drinks will be provided!