Orange in the Cloud - the Backend Story [Miha Stajdohar, Genialis]

Abstract of the talk:
Orange is a machine learning toolkit developed by researchers at FRI. Genialis was founded 5 years ago with a simple mission: to develop and sell an online version of Orange. Genialis has since pivoted from a general data analytics company to a more focused mission: helping life scientists untangle the mysteries of genetic code. But the core of our web platform has stayed the same. We have built an interactive workflow engine called Resolwe that is agnostic of the science it facilitates.

We'll kick off the talk with a deep dive into the workflow engine Resolwe (spoiler alert: it's open source, just pip install resolwe) and its workflow syntax: how to define the workflow steps with the corresponding inputs, outputs, and the algorithm that transforms inputs into outputs. We will then review the architecture details. Resolwe is an app for the Django web framework, so it's pretty straightforward to extend your existing Django projects with workflow capabilities. On the server it requires PostgreSQL, Redis, and a shared file system that is accessible from the server and worker nodes. We can plug in different workload managers like Celery, Slurm, or AWS Batch to run workflow steps as batch computing jobs. Resolwe can be extended with object-level permissions, Elasticsearch for fast data retrieval, monitoring and backup solutions, and other services required for a production deployment. You will hear about the quirks and unexpected issues we had to overcome, that is, everything that can go wrong when developing a complex system (one that requires a wide range of expertise) from scratch. I will conclude with the lessons learned from building a tech startup in Slovenia.
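To give a flavour of the idea before the talk, here is a minimal Python sketch of a workflow step (declared inputs, declared outputs, and an algorithm) together with a pluggable executor. It is an illustration only and does not use Resolwe's actual syntax or API: the names Step, LocalExecutor, and run_workflow are hypothetical, and the in-process executor merely stands in for a real workload manager.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One workflow step: declared inputs, declared outputs, and the algorithm."""
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[..., Dict[str, Any]]


class LocalExecutor:
    """Hypothetical executor that runs a step in-process.

    A batch backend would expose the same submit() method but hand the
    step to an external workload manager instead.
    """

    def submit(self, step: Step, values: Dict[str, Any]) -> Dict[str, Any]:
        missing = [name for name in step.inputs if name not in values]
        if missing:
            raise ValueError(f"{step.name}: missing inputs {missing}")
        result = step.run(**{name: values[name] for name in step.inputs})
        absent = [name for name in step.outputs if name not in result]
        if absent:
            raise ValueError(f"{step.name}: missing outputs {absent}")
        # Keep only the outputs the step declared.
        return {name: result[name] for name in step.outputs}


def run_workflow(steps: List[Step], initial: Dict[str, Any], executor) -> Dict[str, Any]:
    """Run steps in order, feeding each step's outputs to the steps after it."""
    values = dict(initial)
    for step in steps:
        values.update(executor.submit(step, values))
    return values


# Two toy steps: split a text into words, then count them.
split_step = Step(
    name="split",
    inputs=["text"],
    outputs=["words"],
    run=lambda text: {"words": text.split()},
)
count_step = Step(
    name="count",
    inputs=["words"],
    outputs=["n_words"],
    run=lambda words: {"n_words": len(words)},
)
```

Calling run_workflow([split_step, count_step], {"text": "..."}, LocalExecutor()) returns a dictionary holding the initial value plus every declared output. Keeping the executor behind a single submit() method is what would let a different backend be swapped in without touching the step definitions.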

About the speaker:
Miha is the architect of Genialis’ software. He manages the engineering department and is dedicated to bridging the gap between life scientists and engineers. He worked for numerous IT companies before specializing in data science. His doctoral research stands at the crossroads of machine learning, network analysis, and interactive visualization. A data scientist by training but an engineer at heart, Miha is passionate about building production-ready software for the masses.