Milan Open Source Data Infrastructure Meetup - June 2023


Details
The sessions will be held in English.
Join us on 21st June 2023 at Talent Garden Calabiana for an evening full of interesting talks around GitOps & Apache Kafka and scaling Machine Learning pipelines.
The evening will feature a series of talks, followed by food* and socialising.
*Note: This is an alcohol-free event.
Program:
6-6.30PM - Doors open & drinks
6.30PM - Scaling image analysis to infinity and beyond by Matteo Madeddu, Platform Engineer and AWS Community Builder
Most Machine Learning solutions are composed of different datasets, technologies and processes glued together. Taking them to production doesn't only mean feeding them real data, but also forecasting load, evaluating performance, and creating a supportive and robust pipeline capable of providing a consistent level of service.
In this talk, we share the details of the experience of building an infrastructure to scale image analysis, which consists of static analysis and inference performed with pre-trained neural models. We'll see how the architecture satisfies two main constraints: an inherent limitation of the internal system that provides the assets, and the need for near-zero costs when there are no jobs to run. Then, we'll look at how the infrastructure's flexibility lets the analysis process be sped up to meet business requirements, depending on the efficiency of the machines used for the runs.
We'll see in action TensorFlow models, OpenCV, Apache Spark for the metadata processing and a Hadoop-based EMR cluster, all glued together by AWS ECS services (in EC2 and Fargate modes), SQS, S3, and Docker. If you're in the process of productionizing machine learning models, this session will showcase a success story and what's behind it.
7.00PM - Apache Kafka and GitOps: the best of the two worlds by Natale Vinto, Developer Advocate Lead at Red Hat
In the world of big data, Apache Kafka has become the go-to platform for building real-time data pipelines and streaming applications. However, managing a Kafka cluster in a multi-tenant environment can be complex, especially when different users require access to specific topics and resources.
That's where Helm, Strimzi, and ArgoCD come in. These three powerful tools can be combined to create and manage a multi-tenant Kafka cluster with a GitOps approach, enabling you to manage Kafka resources through declarative configuration files stored in a Git repository.
7.30-9PM - Food & Socialising
Note: Recording equipment will be present.

