What we’re about
Are you an avid data scientist, always keen on trying new things? Do you use data analysis in your daily work and want to expand your toolkit? Are you looking for a guided, hands-on introduction instead of learning through a random forest of online tutorials?
MiraiLabs might be the answer for you!
At Mirai Solutions we truly believe that open source contributions and knowledge sharing are important tasks for data scientists to take on. MiraiLabs is one small contribution towards a better future ("mirai" is Japanese for "future").
Since we started, we have worked on many projects, given trainings and built extensive data science expertise. None of it would have been possible without open source software and online knowledge sharing. Having turned 10 years old, we think it is the right moment to give something back.
MiraiLabs is a series of data science workshops aimed at professionals.
It will benefit people who work with data or models on a daily basis and would like to expand or strengthen their skill set.
In each workshop, an experienced practitioner will cover in some detail a topic relevant to any data scientist's all-purpose toolkit. Materials will be made available and shared online.
To make the workshops effective and foster interaction among participants, the events will have a limited number of attendees.
Upcoming events (1)
Scale Your Analytics: Leveraging Apache Spark in Python and R
Impact Hub Zurich - Viadukt Bogen D, Zürich
Audience
Are you a data science practitioner who primarily uses Python and/or R? Have you found yourself in situations where your data grew too big and your code failed with an out-of-memory error, or your data processing pipeline brought your machine to its limit? Have you attempted to scale up, only to eventually face the same problems or run into new ones? If so, this workshop might be for you. We'll talk about scaling your analytics, and specifically about how to leverage Apache Spark to scale out your analytics beyond a single machine. We'll start with an overview of scaling options and the fundamentals of Apache Spark. After that, we'll explore a simple data processing pipeline in Spark and see how it compares to equivalent implementations in Python and R.
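To give a flavour of what such a comparison looks like, here is a minimal sketch of a filter-group-aggregate pipeline in pandas, with the near-identical pyspark data-frame equivalent noted in comments. The toy data and column names are purely illustrative, not taken from the workshop materials.

```python
import pandas as pd

# Toy data, purely for illustration.
sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "US"],
    "amount": [120.0, 80.0, 200.0, 50.0, 150.0],
})

# Single-machine pandas pipeline: filter rows, group by region,
# compute the average amount per group.
result = (
    sales[sales["amount"] > 60]
    .groupby("region", as_index=False)["amount"]
    .mean()
    .rename(columns={"amount": "avg_amount"})
)
print(result)

# The Spark data-frame version reads much the same (sketch, not run here;
# spark_df would be a pyspark DataFrame with the same columns):
#   from pyspark.sql import functions as F
#   (spark_df.filter(F.col("amount") > 60)
#            .groupBy("region")
#            .agg(F.avg("amount").alias("avg_amount")))
```

The point of the comparison is that the high-level data frame operations carry over almost one-to-one, while Spark distributes the work across a cluster instead of a single machine's memory.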
The workshop will focus on Spark's data frame API and primarily provide examples using Python/pyspark, but the concepts and considerations conveyed apply equally to R/SparkR/sparklyr. The workshop will not go into the specifics of Spark Structured Streaming, Spark's machine learning library (MLlib) or graph processing (GraphX).
Duration
Presentation: ~2.5 h
Schedule
4:45 pm - Doors open
5:15 pm - Welcome / Start of workshop
7:45 pm - End of workshop / closing remarks
7:45 - 9:00 pm - Apéro at the bar
Prerequisites
Basic knowledge of Python and/or R is highly recommended. No prior knowledge of Apache Spark and the corresponding language APIs is needed.
Workshop participants are not required to bring their laptops.
Registration
To secure your spot at the workshop, please register on Eventbrite.