PyData Trojmiasto #11

Hosted By
Agata S. and P. S.

Details

We are pleased to invite everyone interested in ML, AI and Data Science to join us at our 11th PyData Trojmiasto meetup, which will take place on the 11th of December.

We are meeting at Gdański Park Naukowo Technologiczny, Trzy Lipy 3 street, building C, room AB (first floor). Please see the map below:

https://ibb.co/sFsvfnq

Partners:
Gdański Park Naukowo Technologiczny - venue provider.
Fundacja CODE:ME - legal matters and support.

Agenda:

18:00 - 18:05 - doors open
18:10 - 19:00 - Talk #1
19:00 - 19:10 - break
19:10 - 19:45 - Talk #2

  1. Introduction: a few words about PyData Trojmiasto.

  2. Talk #1
    Maciej Karpicz - "Kontrola wersji w pracy z danymi" (in English: "Version control when working with data")

  3. Talk #2
    Tomasz Tylec - "Successful story of Haskell in Data Science"

Abstract:
The best tool for the job is not always the most popular one. In this talk I will present a case where Haskell was used to rewrite one of the core elements of our company's ML toolkit. It was undoubtedly one of the best business decisions we made. I will focus on how the distinctive features of Haskell met the specific requirements we had and how we integrated the solution with our Python ecosystem.

Description:
When existing libraries do not provide enough functionality, a difficult decision needs to be made: what technology should we use to build our own solution? Many factors need to be taken into account and the price of miscalculation can be high, so in many business cases a conservative approach is adopted. Here, I will present a particular use case where we made a bold decision to use a non-standard tool, namely a pure, strongly typed functional language that is often (wrongly) considered purely academic. My aim is not to advocate the use of Haskell but rather to show what questions we asked, what requirements we defined and how we evaluated the right tool for the job. I hope the presented example will be helpful to those who face analogous decisions, regardless of the tools involved.

About half of Zettafox's projects involve the construction of classification rules, for which we use our own proprietary solution. The previous-generation tools were written in Python and C. When I joined the project, the code was almost unmaintainable, even for its author. Buggy behaviour and limited features led us to conclude that a complete rewrite was the only sensible way to go.

The talk will start with a brief theoretical introduction to classification rules. I will focus on the mathematical structure of the problem, as it is critical to understanding why Haskell was so efficient for building the solution.

Next, I will introduce some of Haskell's distinctive features that are crucial for this talk, namely lazy evaluation, a strong type system and parametric polymorphism. I will keep the discussion relatively high-level, with simplified illustrative examples, so that no prior knowledge of Haskell's syntax is required. The examples will gradually evolve into illustrations of how those features played a key role in the efficient implementation of the rule search engine.
I will also show how we built a custom Jupyter notebook kernel as a convenient UI.

In the concluding part, I will present how the rewritten Haskell implementation compared with the previous-generation Python- and C-based implementations and discuss the difficulties we encountered.

-----------------------------------------------------
This meetup will be held in Polish only.

If you're interested in giving a presentation at one of our upcoming meetups, or if you have any other questions, please contact us by email:

pydata.trojmiasto@gmail.com

We are a community focused mostly on Python, R and Julia in data science applications, but we are open to other technologies too.

You can also join us on Facebook:
https://www.facebook.com/PyDataTrojmiasto/

Thank you and see you there!
