Codebeez Hackathon: Polars for Efficient Processing Structural Data in a Machine
Details
This time we dive into Polars, a powerful Python DataFrame library with a highly efficient, multi-threaded implementation in Rust using Apache Arrow. It supports SQL syntax as well as native Python syntax and, while very efficient and complete on its own, works well alongside Pandas and DuckDB, with nearly seamless conversion of data structures to and from both libraries. We are excited about the benchmarks that show significant performance improvements over Pandas and, to a lesser extent, DuckDB, and we also love how intuitive its Python syntax is in comparison.
That is not all: we will also cover a basic machine learning (more specifically, deep learning) use case. After exploring the Polars library, we will put it to the test in a fun, practical exercise: preprocessing our data efficiently in order to train a neural network.
You will hack your way through at least the following topics:
* Reading and writing large datasets from and into Parquet files
* Using Polars for data analysis and visualisation
* Lazy loading of data and lazy evaluation of queries in Polars
* Migration from Pandas and speed comparisons between the two
* Training a neural network with preprocessing performed using Polars
Agenda:
- 09:00 The office opens
- 09:30 The hackathon kicks off
- 12:00 Lunch
- 16:00 End of the hackathon and time for some drinks
Lunch and snacks are provided. Dinner is provided if people are interested!
The hackathon will be at the Dataworkz office.
The address is:
Tractieweg 41, Studio E
3534 AP Utrecht
If you can't find it, call Sigrid: +31 6 121 030 72
