pandas benchmarks workshop and sprint


Details
Join us for a morning of learning and coding together. Marc Garcia, a pandas core developer, will explain how pandas makes sure that changes to the library do not make any part of it slower. We will learn how to measure and monitor performance, and look at the computer hardware and operating system in enough detail to run benchmarks with minimal noise. After the workshop and theory, we will spend time sprinting and coding together to improve the pandas benchmarks and the benchmarking system.
pandas has a benchmark suite [1] and a dashboard that shows how long many functions take, measured on a regular basis [2]. Many things can be improved in the current state of the pandas benchmarks. Information on what to improve is available in issue #55007 [3].
Among the tasks we can work on in this workshop/sprint are:
- Experiment with what can make a Linux system more stable and deterministic for running benchmarks
- Analyze the existing benchmarks: detect regressions, identify which benchmarks have the highest variance...
- Fine-tune asv [4] to run the benchmarks with better parameters
- Any other ideas are welcome
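As a rough illustration of the "highest variance" task, here is a minimal sketch that ranks benchmarks by how noisy their repeated timings are, using the coefficient of variation (standard deviation divided by mean). The benchmark names and timings are made up for the example, and the helper functions are hypothetical; this is not how asv or the pandas dashboard computes anything.

```python
import statistics


def coefficient_of_variation(timings):
    """Return stdev / mean for a list of timings in seconds."""
    mean = statistics.mean(timings)
    if mean == 0:
        raise ValueError("mean timing is zero, cannot normalize")
    return statistics.stdev(timings) / mean


def rank_by_variance(results):
    """Sort benchmark names from noisiest to most stable."""
    return sorted(
        results,
        key=lambda name: coefficient_of_variation(results[name]),
        reverse=True,
    )


# Hypothetical repeated timings (seconds) for three made-up benchmarks.
sample_results = {
    "groupby_sum": [0.100, 0.102, 0.098, 0.101],
    "read_csv": [0.50, 0.80, 0.45, 0.65],
    "merge_inner": [0.200, 0.210, 0.190, 0.205],
}

noisiest_first = rank_by_variance(sample_results)
print(noisiest_first)
```

Normalizing by the mean makes fast and slow benchmarks comparable, which a raw standard deviation would not.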
Please bring your laptop to the session if you can, to get the most out of it. We have limited space in the venue, so please only RSVP if you are planning to attend, and update your RSVP as soon as possible if your plans change. We want our events to be as diverse as possible. If you are on the waiting list and you belong to an underrepresented group in IT, please contact the group admins to be prioritized for the event.
