Accelerate your pandas workload using FireDucks at zero manual effort

Name: Accelerate your pandas workload using FireDucks at zero manual effort
Start: 2024-08-31T10:00:00+05:30
End: 2024-08-31T11:00:00+05:30

Hosted By

karuppaiya

Details

Data scientists typically spend significant effort transforming raw data into a more digestible format before training an AI model or creating visualizations. Traditional tools such as pandas have long been the linchpin of this process, offering powerful capabilities but not without limitations. Because pandas offers many ways to express the same operation, users often end up choosing an inefficient one, and the computational cost grows with the data size. We introduce a couple of frequently occurring, subtle performance issues in pandas, along with FireDucks, a compiler-accelerated high-performance library that detects and optimizes such issues automatically, without any manual effort. We will also demonstrate how FireDucks can outperform existing high-performance pandas alternatives.
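To illustrate how the same result can be written in more and less economical ways in pandas, here is a minimal sketch. The column names, the data, and both helper functions are hypothetical, and this particular pattern (sorting before filtering versus filtering before sorting) is only one example chosen for illustration; it is not necessarily one of the issues covered in the session.

```python
import pandas as pd

def top_east_sort_first(df):
    # Less economical: sorts every row, then throws most of them away.
    ordered = df.sort_values("amount", ascending=False)
    return ordered[ordered["region"] == "east"].head(3)

def top_east_filter_first(df):
    # More economical: filters first, so the sort only sees the rows that remain.
    east = df[df["region"] == "east"]
    return east.sort_values("amount", ascending=False).head(3)

# Hypothetical toy data; the cost gap only becomes visible on large tables.
df = pd.DataFrame({
    "region": ["east", "west", "east", "north", "east", "west"],
    "amount": [120, 300, 80, 50, 210, 90],
})

slow = top_east_sort_first(df)
fast = top_east_filter_first(df)
print(slow.equals(fast))
```

Both functions return the same rows; only the amount of work differs, which is the kind of choice a user can easily get wrong when writing pandas by hand.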

Growing data sizes and rising computational costs create demand for high-performance DataFrame libraries. However, some existing pandas alternatives compel users to learn completely new APIs (incurring migration cost), whereas others demand a more powerful computing system (incurring hardware cost). To address this, we at NEC R&D Lab Japan have developed FireDucks, a solution crafted for the contemporary data professional who loves the flexible user APIs of pandas and wants to improve application performance without any extra hardware cost when regularly dealing with voluminous and complex data. It is released on pypi.org under the 3-Clause BSD License and can be installed with pip.
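As a rough sketch of what keeping the pandas API can look like in practice, the snippet below swaps only the import line. The package name `fireducks` and the module path `fireducks.pandas` are assumptions taken from the project's public documentation rather than from this description, and the fallback to plain pandas is added here only so the sketch runs where FireDucks is not installed. The data is hypothetical.

```python
# Assumed install step (run in a shell, not in Python): pip install fireducks
try:
    import fireducks.pandas as pd  # assumed drop-in module path
except ImportError:
    import pandas as pd  # fallback so this sketch runs without FireDucks

# Hypothetical data; the code below is ordinary pandas-style code.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]})
totals = df.groupby("team")["score"].sum()

result = {key: int(value) for key, value in totals.to_dict().items()}
print(result)
```

Apart from the import, nothing in the code changes, which is what avoids the migration cost described above.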

Level of Audience
Beginner to Intermediate: This session is helpful for anyone with basic knowledge of pandas or a similar data manipulation library in Python.

Machine Learning, Big Data, Data Analytics, Data Science, Open Source