Bradlee Rees will present:
In the typical data processing workflow, a data scientist spends the vast majority of their programming time (not execution runtime) on data cleaning, transformation, and preparation. The remaining time is spent running an analytic (and waiting) to produce results that hopefully contain a meaningful answer. To increase productivity, the data scientist uses tools within the Python ecosystem, such as Pandas and Scikit-learn. Unfortunately, most of these libraries suffer from poor performance. This makes the data scientist’s job harder, since the time spent waiting for results interrupts their train of thought.
RAPIDS is an open-source software suite for GPU-accelerated data science that gives the data scientist the freedom to execute end-to-end workflows fully on GPUs through familiar Python APIs. To do this, RAPIDS provides several libraries whose APIs are similar to those of popular libraries: cuDF, a Pandas-like dataframe library; cuML, a machine learning library that follows the Scikit-learn API; and cuGraph, a graph analytics library matching the NetworkX API.
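As a rough illustration of what a Pandas-like API can mean in practice, the sketch below writes one group-by aggregation that runs on cuDF when it can be imported and on Pandas otherwise. The toy data, the `xdf` alias, and the fallback pattern are assumptions made for this example and are not taken from the talk.

```python
# Sketch only: prefer cuDF when it imports cleanly (RAPIDS installed, GPU present),
# otherwise fall back to pandas so the same code still runs on CPU.
try:
    import cudf as xdf  # GPU dataframe library from RAPIDS
except Exception:  # cuDF missing or no usable GPU; broad catch is deliberate here
    import pandas as xdf  # CPU fallback offering the same calls used below

# Hypothetical toy data, invented for this example.
df = xdf.DataFrame({"key": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]})

# One group-by aggregation, written with calls that both libraries provide.
means = df.groupby("key")["value"].mean().sort_index()

# cuDF results expose to_pandas(); pandas results are already on the host.
result = means.to_pandas() if hasattr(means, "to_pandas") else means
print(result.to_dict())
```

Only the import line differs between the two backends here; real workloads may touch features where the two APIs diverge, so this should be read as a starting point rather than a guarantee of drop-in compatibility.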
This talk will walk through a data science problem that introduces components and features of RAPIDS, including feature engineering, data manipulation, statistical tasks, machine learning, and graph analysis. Throughout the talk, code examples and benchmarked performance gains will be presented. The talk will wrap up with a presentation of the current RAPIDS roadmap.