FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems

Hosted By
Lionel B. and 4 others
Details

Dataframes have become a popular way to represent, transform, and analyze data. The approach has gained a large user base among data science practitioners, prompting a new wave of systems that implement a dataframe API while adding performance, efficiency, and distributed or parallel extensions to systems such as R and pandas. However, whereas relational databases and NoSQL systems have a range of benchmarking, testing, and workload generation suites, dataframe-based systems lack comparable tools. This talk presents fuzzydata, a first step toward an extensible workflow generation system that targets dataframe-based APIs. We present an abstract data processing workflow model, random table and workflow generators, and three clients implemented using our model. Using fuzzydata, we can encode a real-world workflow or randomly generate workflows from various parameters. These workflows can be scaled and replayed on multiple systems to provide stress testing, performance evaluation, and a breakdown of performance bottlenecks on popular dataframe systems. Fuzzydata is available as an open source project on GitHub and is currently included in the Modin project as a fuzz testing component in its CI pipeline.
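
To make the idea of a randomly generated, replayable workflow concrete, here is a minimal sketch in plain pandas. It does not use fuzzydata's actual API: `random_table`, `random_workflow`, `replay`, and the `OPERATIONS` catalogue are hypothetical names chosen for illustration, and the three operations are an assumed stand-in for whatever a real generator would support.

```python
import random

import numpy as np
import pandas as pd


def random_table(rows, rng):
    """Build a random source table: one integer key and two numeric columns."""
    return pd.DataFrame({
        "key": rng.integers(0, 5, size=rows),
        "a": rng.normal(size=rows),
        "b": rng.normal(size=rows),
    })


# Hypothetical operation catalogue: each entry maps a dataframe to a new one.
OPERATIONS = {
    "filter": lambda df: df[df["a"] >= df["a"].median()],
    "sort": lambda df: df.sort_values("b"),
    "groupby": lambda df: df.groupby("key", as_index=False)[["a", "b"]].mean(),
}


def random_workflow(n_steps, seed):
    """Return a list of (source_node, operation_name) steps forming a DAG.

    Node 0 is the source table; step i derives node i + 1 from an earlier node.
    """
    py_rng = random.Random(seed)
    names = sorted(OPERATIONS)
    return [(py_rng.randrange(i + 1), py_rng.choice(names)) for i in range(n_steps)]


def replay(workflow, source):
    """Run every step of the workflow and return all intermediate tables."""
    artifacts = [source]
    for src, op in workflow:
        artifacts.append(OPERATIONS[op](artifacts[src]))
    return artifacts


# The same workflow shape is replayed at two different data scales.
workflow = random_workflow(n_steps=6, seed=42)
small = replay(workflow, random_table(100, np.random.default_rng(0)))
large = replay(workflow, random_table(10_000, np.random.default_rng(0)))
```

Because the workflow is stored as data rather than as code, the same list of steps could in principle be replayed against any system that exposes a pandas-like API, which is the property the talk relies on for cross-system comparison.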

This is a hybrid event. To attend online, join us on Zoom here at 6pm:
https://numfocus-org.zoom.us/j/82526680045?pwd=NE5HRXBIdUNmK0ROWGZXWDFnTW5adz09

Sponsor: [Tegus.com](https://www.tegus.com) will provide the meeting site, as well as pizza and soft drinks for the onsite participants.

  • Tegus Chicago Address: 200 N. LaSalle Street, Suite 1100, Chicago, IL 60601
  • Tegus Overview: Tegus is the leading market intelligence platform for key decision makers. We power some of the world’s most well-respected institutional investors, corporations, and consultancies through the largest and most comprehensive database of primary and market information. Our products and services enable clients to discover unmatched insights and answers to the most challenging questions they face, helping them make better-informed decisions. We are an end-to-end investment intelligence and research platform that modernizes the research process. With an ecosystem that combines at-cost, on-demand expert calls with a 55K+ transcript library; quantitative financial workflows that streamline research across company disclosures, management presentations, earnings calls, and filings; and 4K+ fully-drivable financial models and company benchmarking data, including every KPI and comparison that matters, Tegus enables investors to move faster, gather deep research, and surface high-quality insights to drive better decisions. The company serves customers worldwide, including investment analysts, portfolio managers, and key decision makers across public and private businesses in markets of all sizes.
  • Logistics: To access the Tegus building, we need the first and last name of each person who RSVPs by Oct 25th. On arrival, attendees present their ID at the front desk, just inside the building entrance, and will be sent up in the elevator to the 11th floor, where the event is held.
PyData Chicago