Processing Large Data with Python Pandas
Hosted by Dayton Women Code Together
Details
Data sets can get large quickly. You can go from looking at a few hundred lines and a handful of columns to a million lines and hundreds of columns. Python's Pandas library is a great tool for handling and processing data. Pandas is fast, powerful, and flexible. Plus, it does an amazing job of cleaning messy, real-world data. It can quickly parse data and help you make meaningful plots. But it was designed to handle less than roughly 100 MB of data.
So what do you do when you have a few gigabytes of real-world data that you need to explore on a laptop? This talk will show you how to reduce the amount of memory your data takes up by up to 90%, so that you can read a multi-gigabyte CSV file on a laptop and process its data with RAM to spare.
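As a small taste of the idea, here is a minimal sketch of one common way to shrink a DataFrame: storing repeated strings as a categorical and downcasting integers to the smallest type that fits. This is an illustration only, not necessarily the approach covered in the talk. The column names and the generated data are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a repetitive text column and a small-valued integer column.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["Dayton", "Columbus", "Cincinnati"], size=n),
    "count": rng.integers(0, 100, size=n),  # stored as 64-bit integers by default
})

# deep=True also counts the memory used by the Python strings themselves.
before = df.memory_usage(deep=True).sum()

small = df.copy()
# Store each distinct string once and keep a compact integer code per row.
small["city"] = small["city"].astype("category")
# Values 0..99 fit in one byte, so pandas can pick an 8-bit unsigned type.
small["count"] = pd.to_numeric(small["count"], downcast="unsigned")

after = small.memory_usage(deep=True).sum()
saved = 1 - after / before

print(f"before: {before / 1e6:.2f} MB")
print(f"after:  {after / 1e6:.2f} MB")
print(f"saved:  {saved:.0%}")
```

The exact savings depend on your data and your pandas version. When reading a file, pd.read_csv also accepts a dtype argument, so compact types can be requested while loading instead of converting afterwards.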
This will be an interactive talk presented by Evelyn Boettcher. Please bring a laptop with Python (version 3.x) and Pandas installed.
Python
https://www.python.org/downloads/release/python-374/
Pandas
https://pandas.pydata.org/
Follow along with Evelyn's talk:
https://github.com/DiDacTexGit/Talk-ProcessingLargeDatawithPandas
