Pandas: documentation and bokeh (continued)


Details
According to a recent Stack Overflow blog post (https://stackoverflow.com/), Python is the fastest-growing major programming language, and around 10% of the credit for that growth is considered to be due to the pandas (http://pandas.pydata.org/) library.
In this sprint we'll have two different groups:
Beginners: We will improve pandas documentation
Gitter: https://gitter.im/py-sprints/pandas-doc
The idea is to improve the API documentation. So we will transform a page like:
http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.reset_index.html
to a page like:
https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.reset_index.html
More information on how to contribute to the pandas documentation can be found here:
https://pandas.pydata.org/pandas-docs/stable/contributing.html#contributing-to-the-documentation
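To give an idea of what an improved docstring looks like, here is a minimal sketch in the numpydoc layout used by the pandas docs (summary, Parameters, Returns, Examples). The function renumber is a made-up toy and is not part of pandas; it only exists to show the sections.

```python
import doctest


def renumber(labels, start=0):
    """
    Replace a sequence of labels with consecutive integers.

    Hypothetical toy function (not part of pandas), used only to
    illustrate the docstring sections.

    Parameters
    ----------
    labels : list
        Existing labels, in order.
    start : int, default 0
        First integer of the new numbering.

    Returns
    -------
    list of int
        One integer per input label, counting up from ``start``.

    Examples
    --------
    >>> renumber(['a', 'b', 'c'])
    [0, 1, 2]
    >>> renumber(['a', 'b'], start=10)
    [10, 11]
    """
    return list(range(start, start + len(labels)))


if __name__ == "__main__":
    # Run the examples embedded in the docstring as tests.
    print(doctest.testmod())
```

The Examples section doubles as a test: doctest runs each line starting with >>> and compares the result with the line below it.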
Intermediate / advanced: We will continue implementing Bokeh as a plotting backend for pandas. This is described next.
Gitter: https://gitter.im/py-sprints/pandas-bokeh
One of the popular features of pandas is that it can directly plot the data it contains (in a Series or DataFrame). For example:
https://secure.meetupstatic.com/photos/event/b/7/a/0/600_464567008.jpeg
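In case the screenshot above does not load, the same idea can be sketched as follows. The data is made up for illustration, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {"value": [1, 3, 2, 5, 4]},
    index=pd.date_range("2018-01-01", periods=5, freq="D"),
)

# pandas hands the drawing over to matplotlib and returns a matplotlib Axes.
ax = df.plot(title="Sample series")
```

The returned Axes is a regular matplotlib object, so it can be customised further or saved with ax.figure.savefig("plot.png").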
When this feature was implemented, matplotlib (https://matplotlib.org/) was the standard plotting library in Python. But things have changed, and there are now many great libraries available. One of the most popular is Bokeh (https://bokeh.pydata.org/en/latest/), which generates interactive charts in the style of D3.js.
Plotting pandas data in Bokeh is quite straightforward:
https://secure.meetupstatic.com/photos/event/b/7/d/c/600_464567068.jpeg
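As a rough text version of the screenshot above, this is one way to plot pandas data with Bokeh directly. The data and column names are made up, and this is only a sketch, not necessarily the exact code shown in the image.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up sample data, for illustration only.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "value": [1, 3, 2, 5, 4]})

# Bokeh reads its data from a ColumnDataSource, built here from plain lists.
source = ColumnDataSource(
    data={"x": df["x"].tolist(), "value": df["value"].tolist()}
)

p = figure(title="Sample series")
renderer = p.line(x="x", y="value", source=source)

# To view the interactive chart in a browser:
# from bokeh.plotting import show
# show(p)
```

Note that the data has to be taken out of the DataFrame and handed to Bokeh by hand, which is the step a pandas backend would hide.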
But it would be more efficient and consistent if pandas could be configured to use a different backend, such as Bokeh, so that the current pandas plotting methods draw with your favorite library. The result with Bokeh would be:
https://secure.meetupstatic.com/photos/event/b/8/2/9/600_464567145.jpeg
Pandas is already well prepared to be integrated with other backends, since all the matplotlib logic lives in the plotting directory (https://github.com/pandas-dev/pandas/tree/master/pandas/plotting).
But some work still needs to be done: adding a setting to define the backend, and further decoupling the plotting logic.
Also, a new package, pandas-bokeh, needs to be created so that it can be called from the pandas .plot() methods.
In this sprint we will code this new module (which can later be added to pandas), and we will send the pull requests for the required changes in pandas.
Our sponsor
https://www.touchsurgery.com/img/logo-colour.svg
Thanks to Touch Surgery (https://www.touchsurgery.com/jobs.html) for providing the venue, and the pizza and drinks for the night.
Set up instructions:
- Get a pandas development repository
Fork the pandas repository by clicking the Fork button at the top right of:
https://github.com/pandas-dev/pandas
After the fork completes, run the following in your terminal:
$ git clone https://github.com/<your-github-username>/pandas
$ cd pandas
$ python setup.py build_ext --inplace
- Download and install Anaconda from:
https://www.anaconda.com/download/
After restarting the terminal, run:
$ conda config --add channels conda-forge
$ conda create -n pandas_dev --file ci/requirements_dev.txt
$ source activate pandas_dev
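Once the environment is active, a quick sanity check is to confirm that the expected packages can be found by Python. The helper below is a small sketch using only the standard library; the function name find_missing and the list of packages are illustrative, so adjust them to what you expect to be installed.

```python
import importlib.util


def find_missing(module_names):
    """Return the names from module_names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Illustrative list; adjust it to the packages you expect in the environment.
missing = find_missing(["pandas", "numpy"])
print("missing:", missing if missing else "nothing")
```

If anything is reported as missing, check that the pandas_dev environment is the active one before the sprint starts.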
