How to Scrape the Web: Finding information... Everywhere!!


Details
Join us at 7:00pm on Tuesday, December 23rd to learn all about... The Art of Web Scraping: finding information everywhere!
The web is a huge and unorganised source of information. When you are lucky, you meet very nice websites with perfect UX… But too often their UX does not match your needs, or it just sucks.
- Did you ever want to access the database of a website?
- Were you disappointed looking for an API that does not exist?
- Did you spend hours copy-pasting information from pages to Excel?
We are going to introduce a technique, web scraping, that allows you to solve all of these problems. Web scraping allows you to explore a website, then extract and organise all its information in a way that lets you use it for a new purpose.
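To give a taste of what "extract and organise" means, here is a minimal sketch using only Python's standard library (not Scrapy, which the workshop will cover). The sample HTML, the `event` class name, the "title | time" layout, and the names `EventParser` and `extract_events` are all made up for illustration; real pages will need their own selectors.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class EventParser(HTMLParser):
    """Collect the text of every <li class="event"> element (hypothetical markup)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_event = False
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "event") in attrs:
            self.in_event = True
            self.events.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_event = False

    def handle_data(self, data):
        if self.in_event:
            self.events[-1] += data


# Made-up page fragment: two events and one item we want to ignore.
SAMPLE_HTML = """
<ul>
  <li class="event">Web Scraping Workshop | 7:00pm</li>
  <li class="ad">Buy now!</li>
  <li class="event">Python Meetup | 6:30pm</li>
</ul>
"""


def extract_events(html):
    """Turn the raw HTML into a list of organised records."""
    parser = EventParser()
    parser.feed(html)
    parser.close()
    rows = []
    for text in parser.events:
        title, time = [part.strip() for part in text.split("|")]
        rows.append({"title": title, "time": time})
    return rows


if __name__ == "__main__":
    for row in extract_events(SAMPLE_HTML):
        print(row)
```

Once the data is in a list of records like this, it can be written to a CSV file, a database, or anything else you need.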
Introduction:
What is web scraping
How to use it in real life
Advantages
Challenges and limitations
Technology
Workshop Activities:
Introduction to the Scrapy framework
How to scrape events from meetup.com, timeoutshanghai.com, smartshanghai.com
About the Presenter: Frederic Bazin
This meetup's host will be Frederic Bazin, the owner of Agora Space and web-scraper extraordinaire, so you definitely don't want to miss out!
As always, be sure to BRING YOUR LAPTOP to get the most out of this meetup.
See you soon!
Eve (区丽怡)
UPDATE: instructions to prepare your computer
If you want to participate in the workshop, make sure you have Python 2.7.x, virtualenv and Scrapy installed. The instructions below will help...
Linux (Ubuntu only; if you are curious enough to try another distro, you can probably figure out the instructions ;-) )
$ sudo apt-get install python-pip python-virtualenv
Mac
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew install python
$ pip install virtualenv
Windows (untested; I have been Windows-sober for 4 years, so please help if I made a mistake)
# download python 2.7.9 from https://www.python.org/downloads/release/python-279/
# figure out how to make `pip` and `virtualenv` work
For all (once the above setup is done)
$ mkdir scraping
$ cd scraping
$ virtualenv venv
$ source venv/bin/activate
$ pip install scrapy
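If you want to double-check the setup before the meetup, a small script along these lines may help. It is only a sketch: the function name `check_setup` is made up, and the virtualenv check relies on `sys.real_prefix` / `sys.base_prefix`, which should cover the usual cases but is an assumption about your environment. Save it as, say, check_setup.py and run it with the virtualenv activated.

```python
import importlib
import sys


def check_setup(module_names=("scrapy",)):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python %d.%d.%d" % sys.version_info[:3])

    # Older virtualenv sets sys.real_prefix; venv and newer virtualenv
    # make sys.base_prefix differ from sys.prefix.
    in_venv = hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )
    print("Inside a virtualenv: %s" % in_venv)

    for name, version in check_setup().items():
        if version is None:
            print("%s: NOT installed" % name)
        else:
            print("%s: %s" % (name, version))
```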
Message me if you have any trouble.
