How to Scrape the Web: Finding information... Everywhere!!

Hosted By
Eve O. and Frederic B.

Details

Join us at 7:00pm on Tuesday, December 23rd to learn all about... The Art of Web Scraping: finding information everywhere!

The web is a huge and unorganised source of information. When you are lucky, you come across very nice websites with perfect UX. But too often, their UX does not match your needs, or it just sucks.

  • Did you ever want to access the database behind a website?

  • Were you disappointed looking for an API that does not exist?

  • Did you spend hours copy-pasting information from web pages into Excel?

We are going to introduce a technique, web scraping, that lets you solve all of these problems. Web scraping allows you to explore a website, then extract and organise its information in a way that lets you use it for a new purpose.
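To give a rough idea of what "extract and organise" means in practice, here is a minimal sketch using only the Python standard library. It is not the workshop material: the HTML fragment, the class names (`event`, `title`, `date`) and the helper `extract_events` are all made up for illustration, and a real site will need different selectors.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7

# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<ul>
  <li class="event"><span class="title">Sample Event A</span> <span class="date">2015-01-13</span></li>
  <li class="event"><span class="title">Sample Event B</span> <span class="date">2015-02-10</span></li>
</ul>
"""


class EventParser(HTMLParser):
    """Collect the title and date of every <li class="event"> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []     # one dict per event found so far
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "event":
            self.events.append({})       # start a new record
        elif tag == "span" and css_class in ("title", "date"):
            self._field = css_class      # the next text belongs to this field

    def handle_data(self, data):
        if self._field and self.events:
            record = self.events[-1]
            record[self._field] = record.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_events(html):
    """Return a list of {'title': ..., 'date': ...} dicts found in `html`."""
    parser = EventParser()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in extract_events(SAMPLE_HTML):
        print("%s | %s" % (event["date"], event["title"]))
```

Once the page is reduced to a list of records like this, sorting it, filtering it or saving it to a spreadsheet is ordinary programming. A framework such as Scrapy takes care of the parts this sketch leaves out, for example downloading pages and following links.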

Introduction:

What is web scraping

How to use it in real life

Advantages

Challenges and limitations

Technology

Workshop Activities:

Introduction to the Scrapy framework

How to scrape events from meetup.com, timeoutshanghai.com and smartshanghai.com

About the Presenter: Frederic Bazin

This meetup's host will be Frederic Bazin, the owner of Agora Space and web-scraper extraordinaire, so you definitely don't want to miss out!

As always, be sure to BRING YOUR LAPTOP to get the most out of this meetup.

See you soon!

Eve (区丽怡)

UPDATE: instructions to prepare your computer

If you want to participate in the workshop, make sure you have Python 2.7.x, virtualenv and Scrapy installed. The instructions below will help...

Linux (Ubuntu only; if you are curious enough to try another distro, you can probably figure out the instructions ;-)
$ sudo apt-get install python-pip python-virtualenv

Mac
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew install python
$ pip install virtualenv

Windows (untested; I have been Windows sober for 4 years, please help if I made a mistake)

# download Python 2.7.9: https://www.python.org/downloads/release/python-279/
# figure out how to make `pip` and `virtualenv` work

For all (once the setup above is done)

$ mkdir scraping
$ cd scraping
$ virtualenv venv
$ source venv/bin/activate
$ pip install scrapy
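
If you would like to double-check the result before the workshop, a small script along these lines can help. It is only a sketch and not part of the official instructions: the function name `check_setup` is made up, and all it does is report the Python version and whether `scrapy` can be imported from the currently active environment.

```python
from __future__ import print_function

import importlib
import sys


def check_setup(modules=("scrapy",)):
    """Return a dict with the Python version and an ok/missing flag per module."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except ImportError:
            report[name] = "missing"
    return report


if __name__ == "__main__":
    for key, value in sorted(check_setup().items()):
        print("%-8s %s" % (key, value))
```

Save it under any name (for example `check_setup.py`, a hypothetical filename) and run it with `python check_setup.py` while the virtualenv is activated. If `scrapy` shows up as missing, the `pip install scrapy` step probably ran outside the virtualenv or did not finish.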

Message me if you run into any trouble.

Shanghai Web Dev and Design Club
1199 Panyu Road, Building 3, Unit 101. 上海市徐汇区番禺路1199弄3号101室 · Shanghai