## Details

Enroll in this training and receive a one-month complimentary e-learning subscription with access to 40+ courses.
This course is offered by Big Data Trunk for the Stanford Technology Training Program, but a limited number of seats are available to the public.

Students in this class may have the opportunity to be considered for an internship with Big Data Trunk.

In this class, we will:

  • Explore many sources and repositories for valuable data acquisition such as open government and university datasets
  • Explore popular social APIs (e.g., Facebook, Spotify, Twitter) and domain-specific APIs (e.g., healthcare, news, science and math) that store a wealth of data
  • Discuss methods to query web servers, and request and parse data to extract the information you need
  • Explore scraping various types of data from websites, reading and extracting text from documents (e.g., PDF, Word), and methods to clean and store sourced and scraped data

Learning Objectives
During this course, you will have the opportunity to:

  • Explore a Variety of Public Data Repositories
  • Understand Effective Means to Search for Valuable Data
  • Use the Python Programming Language to Source and Scrape Data
  • Use Popular Social and Domain-specific APIs to Access Data (e.g., Slack)
  • Extract Text from Documents (e.g., PDF, Word) and Access PDF Tables
  • Scrape Data from Web Pages
  • Clean Scraped Data and Store Sourced and Scraped Data

Topic Outline
Overview of Data Sourcing

  1. Public Open Datasets
  2. Government Data
  3. University Data
  4. Milestone 1 Learning Exercise: Explore public data repositories

Introduction to the Python Programming Language

  1. Installing Anaconda
  2. Milestone 2 Learning Exercise: Learn how to use Jupyter Notebooks

Using Public APIs (Application Programming Interfaces)

  1. Explore Popular and Domain-specific APIs
  2. Common Conventions
  3. Parsing JSON
  4. Milestone 3 Learning Exercise: Access a public API (e.g., Facebook, Twitter, Google)
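
To give a flavor of the "Parsing JSON" topic, here is a minimal sketch using only Python's standard `json` module. The response body, its `results` and `name` fields, and the `extract_names` helper are hypothetical, made up for illustration; a real API defines its own structure and usually requires an HTTP request and authentication first.

```python
import json

# Hypothetical response body (an assumption for illustration only).
# A real API would return text like this over HTTP.
raw_response = """
{
  "results": [
    {"name": "Dataset A", "downloads": 120},
    {"name": "Dataset B", "downloads": 45}
  ]
}
"""


def extract_names(payload):
    """Return the "name" field of every record under "results"."""
    data = json.loads(payload)  # parse the JSON text into Python dicts and lists
    # .get() with a default avoids a KeyError when "results" is missing
    return [record["name"] for record in data.get("results", [])]


if __name__ == "__main__":
    print(extract_names(raw_response))
```

The same pattern applies to any JSON response: parse once with `json.loads`, then walk the resulting dictionaries and lists to pull out the fields you need.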

Extracting Text from Documents

  1. Milestone 4 Learning Exercise: Extract data from PDFs

Overview of Data Scraping

  1. Introduction to BeautifulSoup
  2. Parsing HTML and JavaScript
  3. Milestone 5 Learning Exercise: Scrape data from a website
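
As a rough illustration of what parsing HTML involves, the sketch below collects link targets from a page. Note that it uses Python's built-in `html.parser` module rather than BeautifulSoup, so it runs without installing anything; the sample HTML, the `LinkCollector` class, and the `collect_links` helper are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content (an assumption for illustration only).
sample_html = (
    '<html><body><a href="/data/a.csv">A</a>'
    '<p>text</p><a href="/data/b.csv">B</a></body></html>'
)


def collect_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(collect_links(sample_html))
```

BeautifulSoup offers a higher-level interface for the same task, which is typically more convenient for messy real-world pages.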

Cleaning Scraped Data

  1. Storing Sourced and Scraped Data

Conclusion: Next steps

Prerequisites
Learners should have an understanding of basic Python programming.

