Details

With a mission of "Universal Access to All Knowledge", the Internet
Archive has spent the last 29 years building a digital library of
Internet sites and other cultural artifacts in digital form.

In this talk, core infrastructure engineer Pablo Duboue will walk us
through the project and its external APIs, which he helps
maintain. As a former data scientist himself, Pablo will discuss how the
different APIs can be useful for data scientists.

Outline
This session will cover:

  • The Internet Archive Project: history, core infrastructure engineering, the Wayback Machine, Open Library, digitization centres.
  • Using the IA through the Website: creating accounts, uploading material, derivatives, collections.
  • Using the IA through the Python `ia` Tool: installation, IA APIs, IA and crawlers.
  • Contributing to the IA: uploading your own models / data, volunteering, copyright reform, donations.

----------------------------------------
How to Join the Webinar
----------------------------------------
You can join via your browser (no app download required). Use Chrome or Firefox. Pre-register for the webinar:
https://www.bigmarker.com/neo4j/Data-Umbrella-Webinar

--------------------------------
Video Recording
--------------------------------
This event will be recorded and posted on our YouTube channel, usually within 24 hours of the event. Subscribe to the channel and turn on notifications: https://www.youtube.com/c/DataUmbrella/

----------------------------------------
Time
----------------------------------------
20:00 UTC (12pm PT / 3pm ET / 9pm Paris / 11pm EAT)

----------------------------------------
About the Speaker
----------------------------------------
Pablo Duboue is a Core Infrastructure Engineer at the Internet Archive, where he contributes to site reliability and maintains the legacy codebase that powers the platform (approximately 330,000 lines of PHP). Before joining the Internet Archive, Pablo had a 25-year career in applied language technologies and natural language generation, which included earning a Ph.D. in Computer Science from Columbia University and working as a Research Staff Member at the IBM T.J. Watson Research Center.

LinkedIn: https://www.linkedin.com/in/pabloduboue/
GitHub: https://github.com/DrDub
Mastodon: https://mastodon.archive.org/@drdub

----------------------------------------
Connect with Data Umbrella
----------------------------------------
We invite you to follow Data Umbrella on our social networking sites to keep up to date on the latest news.
