This event was canceled

Spidering Wikipedia Politely In Async Rust

Hosted By
Jim B. and Bart M.

Details

How many pages are reachable from Wikipedia's page on the Rust programming language in two hops? Around 30,000, it turns out, including pages on wheat flour, Welsh orthography, and the zombie apocalypse.
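To make "reachable in two hops" concrete, here is a minimal sketch of the counting step as a breadth-first walk over an in-memory link table. The function `reachable_within`, the helper `sample_links`, and the tiny graph are all hypothetical stand-ins for illustration; a real run would fetch each page's links from Wikipedia instead of reading them from a map.

```rust
use std::collections::{HashMap, HashSet};

/// Toy link table (hypothetical data, not real Wikipedia links).
fn sample_links() -> HashMap<&'static str, Vec<&'static str>> {
    let mut links = HashMap::new();
    links.insert("Rust", vec!["Mozilla", "Compiler"]);
    links.insert("Mozilla", vec!["Firefox", "Rust"]);
    links.insert("Compiler", vec!["Wheat flour"]);
    links.insert("Firefox", vec!["Zombie apocalypse"]);
    links
}

/// Returns every page reachable from `start` in at most `max_hops` link
/// hops, not counting `start` itself.
fn reachable_within(
    links: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_hops: usize,
) -> HashSet<String> {
    let mut seen: HashSet<String> = HashSet::new();
    seen.insert(start.to_string());
    let mut frontier = vec![start.to_string()];

    for _ in 0..max_hops {
        let mut next = Vec::new();
        for page in &frontier {
            if let Some(targets) = links.get(page.as_str()) {
                for &target in targets {
                    // `insert` returns true only for pages not seen before,
                    // so each page is expanded at most once.
                    if seen.insert(target.to_string()) {
                        next.push(target.to_string());
                    }
                }
            }
        }
        frontier = next;
    }

    seen.remove(start);
    seen
}

fn main() {
    let links = sample_links();
    let pages = reachable_within(&links, "Rust", 2);
    println!("{} pages within two hops", pages.len());
}
```

The `seen` set is what keeps the walk from revisiting pages that link back to each other, which matters a great deal once each visit costs a network request.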

This kind of exploration is super easy to do with asynchronous Rust code. Wikipedia offers a cute little REST API for querying links, and Serde makes it simple to generate requests and parse replies. And if you're feeling guilty about flooding a precious public resource with silly API requests, rate limiting is just as easy to add.
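As a rough illustration of the rate-limiting idea, here is a minimal sketch of a limiter that enforces a minimum gap between requests. It is not taken from the talk: the `RateLimiter` type and its methods are hypothetical, and it uses only the standard library with a blocking `thread::sleep`. In async code the same bookkeeping would apply, with the blocking sleep swapped for an awaited timer such as `tokio::time::sleep`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical limiter: allows at most one request per `min_interval`.
struct RateLimiter {
    min_interval: Duration,
    next_allowed: Instant,
}

impl RateLimiter {
    fn new(min_interval: Duration) -> Self {
        RateLimiter {
            min_interval,
            next_allowed: Instant::now(),
        }
    }

    /// Reserves the next request slot for a caller arriving at `now` and
    /// returns how long that caller has to wait before sending.
    fn reserve(&mut self, now: Instant) -> Duration {
        // The slot starts at whichever is later: the arrival time or the
        // end of the previously reserved slot.
        let start = self.next_allowed.max(now);
        self.next_allowed = start + self.min_interval;
        start.duration_since(now)
    }

    /// Blocks the current thread until the next request may be sent.
    fn wait(&mut self) {
        let delay = self.reserve(Instant::now());
        if !delay.is_zero() {
            thread::sleep(delay);
        }
    }
}

fn main() {
    let started = Instant::now();
    let mut limiter = RateLimiter::new(Duration::from_millis(200));
    for i in 0..3 {
        limiter.wait();
        // A real spider would issue its HTTP request here.
        println!("request {} at {:?}", i, started.elapsed());
    }
}
```

Separating `reserve` from `wait` keeps the timing arithmetic free of actual sleeping, so it can be checked with made-up instants instead of real delays.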

Jim Blandy will show how to wire up Tokio, Reqwest, and Serde to do the spidering, and whip up a mock server for testing using Warp. The techniques shown work nicely for all kinds of REST API scripting, including, say, GitHub.

Please plan to arrive between 6:30 and 7:00. Due to limitations of the venue, we need to have someone stand outside and let people in, and we'd like them to be able to attend the talk too, so the doors will effectively be closed at 7:00, unless you're a PSU student.

PDXRust