Skip to content

#3 => John Feminella on "Coresets and their applications"

Photo of Mike Bernstein
Hosted By
Mike B.
#3 => John Feminella on "Coresets and their applications"

Details

Description of the paper:

We'll be looking at the Ph.D. thesis of the head of the MIT Robotics and Big Data Lab, Dan Feldman. It's entitled "Coresets and their applications" (2010). (That's "coresets" as in the mathematical reduction technique, not "corsets" as in the article of clothing.) You can find it here:

http://www.cs.tau.ac.il/thesis/thesis/feldman

Coresets are a way of turning large datasets into much smaller, more manageable sets in a way that preserves the interesting features ofthe larger set. Approximation isn't a new technique; we compress datasets all the time to preserve interesting features (JPEGs for pictures, MP4 for audio, etc.). The choice of the way in which you construct the coreset determines which interesting features are (or are not) preserved, and within what error bounds.

The surprising result is that you can, very frequently, throw away large parts of your dataset still get good approximations to the optimal answer. This is valuable for cases where you want to ask an expensive question on a large dataset (e.g. Hamiltonian tours /Traveling Salesperson problem, convex hull, etc.) when you're willing to trade off some fidelity in the answer for response speed.

Brief outline:

  • How we improve solutions to problems with computer science: faster algorithms (hard!), faster computers (hard!)

  • How we improve solutions with coresets: approximate the dataset (much easier!)

  • Prerequisites for constructing a coreset

  • What kinds of problems do and don't work well, and why

  • A few examples of constructing a coreset

  • Diving into the math of why it works

  • Tips for constructing an arbitrary coreset (it's hard to do)

  • Maybe a demo or two, time permitting

A Note:

§1 and 2 (14 pages out of the 80, a lot of which is pictures) are really the only parts to read. The rest is more detailed math and proofs of different corner cases, and a generalization to N dimensions.

About the presenter:

On a dare, John once drank a 32-ounce chocolate milkshake in under 10 seconds. Life has more or less been downhill from there.

Find him at: @jxxf / http://jxf.me

Photo of Papers We Love DC group
Papers We Love DC
See more events
718 7th Street NW · Washington, DC