Details

Much of the world’s data are stored in Portable Document Format (PDF) files. This is not my preferred storage or presentation format, so I often convert such files into databases, graphs, or spreadsheets. Before I parse a PDF file, I ask three questions.

• Do we need to read the file contents at all?

• Do we only need to extract the text and/or images?

• Do we care about the layout of the file?

I take different approaches to parsing depending on the answers to these questions. In this talk, I’ll show a few of those approaches to parsing and analyzing PDF files, and I’ll discuss which of them make sense in which situations.
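As a rough illustration of the first question (this is not taken from the talk itself), the sketch below gathers a few facts about a file without parsing its contents. It assumes only the Python standard library and relies on the fact that a PDF normally starts with a header such as `%PDF-1.4`. The function name `describe_pdf` and the returned keys are hypothetical, chosen for this example.

```python
import os
import re


def describe_pdf(path):
    """Return cheap facts about a file without parsing its page contents.

    Hypothetical helper for illustration; it only inspects the file size
    and the first bytes of the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(1024)
    # A PDF normally begins with a header such as b"%PDF-1.4".
    match = re.search(rb"%PDF-(\d+\.\d+)", head)
    version = match.group(1).decode("ascii") if match else None
    return {
        "size_bytes": size,
        "looks_like_pdf": match is not None,
        "pdf_version": version,
    }
```

If facts like these are enough for the analysis, a full parser may not be needed. Extracting text, images, or layout would call for a dedicated PDF library instead.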

Our Teacher: Thomas Levine

Tom has been playing with computers since he was young. He eventually developed back and wrist pain, so he started studying ergonomics and conducting quantitative ergonomics research. At some point, people started calling him a data scientist, and his back and wrists now hurt less. He has recently been playing music and studying how people share data.

Sponsors

Booz Allen

DC2 Org Sponsor

GWU

The skills you need to develop and apply modern data solutions.

Anant Corporation

Program Sponsor

ByteCubed

Tech Innovators located in Crystal City