Lisa Green & Stephen Merity: CommonCrawl.org!

Hosted by Stefan Edlich

Details

Dear Big Data Beers Members,

it is my pleasure to announce the next meetup!

Lisa Green and Stephen Merity will speak about CommonCrawl.org!

The first will be a visionary talk and the second a technical talk.

Speaker

Lisa Green http://www.linkedin.com/in/lisagreen

Title
Big Open Web Data

Abstract
The Web is the largest collection of data in human history and can provide an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the Web be openly accessible to anyone who desires to utilize it. The Common Crawl Foundation builds and maintains an open repository of web crawl data that can be accessed and analyzed by everyone. This presentation will discuss the relationship between open data and innovation, explain the mission and vision of Common Crawl, and demonstrate the value of an open repository of web crawl data through an overview of previous work.

Speaker
Stephen Merity http://smerity.com/

Title
Experiments in web scale data

Abstract
The Common Crawl corpus contains petabytes of web crawl data and is a treasure trove of potential experiments. But the scale can be intimidating! To introduce you to the possibilities and to help you navigate such a vast collection, this presentation will take a detailed, technical look at how the data has been used by various experiments, and how you can use a variety of frameworks to handle the task of processing and analyzing such a dataset.

It would be a pleasure to have you all here!

http://photos2.meetupstatic.com/photos/event/7/d/c/0/600_429452192.jpeg

http://photos3.meetupstatic.com/photos/event/7/d/f/2/600_429452242.jpeg

Big Data Beers
CoUp coworking space
Adalbertstr. 7-8 · Berlin