Druid Data Ingest and Text Mining with Python

Name: Druid Data Ingest and Text Mining with Python
Start: 2014-02-26T18:00:00-07:00
End: 2014-02-26T21:00:00-07:00
Location: ATLAS Building

Hosted by Michael M.

Data Science & Business Analytics

Details

University of Colorado Boulder - Wednesday February 26, 2014 @ 6:00pm MST

NOTE: If you are unable to attend in person, register and we will email you a livestream link 2 hours before the event.

Location: ATLAS - 1125 18th St Bldg 223, Boulder, CO - Room 100

Map: http://goo.gl/maps/XTJ9v

Agenda:

6:00 - 6:20 Schmooze - Food will be served in Lobby.

6:20 - 6:30 Announcements

6:30 - 7:15 Druid Data Ingest by Wayne Adams

7:15 - 8:00 Text Mining with Python by William Stanton

8:00 - 9:00 Network at Old Chicago at 1102 Pearl St. (western end of Pearl Street pedestrian mall, directly facing Boulder Bookstore). We have set this an hour earlier than last time in the hope that more people can come, support our sponsor, Old Chicago in Boulder, and of course enjoy meeting fellow or aspiring data scientists.

See: http://oldchicago.com/locations/boulder (http://bit.ly/ND8Kp)

Druid Data Ingest -- Abstract

Most data scientists know and accept that their greatest moments of blinding insight are likely to be preceded by hours, days, or weeks of data retrieval, inspection and cleanup. Another unheralded area of data science is the setup and administration of a project's big-data tools. This presentation will be a practical guide to data ingest in Druid, an open-source analytics database designed for scalable, exploratory analysis of large datasets. In addition to real-time ingest and analytics, Druid supports several options for bulk ingest of historical data. Druid data ingest can be a little challenging for the first-timer, so this presentation will be a hands-on guide to the practical details. We will start with a quick overview of the topology of a Druid cluster and a brief look at real-time ingest. Next, we'll cover how to choose which bulk ingest method to use, configuration opportunities/pitfalls, and where to look if things go wrong. My goal is to leave you with enough information to speed up your deployment if you find yourself getting started with this outstanding database.

Bio

Wayne Adams is a software consultant in Boulder, Colorado. After obtaining a BS in Physics from Eastern Kentucky University during a tough job market, he was fortunate to procure (civilian) employment with the US Navy, as well as the coolest assignment of his career -- testing ship fragmentation armor at Aberdeen Proving Ground. A few years and an MS in Electrical Engineering from Colorado State University later, he is happy to enjoy the relative tranquility of business software. Like all of you, he is interested in all things data, and he especially enjoys providing detailed how-to's to help you get productive as quickly as possible.

Text Mining with Python -- Abstract

Raw text is the classic example of unstructured, high-dimensional data. Text mining methods allow you to uncover structures, patterns, and sometimes even meaning in text. In this talk, I will introduce the key challenges and methods in text mining, and give examples of how to actually do text mining using Python. This talk will contain example use-cases and big picture ideas for generalists, as well as some real, working code for technical folks.
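
For readers who want a feel for what "high-dimensional" means here before the talk, the following is a minimal sketch of one common text mining step: turning documents into TF-IDF weighted word vectors. It is not code from the talk. The sample sentences and the helper names (tokenize, tf_idf) are assumptions made up for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        weights.append({
            # Term frequency times the log of inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, invented for this sketch.
docs = [
    "Druid ingests event data in real time",
    "Python makes text mining approachable",
    "Text data is unstructured and high dimensional",
]
vectors = tf_idf(docs)
# Every distinct word becomes one dimension of the feature space.
vocabulary = sorted({term for vec in vectors for term in vec})
print(len(vocabulary), "distinct terms across", len(docs), "documents")
```

Even three short sentences produce more dimensions than documents, which is why raw text is usually described as high-dimensional. Words that occur in fewer documents receive larger weights than words shared across documents.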

Bio

Will Stanton is on the analytics team at Return Path, the world's leading email data company. Before starting at Return Path, Will studied probability in the Department of Mathematics at CU Boulder. Will loves learning and teaching data science. You can find him on LinkedIn (http://www.linkedin.com/in/willstanton) or on his personal website (http://www.williamgstanton.com/).
