
Seattle Scalability Meetup: Eastside Edition

Hosted By
Bradford S.

Details

Building 41, Room Townsend. Park in the structure next to the building.

This meetup focuses on Scalability and technologies to enable handling large amounts of data: Hadoop, HBase, distributed NoSQL databases, and more!

The focus isn't only on technology, but also on everything surrounding it, including operations, management, business use cases, and more.

We've had great success in the past, and are growing quickly! Previous guests were from Twitter, LinkedIn, Amazon, Cloudant, Microsoft, 10gen/MongoDB, and more.

This month's guests:

Jim White, University of Washington -- Gondor: A Groovy DSL for HTCondor DAGMan workflows

HTCondor ( http://research.cs.wisc.edu/htcondor/description.html ) is a full-featured batch processing system for compute-intensive workloads and is widely used in scientific research. It has also been a significant part of the Big Data story in industry, having been the foundation for a series of record-breaking cloud (AWS EC2) computing runs ( http://www.cyclecomputing.com/discovery-invention/use-cases/ ). This talk begins with a brief introduction to HTCondor and its workflow management tool DAGMan. Then I'll explain my project Gondor, a Groovy solution for writing Condor DAGMan workflows. Gondor is a work in progress with several innovative features in the pipeline, including job memoization (using Git), workflow reduction, and support for provenance-aware results.
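For those new to DAGMan, a workflow is declared in a plain-text DAG file that names each node's submit description and the dependencies between nodes. A minimal sketch (job and file names here are hypothetical):

```
# diamond.dag -- B and C run after A finishes; D runs after both B and C
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A CHILD B C
PARENT B C CHILD D
```

Submitting it with `condor_submit_dag diamond.dag` lets DAGMan release each node to the scheduler as its parents complete.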

Jim White is a computational linguist, system architect, and software developer. He has over 30 years' experience as a consultant for a variety of companies, from startups to large corporations. He is an open source evangelist and a contributor to OpenOffice and the Groovy programming language. He holds a BS in ICS from UC Irvine and is nearing completion of an MS in computational linguistics at the University of Washington, where he taught the fundamentals course last summer.

Mike Drogalis -- Designing Like Bartok

If you’ve ever jumped head-first into a codebase that maintains complex distributed activity and tried to simplify or change the processing workflow, not only will you scratch your head for seven sleepless nights before you get anywhere, but you’ll also come to realize that workflows are often deeply complected with their mechanism of execution.

In Design, Composition and Performance, Rich Hickey challenges developers to design like the composer Béla Bartók. Rich’s overarching theme is to take ideas apart and use different design sensibilities at each layer of the architecture under construction. In this talk, I’ll introduce a novel way of building distributed data processing systems, driven by the X factor of Rich’s advice.

We’ll survey contemporary frameworks such as Storm and Cascading, and discuss the pain points that can be overcome by taking an alternate approach. Additionally, we’ll look at how hardware advances in the last 10 years have enabled new designs, such as those that underlie Datomic. Attendees will come away with a new perspective on leveraging immutability, persistent data structures, queues, and transactions to architect for increasingly complex problem spaces. The concepts and tools discussed will help us begin to obviate some of the incidental complexity that plagues modern frameworks.
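As a toy sketch of the "take ideas apart" theme (not Mike's actual design, and in Python rather than Clojure), a workflow can be kept as plain data so that the mechanism executing it stays a separate, swappable concern:

```python
# The workflow is a data structure, not code wired to an execution engine.
# Step names and functions here are purely illustrative.
workflow = [
    {"name": "extract",   "fn": lambda text: text.split()},
    {"name": "transform", "fn": lambda words: [w.upper() for w in words]},
    {"name": "load",      "fn": lambda words: " ".join(words)},
]

def run_sequentially(workflow, payload):
    """One possible executor; a distributed or queued executor could
    consume the exact same workflow data without changing it."""
    for step in workflow:
        payload = step["fn"](payload)
    return payload

print(run_sequentially(workflow, "hello scalable world"))  # HELLO SCALABLE WORLD
```

Because the workflow never references its executor, simplifying or reordering the processing steps no longer drags the execution machinery along with it.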

This talk is intended for an audience with an intermediate level of familiarity with distributed systems, functional programming, and Clojure.

Michael Drogalis is a software engineering consultant who focuses mainly on Clojure and Datomic.

Andrew Musselman, Accenture -- Fast Time-series Analytics with R and HBase

An overview, with technical details, of an analytics platform we built for a client who wanted to calculate historical and real-time metrics on the operation of their equipment. We designed a solution to support their needs, including: programming with R; storing hundreds of time-series variables, each with its own reporting cadence, per piece of equipment; querying any combination of a handful of variables to calculate values for certain metrics over arbitrary time periods; and prompt results, on the order of fifteen seconds to plot and calculate the area under the curve for over a million data points at a time.

Our solution uses R (and packages), Python (and modules), and Hortonworks' HDP2 -- specifically HDFS, MapReduce/hadoop-streaming via RHadoop's rmr2 package, and HBase -- all on our team's big-data analytics platform called ADD (Accenture Data Discovery).
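The "area under the curve" metric mentioned above reduces, for an irregularly sampled time series, to trapezoidal integration over the raw timestamps. A minimal Python sketch (the actual pipeline uses R over HBase scans; the function and data here are illustrative):

```python
def area_under_curve(samples):
    """Trapezoidal area under an irregularly sampled time series.

    samples: iterable of (timestamp_seconds, value) pairs,
             not necessarily sorted or evenly spaced.
    """
    pts = sorted(samples)  # HBase scans return rows in key order, but sort defensively
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for (t0, v0), (t1, v1) in zip(pts, pts[1:])
    )

# A flat reading of 2.0 held for 10 seconds has area 20.0
print(area_under_curve([(0, 2.0), (5, 2.0), (10, 2.0)]))  # 20.0
```

Each variable's own reporting cadence is handled naturally, since the integral weights each interval by its actual duration rather than assuming a fixed sampling rate.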

Andrew Musselman is principal scientist in the global big data practice at Accenture where he leads the data science team in North America. He is a committer on the Apache Mahout project and is writing a book on data science for O’Reilly.

Our format is flexible: we usually have two speakers who talk for ~30 minutes each, followed by Q+A and discussion (about 45 minutes per talk), finishing by 8:45.

There'll be beer afterwards, of course!

Meetup Location:

Microsoft Building 41, WA
http://binged.it/15ZUnBU

After-beer Location: Damans on 24th (http://www.yelp.com/map/damans-bar-and-grill-redmond-2) c/o Greythorn Big Data

14810 NE 24th St, Redmond, WA 98052

Doors open 30 minutes ahead of show-time. Please show up at least 15 minutes early out of respect for our first speaker.

Microsoft Building 41
3099 156th Ave Ne · Redmond, WA