Cascalog training - San Jose - Mar 13-15
Interested in learning how to use Cascalog, Elephant, and Clojure big data technologies? Sam Ritchie, committer on Cascalog and Elephant and co-author of Big Data, will be giving a training class prior to Clojure/West in San Jose, CA, Mar 13-15. Check out http://clojurewest.or... for more info and register soon at http://regonline.com/....
** What You'll Learn **
Cascalog is a data-processing library written in Clojure that lets you manipulate data at any scale, from local datasets at the REPL to hundreds of terabytes on the Hadoop platform. Cascalog treats data queries as first-class entities, allowing you to build up domain-specific libraries for your projects and only write code that matters.
In this workshop, you'll master the Cascalog API by writing dozens of queries and working through dozens of problems. We'll start with functions, inner and outer joins, and aggregation, then move on to composing queries into rock-solid Cascalog workflows.
From there, we'll discuss advanced techniques like dynamic query generation and test-driven development. By the end of these three days, you'll have all the tools you need to start working with Cascalog in production.
Course Outline (subject to change)
** Day 1 **
Getting Started - executing queries, interactive Cascalog
What is Cascalog?
Core Concepts - Cascalog predicates, queries
Thinking in MapReduce
Mastering the API - option predicates, taps
** Day 2 **
Functional programming at scale
- Queries as functions on data
- Generators as sets
- Optimizing Cascalog queries
- Dynamic query generation
- Dynamic predicate macros
- Cascalog "gotchas" and best practices
** Day 3 **
Test-driven big data
- Debugging Cascalog
- TDD with Midje
- Example projects
- Elastic MapReduce
- Generating key-value indexes with ElephantDB