Folks, Kai Voigt from Cloudera is in town, and has kindly offered to drop by and talk Hadoop. For those who remember the great talk on Pig and Hive given by Ian Wrigley last year - you can expect more of the same this time.
We're still working on figuring out the exact subject of the talk (suggestions welcome), but in the meantime mark this spot in your diaries. Should be a cracking night!
We'll be hosting the night at Fishburners in Ultimo, the red hot core of the nuclear reactor that is the Sydney startup scene.
Cloudera are also kindly sponsoring beer and pizza on the night.
Kai studied computer science in Kiel, where his diploma thesis covered an HTTP session framework for web-based applications. For over 5 years, he worked for MySQL as an instructor and consultant, also covering MySQL Cluster, a distributed, highly available database. Currently, he's an instructor at Cloudera, the major software and services company supporting Hadoop and related projects.
Presented by David Boloker (CTO, IBM Emerging Technologies) and Iwan Winoto (Software Architect, IBM Australia)
IBM's Emerging Internet Technologies team are called upon to deal with some of the biggest of the "big data" problems in the world. To tackle them effectively, they leverage both Hadoop and a suite of their own tools built on top of Hadoop.
Recently the team was engaged by the British Library to quite literally "download the web". Recent research estimates the average life expectancy of a Web site is just 44 – 75 days, meaning that every six months, 10 percent of Web pages on the UK domain are lost. The challenge is to preserve the digital culture of the nation. IBM used their Hadoop-based BigSheets project to help the British Library archive and analyse the UK web domain.
David is an IBM Distinguished Engineer and Chief Technical Officer for Emerging Internet Technologies in IBM Software Group. David is recognised in and outside IBM as a technical leader in the Internet software space guiding IBM's investments as well as internal product development.
Iwan is a Software Architect at IBM and represents the Emerging Internet Technologies team in Australia.
Akash deep S.
601 Pacific Highway, St Leonards NSW 206 · St Leonards
Our next meetup is at the ATP Innovations centre in the Australian Technology Park, where Guy Harrison (from Quest) will be talking about SQOOP, an open source tool for getting data out of an RDBMS such as Oracle and into Hadoop for batch processing.
After the talk we'll run a Q&A, as well as a "SMAQDown" panel session that attempts to dissect and solve a particular architecture problem. If you've got an interesting problem you'd like the group to have a go at solving, please get in touch with Andrew.
We meet at 6.30pm for a 7pm start.
Space is limited so please be sure to RSVP on meetup before the night.
As Hadoop penetrates the modern enterprise, it will increasingly be called upon to integrate with more traditional enterprise data stores, and with Oracle in particular. Hadoop may need copies of reference data mastered in an Oracle RDBMS to make sense of unstructured data held in Hadoop. Larger volumes of data may be moved from Oracle to Hadoop in order to take advantage of the Map Reduce programming model for complex analytics. In other cases, data from Hadoop - or output from Map Reduce or HIVE jobs - may be copied into Oracle where more real-time ad-hoc queries can be supported.
Each of these scenarios demands a functional and efficient means of shifting data between the two data stores. To this end, Cloudera have provided the open source SQOOP utility to import or export data between any SQL database and Hadoop. Quest have partnered with Cloudera to provide OraOop - a free utility that provides performance and functionality enhancements for those who wish to interoperate Oracle and Hadoop.
This presentation will discuss the general architecture of SQOOP and how its extensibility architecture allows third party providers like Quest to provide optimized drivers for specific RDBMS. We'll then discuss technical challenges in moving data between Oracle and Hadoop. Finally, we'll consider how Hadoop changes the landscape for enterprise data management and speculate on how enterprises might leverage and consolidate the strengths of Oracle and Hadoop.
Guy Harrison is a Director of Research and Development at Quest Software, and has over 20 years' experience in database design, development, administration and optimization. Guy is the author of numerous books, articles and presentations on database technology, is the architect of Quest's Spotlight family of diagnostic products and has led the development of Quest's Toad for Cloud Databases(tm), and the Oracle-Hadoop "OraOop" product. Guy can be found on the internet at www.guyharrison.net, on email at [masked] and is @guyharrison on Twitter.
National Innovations Centre 4 Cornwallis Street · Sydney
Location: Google's Sydney office, Level 5, 48 Pirrama Road, Pyrmont
6pm for a 6.30 start. Please be prompt as it can be difficult to get in the building after 6.30pm
Two great talks in one night to kick us off. Big thanks to Google for providing the venue, and to Cloudera for putting down a tab at the bar afterwards. See you there!
Talk: Taking the pain out of MapReduce with Hive and Pig (Ian Wrigley, Cloudera)
Writing Hadoop MapReduce jobs in Java can be a complex, time-consuming task. That's great for job security, but not for productivity. In this talk, we'll discuss Hive and Pig, two high-level abstractions which make the power of Hadoop accessible to a far wider audience. We'll see what sort of data lends itself to being processed by these languages, and what types of problems still need you to bribe the resident Java expert.
Talk: Intro to AppEngine (Nick Johnson, Google)
Nick will give an overview of App Engine, a complete hosted runtime environment for Java and Python applications that automatically scales on top of Google's own infrastructure. As well as offering durable and highly scalable data storage on top of BigTable, App Engine also provides built-in APIs for task queuing and toolkits for data transforms.
Following the talks, we'll keep the talk going at the Pyrmont Bridge Hotel, with a bar tab for early birds (thanks Cloudera!).
About the Speakers
Ian Wrigley started one of the UK's first web consultancies and has been managing large amounts of data ever since, starting with flat files and Perl scripts, moving on to database servers such as MySQL, and now Hadoop. He describes his job at Cloudera as "helping geeks become geekier". Ian is also PC Pro's Contributing Editor for Unix and Open Source. Cloudera are the "Commercial Hadoop" company that provide their own open source Hadoop distribution as well as management tools and production support for the enterprise.
Nick Johnson is a Developer Programs engineer for Google App Engine, who's recently seen the light and moved to Sydney. His blog (http://blog.notdot.net/) is an essential resource to almost anyone building for Google App Engine, and when he's not saving the world there he can be found on Twitter (@nicksdjohnson) or Stack Overflow helping folks out. It is rumoured that he owns a Python (http://blog.notdot.net/2010/10/Hello-developers).