- SnappyData - Real Time Operational Analytics with Apache Spark!
Description
Apache Spark has come a long way since it began as a faster, batch-oriented MapReduce solution for Hadoop. With support for streaming, machine learning and, most recently, SQL, Spark aspires to become a player in the realm of real-time analytics at scale. However, many of its underpinnings remain batch-oriented and unsuited to highly concurrent OLAP workloads. In this talk, we will describe how we are innovating and renovating Spark's core underpinnings to make it suitable for real-time operational analytics, helping create a unified platform that supports row and column data, high concurrency and approximate query processing at scale.
Agenda
6:00 – 6:15 Welcome and Networking
6:15 – 7:15 Presentation by Jags Ramnarayan
7:15 – 8:00 Food, Drinks & Discussion
Presenter
Jags Ramnarayan (https://www.linkedin.com/in/jagsr) is the chief product visionary for SnappyData. Jags is the Chief Architect for “fast data” products (GemFire) at Pivotal and has served on the company's extended leadership team. At Pivotal, and previously at VMware, he led the technology direction for its high-performance distributed data grid and in-memory database products. He has spent more than two decades in this industry and has a bachelor's degree in computer science and a master's degree in management of science and technology.
Sponsors
SnappyData (http://www.snappydata.io/) is the presenter of the event and the sponsor for food and drinks. Pivotal (http://pivotal.io/) is the sponsor for the hosting fees. CentrlOffice (http://centrloffice.com/) is the host for this event.
- Simplify Your Architecture: Say No to Lambda, presented by VoltDB
What is Lambda?
The Lambda Architecture is designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. It proposes that speed (streaming) and batch workloads be run in parallel on the same incoming data. The speed layer can achieve faster responses, while the batch layer can be more robust and serve as the system of record for historical data. Lambda also requires a serving layer to serve results.
What’s a real-world example?
Ed Solovey of Twitter (formerly Crashlytics) has given several talks on the use of the Lambda Architecture for the Crashlytics service, including a 20-minute presentation given at the October Boston Facebook @Scale Conference ( http://youtu.be/56wy_mGEnzQ ). The company needed to identify how many times end users access a mobile app, which means handling hundreds of thousands of unique IDs per second. To solve this problem, the company implemented the Lambda Architecture. In the speed layer, they enlisted Kafka for ingestion, Storm for processing, Cassandra for state and ZooKeeper for distributed agreement. In the batch layer, tuples were loaded in batches into S3, then processed with Cascading and Amazon Elastic MapReduce.
The problem? It’s complicated.
Sure, this approach can meet high-performance needs, but it comes at tremendous cost. Focusing on just the speed layer, combining ZooKeeper, Kafka, Storm and Cassandra into a reliable fast data engine is expensive in developer time, in computing resources and in ongoing operational support. Each system requires at least three nodes to run, meaning your speed layer is at least 12 nodes, and often larger. And once the speed layer is working, the batch layer is a second problem with its own complexity. Even with reliable components, the odds that at least one component will have issues go up as the number of components rises. And when a component fails, how well trained is the operational support team when the app has such breadth? Isolating which component is at fault is difficult when you must also consider inter-component interactions while hunting for problems.
What’s a better answer? Simplify. Simplify. Simplify.
Removing just a single component of a typical Rube-Goldbergian Lambda implementation not only reduces complexity and cost but also makes it easier to change the application as business needs change. Look to replace all or part of your Lambda stack with a more integrated solution; this discussion will show you how, with a clear example. See how thousands of lines of code become 30 when you replace disparate systems with one that integrates ingestion, state, agreement and processing.
About the Presenter
Ryan Betts, CTO, VoltDB. Ryan was one of the initial developers of VoltDB’s commercial product, and values his unique opportunity to collaborate closely with customers, partners and prospects to understand their data management needs and help them realize the business value of VoltDB and related technologies.
About the location
Franz Hall - Room 026. Parking is available in the main parking lot (no permit required). Campus Map: https://pilots.up.edu/documents/1960960/2608978/Campus+Map.pdf
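The cost argument in the Lambda talk description above rests on two pieces of arithmetic: a minimum node count for the speed layer, and the growing chance that something in the stack is misbehaving as components are added. A minimal Python sketch of both follows. The four systems and the three-node minimum come from the description; the 1% per-component failure probability and the assumption that components fail independently are hypothetical, chosen only for illustration, and the function names are made up for this example.

```python
def speed_layer_nodes(systems, min_nodes_per_system=3):
    """Lower bound on speed-layer size: each system needs its own small cluster."""
    return len(systems) * min_nodes_per_system


def prob_any_component_failing(n_components, p_fail):
    """Chance that at least one of n components has an issue.

    Assumes (hypothetically) that each component fails independently
    with the same probability p_fail.
    """
    return 1.0 - (1.0 - p_fail) ** n_components


# Systems named in the talk description for the speed layer.
systems = ["ZooKeeper", "Kafka", "Storm", "Cassandra"]
print("minimum speed-layer nodes:", speed_layer_nodes(systems))

# Illustrative only: 1% is an assumed per-component failure probability.
for n in (1, 4, 8):
    print(n, "components ->", round(prob_any_component_failing(n, 0.01), 4))
```

Under these assumptions the chance of trouble roughly quadruples when going from one component to four, which is the intuition behind removing even a single component from the stack.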
- Moneyballing: How To Use Data to Win At Fantasy Football - Presented by Cloudera
Description
Participants in fantasy football leagues manage teams by acquiring and trading players. Using predictive models based on historical data improves team selection and performance. In this talk, we will use managing a fantasy football team as an example of how to integrate disparate data sources and make predictions based on complex event histories. We'll cover:
● Modeling data for building predictive models about individuals.
● Integrating disparate data sources.
● Applying portfolio optimization theory to player selection.
● Translating subjective knowledge of the domain into a rigorous improvement of a predictive model.
A raffle drawing will be held, courtesy of O'Reilly's Strata + Hadoop World Conference, for a chance to win O'Reilly books! Meetup members also receive a 20% discount on Strata + Hadoop World 2015 by using this link http://oreil.ly/UGSJ15 and the code PBDG20.
Agenda
6:00 – 6:30 Welcome and Networking
6:30 – 7:15 Presentation by Juliet Hougland
7:15 – 7:30 Raffle Drawing sponsored by O'Reilly
7:30 – 8:15 Pizza, Drinks & Discussion
Presenter
Juliet Hougland (https://www.linkedin.com/in/jhlch) recently joined Cloudera’s data science team. Juliet spent the last 3 years working on a variety of Big Data applications, from e-commerce recommendations to predictive analytics for oil and gas pipelines. She holds an MS in Applied Mathematics from the University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in Math-Physics.
Sponsors
Cloudera (http://www.cloudera.com/) is the presenter of the event and the sponsor for food and drinks! Webtrends (https://www.webtrends.com/) is the official host of the event. The Strata + Hadoop World conference (http://oreil.ly/UGSJ15) is providing the discount code & the O'Reilly book raffle. Talend (http://talend.com/) is the event organizer. Pulehu Pizza (http://pulehupizza.com/) is providing their delicious thin crust pizza!
- Making big data analytics simple for everyone: Datameer
Description
Datameer is the only end-to-end big data analytics application purpose-built for Hadoop, designed to make big data easy for everyone. Companies of all sizes, like British Telecom, Citibank, TrustEv and Workday, use Datameer to integrate, analyze and visualize all of their data to get new insights faster than ever. Founded by Hadoop veterans in 2009, Datameer scales from a laptop to thousands of nodes and is available for all major Hadoop distributions including Apache, Cloudera, EMC, Hortonworks, IBM, MapR, Yahoo!, Amazon and Microsoft Azure. Where business intelligence provides answers to known questions, big data discovery reveals unknown patterns, relationships and insights in any data. Please join our discussion and demo on leveraging the power of Hadoop, and on how Datameer's latest release, including Smart Execution, addresses the challenges posed by the proliferation of analytics technologies.
Agenda
6:00 – 6:30 Welcome and Networking
6:30 – 7:30 Presentation by Scott Webber
7:30 – 8:30 Food, Drinks & Discussion!
Presenter
Scott Webber (https://www.linkedin.com/profile/view?id=6206657&authType=name&authToken=8Y25&trk=miniprofile-name-link) has worked in engineering and development positions for the last 20 years here in the Pacific Northwest with software vendors like Oracle, BEA and CA Technologies. Scott started his career coding Java for Symantec and then Silverstream. He has also worked in cloud automation, application performance analytics and, recently, Hadoop and Big Data analytics.
Sponsors
Datameer (http://www.datameer.com/) is the presenter of the event and the sponsor for food and drinks! Seabourne Consulting (http://www.seabourneinc.com/) is our wonderful host! Talend (http://talend.com/) is the event organizer. Pulehu Pizza (http://pulehupizza.com/) is providing their delicious thin crust pizza!
- MapR presents Apache Drill: Self Service Data Exploration
MapR (https://www.mapr.com/) is presenting this event. MapR delivers on the promise of Hadoop with a proven, enterprise-grade Big Data platform that supports many mission-critical and real-time production uses. Thetus (http://www.thetus.com/) is our wonderful host! Pulehu Pizza (http://pulehupizza.com/) are our expert pizza makers! Talend (http://talend.com/) is organizing the event.
Description
Time to value is everything. With the emergence of new data sources such as web logs, social media, mobile applications and sensor data, organizations are looking to extend BI by providing insights into new areas such as operational performance, product optimization and customer satisfaction. However, traditional data management processes simply don’t work in this new world of big data. Organizations now not only have to manage higher volumes of information, but the data itself arrives at faster rates, in real time, and is more complex and dynamic than traditional transactional datasets. To be useful, this data must be analyzed at much shorter intervals than the traditional reporting cycles of weeks and months. In this session, we will see how Apache Drill (http://incubator.apache.org/drill/) is driving the audacious goal of bringing self-service data exploration to Hadoop/NoSQL data, by letting users explore any type of data, immediately as it comes in, using the ANSI SQL language and the SQL tools they are already familiar with. We'll see Drill in action on a live Hadoop cluster.
About the Presenter
Aditya Kishore (https://www.linkedin.com/profile/view?id=11254587) is a software engineer at MapR Technologies, where he works on Apache HBase, MapR-DB, Hadoop security and, most recently, Apache Drill. Prior to MapR, Aditya worked at Novell as a software engineer on the infrastructure team.
Agenda
6:00 – 6:30 Welcome & Networking
6:30 – 7:30 Presentation by Aditya Kishore
7:30 – 8:30 Networking + drinks and our signature delicious thin crust pizzas!
- Using In-Memory, Data-Parallel Computing for Operational Intelligence
Sponsors
ScaleOut Software (http://www.scaleoutsoftware.com/) is presenting the event and paying for drinks and your favorite flat-crust pizzas. ScaleOut Software is a leader in in-memory computing software for applications with extreme low-latency, scalability and high-availability requirements. Seabourne Consulting (http://seabourneinc.com/) is our wonderful host! Seabourne is a growing software company based in Portland, OR and Washington, DC. They are experts in information integration and big data applications, and leverage this expertise to build solutions for large government, corporate, and non-profit organizations (FCC.gov, NBC Sports, Commerce.gov, WRI.org, Cogstate). Pulehu Pizza (http://pulehupizza.com/) are our expert pizza makers! Talend (http://talend.com/) is organizing the event.
Description
Operational systems manage our finances, shopping, devices and much more. Adding real-time analytics to these systems enables them to respond instantly to changing conditions and provide immediate, targeted feedback. This use of analytics is called “operational intelligence,” and the need for it is widespread. This talk will describe the use of in-memory, data-parallel computing to obtain operational intelligence in several scenarios, including financial services, ecommerce, and cable-based media. It will show both how an in-memory model is constructed and how data-parallel analysis can be implemented to provide immediate feedback. Performance results from a simulation of 10M live cable-TV set-top boxes will illustrate how this technique was used to correlate and enrich 25K events per second and complete a parallel analysis every 10 seconds on a cluster of commodity servers. The talk will also compare the use of in-memory computing to the more traditional “big data” model popularized by Hadoop MapReduce. It will also examine the simplifications this approach offers over directly analyzing incoming event streams from an operational system using complex event processing or Storm. Lastly, it will explain key requirements of the in-memory computing platform, in particular real-time updating of individual objects and high availability, and compare these requirements to the design goals for stream processing in Spark.
About the Presenter
Dr. William L. Bain (https://www.linkedin.com/profile/view?id=116535&authType=name&authToken=pat8&trk=miniprofile-name-link) is Founder and CEO of ScaleOut Software, Inc. Bill has a Ph.D. in electrical engineering from Rice University, and he has worked in the field of parallel computing at Bell Labs research, Intel, and Microsoft. Bill founded and ran three start-up companies prior to joining Microsoft. In the most recent company (Valence Research), he developed a distributed Web load-balancing software solution that was acquired by Microsoft and is now called Network Load Balancing within the Windows Server operating system. Dr. Bain holds several patents in computer architecture and distributed computing. As a member of the Seattle-based Alliance of Angels, Dr. Bain is actively involved in entrepreneurship and the angel community.
Agenda
6:00 – 6:30 Welcome & Networking
6:30 – 7:30 Presentation by Dr. William L. Bain
7:30 – 8:30 Networking + drinks and our signature delicious thin crust pizzas!
- Big + Fast Data - Featuring VoltDB and Lilien Systems
Sponsors
Lilien Systems (http://www.lilien.com/) is co-presenting the event and providing drinks and your favorite flat-crust pizzas. VoltDB (http://voltdb.com/) is co-presenting the event. Thetus (http://www.thetus.com/) is our wonderful host! ProFocus (http://www.profocustechnology.com/) is providing our group's first-ever Door Prize! Talend (http://talend.com/) is organizing the event.
Description
Traditional corporate data architectures aren’t up to the task of handling the volume and velocity of today's data streams. The Fast + Big Data challenge requires a new approach for both corporate data architectures and the operational and analytic data management systems that enable them. Learn how VoltDB and HP Vertica can help to extract new insight and uncover hidden value from Fast + Big Data more quickly than ever before. Paul Cattrone, Big Data and Advanced Analytics Practice Director at Lilien Systems, and Ryan Betts, Chief Technology Officer at VoltDB, will demonstrate how a high-velocity in-memory database can coexist with an analytics database and how the two can share data. A live use case will be featured during the presentation. To celebrate the first anniversary of the Big Data Meetup group, ProFocus is offering a Door Prize! Come for a chance to go home with a prize!
Topics Covered
- What are the requirements for Fast Data
- How are corporate data architectures evolving
- Why “one size does not fit all”: combining a fast operational database with an analytics store yields a better solution to the Fast + Big Data challenge
Presenters
Ryan Betts (http://www.linkedin.com/pub/ryan-betts/7/2a6/431) - Chief Technology Officer at VoltDB
Paul Cattrone (https://www.linkedin.com/pub/paul-cattrone/1/640/733) - Big Data & Analytics Practice Director at Lilien Systems
Agenda
5:30 – 6:00 Welcome & Networking
6:00 – 7:00 Big + Fast Data Challenge Presentation
7:00 – 8:00 Networking + drinks and our signature delicious thin crust pizzas!
- Impala: MPP SQL engine for Apache Hadoop & Kite SDK: It's for Developers
This event is sponsored by Cloudera Inc. (http://www.cloudera.com/), one of the biggest Hadoop distribution vendors on the market, which provides Apache Hadoop-based software, support, services and training to business customers. Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology. More than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects (Apache Hive, Apache Avro, Apache HBase, and so on) that combine to form the Hadoop platform. Cloudera is also a sponsor of the Apache Software Foundation.
Description
Impala: A modern, open source, MPP SQL query engine for Apache Hadoop. Cloudera Impala provides fast, ad hoc SQL query capability for Apache Hadoop, complementing traditional MapReduce batch processing. Learn the design choices and architecture behind Impala, and how to use near-ubiquitous SQL to explore your own data at scale.
Kite SDK: It's for Developers. The Kite SDK is an open source set of libraries, tools, examples, and documentation focused on helping developers build systems on top of the Apache Hadoop ecosystem. In this talk, attendees will get an introduction to Kite SDK components and learn, via examples, how Kite makes it easier to work with data in HDFS and Apache HBase as records and datasets, just as you would with a relational database.
About the Host
Seabourne Consulting (http://seabourneinc.com/) is a growing software company based in Portland, OR and Washington, DC. They are experts in information integration and big data applications, and leverage this expertise to build solutions for large government, corporate, and non-profit organizations (FCC.gov, NBC Sports, Commerce.gov, WRI.org, Cogstate). Seabourne provides leading-edge technology with lower risk by using and continually augmenting tools that address a broad spectrum of data problems.
About the Presenters
Alex Moundalexis (https://www.linkedin.com/profile/view?id=4939191&authType=name&authToken=ehCr&trk=miniprofile-primary-view-button) is a Solutions Architect at Cloudera. Alex spends his time with Federal customers to get their Hadoop clusters up and running properly. Before entering the land of Big Data, Alex spent the better part of ten years wrangling Linux server farms and writing Perl as a contractor to the Department of Defense and Department of Justice. A Maryland native, he likes onion rings, shiny objects, and Oxford commas!
Ryan Blue (https://www.linkedin.com/profile/view?id=216335275&authType=name&authToken=uXrD&trk=miniprofile-primary-view-button) is a Software Engineer at Cloudera, currently working on the Kite SDK team.
Agenda
5:30 – 6:00 Welcome & Networking
6:00 – 6:45 Impala presentation by Alex Moundalexis
6:45 – 7:30 Kite SDK presentation by Ryan Blue
7:30 – 8:30 Networking + drinks and our signature delicious thin crust pizzas!
- Big Data Technologies - Apache Spark with MapR
This event is sponsored by MapR, provider of one of the biggest Hadoop distributions on the market and a major contributor to Apache Hadoop ecosystem projects such as HBase, Pig, Hive, and ZooKeeper.
Description
Hadoop has been a huge success in the data world. It has disrupted decades of data management practices and technologies by introducing a massively parallel processing framework. The community and the development of all the open source components pushed Hadoop to where it is now. That's why the Hadoop community is excited about Apache Spark. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk will give an introduction to the Spark stack, explain how Spark achieves lightning-fast results, and show how it complements Apache Hadoop.
About the Presenter
Sungwook Yoon (http://sungwookyoon.com/) is a Data Scientist at MapR. Sungwook's data experience includes:
- Malware detection algorithms for packet stream analysis
- Mobile network signaling analysis
- Social network analysis
- Job title analysis
- Call center data analysis
Before joining MapR, Sungwook worked as a Research Scientist at Palo Alto Research Center and as an Architect at Seven Networks. Sungwook's main technical background lies in Artificial Intelligence and Machine Learning. Sungwook holds a Ph.D. from Purdue University and is a graduate of Seoul National University.
Agenda
5:30 – 6:00 Welcome & Networking
6:00 – 7:30 Spark presentation by Sungwook Yoon
7:30 – 8:30 Networking + drinks and our signature delicious thin crust pizzas!
- Big Data Featuring Intel & HP - Two for One!
Sponsored by TechPower IT Solutions (http://www.techpowerusa.com/default.aspx).
This event will be on the Intel campus in Hillsboro. Don't miss the chance to see presentations from two very well respected speakers in the industry!
Presentation #1 – 5:00 pm
Intel – How we use Big Data and what we do for Big Data.
Speaker: Ajay Chandramouly (https://www.linkedin.com/profile/view?id=1685123&authType=name&authToken=nsxe&trk=miniprofile-primary-view-button), Big Data Industry Engagement Manager, Intel
Topic: Ajay will present an overview of how Intel has adopted and put to use various Big Data platforms internally, how this environment has benefited the business, and how Intel is contributing to Apache Hadoop with optimizations for Intel technologies.
Presentation #2 – 6:15 pm
HP Vertica – How it’s used, where it complements Hadoop, and where HP is taking Big Data.
Speaker: Walt Maguire (https://www.linkedin.com/profile/view?id=2377929&authType=name&authToken=FXVq&trk=miniprofile-primary-view-button), Chief Field Technologist, HP Vertica
Topic: Walt will provide an overview of the Vertica analytic database and how Vertica’s 1000+ customers typically use it today. He will also discuss typical Apache Hadoop use cases and how Vertica complements them. Finally, he’ll discuss HP’s big data strategy, HAVEn, and how its evolution will give organizations a coherent big data platform for building successful large-scale analytic applications.
Pizza and refreshments will be provided. Hope to see you there!