I am very excited to announce our first meetup for 2014!
This time we will have two very interesting presentations:
• Getting started with Amazon Kinesis (http://aws.amazon.com/kinesis/).
• Using Storm with MapR M7 for Real-Time Predictive Modeling @ Sociocast (http://www.socialcast.com/).
1. Ryan Waite (@presstopause) from the Amazon Kinesis team will talk about how to get started with Kinesis, a new fully managed service for real-time processing of high-volume streaming data. Using Amazon Kinesis, a customer can store and process terabytes of data an hour from hundreds of thousands of sources, making it easy to write applications that take action on real-time data such as website clickstreams, marketing and financial transactions, social media feeds, logs and metering data, and location-tracking events.
Amazon Kinesis-enabled applications can power real-time dashboards, generate alerts, drive real-time business decisions such as changing pricing and advertising strategies, or send data to other big data services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic MapReduce (Amazon EMR), or Amazon Redshift.
Kinesis can elastically scale to ingest small streams of data, large streams of data, or streams of data that grow and shrink throughout the day. Kinesis stores all data in three different availability zones to provide high durability and availability for data. Finally, Kinesis is designed to support streams of data that are so large that they require multiple machines working in parallel to keep up with the stream of data. Kinesis was designed to support internal Amazon streams, one of which is 9TB/hr at 11 million events/sec.
Ryan will talk about how Kinesis works and demonstrate setting up a Kinesis stream. Ryan will then show how to build a simple, distributed application for real-time processing of Kinesis data using the Kinesis Client Library.
Ryan will explain how the Kinesis Client Library works and provide links to its source for those who want to go deeper on their own. Finally, Ryan will talk about Kinesis integration with popular open source tools for data ingestion and data processing, including Storm.
About Ryan Waite (http://www.linkedin.com/in/ryanwaite) (@presstopause): Ryan is the general manager for Data Services at Amazon Web Services. In addition to Kinesis, Ryan is responsible for metering, the internal AWS data warehouse, fraud detection, and metadata services.
2. Using Storm with MapR M7 for Real-Time Predictive Modeling
Sociocast LLC is a Big Data predictive analytics platform that provides near real-time predictions from real-time and batch data feeds of entity (user), event, and timestamp data. From the data, features are extracted, models built, and predictions created.
We will discuss the evolution from a Hadoop-only system to an architecture consisting of Storm, Play, Kafka, Redis, MapR M3, and MapR M7 (HBase) to meet our requirements.
We will give an overview of the different types of topologies created by Sociocast, with an in-depth review of the topology used for real-time probabilistic and absolute counting. We will also share performance metrics and a development road map for the platform.
This talk will benefit anyone interested in learning about one approach to evolving a batch-based Hadoop M/R system into a real-time Big Data predictive analytics platform using Kafka, Storm, and MapR M7 (HBase).
About the speakers: Sourigna Phetsarath (http://www.linkedin.com/in/sourignaphetsarath/) (@sourigna)
Mr. Phetsarath is a senior technologist with over 20 years of experience and a strong focus on high-frequency multi-asset trading platforms and high-performance computing systems. With a background in financial services technology consulting, he has worked for SunGard Consulting Services, Finetix LLC, Fort Point Partners LLC, and American Management Systems. Mr. Phetsarath graduated from the University of Virginia with a BS in Computer Science, and dual minors in Environmental Engineering and MIS.
Suren Hiraman (http://www.linkedin.com/in/surenhiraman) (@suren_h)
Before joining Sociocast, Suren served as Vice President of Technology at Proclivity Systems, a tech company focused on behavioral data mining; as a Principal and Practice Manager at SunGard Consulting Services; and as the Managing Principal of Global Integration Services for the Northeast U.S. Region at Salesforce.com. Suren holds a B.S. in Electrical Engineering from MIT.
DataTorrent (https://www.datatorrent.com/) - DataTorrent is the most powerful real-time computation platform.
NoSQLWeekly (http://www.nosqlweekly.com/) - A free weekly newsletter featuring curated news, articles, new releases, jobs, and more related to NoSQL.