- How to select a Modern Cloud Data Warehouse and get the most out of it?
Please join us for an evening event where we will discuss modern cloud data warehouses, including the challenges, selection process and justification. We will also cover how to maximize business engagement and use cases of modern cloud data warehouses.
RSVP Details: You must have your full name as it appears on your ID on your meetup account to pass beyond building security. Guests must join the NYC Advanced Analytics Meetup.
Schedule:
6:00-6:30 PM: Networking over pizza and drinks
6:30-6:35 PM: Welcome and introduction from A.T. Kearney
6:35-7:10 PM: Setting the stage for modern cloud data warehouses by Slim Baltagi, Cervello, an A.T. Kearney company
7:10-7:30 PM: How to get the most out of a cloud data warehouse by Jim Leavitt, Cervello, an A.T. Kearney company
7:30-8:00 PM: Q&A and networking
Sponsors:
• A.T. Kearney (http://www.atkearney.com/) is hosting the event.
• Cervello (http://www.mycervello.com), an A.T. Kearney company, is offering the speakers, pizza and drinks.
Talk description: In the first part of this talk, we will define modern cloud data warehouses and outline the problems with legacy and on-premises data warehouses. We will speak to selecting, technically justifying, and practically using modern data warehouses, including criteria for picking a cloud data warehouse, where to start, and how to use it optimally and cost-effectively. In the second part of this talk, we discuss the challenges and where people are not getting a return on their investment. In this business-focused track, we cover how to get business engagement, how to identify the business cases and use cases, and how to leverage data as a service and consumption models.
Speaker bios:
1. Slim Baltagi (https://www.linkedin.com/in/slimbaltagi/) is a Director of Big Data and Machine Learning at Cervello, an A.T. Kearney company.
Cervello is a fast-growing professional services company with an elite team of creative problem solvers highly specialized in helping organizations win with data. Slim is an IT leader with many years of extensive Big Data experience and, more recently, Machine Learning experience. As a director, architect and engineer, he has delivered many end-to-end data projects to major Fortune 500 companies in many verticals. Slim enjoys being considered a thought leader, speaking at many conferences and organizing many advanced analytics meetups at major tech hubs in US cities and abroad.
2. Jim Leavitt (https://www.linkedin.com/in/jimleavitt) is a Vice President at Cervello, an A.T. Kearney company. His responsibilities include maintaining key relationships with Cervello's technology partners, marketing Cervello's capabilities to new clients, and developing new sales opportunities. Jim works out of Cervello's New York office. Jim is a seasoned executive partner helping organizations win with data. With over 25 years of experience, he has been working with finance, business, and information technology leadership teams to craft and execute business improvement programs using performance management, analytics, and data strategies. Jim has a broad base of experience, having worked with client companies from Fortune 500 to mid-market in Financial Services, Consumer Packaged Goods/Retail, and Life Sciences. He has improved business performance through information technology, strategy, process improvement, and change management. Jim has previously worked at The Chase Manhattan Bank, RedBird Software, Oneworld Software Solutions (acquired by Microsoft), Sapient Corporation (acquired by Publicis), and Digital Equipment Corporation (acquired by Compaq/Hewlett Packard). He has a B.S.E. from Princeton University.
- Distributed Deep Learning with TensorFlow in Docker containers
Please join us for an exciting evening to learn more about Distributed Deep Learning with TensorFlow in Docker containers from Mathieu Dumoulin (https://www.linkedin.com/in/mathieudumoulin/), Data Engineer at MapR Technologies (https://mapr.com/).
RSVP Details: You must have your full name as it appears on your ID on your meetup account to pass beyond building security. Guests must join the NYC Advanced Analytics Meetup.
Schedule:
6:00 pm - 6:30 pm: Networking, pizza and drinks
6:30 pm - 6:35 pm: Welcome by McKinsey & Company
6:35 pm - 6:40 pm: Intro by Jamal Syed from Hexstream
6:40 pm - 7:30 pm: Talk by Mathieu Dumoulin from MapR
7:30 pm - 8:00 pm: More networking!
Sponsors:
• Hexstream (http://www.hexstream.com) is offering pizza and beverages.
• McKinsey & Company (http://www.mckinsey.com/) is hosting the event.
Talk description: MapR recently announced the MapR XD platform, a global data fabric. It is a logical continuation of Convergence, which merges all of the MapR platform's technologies into a single cluster: distributed file system, NoSQL and document database, and real-time event streams (Kafka). The sum of these technologies is definitely greater than the parts. In this talk, we'll look at Convergence in action using distributed deep learning as an example. First, we're going to make use of the MapR Persistent Application Client Container (PACC) to demonstrate distributed TensorFlow running within Docker containers on data stored on MapR. Then, using the deep learning example as a base, we'll show how the unique features of MapR can be cleverly used to improve and accelerate critical parts of the typical enterprise machine learning project, focusing on data ingest and cleaning, model and dataset versioning, and production deployment.
Bio: Mathieu Dumoulin (https://www.linkedin.com/in/mathieudumoulin/) is a Data Engineer on the MapR Professional Services team, and is based in Tokyo, Japan.
Machine Learning at scale has been the major focus of his interest since he finished his Master's degree at Université Laval in Quebec City, Canada, in the early 2010s. Since joining MapR last year, Mathieu has been a frequent speaker at conferences like Strata on topics such as streaming architecture, real-time predictive maintenance and Convergence for machine learning. You can find his blog posts on these topics, as well as on Spark performance tuning, CaffeOnSpark and others, on the MapR blog (https://mapr.com/blog/).
- Streaming Analytics in a Flash - presented by Cask Data
RSVP Details: The Yodle (http://www.yodle.com/) office has strict security. You must have your full name as it appears on your ID on your meetup account to pass beyond building security.
Please join us for a talk by David Finnegan from Cask to learn more about how to reduce build time for streaming analytics projects and other common Big Data projects using their proprietary technology.
Sponsors:
• Yodle (http://www.yodle.com/) is hosting the event at their office.
• Cask (https://cask.co/) is providing the content and sponsoring food and beverages.
Schedule:
6:00 pm - 6:25 pm: Networking, pizza and drinks
6:25 pm - 6:30 pm: Welcome by Rassul Fazelat
6:30 pm - 7:30 pm: Talk by David Finnegan
Talk description: David Finnegan will present an overview of Cask and how the Cask Data Application Platform (CDAP) reduces the build time for many common Big Data projects such as Real-time Analytics and IoT, and how it can be applied to other projects like Data Lake, Customer 360, CDAP for Spark, and EDW offload. Cask, the company that makes building and running big data solutions easy, provides the first unified integration platform for big data that cuts down the time to production for data applications and data lakes. For more information about Cask Data: https://cask.co/solutions/
Speaker: David Finnegan (https://www.linkedin.com/in/dmfinnegan) - Director of Sales Engineering - Cask Data
- The Stream Processor as a Database
RSVP Details: The Yodle (http://www.yodle.com/) office has strict security. You must have your full name as it appears on your ID on your meetup account to pass beyond building security.
Please join us for an exciting evening to learn more about a new design pattern for data streaming applications, using Apache Flink and Apache Kafka, from Jamie Grier, Director of Applications Engineering at data Artisans.
Sponsors:
• Yodle (http://www.yodle.com) is hosting the event at their office.
• data Artisans (http://data-artisans.com) is sponsoring pizza and drinks.
Schedule:
7:00 pm - 7:30 pm: Networking, pizza and drinks
7:30 pm - 8:30 pm: Talk
Talk description: We present a new design pattern for data streaming applications, using Apache Flink and Apache Kafka: building applications directly on top of the stream processor, rather than on top of key/value databases populated by data streams. Unlike classical setups that use stream processors or libraries to pre-process or aggregate events and update a database with the results, this setup simply gives the role of the database to the stream processor (here Apache Flink), routing queries to its workers, which answer them directly from their internal state computed over the log of events (Apache Kafka). This talk will cover a high-level introduction to the architecture, the techniques in Flink and Kafka that make this approach possible, and experiences and technical details from a large-scale setup.
Bio: Jamie Grier is Director of Applications Engineering at data Artisans, where he’s extremely excited to be able to help others realize the potential of Apache Flink in their own projects. Jamie has been working on stream processing for the last decade at companies such as Twitter, Gnip and Boulder Imaging. This has spanned everything from ultra-high-performance video stream processing to social media analytics.
- Robust Stream Processing with Apache Flink - Jamie Grier (Data Artisans)
In this hands-on talk and demonstration, I'll give a very short introduction to stream processing and then dive into writing code and demonstrating the features in Apache Flink that make truly robust stream processing possible. We'll focus on correctness and robustness in stream processing. During this live demo we'll be developing a real-time analytics application and modifying it on the fly based on the topics we're working through. We'll exercise Flink's unique features, demonstrate fault recovery, explain and demonstrate why Event Time is such an important concept in robust stateful stream processing, and talk about and demonstrate the features you need in a stream processor to do robust stateful stream processing in production. We'll also use a real-time analytics dashboard to visualize the results we're computing in real time. This will allow us to easily see the effects of the code we're developing as we go along.
Some of the topics covered will be:
  - Apache Flink
  - Stateful Stream Processing
  - Event Time vs. Processing Time
  - Fault tolerance
  - State management in the face of faults
  - Savepoints
  - Data re-processing
Bio: Jamie Grier is Director of Applications Engineering at data Artisans, where he’s extremely excited to be able to help others realize the potential of Apache Flink in their own projects. Jamie has been working on stream processing for the last decade at companies such as Twitter, Gnip and Boulder Imaging. This has spanned everything from ultra-high-performance video stream processing to real-time social media analytics.
- Apache Flink 1.0
Please join us for an exciting evening to learn more about Apache Flink (http://flink.apache.org) from Stephan Ewen (https://www.linkedin.com/in/stephanewen), who is committer/VP in the Apache Flink project and co-founder/CTO of Data Artisans (http://data-artisans.com).
Sponsors:
• Workville (http://workvillenyc.com/) is hosting the event.
• MapR (http://www.mapr.com) is offering pizza and beverages.
Schedule:
6:00 pm - 6:25 pm: Networking, pizza and drinks
6:25 pm - 6:30 pm: Welcome by Rassul Fazelat
6:30 pm - 8:00 pm: Talk by Stephan Ewen
Talk description: The talk will consist of two parts:
1. The first part discusses some fundamental patterns and problems behind continuous streaming applications in general, and how we built Apache Flink 1.0 to handle these problems. The talk will dig into questions around latency-critical applications, fault tolerance and processing guarantees (exactly once, and how to achieve them end-to-end), processing data streams by "event time" or "processing time", handling out-of-order event streams, and migrating, repairing, or versioning live streaming programs.
2. The second part of the talk is a technical deep dive into Apache Flink, discussing how the functionality discussed from a user and application point of view is actually implemented. The talk will cover the integration with Apache Kafka, the mechanisms behind fault tolerance and high availability, event time and windowing, savepoints and program reinstatements, as well as techniques for managing large online state.
Bio: Stephan Ewen is a committer in the Apache Flink project and co-founder and CTO of Data Artisans. Before founding Data Artisans, Stephan had led the development of Flink since the early days of the project. Stephan holds a PhD in Computer Science from TU Berlin.
- Extending the Yahoo! Streaming Benchmark & Winning Twitter Hack-Week with Flink
Please join us for an exciting evening to learn more about real-time stream processing with Apache Flink (http://flink.apache.org) from Jamie Grier (https://www.linkedin.com/in/jamiegrier), who is Director of Applications Engineering at data Artisans (http://data-artisans.com).
RSVP Details: You must have your full name as it appears on your ID on your meetup account to pass beyond building security. Guests must join the NYC Apache Flink Meetup.
Sponsor: McKinsey & Company (http://www.mckinsey.com/)
Schedule:
6:00 pm - 6:30 pm: Networking, pizza and drinks
6:30 pm - 6:45 pm: Flink Community Update by Slim Baltagi
6:45 pm - 8:00 pm: Talk by Jamie Grier
Talk description: In this talk Jamie will delve further into his recent work with Apache Flink, which resulted in the blog post published on February 2nd, 2016: “Extending the Yahoo! Streaming Benchmark and Winning Twitter Hack-Week with Apache Flink” (http://data-artisans.com/extending-the-yahoo-streaming-benchmark/). Jamie will go through some comparisons between Storm (and other systems like it) and Flink, both in terms of processing guarantees and performance. He’ll also talk about some of the new architectural patterns that can be realized via a modern stateful stream processing system like Flink and what kind of advantages these patterns provide.
Bio: Jamie Grier is Director of Applications Engineering at data Artisans, where he’s extremely excited to be able to help others realize the potential of Flink in their own projects. His goal is to help others design systems to solve challenging problems in the real world. Jamie has been working in the field of streaming computation for the last decade. This has spanned everything from ultra-high-performance video stream acquisition and processing to social media analytics.
Prior to joining data Artisans, Jamie was at Twitter working on rethinking the real-time analytics stack with the goals of making it much more efficient and also capable of computing accurate results in real time without relying on the “Lambda Architecture” for correctness. Jamie is interested in streaming computation and mechanically sympathetic software architectures. He is particularly interested in building systems that are both high performance and highly scalable, and his favorite quote is “You can have a second computer once you’ve shown you know how to use the first one.”
- Apache Flink: What, How, Why, Who, Where?
Please join us for an exciting evening to learn more about Apache Flink (http://flink.apache.org) from Slim Baltagi, Director of Big Data engineering at Capital One; founder and organizer of the Apache Flink meetups in Chicago, New York City (NYC), Washington DC Area and Dallas-Fort Worth (DFW) Area; and co-organizer of the Boston, Paris (France) and Sao Paulo (Brazil) Flink meetups.
Sponsors:
• Bloomberg LP (http://www.bloomberg.com) is providing the location for the event, pizza and beer.
• Capital One (http://www.capitalone.com) is providing transportation from Chicago and accommodation in NYC for the speaker.
Schedule:
6:00 pm - 6:30 pm: Networking, pizza and drinks
6:30 pm - 7:30 pm: Talk
Talk description: This introductory-level talk is about Apache Flink: a multi-purpose Big Data analytics framework leading a movement towards the unification of batch and stream processing in open source. With the many technical innovations it brings, along with its unique vision and philosophy, it is considered the 4G (4th Generation) of Big Data analytics frameworks, providing the only hybrid (Real-Time Streaming + Batch) open source distributed data processing engine supporting many use cases: batch, streaming, relational queries, machine learning and graph processing. In this talk, you will learn about:
1. What is the Apache Flink stack and how does it fit into the Big Data ecosystem?
2. How does Apache Flink integrate with Hadoop and other open source tools for data input and output as well as deployment?
3. Why is Apache Flink an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark?
4. Who is using Apache Flink?
5. Where can you learn more about Apache Flink?
Bio: Slim Baltagi (https://www.linkedin.com/in/slimbaltagi) is currently director of Big Data engineering at Capital One. He has over 18 years of IT and business experience and has spent the last 5 years of his life hadooping and more recently sparking and flinking!
He enjoys evangelizing about Big Data technologies and maintaining a Big Data Knowledge Base (http://www.sparkbigdata.com/): Hadoop, Spark, Flink... He is also founder and organizer of the Apache Flink meetups in Chicago, New York City (NYC), Washington DC Area and Dallas-Fort Worth (DFW) Area, and co-organizer of the Paris and Boston Flink meetups.