Big-Data Architecture and Spark on AWS (2 sessions)


Details
Agenda
16:30 - 17:00: Gathering (beers and pizza)
17:00 - 18:00: First session
18:00 - 19:00: Second session
Session 1: Big Data and Architectural Patterns on AWS
Abstract:
The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But which services should you use, and why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide a reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Presenter: Ran Tessler, role: Solutions Architecture Manager, LinkedIn Profile: Here (https://www.linkedin.com/in/ran-tessler-b331915)
Session 2: Data Science and Apache Spark on EMR
Abstract: Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
Presenter: Jonathan Fritz, role: Senior Product Manager - Amazon EMR, LinkedIn Profile: Here (https://www.linkedin.com/in/fritzjonathan)
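As a rough illustration of the deployment step covered in this session, a Spark cluster can be launched on Amazon EMR from the AWS CLI. This is a minimal sketch: the cluster name, release label, and instance settings below are illustrative placeholders, not recommendations.

```shell
# Launch a small EMR cluster with Spark installed.
# Release label and instance type/count are placeholders; choose
# values appropriate for your workload and region.
aws emr create-cluster \
  --name "spark-demo" \
  --release-label emr-4.3.0 \
  --applications Name=Spark \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles
```

Once the cluster is running, Spark jobs can read data directly from Amazon S3 through EMRFS by using s3:// URIs (for example, `sc.textFile("s3://my-bucket/path/")`, where the bucket and path are hypothetical).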
