Data Science meets Software Development AND Is Apache Spark Ready for the Cloud?


Details
Hi Big Data DC, we have a double feature for you, co-organized with Washington DC Area Apache Spark Interactive (https://www.meetup.com/Washington-DC-Area-Spark-Interactive/).
Summary: Data Science meets Software Development
Alexis works in a Data Innovation Lab with a horde of Data Scientists. Data Scientists gather data, clean data, apply Machine Learning algorithms and produce results, all of that with specialized tools (Dataiku, Scikit-Learn, R...). These processes run on a single machine, on data that is fixed in time, and they have no constraint on execution speed.
With his fellow Developers, Alexis' goal is to bring these processes to production. Developers and Scientists have very different constraints: developers want the code to be versioned, tested, deployed automatically, and to produce logs. Developers also need it to run in production on distributed architectures (Spark, Hadoop, …), with fixed versions of languages and frameworks (Scala…), and with data that changes every day.
In this talk, Alexis will explain how Developers work hand-in-hand with Data Scientists to shorten the path to running data workflows in production.
Biography: Alexis Seigneurin @ASeigneurin
Software engineer for 15 years and consultant for Ippon Technologies (http://www.ipponusa.com). Ippon delivers Digital, Big Data and Cloud applications on top of proven Java expertise in the US and France.
Across many projects, Alexis has explored a wide range of data management tasks (cleansing, processing, indexing, reporting…) with a variety of languages, frameworks, systems and databases. Alexis has used Spark since early 2014, specifically Spark with Cassandra when working on real-time reporting applications.
Summary: Is Apache Spark Ready for the Cloud?
Many technology companies are turning to the cloud for scalable, elastic infrastructure to store and analyze user behavior data, information from wearables, sensor data, and more. However, running big data tools like Apache Spark in the cloud presents a host of challenges. Spark is typically deployed in a dedicated data center as a next step in an organization's big data deployment strategy to gain deeper and faster insights. As the advantages of big data in the cloud become more apparent and gain wider adoption, can organizations also reap the benefits of Spark as a service without sacrificing its primary benefit: speed? In other words, is Spark ready for the cloud?
In this presentation, Praveen Seluka, a software engineer and Apache Spark expert at Qubole, will draw on real-world experience to outline how Apache Spark and AWS can be combined to ensure high performance. The audience will learn how to deploy Spark effectively in the cloud, the key technological challenges of delivering it as a service that scales while preserving the performance Spark was designed for, and the important benefits that can be achieved through Spark as a Service.
Speaker Bio:
Praveen Seluka is a software engineer at Qubole. Prior to Qubole, Praveen worked as a software engineer at Microsoft and Yahoo. Praveen has won several coding competitions on platforms such as Kaggle and CodeChef. He has a master’s degree in information systems from the Birla Institute of Technology and Science, Pilani.
Agenda:
6:30 - Drinks/Apps and Networking
7:00 - Introduction
7:15 - Talks
8:30 - Wrap up/shut down
Getting Here:
AddThis HQ is located next to the Silver Line's Spring Hill Metro station. Free parking is also available. If you have trouble getting in, please call Brad at 571.278.5205.
Food:
AddThis will provide beer and sodas, and Ippon and Qubole will provide food. There are plenty of local bars nearby for anyone who would like to continue the discussion after the talks.
Sponsors: AddThis (http://www.addthis.com/), Ippon (http://www.ippon.fr/), Qubole (http://www.qubole.com/)
