X-post: LABDUG - Hey Big Data, meet Apache Spark By Marco Vasquez of MapR

From: Subash D.
Sent on: Wednesday, April 23, 2014, 3:16 PM

LABDUG is hosting Marco Vasquez of MapR.

Sign up here:

http://www.meetup.com/Los-Angeles-Big-Data-Users-Group/events/175709772/

Abstract:

Spark is a fast and powerful engine for processing Hadoop data. It runs in Hadoop clusters through Hadoop YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both general data processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. Spark supports Scala, Java, and Python.

Hadoop has been a huge success in the data world. It’s disrupted decades of data management practices and technologies by introducing a massively parallel processing framework. The community and the development of all the Open Source components pushed Hadoop to where it is now.

That's why the Hadoop community is excited about Apache Spark. The Spark software stack includes a core data-processing engine, an interface to Hive for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis.

This talk will give an introduction to the Spark stack, explain how Spark achieves lightning-fast results, and show how it complements Apache Hadoop.

 

Bio

Marco Vasquez is a data scientist for MapR Technologies, where he is responsible for solving complex business and technical problems using advanced analytics techniques that include use case discovery, machine learning, and data engineering. He works with customers to help develop big data strategies, improve workflows, or create new business opportunities. Marco’s industry experience covers areas such as research and development, large-scale architecture and software engineering, bioinformatics, image and video processing, content security, and management consulting. He holds a BS degree in Chemistry from the University of California, Los Angeles.