Next Meetup

Streamlio, GridGain, Cassandra+Spark: FREE Workshop at Index
We're happy to announce two new Index sessions. 2/20 is the free SMACK 2.0 workshop. Moscone West, 3-5:30pm. Register by 2/20 with the code CD3ALEXY to attend the Community Day for free and the main program for just $280.

(1) Streaming -- Streamlio
(2) Memory computing -- GridGain
(3) Cassandra+Spark

(1) Building modern data pipelines by unifying Apache Pulsar, Apache Heron, Apache BookKeeper

For today’s enterprises, ensuring that data pipelines are available to every corner of the organization is key to building next generation data-driven applications. In this talk Karthik Ramasamy of Streamlio will present how to combine three best-of-breed open-source projects into a solid data infrastructure that is easy to develop against and simple to operate at scale in production. He will provide an overview of the merits of the three open source systems and the benefits they bring when integrated:

• Apache Pulsar: unified queuing and streaming
• Apache Heron: stream processing
• Apache BookKeeper: distributed stream storage

Karthik Ramasamy is the co-founder of Streamlio, which focuses on building next generation real time processing engines. Before Streamlio, he was the engineering manager and technical lead for real-time analytics at Twitter, where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. Karthik is the author of several publications, patents, and "Network Routing: Algorithms, Protocols and Architectures". He has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases.

(2) Apache Spark and Apache Ignite: Where Fast Data Meets the IoT

It is not enough to build a mesh of sensors or embedded devices to obtain more insights about the surrounding environment and optimize your production systems.
Usually, your IoT solution needs to be capable of transferring enormous amounts of data to storage or the cloud, where the data have to be processed further. Quite often, the processing of the endless streams of data has to be done in real-time so that you can react to the IoT subsystem's state accordingly. This session will show attendees how to build a Fast Data solution that will receive endless streams from the IoT side and will be capable of processing the streams in real-time using Apache Ignite's cluster resources. In particular, attendees will learn about data streaming to an Apache Ignite cluster from embedded devices and real-time data processing with Apache Spark.

Live-Coding Workshop

(3) Building Your First Spark & Cassandra Application: A Code-Along Adventure w/ Russell Spitzer

Not sure where to start with Cassandra and Spark? Together let’s walk through starting your first Spark Application. We’ll walk through setting up your IDE and integration tests, everything you need to build your first scalable and distributed Spark App. Learn how to use embedded Cassandra and Spark to write your own tests which are easily debuggable in standard IDEs. This will be a short but interactive adventure! Feel free to bring your own laptop and come code along! We will be using IDEA along with the template provided by DataStax.

About Russell Spitzer: After earning his Ph.D. in bioinformatics from UCSF, Russell Spitzer took his love of big data to DataStax. There he has worked on all aspects of integrating Cassandra with other Apache technologies like Spark, Hadoop and Solr. Now his main focus is on the integration of Cassandra with Apache Spark via the Spark Cassandra Connector.

We are working with the IBM community teams to make their flagship developer conference, Index, the most meaningful and fun experience for Bay Area developers.
Alexy Khrabrov talks about Index with Markus Eisele, Selection Committee Chair and Director of Developer Advocacy, Lightbend.

In our communities, we created and popularized the SMACK Stack -- a way to reason about end-to-end data pipeline architectures. Building and running such pipelines, and the components comprising them, are the key themes of Index. The conference starts with the free Index Community Day on 2/20, which consists of 14 half-day sessions on the key technologies, many either directly relevant or of strong interest to most of us:

• Spark
• Kafka
• Docker
• Kubernetes
• OpenAPI
• Hyperledger
• Istio
• TensorFlow
• Cloud Foundry

You can build multiple viable architectures from these technologies, and they are often used together. To explore the progress made since SMACK 1.0, introduced in 2015, we are putting together a SMACK 2.0 panel, brainstorming the emerging SMACK Stacks. There is a wealth of expertise from many of the companies that present By the Bay regularly: Lightbend, Twilio, Slack, Uber, Google, Facebook, IBM, Eero, and many others. You can already meet many speakers at the IBM developerWorks TV playlist for Index.

We’ll update this description as we ramp up our Index + SMACK 2.0 events!

Moscone West

800 Howard St · San Francisco, CA

What we're about

Public Group

A San Francisco-based Hadoop Ecosystem user group focused on helping Hadoop users share experiences, problems and solutions, as well as learn new skills for building large-scale Hadoop-based systems.

Please fill out this short survey to help determine the best date/time for most people to meet.

Email us to share your experiences about development, operations, use cases, or related projects.

Members (3,579)
