
The Emerging Fast Data Architecture by Dean Wampler

Hosted by Sean G. and 2 others

Details

We are proud to partner with the Reactive TO (https://www.meetup.com/Reactive-TO/) and Toronto Apache Spark (https://www.meetup.com/Toronto-Apache-Spark/) meetups to host Dean Wampler of Typesafe. Dean will be travelling to Toronto and will first present at the Toronto Apache Spark meetup on Wednesday, Feb 24th. You can find that event here (https://www.meetup.com/Toronto-Apache-Spark/events/227574278/). On Thursday, Feb 25th, he'll be hosted by our own fine meetup. Make sure you RSVP for both!

Dean Wampler, Ph.D., is the Architect for Big Data Products and Services and a member of the Office of the CTO at Typesafe (http://typesafe.com/). Dean focuses on the evolving Fast Data stack for streaming applications based on the Typesafe Reactive Platform (http://typesafe.com/platform), Spark (http://spark.apache.org/), Kafka (http://kafka.apache.org/), Mesos (http://mesos.apache.org/), and other tools.

Dean is a contributor to several open source projects and organizes the Chicago-Area Scala Enthusiasts (http://meetup.com/chicagoscala/) meetup group. He's the author of Programming Scala, 2nd Edition (http://shop.oreilly.com/product/0636920033073.do) and Functional Programming for Java Developers (http://shop.oreilly.com/product/0636920021667.do), and the co-author of Programming Hive (http://shop.oreilly.com/product/0636920023555.do), all from O'Reilly. He lurks on Twitter, @deanwampler (http://twitter.com/deanwampler).

The Emerging Fast Data Architecture - Dean Wampler, Typesafe

Classic Big Data architectures are evolving to better support stream processing, which provides a competitive advantage when you need to reduce the time between data arrival and information extraction. The term Fast Data has been coined for these new architectures.

The most prominent emerging Fast Data architecture starts with a troika of new tools: Spark (http://spark.apache.org/), with its “mini-batch” approach to stream processing in its Spark Streaming (http://spark.apache.org/streaming/) module; Kafka (http://kafka.apache.org/), for large-scale ingestion and management of event data organized into topics; and Cassandra (http://cassandra.apache.org/), for scalable, durable data storage (although other databases are often used instead). It also leverages existing tools, like the Hadoop Distributed File System (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html) and YARN (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).

Additional tools are emerging to fill important gaps. Akka Streams (http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala.html) provides low-latency stream processing and integration with other streaming systems that support the Reactive Streams (http://www.reactive-streams.org/) standard. The rest of the Typesafe Reactive Platform (https://www.typesafe.com/products/typesafe-reactive-platform) provides a complete suite of components for building the microservices needed to glue everything together. Finally, Mesos (http://mesos.apache.org/) is the next-generation cluster management and resource scheduling system, ideal for the new and evolving requirements of Fast Data architectures.

I’ll discuss the forces driving the evolution of this architecture, drawing on real-world scenarios. I’ll explain how the requirements of Fast Data lead us to the components I’ve cited and how each component supports these requirements. Finally, I’ll consider the future of this young, evolving ecosystem.

Tentative schedule

6:30 Doors Open

7:00 An introduction by Katrin and Kevin

7:15 The Emerging Fast Data Architecture, Dean Wampler, Typesafe

8:25 Social. Stick around and get to know your fellow Scala developers.

Hope to see you there!

Scala Toronto
Loyalty One
438 University Ave, Suite 1200 · Toronto, ON