This is a meetup for Triangle users of Apache Spark ( http://spark.apache.org ), the high-speed cluster programming framework.
We hold regular meetings where we talk about all things Spark, from beginner to expert, discussing the core Spark project, MLlib, GraphX, SparkR, Spark SQL, and anything else that seems pertinent.
You are all welcome to join and propose topics for the meetups in the discussion area!
We often get asked, "How can I help?" It's simple:
With the broader adoption of message brokers like Apache Kafka, as well as distributed, message-sending architectures, the need for tools that can process vast amounts of data quickly has become critical. Several competing products fill this need, including Spark Streaming. In this talk, we will look at the use cases for stream processing and how Spark's concept of distributed batch processing reduces to micro-batches in the streaming case. We will cover the two streaming models for Spark, DStreams and Structured Streaming with DataFrames, and see examples of streaming applications in Scala and F#.
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com), president of the Triangle Area SQL Server Users Group (https://www.meetup.com/tripass), and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather's nice enough.