- Modernizing a Hadoop Database w/ Spark, DC/OS & a read-only Cassandra database
Join John Cabaniss and the Shadow-Soft team as they provide an in-depth overview of a customer project for a top 50 retailer where they were able to design and implement an update to their query and resource management systems, enabling them to reduce the query time from their legacy Hadoop cluster by 10x. Further improvements to this architecture and roadmap, which are currently being designed, will also be shared; they are shaped by several competing factors: internal infrastructure dependent on the legacy design, time to develop, time to market, and customer experience. Any enterprise with legacy systems (or any architect or developer) wrestling with modernization efforts, updates, or improvements to existing systems will benefit from the lessons learned through this customer success story. Speaker: John Cabaniss has spent 20 years around emerging technology, first as a researcher at Los Alamos National Labs and GTRI, then as a founder or co-founder for various businesses in the start-up community. He brings experience in architecture and implementation of complex computing systems in both the public and private sectors, always with a mind towards achieving results.
- Children’s Healthcare of Atlanta; NextGen Sequencing using Hadoop, Spark, & Kudu
In 2017, Children’s Healthcare of Atlanta undertook Next Generation Sequencing (NGS) as a new initiative. Using open-source tools such as Hail, Apache Spark and Apache Kudu, Children’s built a robust, scalable and secure platform to support NGS in the clinical setting. The resulting infrastructure, which co-locates genomic and phenotypic data, enables variant review and sign out as well as analytics and translational medicine using familiar tools like SQL. The platform comprises the entire clinical pipeline from raw reads to HGVS-called variants, informative QC and variant reports and data storage in Hail VDSs in a Kudu storage layer in Hadoop. The upstream data is then presented to the clinician in a friendly web application for streamlined variant review and sign out. Remember you can always support CHOA by donating your time or money. https://www.choa.org/donors-and-volunteers/ways-to-give This meet-up will take place at Piedmont Center - Building 15 - 3575 Piedmont Rd, Suite P140 · Atlanta, GA. Food and Drinks will be provided by one of our sponsors. Casual Conversation from 6:15 to 6:45, Presentation will start @ 6:45. Look forward to seeing everyone on the 19th.
- Macy’s Omni-Catalog – a real-time fast data story using Spark & Cassandra
Please join Big Data Atlanta & SMACK as we learn from the Macy’s leadership team about how they built a modern omnichannel catalog with continuously changing product attribution and availability in an era of "Big Data" and "Fast Data". It is one of the largest such catalogs ever assembled in the retail industry, managing hundreds of millions of availability records at the UPC-location level and billions of raw inventory combinations. All of this is done in real time with a fast data architecture that keeps the catalog up to date to serve online and in-store experiences. You will hear from their lead architect and leadership about how they have leveraged messaging, an in-memory database, Spark, and Cassandra to build this solution, which spans the catalog across multiple data centers and the cloud to achieve resiliency and geo-redundancy. This meet-up will take place at Macy's Technology Offices in Johns Creek. Food and Drinks will be provided by the Macy's technology team. Casual Conversation from 6:30 to 7PM, Presentation will start @ 7PM. Look forward to seeing everyone on the 21st of February.
- Hazelcast Essentials Training ~ Free Training Course
Free Instructor-Led Training in Atlanta.
Overview: Hazelcast Essentials is a course designed for Java Developers looking to take their first steps in understanding In-Memory Data Grids (IMDG). By the end of the course, the attendee will be able to construct Hazelcast Clusters and deliver basic caching services. The candidates should be familiar with Core Java concepts and APIs (collections, concurrency). Students will be introduced to the fundamental features of Hazelcast and how they may be applied to solve various use cases. This course is suitable for Developers and Architects with no prior or very basic knowledge of Hazelcast.
Agenda (topics to be covered in the training): Hazelcast Architecture; Cluster formation with various discovery mechanisms; Cluster deployment strategies; Fault Tolerance and Failure Recovery; Distributed operations: Caching, Computing, and Messaging; Distributed Caching: IMap; Partitioning and Replication; Persistence; High-Density Memory Store and Hot Restart; Hazelcast Serialization.
What to Bring: Bring your laptop, prepared with: a recent Java 8 JDK; your IDE of choice installed – IntelliJ Idea, Eclipse, NetBeans etc. Download lab code from https://github.com/wildnez/public-training/tree/... Build the labs using Maven or Gradle and set as Java project.
Location: Atlanta Tech Village, 3423 Piedmont Road Northeast, Atlanta, GA 30305
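For readers curious about the "Partitioning and Replication" agenda item, here is a minimal Python sketch of the general idea behind a partitioned in-memory map: each key hashes to one of a fixed number of partitions, and each partition has a primary member and a backup member. This is not Hazelcast code and not course material. The partition count of 271 matches Hazelcast's documented default, but the CRC32 hash, the round-robin member assignment, and the names `ToyGrid`, `partition_id`, and `owners` are all assumptions made for illustration.

```python
import zlib

PARTITION_COUNT = 271  # Hazelcast's default; everything else here is hypothetical


def partition_id(key, partition_count=PARTITION_COUNT):
    """Map a string key to a partition with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def owners(pid, members):
    """Return (primary, backup) members for a partition, assigned round-robin."""
    primary = members[pid % len(members)]
    backup = members[(pid + 1) % len(members)]
    return primary, backup


class ToyGrid:
    """Toy partitioned map that keeps one synchronous backup copy per entry."""

    def __init__(self, members):
        self.members = list(members)  # needs at least two members for a real backup
        self.stores = {member: {} for member in self.members}

    def put(self, key, value):
        # Write to the primary and to the backup so one member can fail safely.
        for member in owners(partition_id(key), self.members):
            self.stores[member][key] = value

    def get(self, key, failed=frozenset()):
        # Read from the primary; fall back to the backup if the primary is down.
        for member in owners(partition_id(key), self.members):
            if member not in failed:
                return self.stores[member].get(key)
        return None
```

With three members, an entry written through `put` stays readable through `get` even when its primary member is passed in `failed`, which is the basic property a backup replica is meant to provide.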
- DataSciCon - Workshops and Learning Tracks (registration @ datascicon.tech)
DataSciCon Data Science. Machine Learning. Big Data. Analytics. Join us in Atlanta for a 3-day event, November 29 - December 1, on Data Science and Machine Learning! The first day is a workshop day, and there will be 4 concurrent tracks. Workshops and sessions include: Data Science, Machine Learning, Analytics, Big Data and more! A serious conference on all things Data Science, in a relaxed, fun environment. REGISTRATION @ datascicon.tech ~ Workshops must be purchased in a combo pass! ~ Conference-only tickets are also available. WORKSHOPS AVAILABLE: * Introduction to Machine Learning with Python & TensorFlow * Data Science with R * Data Science for CxOs and Biz Analysts * Data Analytics with Tableau * Data Pipelines with Apache Kafka INDIVIDUAL or Group Registration November 29 and main conference Nov 30 - Dec 1. For data scientists, data engineers/developers, and data architects. CxO - VP - Biz Analyst: We'll be offering great content for those in Management and Analyst roles - a full 3 days of deep dive workshops and sessions. You'll get up to speed on everything in the Data Science / Machine Learning / Big Data space quickly! Sessions & Workshops: Industry experts and community members will present fundamentals and the latest in the world of Data Science. Brought to you by the organizers of another world-class conference, Connect.Tech, DataSciCon.Tech will bring the energy and community spirit of our previous events for three days of learning and networking. We invite you to join us in November 2017 for this unique experience! http://www.datascicon.tech/
- Bringing web-scale computing to the enterprise! A SMACK stack discussion.
We are hosting this meet-up with our friends @ SMACK ~ Atlanta: https://www.meetup.com/SMACK-Atlanta/events/239940277/ Companies like Netflix, LinkedIn, PayPal, Twitter, and Airbnb are what we would traditionally think of as web-scale companies, and over the course of the past decade, they have invested heavily in their infrastructure to solve scale and performance requirements that traditional architecture could not handle. During this process, they developed a new technology stack (or pattern) made up of open source solutions that have helped them scale elastically to meet and exceed growing demand and at the same time collect, process, and analyze customer data to make real-time decisions. One of the patterns that have emerged from these web-scale companies is called the SMACK stack (Spark-Mesos-Akka-Cassandra-Kafka). This will be the first of a multi-part series where we bring in industry experts to talk about the different components and how each can help your organization take part in what’s being called “The Digital Transformation.” We will help you better understand how these open source projects fit together and why the traditional enterprise organization is now seeing a need to utilize these same systems (or patterns) that the web-scale companies have already pioneered to solve similar use-cases in their environment. In the first part, we will provide a technical overview of the components of the “SMACK” stack and talk about building a modern, cloud-native architecture that is Responsive, Elastic, Resilient, and Message Driven.
We will cover a few of the key principles, including: 1) Microservices (Web, Mobile, IoT) 2) Durable Messaging Backplane 3) Data Persistence & Storage 4) Stream Processing 5) Machine Learning & Deep Learning 6) Intelligent Management 7) Cluster Analysis 8) Infrastructure (On-Premise, Cloud, Hybrid) This discussion will also dive into one of the core components of the SMACK stack ~ Akka, which is an advanced toolkit and message-driven runtime based on the Actor Model that helps development teams build the right foundation for successful microservices architectures and streaming data pipelines. This discussion will be led by Sean Walsh, who embraced Akka as the cornerstone of an IoT platform for the Viridity Energy software stack. As a result of that experience, he has personally set out to evangelize, architect, and implement reactive technologies at major enterprises around the world. Sean is now Field CTO for Lightbend. He is the author of Reactive Application Development, which is published by Manning, and Akka runs through his veins.
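To give a feel for what "message-driven" and "Actor Model" mean in practice, here is a small Python sketch of the pattern. It is not Akka and does not use Akka's API (Akka runs on the JVM). The class name `CounterActor`, the `tell` method, and the message strings are hypothetical choices for this illustration. The idea it shows is that an actor owns private state, receives messages through a mailbox, and handles them one at a time, so senders never need a lock.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state, a mailbox, and one thread that handles messages in order."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Fire-and-forget send: enqueue the message and return immediately."""
        self._mailbox.put((message, reply_to))

    def stop(self):
        self._mailbox.put((self._STOP, None))
        self._thread.join()

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message is self._STOP:
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)  # reply by message, not by shared memory


def send_increments(actor, n):
    for _ in range(n):
        actor.tell("increment")


def demo():
    """Four concurrent senders, 100 increments each; returns the actor's final count."""
    actor = CounterActor()
    senders = [threading.Thread(target=send_increments, args=(actor, 100)) for _ in range(4)]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()
    reply = queue.Queue()
    actor.tell("get", reply_to=reply)  # enqueued after every increment
    total = reply.get()
    actor.stop()
    return total
```

Because the mailbox serializes all messages, the four senders never race on the counter even though they run concurrently. A production toolkit adds supervision, location transparency, and backpressure on top of this basic idea.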
- Leveraging Messaging Platforms such as Kafka for Real-time Streaming Transaction
Easily Leveraging Messaging Platforms such as Kafka for Real-time Streaming Transaction Processing. Unfortunately, most of us suffer from the complexity of the architectures required for real-time data processing at scale. Many technologies need to be stitched together, and each technology is often complex by itself. Usually, we end up with a strong discrepancy between how we, as engineers, would like to work vs. how we end up working in practice. In this session, we will talk about how to radically simplify the architecture and speed up development time and application latency. We will cover how you can build applications to serve real-time processing needs without having to spend months building infrastructure, while still benefiting from properties such as guaranteed delivery, high scalability, distributed computing, and fault-tolerance. We will discuss use cases where stream processing often requires transactional and database-like functionality. Kafka (or any messaging broker) allows you to bridge the worlds of streams and databases when implementing core business applications (inventory management for large retailers, IoT based patient sensor monitoring in healthcare, fleet tracking in logistics, etc.), for example in the form of event-driven, containerized microservices. Join us Thursday, October 19th with Colin McNaughton, Head of Engineering at Neeve Research, and an author on several open source projects including Eagle, Robin, and Lumino. Colin will share experience, techniques, and best practices for building real-time applications and services using X Platform™, a powerful, easy-to-use library for developing highly scalable, fault-tolerant, distributed stream processing applications on top of Apache Kafka or other messaging brokers.
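One way to picture "transactional, database-like functionality" on top of a stream is the sketch below, written in Python with only the standard library. It is not the X Platform API and not Kafka client code. The `EVENTS` list stands in for a single broker partition, and the table names, `open_store`, and `consume` are hypothetical. The technique it illustrates is a common one: when delivery is at-least-once, commit the state change and the consumer's offset in the same database transaction, so that a redelivered event is detected and skipped.

```python
import sqlite3

# Stand-in for one broker partition: (offset, sku, quantity delta).
EVENTS = [
    (0, "sku-1", 10),
    (1, "sku-2", 5),
    (2, "sku-1", -3),
]


def open_store():
    """Create an in-memory store holding inventory state plus the consumer's progress."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    db.execute("CREATE TABLE progress (id INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)")
    db.execute("INSERT INTO progress (id, next_offset) VALUES (1, 0)")
    db.commit()
    return db


def consume(db, events):
    """Apply events that may be delivered more than once; each is applied exactly once."""
    for offset, sku, delta in events:
        with db:  # one transaction per event: commits on success, rolls back on error
            (next_offset,) = db.execute(
                "SELECT next_offset FROM progress WHERE id = 1"
            ).fetchone()
            if offset < next_offset:
                continue  # duplicate delivery: this offset was already applied
            db.execute("INSERT OR IGNORE INTO inventory (sku, qty) VALUES (?, 0)", (sku,))
            db.execute("UPDATE inventory SET qty = qty + ? WHERE sku = ?", (delta, sku))
            # The offset advances in the same transaction as the state change.
            db.execute("UPDATE progress SET next_offset = ? WHERE id = 1", (offset + 1,))
```

Calling `consume(db, EVENTS)` and then replaying part of the list, as a broker might after a consumer restart, leaves the inventory unchanged the second time, because the stored offset already covers those events.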
- Drinks and Data with Looker and Snowflake
More than ever, organizations need an adaptable analytics infrastructure to meet the growing demands of their business and customers. Creating an impactful analytics pipeline requires a thoughtful approach as you develop your data architecture and BI implementation. Come enjoy drinks with the Looker and Snowflake teams at The Optimist (http://theoptimistrestaurant.com/) on September 20th and interact with some of Atlanta's thought leaders in data. Space is limited so please RSVP at the link below as soon as possible to reserve your spot. We look forward to seeing you there! http://discover.looker.com/201709AtlantaSnowflakeHH_2.RegistrationPage.html
- FREE cloud analytics event in Atlanta: Hosted by AWS and Snowflake Computing
AWS and Snowflake Computing are hosting a FREE cloud analytics event in Atlanta on June 8th. I wanted to share the event with our group as it will be a great opportunity to learn more about some interesting technologies in the Data and Analytics space and to network with your peers. Here's the registration link with some additional information on the event. Please sign up online via the link below. Signing up on the meetup page does not register you for this event. Sign up for this event! (https://cloudanalyticscitytour.com/Atlanta/?referredby=MarcoCiccone) At the event, you will come away with a clear understanding of the following key concepts: 1) Getting all your data into one location – fast and simple 2) Integrating all data types without hassle 3) Analyzing all your data for deep insight 4) Advancing your organization with data for deep insight 5) Gigabytes or petabytes – data warehousing done at scale. Thanks, Geoff
- Building Event Data Pipelines with Kafka and Hadoop
Eric Sammer, CTO of Rocana and author of Hadoop Operations (http://shop.oreilly.com/product/0636920025085.do), will present a solution for effectively ingesting and archiving event data at scale. In this talk he will discuss event pipelines and integration of Splunk with Hadoop. Agenda: 7:30-8:00 Social, 8:00-9:00 Presentation, 9:00-9:15 Questions. This meeting was originally organized by the Atlanta Hadoop User Group: https://www.meetup.com/Atlanta-Hadoop-Users-Group/events/235335372/