What we're about

This meetup focuses on the Future of Data and the open community data projects governed by the Apache Software Foundation. It is geared towards developers, data scientists, and all data enthusiasts who are building modern data applications. Our meetups cover all data, both data-in-motion and data-at-rest. They provide an opportunity to listen, share, and work hands-on with other technologists using open source and open community Apache tools.

Upcoming events (2)

Hello, HBase! Hello, Phoenix!

Network event

Online event

In the 6th event of our “Hello,” series of introductory Big Data meetups, we’re going to introduce you to Apache HBase & Apache Phoenix as available through the CDP Operational Database data service. This discussion assumes that you may have heard of them, but not that you understand how the combination is similar to or different from a traditional RDBMS.

This presentation will cover the basics of when to use HBase alone versus with Phoenix. We will discuss the strengths and weaknesses of HBase & Phoenix and show how to create & deploy an example Java-based application that works with them.

Some of the questions we will address include:
-How exactly does the Apache HBase architecture support both high availability and massive scalability?
-What are the most important use cases which demonstrate Phoenix’s most compelling features?
-How easy is it to build scalable, distributed applications leveraging developers’ existing knowledge of traditional relational databases?

We’ll answer these questions and more!

This is still a tricky time for public gatherings, but Future of Data is committed to providing great tech content & facilitating discussions in the "Big Data" space. In order to do our part to fight the spread of COVID-19's Delta variant, this will be an exclusively online event with an originating time zone of EDT (the event time displayed on this page will reflect your equivalent local time). Our Providence, RI-focused group is the host venue, but we thought it might be of interest to our wider membership (you are welcome to sign up for it here).

Processing DICOM Medical Image Files with Spark

Network event

Online event

Many Spark ETL examples show data transformation through SparkSQL, but not all data engineering tasks fit well into the table-oriented SQL paradigm. In some cases, the data may be in a non-tabular format, such as images, PDFs, or other binary data types. Spark is well suited to these sorts of data engineering tasks as well.

In this meetup, we will leverage the PySpark framework to read a large number of Digital Imaging and Communications in Medicine (DICOM) medical images produced by MRI scans, use the numpy and pydicom libraries to transform them into a dataset more suitable for training a machine learning model, and write the resultant PNG objects to an output S3 bucket. We are using the data from the "RSNA-MICCAI Brain Tumor Radiogenomic Classification" Kaggle competition, but this approach can be used for general purpose DICOM processing.
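As a rough illustration of the pipeline described above, here is a minimal sketch, not the code presented at the meetup. It assumes single-frame DICOM files, and it uses Pillow for PNG encoding and boto3 for the S3 upload; both libraries are assumptions that the description does not name. The function names (`to_uint8`, `png_key`, `dicom_to_png`, `upload_partition`, `run`) and the default `png` output prefix are hypothetical.

```python
import io
import posixpath
from urllib.parse import urlparse

import numpy as np


def to_uint8(pixels):
    """Min-max scale a pixel array of any range to 8-bit grayscale (0..255)."""
    pixels = np.asarray(pixels, dtype=np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:  # flat image: avoid dividing by zero
        return np.zeros(pixels.shape, dtype=np.uint8)
    return np.rint((pixels - lo) / (hi - lo) * 255.0).astype(np.uint8)


def png_key(source_path, prefix):
    """Map an input URI like s3a://bucket/a/b.dcm to the key <prefix>/a/b.png."""
    stem, _ = posixpath.splitext(urlparse(source_path).path.lstrip("/"))
    return f"{prefix}/{stem}.png"


def dicom_to_png(content):
    """Decode one in-memory DICOM file and return PNG-encoded bytes."""
    # Imported lazily so the libraries are loaded on the Spark executors.
    import pydicom
    from PIL import Image  # assumption: Pillow handles the PNG encoding

    ds = pydicom.dcmread(io.BytesIO(content))
    buf = io.BytesIO()
    # Assumes ds.pixel_array is a single 2-D frame.
    Image.fromarray(to_uint8(ds.pixel_array)).save(buf, format="PNG")
    return buf.getvalue()


def upload_partition(records, bucket, prefix):
    """Convert and upload every (path, bytes) pair in one Spark partition."""
    import boto3  # assumption: boto3 is used for the S3 write

    s3 = boto3.client("s3")  # one client per partition, not per file
    for path, content in records:
        s3.put_object(Bucket=bucket, Key=png_key(path, prefix),
                      Body=dicom_to_png(content))


def run(input_glob, bucket, prefix="png"):
    """Read DICOM files as binary blobs and write PNGs to the output bucket."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dicom-to-png").getOrCreate()
    files = spark.sparkContext.binaryFiles(input_glob)  # RDD of (path, bytes)
    files.foreachPartition(lambda part: upload_partition(part, bucket, prefix))
```

Reading the files with `binaryFiles` keeps each image as an opaque byte string, so the decoding happens inside ordinary Python functions rather than in SQL expressions, which is the point the paragraph above makes about non-tabular data.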

Link to Related Tutorial:
https://www.cloudera.com/tutorials/processing-dicom-files-with-spark-on-cde.html?utm_source=mktg-community&utm_medium=meetup

Link to Related Video:
https://www.youtube.com/watch?v=nlKzF0-mKIg

Cloudera User's Page:
https://www.cloudera.com/users

This is still a tricky time for public gatherings, but Future of Data is committed to providing great tech content & facilitating discussions in the "Big Data" space. In order to do our part to fight the spread of COVID-19's Delta variant, this will be an exclusively online event with an originating time zone of EDT (the event time displayed on this page will reflect your equivalent local time). Our Northern Virginia-focused group is the host venue, but we thought it might be of interest to our wider membership (you are welcome to sign up for it here).

The URL for accessing the "live stream" will be provided to pre-registrants here, on this page, no later than 48 hours prior to the event.
