Many Spark ETL examples show data transformation through SparkSQL, but not all data engineering tasks fit well into the table-oriented SQL paradigm. In some cases, the data may be in a non-tabular format, such as images, PDFs, or other binary data types. Spark is well suited to these sorts of data engineering tasks too.
In this meetup, we will use the PySpark framework to read a large number of Digital Imaging and Communications in Medicine (DICOM) medical images produced by MRI scans, use the numpy and pydicom libraries to transform them into a dataset more suitable for training a machine learning model, and write the resulting PNG objects to an output S3 bucket. We are using the data from the "RSNA-MICCAI Brain Tumor Radiogenomic Classification" Kaggle competition, but this approach can be used for general-purpose DICOM processing.
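For readers who want a feel for the shape of such a job before the session, here is a minimal sketch, not the code presented at the meetup. It assumes single-frame grayscale slices, uses simple min-max scaling to 8 bits (a simplification of real DICOM windowing), encodes PNGs with the standard library, and uploads with boto3. The function names, the bucket and prefix arguments, and the three-level folder layout used to build output keys are all hypothetical.

```python
import io
import struct
import zlib

import numpy as np


def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Min-max rescale raw scanner intensities to 0-255 (assumed scaling)."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:  # blank slice: avoid dividing by zero
        return np.zeros(pixels.shape, dtype=np.uint8)
    return np.round((pixels - lo) / (hi - lo) * 255.0).astype(np.uint8)


def encode_png_gray(image: np.ndarray) -> bytes:
    """Encode a 2-D uint8 array as an 8-bit grayscale PNG."""
    height, width = image.shape

    def chunk(tag: bytes, data: bytes) -> bytes:
        body = tag + data
        crc = zlib.crc32(body) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)

    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + image[r].tobytes() for r in range(height))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def dicom_to_png(content: bytes) -> bytes:
    """Parse one DICOM file held in memory and return PNG bytes."""
    import pydicom  # imported lazily: only needed on the Spark executors

    dataset = pydicom.dcmread(io.BytesIO(content))
    return encode_png_gray(to_uint8(dataset.pixel_array))


def png_key(path: str, prefix: str) -> str:
    """Build an output key from the last three path components (assumed layout)."""
    parts = path.split("/")[-3:]
    parts[-1] = parts[-1].rsplit(".", 1)[0] + ".png"
    return "/".join([prefix] + parts)


def upload_partition(rows, bucket: str) -> None:
    """Upload (key, png_bytes) pairs; one S3 client per partition."""
    import boto3

    s3 = boto3.client("s3")
    for key, png in rows:
        s3.put_object(Bucket=bucket, Key=key, Body=png)


def run(input_glob: str, bucket: str, prefix: str) -> None:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dicom-to-png").getOrCreate()
    # binaryFiles yields (path, raw bytes) pairs, one per input file.
    files = spark.sparkContext.binaryFiles(input_glob)
    pngs = files.map(lambda kv: (png_key(kv[0], prefix), dicom_to_png(kv[1])))
    pngs.foreachPartition(lambda rows: upload_partition(rows, bucket))
    spark.stop()
```

A call might look like `run("s3a://my-input-bucket/train/*/*/*.dcm", "my-output-bucket", "png")`, where both bucket names are placeholders. Keeping the conversion in plain functions, separate from the Spark wiring, makes it possible to test the image handling locally without a cluster.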
Link to Related Tutorial:
Link to Related Video:
Cloudera User's Page:
This is still a tricky time for public gatherings, but Future of Data is committed to providing great tech content and facilitating discussions in the "Big Data" space. To do our part to fight the spread of COVID-19's delta variant, this will be an exclusively online event with an originating time zone of EDT (the event time displayed on this page will reflect the equivalent local time). Our Northern Virginia-focused group is the host venue, but we thought the event might be of interest to our wider membership (you are welcome to sign up for it here).