Dataiku is returning to DC and is excited to join with ACM to present two talks focused on bringing data science to the field of astronomy!
6:45pm: Weighing the Benefits of Simulated NASA Data for Model Training by Patrick Masi-Phelps, Data Scientist at Dataiku
7:15pm: Building Data Pipelines for Astronomical Data by Ignacio Toledo, Data Analyst and Astronomer at ALMA Labs
Weighing the Benefits of Simulated NASA Data for Model Training by Patrick Masi-Phelps, Data Scientist at Dataiku:
In December 2017, researchers at Google and the University of Texas at Austin announced the discovery of two exoplanets using deep learning techniques. In this talk, Patrick Masi-Phelps will discuss the Dataiku data science team's efforts to follow up on this research. We've incorporated simulated planetary transits and false positives in addition to the real, observed data used by Google and UT Austin. Patrick will talk about the pros and cons of using simulated data in the model training process, along with other challenges like accessing terabytes of data from NASA, chaining data pipelines, and tuning different network architectures.
Building Data Pipelines for Astronomical Data by Ignacio Toledo, Data Analyst and Astronomer at ALMA Labs:
ALMA is a radio astronomy observatory that collects over 4,300 hours of high-quality data annually across its 66 antennas, amounting to more than 1 TB of scientific data daily. Due to limited resources, this data is often inspected only for quality assurance purposes and is then sent out immediately to be processed by astronomers. Meanwhile, at least 750 GB of monitoring and operational data is stored daily, and no one is using it. This leaves a lot of room for error and leaves a lot of potentially fruitful data untapped.
To fill these gaps, we’ve begun a data science initiative at ALMA focused on creating pipelines for more efficient data collection and educating our engineers and astronomers on data science methodologies. This meetup aims to share our experiences building out a data science infrastructure within the field of astronomy, particularly through the use of data science platforms. Audience members will learn how to build more efficient data pipelines, and how data science can be used to generate productive results in fields like astronomy.
Patrick Masi-Phelps is a Data Scientist at Dataiku, where he helps clients build and deploy predictive models. Before joining Dataiku, he studied math and economics at Wesleyan University and was most recently a fellow at NYC Data Science Academy. Patrick is always keeping up with the latest machine learning techniques in astronomical and public policy research.
Ignacio Toledo is a Data Analyst and Astronomer on Duty at the Atacama Large Millimeter/Submillimeter Array (ALMA), currently the world's biggest ground-based observatory. His primary work has been the implementation of an optimal scheduler for ALMA's astronomical observations, and he has recently been involved in the efforts to build a modern data science team.