
Spark + AI Meetup @ ExceL Center

Hosted By
Matthew T. and 2 others

Details

Spark + AI Summit Organizing Committee, Databricks, and Apache Spark London Meetup are sponsoring and hosting this meetup on the eve of Spark + AI Summit Europe. This meetup is open to everyone in the community, and you don’t have to be registered for the summit to attend.

So join us for an evening of Spark + AI tech-talks, community camaraderie, refreshments, and networking.

Tech-Talk 1: What we learned from running large scale Natural Language Processing systems in production

Abstract:
Martin and Rafal, who started the Spark London meetup over 4 years ago, will give an overview of the lessons they have learned from developing, testing, and scaling Natural Language Processing systems in production over the last few years. Topics will include validating, testing, deploying, and scaling deep learning algorithms using Apache Spark.

Bio:
Martin Goodson
Martin is the Chief Scientist and CEO of Evolution AI, specialists in large-scale natural language processing. Martin has designed data science products that are in use at companies like Dun & Bradstreet, Time Inc., John Lewis, and the Royal Bank of Scotland. Previously, Martin was a researcher at the University of Oxford, where he conducted research on statistical matching problems for DNA sequences.

Rafal Kwasny
Rafal is a founder and the CTO of Evolution AI, where he specializes in deploying Natural Language Processing systems in enterprise environments. Previously, he was the architect of the greenfield analytics platform for the Sony PlayStation 4 and a specialist in big data for investment banking systems.

Tech-Talk 2: Analyzing Astronomical Data with Apache Spark

The volume and complexity of the data recorded by current and future astrophysics experiments require broad knowledge of computer science, signal processing, statistics, and physics. Precise analysis of these data sets is a serious computational challenge that cannot be met without state-of-the-art tools. It requires sophisticated and robust analysis performed on many machines, as the data sets need to be processed or simulated several times. Among the future experiments, the Large Synoptic Survey Telescope (LSST) will collect terabytes of data per observation night, and processing and analysing them efficiently remains a major challenge.

In this work, we investigate how to leverage Apache Spark to process and analyse future data sets in astronomy. We first study the question in the context of the FITS file format, which is used across a wide range of astrophysical experiments. To this end, we designed a data source API extension (spark-fits) to manipulate telescope images and astronomical tables with Apache Spark without performing data conversion. We also developed a new Apache Spark extension (spark3D) to manipulate 3D data sets and perform efficient queries: distributing 3D shapes, data set joins and cross-matches, nearest-neighbour searches, spatial queries, and more.

We will then share our experience in interfacing existing astronomy codes with Apache Spark: the difficulties, but also the gains from using such a framework compared with previous performance.

Finally, I will introduce AstroLab Software (https://astrolabsoftware.github.io), a project that aims to provide advanced software tools to overcome the modern science challenges faced by research groups.

Bio:
Julien Peloton is a research software engineer working at CNRS on big data solutions for science. During his PhD in Paris (FR) and his postdoctoral years at the University of Sussex (UK), he analyzed the data of a telescope observing the oldest accessible light in the universe (Cosmic Microwave Background).

He now spends most of his time sharing R&D efforts between groups, improving interoperability between industry and research in open source projects, and developing new collaborative tools that allow research communities to exploit the tools of the big data ecosystem more fully.

Apache Spark+AI London