ADAM standard - processing of genomic big data with Spark


Details
Event Agenda
18:30 - 19:00
Arrival and socializing
19:00 - 20:00
Michal Okoniewski, Scientific IT Services, ETH Zürich
"ADAM standard and other ways of scalable cloud-processing of genomic big data with Spark"
Next generation sequencing (NGS) technology has posed a serious computational challenge since its commercial introduction in 2008. Thousands of machines worldwide now produce billions of sequenced nucleotide base pairs every day. As sequencing technologies keep getting faster and more economical, processing the large volumes of data produced by high-throughput sequencing has become the main challenge in bioinformatics. This challenge can be addressed by a new generation of software tools based on the paradigms and principles developed within the Hadoop ecosystem. This talk presents an overall perspective on data analysis software for genomics and the prospects for emerging applications, with a particular emphasis on the ADAM (http://bdgenomics.org/projects/adam/) standard.
The presentation will include an introduction to analysing genomic big data at scale with ADAM. It will also include examples of using Spark, SparkR and Parquet that apply more generally in data science.
