curatedTCGAData: integration of The Cancer Genome Atlas in Bioconductor
Details
Abstract: The MultiAssayExperiment data structure (Ramos et al. 2017) provides a framework for managing and organizing experiment results on a set of samples in Bioconductor, and has been the topic of previous BiocNYC meetups (https://github.com/waldronlab/BiocNYC#workflow-for-multi-omics-data-analysis-by-levi-waldron). Briefly, the MultiAssayExperiment container eases the burden of data management by creating a graph representation of biological units and their relationships to multiple assay measurements. Numerous resources exist for accessing data from The Cancer Genome Atlas (TCGA) through websites, command-line tools, R packages, and other software interfaces. These resources provide complete data access, but tend to focus either on access to data files or on specific pre-determined analyses. We introduce curatedTCGAData and TCGAutils to provide accessible, user-friendly, and integrated datasets using the Bioconductor data framework of SummarizedExperiment, GenomicRanges, and MultiAssayExperiment. curatedTCGAData allows one-line generation of multi-omic and pan-cancer datasets that link specimens to patients and include extensive clinicopathological and laboratory data as well as manually curated subtypes.
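To make the "graph representation" idea concrete, here is a minimal conceptual sketch in Python. It is not the MultiAssayExperiment API (which is an R/Bioconductor package); it only imitates the general idea of a sample map whose rows link an assay, a patient, and a specimen column. The patient IDs, specimen names, and helper functions are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample map: each row links one specimen (an assay column)
# to the patient (biological unit) it was taken from.
SAMPLE_MAP = [
    ("RNASeq", "patient_A", "A-01-rna"),
    ("RNASeq", "patient_B", "B-01-rna"),
    ("Mutation", "patient_A", "A-01-mut"),
    ("Mutation", "patient_C", "C-01-mut"),
]


def specimens_by_patient(sample_map):
    """Group specimen names by patient, then by assay."""
    index = defaultdict(dict)
    for assay, patient, specimen in sample_map:
        index[patient].setdefault(assay, []).append(specimen)
    return {patient: dict(assays) for patient, assays in index.items()}


def patients_with_all_assays(sample_map, assays):
    """Return the sorted patients that have a specimen in every requested assay."""
    index = specimens_by_patient(sample_map)
    return sorted(
        patient
        for patient, by_assay in index.items()
        if all(assay in by_assay for assay in assays)
    )


if __name__ == "__main__":
    print(specimens_by_patient(SAMPLE_MAP))
    print(patients_with_all_assays(SAMPLE_MAP, ["RNASeq", "Mutation"]))
```

Keeping the patient-to-specimen links in one table is what makes questions such as "which patients have both expression and mutation data?" a single lookup rather than a manual reconciliation of separate files.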
About the speaker: Marcel Ramos is a software developer and the creator of MultiAssayExperiment, curatedTCGAData, TCGAutils, and other widely used packages such as BiocManager and RaggedExperiment. He is a member of the Waldron Lab at the CUNY Graduate School of Public Health and Health Policy and of the Bioconductor core team at the Roswell Park Comprehensive Cancer Center.
