ProtBERT Protein Sequence Classification and Graph Algorithms for Life Science


Details
++++ PLEASE NOTE THAT WE START AT 5PM CET ON MARCH 15 ++++
Talk #1: Protein Sequence Classification using ProtBert Model from Hugging Face Library by Mani Khanuja (https://www.linkedin.com/in/manikhanuja/)
Studying protein localization (where a protein resides within a cell) is important for understanding protein function and matters for drug design and other applications. In this talk, we show how Natural Language Processing (NLP) techniques can be applied to protein sequence classification. The idea is to treat a protein sequence as a sentence and its constituent amino acids as individual words. This approach was first introduced in the research paper "ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing": https://www.biorxiv.org/content/10.1101/2020.07.12.199554v2.full
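As a rough illustration of the "sequence as sentence" idea, the sketch below turns a raw amino-acid string into a space-separated string in which each residue is one "word". The function name `to_protein_sentence` and the sample sequence are made up for this example; replacing the rare residues U, Z, O, and B with X follows the preprocessing suggested on the ProtBert model card, and is an assumption about how the talk prepares its data.

```python
import re


def to_protein_sentence(sequence: str) -> str:
    """Convert a raw amino-acid string into a space-separated 'sentence'."""
    # Drop any whitespace and normalise to upper-case one-letter codes.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Map rare or ambiguous residues (U, Z, O, B) to the unknown residue X.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # One amino acid per "word", separated by single spaces.
    return " ".join(cleaned)


# Hypothetical sequence fragment, for illustration only.
print(to_protein_sentence("MKTAYIAKQR"))
```

A string in this form can then be passed to a word-level tokenizer, so that each amino acid becomes one token.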
We will cover the following:
- What is ProtBert?
- Feature engineering of protein sequences.
- Fine-tuning and deploying the PyTorch ProtBert model from the Hugging Face library on Amazon SageMaker.
- Leveraging the Amazon SageMaker Distributed Data Parallel (SDP) feature during training.
GitHub link: https://github.com/aws-samples/amazon-sagemaker-protein-classification
Talk #2: Building Amazon Neptune based MedDRA terminology mapping for pharmacovigilance and adverse event reporting by Vaijayanti Joshi (https://www.linkedin.com/in/vaijayanti-joshi/)
Life science companies are seeing substantial growth in the number of adverse events (AEs) reported for their products. This may be due to increasing data volumes from journals, articles, social media, and non-standardized data sources. Evolving regulations and growing pressure to improve quality and patient safety, while maintaining patient privacy rights and running efficient, cost-effective operations, are leading organizations to rethink their strategy around legacy systems and manual processes.
Identifying and reporting adverse events during clinical trials and after approval is a critical part of ensuring long-term product safety. Regulatory agencies require that any serious adverse event be reported expeditiously once it is brought to the attention of the product manufacturer. Pharmacovigilance is the process of collecting, detecting, assessing, monitoring, and preventing adverse effects of pharmaceutical products.
In this talk, we will demonstrate how customers can improve their pharmacovigilance processes by accelerating the collection, transformation, analysis, and enrichment of data from sources such as call centers, by identifying AEs and mapping them to appropriate ontologies such as MedDRA, and by visualizing the results as a precursor to processing for submission to regulatory authorities.
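To make the terminology-mapping step more concrete, here is a minimal in-memory sketch of walking a MedDRA-style hierarchy (LLT, PT, HLT, HLGT, SOC) as a graph. It does not use Amazon Neptune or the licensed MedDRA dictionary: the terms, the `PARENT` and `LLT_INDEX` tables, and the `map_reported_term` helper are illustrative assumptions, and the sketch simplifies the hierarchy to a single parent per term.

```python
from typing import List, Optional, Tuple

# MedDRA levels from most specific to most general.
LEVELS = ["LLT", "PT", "HLT", "HLGT", "SOC"]

# Child -> parent edges for a tiny, illustrative slice of the hierarchy.
# Assumption: each term has exactly one parent.
PARENT = {
    "Feeling queasy": "Nausea",
    "Nausea": "Nausea and vomiting symptoms",
    "Nausea and vomiting symptoms": "Gastrointestinal signs and symptoms",
    "Gastrointestinal signs and symptoms": "Gastrointestinal disorders",
}

# Normalised reported phrase -> lowest level term (LLT).
LLT_INDEX = {"feeling queasy": "Feeling queasy"}


def map_reported_term(reported: str) -> Optional[List[Tuple[str, str]]]:
    """Map a reported phrase to its LLT and walk the edges up to the SOC."""
    llt = LLT_INDEX.get(reported.strip().lower())
    if llt is None:
        return None  # No match; a real pipeline would route this to review.
    path = [llt]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    # Pair each term on the path with its hierarchy level.
    return list(zip(LEVELS, path))


for level, term in map_reported_term("Feeling queasy") or []:
    print(f"{level}: {term}")
```

In a graph database, the same upward walk would be expressed as a traversal query over term vertices and parent edges rather than a Python loop.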
Zoom link: https://us02web.zoom.us/j/82308186562
Meetup: https://meetup.datascienceonaws.com
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Support: https://support.pipeline.ai
