Machine Learning Genomics Sequences Classification: NLP vs. LSTM vs. LLM

Hosted By
Ernest Bonat, Ph.D.

Details

Hello Data Scientists,

Natural Language Processing (NLP) and Deep Learning (DL) algorithms, including Long Short-Term Memory (LSTM) networks, have been applied effectively to genomics text data classification, including DNA and protein string sequences. Large Language Models (LLMs) are trained on vast amounts of text data to perform a variety of language tasks, text classification among them. Because they learn and generalize from large datasets, LLMs offer a powerful approach to DNA sequence classification. By adapting these models to biological data, researchers can gain insight into genomic sequences and their functions, advancing fields such as genomics, bioinformatics, and personalized medicine. Examples of LLMs for DNA sequence classification include BERT, GPT, BioBERT, DNABERT, Transformer-XL, XLNet, and T5. Examples for protein sequence classification include Evolutionary Scale Modeling (ESM), ProtBERT, and Tasks Assessing Protein Embeddings (TAPE). The main question of this presentation is: can LLMs outperform NLP and DL algorithms for genomics text data classification? The presentation will show a simple comparison of these models on multi-class protein sequence classification. Let's find out whether we really need LLMs for genomics text data classification.
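
For anyone new to treating a sequence as text, here is a minimal sketch of the kind of preprocessing an NLP-style pipeline might start from: splitting a protein string into overlapping k-mers ("words") and counting them. The helper names `kmers` and `kmer_counts`, the choice of k = 3, and the example sequence are illustrative assumptions, not taken from the presentation.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers (hypothetical helper).

    Returns an empty list when the sequence is shorter than k.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence, k=3):
    """Bag-of-words style representation: k-mer -> number of occurrences."""
    return Counter(kmers(sequence, k))


# Made-up protein fragment, used only to illustrate the idea.
example = "MKVLAAGMKV"
words = kmers(example)         # ordered tokens, as a sequence model would see them
counts = kmer_counts(example)  # unordered counts, as a bag-of-words model would see them
```

Count vectors like `counts` could feed a classical text classifier, while an LSTM or a transformer-based model would typically consume the ordered tokens in `words` instead.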

Thanks

Ernest Bonat, Ph.D.

Hillsboro Python Machine Learning Meetup