- Building Machine Learning Clustering Models for Gene Expression RNA-Seq Data
Hello Data Scientists,
Clustering has many potential applications in Bioinformatics. Examples include identifying groups of patients who respond differently to medical treatments for specific diseases, revealing groups of functionally related genes whose expression patterns lie close to one another, and clustering genes or biomedical images to uncover hidden patterns in unlabeled datasets. High-dimensional, large-scale biological data can be difficult to visualize, interpret, and analyze unless it is first organized into clusters.
In this presentation, Dr. Bonat will cover the following Machine Learning techniques: K-Means clustering, handling imbalanced classes using SMOTE, the PyDNA library for DNA sequence analysis, PCA for feature dimensionality reduction, the Elbow Method and the Kneed library for selecting the optimal number of clusters, the Adjusted Rand Index metric for evaluating clustering performance, and Silhouette Analysis of the resulting clusters.
The presentation is based on the article published on Medium.com titled “Building Machine Learning Clustering Models for Gene Expression RNA-Seq Data”.
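For attendees who want a feel for how several of these techniques fit together, here is a minimal sketch using scikit-learn. It is not the code from the article or the talk. The data is a synthetic stand-in for an expression matrix (generated with `make_blobs`), and the sample count, feature count, number of PCA components, and range of `k` are all assumptions chosen for illustration. SMOTE and PyDNA are left out. The Kneed library is also left out to keep the example dependency-light, so the number of clusters is picked here by the best silhouette score rather than by an automated elbow detector.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (assumption, not real RNA-Seq data):
# 300 samples x 50 "genes", drawn from 4 hidden groups.
X, y_true = make_blobs(
    n_samples=300, n_features=50, centers=4, cluster_std=1.0, random_state=42
)

# Standardize each feature, then reduce dimensionality with PCA.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=10, random_state=42).fit_transform(X_scaled)

# Fit K-Means for a range of k and record two diagnostics per k:
# inertia (the quantity plotted in the Elbow Method) and the mean silhouette score.
inertias = {}
silhouettes = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_pca)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X_pca, model.labels_)

# Pick the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

# Refit with the chosen k and compare against the known synthetic groups
# using the Adjusted Rand Index (only possible when reference labels exist).
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_pca)
ari = adjusted_rand_score(y_true, labels)

print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette by k:", {k: round(v, 3) for k, v in silhouettes.items()})
print("chosen k:", best_k, "ARI:", round(ari, 3))
```

Plotting `inertias` against `k` gives the usual elbow curve. On real gene expression data the elbow and the silhouette peak are usually less clear-cut than on synthetic blobs, which is part of the reason for combining several criteria.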
Thanks
Ernest Bonat, Ph.D.