Past Meetup

Camilla Montonen - Support Vector Machines and Kernels for Computational Biology

32 people went

Details

This meetup is hosted at Skills Matter, so you will need to sign up there too: https://skillsmatter.com/meetups/7466-camilla-montonen-support-vector-machines-and-kernels-for-computational-biology Thanks!

Ben-Hur A, Ong CS, Sonnenburg S, Schölkopf B, Rätsch G (2008) Support Vector Machines and Kernels for Computational Biology.

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000173

The widespread adoption of high-throughput sequencing machinery has produced an unprecedented amount of genomic data for biologists to analyse. Fully exploiting the patterns hidden in petabytes of DNA and RNA sequence information requires machine learning algorithms and specialised kernels that can capture the domain knowledge provided by biological scientists. A common problem in computational biology is binary classification. Support vector machines (SVMs) have achieved good results in this domain and have thus been eagerly adopted by computational biology researchers. Ben-Hur et al. provide a gentle introduction to support vector machines and kernels in the context of binary biological prediction problems.

To explain the concepts of large margin separation and kernel functions, Ben-Hur et al. use a computational biology problem known as splice-site recognition. In eukaryotic organisms, the process of gene expression involves transcribing a sequence of DNA into a molecule known as precursor mRNA (pre-mRNA). Pre-mRNA contains two types of regions: coding regions known as exons and non-coding ('junk') regions known as introns. The boundaries between the two, the splice sites, are usually marked by specific dimers: GT at the start of an intron and AG at its end. However, only 0.1%–1% of occurrences of these dimers in the genome represent true locations of splice sites, which leads to an interesting question: can we use support vector machines to classify candidate sites as splice sites or non-splice sites?
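To make the setup concrete, here is a minimal Python sketch of the two ingredients: enumerating candidate sites by their dimer, and comparing the sequence windows around them with a simple string kernel. The function names, the toy sequence, the window size and the choice of a k-mer spectrum kernel are all assumptions made for illustration; this is not the code or data from the paper.

```python
from collections import Counter


def candidate_sites(seq, dimer="AG"):
    """Return the start index of every occurrence of `dimer` in `seq`."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dimer]


def window(seq, pos, flank=4):
    """Cut the context around a candidate dimer; None if it runs off the ends."""
    start, end = pos - flank, pos + 2 + flank
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def spectrum_kernel(a, b, k=3):
    """Spectrum kernel: dot product of the k-mer count vectors of a and b."""
    count_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    count_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(n * count_b[kmer] for kmer, n in count_a.items())


# Made-up toy sequence, purely illustrative (not real genomic data).
seq = "CCTTTCAGGTAAGTCCTTTTCAGGA"
sites = candidate_sites(seq, "AG")
windows = [w for w in (window(seq, p) for p in sites) if w is not None]

# Gram matrix of pairwise similarities between candidate windows.
gram = [[spectrum_kernel(a, b) for b in windows] for a in windows]
print(sites)
print(windows)
print(gram)
```

Given labels for each window (true splice site or decoy), a Gram matrix like this could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Most candidates would be decoys, which is exactly the class imbalance the 0.1%–1% figure describes.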

Ben-Hur et al. explain the principles of maximum margin separation, kernel functions and classifier performance by exploring various aspects of this question. The paper is a great and gentle introduction to the world of support vector machines and also gives insight into some cool applications of machine learning technology. Moreover, all of the data and code used in the paper are open source. In six words: this is a paper to love!