DataPhilly September 2013 - Data Storytime
Details
Welcome back from summer, and now for something completely different! This month we have two awesome speakers lined up who will be discussing their research and the related datasets. Aaron Masino (http://www.linkedin.com/pub/aaron-masino/15/894/389) from CHOP (http://cbmi.chop.edu/index.php/people/40-people/128-aaron-masino-phd.html) will be discussing his work developing a clinical diagnostic pipeline for whole genome testing. Sadia Afroz (https://www.cs.drexel.edu/~sa499/) will be discussing her work with stylometry and anonymity. Thanks to AWeber (http://aweber.jobs/) for providing food this month!
Agenda:
6:30 PM - 7:00 PM - Food, Networking, and a word from our sponsors
7:00 PM - 7:30 PM - Developing a clinical diagnostic pipeline for whole genome testing, by Aaron Masino
7:30 PM - 8:00 PM - Stylometry and anonymity, by Sadia Afroz
8:00 PM - 8:30 PM - Lightning Talks
8:30 PM - Leave for Nodding Head (https://plus.google.com/117164681594927807176/)
More Details:
Developing a clinical diagnostic pipeline for whole genome testing, by Aaron Masino
Abstract
In this talk, I will cover some of our recent efforts at The Children’s Hospital of Philadelphia’s Center for Biomedical Informatics to develop a clinical diagnostic pipeline for whole genome testing. Specifically, I will discuss research on algorithms that prioritize a patient’s genetic variants relative to the patient’s phenotypes. The algorithms use semantic similarity metrics to measure how similar the Human Phenotype Ontology terms known to annotate a given gene are to the terms describing the patient. Genes are then ranked by their similarity scores. P-value tables, giving the probability of obtaining by chance a similarity score greater than or equal to the observed score, provide a measure of the statistical significance of the ranking. These tables are estimates of the true score distributions and were generated from roughly 9 billion data points computed with a Monte Carlo simulation written in Scala with Akka and deployed on Amazon EC2.
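The ranking and significance steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CHOP pipeline: the abstract does not say which semantic similarity metric is used, so the sketch assumes Resnik similarity (the information content of the most informative common ancestor) combined by averaging each patient term's best match against the gene's terms. The toy ontology, the gene names GENE1, GENE2 and GENE3, and their annotations are hypothetical, and the Monte Carlo p-value is estimated on the fly for one gene rather than read from precomputed tables like those the talk describes.

```python
import math
import random

# Hypothetical toy ontology in the shape of HPO: term -> set of parent terms (a DAG).
PARENTS = {
    "HP:root": set(),
    "HP:A": {"HP:root"},
    "HP:B": {"HP:root"},
    "HP:A1": {"HP:A"},
    "HP:A2": {"HP:A"},
    "HP:B1": {"HP:B"},
}

# Hypothetical gene -> annotating phenotype terms.
GENE_TERMS = {
    "GENE1": {"HP:A1", "HP:A2"},
    "GENE2": {"HP:B1"},
    "GENE3": {"HP:A1", "HP:B1"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -ln(fraction of genes annotated with t or any descendant of t).

    Terms annotating no gene are omitted: they are never an ancestor of a
    gene's term, so Resnik similarity below never looks them up.
    """
    counts = {t: 0 for t in PARENTS}
    for terms in GENE_TERMS.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(GENE_TERMS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def gene_score(patient_terms, gene):
    """Best-match average: mean over patient terms of the best Resnik match."""
    best = [max(resnik(p, g) for g in GENE_TERMS[gene]) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms):
    """Genes sorted by descending similarity to the patient's phenotype terms."""
    scores = {gene: gene_score(patient_terms, gene) for gene in GENE_TERMS}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def empirical_p_value(gene, observed, n_terms, trials=10_000, seed=0):
    """Monte Carlo estimate of P(score >= observed) for random patient term sets.

    Uses the (hits + 1) / (trials + 1) correction so the estimate is never 0.
    """
    rng = random.Random(seed)
    all_terms = sorted(PARENTS)
    hits = sum(
        gene_score(rng.sample(all_terms, n_terms), gene) >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    patient = ["HP:A1", "HP:A2"]
    for gene, score in rank_genes(patient):
        print(gene, round(score, 3), empirical_p_value(gene, score, len(patient)))
```

With this toy data the patient's two terms both sit under HP:A, so GENE1 ranks first, GENE3 second and GENE2 last. A table-based pipeline would instead precompute the score distribution for each gene and number of patient terms once, which is the expensive step the talk distributed across machines.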
Bio
Dr. Masino is currently a member of the Center for Biomedical Informatics at the Children's Hospital of Philadelphia where his research includes algorithm development for personalized medicine and intelligent software design and implementation in support of translational research and clinical care. Prior to joining CHOP, Aaron was a senior scientist at SAIC and MZA Associates Corporation, where he developed adaptive optics system concepts, control algorithms, and mathematical atmospheric laser propagation models. He also created innovative simulation and analysis software platforms for adaptive optics system performance prediction. Aaron received a PhD in applied mathematics from the University of Central Florida in 2004. He also holds a master's in aerospace engineering from the University of Colorado and a bachelor's in mathematics from Rutgers University.
Stylometry and anonymity, by Sadia Afroz
Abstract
In digital forensics, questions often arise about the authors of documents: their identity, their demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometry techniques can identify authors with high accuracy in non-adversarial scenarios, their accuracy drops to about the level of random guessing when authors intentionally obfuscate their writing style or attempt to imitate another author's style. In my talk, I will discuss current authorship attribution techniques and how they can be evaded.
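As a rough illustration of authorship attribution, the Python sketch below assumes function-word frequencies as the linguistic features and a nearest-centroid classifier using cosine similarity as the machine learning step. Both are simple, common choices picked for illustration; they are not necessarily the techniques covered in the talk, and the word list, author names and sample texts are made up.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set: relative frequencies of common
# function words. Real stylometric systems use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "on", "with", "as", "but", "not", "i"]


def features(text):
    """Relative frequency of each function word among all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(samples):
    """samples: author -> list of texts. Returns author -> mean feature vector."""
    model = {}
    for author, texts in samples.items():
        vectors = [features(t) for t in texts]
        model[author] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return model


def attribute(model, text):
    """Attribute the text to the author with the most similar centroid."""
    x = features(text)
    return max(model, key=lambda author: cosine(model[author], x))


if __name__ == "__main__":
    model = train({
        "alice": ["the history of the city is the story of the river"],
        "bob": ["and then I ran and I hid but I was not fast"],
    })
    print(attribute(model, "the name of the game is the rules of the house"))  # alice
    print(attribute(model, "but I said I would not and I did not"))  # bob
```

Because features like these capture habits a writer rarely controls consciously, an author who deliberately changes them moves their feature vector away from their own centroid, which is the kind of obfuscation the abstract says can defeat these methods.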
Bio
Sadia Afroz is a PhD candidate in Computer Science at Drexel University where she works at the Privacy, Security and Automation Laboratory (PSAL) with Rachel Greenstadt. Sadia is also involved with SCRUB at UC Berkeley.
Directions to Nodding Head
Head south down 15th St (past City Hall)
Walk 4 blocks, turn right onto Sansom St
Nodding Head is on your left (230 ft)

