Data science in Life science: Graphs, Machine Learning, and Notebooks.


Details
This event is in partnership with the Philly GraphDB Meetup (http://meetup.com/philly-graphdb)
Emerging uses for graphs in the sciences - linking data from source to publication and beyond.
The push for reproducibility and the reuse of publicly funded data has left scientists facing a number of challenges. Recent work within the Neotoma Paleoecological Database (http://neotomadb.org) has illustrated the need for systems that can accommodate potentially low-quality legacy data from tools such as the DeepDive infrastructure, while still supporting highly technical Bayesian frameworks. The Neo4j (http://www.neo4j.com) graph database provides one framework for linking resources to provide rich metadata and to support credentialed crowd-sourcing of information about those resources. This talk will showcase both this application and the implementation of a large graph database to study networks of knowledge from the National Science Foundation’s own data.
About the Speaker: Simon Goring (https://www.linkedin.com/in/simon-goring-376105b3/)
Simon Goring (http://goring.org) has had a varied academic career: a city kid growing up in Toronto; a forest tech with a diploma from Sir Sandford Fleming and a stint in the woods of northern Manitoba; a canoe guide on the Bow and North Saskatchewan Rivers; and a plant biologist with a B.Sc. from UNBC. He finished his Ph.D. with Rolf Mathewes in Biology at SFU, using fossil pollen to understand climate and vegetation change in British Columbia over the last 10,000 years. His approach to paleoecology and data analysis brought him to a postdoc at the University of Wisconsin in the Department of Geography, where he is now an Assistant Scientist, working as the Technical Lead of the Neotoma Paleoecological Database and serving as a member of the Leadership Council for the EarthCube program (http://earthcube.org).
The GUODA platform: Advancing biodiversity informatics with Mesos, Spark and JupyterHub
The Global Unified Open Data Access (http://www.guoda.bio) platform combines the scalability of Apache Mesos and Spark with the ease of use of the Jupyter notebook interface and pre-built, large-scale biodiversity datasets, allowing biologists and data scientists to rapidly explore large-scale biodiversity questions.
About the Speaker: Alex Thompson
Alex Thompson is an IT expert with the Advanced Computing and Information Systems Lab at the University of Florida, working on the iDigBio project (https://www.idigbio.org). iDigBio is a project funded by the National Science Foundation to make data and images for millions of biological specimens available in electronic format for the research community, government agencies, students, educators, and the general public.
Venue and food are kindly donated by our friends at Monetate (https://www.monetate.com/).
This talk will be recorded and made available following the event.
