DSM Breakfast Seminar: Facebook & Apache Spark
This year Melbourne is hosting the 24th ACM International Conference on Information and Knowledge Management (http://www.cikm-2015.org/).
We are delighted that some of the international speakers who will be in town have agreed to present to us at our breakfast seminar.
This is another free event made possible by the continued support of the DSM sponsors, now including KPMG, who are hosting this one. There will be an honesty box, though, with all proceeds going to our fundraising cause (https://www.meetup.com/Data-Science-Melbourne/events/224817861/).
Note the event end time: please arrange a late pass for work so that you don't disturb others by leaving mid-session. Seats are capped at 180, but there is plenty of standing room, so it's first in, best dressed.
-------------------------
7:15am - breakfast
7:45am - James Shanahan: An Overview of Apache Spark for Data Science
8:30am - Sofus Macskassy: An Overview of (some) Machine Learning at Facebook
9:15am - Joint Q&A
10am - Close
------------------------
An Overview of Apache Spark for Data Science
Apache Spark is an open-source cluster computing framework. It has emerged as the next-generation big data processing engine, overtaking Hadoop MapReduce, which helped ignite the big data revolution. Spark maintains MapReduce’s linear scalability and fault tolerance but extends it in a few important ways: it is much faster (100 times faster for certain applications); it is much easier to program, thanks to its rich APIs in Python, Java, Scala, R, and SQL and its core data abstraction, the distributed data frame; and it goes far beyond batch applications to support a variety of compute-intensive tasks, including interactive queries, streaming, machine learning, and graph processing. This talk will provide an accessible introduction for those not already familiar with Spark and its potential to revolutionize academic and commercial data science practices.
http://photos1.meetupstatic.com/photos/event/5/f/1/8/600_442824344.jpeg
Dr. James G. Shanahan has spent the past 25 years developing and researching cutting-edge data science systems. He is SVP of Data Science and Chief Scientist at NativeX, a mobile ad network. Previously, he (co-)founded several companies, including Church and Duncan Group Inc. (2007), a boutique consultancy in large-scale data science; RTBFast (2012), a real-time bidding engine infrastructure play for digital advertising systems; and Document Souls (1999), an anticipatory information system. In addition, he has held appointments at Xerox Research, Mitsubishi Research, and Clairvoyance Corp (a spinoff research lab from CMU).
Dr. Shanahan has been affiliated with the University of California at Berkeley and at Santa Cruz since 2009, where he teaches graduate courses on big data analytics, distributed systems, machine learning, and stochastic optimization. He also advises several high-tech startups and is executive VP of science and technology at Irish Innovation Center (IIC). He has published six books, more than 50 research publications, and over 20 patents in the areas of machine learning and information processing. Dr. Shanahan received his PhD in engineering mathematics from the University of Bristol, U.K., and holds a Bachelor of Science degree from the University of Limerick, Ireland. He is an EU Marie Curie fellow. In 2011 he was selected as a member of the Silicon Valley 50 (Top 50 Irish Americans in Technology). Jimi has been a competitor in the international kite racing circuit since 2013 and placed in the top 10 (in his age category) in the World Championships in the Formula Kite Racing class in 2014 and 2015.
-----------------------------------
An Overview of (some) Machine Learning at Facebook
How do we do scalable machine learning at Facebook, and where is it used? In this talk, I will first provide an overview of some of our machine learning infrastructure and the tools that we use to make machine learning scalable and easy to use. I will also discuss some of the challenges we face in keeping ahead of ML needs. In the latter half of the talk, I will discuss one specific ML use case: predicting attributes of nodes in a large social graph.
http://photos4.meetupstatic.com/photos/event/6/c/c/f/600_441867855.jpeg
Sofus A. Macskassy is part of the applied machine learning team at Facebook. He previously ran the user modeling group in Facebook's Core Data Science team, was part of the research faculty at USC/ISI, and was the Director of Fetch Labs. He received his PhD in machine learning/information filtering from Rutgers University. He is passionate about learning about users in order to serve them better through improved filtering, ranking, and recommendation. He was the general chair of KDD-2014, serves on the editorial boards of JAIR and ML, and is well published at top-tier conferences and journals.
