2 Talks: Intro to Apache Spark + NLP of Conversational Data


Details
Two talks: one on Apache Spark, the other on Natural Language Processing (NLP).
This will be a joint event with https://www.meetup.com/Vancouver-Spark/, https://www.meetup.com/PolyglotVancouver/, and https://www.meetup.com/MachineLearning/events/178254972/. Thanks to Rindra Ramamonjison and George Chow for organizing it.
Talk #1: Introduction to Apache Spark
Apache Spark (http://spark.apache.org/) has quickly grown to be one of the most active projects in big data, with more contributors in the past year than Hadoop. In this talk, we’ll introduce you to the core concepts behind the engine, recent additions, and where it’s going next. While the Spark engine is designed for ease of use and speed, its most distinctive strength is generality: it can efficiently support and combine many workloads that usually require separate engines (e.g. MapReduce, SQL, and machine learning). We’ll show how we are taking advantage of this strength with higher-level libraries built on Spark, such as Shark for SQL, MLlib for machine learning, and Spark Streaming.
Speaker Bio: Andy Konwinski
"I am a cofounder of Databricks (http://databricks.com). Before that, I was a PhD student and then a postdoc in computer science in the AMPLab (http://amplab.cs.berkeley.edu) at the University of California, Berkeley. I’m focused on large-scale distributed computing systems, such as those used by web companies like Google, Facebook, and Yahoo!
In particular, I am interested in resource management, scheduling, and rapid application development in cluster environments. I’ve worked on scheduling in Hadoop. I was one of the three creators of Mesos (http://mesosproject.org), a cluster scheduling system that Twitter has adopted as its private cloud platform. I worked with systems engineers and researchers at Google on Omega (http://www.wired.com/wiredenterprise/2013/04/google-john-wilkes-new-hackers/), their next-generation cluster scheduling system (EuroSys paper (http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf)).
More about me via my LinkedIn profile (http://www.linkedin.com/pub/andy-konwinski/22/97/162), my publication list on Google Scholar (http://scholar.google.com/citations?user=0VwIiIsAAAAJ&hl=en), and my Twitter feed (http://twitter.com/andykonwinski)."
Talk #2: Topic Labelling and Summarization of Conversational Data
"Our lives are increasingly reliant on multimodal conversations with others. We email for business and personal purposes, attend meetings in person, chat online, and participate in blog or forum discussions. Going through such an overwhelming amount of data to satisfy a particular information need often leads to information overload. This calls for automated methods to analyze, reorganize, and summarize such data. My current research looks at algorithms for topic labelling and automatic summarization of textual data. For topic labelling, I have proposed a novel framework that assigns the most representative phrases to a given set of sentences covering the same topic, taking advantage of semantic knowledge. For automatic summarization, I have proposed two novel frameworks that generate query-based and generic summaries composed of grammatical sentences, using minimal syntactic information and domain-specific NLP and NLG components. We successfully applied our approaches to challenging conversational datasets and demonstrated that our methods significantly outperform baselines and previous state-of-the-art models."
Speaker Bio: Yashar Mehdad
Yashar (http://mehdad.net/) is a postdoctoral research scientist in the Laboratory for Computational Intelligence at the Department of Computer Science, University of British Columbia. He completed his PhD at the University of Trento in early 2012, pursuing research in natural language processing and cross-lingual textual entailment while working at the Human Language Technology Lab, Bruno Kessler Foundation (FBK-irst). He was involved in BIN (the Business Intelligence Network of Canada) during his postdoctoral research, and worked on a large-scale EU-funded project (COSYNE) during his PhD studies. He has received several awards, including the best talk award at the Postdoctoral Research Day, University of British Columbia, and the best research poster award at the NSERC-BIN Research Symposium. His research interests in natural language processing and machine learning include (but are not limited to) summarization, topic labelling and modelling, semantics, conversational data, and cross-lingual applications. He has also served various conferences, workshops, and journals as a reviewer, program committee member, and co-chair.
Schedule
• 6:00PM Doors are open, feel free to mingle
• 6:20 Presentations start
• 8:00 Off to a nearby watering hole (Mr. Brownstone?) for a pint, food, and/or breakout discussions
Getting There
By transit: there are a number of high-frequency buses that will get you there (check Google Maps or the TransLink site for your particular route). For drivers, there is a fair bit of street parking (free and pay) in the area, especially after 6.