- Graph Native Machine Learning
Welcome to our BostonML Meetup for July, hosted by Indigo. We're pleased to have Brandy Freitas speaking with us about how we can leverage graphs and graph technology in industrial ML applications.

Where: Indigo, 500 Rutherford Avenue, Charlestown, MA 02129
When: 6:00 (Doors), 6:30 (Talk)

Logistics:
- Indigo is located on the second floor of the North Building in Hood Business Park at 500 Rutherford Ave, Charlestown, MA.
- There is plenty of free parking in the front lot outside of the building (attendees will not be towed; no parking pass or car identification is needed), and we are a 5-minute walk from the Sullivan Square T stop.
- The space capacity is 300. We will, as usual, admit on a first-come, first-served basis until full.

Talk Abstract:
Graph databases have become much more popular in recent years. By representing highly connected data in a graph, we gain access to a host of graph-native algorithms that depend on and exploit the relationships between our data. Computing and storing graph metrics can add strong new features to nodes, creating innovative predictors for machine learning. Using algorithms designed for path finding, centrality, community detection, and graph pattern matching, we can begin to rely less on inflexible, subject-driven feature engineering. In this session, Brandy Freitas will cover the interplay between graph analytics and machine learning, demonstrate improved feature engineering with graph-native algorithms, and outline the current use of graph technology in industry.

Speaker Bio:
Brandy Freitas is a principal data scientist at Pitney Bowes, where she works with clients in a wide variety of industries to develop analytical solutions for their business needs. She is a research physicist turned data scientist based in Boston, MA. Her academic research focused primarily on protein structure determination, applying machine learning techniques to single-particle cryo-electron microscopy data. Brandy is a National Science Foundation Graduate Research Fellow and a James Mills Peirce Fellow. She holds undergraduate degrees in physics and chemistry from the Rochester Institute of Technology and did her graduate work in biophysics and computational statistics at Harvard University.
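The feature-engineering idea in the abstract, computing graph metrics and attaching them to nodes as inputs for a downstream model, can be sketched in a few lines. The toy graph and the choice of metrics below are illustrative, not from the talk:

```python
# Hypothetical toy graph as an adjacency list; node names are invented.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol", "dave"],
    "carol": ["alice", "bob"],
    "dave": ["bob"],
}

def degree_centrality(g):
    # Fraction of other nodes each node is directly connected to.
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}

def pagerank(g, damping=0.85, iters=50):
    # Plain power iteration over the adjacency list.
    n = len(g)
    rank = {node: 1.0 / n for node in g}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in g}
        for node, nbrs in g.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new[nbr] += share
        rank = new
    return rank

# Combine the graph metrics into a per-node feature vector that a
# conventional model (logistic regression, gradient boosting, ...) can consume.
dc = degree_centrality(graph)
pr = pagerank(graph)
features = {node: [dc[node], pr[node]] for node in graph}
```

The point of the sketch: neither metric requires hand-crafted domain knowledge, yet both encode relational structure that a flat feature table would miss.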
- How we build ML Products at Pluralsight
Abstract:
In this talk, we will discuss the process, both technical and human-centric, for developing machine learning products and experiences within the Pluralsight app. You will hear from a machine learning engineer on how our algorithms deliver relevant, contextual recommendations; how we approach what to build, when, and for whom; and how we treat experimentation as a first-class citizen in the development process.

Speaker bio:
Connor McKay is a Principal Machine Learning Engineer at Pluralsight, where he helped stand up the organization's architecture for rapidly shipping ML into production. He's been working in the ML space for around 5 years, in everything from government organizations to tech companies. When not training TensorFlow graphs, Connor spends his time reading political philosophy, biking, and cooking.

Logistics:
Pluralsight is excited to host this meetup in their downtown Boston office on the 8th floor of 60 State Street, Boston, MA (T stops: State, Government Center, Downtown Crossing). Doors open at 6:00pm and the talk will start at 6:30pm. The room can hold a max of 75 people (seated and standing). Registration is mandatory; attendees will be allowed in until the room is full. Pizza and beer will be provided. Please bring an ID: Pluralsight staff will be in the building hallway to check everyone in. If your ID name does not match your name on Meetup, please be prepared to display your registration confirmation.
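Treating experimentation as a first-class citizen usually begins with deterministic variant assignment, so a user sees the same experience on every request without storing state. A minimal sketch of that building block (the hashing scheme and names are illustrative, not Pluralsight's actual system):

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    # Hash the user and experiment together so assignment is stable per user
    # but independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user always lands in the same bucket for a given experiment.
bucket = assign_variant("user-42", "new-recs-ranker")
```

Stable, stateless bucketing like this is what lets recommendation changes ship behind experiments rather than as all-or-nothing releases.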
- Deep Learning on Your Phone
We're very excited to welcome Dr. Jameson Toole of Fritz to tell us about the ins and outs of building machine learning systems on mobile devices!

Logistics:
- The talk will be held at Spotify's office in Somerville. See address info above. Doors will open at 6:00 and the talk will start at 6:30.
- The room can hold a max of 100 people (seated and standing). We will leave the registration open and allow people in *until the room is full*.
- We will have pizza and drinks!
- Please bring ID.

Abstract:
Machine learning and AI algorithms now outperform humans on tasks ranging from image recognition to language translation. However, sending video, audio, and other sensor data up to the cloud and back is too slow for apps like Snapchat, features like "Hey, Siri!", and autonomous machines like self-driving cars. Developers seeking to provide seamless user experiences must now move their models down to devices at the edge of the network, where they can run faster, at lower cost, and with greater privacy. This talk covers the reasons developers should be deploying deep learning models to edge devices, common roadblocks they will face, and how to overcome them.

Bio:
Dr. Jameson Toole is the CTO and cofounder of Fritz, helping developers teach devices how to see, hear, sense, and think. He holds undergraduate degrees in Physics, Economics, and Applied Mathematics from the University of Michigan, as well as a PhD in Engineering Systems from MIT. His work in the Human Mobility and Networks Lab centered on urban and transportation planning. Prior to founding Fritz, Jameson built pipelines for Project Wing at Google[X].
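One of the roadblocks the abstract alludes to is model size: edge devices have tight memory and bandwidth budgets. A common mitigation is post-training quantization, sketched here in a simplified symmetric int8 form (this illustrates the general technique, not Fritz's tooling or any specific framework's converter):

```python
import numpy as np

def quantize_int8(weights):
    # Map float32 weights onto int8 with a single symmetric scale factor,
    # so the extreme weight lands on +/-127.
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
# int8 storage is 4x smaller than float32, at the cost of a bounded
# per-weight rounding error of at most scale / 2.
```

Frameworks like TensorFlow Lite and Core ML automate this (and subtler variants, such as per-channel scales), but the size/accuracy trade-off is exactly the one shown here.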
- Hands-on Genomics Lab
Since the completion of the Human Genome Project in 2003, there has been an explosion in data, fueled by a dramatic drop in the cost of DNA sequencing: from $3B for the first genome to under $1,000/genome today. This workshop will focus on the application of Apache Spark and related projects to life science challenges, touching on GATK4 pipelines, genotype-phenotype association tests, and population-scale risk modeling.

~~~~~~~~~~~~~~~~~~~~~~~~
Registration: https://pages.databricks.com/201903-US-UA-Genomics-Cambridge-Reg.html
~~~~~~~~~~~~~~~~~~~~~~~~

Schedule:
8:30-9:00 Breakfast
9:00-9:50 Opening Remarks, Customer Use Case and Set-up
9:50-10:00 Break
10:00-10:45 Accelerating Variant Calls with Apache Spark
10:45-11:30 Characterizing Genetic Variants with Spark SQL
11:30-11:45 Break
11:45-12:15 Disease Risk Scoring with Machine Learning
12:15-12:30 Q&A
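As a flavor of the closing risk-scoring session: a polygenic risk score is, at its simplest, a weighted sum of variant dosages. A minimal sketch with invented variant IDs and effect sizes (real pipelines compute this over millions of variants with Spark, which is the workshop's point):

```python
# Hypothetical per-variant effect sizes, e.g. from a published GWAS.
effect_sizes = {"rs001": 0.30, "rs002": -0.12, "rs003": 0.45}

def risk_score(genotype):
    # genotype maps variant ID -> allele dosage (0, 1, or 2 copies).
    # The score is the dosage-weighted sum of effect sizes.
    return sum(effect_sizes[v] * dose
               for v, dose in genotype.items() if v in effect_sizes)

sample = {"rs001": 2, "rs002": 1, "rs003": 0}
score = risk_score(sample)  # 2*0.30 + 1*(-0.12) + 0*0.45 = 0.48
```

The machine-learning part of the session presumably layers model fitting on top of features like this; the arithmetic above is just the scoring primitive.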
- Modern Tools for ML at Scale: Kubeflow and TFX
We're really excited to have Marc Romeyn from Spotify give our first talk of 2019!

Logistics:
- The talk will be held at Spotify's office in Somerville. See address info below. Doors will open at 6:00 and the talk will start at 6:30.
- The room can hold a max of 100 people (seated and standing). We will leave the registration open and allow people in *until the room is full*.
- We will have pizza and drinks!
- Please bring ID.

Abstract:
This talk will focus on the engineering aspects involved in machine learning at scale. A common warning shared with aspiring data scientists and ML engineers is that 90% of the work is about gathering, cleaning, and validating data, plus deploying and monitoring models. Yet for a long time, most of the open source ML tooling focused on the modeling part. We will first give an overview of the different ML engineering frameworks out there, both open and closed source. We will then focus on Kubeflow Pipelines and TFX (TensorFlow Extended), both of which are open source, with an end-to-end example highlighting why these frameworks are incredibly powerful. Throughout the talk we will implement an end-to-end example: a deep neural network for predicting ad click-through rate (CTR) using a public dataset from Criteo. This example includes transforming and validating the data, training a model in a distributed way, validating and monitoring model performance, and, last but not least, deploying the model.

Bio:
Marc Romeyn is an experienced machine learning engineer at Spotify. His primary focus at Spotify has been building ML systems that help music experts create the best possible playlists, which are consumed by millions of people every day. Prior to Spotify, most of his work focused on natural language processing.
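The "transforming and validating the data" stage the abstract mentions can be illustrated with a hand-rolled schema check. TFX's Data Validation component automates this kind of schema inference and anomaly detection at scale; the fields below are invented for illustration:

```python
# Expected schema for a (hypothetical) CTR training record.
schema = {"clicked": int, "price": float, "ad_id": str}

def validate(rows, schema):
    # Split rows into schema-conforming records and anomaly indices,
    # so bad data is caught before it reaches training.
    good, anomalies = [], []
    for i, row in enumerate(rows):
        conforms = set(row) == set(schema) and all(
            isinstance(row[k], t) for k, t in schema.items()
        )
        if conforms:
            good.append(row)
        else:
            anomalies.append(i)
    return good, anomalies

rows = [
    {"clicked": 1, "price": 0.5, "ad_id": "a1"},
    {"clicked": "yes", "price": 0.5, "ad_id": "a2"},  # wrong type
]
good, bad = validate(rows, schema)
```

In a real pipeline this gate runs as its own step, with the anomalies surfaced for monitoring rather than silently dropped, which is one reason pipeline frameworks treat validation as a first-class stage.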
- Scalable Pipelines and Email Health at HubSpot
Do bring an ID :)

Schedule:
6:00 - 6:30 Food and Drinks
6:30 - 7:00 Session I
7:00 - 7:30 Session II
7:30 - 8:00 Schmooze

~~~~~~~~~~~~~~~~~~~~~~~~

Building HubSpot's Email Anti-Abuse Model

Abstract:
This talk will center on HubSpot's efforts to reduce email abuse through the appropriate use of machine learning. More specifically, Mukul will cover the process of building the Email List Health model, which seeks to detect low-quality email lists uploaded by customers that may yield high bounce rates. Along the way, he will talk about dealing with class imbalance, using autoencoders to deal with noisy data, and model stacking to improve performance.

Bio:
Mukul Surajiwale is a Software Engineer on the Machine Learning Models team at HubSpot.

~~~~~~~~~~~~~~~~~~~~~~~~

Deploying Machine Learning Pipelines at Scale

Abstract:
In this talk, George will explain how HubSpot ships models to production: from generating data dumps for building models, to deploying them to production, to building inference pipelines that use them, and everything in between. He will also share a brief overview of the tech stack and some lessons learned along the way.

Bio:
George Banis is the Technical Lead on HubSpot's Machine Learning Data team. He has been building the ETL platform that enables HubSpot to access features from multiple data stores in real time. He has also worked on high-throughput pipelines that now make billions of predictions each day.
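On the class-imbalance point: abusive lists are presumably rare relative to healthy ones, and one standard remedy is to reweight the training loss by inverse class frequency. A minimal sketch of the "balanced" weighting heuristic (the labels are invented; this is a generic technique, not HubSpot's model):

```python
from collections import Counter

def balanced_class_weights(labels):
    # Weight each class inversely to its frequency, following the
    # scikit-learn class_weight="balanced" heuristic:
    #   weight_c = n_samples / (n_classes * count_c)
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

# Hypothetical labels: 1 = abusive list, 0 = healthy list (heavily imbalanced).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
# The minority class gets a 19x larger weight here, so misclassifying
# an abusive list costs the model correspondingly more during training.
```

These weights plug directly into most libraries' loss functions (e.g. a `class_weight` or `sample_weight` argument), which is usually simpler than resampling the data.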
- Interpretable Representation Learning for Visual Intelligence
PLEASE BRING AN ID!

Schedule:
6:00 - 6:30 Food and Drinks
6:30 - 7:15ish Session
7:15 - 8:00 Schmooze

~~~~~~~~~~~~~~~~~~~~~~~~

Abstract:
Recent progress of deep neural networks in computer vision and machine learning has enabled transformative applications across robotics, healthcare, and security. However, despite the superior performance of deep neural networks, it remains challenging to understand their inner workings and explain their output predictions. My research has pioneered several novel approaches for opening up the "black box" of neural networks used in vision tasks. In this talk, I will first show that objects and other meaningful concepts emerge as a consequence of recognizing scenes. A network dissection approach is then introduced to automatically identify the emergent concepts and quantify their interpretability. Next, I will describe an approach that can efficiently explain the output prediction for any given image, shedding light on the decision-making process of the networks and why they succeed or fail. Finally, I will talk about ongoing efforts toward learning efficient and interpretable representations for video understanding, generative adversarial networks, and learning in virtual environments.

Bio:
Bolei Zhou recently received his Ph.D. from MIT CSAIL. His research is in computer vision and machine learning, with a focus on visual scene understanding and interpretable deep learning. He received the Facebook Fellowship, the Microsoft Research Asia Fellowship, and the MIT Greater China Fellowship. His research on interpreting deep networks was featured in media outlets such as TechCrunch, Quartz, and MIT News. He organized tutorials on deep learning for visual recognition and interpretable machine learning at CVPR'17 and CVPR'18, respectively, and co-organized workshops on the Places Challenges at ICCV'17, ECCV'16, and ICCV'15. Details of his research are at http://people.csail.mit.edu/bzhou/
- Architecting Recommender Systems
Schedule:
6:00 - 6:30 Food and Drinks
6:30 - 7:15 Session
7:15 - 8:00 Schmooze

~~~~~~~~~~~~~~~~~~~~~~~~

Abstract:
The design of modern recommendation and personalization systems hinges on a few product considerations: how do users interact with your recommendations, what metadata is available, and how should recommendations feel. In this talk, we'll explore how these considerations impact the design of machine learning algorithms for recommendation, how to prototype and experiment with these systems, and how to manage their lifecycle in production. We will also discuss the use of open source tools for recommendation, such as TensorRec and LightFM.

~~~~~~~~~~~~~~~~~~~~~~~~

Bio:
James Kirk is a Senior Machine Learning Engineer at Spotify, where he develops core recommendation and personalization systems. He is particularly focused on the intersection of user experience and machine learning algorithm design. He is the primary maintainer of TensorRec, the open source recommender system framework. Prior to Spotify, James worked at Catalant, Quantopian, and Amazon Robotics.
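For a flavor of the algorithms such systems build on, here is a minimal item-item collaborative filter using cosine similarity over co-play vectors. The play counts and names are invented, and this is far simpler than what TensorRec or LightFM provide (no learned embeddings, no metadata features):

```python
import math

# Hypothetical implicit feedback: user -> {item: play count}.
plays = {
    "u1": {"trackA": 5, "trackB": 3},
    "u2": {"trackA": 4, "trackC": 2},
    "u3": {"trackB": 1, "trackC": 4},
}

def item_vectors(interactions):
    # Transpose user->item counts into sparse item->user vectors.
    items = {}
    for user, counts in interactions.items():
        for item, count in counts.items():
            items.setdefault(item, {})[user] = count
    return items

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(a[u] * b[u] for u in a if u in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def similar_items(item, interactions):
    # Rank all other items by similarity to the given one.
    vecs = item_vectors(interactions)
    target = vecs[item]
    scores = {other: cosine(target, vec)
              for other, vec in vecs.items() if other != item}
    return sorted(scores, key=scores.get, reverse=True)
```

A similarity ranking like `similar_items("trackA", plays)` is the kernel behind "people who played this also played..."; the product considerations in the abstract decide how (and whether) such a ranking is surfaced to users.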
- Skip-Thought Vectors & Optimization
Schedule:
6:00 - 6:30 Food and Drinks
6:30 - 7:00 Session I
7:00 - 7:30 Session II
7:30 - 8:00 Schmooze

~~~~~~~~~~~~~~~~~~~~~~~~

Session I: Skip-Thought Vectors for Financial Document Topical Analysis

Abstract:
In this talk, I will discuss a consulting project with an insurance company to develop topic clustering across financial documents to help analysts find information. An avalanche of financial information is published every day, making it very difficult to manually process and find relevant documents. Using Skip-Thought Vectors, where sentences that share semantic and syntactic properties are mapped to similar vectors, one can train an end-to-end deep neural network for topical analysis. Using such a model, we created a system that auto-processes new documents every day, auto-highlights topics of interest to the company, and allows them to quickly discover the competitive landscape of the market sector.

Bio:
Nikhil Murthy is a budding entrepreneur, researcher, and consultant. As founder of Cognite Solutions, he consults for a variety of organizations on implementing deep learning solutions, with sectors ranging from health and energy to finance and automotive. By teaching the Coursera offering "An Introduction to Practical Deep Learning," Nikhil has enabled thousands of people across the world to learn and explore deep learning. He has authored research papers in machine learning in the fields of robotics, finance, and natural language processing, and presented his findings at multiple conferences, most recently at CVPR in Honolulu.

~~~~~~~~~~~~~~~~~~~~~~~~

Session II: Everything You Wanted to Know About Optimization

Abstract:
In recent years, the use of adaptive momentum methods like Adam and RMSProp has become popular for reducing the sensitivity of machine learning models to optimization hyperparameters and increasing the rate of convergence for complex models. However, past research has shown that, when properly tuned, simple SGD + momentum produces better generalization and better validation losses in the later stages of training. In a wave of papers submitted in early 2018, researchers suggested justifications for this unexpected behavior and proposed practical solutions to the problem. This talk will first provide a primer on optimization for machine learning, then summarize the results of these papers and propose practical approaches to applying their findings.

Bio:
Madison May is a co-founder and machine learning architect at indico, a local machine learning startup dedicated to solving big problems with small data.
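The two optimizer families Session II contrasts are short enough to write out in full. Below, SGD + momentum (heavy ball) and Adam each minimize a small ill-conditioned quadratic; the objective, learning rates, and step counts are illustrative choices for the sketch, not from the talk:

```python
import numpy as np

def quadratic_grad(w):
    # Gradient of f(w) = 0.5 * w^T A w for a fixed ill-conditioned diagonal A.
    A = np.diag([1.0, 10.0])
    return A @ w

def sgd_momentum(grad, w0, lr=0.05, beta=0.9, steps=200):
    # Classic heavy-ball update: accumulate a velocity, step along it.
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return w

def adam(grad, w0, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    # Adam: per-coordinate step sizes from bias-corrected first and
    # second moment estimates of the gradient.
    w = w0.copy()
    m, s = np.zeros_like(w0), np.zeros_like(w0)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        s = b2 * s + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        s_hat = s / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
    return w

w0 = np.array([5.0, 5.0])
w_sgd = sgd_momentum(quadratic_grad, w0)
w_adam = adam(quadratic_grad, w0)
# Both drive w toward the minimum at the origin; Adam's per-coordinate
# scaling is what makes it less sensitive to the learning rate, and also
# what the 2018 papers implicate in its weaker late-stage behavior.
```

On a toy quadratic both succeed; the generalization gap the abstract describes only shows up on real, noisy training problems, which is precisely why it took careful empirical work to pin down.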