What we're about
Upcoming events (3)
Talk Title: Scalable Machine Learning Pipelines with Dask
**Coffee is served at the in-person seminar**
A recording of the talk will be posted afterwards on our YouTube channel at https://www.youtube.com/channel/UCN0kf0sI01-FXPZdWAA-uMA
Jason Carpenter is a Machine Learning Engineer at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. He has experience delivering machine learning and data engineering solutions that are integrated into core business strategy for Manifold's clients. The clients he has engineered solutions for range from developers of web and mobile applications to manufacturers of industrial hardware. Prior to joining Manifold full-time, he earned his Master of Science in Data Science from the University of San Francisco while developing two open-source Python packages. His package swifter, which automatically decides the quickest way to apply any function to a pandas dataframe, relies on Dask as its workhorse for parallel processing. He previously co-presented this talk at AnacondaCon 2019.
Talk Description: Dask is a powerful library within the PyData ecosystem. There are a number of great resources on how to use Dask for parallel processing, from documentation and blog posts to tutorial videos. However, we noticed that there is not yet any comprehensive resource specific to the applications of Dask in machine learning pipelines. This talk aims to fill that gap. Dask is useful in various stages of machine learning pipelines, from data preprocessing to hyperparameter tuning. We will present a unified approach for the application of Dask in ML workflows that can help you build scalable ML pipelines. We will focus on a case study where the goal is to classify journal papers into different topic categories.
Key audience takeaways will include:
1. How to identify challenges that can be addressed using Dask in machine learning
2. A set of design patterns for applying Dask to machine learning workflows
3. A set of examples with code, taken from real-world applications
Title: From Data Points to Data Dan: Combining Log Analysis, Survey Analysis and Interviews to Segment Google Analytics Customers
Abstract: Google Analytics has a wide user base, from hobbyist bloggers to employees of Fortune 100 corporations. To better understand our users, and to measure more precisely what proportion of our customer base each user type makes up, we embarked on a customer segmentation project. This long-term research project used both qualitative and quantitative methods to scope and define customer “use cases,” or particular tasks that directed the front-end interactions of a user’s session. Our quantitative approach consisted of collecting all front-end user interactions and performing Latent Dirichlet Allocation (LDA) to arrive at groupings of 25 use cases, as well as conducting a survey to investigate how users’ backgrounds impact their usage. In parallel, our qualitative approach included over 50 subject interviews to understand which use cases were important from the user’s perspective. We used this research, along with input from product subject matter experts, to help assign labels to each of our use case parameter groupings. Using the labeled LDA topics, we measured each user's engagement across each topic and performed k-means clustering on individual users to arrive at 12 user segments. The qualitative interpretation of these clusters through 40 interviews led to a set of personas, which will provide further inspiration for product development.
Sundar Dorai-Raj is the lead data scientist and manager for Google Analytics, a free web and app measurement service for advertisers and publishers. Since 2009, Sundar has held similar roles within Google at YouTube, Video Ads and Fiber. He has a Ph.D. in Statistics from Virginia Tech and a Master's in Applied Math from the University of Alabama.