• Morning chat about Data Engineering

    215 Lexington Ave

    We are fans of NYC-CoffeeOps, so we decided to start a similar meetup for Data Engineers in NY, hosted at the Insight Data Science office. We will use the lean coffee process (https://www.meetup.com/nyc-kanban/pages/19274429/Running_a_Lean_Coffee) to help facilitate discussion. Stop by and bring your questions about Data Engineering tools, challenges, interviews, frameworks, or anything else you have wanted to ask for a long time! If this event is successful, we will hold it biweekly. Inspired by CoffeeOps in NY: https://www.meetup.com/NYC-CoffeeOps

  • Life After Deployment: Maintaining Models in Production

    Insight Data Science

    *This is a joint event with Dataiku: https://www.meetup.com/Analytics-Data-Science-by-Dataiku-NY

    Schedule:
    6:30pm: Pizza + Beer networking
    7:00pm: 10-minute talks from NVIDIA, Bloomberg, and Dataiku
    7:30pm: Open Q&A

    Putting models into production is often seen as the completion of machine learning projects, but what happens post-deployment? This meetup will focus on this often underappreciated (and unpredicted) side of machine learning, addressing how models evolve and tackling the organizational and engineering challenges of maintenance, such as managing technical debt and compiling complexity. In a series of 10-minute talks with NVIDIA, Bloomberg, and Dataiku, we will discuss different industry approaches to model maintenance. These talks will be followed by a Q&A panel with all of the speakers, so come ready with a question or two!

    Abstracts:

    Challenges and learnings to bring research into production by Nicolas Koumchatzky, NVIDIA: As a manager of a large team working on ML problems, one issue I encountered was the difficulty of translating valuable research work into production systems. Researchers need flexibility and fast iteration, while production systems need safety, scale, and robustness. In this talk, we will go over a few of the experiences I had over the years and what I tried to do to improve the situation, with various degrees of success.

    System Design v. Infrastructure by Michael Burkholder, Bloomberg: When developing machine learning models for production, there is no one-size-fits-all recipe for system design. It's important to consider the engineering and product objectives early in the model development lifecycle, and to spend effort developing supporting infrastructure. In this talk, I will give an anecdotal case study illustrating the modeling tradeoffs faced while developing an ML platform to compute the prices of financial instruments, and the infrastructure requirements to support the platform.

    Exploring and Preventing Technical Debt by Patrick Masi-Phelps, Dataiku: Patrick will discuss the concept of technical debt in production machine learning projects, a concept refined in 2015 by Google researchers (Sculley et al.), building on the software engineering concept introduced in the 1990s. Taking the extra time to simplify pipelines and account for changes in model inputs and configuration parameters can save time and mitigate risks of models in production. He'll talk theory, then present a couple of examples from clients in healthcare and aviation.

    Bios:

    Nicolas Koumchatzky started as a Quant, using models to evaluate the price of complex financial derivatives. He quickly joined a startup called DerivExperts in Paris to deliver that service to third-party buyers. After spending 5 years there as a manager, he embarked on another startup adventure at Madbits, focused on deep learning for image+text search, which was promptly acquired by Twitter. There, he developed deep learning models for image and spam filtering, moving on to create the first iteration of the first deep learning platform at Twitter, called DeepBird. He then became a manager for the Twitter Cortex team, developing the ML platform with automation, better recommender systems, and an improved version of the deep learning platform. A year ago, he joined NVIDIA as a Director of AI Infrastructure to build an ML platform to develop self-driving cars.

    Michael Burkholder received his PhD in Mechanical Engineering from Carnegie Mellon University, studying nonlinear, chaotic, and stochastic electrochemical systems. He leads an ML team at Bloomberg LP developing high-performance models and infrastructure to power Bloomberg's risk analysis engine. Michael enjoys roasting his own coffee and listening to vinyl.

    Patrick Masi-Phelps is a Data Scientist at Dataiku, where he helps clients build and deploy predictive models. Before joining Dataiku, he studied math and economics at Wesleyan University.

  • Databricks Hands-On Workshop at Insight

    Insight Data Science

    *This is a joint event with DataCouncil.ai: https://www.meetup.com/DataCouncil-AI-NYC-Data-Engineering-Science/

    Hosted by:
    - Keira Zhou - DataCouncil.ai Organizer, Data Engineer @ Capital One
    - Masha Danilenko - Data Engineering Lead @ Insight
    - Layla Yang - Solutions Architect @ Databricks, Inc.

    Summary: Curious what's new with Databricks? Want extremely valuable hands-on experience? Interested in learning more about Databricks usage at other companies? Then this is the perfect event for you!

    Agenda:
    - Delta Lake Workshop
    - MLflow Demo
    - Real-world Spark implementation tech talk by Eyeview

    NOTE: Please bring your laptop. This is a very hands-on event!

  • Acing Your Data Engineering Interview

    Insight Data Science

    *This is a joint event with DataCouncil.ai: https://www.meetup.com/DataCouncil-AI-NYC-Data-Engineering-Science/

    Hosted by:
    - Keira Zhou - DataCouncil.ai Organizer, Data Engineer @ Capital One
    - Masha Danilenko - Data Engineering Lead @ Insight
    - Helena Zhang - Data Engineering Program Director @ Insight

    Summary:
    Do you leave interviews with all the answers you couldn’t think of under pressure? You’re certainly not alone: technical interviews are the toughest part of the job process for many. In this workshop, Data Engineering Program Directors at Insight, who have interviewed hundreds of candidates and prepared hundreds more to get their dream jobs, will use walkthroughs, activities, and resources to give you tips for whiteboarding, live coding, and remote Data Engineering interviews. We’ll cover how to best prepare and perform at each type of interview, ranging from algorithms to system design, SQL, and the essential behavioral component. As a bonus, Helena and Masha will organize an interactive part where you will play both interviewer and interviewee and experience the whole spectrum of interview roles. Whether you’re thinking about entering the job market or you’re an interview veteran looking for jobs in Data Engineering or related fields, this workshop can help you succeed.

    Itinerary (2 hrs):
    - Arrival/mingling: 20 min
    - Introduction: 10 min
      - The hiring process workflow
      - Format and logistics of interviews
      - What to expect
    - System design whiteboard demonstration: 30 min
    - Q+A: 5 min
    - Interactive small group interviews: 50 min
    - Wrap-up and Insight plug: 5 min

  • Personal Branding: Find your unique value and your audience.

    Insight Data Science

    This interactive workshop will provide strategies to help identify your personal brand (where your passion, personality, and strengths align), plus some ways you can promote your brand and grow your career.

    Important: to attend the workshop, register at the AnitaB.org platform: https://community.anitab.org/event/anitab-org-nyc-presents-workshop-personal-branding-with-insight-data-science-find-your-unique-value-and-your-audience/

    Workshop Lead: Stephanie Mari is the Head of Coaching and Development at Insight Data Science where, for the last three years, she’s worked with Insight Fellows to help them identify their unique professional value and transition to new roles in data science and engineering. She also works closely with the Insight team and alumni community to provide coaching and training on professional skills related to effective communication, strong collaboration, and self-management.

    Agenda:
    6:00 – 6:30: Check-in, registration, networking
    6:30 – 6:35: Welcome remarks from AnitaB.org and our host and partner Insight Data Science
    6:35 – 8:00: Interactive Workshop
    8:00 – 8:30: General networking
    8:30: End of event

  • Beyond Data Engineering: Careers in AI + DevOps

    35 E 21st St

    *This is a joint event with DataCouncil.ai: https://www.meetup.com/DataCouncil-AI-NYC-Data-Engineering-Science/

    Schedule:
    6:00 - Doors & Food
    6:30 - Talk 1
    7:15 - Talk 2
    7:45 - Wrap & Chat

    Talk 1: Lessons from Building and Deploying AI Systems
    - Speaker: Chuck Yee, ML Research Engineer @ Bloomberg
    - Abstract: AI has experienced unprecedented hype over the past seven years, but what is the reality of applied AI within a business ecosystem? In this talk, I’ll be sharing stories of success and failure building and deploying AI systems, and personal observations from my time in the industry.
    - Bio: Chuck-Hou is an Insight Fellow who recently joined Bloomberg’s NLP team after gaining experience building deep learning models for x-ray interpretation at Imagen Technologies.

    Talk 2: CICD From the Ground Up
    - Speaker: Max McKittrick, Data Engineer, ECI @ Capital One
    - Abstract: At Capital One, the ECI team manages a clickstream application, and pipeline durability is a critical consideration. In this talk, I want to discuss the lessons we have learned and how we have improved our CICD practices over the past few months.
    - Bio: Max McKittrick is a data engineer at Capital One, where he started after completing Insight Data Engineering New York. He received his MS from the University of Illinois and worked as a researcher and data consultant prior to attending Insight.