- The Battle of Ephemeral Environments
Welcome back to another FSE event! This evening will focus on solving a problem that all growing engineering teams have - the sharing of a limited number of development environments. We've got two engineers who've helped solve the same problem, and we'll learn about both of their teams' approaches! Read on for more. ~ Talk #1 ~ Make QA Easier with Wonqa, A New Tool to Create Branch-Specific QA Environments in AWS (~20 mins) Jules Terrien, Software Engineer @ Wonder Jules is a software engineer at Wonder where he's worked on the full stack from React components to NodeJS applications and infrastructure projects. Before Wonder, Jules was at Nova Credit, a fintech startup based in San Francisco and prior to becoming a software engineer, Jules helped start a couple startups in and outside of tech. Talk Abstract: Wonqa is an open source library released by the Wonder team last year to help teams easily create QA environments. Prior to writing and using Wonqa, the Wonder team operated with a single QA environment which caused a number of problems as engineers had to synchronize deploys to avoid conflicts and allow product teams to QA accurately. With Wonqa, Wonder engineers can create QA environments per branch, hosted on custom domains, at the click of a button. Wonqa uses a number of awesome tools to make this possible including AWS, Docker and DNSimple/Certbot. ~ Talk #2 ~ Ephemeral Dev Environments at Greenhouse with Dajoku (~25 mins) Paul Alvarez, Software Engineer @ Greenhouse Software Paul Alvarez is a developer of internal tools and services at Greenhouse. He didn't coin the name "Dajoku" but he likes it very much. He enjoys playing guitar and bass in progressive rock bands. Talk Abstract: This talk will cover Greenhouse’s solution to a problem all growing engineering teams face at some point. You’ve got an increasing number of developers simultaneously working on more and more branches of code, with a QA team to match. 
But what happens when the number of available environments stays constant? Enter Dajoku. With the ability to scale up and down pre-seeded environments within minutes, Slack messages like "@channel any of the dev1-dev9 envs free?" are no longer required to get code changes in front of others.
- Stack Magic with GraphQL
We're up and running in our new office! Thanks for the patience as we've made the move. Come join us in breaking in the new digs for a night focused on the incredibly powerful GraphQL. ~ Presentation #1 ~ Resource Limitations in GraphQL: Trials and Tribulations (~30 mins) Drew Brien, Senior Software Engineer @ Greenhouse Drew is a Senior Software Engineer working on APIs at Greenhouse Software. A Boston native, Drew moved to NYC about 7 years back and has been working in software since. He enjoys snowboarding and watching Boston sports teams collect trophies. Talk Abstract: Drew will be discussing the concept of resource limitations in GraphQL APIs. While building the GraphQL API for Greenhouse's Onboarding product, the Engineering team had to find a solution to the following problem: how do you prevent consumers from overwhelming the API infrastructure? While most REST APIs solve this issue by implementing a standard rate-limiting algorithm, this approach isn't sufficient to protect GraphQL APIs. We will chat about how REST and GraphQL APIs differ in this regard, and we will compare a few solutions that we considered. ~ Presentation #2 ~ Architecture of scalable and resilient NodeJS apps with GraphQL & event-driven serverless (~35 mins) Tanmai Gopal, Co-Founder @ Hasura Tanmai is the co-founder of hasura.io. He is a polyglot developer whose areas of interest and work span React, GraphQL, NodeJS, Python, Haskell, Docker, Postgres, and Kubernetes. He is passionate about making it easy to build things and is the instructor of India's largest MOOC, imad.tech, with over 250,000 students. Talk Abstract: The talk will cover how the state of the application can be architected to be stored in the database itself and how updates to state can be used to build reactive user interfaces which update in real-time, with GraphQL Subscriptions and live-queries. 
We will then look at how serverless functions can be used to execute business logic and how these functions can be triggered on database events, which are updates to this state. In short it will cover the different architecture patterns, open-source tools used, code-samples, observed benefits, pros/cons, and how this pattern fits into the larger GraphQL and serverless revolution that we are undergoing.
- Text Stack 2.0
Got an exciting evening for you all! This one goes out to all you Vim vs Emacs enthusiasts out there. We've got two speakers who will be presenting pretty interesting approaches (serverless and distributed stacks) to building a browser-based IDE/text-editor. Read on for more, and hope to see you there! ~ Presentation #1 ~ Building Conclave: A decentralized, real-time, collaborative text editor for the browser (~30 mins) Sun-Li Beatteay, Software Engineer @ DigitalOcean Sun-Li is a software engineer working at DigitalOcean on their Spaces product. He's a recent transplant, having moved to NYC only a year ago from Seattle. He's also an active writer on Medium where he posts tutorials and articles about technology (https://medium.com/@SunnyB). Talk Abstract: Sun-Li will be talking about his experience building a decentralized collaborative text editor, Conclave, on a remote team. This talk will focus on the challenges that the Conclave team faced and their solutions. These topics include how to create a decentralized application using modern browser technology, maintain consistency in a distributed architecture, and cheaply scale a real-time application to handle many concurrent users. For anyone interested in dApps, distributed systems or fun open source projects, you won't want to miss this presentation. ~ Presentation #2 ~ Building a Browser-Based Serverless IDE using Docker and Websockets (~20 mins) Kareem Amin, Co-Founder @ Clay Kareem is a co-founder of Clay, a rapid development platform that allows you to automatically pull data from the web and the SaaS services you use into a spreadsheet-like UI so you can quickly build flexible tools that automate your work. He's previously led product and development teams at Microsoft, Sailthru, and News Corp/WSJ. Talk Abstract: We'll walk through how to simulate AWS Lambda using Docker containers to run user-submitted code that is written in an IDE in the browser. 
We'll motivate this discussion with the challenges of developing Lambda functions today and discuss what the future of software development looks like with serverless technology.
- Stack Wars: The Slack Awakens
A long time ago in a galaxy far, far away...a world exists in which our whole lives are run through Slack. We're not there yet, but these two presenters are leading us in that direction! Come join us to learn about two really cool integrations built on top of Slack. ~ Presentation #1 ~ CI for CI! How re-imagining Continuous Integration within a Chat Interface helps Troops to ship and iterate on cutting edge product at breakneck speeds (~30 mins) Scott Underwood, Software Engineer @ Troops.ai Scott Underwood is an engineering leader at Troops.ai, an NYC startup building artificial intelligence and workflow automation for Sales and Customer Success teams. Scott has over 8 years of experience in JVM based languages and has led engineering teams in ad-tech, modeling & simulation, and computer vision. Talk Abstract: Scott will walk through the challenges a growing engineering team faces to set up a scalable continuous integration solution, and how this so often leads to a Rube Goldberg CI pipeline. We will explore how Troops used chat ops to simplify its CI process, increase engineering transparency, and alleviate many headaches born out of managing and scaling multiple CI subsystems. ~ Presentation #2 ~ Slack Maestro: Helping Users Stay on Topic (~20 mins) Andrej Ficnar, Data Scientist @ Schireson Andrej is a data scientist at Schireson Associates, a strategic data science consulting firm focusing on media and advertising. There, he employs various modeling approaches to build custom data solutions for clients, and manages current data products. Previously, Andrej worked at Columbia and Oxford as a theoretical physicist with a focus on applied string theory. Talk Abstract: Andrej will talk about building a smart bot for Slack that learns each channel’s topic and warns users if they go off topic. The bot relies on an implementation of a new NLP method introduced at NIPS 2015, called Word Mover’s Distance. 
We'll discuss the brains behind such a bot, some practical challenges in building it, and also see a live demo of the bot in action.
- Stacky McStackface
The internet has spoken - Stacky McStackface is happening! We hope you can join us for a night of data analytics stacks and tools. Once things wrap up, if you'd like to continue to talk shop we'll head to Lillie's on 17th & 5th. ~ Presentation #1 ~ Data Stack: 0 to 100 in under a week (~25 mins) Greg Ratner, Co-Founder & CTO @ Troops.ai Greg Ratner is co-founder and CTO of Troops.ai, an NYC startup building artificial intelligence and workflow automation for Sales and Customer Success teams. Greg is a serial entrepreneur with a passion for functional programming who has over 15 years of experience building and leading teams and scaling distributed software systems. Talk Abstract: Greg will share a practical guide on how to build a full data engineering stack from the ground up for a growing startup. We will talk about the challenges of setting up business intelligence and data engineering solutions rapidly and discuss 3rd party tools available to simplify this process. At the end of this talk you will be armed with a blueprint to allow data-centric decision making in your organization and empower your business users with data. ~ Presentation #2 ~ Mining Precision Interfaces from Query Logs (~30 mins) Thibault Sellam, Data Science Researcher @ Columbia University (http://sellam.me/) Thibault Sellam is a postdoctoral scientist at Columbia, working on data exploration, human-in-the-loop machine learning and more generally anything that involves data management, AI and people. Previously, Thibault was a PhD student at the University of Amsterdam (the Netherlands) where he studied data mining, and he took several breaks to work in the industry (Microsoft Research, JPMorgan) and tour with a French pop band. Talk Abstract: Interactive tools make data analysis both more efficient and more accessible. Yet, designing interactive interfaces requires technical expertise and domain knowledge. 
Experts are scarce and expensive, and therefore it is currently infeasible to provide tailored (or precise) interfaces for every user and every task. In this talk, I will present a data-driven approach to generate tailored interactive interfaces. I will introduce Precision Interfaces, a system that examines an input query log, identifies how the queries change, and generates interactive web interfaces to express the changes.
- Stacksgiving - The Time (Series Data) of Your Life
"Stacksgiving" will be a night focused on work with time series data. Cue some Green Day, because I hope you have the time (series data) of your life! #dadjoke We'll have some Thanksgiving-themed eats available. The cranberry sauce will be flowing and delicious side dishes plenty. Presentation #1: Time Series on a Time Crunch (~25 mins) Fiona Condon, Search Engineer @ GIPHY Fiona Condon is an engineer on GIPHY's Search and Discovery team, working to help you find the best GIFs. Before GIPHY, she worked on search ranking at Etsy, helping you find the best gifts. She co-hosts a weekly online radio show out of a shipping container in Bushwick. Talk Abstract: Designing new infrastructure at scale is a challenge; doing it on a tight schedule is plain hard. Architecting to avoid operational surprises and building for the right kind of flexibility requires a combination of technical pragmatism and effective human communication. Using GIPHY’s user analytics launch as a case study, this talk will cover some best principles for engineering low-risk time series indexes in Elasticsearch for uncertain load, and detail how we planned for foolproof backfills to adapt to changing requirements. I’ll also share some learnings from our effective short-term cross-team collaboration. Presentation #2: TimescaleDB: Re-Imagining PostgreSQL for Time-Series Data (~35 mins) Mat Arye, Software Developer @ TimescaleDB Mat has been working on data infrastructure in both academia and industry. As one of TimescaleDB's core architects he works on performance, scalability, and query power. Previously, he attended Stuyvesant, The Cooper Union, and Princeton. Talk Abstract: Today everything is instrumented, generating more and more time-series data streams that need to be monitored and analyzed. 
When it comes to storing this data, many developers often start with some well-trusted system like PostgreSQL, but as their data hits a certain scale, give up its query power and ecosystem by migrating to some NoSQL or other "modern" time-series architecture. They face the traditional trade-off: query power or scale. This perceived trade-off isn't necessary. We leverage the nature of time-series workloads -- inserting new data about recent events and rarely making updates -- to scale PostgreSQL for time-series data. This is achieved by automatically partitioning data. However, the user does not need to worry about this partitioning and can use all-of-SQL (e.g., secondary indexes, rich query predicates and group bys, aggregations, windowing functions, CTEs, JOINs). I’ll present performance benchmarks that show TimescaleDB scales much better than PostgreSQL for time-series workloads involving billions of rows, even on a single node. TimescaleDB is a PostgreSQL extension (Apache 2 license).
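To make the partitioning idea in the TimescaleDB abstract concrete, here is a minimal Python sketch of time-based partitioning. It is a toy illustration under stated assumptions, not TimescaleDB's implementation: the `ChunkedTable` class, the `chunk_start` helper, and the 7-day chunk interval are all hypothetical names and values chosen for the example. Rows are routed to a chunk by timestamp, and a time-range query only scans chunks that overlap the range.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption for illustration: fixed 7-day chunks aligned to the Unix epoch.
CHUNK_INTERVAL = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts: datetime, interval: timedelta = CHUNK_INTERVAL) -> datetime:
    """Return the start of the fixed-width time chunk that contains ts."""
    return EPOCH + ((ts - EPOCH) // interval) * interval


class ChunkedTable:
    """Toy table that transparently partitions rows into time chunks."""

    def __init__(self, interval: timedelta = CHUNK_INTERVAL):
        self.interval = interval
        self.chunks = defaultdict(list)  # chunk start -> list of (ts, value)

    def insert(self, ts: datetime, value) -> None:
        # The caller never names a partition; routing happens here.
        self.chunks[chunk_start(ts, self.interval)].append((ts, value))

    def query(self, start: datetime, end: datetime) -> list:
        """Return rows with start <= ts < end, skipping non-overlapping chunks."""
        rows = []
        for begin, chunk in self.chunks.items():
            if begin < end and begin + self.interval > start:
                rows.extend(r for r in chunk if start <= r[0] < end)
        return sorted(rows, key=lambda r: r[0])


table = ChunkedTable()
table.insert(datetime(2024, 1, 1, tzinfo=timezone.utc), 1)
table.insert(datetime(2024, 1, 2, tzinfo=timezone.utc), 2)
table.insert(datetime(2024, 1, 20, tzinfo=timezone.utc), 3)
recent = table.query(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 3, tzinfo=timezone.utc),
)
```

Because new data mostly lands in the most recent chunk, inserts touch a small working set, while the caller still sees a single table.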
- Stack to School
We've got two awesome presentations for you! Check below for more details. Also, we're switching it up on you this month! We'll be hosting at a new location - so try not to show up to Greenhouse's office :) Presentation #1: The anatomy of Zocdoc’s Patient Powered Search (~30-45 mins) Brian D'Alessandro & Pedro Rubio from Zocdoc Brian is the Head of Data Science at Zocdoc, an online doctor marketplace and booking tool, and is also an Adjunct Professor for NYU's Center for Data Science graduate program. Prior to Zocdoc, Brian was VP of Data Science at Dstillery, an online advertising firm. Brian is a veteran data scientist and leader with over 15 years of experience developing machine learning driven practices and products. Brian holds several patents and has published dozens of peer reviewed articles on the subjects of causal inference, large scale machine learning and data science ethics. Brian is also the drummer for the critically acclaimed indie rock band Coastgaard. Pedro is the Head of Search Engineering at Zocdoc, an online marketplace and booking tool. Pedro is a technology lead with over 15 years of experience dealing with operating systems, electronic trading systems, and full stack web development. Pedro’s specialty is taking disparate and complex systems and simplifying them to create powerful solutions. He’s passionate about plotting paths forward through the toughest problems (like simplifying the process to find a doctor!). Pedro is not in a rock band, but has had many rocks. Talk Abstract: The anatomy of Zocdoc’s Patient Powered Search - a peek into the architecture and machine learning that powers Zocdoc’s doctor discovery and booking marketplace. Most physician search systems require patients to know exactly what they’re looking for, either in terms of the appropriate specialty for a given condition or the medical terminology to describe the condition. 
At Zocdoc, we have built a patient friendly search system to power our core doctor discovery and booking platform using various products from the AWS stack and custom Machine Learning pipelines. This talk will focus on the anatomy of our Patient Powered Search and will cover both the architecture and algorithms that enable us to go from "ear ache" to "otolaryngologist." Presentation #2: The Hows & Whys of a Distributed SQL Database (~30-40 mins) Vivek Menezes from Cockroach Labs Vivek has worked as a developer and engineering manager since the dotcom boom. He joined Google in the early 2000s, where he had the good fortune to work on cloud infrastructure projects including frontend load-balancing, web crawl/indexing infrastructure, Borg, and the search engine. In 2015, he left Google to join Cockroach Labs because he felt there was a need for a new database platform. Since joining the CockroachDB team, he has become the acting Guru for online schema changes. Yes, CockroachDB supports schema changes without downtime. When he's not at work, he's got his nose in a book or has his hands full chasing his sons around Prospect Park. Talk Abstract: The Hows & Whys of a Distributed SQL Database Developers have had to deal with some serious tradeoffs when picking a database technology. Legacy databases can’t simultaneously meet both the scale and data integrity requirements of distributed applications, leaving developers to build transaction workarounds, complicated sharding schemes, and any number of development hacks to keep their applications functioning for a global user base. CockroachDB was built to solve exactly these problems. The SQL database for building global cloud services, CockroachDB guarantees correctness at scale by using self-organizing nodes to form a high-availability data layer that can span private and public clouds.
- Pot Calling the Kettle Stack
Come one come all! It's time for the next FSE Meetup. Here's this event's speaker list. Speaker #1: Matt Rogish (@MattRogish) from ReactiveOps (~25 mins) Matt has worked as a programmer and DBA, was CTO for The J. Peterman Company, helped create an awesome open-source mobile application framework as Director of Development for Toura, and built an amazing development team and disruptive financial application as CTO at Funding Gates. He was Director of Product for Single Platform (a Constant Contact company), helping the product team (Design, PM, Engineering) be more awesome and was the CTO of Rails Machine, a Ruby on Rails hosting company. Now, as co-founder and CEO of ReactiveOps (https://www.reactiveops.com), a DevOps-as-a-Service and Kubernetes/AWS/GKE consulting company, he’s led growth from zero to 14 people (and growing!), built a company/product/strategy, managed P&L, and (so far) kept it from cratering into the ground. Talk Abstract: Kubernetes at a high level. In this talk, Matt will discuss Kubernetes at a high level, how it compares to other Docker orchestration frameworks, why we like it, and a quick survey of our open-source tools, Pentagon and K8scripts, that allow you to super-quickly get production / staging / test clusters up and running in AWS/VPC with little effort. At the conclusion of this discussion, you’ll have much more info as to when Kube works and when it doesn’t, and why you’d want to use it at your company. Speaker #2: Evan Jones from Bluecore Evan is a software engineer at Bluecore in New York. Previously he worked at Twitter, and was a co-founder of a failed startup. He gets obsessed with technical problems that he doesn’t understand, and writes about them on his web site so he never has to think about them again. Talk Abstract: Data corruption at Twitter. TCP has a checksum and Ethernet has a CRC, both of which detect corrupt data. 
The math says this should make it extremely unlikely for applications talking to each other inside a data center to receive corrupt data. Unfortunately, it still happens. When it happened at Twitter it took a team of a few dozen people and multiple days to clean up the mess. I’ll talk about how we discovered that our applications were receiving corrupt data, and what we did to stop the bleeding. Then I’ll walk through a high-level overview of TCP, Ethernet, and switches to show how corruption can sneak past the CRC and checksum defenses. Finally, I’ll describe how you can protect your applications by adding a strong CRC or using encryption.
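As a rough illustration of the last point in the abstract, the Python sketch below shows one well-known weakness of a 16-bit ones' complement checksum of the kind TCP uses: it cannot detect reordered 16-bit words. It then shows an application-level CRC-32 catching the same change. This is not necessarily the failure mode the talk describes. The `frame` and `unframe` helpers are hypothetical, and zlib's CRC-32 stands in for whatever stronger check a real system would choose.

```python
import struct
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style used by TCP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (hypothetical application-level framing)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the CRC-32 does not match."""
    payload = message[:-4]
    (expected,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(payload) != expected:
        raise ValueError("payload failed CRC-32 check")
    return payload


original = b"ABCD"
corrupted = b"CDAB"  # the same two 16-bit words, in swapped order

# The sum of 16-bit words ignores their order, so both checksums are equal.
same_checksum = internet_checksum(original) == internet_checksum(corrupted)

# CRC-32 is position-sensitive, so the swapped payload produces a different value.
crc_differs = zlib.crc32(original) != zlib.crc32(corrupted)
```

A receiver would call `unframe` on each message and treat a `ValueError` as a signal to drop or re-request the data.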
Nabil Ahmad: Redirecting to Infinity. I'll be discussing an issue we had recently where crawler traffic, an application bug, and an unknown server optimization resulted in a website gradually returning infinite 301 redirect loops for every request. Nabil Ahmad is the Chief Technology Officer for Dotdash. Prior to joining Dotdash in 2013, Nabil worked at Barnes & Noble serving as the head of technology for BarnesAndNoble.com. Rick Mangi: Surviving the Battles in the War Against the Machines. Humans are inferior to computers, but our hubris encourages us to push the limits of what they can do to the point where we find ourselves in a constant state of battle against them. As technologists, we are continuously at war with our machines, code and the chaos of the internet. Once we recognize our own inferiority and the futility of our plight we can begin to make intelligent decisions which allow us to win individual battles and perhaps survive to fight another day. Through a few tales, some interesting quotes and a bit of insight I will hopefully help you to survive the next skirmish. Rick Mangi is the Director of Platform and DevOps at Chartbeat. He has been fighting the battle for 20ish years with a variety of startups and big faceless corporations. Kenton Jacobson: Mistakes happen. Even the best laid plans will see fundamentally surprising incidents arise. Kenton Jacobsen explains how to fail quickly and quietly, and fix things even faster, by drawing from his time at TheBlaze and Vogue when things didn’t go quite as expected. Kenton is Director of Engineering for Vogue, Glamour and GQ leading teams of engineers working on Isomorphic JS, React, GraphQL, and PHP. Before this, Kenton led engineering for the viral news website, TheBlaze. In his free time, Kenton is into philosophy, rock climbing, and Internet memes.