Hamilton: An Open Source Python Micro-Framework for Data / Feature Engineering
At Stitch Fix, we have 130+ “Full Stack Data Scientists” who, in addition to doing data science work, are expected to engineer and own the data pipelines for their production models.
One data science team, the Forecasting, Estimation, and Demand team, was in a bind. Their feature generation process was slowing iteration and causing operational frustration in delivering time-series forecasts for the business. In this talk, I’ll present Hamilton, a novel open-source Python micro-framework that solved their pain points by changing their working paradigm.
Specifically, Hamilton enables a simpler paradigm for Data Science & Data Engineering teams to create, maintain, execute, and scale code for feature/data transforms, especially when there is a chain of them. Hamilton does this by building a DAG of dependencies directly from Python functions. Tune in to hear what Hamilton is, what it looks like to use it, what benefits it provides, and where it's going.
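To make the idea of building a DAG directly from Python functions concrete, here is a minimal sketch using only the standard library. It is not Hamilton's implementation or API. It assumes a convention in which a function's name is the output it produces and its parameter names are the inputs it depends on. The feature functions and the helpers build_dag and execute are hypothetical names chosen for illustration.

```python
import inspect

# Hypothetical feature functions. Assumed convention: the function name is
# the name of the output, and each parameter names an input column or the
# output of another function.
def avg_3wk_spend(spend):
    """Trailing mean of spend over up to three periods."""
    out = []
    for i in range(len(spend)):
        window = spend[max(0, i - 2): i + 1]
        out.append(sum(window) / len(window))
    return out

def spend_per_signup(spend, signups):
    """Spend divided by signups, period by period."""
    return [s / n for s, n in zip(spend, signups)]

def avg_3wk_spend_per_signup(avg_3wk_spend, signups):
    """Depends on the output of avg_3wk_spend, forming a chain."""
    return [a / n for a, n in zip(avg_3wk_spend, signups)]

def build_dag(functions):
    """Map each output name to the names it depends on."""
    return {f.__name__: list(inspect.signature(f).parameters) for f in functions}

def execute(functions, wanted, inputs):
    """Compute only the requested outputs, resolving dependencies recursively."""
    by_name = {f.__name__: f for f in functions}
    cache = dict(inputs)  # raw inputs are the leaves of the graph

    def resolve(name):
        if name not in cache:
            if name not in by_name:
                raise KeyError(f"no input or function named {name!r}")
            fn = by_name[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return {name: resolve(name) for name in wanted}

FUNCTIONS = [avg_3wk_spend, spend_per_signup, avg_3wk_spend_per_signup]
dag = build_dag(FUNCTIONS)
result = execute(
    FUNCTIONS,
    wanted=["avg_3wk_spend_per_signup"],
    inputs={"spend": [10, 20, 30], "signups": [1, 2, 4]},
)
```

In this sketch, requesting avg_3wk_spend_per_signup triggers avg_3wk_spend first because of the parameter name, while spend_per_signup is never computed because nothing requested it. Each transform stays a plain function that can be unit tested on its own.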