Data Science at Zillow: The Zestimate® and Beyond

Hosted By
user 4.
Details

NOTE: There will be an encore presentation of this meetup. If you are waitlisted for this one, or can't make it, there will be at least one more opportunity to catch this great presentation from the folks at Zillow.

We're kicking off 2015 with a great presentation from the folks at Zillow! They will be talking about the products they have built using R, Python and Graphlab.

Zillow’s mission is to empower consumers with information and tools to make smart decisions about homes, real estate and mortgages. At the heart of this mission is a living database of more than 100 million U.S. homes – including homes for sale, homes for rent and homes not currently on the market.

One of Zillow’s core innovations is its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes.

The living database of all homes is built from a range of disparate sources, incorporating streams of deeds, house and parcel records, tax assessment data, listings of properties for sale or rent and mortgage information. Zillow users (homeowners and professionals) add to this data on the Zillow website by updating home facts and adding descriptive text and photos. GIS data is used to overlay information such as flood zones, type of street, adjacency to waterfront and proximity to schools.

The real estate marketplace, with a rich but very messy information set, poses unique challenges in the world of “Big Data”. To harness the full power of the data, Zillow uses techniques such as entity resolution, bad record filtering, attribute outlier screening and missing data imputation. The cleaned data is then input to statistical machine learning algorithms localized to each real estate market, leading to the calculation of the Zestimate and other valuations.
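
For readers new to these terms, the four cleaning steps named above can be illustrated with a minimal Python sketch. This is a toy example and not Zillow's actual pipeline: the record fields (address, market, sqft, beds), the address-normalization rule, the median-absolute-deviation threshold k and the per-market median imputation are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from statistics import median


def normalize_address(address):
    """Toy entity-resolution key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", address.lower())
    return " ".join(cleaned.split())


def resolve_entities(records):
    """Entity resolution (toy): keep the first record per normalized address."""
    seen = {}
    for rec in records:
        seen.setdefault(normalize_address(rec["address"]), rec)
    return list(seen.values())


def filter_bad_records(records):
    """Bad record filtering: drop missing or non-positive square footage."""
    return [r for r in records if r.get("sqft") is not None and r["sqft"] > 0]


def screen_outliers(records, field="sqft", k=5.0):
    """Outlier screening: within each market, drop values more than
    k median absolute deviations (MADs) from the market median."""
    by_market = defaultdict(list)
    for r in records:
        by_market[r["market"]].append(r)
    kept = []
    for group in by_market.values():
        values = [r[field] for r in group]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        for r in group:
            if mad == 0 or abs(r[field] - med) <= k * mad:
                kept.append(r)
    return kept


def impute_missing(records, field="beds"):
    """Missing data imputation: fill gaps with the market median, if known."""
    known = defaultdict(list)
    for r in records:
        if r.get(field) is not None:
            known[r["market"]].append(r[field])
    filled = []
    for r in records:
        if r.get(field) is None and known[r["market"]]:
            r = {**r, field: median(known[r["market"]])}
        filled.append(r)
    return filled


def clean(records):
    """Run the four steps in sequence."""
    records = resolve_entities(records)
    records = filter_bad_records(records)
    records = screen_outliers(records)
    return impute_missing(records)


# Hypothetical sample data, invented for this example.
records = [
    {"address": "12 Main St.", "market": "Seattle", "sqft": 1500, "beds": 3},
    {"address": "12 main st", "market": "Seattle", "sqft": 1500, "beds": 3},
    {"address": "34 Pine Ave", "market": "Seattle", "sqft": 1700, "beds": None},
    {"address": "56 Oak Rd", "market": "Seattle", "sqft": 1600, "beds": 4},
    {"address": "78 Elm St", "market": "Seattle", "sqft": 0, "beds": 2},
    {"address": "90 Lake Dr", "market": "Seattle", "sqft": 150000, "beds": 3},
    {"address": "11 Bay Ct", "market": "Seattle", "sqft": 1400, "beds": 2},
]

cleaned = clean(records)
for rec in cleaned:
    print(rec["address"], rec["sqft"], rec["beds"])
```

In this sample, the duplicate Main St record, the zero-square-foot record and the 150,000-square-foot record are dropped, and the missing bedroom count is filled from the remaining Seattle records. A production system would presumably use far richer matching and modeling than these simple rules.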

The data science team maintains a single software environment for the entire life cycle of the products, from prototyping to production. The primary analytic platforms used include R, Python and Graphlab, leveraging their rich statistical modeling and data analysis capabilities. We use predictive modeling at every stage in the process, from data integration and missing-data handling to producing the Zestimate and shaping the user experience. Using feedback from data feeds, user input and manual reviewers, we are constantly refining and improving the models. This focus on advanced data analysis and statistical modeling has helped Zillow establish itself as the leader in the online real estate marketplace.

Speaker Bios

Nicholas McClure, Senior Data Scientist, Zillow

Nick is a senior data scientist for the Zestimate® group. He works on data cleaning projects such as predicting listing fraud, entity resolution and Zestimate quality. Prior to joining Zillow this year, Nick worked at Caesars Entertainment in Las Vegas as a Gaming Statistician and Data Scientist. He has worked on everything from casino game design to optimal slot machine placement and predicting customer worth. His team received the 2013 Chairman's Award in Rigor for their work on table game optimization. He received an NSF IGERT Fellowship for studying infectious disease while working on his Master's and Ph.D. in Applied Mathematics at the University of Montana, where he studied the role of mutation in Cystic Fibrosis lung infections.

Yeng Bun, Senior Data Scientist

Yeng joined Zillow in 2006 and plays a major role in the production and quality of our Real Estate Market Reports. He also developed and implemented parts of the AVM that produces Zestimates. Prior to joining Zillow, he spent four years at Samurai EC and Lehman Brothers, where he developed parts of real-time systems for market makers to quote and trade stock options and FOREX with various exchanges. He has also worked at Insightful Corp. as a developer of statistical software and at QUEST Integrated, where he worked on research projects in areas of computational fluid dynamics, computer-aided engineering and water jet machining. Yeng holds a Ph.D. in Applied Math from the University of Washington.

Mike Babb, GIS Analyst

Mike Babb is currently a GIS Analyst with Zillow who specializes in the creation, discovery, and formalization of spatial relationships into machine-comprehensible data for input into the Zestimation® algorithm. Mike is also a PhD Candidate in Geography at the University of Washington, investigating missing data in population surveys with an emphasis on race and ethnicity questions. Beyond missing data, Mike is interested in the spatio-demographic variation in population processes such as migration, mortality, segregation, and political representation. Prior to joining Zillow full time, Mike was the administrator of the Northwest Census Research Data Center, a joint venture between the University of Washington and the US Census Bureau facilitating access to restricted economic and demographic microdata.

Python Data Science - Seattle - Bellevue
Zillow
1301 Second Avenue, Floor 31, Seattle, WA 98101