What we're about

Welcome to the Chicago Data Engineering Meetup group.

We are a group of data engineers, software developers, and system architects interested in sharing practical experience building complex data pipelines at scale. We also aim to keep up to date with the latest technologies and platforms for data engineering, with a strong focus on open source tools and the cloud.

Join us for friendly tech talks where people share case studies, solutions, and challenges faced in data engineering.

We follow a conventional format of 1 or 2 presentations from volunteers in the group and/or invited experts, followed by general conversation and socialising.

And, for sure, pizza and beers!

Some topics of interest are:

- Spark
- Stream processing
- Graph Databases
- NoSQL
- Data Pipelines
- Big Data
- Advanced Analytics
- Software Engineering
- Real-world examples of data engineering applications

Call for presentations:

We provide a supportive environment to share your ideas and get feedback on your work. If you are interested in presenting, please contact the organizer through this Meetup group.

Upcoming events (1)

Transfer learning with Spark + IBOSS Data Reduction Method

QuantumBlack has agreed to sponsor our February Meetup. There will be a couple of talks and lots of networking, along with pizza and beer! Please bring an ID to get into the building.

- - - -

Agenda:

- 6:00pm - 6:30pm Meet and greet. Networking.
- 6:30pm - 7:00pm Transfer learning through Spark DL pipelines - Talk by Vishal Rajpal
- 7:00pm - 7:30pm On Data Reduction of Big Data - Talk by Min Yang
- 7:30pm - 8:00pm Q/A with Speakers and QuantumBlack

- - - -

Title: Transfer learning through Spark Deep Learning pipeline

Speaker: Vishal Rajpal

Abstract: Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. With the amount of time and data required to train neural nets, transfer learning is becoming more relevant as we try to leverage existing models. We will look at the Spark Deep Learning pipeline with code snippets to understand the available API options.

Speaker Bio: Vishal Rajpal is a Principal Data Engineer at QuantumBlack. He oversees architecture, information security, and data engineering for analytics and machine learning development projects. He specializes in leveraging best practices of product and design engineering for faster analytics development and deployment. Prior to QB, Vishal worked at Fractal Analytics, MSCI, and Accenture.

- - - -

Title: On Data Reduction of Big Data

Speaker: Min Yang

Abstract: The big data paradigm has drawn a significant amount of attention in recent years as the costs of acquiring and storing data have plummeted. As a result, the bottleneck has shifted to fast, in-depth analysis. This shift has created its own set of problems, the most obvious being that large datasets are often computationally expensive to process. Proven statistical methods are no longer applicable to extraordinarily large data sets due to computational limitations. A critical step in Big Data analysis is data reduction. In this presentation, I will review some existing approaches to data reduction and introduce a new strategy called information-based optimal subdata selection (IBOSS). Under both linear and nonlinear model setups, theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to other approaches in terms of parameter estimation and predictive performance. The tradeoff between accuracy and computation cost is also investigated. When models are mis-specified, the performance of different data reduction methods is compared through simulation studies. Some ongoing research work as well as some open questions will also be discussed.

Speaker Bio: Min Yang is a Professor of Statistics at the University of Illinois at Chicago (UIC). Before he joined UIC in 2012, he worked at the University of Nebraska-Lincoln as an Assistant Professor from 2002 to 2005 and at the University of Missouri as an Assistant and Associate Professor from 2005 to 2012. Min Yang received his PhD from UIC in 2002. His primary research areas are subdata selection in big data analysis and optimal design of experiments; this work is mainly supported by the NSF. Min Yang won the prestigious NSF CAREER award in 2008. He has published more than 10 papers in the Annals of Statistics and JASA. Currently he serves as an Associate Editor for five statistical journals, including JASA and Statistica Sinica.

- - - -

Sponsors: This event is sponsored by QuantumBlack, a McKinsey Company.

Past events (1)

November Meetup

McKinsey & Company
