What we're about

Houston Data Science is a Meetup group by and for Houston's data science community.

Our goal is to foster Houston's data science community by providing a forum for learning, teaching, and networking. We welcome both beginners and experts, as well as those simply interested in learning more about data science and the Houston data science community.

Upcoming events (2)

Intro to Data Science Bootcamp

7625 Katy Fwy

Register on the Dunder Data website (https://www.dunderdata.com/event-info/intro-to-data-science-bootcamp). This is a paid event.

Target Student
This course is aimed at those who are beginning their data science journey in Python. You are expected to be comfortable with the fundamentals of the Python programming language. If you wish to take this course but lack prior Python knowledge, you can take the Intro to Python Bootcamp, which takes place the week before May 6-10, or complete a precourse assignment on your own.

Expert Instructor
This course is taught by Ted Petrou, an expert at data exploration and machine learning using Python. He is the author of Exercise Python, Master Data Analysis with Python, and Pandas Cookbook.

Course Objective
The major course objective is to teach 'end-to-end' data analysis. By the end of the course, you will be able to examine datasets and produce an investigative report that mixes code, visualizations, and text.

Syllabus

Day 1: Essential Commands
The pandas library is confusing for beginners because it offers many different commands to accomplish the same task. You will learn a small yet powerful subset of pandas that lets you maximize your data analysis capabilities without getting dragged down by syntax.

Day 2: Grouping Data
Splitting data into independent groups is one of the most common and fundamental operations you can apply to your data. A variety of grouping tasks will be covered. By the end of day 2, you will have enough firepower to complete a project where you ask and then answer your own data analysis questions.

Day 3: Tidy Data
Tidy data is a structure of data that makes further data analysis easier. A variety of "messy" datasets will be tidied and compared to their original form.

Day 4: Time Series and Regular Expressions
The pandas library provides excellent time-series functionality, which will be covered. In addition, much raw data requires cleaning by finding and extracting information within text. Regular expressions can be enormously helpful for this and will be covered.

Day 5: Exploratory Data Analysis
You will learn a powerful routine for proceeding through a data analysis. You will also learn how to make informative and elegant data visualizations using Matplotlib and Seaborn.

Precourse Assignment
A precourse assignment on the fundamentals of Python, along with an introduction to pandas, will be provided and must be completed before the start of the course.
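
As a rough illustration of the Day 4 topic of finding and extracting information within text, here is a minimal sketch using only Python's standard-library re module. It is not taken from the course materials. The sample records, the pattern, and the extract_fields helper are all hypothetical and invented for this example.

```python
import re

# Hypothetical raw records; the format is assumed purely for illustration.
raw_rows = [
    "Order 1042 shipped 2019-05-13 to Houston, TX",
    "Order 77 shipped 2019-05-17 to Katy, TX",
    "Order pending, no ship date yet",
]

# Named groups capture the order number and an ISO-style date.
pattern = re.compile(
    r"Order (?P<order_id>\d+) shipped (?P<date>\d{4}-\d{2}-\d{2})"
)


def extract_fields(text):
    """Return a dict of captured fields, or None when the text does not match."""
    match = pattern.search(text)
    return match.groupdict() if match else None


extracted = [extract_fields(row) for row in raw_rows]
for row, fields in zip(raw_rows, extracted):
    print(row, "->", fields)
```

Rows that do not match come back as None, so messy records can be flagged rather than silently dropped. In pandas, the Series.str.extract method accepts a similar pattern with capture groups.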

Intro to Machine Learning

7625 Katy Fwy

Register on the Dunder Data website (https://www.dunderdata.com/event-info/intro-to-machine-learning-5). This is a paid event.

Target Student
This course is intended for those who are familiar with the Python data science ecosystem but have not yet explored machine learning. Knowledge of the fundamentals of Python is a prerequisite for the course. For students without these skills who would still like to participate, a thorough pre-course assignment is available that covers these fundamentals.

Major Course Objective
This course will dive deep into a single machine learning problem. The major course objective is to learn the concepts and tools of an entire machine learning workflow so that you can apply it to any future problem. We will be using the scikit-learn library, an excellent tool for doing machine learning in Python.

Day 1: The Machine Learning Model with Scikit-Learn
• Use any machine learning model in scikit-learn with its Estimator objects
• Use the three-step process common to all machine learning Estimators: Import, Instantiate, Train
• Explore the details of linear regression, k-nearest neighbors, decision trees, and random forests
• Build these models from scratch and then with scikit-learn

Day 2: A Modern Machine Learning Workflow in Scikit-Learn
• Create a modern and comprehensive workflow for building an end-to-end machine learning solution
• Properly evaluate your model with a variety of different scoring metrics
• Select the best model through hyperparameter grid searching
• Impute missing values, transform your data, and engineer new features
• Use newly released tools to integrate pandas with scikit-learn
• Build a complex pipeline to contain the entire workflow
• Persist models onto disk so that they may be used later with new data
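
To give a flavor of the Day 1 idea of building a model from scratch, here is a minimal sketch of a k-nearest neighbors classifier in plain Python. It is not the course's implementation. The class name SimpleKNNClassifier and the toy data are hypothetical, and the fit/predict method names and the n_neighbors parameter are chosen only to mirror the instantiate-then-train pattern that scikit-learn Estimators follow.

```python
import math
from collections import Counter


class SimpleKNNClassifier:
    """Toy k-nearest neighbors classifier (illustrative, not scikit-learn)."""

    def __init__(self, n_neighbors=3):
        # Instantiate: store hyperparameters only; no data is seen yet.
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # Train: k-nearest neighbors simply memorizes the training data.
        self._X = [tuple(row) for row in X]
        self._y = list(y)
        return self

    def predict(self, X):
        return [self._predict_one(tuple(row)) for row in X]

    def _predict_one(self, row):
        # Rank training points by Euclidean distance to the query row.
        ranked = sorted(
            zip(self._X, self._y),
            key=lambda pair: math.dist(row, pair[0]),
        )
        nearest_labels = [label for _, label in ranked[: self.n_neighbors]]
        # Majority vote among the nearest neighbors.
        return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["low", "low", "low", "high", "high", "high"]

model = SimpleKNNClassifier(n_neighbors=3)  # instantiate
model.fit(X_train, y_train)                 # train
predictions = model.predict([(1.1, 1.0), (5.1, 5.0)])
print(predictions)
```

With scikit-learn itself, the equivalent steps would start from importing KNeighborsClassifier from sklearn.neighbors, then instantiating it and calling fit in the same way.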

Past events (62)

Intro to Machine Learning Bootcamp

The Cowork Lab

