Machine Learning project demo day and Open house day


Details
Presented by NYC Data Science Academy students who are about to finish the 12-week full-time program. Apply for the January 2016 program to become a data scientist (http://nycdatascience.com/data-science-bootcamp/).
++++++++++++++++++++++++
During this event you will see some of the best machine learning projects created by students of the NYC Data Science Academy 12-week Data Science bootcamp. You will see how machine learning concepts can be applied to complicated problems.
You will also have an opportunity to meet our bootcamp students, find out what it is like to be a student at NYC Data Science Academy, and get an overview of our program. Join us for some data wrangling tips, fun facts and lively discussion.
Event schedule:
6:30-7:00 Arrival and pizza, Open house Q&A
7:00-8:30 The five best projects produced in this bootcamp will be presented, and their interesting findings explored
8:30-9:30 Network and meet our students.
====================
Project 1: Hillary Clinton Email Analysis
Project Description: Throughout 2015, Hillary Clinton has been embroiled in controversy (https://en.wikipedia.org/wiki/Hillary_Clinton_email_controversy) over the use of personal email accounts on non-government servers during her time as the United States Secretary of State.
Thanks to the Freedom of Information Act, on Monday, August 31, 2015, the State Department released nearly 7,000 pages of Clinton's heavily redacted emails.
John Montroy, Jake Lehrhoff and Chris Neimeth took those emails and wrangled them for your exploring pleasure. During the presentation they will cover the tasks involved in munging and analyzing this data, including NLTK, sentiment analysis, MySQL, Python, Flask, an AWS instance and lots of elbow grease.
The project output includes:
- A dashboard to enable exploration of Hillary’s emails
- Displays of topics by sender and recipient
- Sentiment analysis of emails
Speaker Bio:
Chris Neimeth is a serial entrepreneur in the technology, media and entertainment businesses. Chris has served in various strategic roles: CEO of Salon Media Group Inc., President of IAC Partner Marketing, Executive Vice President of Ticketmaster, President/CEO of Real Media, Chief Commercial Officer of Daylife, Senior Vice President for The New York Times Company Digital, and founder of Grey Interactive. He has twice served as a member of the Aspen Institute Forum on Communication and Society, and is a two-time elected Director of the Interactive Advertising Bureau. Projects: http://blog.nycdatascience.com/uncategorized/mass-shootings-in-america/
Jake Lehrhoff is a man of many hats. For six years he taught middle school English and chaired the department at a school for children with moderate-to-severe emotional and behavioral disorders. He developed a system of intradepartmental supervision to monitor the efficiency and effectiveness of the billing department of a rheumatology laboratory. He wrote a novel about an autistic boy and edited the memoir of a triathlete. Jake holds a BA in psychology from Wesleyan University and an MA in psychology from Brandeis University, where he studied quantitative research methods and statistics and graduated with a perfect GPA. Jake takes great satisfaction in solving problems and is excited to apply his knowledge of machine learning and skills in R and Python to tackle new challenges. Blog: http://blog.nycdatascience.com/author/jake.lehrhoff/
John Montroy is a graduate of Middlebury College with a B.A. in Physics. After a summer of particle physics at CERN with the Harvard ATLAS team, he began his career as a data analyst in the auto industry. He has been programming since the age of 12, and delights in clean, re-usable, and functionally-oriented code. A self-starter and curious thinker, his interests run the gamut from mathematics to classical music. In his spare time, he can be found playing piano or mandolin, singing barbershop, and running. github: https://github.com/jmontroy90/teamhrc blog: http://blog.nycdatascience.com/author/jmontroy90/
-------------------
Project 2: Startup VS Venture Capital matchmaker app
Project description: Starting a company is hard - figuring out how to fund the company can be even harder. M3 is an application dedicated to making the funding process easier. We use sophisticated matching algorithms based on a "fingerprint" of a startup - we can run a startup's fingerprint against our database of thousands of venture capital firms, and pair it with the best matches. These matches are venture capital firms that have funded similar startups in the past, and will fund similar startups in the future. In short - we save you time, money, and effort by finding the right venture capital firm, right away.
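The description does not say how M3 builds or compares fingerprints, so the following is only an illustrative sketch of one common way to do this kind of matching. It assumes a fingerprint is a numeric feature vector and that firms are ranked by cosine similarity. The feature layout, firm names and function names are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_firms(startup_fingerprint, firm_profiles, top_n=3):
    """Return the names of the top_n firms most similar to the startup."""
    scored = [
        (cosine_similarity(startup_fingerprint, profile), name)
        for name, profile in firm_profiles.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_n]]


# Hypothetical features: [fintech, healthcare, seed stage, series A].
# Each firm profile stands in for a summary of the startups it funded before.
FIRMS = {
    "Firm A": [0.9, 0.1, 0.8, 0.2],
    "Firm B": [0.1, 0.9, 0.2, 0.8],
    "Firm C": [0.5, 0.5, 0.5, 0.5],
}
startup = [1, 0, 1, 0]  # a seed-stage fintech startup
matches = rank_firms(startup, FIRMS)
print(matches)
```

In this toy data the seed-stage fintech startup lines up best with the firm whose past investments lean the same way.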
Speaker Bio: John Montroy graduated with a B.A. in Physics. After a summer of particle physics at CERN with the Harvard ATLAS team, he began his career as a data analyst in the auto industry. He has been programming since the age of 12, and delights in clean, re-usable, and functionally-oriented code. A self-starter and curious thinker, his interests run the gamut from mathematics to classical music. In his spare time, he can be found playing piano or mandolin, singing barbershop, and running.
Avi Yashchin is the Founder and CEO of CleanEdison, which was acquired by Kaplan Inc (NYSE:GHC) in 2014. He is a data scientist at NYC Data Science Academy, an instructor in the Business Fundamentals and Tactics course at General Assembly, and a mentor at Founder Institute, Tech Stars, and SJF Ventures. He has a bachelor's degree in Computer Science from Johns Hopkins and an MBA from NYU.
-------------------------------
Project 3: Lottery selection
Project description:
This project analyzes data on prize amounts from 6 different parimutuel lotteries in order to infer the selections that players choose most frequently, finding results with a remarkable level of agreement across multiple data sets. Because popular selections result in lower prizes (since a fixed amount of money is shared equally among winners), we can quantify the expected payout for each selection and the degree to which choosing popular combinations lowers a player's expected prize. This project also provides a case study on the trade-offs between accuracy, interpretability, and ease of implementation for machine learning models.
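The effect of popularity on prizes can be illustrated with a small calculation. This is a sketch under simplifying assumptions, not the project's actual model: it assumes a fixed prize pool split equally among winners, and that each other ticket independently picks a given combination with a fixed probability, so the number of co-winners is binomial. All numbers and function names are hypothetical.

```python
from math import comb


def expected_prize_given_win(pool, n_other_tickets, pick_prob):
    """Expected prize when our ticket wins and the pool is split equally.

    With K ~ Binomial(n, p) co-winners, the closed form is
    E[1 / (1 + K)] = (1 - (1 - p)^(n + 1)) / ((n + 1) * p).
    """
    if pick_prob == 0:
        return float(pool)  # nobody else can share the prize
    n, p = n_other_tickets, pick_prob
    return pool * (1 - (1 - p) ** (n + 1)) / ((n + 1) * p)


def expected_prize_by_sum(pool, n_other_tickets, pick_prob):
    """Same quantity by direct summation, as a check on the closed form."""
    n, p = n_other_tickets, pick_prob
    return sum(
        pool / (1 + k) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )


# Hypothetical game: 100,000 possible combinations, 50,000 other tickets.
POOL = 1_000_000
N_OTHER = 50_000
uniform_prob = 1 / 100_000
average_pick = expected_prize_given_win(POOL, N_OTHER, uniform_prob)
popular_pick = expected_prize_given_win(POOL, N_OTHER, 5 * uniform_prob)
print(f"average combination: {average_pick:,.0f}")
print(f"popular combination: {popular_pick:,.0f}")
```

Since every combination is equally likely to be drawn, the difference between the two printed values is entirely due to how many other players are expected to share the prize.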
Speaker Bio: After starting his career as a Ph.D. in pure mathematics, Stephen Penrice has worked continuously to grow his technical proficiency in order to take on more and more challenges with an applied focus. His latest work in the finance industry has focused on models for commercial and residential mortgages as well as consumer credit. He has also worked in the gaming industry, where he invented more than 10 patented lottery games.
Reference link:
https://www.linkedin.com/in/stephen-penrice-8611a62
https://github.com/lotterdata/proj_4_bootcamp
------------------
Project 4: Kaggle Walmart data competition
Project description: Walmart released a data set of purchases for over 1.2 million items (over 600K in each of the training and testing sets) in the hopes that data scientists could predict the type of shopping trip being made. Approaches using logistic regression, random forests, XGBoost and support vector machines will be demonstrated.
Bio: Brandon Schlenker, Nate Aiken, Joe Eckert, and Daniel Donohue worked on the Walmart data set.
Brandon Schlenker has a B.A. in Math from Northwestern University and most recently finished his Master's in Applied Math from the University of Delaware. Brandon looks forward to a career utilizing machine learning and all things at the intersection of math, statistics, and computer science.
Joe Eckert is currently studying with the NYC Data Science Academy to pursue his passion for big data. Joe previously worked for 3 years at JPMorgan's Corporate Bank. He graduated in 2012 with a BA in Financial Economics from the University of Rochester. Joe is a highly motivated, strategic and analytical professional who thrives in high pressure environments. He is an outgoing and charismatic team player with a knack for project management and solving complex problems.
Nate Aiken graduated from City College in 2014 with a BS in Biology with a focus in Neuroscience. His experience studying vision and hearing in labs at City and Rockefeller University led him to the bootcamp. He enjoys the challenges of working with big data and finding the meaningful relationships within.
Daniel Donohue (A.B. Mathematics, M.S. Mathematics) spent the last three years as a Ph.D. student in mathematics studying topics in algebraic geometry, but decided a few short months ago that he needed a change of venue and career. Thankfully, he found the compelling world of data science. From his past experiences, he brings with him a voracious appetite for knowledge and learning, and a keen ability to explain difficult concepts in down-to-earth terms, skills that will serve him well as he looks forward to an exciting and fulfilling career as a data scientist.
------------------
Project 5: The Gradient Boosters and the Ross-Mann (Project)
Project Description: Our team, the Gradient Boosters, was challenged by Rossmann, the second largest chain of German drug stores, to predict the daily sales for 6 weeks into the future for more than 1,000 stores. Exploratory data analysis revealed several novel features, including spikes in sales prior to store refurbishment. We also engineered several novel features by including external data such as Google Trends, macroeconomic data and weather data. We then used H2O, a fast, scalable parallel-processing engine for machine learning, to build predictive models utilizing random forests, gradient boosting machines and deep learning. Lastly, we combined these models using different ensemble methods to obtain better predictive performance.
Bio: David Comfort, D.Phil. is a data scientist, scientist, activist and writer. His doctoral research at Oxford University was in protein nuclear magnetic resonance (NMR) and computational biology. His post-doctoral research at UCLA involved genomics, bioinformatics as well as protein NMR. In addition, his undergraduate work was in mechanical engineering at the University of Maryland, College Park.
He was most recently with Bench International, a consultancy and retained executive search firm for the Biopharmaceutical Industry. David has written several peer-reviewed scientific papers, as well as a proposal for the John D. and Catherine T. MacArthur Foundation Program on Global Security and Sustainability entitled, "Nature and the City: The political ecology of the environment, urbanization and sustainability." In addition, David has been a long-time activist in the areas of environmentalism, human rights and sustainable development.
Reference link: https://github.com/aviyashchin/KaggleProject
