Homework 1


Details
All,
A few people mentioned after the talk that it might be a good idea to actually put some of these concepts to work. To that end, I'm going to assign some homework. Baby name data is available here (http://www.ssa.gov/oact/babynames/limits.html). The smaller file is the national data; please download it and try to answer the following 4 questions using Python/Pandas.
- Find the total number of births in each year. If you can, find the total for both male and female births. Plot your results.
- Find the top 1000 names for each year. If you can, find the top 1000 female and top 1000 male names and plot by gender.
- Plot the trend of the names 'John', 'Harry', 'Mary' and 'Marilyn' over all of the years of the data set. Try to make a stack of 4 plots.
- Extra Credit: Find the number of distinct names, taken in order of popularity from highest to lowest, in the top 50% of births. Plot for both male and female births over the full range of the data set.
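If you're not sure where to begin, here is a minimal sketch of one way to load the data and get started on the first question. It assumes the national archive unpacks into one file per year named like yob1880.txt, with no header row and three comma-separated columns (name, sex, number of births); check your download, since that layout is an assumption here. The function names load_names and births_by_year_and_sex are made up for illustration, and this is just one possible approach, not the expected answer.

```python
from pathlib import Path
import re

import pandas as pd


def load_names(directory):
    """Read every yobYYYY.txt file in `directory` into one DataFrame.

    Assumes each file has no header and the columns name, sex, births.
    """
    frames = []
    for path in sorted(Path(directory).glob("yob*.txt")):
        match = re.fullmatch(r"yob(\d{4})\.txt", path.name)
        if match is None:
            continue  # skip anything that does not look like a yearly file
        frame = pd.read_csv(path, names=["name", "sex", "births"])
        frame["year"] = int(match.group(1))  # the year comes from the file name
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no yobYYYY.txt files found in {directory}")
    return pd.concat(frames, ignore_index=True)


def births_by_year_and_sex(names):
    """Total births with one row per year and one column per sex."""
    return names.pivot_table(
        index="year", columns="sex", values="births", aggfunc="sum"
    )
```

With the archive unzipped into a folder (called "names" here purely as an example), `totals = births_by_year_and_sex(load_names("names"))` gives a table you can plot with `totals.plot()` if matplotlib is installed. The other questions can build on the same DataFrame.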
Good luck!
