Homework 1

All,

A few people mentioned after the talk that it might be a good idea to actually put some of these concepts to work. To that end, I'm going to assign some homework. Baby name data is available here (http://www.ssa.gov/oact/babynames/limits.html). The smaller file is the national data. Please download it and try to answer the following 4 questions using Python/Pandas.

  1. Find the total number of births in each year. If you can, find the total for both male and female births. Plot your results.

  2. Find the top 1000 names for each year. If you can, find the top 1000 female and top 1000 male names and plot by gender.

  3. Plot the trend of the names 'John', 'Harry', 'Mary' and 'Marilyn' over all of the years of the data set. Try to make a stack of 4 plots.

  4. Extra Credit: Find the number of distinct names, taken in order of popularity from highest to lowest, in the top 50% of births. Plot for both male and female births over the full range of the data set.
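If you want a starting point, here is a rough sketch of how the four questions might map onto pandas operations. It is one possible approach, not the expected answer. It assumes the national download unpacks into one file per year, each with headerless lines of the form name,sex,births; check that against what you actually download. The helper names (load_year, births_by_year_and_sex, top_n, name_trends, names_to_reach_half) and the tiny SAMPLE data are made up for illustration so the snippet runs without the real files.

```python
import io

import pandas as pd

# Made-up miniature data standing in for two yearly files (assumed layout:
# name,sex,births with no header row). Replace with the real files.
SAMPLE = {
    1880: "Mary,F,50\nAnna,F,30\nEmma,F,20\nJohn,M,60\nHarry,M,40\n",
    1881: "Mary,F,40\nAnna,F,30\nEmma,F,30\nJohn,M,70\nHarry,M,30\n",
}


def load_year(source, year):
    """Read one yearly file (path or file-like object) and tag it with its year."""
    df = pd.read_csv(source, names=["name", "sex", "births"])
    df["year"] = year
    return df


def births_by_year_and_sex(names):
    """Question 1: total births, one row per year and one column per sex."""
    return names.pivot_table(values="births", index="year",
                             columns="sex", aggfunc="sum")


def top_n(names, n=1000):
    """Question 2: the n most common names within each (year, sex) group."""
    ranked = names.sort_values("births", ascending=False)
    return ranked.groupby(["year", "sex"]).head(n)


def name_trends(names, wanted):
    """Question 3: births per year for the chosen names (both sexes summed)."""
    subset = names[names["name"].isin(wanted)]
    return subset.pivot_table(values="births", index="year",
                              columns="name", aggfunc="sum")


def names_to_reach_half(births):
    """Question 4: how many names, most popular first, cover 50% of births."""
    share = births.sort_values(ascending=False).cumsum() / births.sum()
    # Names whose running share is still below 50%, plus the one that crosses it.
    return int((share < 0.5).sum()) + 1


# Stack the yearly tables into one long DataFrame.
names = pd.concat(
    [load_year(io.StringIO(text), year) for year, text in SAMPLE.items()],
    ignore_index=True,
)

totals = births_by_year_and_sex(names)
top = top_n(names, n=1)
trends = name_trends(names, ["John", "Harry", "Mary", "Marilyn"])
diversity = (names.groupby(["year", "sex"])["births"]
             .apply(names_to_reach_half)
             .unstack("sex"))
```

With matplotlib installed, totals.plot() should draw both sexes on one chart, and trends.plot(subplots=True) should give one stacked panel per name. For the real data, pass each file's path to load_year instead of an io.StringIO object.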

Good luck!

NOVA-Python