
Details

All,

A few people mentioned after the talk that it would be a good idea to put some of these concepts to work. To that end, I'm assigning some homework. The Social Security Administration's baby name data is available here (http://www.ssa.gov/oact/babynames/limits.html). The smaller file contains the national data; please download it and try to answer the following four questions using Python/pandas.
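A minimal sketch of loading the national data into pandas, assuming the zip has been extracted into a local folder of files named `yobYYYY.txt`, each a headerless CSV with one `name,sex,count` row per line. The function name `load_names` and the column names `name`, `sex`, `births` and `year` are illustrative choices, not part of the assignment.

```python
import glob
import os
import re

import pandas as pd


def load_names(data_dir):
    """Stack every yobYYYY.txt file in data_dir into one long DataFrame.

    Assumes headerless CSV rows shaped like "Mary,F,<count>" (name, sex, count).
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(data_dir, "yob*.txt"))):
        match = re.search(r"yob(\d{4})\.txt$", os.path.basename(path))
        if match is None:
            continue  # skip files that are not yearly data files
        # keep_default_na=False stops pandas from turning names such as "None" into NaN
        frame = pd.read_csv(path, names=["name", "sex", "births"], keep_default_na=False)
        frame["year"] = int(match.group(1))  # the year comes from the file name
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no yobYYYY.txt files found in {data_dir!r}")
    return pd.concat(frames, ignore_index=True)
```

Keeping everything in one long table (one row per name, sex and year) makes each of the questions a single `groupby` or `pivot_table` call.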

  1. Find the total number of births in each year. If you can, also find separate totals for male and female births. Plot your results.

  2. Find the top 1000 names for each year. If you can, find the top 1000 female and the top 1000 male names for each year and plot the results by gender.

  3. Plot the trend of the names 'John', 'Harry', 'Mary' and 'Marilyn' across all years in the data set. Try to make a stack of four plots, one per name.

  4. Extra Credit: Taking names in order of popularity, from most to least popular, find how many distinct names it takes to account for the top 50% of births in each year. Plot this for both male and female births over the full range of the data set.
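For question 1, a minimal sketch assuming the data is already in one long DataFrame with columns `name`, `sex`, `births` and `year` (these column names and the function name `births_by_year` are assumptions made for illustration). A pivot table sums births per year, with one column per sex.

```python
import pandas as pd


def births_by_year(names):
    """Total births per year (rows), split into one column per sex plus a Total column."""
    totals = names.pivot_table(index="year", columns="sex", values="births", aggfunc="sum")
    totals["Total"] = totals.sum(axis=1)  # row-wise sum of the per-sex columns
    return totals


# Usage (plotting needs matplotlib installed):
# totals = births_by_year(names)
# totals.plot(title="Total births by sex and year")
```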
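For question 2, one possible approach, again assuming a long DataFrame with columns `name`, `sex`, `births` and `year` (the names `top_names` and `n` are hypothetical). Sorting once and then taking the first `n` rows of each (year, sex) group gives the top names per gender per year. Ties on `births` are broken alphabetically by name here, which is an arbitrary choice.

```python
import pandas as pd


def top_names(names, n=1000):
    """Return the n most popular names for each (year, sex) pair."""
    ranked = names.sort_values(
        ["year", "sex", "births", "name"],
        ascending=[True, True, False, True],  # most births first; ties by name
    )
    # head(n) keeps the first n rows of each group in the sorted order
    return ranked.groupby(["year", "sex"], sort=False).head(n).reset_index(drop=True)


# Usage: the assignment doesn't say what to plot; one reading is the number of
# births covered by the top names, one line per gender (needs matplotlib):
# top = top_names(names)
# top.pivot_table(index="year", columns="sex", values="births", aggfunc="sum").plot()
```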
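For question 3, a sketch assuming the same long DataFrame (columns `name`, `sex`, `births`, `year`; the function name `name_trends` is hypothetical). It sums births across both sexes for each name, which is a simplifying assumption; filtering on `sex` first would give a gender-specific trend instead. With one column per name, pandas' `plot(subplots=True)` draws one panel per column, stacked vertically.

```python
import pandas as pd

NAMES_OF_INTEREST = ("John", "Harry", "Mary", "Marilyn")


def name_trends(names, wanted=NAMES_OF_INTEREST):
    """Births per year for each wanted name (both sexes summed), one column per name."""
    subset = names[names["name"].isin(wanted)]
    table = subset.pivot_table(
        index="year", columns="name", values="births", aggfunc="sum", fill_value=0
    )
    # put the columns in the requested order and add any name missing from the data
    return table.reindex(columns=list(wanted), fill_value=0)


# Usage (needs matplotlib): a stack of four plots, one per name
# name_trends(names).plot(subplots=True, figsize=(10, 10))
```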
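For the extra credit, a sketch that reads "the top 50% of births" as the smallest number of names, taken from most to least popular, whose births add up to at least half of that year's total for one sex. It assumes the same long DataFrame (columns `name`, `sex`, `births`, `year`); the function names and the `q` parameter are hypothetical. Working with an integer cumulative sum avoids floating-point rounding at the 50% boundary.

```python
import numpy as np
import pandas as pd


def names_for_share(births, q=0.5):
    """How many of the most popular names it takes to reach fraction q of all births."""
    counts = np.sort(births.to_numpy())[::-1]  # most popular first
    cumulative = np.cumsum(counts)  # non-decreasing, as searchsorted requires
    # searchsorted finds the first index whose running total is >= q * total;
    # adding 1 turns that index into a count of names
    return int(np.searchsorted(cumulative, q * cumulative[-1])) + 1


def diversity(names, q=0.5):
    """One row per year, one column per sex, holding the count from names_for_share."""
    per_group = names.groupby(["year", "sex"])["births"].apply(names_for_share, q=q)
    return per_group.unstack("sex")


# Usage (needs matplotlib):
# diversity(names).plot(title="Number of names in the top 50% of births")
```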

Good luck!

Sponsors

ScienceLogic

Providing room and board!

Accelebrate

Giveaways and support

Novetta

Food sponsor