Homework 10: Data Exploration

Please turn in one script with a section for each of the parts below.

US Baby Names

For this part of the homework, you will need the baby names data that I downloaded from the Social Security Administration and cleaned up for you (top-1000-baby-names.csv). This file has the top 1000 boys' names and top 1000 girls' names for each year 1880–2017, and it includes the percentage of all boys or girls born that year who were given each name. I've supplied the script I used to read in and clean the data as hw10-clean-bnames.R. You don't need to run it, because the result has already been saved, but you should read through it to see what it does. The script downloads the most recent data directly from the Social Security Administration website, so if you like you can run it yourself to recreate the top 1000 names file.

Question 1:

Show me the popularity over time of names that are similar to yours. First you must define similarity. You could use a simple regular expression, soundex (see the R script for the dplyr lecture for an implementation), or some other method. Explain how you defined similarity and why. Illustrate and discuss any pattern over time that you uncover. Investigate what drives any such pattern.

Note: you are welcome to expand this to look beyond your own name if there are other more interesting patterns.
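
As a starting point, here is a minimal sketch of the regular-expression approach in base R. It uses a small made-up data frame rather than the real file, and the column names (year, sex, name, percent) are assumptions, so check them against top-1000-baby-names.csv before adapting it. The pattern is just one possible similarity rule for names like "Dylan".

```r
# Toy stand-in for top-1000-baby-names.csv.
# The column names here are assumptions, not taken from the real file.
bnames <- data.frame(
  year    = c(1990, 1990, 1990, 2000, 2000, 2000),
  sex     = c("boy", "boy", "girl", "boy", "boy", "girl"),
  name    = c("Dylan", "Dillon", "Diana", "Dylan", "Dillan", "Diana"),
  percent = c(0.30, 0.10, 0.20, 0.50, 0.05, 0.15)
)

# One possible similarity rule: "D", then "y" or "i", one or two "l"s,
# any single vowel, then "n".
pattern <- "^D[yi]ll?[aeiou]n$"
similar <- bnames[grepl(pattern, bnames$name), ]

# Combined percentage per year for all names that match the pattern.
trend <- aggregate(percent ~ year, data = similar, FUN = sum)
print(trend)
```

A stricter or looser pattern will change which spellings are pooled together, which is why the question asks you to justify your definition.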

Question 2: Old Testament names

Have Old Testament names been increasing or decreasing in popularity?

I've given you a text file with all the names in the Old Testament (old-testament.txt). You can read this in as a character vector using scan:

# Read the whitespace-separated names into a character vector
oldt <- scan("http://r-research-tool.schwilk.org/assignments/old-testament.txt",
             what = "character")

What is the pattern in the popularity of Old Testament names over time? Does the pattern differ for boys and girls? Show me a plot or two to illustrate your answer. Also provide a table showing the top 20 Old Testament names in the whole data set (averaged over all years). If you created multiple plots to explore the question, guide me through your logic in your comments.
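
The matching and summarizing steps might be sketched as follows in base R. This is an illustration on made-up data, not a full answer: the toy oldt vector stands in for the result of scan, and the column names (year, sex, name, percent) are assumptions about the cleaned file.

```r
# Toy stand-ins: in the assignment, oldt comes from scan() and bnames from
# top-1000-baby-names.csv. The column names are assumptions.
oldt <- c("Adam", "Sarah", "Noah", "Ruth")
bnames <- data.frame(
  year    = c(1900, 1900, 1900, 2000, 2000, 2000),
  sex     = c("boy", "girl", "boy", "boy", "girl", "girl"),
  name    = c("Noah", "Ruth", "John", "Noah", "Sarah", "Emily"),
  percent = c(0.1, 0.8, 5.0, 1.0, 0.6, 1.2)
)

# Case-insensitive match, in case the two sources capitalize differently.
is_ot <- tolower(bnames$name) %in% tolower(oldt)
ot <- bnames[is_ot, ]

# Combined percentage of Old Testament names per year and sex.
by_year <- aggregate(percent ~ year + sex, data = ot, FUN = sum)

# Mean percentage per name, highest first; head() keeps at most 20 rows.
by_name <- aggregate(percent ~ name, data = ot, FUN = mean)
top <- head(by_name[order(-by_name$percent), ], 20)
print(top)
```

Note that this mean only covers the years in which a name appears in the top 1000. Whether absent years should count as zero is a choice you should make and explain.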
