Homework 12: Independent project


For this final assignment, use data from your own work to demonstrate data reading, cleaning and reshaping, presentation-quality figures, and statistical analyses. I strongly prefer that you use your own data, but if you do not have any data of your own to work with, find a publicly available data set and use that. You can even use one of the data sets we have already used in this class, but your analyses must go substantially beyond anything we did in class or in previous assignments.

If you use a data set I've used in class, you may have to be creative to demonstrate all of the steps below in a new way. You will not get credit for work we already did in the course.

If the data you use are not available via http, then attach the necessary data files to your email on submission. I will save them into the working directory from which I run your script, so use relative (local) paths when referring to files in your script. Include your name in the data file names so they will not clash with others'.
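As a minimal sketch of what "local paths" means here: the file name and columns below are hypothetical, and the example writes its own tiny file only so that it runs as-is. Your script would read the file you attach instead.

```r
# Hypothetical file name: it includes the author's name so it will not clash
# with files submitted by other students.
data_file <- "lastname_seedlings.csv"

# For illustration only: create a tiny example file so this sketch runs as-is.
# In your own script the attached data file already exists.
write.csv(data.frame(plot = c("A", "B"), height_cm = c(12.5, 14.1)),
          data_file, row.names = FALSE)

# A relative path is resolved against the working directory from which the
# script is run, so it works on any machine where the file sits in that
# directory.
seedlings <- read.csv(data_file, stringsAsFactors = FALSE)
str(seedlings)

# Data available over http can be read straight from the URL instead
# (placeholder address, not a real data set):
# seedlings <- read.csv("http://example.org/path/to/data.csv")
```

Avoid absolute paths such as `C:/Users/...` or calls to `setwd()`, since those will not resolve on another machine.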

You will turn in a single well-documented R script which includes the following steps:

  1. Data reading and cleaning
  2. Data reshaping and tidying (if necessary). This may involve string operations and regular expressions, and may require transforming between wide and long data formats.
  3. Transformations and/or summarizing (if necessary)
  4. Graphical exploration. Walk me through the steps and explain in comments your logic and what the data tell you. Please edit and compose this! Do not show me a simple transcript of your session; construct a logical progression as if you were writing an essay. One question should lead to the next. Use the full power of ggplot when making your figures. Label the axes! Avoid code duplication; define some ggplot style elements and re-use them if possible.
  5. Statistical models/analyses. These should be combined with the graphical analyses above. I understand that not all of you have had much statistics, but attempt some analyses. If linear models are appropriate, you should be able to use those. If you do not know how to do the appropriate test, do what you can and explain its shortcomings.
  6. I do not want to see repeated code. Write helper functions to allow code re-use when possible.
  7. If needed, demonstrate use of an R package that we did not use in the class. Explain how you found this package and why you are using it.
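The points about re-using ggplot style elements, writing helper functions, and pairing figures with models could be sketched as follows. This is one possible arrangement, not a required template: it uses R's built-in `mtcars` data as a stand-in for your own, and the names `project_theme` and `scatter_with_fit` are made up for illustration.

```r
library(ggplot2)

# Style elements defined once and re-used by every figure.
project_theme <- theme_bw() +
  theme(axis.title = element_text(size = 14),
        axis.text  = element_text(size = 12))

# Hypothetical helper: scatterplot with a fitted linear trend.
# `x` and `y` are column names given as strings; `xlab` and `ylab` are
# the axis labels, including units.
scatter_with_fit <- function(data, x, y, xlab, ylab) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(x = xlab, y = ylab) +
    project_theme
}

# The same helper produces both figures, so no plotting code is repeated.
p_weight <- scatter_with_fit(mtcars, "wt", "mpg",
                             "Weight (1000 lb)", "Fuel economy (miles per gallon)")
p_power  <- scatter_with_fit(mtcars, "hp", "mpg",
                             "Gross horsepower", "Fuel economy (miles per gallon)")

# The matching linear model for the first figure; the comments in your own
# script should interpret the estimates and note the model's shortcomings.
fit_weight <- lm(mpg ~ wt, data = mtcars)
summary(fit_weight)
```

Printing `p_weight` or `p_power` draws the figure. Keeping the theme in a single object means a style change later needs to be made in only one place.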

Comment your code cleanly and concisely. Pay close attention to style.
