Syllabus: R as a Research Tool (BIOL 6325)

Course Overview

TTU review statement

As per the mandatory course review recently implemented by the Provost, this course content is pending review and approval by the Texas Tech University System Board of Regents. See https://www.depts.ttu.edu/provost/news/2025/course-content-oversight-and-review.php

Room and Time:

11:00-12:20 T TH, Biology room 405

Office hours:

Wednesdays 12:00-1:30 PM or by appointment. Building ESB2, room 402B. I can meet via Zoom as well. Email me for advice or to set up a meeting. Programming is difficult and I will help you.

Course description

This is a workshop course in which I will teach the basics of computer programming using the computer language “R”, an open-source, interactive software package specifically designed for scientific numerical computation. R is a language designed around a core set of statistical libraries and offers advanced statistical capabilities. The language also offers very good graphical capabilities for exploring data or preparing figures for presentations and publication. Additionally, the course will teach basic computer programming principles that can apply to other computer languages such as Python or C++. This is not a statistics course, but it is aimed at teaching you to write code for data science. Therefore, I will cover visualization, data shaping, and model fitting in addition to the traditional programming subjects. My goal is to provide graduate students with tools to become better users of computers — programming allows a scientist to use the computer to answer the questions that are most important to the researcher and not be limited by pre-packaged tools.

There are no prerequisites. However, if you have gaps in your knowledge regarding operating system use (especially file paths and file naming) you will need to do some extra work to catch up. I will help you of course, but this is a programming course, not a course in basic computer use.

Expected learning outcomes

The overarching goal of this course is to produce scientists who can efficiently and accurately investigate data and do so in an open and reproducible workflow.

After completion of the course students will be able to

  • use the R console as a calculator and assign variables.
  • use and understand the R data storage objects (vectors, matrices, data frames) and the differences among data types such as real numbers, integers, character strings and booleans.
  • write functions in R and break a large problem into a set of functions.
  • be fluent in some programming concepts especially important in data analysis and be able to solve problems in a modular programming style.
  • have some familiarity but not necessarily fluency with other programming concepts such as functional programming style, object-oriented programming, recursion, and regular expressions.
  • understand and be able to use debugging methods.
  • use a text editor and produce a plain-text workflow.
  • search for and install domain-specific R packages and be able to use the documentation of such packages.
  • take a new data set and efficiently investigate it for patterns using data reshaping techniques and exploratory visualization.
  • recognize repeated patterns in data messiness and be able to handle them, and understand common pitfalls and mistaken assumptions about data classification.
  • engage in good code and data organization practices and use a consistent programming style.
  • understand the basics of reproducible research, including passing familiarity with version control and RMarkdown, so that the student could go on to learn more about these subjects with some existing knowledge.

Methods of assessing learning outcomes

There will be no exams. Evaluations will be based solely on completion of weekly assignments. I will provide detailed feedback on the code students turn in each week.

Grading scale:

The final grade will be based on the average of the weekly programming assignments (see “Assignments”, below). There are 12 assignments, each worth 20 points. I will drop the lowest score. Grading framework: A >= 90%; B >= 80%; C >= 70%; D >= 60%; F < 60%
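
As a rough illustration of the arithmetic above, the following R sketch drops the lowest of 12 scores and converts the remaining 220 possible points into a percentage and a letter. The function name `final_grade` and the example scores are hypothetical, and the handling of scores that fall exactly on a cutoff is an assumption rather than a statement of how grades are actually recorded.

```r
# Hypothetical helper: final percentage and letter grade from 12 assignment
# scores (each out of 20 points), after dropping the single lowest score.
final_grade <- function(scores, points_each = 20) {
  stopifnot(length(scores) == 12,
            all(scores >= 0 & scores <= points_each))
  kept <- sort(scores)[-1]                                # drop the lowest score
  pct <- 100 * sum(kept) / (length(kept) * points_each)   # 11 * 20 = 220 points
  # Assumption: a percentage exactly on a cutoff earns the higher letter.
  letter <- if (pct >= 90) {
    "A"
  } else if (pct >= 80) {
    "B"
  } else if (pct >= 70) {
    "C"
  } else if (pct >= 60) {
    "D"
  } else {
    "F"
  }
  list(percent = pct, letter = letter)
}

# Made-up scores, for illustration only
example_scores <- c(20, 18, 17, 19, 12, 20, 16, 18, 19, 20, 15, 17)
result <- final_grade(example_scores)
print(result)
```

With these made-up scores the lowest score (12) is dropped, leaving 199 of 220 points.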

Resources, required supplies

Software

You will need to install on your own computer:

  1. Download and install R, https://www.r-project.org/.
  2. Download and install RStudio, http://www.rstudio.com/download.
  3. Install required packages:
pkgs <- c("tidyverse", "Lahman", "lubridate", "maps", "nlme", "pbkrtest", "RColorBrewer",
"scales")
install.packages(pkgs)
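
After installation, one way to confirm that everything is in place is to compare the list against the local library. This is only a sketch: it repeats the `pkgs` vector from step 3, and the variable name `missing_pkgs` is made up for illustration.

```r
# Same package list as in step 3 above
pkgs <- c("tidyverse", "Lahman", "lubridate", "maps", "nlme", "pbkrtest",
          "RColorBrewer", "scales")

# Packages from the list that are not yet in the local library
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only what is missing
}
```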

Books

There is no textbook and there are no required outside readings. See the website for recommendations on supplementary material.

Course outline

Week 1 Basic R features
Week 2 Introduction to visualization and start overview of data types
Week 3 Introduction to main data types in R
Week 4 Introduction to functions
Week 5 Programming structures: logical operations and loops
Week 6 Functions and understanding scope
Week 7 Debugging, data frames and factors
Week 8 Math and simulations in R
Week 9 Strings, regular expressions
Week 10 The Grammar of Graphics
Week 11 Introduction to the Tidyverse, reshaping data
Week 12 Tidy data
Week 13 Dates and times, statistical models
Week 14 Perception and color, statistics packages
Week 15 Markup languages and version control
Week 16 Version control, text editors

Assignments

Out-of-class assignments will be given weekly and are due each Monday. For each assignment, please turn in a well-documented R script via email. I may also ask for specific outputs, example results, or graphs. Name each file starting with your last name, underscore, first name, then a hyphen and the homework name. Use the “.R” extension for R scripts (e.g., if I were to submit the first assignment, I would name the file Schwilk_Dylan-HW01.R). When emailing your assignment to me, please use a subject line with the following format: “R-research-tool: HW01”. In fact, please use “R-research-tool:” as a preface to the subject line for any email you send regarding the class.
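
The naming rules above can be sketched in a few lines of R. The helper names `hw_label`, `hw_filename`, and `hw_subject` are hypothetical and are not part of any course material; they simply build the strings in the format described.

```r
# Hypothetical helpers that build the file name and email subject line
# in the format described above.
hw_label <- function(hw_number) {
  sprintf("HW%02d", as.integer(hw_number))   # zero-padded, e.g. "HW01"
}

hw_filename <- function(last_name, first_name, hw_number) {
  paste0(last_name, "_", first_name, "-", hw_label(hw_number), ".R")
}

hw_subject <- function(hw_number) {
  paste0("R-research-tool: ", hw_label(hw_number))
}

hw_filename("Schwilk", "Dylan", 1)  # "Schwilk_Dylan-HW01.R"
hw_subject(1)                       # "R-research-tool: HW01"
```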

There will be 12 assignments. Each assignment is worth 20 points. I will drop the lowest grade, so the total points available will be 220. Assignments might vary tremendously in how long they take to complete because programming is full of false starts and frustration! Start early and allow yourself 8-12 hours of total time over several days to complete each assignment. Some assignments will take much less time if you understand the material well or are lucky. Others might frustrate you. Please do reach out to me if you are spinning your wheels and stuck. I will help you! Start early on the assignments so that there is time for me to give you advice if you are stuck. Making mistakes is fine. Always explain your thought process in the comments in your code. Making mistakes and learning from them is how the learning happens.

Jan 26 HW01 Vectors and matrices
Feb 2 HW02 Introduction to data and vectors
Feb 9 HW03 Functions 1
Feb 16 HW04 Functions 2
Feb 23 HW05 Simulated evolution 1
Mar 2 HW06 Simulated evolution 2
Mar 9 HW07 Simulated evolution 3
Mar 16, Mar 23 None due
Mar 30 HW08 Strings
Apr 6 HW09 Data visualization with ggplot2
Apr 13 HW10 Shaping data and dplyr
Apr 20 HW11 Exploring large data sets: US baby names
Apr 27 HW12 Tidying and reshaping data

Policies

Texas Tech Policies and required syllabus statements

Texas Tech Policies Concerning Academic Honesty, Special Accommodations for Students with Disabilities, Student Absences for Observance of Religious Holy Days, and Accommodations for Pregnant Students may be found on Canvas.

These statements can also be found at https://www.depts.ttu.edu/tlpdc/RequiredSyllabusStatements.php

Academic Honesty

It is the student’s responsibility to conduct him/herself in a civil manner while in the classroom. Please consult the university policy on academic honesty (OP 34.12) and civility.

I do not tolerate any plagiarism. You must write your own code. If you use code (even just for ideas) from any other resource such as a classmate, a friend, or the internet, you must carefully cite that in your code comments. I will teach you how to use the internet to help you find solutions, but do not blindly google or skim StackOverflow! You are meant to complete the assignments using what you have learned up to that point in the course and the course progression is designed with that in mind. Plagiarism will result in a written report submitted to the Office of Student Conduct and to the chair of Biological Sciences.

“AI”, LLMs, and ChatGPT Policy

So-called “AI” tools based on large language models (LLMs) such as ChatGPT are increasingly popular for quick code generation. These are generators of plausible-sounding text (or plausible-looking code) with no necessary relationship to truth. In other words, large language models are “bullshit generators”. I have tested such tools and many will lead you astray and produce buggy code. Some of the newer tools are better and are able to produce reasonably well-working code for some tasks. However, these tools require oversight and checking, and such oversight requires skills developed through writing your own code!

Therefore: The use of ChatGPT or any other Large Language Model (LLM) “AI” platform to generate either ideas or written content, or to produce any other material, is prohibited in this course. A large language model functions as a “plagiarism laundering service” that relies on uncredited material from scholars, writers, and programmers. It is built on theft.

Submission of AI-generated content (i.e., information, text, or images) as your own work is a violation of academic integrity and will result in referral to the Office of Student Conduct.

Attendance Policy

Attendance is required to perform well in this course. Please contact me if you need to miss class or if you need flexibility in any assignment due dates. I will endeavor to help you out. I understand that you all have your own research to conduct, not to mention illness, which can affect us all.
