Lab 1: Reproducible research with RMarkdown and Github

This first lab will serve as a general introduction to the major tools and platforms that we will be using throughout the semester, which are R, RStudio, and Git/Github. R is the name of the programming language itself and RStudio is a convenient interface.

RStudio

When you launch RStudio, you will see an interface that looks something like this:

The panel in the upper right contains your workspace as well as a history of the commands that you’ve previously entered. Any plots that you generate will show up in the panel in the lower right corner.

The panel on the lower left is where you can enter R commands. It’s called the console. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running. Below that information is the prompt. As its name suggests, this prompt is really a request for a command. When you want to test out an R command and not necessarily save it to a file, this is the place to do it.

Enter the following command at the R prompt (i.e. right after > in the console) by typing it in and then pressing the Enter key:

2 + 2
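
A few more lines you could try at the prompt, as an illustrative sketch. The variable name `x` is an arbitrary choice and is not part of the lab.

```r
2 + 2          # arithmetic is evaluated as soon as you press Enter
x <- 2 + 2     # store the result in a variable named x
x * 10         # reuse the stored value in a new calculation
```

Nothing typed at the console is saved to your lab file, which is why it is a good place for quick experiments.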

Getting started with Github

Please sign up for an account on Github using your Mason email address if you do not have one already. If you already have a Github account, please update your profile to add your Mason email address to your account (you can add multiple email addresses to a single GitHub account).

On Blackboard, click on the GitHub classroom link in the Lab 1 post, and click through the series of prompts and authorizations to create a repository holding all of the starter files for this Lab.

Then follow these instructions to open the lab as a new project in RStudio: https://cdsbook.github.io/book/src/book/B_additional_setup_appendix.html#sec-create-new-rstudio-project

Finally you will need to install the tidyverse package if it is not already installed. The class textbook has instructions on how to check if a package is installed and install it: https://cdsbook.github.io/book/src/book/03_r_programming_chapter.html#installing-packages
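
As a rough sketch of the check-then-install pattern (not a replacement for the textbook instructions), the following could be typed at the console. The helper names `is_installed` and `install_if_missing` are hypothetical; `requireNamespace()` and `install.packages()` are standard R functions.

```r
# Returns TRUE if the named package can be loaded, FALSE otherwise.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Installs the package only when it is missing (needs an internet connection).
install_if_missing <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg)
  }
  invisible(is_installed(pkg))
}

# Typed at the console, this would be called as:
# install_if_missing("tidyverse")
```

Checking first avoids re-downloading a package that is already on your computer.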

RMarkdown

You learned the basics of RMarkdown in CDS 101. As a reminder, RMarkdown is a type of file that can include both regular text and R code. You convert the RMarkdown file into a final output file (like a PDF) in a process called “knitting”, which formats the text into paragraphs, runs the R code, and inserts the code output into the file.

Now that you’ve obtained your personal lab template from Github and opened it in RStudio, we can start learning about RMarkdown.

In this lab we will learn some of the more advanced features of RMarkdown which you can use to enhance the PDFs that you create for the rest of the semester.

Helpful resources

CDS 101 & 102 RMarkdown cheatsheet (a short list of the most useful commands for this class)
Official rmarkdown cheatsheet (more extensive)
RMarkdown reference guide (much more extensive)

Exercises

Once you have opened up the lab in RStudio as a new project, find the lab01.Rmd file in the Files tab in the lower-right pane of RStudio. Click on the file name to open it for editing.

You will be adding all your answers to this RMarkdown file. To convert it to a PDF, click the Knit button at the top of the editing pane (just above the file contents).

The first thing to do in every lab is to replace the name placeholder with your name. Line 3 of the lab01.Rmd file currently says:
```
author: "Fill in your full name"
```
Replace the value inside the quotation marks with your full name. Then click the knit button again. In the PDF that appears you should see that your name now appears at the top of the first page, after the title.

Sidenote: the RMarkdown header section

You might have noticed that the first few lines of the Rmd file seem to contain various settings, such as the title, font size, and margin width.

This is called the header section of the RMarkdown file. In Lab 1, it begins and ends on lines 1 and 16. We mark the start and end with a line of 3 dashes: ---.

The instructions in the header section control the formatting of the final output file, but do not appear directly in the output (except for things like the title and author). (If you are curious, these header instructions are written in a format called “YAML”.)

In general you never need to change anything in the header sections of the RMarkdown files we give you (except for replacing your name). The default settings we give you will produce nicely formatted lab reports.
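
For orientation, a header section might look roughly like the sketch below. This is a hypothetical, shortened example and not the actual 16-line header of lab01.Rmd; the title, font size, and margin values are assumptions chosen for illustration.

```yaml
---
title: "Lab 1"                      # shown at the top of the PDF
author: "Fill in your full name"    # the line you edit in every lab
output:
  pdf_document: default             # knit to PDF with default settings
fontsize: 11pt                      # assumed value, for illustration
geometry: margin=1in                # assumed value, for illustration
---
```

Each line is a setting name, a colon, and a value, and indentation marks settings that belong to the line above.
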

Scroll down to the Exercise 2 section of the lab01.Rmd file (which should be around line 32).

You can create tables using Markdown. Copy and paste the following Markdown into your file beneath the ## Exercise 2 section heading and knit to see what it looks like.
```
| Column 1 | Column 2 | Column 3 | Column 4 |
| --- | ---: | :---: | :--- |
| Notice | what | the | colons |
| are | doing? | | |

Table: The table with poor spacing

| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------: | :------: | :------- |
| Notice   | what     | the      | colons   |
| are      | doing?   |          |          |

Table: The table with good spacing
```
Then answer the following questions below the tables:
- Do both tables look the same after being rendered?
- What are the snippets below each table doing?

Once you have completed this, commit the change you have made to the RMarkdown file. Remember to add an informative commit message like “Added author name”. (Good commit messages help other programmers quickly understand the history of your project.)

Sidenote: Git Commits

Two things about committing:
1. You should commit frequently. At minimum, you should try to make a commit each time you finish a lab exercise. You are graded on having enough commits - if you only commit once or twice in the entire lab then you will lose points.
2. Leave informative commit messages. “Added stuff” will not help you if you’re looking at your commit history in a year. A message like “Typed in Lab 1 RMarkdown examples” will be more useful. However, your commit message should also be short and succinct (aim for fewer than 10 words). Another part of the grade for your lab will be based on the quality of your commit messages.

A commit is a snapshot of your project at a particular point (like saving at a checkpoint in a video game). By creating a series of these commits (or checkpoints) we can go back to an earlier version of the project if something goes wrong.

So far the commit only exists in RStudio, and has not been uploaded back to the GitHub website. We need to synchronize the history of our project back to GitHub since that is the central place where we share our code with other programmers (not to mention the instructor who will be grading it!).

In the world of GitHub, synchronizing from our local computer to the GitHub website is called a “Push”. If you want, you can try doing that now. In the Git tab of RStudio or in the interactive commit window we opened earlier, you should see a green up-arrow labeled “Push”. Click it.

In future, you can push after every commit or wait to push until you have made several commits. Just make sure that you push at the end of the lab so that the final history of your lab is synchronized back to GitHub for your instructor to grade.

In the Exercise 3 section of the answer file, add the following code:
```
![Image caption](test-image.jpeg)
```
Knit the file and see what you get in the Exercise 3 section.

Then edit the above code so that it displays the picture in the knitting-process.png file (instead of test-image.jpeg) with a caption that reads Flowchart of the knitting process in RMarkdown.

When you have finished this exercise, save your file and commit it again.

Add the following lines of text in the Exercise 4 section of your file:
```
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
```
(Make sure there is a blank line between them, and that the second version of the line begins with 4 spaces.)

Then knit your file to a PDF again, and answer the following questions:
- The written contents of an RMarkdown file can typically be formatted in one of two ways: (1) regular text or (2) code-like text. What is the effect of adding 4 spaces at the start of the line?
- What problems do you create for the reader if you write a paragraph of text with spaces at the start of the line as we have done here? (Think about formatting, readability, etc.)

When you have finished this exercise, make sure you save your file and make another commit.

In regular text in RMarkdown we can only include characters that occur on a standard US keyboard. However, you may sometimes need to write math equations with different formatting and symbols that are not on the US keyboard.

To include math and formulas we use the $ symbol at the start and end of our math formula. For example, try adding this to your answer file:

$c = \sqrt{a^2 + b^2}$

…and see what you get when you knit.

You can also do things like fractions, Greek letters, and other mathematical symbols.

Using the table below as a guide, reproduce Isaac Newton’s equation that acceleration equals force divided by mass in your answer file:

$a = \frac{F}{m}$

Table of examples (to be used as a guide to format the equation above - you do not need to copy these examples into your answers).

| To get this… | …write this |
| --- | --- |
| $x = 10$ | `$x = 10$` |
| $x^{n}$ | `$x^{n}$` |
| $x_{n}$ | `$x_{n}$` |
| $\frac{a}{b}$ | `$\frac{a}{b}$` |
| $\alpha$ (Greek letter “alpha”) | `$\alpha$` |

(For more complicated types of math, try Googling “latex mathematical symbols”.)
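
As an illustration of how the pieces in the table combine, here is a formula that is not part of the exercise: the kinetic energy equation, which uses a subscript, a fraction, and a superscript in one line. It is only a sketch of the syntax, and you do not need to add it to your answers.

```latex
$E_{k} = \frac{1}{2} m v^{2}$
```

Each building block keeps its own curly brackets, so you can nest or chain them without ambiguity.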

When you have finished this exercise, commit your RMarkdown file again.

Sidenote: Including symbols from other languages

This is for information only - unless specified in the instructions, everything you submit for this class should be written in English.

You might be interested in whether you can include languages with different alphabets in RMarkdown text. The answer is yes, but… it requires some fiddling with settings in the header section.

For example, if you wanted to include the address of the Mason Korea campus in an RMarkdown file:

119-4 송도문화로, Songdo 1(il)-dong, Yeonsu-gu, Incheon, South Korea

...then including the text above and knitting to a PDF will give you an error, because the default settings used by RMarkdown don’t allow us to include characters from the Korean script (Hangul).

However, if you were to add the following two lines to the pdf_document settings in your Rmd file’s header section, then it should knit and display the Hangul characters:
```
  pdf_document:
    ...
    latex_engine: xelatex
    extra_dependencies: ["kotex"]
```
Note: you should not add the Korean address and symbols to your lab - this is just an example for the curious!

In CDS 101 you learned how to include code chunks in an RMarkdown file using backtick symbols ` (remember that these are different from single quotation marks) as well as an r in curly brackets. When you knit, the output of the code chunk will be inserted below the code.

For example, this code chunk contains code that will produce a scatter plot of amount of REM sleep vs. total sleep in different mammals (we will learn more about graphs like this in the next lab!):
```
```{r}
qplot(x = sleep_total, y = sleep_rem, data = msleep)
```
```
Copy the code chunk above into your answer file. (Make sure to include all the formatting symbols.)

You can run it interactively within the RMarkdown file by clicking the green play arrow in the top right of the code chunk. The graph should then appear below the code chunk in the file editor. This is a super convenient way to test that your code works as you are writing it.

You can also run the code by knitting the entire file. (Since this is slower, it is better to check that your code works first by running it interactively.)

In both cases, you will notice that a warning message appears above the graph, telling us that 22 rows have been removed because they contained missing values.

Often R will warn us of potential problems with our code, especially if we make assumptions that might be wrong. For example, here we might be assuming that all the mammals in our dataset have measured sleep values and appear in our graph - obviously that is not the case.
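
If you want to check the warning yourself, one possible way is to count the rows with a missing value in either plotted column. This is a sketch that assumes the ggplot2 package (part of the tidyverse) is installed, since that is where the msleep dataset comes from; the variable names `has_missing` and `n_missing` are made up for illustration.

```r
library(ggplot2)  # provides the msleep dataset

# TRUE for each row where either plotted column is missing (NA).
has_missing <- is.na(msleep$sleep_total) | is.na(msleep$sleep_rem)

# Number of rows that cannot be drawn on the scatter plot.
n_missing <- sum(has_missing)
n_missing
```

The count should match the number of removed rows reported in the warning message.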

Warnings like this are helpful when we are writing code. However, when you have come to the end of a project and present it to other people, you might want to get rid of those kinds of messages.

We can customize the output of RMarkdown code chunks to do things like hide warning messages. To do this we need to add an option to the R code chunk. Code chunk options go inside the {r} curly brackets.

Modify your existing code chunk’s first line of formatting symbols so that it looks like this:
```
```{r, warning = FALSE}
```
If you interactively run the code chunk or knit the file again, you should see that the warnings above the graph have disappeared.

You do not need to remove warnings in CDS 101 or CDS 102 assignments - leaving them in will never affect your grade. In fact, it is good to leave them in while you are initially writing your code. However, if you want to remove them from the final PDF, you can add the warning = FALSE option to any code chunk that produces warnings.

Commit your work again at the end of this exercise.

Sidenote: Other code chunk options

Many other code chunk options exist. You can do things like hide the code, change the size of the output, or even prevent the code chunk from running at all.

If you take a look back at the set-up code chunk at the top of your RMarkdown answer file (which we included for you), you will note that its options look like this:

{r setup, include = FALSE}
- The word setup is the name of the chunk (any code chunk can be given a name, although you will not normally need to do so).
- Then after the comma we have used the option include = FALSE to hide this code chunk and its output from the final knitted PDF (the code in this setup chunk will still be run, it will just be invisible). You should never use this option for the code you write in this class - we always want to see your code.

Later in the semester we will learn a few more code chunk options to modify the appearance of a code chunk’s output.
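
For the curious, here is a small sketch of two such options. It assumes the knitr package is installed (RStudio uses it when you click Knit), and the sizes of 5 and 3 inches are arbitrary values picked for illustration. You do not need to add this to your lab.

```r
# Per-chunk form (goes on the chunk's first line, like warning = FALSE):
#   {r, fig.width = 5, fig.height = 3}   change the size of the plot
#   {r, eval = FALSE}                    show the code but do not run it
#
# The same options can also be set once for all later chunks:
knitr::opts_chunk$set(fig.width = 5, fig.height = 3)

# Look up the current default to confirm the change.
knitr::opts_chunk$get("fig.width")
```

Options written on a chunk's first line apply only to that chunk, so they are the safer choice when you want to change a single plot.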

How to submit

To submit your lab assignment, follow the two steps below. Your lab will be graded for credit after you’ve completed both steps!

1. Save and commit any last changes to your RMarkdown file, and push all your commits to GitHub. If you do this right, then you will be able to view your completed file on the GitHub website.
2. Knit your R Markdown document to the PDF format, and then upload that PDF to the Lab 1 posting on Blackboard.

Credits

This lab is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. It was written by Dominic White and Ajay Kulkarni for CDS 102 at George Mason University, and partly adapted from Lab 0 - Introduction to R and RStudio.
