5 Import penguins data
5.1 Learning Objectives
- This chapter covers an ideal workflow for importing data into R
5.2 Access our data
Now that we have a project workspace, we are ready to import some data.
Use the link below to open a page in your browser with the data open
Right-click Save As to download in csv format to your computer (Make a note of where the file is being downloaded to e.g. Downloads)
5.3 Meet the Penguins

This data, taken from the palmerpenguins package was originally published by Gorman et al. (2014).
The palmer penguins data contains size measurements, clutch observations, and blood isotope ratios for three penguin species observed on three islands in the Palmer Archipelago, Antarctica over a study period of three years.

In raw format, each line of a CSV is separated by commas for different values. When you open this in a spreadsheet program like Excel it automatically converts those comma-separated values into tables and columns.
5.4 Make sure our data is in the correct directory
The data is now in your Downloads folder on your computer (by default)
Move/copy/save it to the
data/rawsub directory of your RStudio Project.

5.5 Make a script
Let’s now create a new R script file in which we will write instructions and store comments for manipulating data, developing tables and figures.
Your turn
Set your Rstudio theme to something that works for you:
#___________________________----
# SET UP ----
# An analysis of the bill dimensions of male and female
# Adelie, Gentoo and Chinstrap penguins
# Data first published in Gorman, KB, TD Williams, and WR Fraser.
# 2014.
# “Ecological Sexual Dimorphism and Environmental Variability
# Within a Community of Antarctic Penguins (Genus Pygoscelis).”
# PLos One 9 (3): e90081.
# https://doi.org/10.1371/journal.pone.0090081.
#__________________________----Why do we need scripts to be organised?
Scripts are a human readable record of our analysis - we need them to be organised, consistent, and well-documented. See Section 13.0.1 for general tips on how to do this.
Well organised scripts are useful in that provide a reference for what you did - think of them as the analysis equivalent of a lab notebook. The more detail provided in the form of comments, the more context we have. The more organised they are, the easier it is to find the sections we need to review. Just like a lab notebook they should be written as though someone else will be reading them.
5.6 Add Packages
Next load the following add-on package to the R script, just underneath these comments. Tidyverse isn’t actually one package, but a bundle of many different packages that play well together - for example it includes ggplot2 a package for making figures (Wickham (2023)).
Your turn
Note you may need to run
install.packages()to download and set up packages the first time you make a new project.
Click on the document outline button (top right of script pane). This will show you how the use of the visual outline
Allows us to build a series of headers and subheaders, this is very useful when using longer scripts.
5.7 Read in data
Now we can read in the data. To do this we will use the function readr::read_csv() that allows us to read in .csv files. There are also functions that allow you to read in .xlsx files and other formats like .json, however in this course we will only use .csv files.
We are also using here::here(), Müller (2025). The here package helps you load and save files in your project without worrying about changing your working directory, by always starting from the project’s main folder.
- First, we will create an object called
penguins_rawthat contains the data in thepenguins_raw.csvfile.
Your turn
| studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 11/11/2007 | 39.1 | 18.7 | 181 | 3750 | MALE | NA | NA | Not enough blood for isotopes. |
| PAL0708 | 2 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 11/11/2007 | 39.5 | 17.4 | 186 | 3800 | FEMALE | 8.94956 | -24.69454 | NA |
| PAL0708 | 3 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 16/11/2007 | 40.3 | 18.0 | 195 | 3250 | FEMALE | 8.36821 | -25.33302 | NA |
| PAL0708 | 4 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A2 | Yes | 16/11/2007 | NA | NA | NA | NA | NA | NA | NA | Adult not sampled. |
| PAL0708 | 5 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 16/11/2007 | 36.7 | 19.3 | 193 | 3450 | FEMALE | 8.76651 | -25.32426 | NA |
| PAL0708 | 6 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N3A2 | Yes | 16/11/2007 | 39.3 | 20.6 | 190 | 3650 | MALE | 8.66496 | -25.29805 | NA |
#___________________________----
# SET UP ----
# An analysis of the bill dimensions of male and female
# Adelie, Gentoo and Chinstrap penguins
# Data first published in Gorman, KB, TD Williams, and WR Fraser.
# 2014.
# “Ecological Sexual Dimorphism and Environmental Variability
# Within a Community of Antarctic Penguins (Genus Pygoscelis).”
# PLos One 9 (3): e90081.
# https://doi.org/10.1371/journal.pone.0090081.
#__________________________----
# PACKAGES ----
library(tidyverse) # tidy data packages
library(here) # organised file paths
library(janitor) # cleans variable names
#__________________________----
# IMPORT DATA ----
penguins_raw <- read_csv(here("data", "raw", "penguins_raw.csv"))
# check the data has loaded, prints first 10 rows of dataframe
penguins_raw
#__________________________----5.8 Running a script from console
Now that we have made a very simple script, we can check our blank slates set-up and that our script runs without errors:
5.9 Summary
In this chapter we have:
Set up our raw data in an organised subdirectory
Made safe filepaths to our data
Imported into R
Checked basic data import