5  Import penguins data

5.1 Learning Objectives

  • This chapter covers an ideal workflow for importing data into R

5.2 Access our data

Now that we have a project workspace, we are ready to import some data.

  • Use the link below to open a page in your browser with the data open

  • Right-click Save As to download in csv format to your computer (Make a note of where the file is being downloaded to e.g. Downloads)

5.3 Meet the Penguins

Photo of three penguin species, Chinstrap, Gentoo, Adelie

This data, taken from the palmerpenguins package was originally published by Gorman et al. (2014).

The palmer penguins data contains size measurements, clutch observations, and blood isotope ratios for three penguin species observed on three islands in the Palmer Archipelago, Antarctica over a study period of three years.

excel view, csv view

Top image: Penguins data viewed in Excel, Bottom image: Penguins data in native csv format

In raw format, each line of a CSV is separated by commas for different values. When you open this in a spreadsheet program like Excel it automatically converts those comma-separated values into tables and columns.

5.4 Make sure our data is in the correct directory

  • The data is now in your Downloads folder on your computer (by default)

  • Move/copy/save it to the data/raw sub directory of your RStudio Project.

File tab

Highlighted the buttons to upload files, and more options

5.5 Make a script

Let’s now create a new R script file in which we will write instructions and store comments for manipulating data, developing tables and figures.

Your turn

Set your Rstudio theme to something that works for you:

#___________________________----
# SET UP ----
# An analysis of the bill dimensions of male and female 
# Adelie, Gentoo and Chinstrap penguins

# Data first published in  Gorman, KB, TD Williams, and WR Fraser. 
# 2014. 
# “Ecological Sexual Dimorphism and Environmental Variability 
# Within a Community of Antarctic Penguins (Genus Pygoscelis).” 
# PLos One 9 (3): e90081.
# https://doi.org/10.1371/journal.pone.0090081. 
#__________________________----

Why do we need scripts to be organised?

Scripts are a human readable record of our analysis - we need them to be organised, consistent, and well-documented. See Section 13.0.1 for general tips on how to do this.

Well organised scripts are useful in that provide a reference for what you did - think of them as the analysis equivalent of a lab notebook. The more detail provided in the form of comments, the more context we have. The more organised they are, the easier it is to find the sections we need to review. Just like a lab notebook they should be written as though someone else will be reading them.

5.6 Add Packages

Next load the following add-on package to the R script, just underneath these comments. Tidyverse isn’t actually one package, but a bundle of many different packages that play well together - for example it includes ggplot2 a package for making figures (Wickham (2023)).

Your turn

Note you may need to run install.packages() to download and set up packages the first time you make a new project.

# PACKAGES ----
library(tidyverse) # tidy data packages
library(here) # organised file paths
library(janitor) # cleans variable names
#__________________________----
Tip

Click on the document outline button (top right of script pane). This will show you how the use of the visual outline

Allows us to build a series of headers and subheaders, this is very useful when using longer scripts.

5.7 Read in data

Now we can read in the data. To do this we will use the function readr::read_csv() that allows us to read in .csv files. There are also functions that allow you to read in .xlsx files and other formats like .json, however in this course we will only use .csv files.

We are also using here::here(), Müller (2025). The here package helps you load and save files in your project without worrying about changing your working directory, by always starting from the project’s main folder.

  • First, we will create an object called penguins_raw that contains the data in the penguins_raw.csv file.

Your turn

# IMPORT DATA ----

penguins_raw <- read_csv(here("data", "raw", "penguins_raw.csv"))

# check the data has loaded
penguins_raw
#__________________________----
studyName Sample Number Species Region Island Stage Individual ID Clutch Completion Date Egg Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Sex Delta 15 N (o/oo) Delta 13 C (o/oo) Comments
PAL0708 1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A1 Yes 11/11/2007 39.1 18.7 181 3750 MALE NA NA Not enough blood for isotopes.
PAL0708 2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A2 Yes 11/11/2007 39.5 17.4 186 3800 FEMALE 8.94956 -24.69454 NA
PAL0708 3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A1 Yes 16/11/2007 40.3 18.0 195 3250 FEMALE 8.36821 -25.33302 NA
PAL0708 4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A2 Yes 16/11/2007 NA NA NA NA NA NA NA Adult not sampled.
PAL0708 5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N3A1 Yes 16/11/2007 36.7 19.3 193 3450 FEMALE 8.76651 -25.32426 NA
PAL0708 6 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N3A2 Yes 16/11/2007 39.3 20.6 190 3650 MALE 8.66496 -25.29805 NA
#___________________________----
# SET UP ----
# An analysis of the bill dimensions of male and female 
# Adelie, Gentoo and Chinstrap penguins

# Data first published in  Gorman, KB, TD Williams, and WR Fraser. 
# 2014. 
# “Ecological Sexual Dimorphism and Environmental Variability 
# Within a Community of Antarctic Penguins (Genus Pygoscelis).” 
# PLos One 9 (3): e90081.
# https://doi.org/10.1371/journal.pone.0090081. 
#__________________________----

# PACKAGES ----
library(tidyverse) # tidy data packages
library(here) # organised file paths
library(janitor) # cleans variable names
#__________________________----

# IMPORT DATA ----
penguins_raw <- read_csv(here("data", "raw", "penguins_raw.csv"))

# check the data has loaded, prints first 10 rows of dataframe
penguins_raw
#__________________________----

5.8 Running a script from console

Now that we have made a very simple script, we can check our blank slates set-up and that our script runs without errors:

Your turn

source("scripts/01_import_penguins_data.R") # must specify dir

# source(here::here("scripts", "01_import_penguins_data.R"))
# using here 

5.9 Summary

In this chapter we have:

  • Set up our raw data in an organised subdirectory

  • Made safe filepaths to our data

  • Imported into R

  • Checked basic data import