1 R Basics
R is a programming language and environment for statistical computing and graphics. RStudio is an Integrated Development Environment (IDE) that makes using R easier by providing a user-friendly interface with helpful features (like a script editor, console, environment viewer, etc.). In other words, you write and run R code, and RStudio helps you organize and execute that code more conveniently. For this course, we can use RStudio (via Posit Cloud) to write and run R code, but remember that R and RStudio are two separate pieces of software – R is the engine under the hood, and RStudio is the dashboard and controls. Both R and RStudio may have their own updates, so keep an eye on updating them separately when working on your personal computer.
Tip: R and RStudio are free to download. You can install R from the CRAN website (Comprehensive R Archive Network) and RStudio from the Posit (formerly RStudio) website. In our classroom environment on Posit Cloud, these are already set up for you.
1.1 Your first R command
Let’s try some simple calculations in R to get a feel for it. You can use R as a calculator:
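A few sample calculations to type at the console, one line at a time. The specific numbers are illustrative and not taken from the course materials:

```r
# Basic arithmetic; the comments give the value R prints
1 + 1      # 2
10 - 3     # 7
4 * 5      # 20
20 / 4     # 5
2 ^ 3      # 8 (exponentiation)
```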
- What answer did you get?
The first line shows the request you made to R; the next line is R’s response.
You didn’t type the > symbol: that’s just the R command prompt and isn’t part of the actual command.
Your turn
1.1.1 Perform some combos
You can combine operations, and R will follow the standard order of operations (BODMAS/BIDMAS rules: Brackets/Parentheses, Orders/Exponents, Division and Multiplication, Addition and Subtraction). For example:
Be careful with parentheses to ensure calculations happen in the order you intend. If you omit parentheses, R might give a different result than you expect.
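A small sketch of how parentheses change a result (the numbers are arbitrary examples):

```r
# Multiplication happens before addition
2 + 3 * 4      # 14
# Parentheses force the addition to happen first
(2 + 3) * 4    # 20
# Exponents are evaluated before the minus sign is applied
-2 ^ 2         # -4
(-2) ^ 2       # 4
```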
1.1.2 Use R interactively
Don’t be afraid to experiment in the console. R works as a read-evaluate-print loop: you type an expression, R evaluates it, and prints the result. If you type an incomplete expression, R will show a + continuation prompt, meaning it’s waiting for the rest of your input. For instance, if you type 10 + and press Enter, the console will show:
> 10 +
+
That + indicates R expects more input (it knows the command isn’t complete).
If you realize you made a mistake and want to cancel, press Esc to break out and get back to the > prompt.
Otherwise, you can continue the command (type 20 and hit Enter) to complete it:
> 10 +
+ 20
[1] 30
Your turn
Write an incomplete line of code, then either finish it or press Esc to cancel it.
1.1.3 Comparison operators
R can also compare values. These expressions return logical values: TRUE or FALSE.
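The comparison operators can be tried out as follows (the values are arbitrary examples):

```r
5 > 3      # TRUE  (greater than)
5 < 3      # FALSE (less than)
5 >= 5     # TRUE  (greater than or equal to)
5 == 5     # TRUE  (equal to)
5 != 3     # TRUE  (not equal to)
```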
Here, == means “equal to”. A single = is not used for comparison in R.
1.2 Objects and assignment
R is object-based. This means you usually store results so you can reuse them later.
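A minimal example of storing a result, using the name x and the calculation described in the text that follows:

```r
# Store the result of 10 + 20 under the name x; nothing is printed
x <- 10 + 20
```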
Here we created a variable named x and assigned it the result of 10 + 20 (which is 30). This does not print 30 to the console because the result was instead stored in x.
If you want to see the value, you can simply type x in the console and press Enter:
Typing the name of a variable and hitting Enter will print its value (this is called auto-printing). Alternatively, you could use the explicit print(x) function with the same result. In interactive use, auto-printing by typing the name is convenient; in scripts or functions, you might use print() to display interim results.
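Both ways of displaying a value can be sketched like this:

```r
x <- 10 + 20

x          # auto-printing shows: [1] 30
print(x)   # explicit printing shows the same: [1] 30
```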
You can use variables in calculations just like numbers. Continuing the example, since x is 30:
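For instance (the particular calculations are illustrative):

```r
x <- 30

x + 5    # 35
x * 2    # 60
x / 3    # 10
```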
You can also assign the results of these calculations to new variables:
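A sketch using the names y and z mentioned in the next paragraph; the calculations themselves are assumed for illustration:

```r
x <- 30

y <- x + 5    # y is 35
z <- x * y    # z is 1050

z
```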
After these assignments, you will see x, y, and z listed in RStudio’s Environment pane with their values. At any time, you can inspect a variable’s value by printing it (as shown above).
The arrow <- assigns the value on the right to the name on the left. If you assign a new value to the same name, the old value is overwritten.
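Overwriting can be seen in a short example:

```r
x <- 30
x <- x + 1   # the right-hand side is evaluated first, then x is replaced
x            # 31; the old value 30 is gone
```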
1.3 Vectors
A vector is the simplest data structure in R. It is a collection of values of the same type.
The function c() stands for “combine”.
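A brief illustration of building vectors with c(). The variable names and values here are made up for the example:

```r
ages   <- c(21, 35, 42)            # numeric vector
people <- c("Ana", "Ben", "Cy")    # character vector
passed <- c(TRUE, FALSE, TRUE)     # logical vector

length(ages)    # 3
class(people)   # "character"

# Mixing types forces everything to one common type
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"
```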
1.3.1 Sequences
R can generate sequences easily, for example with the : operator and the seq() function.
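A short sketch of common ways to build sequences; rep() is included as a related helper that the text does not mention:

```r
1:10                           # 1 2 3 4 5 6 7 8 9 10
seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
seq(2, 20, length.out = 4)     # 2 8 14 20
rep(c("a", "b"), times = 2)    # "a" "b" "a" "b"
```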
1.3.2 Indexing
R starts counting from 1, not 0:
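Indexing with square brackets might look like this; the scores vector is a hypothetical example:

```r
scores <- c(85, 92, 78, 81, 67, 90)

scores[1]                 # 85 (first element)
scores[2:3]               # 92 78
scores[c(1, 6)]           # 85 90
scores[-1]                # everything except the first element
scores[length(scores)]    # 90 (last element)
```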
1.3.3 Logical subsetting
You can select values from within a vector that meet a condition:
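As a sketch, assume a hypothetical scores vector (chosen here so that the result matches the output shown) and keep only the values above 80:

```r
scores <- c(85, 92, 78, 81, 67, 90)

# The condition gives one TRUE/FALSE per element
scores > 80

# Using it inside [ ] keeps only the elements where it is TRUE
scores[scores > 80]
```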
[1] 85 92 81 90
1.3.4 Vectorised operations
Most operations in R work element-by-element.
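A small illustration with a made-up vector:

```r
scores <- c(85, 92, 78)

scores + 5            # 90 97 83   (5 is added to every element)
scores / 100          # 0.85 0.92 0.78
scores * c(1, 2, 3)   # 85 184 234 (elements are paired up by position)
scores > 80           # TRUE TRUE FALSE
```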
1.4 Variable naming rules and tips

- Use meaningful names: It’s often better to use descriptive names for variables (e.g., total_sales instead of x or var1) so that the code is self-explanatory. This helps you (and others) understand the code later.
- Be concise: While names should be meaningful, overly long names can be cumbersome. Try to strike a balance (e.g., response_time is easier to handle than the_response_time_of_the_subject).
- No spaces or special characters: Variable names cannot contain spaces. They must start with a letter, or with a dot (.) that is not followed by a digit; the remaining characters can be letters, numbers, dots, or underscores. A name cannot start with a number or an underscore. For example, currentTemperature or current_temperature are valid names, but current temperature (with a space) is not. Also, avoid using symbols like +, -, * in names.
- Case sensitivity: R is case-sensitive. This means Variable, variable, and VARIABLE would be three different names. Be consistent in your naming to avoid confusion.
- Recommended conventions: Many R programmers use either snake_case or camelCase for multi-word names. Snake case uses underscores (e.g., total_cost), while camel case capitalizes each word after the first (e.g., totalCost). Choose one style and stick with it.
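Avoid naming variables after existing functions or constants in R (like mean, data, T, c, etc.) because that can lead to confusion or errors. For instance, after T <- 0, any code that uses T as shorthand for TRUE silently gets 0 instead. Reusing a function name is less destructive: after mean <- 5, a call such as mean(x) still finds the function, because R skips non-function objects when it looks up a call. The code becomes harder to read, though, and you can remove the variable again with rm(mean).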
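The naming conventions and case sensitivity can be tried out with a few throwaway variables (all names and values here are hypothetical):

```r
total_sales <- 1500    # snake_case
totalSales  <- 1500    # camelCase; a different object from total_sales

# R is case-sensitive, so these are two separate variables
Variable <- 1
variable <- 2
Variable == variable   # FALSE
```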
1.5 Dataframes and Tibbles
A data frame is a table. Each row represents an observation; each column represents a variable.
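A sketch of how such a table can be built with data.frame(), using the survey columns and values described in the text that follows:

```r
# Each argument becomes one column; all columns have length 5
survey <- data.frame(
  index = c(1, 2, 3, 4, 5),
  sex   = c("m", "m", "m", "f", "f"),
  age   = c(99, 46, 23, 54, 23)
)
```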
This makes a data frame survey with 5 rows and 3 columns: index, sex, and age. If you print survey, you’ll see something like:
  index sex age
1     1   m  99
2     2   m  46
3     3   m  23
4     4   f  54
5     5   f  23
Each column is a vector: survey$index is the vector 1, 2, 3, 4, 5; survey$sex is c("m", "m", "m", "f", "f"); survey$age is c(99, 46, 23, 54, 23).
Because each column is a vector, they all must be the same length (here length 5) – which they are.
1.5.1 Accessing columns
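Two common ways to pull out a single column are the $ operator and double square brackets. The sketch below rebuilds the survey data frame so it runs on its own:

```r
survey <- data.frame(
  index = c(1, 2, 3, 4, 5),
  sex   = c("m", "m", "m", "f", "f"),
  age   = c(99, 46, 23, 54, 23)
)

survey$age        # 99 46 23 54 23
survey[["age"]]   # 99 46 23 54 23
```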
Both return the same column.
1.5.2 Subsetting rows and columns
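Data frames can be subset with the form df[rows, columns]. A minimal sketch, assuming the survey data frame from earlier (rebuilt here so the example is self-contained):

```r
survey <- data.frame(
  index = c(1, 2, 3, 4, 5),
  sex   = c("m", "m", "m", "f", "f"),
  age   = c(99, 46, 23, 54, 23)
)

survey[1, ]                      # first row, all columns
survey[, "age"]                  # the age column as a vector
survey[1:2, c("sex", "age")]     # rows 1-2, two columns
survey[survey$age > 40, ]        # rows where age is above 40
```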
1.5.3 Adding a column
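A new column can be created by assigning a vector to a new name with $. In this sketch, age_in_10_years is a hypothetical column invented for the example:

```r
survey <- data.frame(
  index = c(1, 2, 3, 4, 5),
  sex   = c("m", "m", "m", "f", "f"),
  age   = c(99, 46, 23, 54, 23)
)

# The calculation is vectorised, so every row gets its own value
survey$age_in_10_years <- survey$age + 10

names(survey)   # "index" "sex" "age" "age_in_10_years"
```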
1.5.4 Tibbles
Tibbles are a modern version of data frames with safer defaults and clearer printing.
For most beginner tasks, data frames and tibbles can be treated as interchangeable.
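A sketch of creating the same survey table as a tibble. It assumes the tibble package is installed (it comes with the tidyverse); the name survey_tibble follows the text:

```r
library(tibble)  # assumption: tibble is installed

survey_tibble <- tibble(
  index = c(1, 2, 3, 4, 5),
  sex   = c("m", "m", "m", "f", "f"),
  age   = c(99, 46, 23, 54, 23)
)

# Printing a tibble also shows its dimensions and each column's type
survey_tibble
```

An existing data frame can also be converted with as_tibble().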
If you print survey_tibble, you’ll get an output like:
1.6 Functions
Functions are the tools of R: each one performs a specific task. You call a function by writing its name followed by parentheses (), with any arguments placed inside the parentheses.
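As an example, here is the built-in round() function. Its documented usage is roughly round(x, digits = 0); the input value is arbitrary:

```r
# One argument: digits is left at its default of 0
round(3.14159)               # 3

# Two arguments: keep two decimal places
round(3.14159, digits = 2)   # 3.14
```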
This means round() takes an argument x (the number or vector to round) and an argument digits (how many decimal places to round to, which defaults to 0 if not specified).
Arguments can be supplied by position or by name:
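The following calls all give the same result, 3.14:

```r
round(3.14159, 2)                  # by position: x first, then digits
round(x = 3.14159, digits = 2)     # by name
round(digits = 2, x = 3.14159)     # named arguments can come in any order
```

Naming arguments is slightly longer to type but tends to make code easier to read.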
1.6.1 Help
Getting help for functions: If you’re not sure how to use a function or what arguments it takes, use R’s help system:
- ?function_name (e.g., ?round) will bring up the help page for that function. The help page will usually show the usage, a description of each argument, details, examples, and more.
- You can also search for functions by keyword using ??keyword or help.search("keyword"). For example, ??rounding lists help pages that mention rounding.
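The help commands can be tried like this. In RStudio the results open in the Help pane:

```r
# Open the help page for a function you know by name
?round
help("round")              # the same thing, written as a function call

# Search the installed documentation by keyword
??rounding
help.search("rounding")    # the same search, written as a function call
```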
1.7 Packages
One of the biggest strengths of R is its rich ecosystem of packages. A package is a bundle of functions, data, and documentation, developed by the community, that extends R’s capabilities. Base R comes with a standard set of functions, but for specialized tasks (data visualization, advanced stats, machine learning, etc.), there are thousands of packages available.
Installing packages: To use a package that doesn’t come with base R, you need to install it (typically from CRAN). For example, to install the tidyverse package (which is actually a meta-package that includes ggplot2, dplyr, and other packages commonly used for data science):
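Installation is done with install.packages(), with the package name in quotes. This downloads from CRAN, so it needs an internet connection and can take a few minutes:

```r
# Run once per R installation, usually in the console rather than in a script
install.packages("tidyverse")
```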
You only need to install a package once on your system (or per R installation).
1.7.1 Loading packages
After installing, to use the package in any given R session, you must load it using library(). The common practice is to put all your library(packageName) calls at the top of your script, so it’s clear which packages are needed.
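A minimal sketch of loading a package, assuming the tidyverse has already been installed:

```r
# Load the tidyverse for this session; note there are no quotes around the name
library(tidyverse)

# Functions from its member packages (e.g. dplyr, ggplot2) are now available
```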
You can also use a function without loading the entire package:
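This is done with the package::function() form. The sketch below assumes dplyr is installed and uses its filter() function on the survey data frame from earlier (rebuilt here so the example runs on its own):

```r
survey <- data.frame(
  index = c(1, 2, 3, 4, 5),
  sex   = c("m", "m", "m", "f", "f"),
  age   = c(99, 46, 23, 54, 23)
)

# Call filter() from dplyr without running library(dplyr) first
older <- dplyr::filter(survey, age > 40)
older   # the three rows with age 99, 46 and 54
```

The :: form is also handy when two loaded packages define functions with the same name, because it states exactly which one you mean.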