Introduction to R
The goal of this course is to give a short introduction to the R statistical package. R is a very powerful environment for data analysis and also a programming language. In this course, we use it mainly for the reproducible analysis of data, with only a minimal set of programming instructions.
Beginners may be overwhelmed by the perceived complexity of the R package. With experience, however, this feeling will go away rapidly because the underlying principles in design and usage are easy to learn and understand.
The key advantage of using a package like R over statistical analysis programs that are based on graphical user interfaces (GUIs) is the possibility to write scripts and other types of text files that simultaneously serve as notebooks and therefore greatly contribute to the repeatability and reproducibility of the analyses.
R is open-source software. It is both a programming language and a computing environment for statistical analyses. It can be downloaded from http://www.R-project.org.
Several GUIs and editors are available for R. We will use the development environment RStudio (free software, available at www.rstudio.org). Both R and RStudio are available for Linux, Windows and macOS.
This tutorial is based on several sources:
- Original writing by members of the research group Crop Biodiversity and Breeding Informatics (in particular, Fabian Freund and Karl Schmid)
- http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
If you are very interested in learning R, there are excellent online materials. For example:
- https://swcarpentry.github.io/r-novice-gapminder/
- https://evolutionarygenetics.github.io/
RStudio: Your work environment
RStudio is a graphical user interface (GUI) in which you develop your R code. It has several useful features to assist your programming, such as auto-completion of unfinished code and highlighting of code blocks. Please note that you have to install both R and RStudio, because RStudio is just an environment for editing your code; it runs the code by calling R in the background.
RStudio on your own computer
After you have downloaded and installed RStudio, you can open the program. The RStudio interface consists of several windows.
Bottom left: console window (also called command window). Here you can type commands after the “>” prompt and R will then execute your command. This is the most important window, because this is where R actually does stuff.
Top left: editor window (also called script window). Collections of commands (scripts) can be edited and saved. If you don’t see this window, you can open it with File \(\rightarrow\) New \(\rightarrow\) R script. Just typing a command in the editor window is not enough; it has to get into the console window before R executes the command. If you want to run a line from the script window (or the whole script), you can click Run or press CTRL+ENTER to send it to the console window.
Top right: workspace / history window. In the workspace window you can see which data and values R has in its memory. You can view and edit the values by clicking on them. The history window shows what has been typed before.
Bottom right: files / plots / packages / help /viewer window. Here you can open files, view plots (also previous plots), install and load packages or use the help function.
You can change the size of the windows by dragging the grey bars between the windows.
RStudio Cloud
RStudio Cloud (https://rstudio.cloud/) is a cloud-based version of RStudio that allows you to run your analysis online. You can sign up for free with your Google account or with any email address.
The interface of RStudio Cloud is the same as that of your local RStudio.
R command line in RStudio
In RStudio, we can either use the normal command line input for R or write scripts in the editor and run these in R. We will focus on the command line input.
Some important basic commands are:
q() # quit R
?command_name # call the manual for a command; try ?q
getwd() # show the current working directory
setwd("directory_name") # set a working directory
Working directory
Your working directory is the folder on your computer in which you are currently working. When you ask R to open a certain file, it will look in the working directory for this file, and when you tell R to save a data file or figure, it will save it in the working directory. Before you start working, please set your working directory to where all your data and script files are or should be stored.
The working directory can also be set in RStudio by clicking on Session -> Set working directory -> Choose directory. The working directory is important since it is the directory where all output of R will be written to.
Note that everything written after a # is not evaluated by R - we will use this for commenting.
Tip: If R reports an error message such as Error in file(file, "rt") : cannot open the connection ... No such file or directory when trying to read data, the file path or the working directory you gave is incorrect. Check your code for typing errors.
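As a minimal sketch of how the working directory decides where files end up, the following lines switch to a temporary folder, write a small file there and switch back. The file name example.txt and the use of tempdir() are assumptions chosen for illustration; on your own computer you would use your project folder instead.

```r
old_wd <- getwd()                    # remember the current working directory
setwd(tempdir())                     # switch to a temporary folder (illustrative choice)
writeLines("hello", "example.txt")   # relative file names are resolved in the working directory
found_here <- file.exists("example.txt")
found_here                           # TRUE: the file was written to the working directory
setwd(old_wd)                        # switch back to where we started
```

Checking file.exists() before reading a file is a quick way to find out whether a path or the working directory is wrong.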
Libraries
R can do many statistical and data analyses. They are organized in so-called packages or libraries. With the standard installation, most common packages are installed. To get a list of all installed packages, go to the packages window or type library() in the console window. If the box in front of the package name is ticked, the package is loaded (activated) and can be used.
There are many more packages available on the R website. If you want to install and use a package (for example, the package called “geometry”) you should:
- Install the package: click install packages in the packages window and type geometry, or type install.packages("geometry") in the console window.
- Load the package: check the box in front of geometry, or type library("geometry") in the console window.
Tip: An important thing to keep in mind is that R packages are not always available with install.packages(). By default, install.packages() searches for packages on the CRAN repository. However, some R packages are hosted on Bioconductor, GitHub or other repositories. So, if R returns a warning message saying that the package is not available when you install it, search for the correct source for your installation.
Tip: Please note that you have to reload R packages with library() whenever you start a new R session.
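The install-then-load routine can be made a little more robust in a script. The sketch below checks whether a package is installed before loading it; it uses the stats package only because it ships with every R installation, and the variable names pkg and pkg_loaded are illustrative.

```r
pkg <- "stats"   # illustrative choice: stats is part of every R installation

if (requireNamespace(pkg, quietly = TRUE)) {
  # character.only = TRUE tells library() that pkg holds the package name as text
  library(pkg, character.only = TRUE)
  pkg_loaded <- TRUE
} else {
  message("Package ", pkg, " is not installed; try install.packages(\"", pkg, "\")")
  pkg_loaded <- FALSE
}
pkg_loaded
```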
First examples of R commands
R as calculator - Part 1
R can be used as a calculator. You can just type your equation in the command window after the > prompt:
> 10^2 + 36
and R will give the answer
[1] 136
Exercise
Compute the difference between 2022 and the year you started at this university and divide this by the difference between 2022 and the year you were born. Multiply this by 100 to get the percentage of your life you have spent at this university. Use brackets if you need them.
Note: If you use brackets and forget to add the closing bracket, the > on the command line changes into a +. The + can also mean that R is still busy with some heavy computation. If you want R to quit what it was doing and give back the >, press ESC (see the reference list on the last page).
Workspace
You can also give numbers a name. By doing so, they become so-called variables which can be used later. For example, you can type in the command window:
> a <- 4
You can see that a appears in the workspace window, which means that R now remembers what a is. You can also ask R what a is (just type a ENTER in the command window):
> a
[1] 4
or do calculations with a:
> a * 5
[1] 20
If you specify a again, it will forget what value it had before. You can also assign a new value to a using the old one.
> a <- a + 10
> a
[1] 14
To remove all variables from R’s memory, type
> rm(list=ls())
or click clear all in the workspace window. You can see that RStudio then empties the workspace window. If you only want to remove the variable a, you can type rm(a).
The workspace can be saved permanently after a session and reloaded to use objects defined in an earlier session. The workspace can be saved in RStudio using the menu in the upper right corner. If you terminate R, you will by default be asked whether the workspace should be saved (it is saved in the working directory).
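Saving and reloading objects can also be done from the command line. The following is a minimal sketch with two illustrative variables and a temporary file (none of these names come from the tutorial itself): save() writes the selected objects to a file, and load() brings them back.

```r
start_value <- 4
result_value <- start_value * 5

ws_file <- tempfile(fileext = ".RData")          # illustrative file location
save(start_value, result_value, file = ws_file)  # write both objects to the file

rm(start_value, result_value)                    # remove them from memory
load(ws_file)                                    # restore them from the file
result_value
```

To store everything in the workspace at once, save.image() can be used instead of listing the objects one by one.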
Exercise
Repeat the previous exercise, but with several steps in between. You can give the variables any name you want, but the name has to start with a letter.
Scalars, vectors and matrices
Like other programs, R organizes numbers in
- scalars (a single number, 0-dimensional),
- vectors (a row of numbers, 1-dimensional),
- matrices (like a table, 2-dimensional).
The a you defined before was a scalar. To define a vector with the numbers 3, 4 and 5, you need the function c, which is short for concatenate (paste together).
b <- c(3,4,5)
Matrices and other 2-dimensional structures will be introduced below.
R Style guide
You may wonder whether it is better to write rm(list=ls()) or rm(list = ls()), or to write b <- c(3,4,5) or b <- c(3, 4, 5). All of these variants work and their use is a matter of taste.
A few simple rules help you decide how to write your code:
- As long as you write for yourself, decide on what you like.
- But be consistent throughout your code; it makes reading easier.
- If you write for others, follow a generally accepted style guide. A widely used style guide for R is here: https://style.tidyverse.org/
The assignment operator
Historically, R used <- as the assignment operator, but = can be used as well and does the same thing when it stands on its own at the command line. <- consists of two characters, < and -, and represents an arrow pointing at the object receiving the value of the expression. In this introduction, we use <- as the assignment operator.
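A short sketch may help to see where the two operators agree and where = plays a different role. The variable names first, second and average are illustrative. At the top level both operators assign a value; inside a function call, = instead names an argument and does not create a variable.

```r
first <- 5     # assignment with the arrow
second = 5     # assignment with the equals sign (top level only)
identical(first, second)

# Inside the brackets of a function call, "x =" names the argument x of mean();
# it does not create a variable called x.
average <- mean(x = c(1, 2, 3))
average
```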
Objects, values and classes
When using R, your data, functions, results etc. are stored in the active memory of the PC in the form of objects of different classes to which you assign names.
Here is a summary of basic classes:
- integer: numbers that can be written without a decimal component
- numeric (or double): any number; can be positive or negative
- character: any text, including symbols and numbers (numbers of character class CANNOT be used in calculation)
- logical: value of TRUE or FALSE
The data of basic classes are building blocks of your objects:
Object | Class |
---|---|
vector | numeric, integer, character, logical |
factor | numeric, integer, character |
array | numeric, integer, character, logical |
matrix | numeric, integer, character, logical |
data frame | numeric, integer, character, logical |
list | numeric, character, logical, function, expression… |
Note: a factor is text with categorical information (so it CAN be used in statistical models).
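The basic classes can be inspected directly with class(). The following sketch uses made-up values; the variable name treatment is illustrative.

```r
class(3L)               # "integer" (the L suffix marks an integer)
class(3.5)              # "numeric"
class("3")              # "character"
class(TRUE)             # "logical"

# Text that looks like a number must be converted before calculating with it
as.numeric("3") + 1     # 4

# A factor stores text as categories with a fixed set of levels
treatment <- factor(c("a", "b", "a"))
class(treatment)        # "factor"
levels(treatment)       # "a" "b"
```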
First let us focus on numerical objects. To create numerical objects we need to write our result to a variable. A variable is an object to which we have given a name and assigned a value.
Here is an example:
x <- 1
y <- 2
z <- 10
Peter <- x + y
Bernard <- y - x
Rabbit <- z * Peter
Now we can display the values of these variables simply by typing their name.
x
etc.
Vectors and functions
Often, we will not only deal with single objects, but with several objects at once. For example, various measurements may have been taken from the same plant. All measurements combined in one variable are a vector (an ordered set) of entries.
Vectors
For example, the vector (1,2,3,4) can be defined by
v1 <- c(1,2,3,4)
Note that a vector consists of objects of the same class and that, by using c() on vectors, you can concatenate vectors. Try using the command str instead of mode to get further information about an object.
v2 <- c("a", 1, 2, 3)
mode(v2)
str(v2)
v3 <- c(v1, v1, v1)
v3
Instead of typing all entries by hand, vectors consisting of copies of the same element or of equidistant values can be defined by the following commands:
rep(4, 6) # The first argument gives the object to repeat,
# the second argument the number of repetitions
seq(2, 4, 0.5) # Consists of values with distance 0.5 from 2 to 4 (including 2 and 4)
1:7 # A vector consisting of 1,...,7
seq_along(v2) # Vector (1,...,length(v2))
Note that length(v1) gives the length of the vector v1.
Exercise
Construct a vector of length 300 consisting of 50 copies of 1,2,3,4,5,6.
Working with vectors
Let v1 be a vector with numerical entries, for example
v1 <- 1:5
mode(v1)
A certain entry, say the \(i\)th entry of v1, can be accessed by v1[i]. Note that you can also access a subvector by specifying all of the entries you want to access.
v1[1] # The first entry of the vector
v1[c(1, 2)] # The first two entries of the vector
v1[-c(1, 2)] # All entries of the vector apart from the first two
v1[v1 < 3] # All entries smaller than 3
Question: What kind of an object is v1<3?
In the last example, we introduced a vector consisting of objects of class logical. Let’s look at it:
v1 < 3
Such an operation (a vector, a comparison operator and an object the vector is compared to) produces a vector giving the result of the comparison for each vector entry.
Here is a list of logical comparison operators:
- ==, !=: equal, unequal
- >, >=: greater, greater or equal
- <, <=: smaller, smaller or equal
- &, |: and, or (to combine logical expressions, each expression has to be put into ())
- !(logical condition): negation of a logical condition
logicv1 <- (v1 > 1) & (v1 < 4)
v1[logicv1]
So far, accessing entries of a vector resulted in the output of the accessed entries, discarding the information at which position of the original vector the entries are placed. This information can be retrieved by which.
v4 <- -7:3 # Defines v4 as the vector (-7, -6, ..., 1, 2, 3). Note the blank!
which(v4 > 0) # Gives the positions of all entries of v4 bigger than 0
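The relation between a logical vector, the selected values and their positions can be seen side by side in a small sketch. The plant heights below are made-up values, and the names heights and tall are illustrative.

```r
heights <- c(12.5, 8.1, 15.3, 9.9, 11.0)   # made-up measurements

tall <- heights > 10    # logical vector: one TRUE/FALSE per entry
tall
heights[tall]           # the values that fulfil the condition
which(tall)             # their positions in the original vector: 1 3 5
sum(tall)               # TRUE counts as 1, so this is the number of matches: 3
```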
Exercise
In the previous exercise, you constructed a vector of length 300 consisting of 50 copies of 1,2,3,4,5,6. How would you extract the values between 2 and 5 from this vector?
Note that to get the position of a minimal or maximal entry of a numerical vector v1, you can use which.min(v1) and which.max(v1) (if there are several entries tied for minimum or maximum, the entry with lowest position is shown). To overwrite entries in a vector, you just have to assign a new value to the entry:
v1[1] <- 100
v1[1]
v1
Here’s a list of some useful functions for a numerical vector v:
- max(v), min(v): Gives the maximal/minimal value of a vector
- sum(v): Sums the entries of v
- mean(v): arithmetic mean of v
- sort(v): sort entries of v in increasing order. To sort in decreasing order, add the second argument decreasing=TRUE
For a logical vector v, the following commands might be useful:
- any(v): Is at least one entry of v TRUE?
- all(v): Are all entries of v TRUE?
Functions
If you would like to compute the mean of all the elements in the vector b from the example above, you could type
> (3 + 4 + 5) / 3
But when the vector is very long, this is very boring and time-consuming work. This is why things you do often are automated in so-called functions. Some functions are standard in R or in one of the packages. You can also program your own functions. When you use a function to compute a mean, you will type:
> mean(x = b)
Within the brackets you specify the arguments. Arguments give extra information to the function. In this case, the argument x says of which set of numbers (vector) the mean should be computed (namely of b). Sometimes, the name of the argument is not necessary: mean(b) works as well.
Exercise
Compute the sum of 4, 5, 8 and 11 by first combining them into a vector and then using the function sum.
The function rnorm, as another example, is a standard R function which creates random samples from a normal distribution. Type the command below, hit the ENTER key, and you will see 10 random numbers:
> rnorm(10)
[1] -0.949 1.342 -0.474 0.403
[5] -0.091 -0.379 1.015 0.740
[9] -0.639 0.950
- Line 1 contains the command: rnorm is the function and the 10 is an argument specifying how many random numbers you want, in this case 10 numbers (typing n=10 instead of just 10 would also work).
- Lines 2-4 contain the results: 10 random numbers organised in a vector with length 10.
Entering the same command again produces 10 new random numbers. Instead of typing the same text again, you can also press the upward arrow key to access previous commands. If you want 10 random numbers out of normal distribution with mean 1.2 and standard deviation 3.4 you can type
> rnorm(10, mean = 1.2, sd = 3.4)
showing that the same function (rnorm) can be called in different ways and that R has so-called named arguments (in this case mean and sd). By the way, the spaces around the “,” and “=” do not matter, but the use of spaces is frequently recommended to improve the readability of the code.
Comparing this example to the previous one also shows that for the function rnorm only the first argument (the number 10) is compulsory, and that R gives default values to the other so-called optional arguments.
Tip: RStudio has a nice feature: when you type rnorm( in the command window and press TAB, RStudio will show the possible arguments.
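Two properties of named arguments can be checked with a small experiment. The sketch below uses set.seed(), which is not covered in this tutorial, to make the random numbers repeatable; the seed value 42 and the variable names are arbitrary choices.

```r
set.seed(42)                                    # fix the random number generator
first_draw <- rnorm(5, mean = 1.2, sd = 3.4)

set.seed(42)                                    # reset to the same starting point
second_draw <- rnorm(5, sd = 3.4, mean = 1.2)   # named arguments may come in any order

identical(first_draw, second_draw)              # both calls give the same numbers

args(rnorm)                                     # shows the arguments and their default values
```

Fixing the seed at the top of a script is a common way to make an analysis with random numbers reproducible.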
R as calculator - Part 2
The R console can be used as a pocket calculator, and the basic arithmetic operations +, -, * and / are easy to carry out. Many more complex mathematical operations are available, for example:
- ^: power
- sqrt(x): square root of x
- log(x), exp(x): natural logarithm and exponential function of x, respectively
- sin(x), cos(x), tan(x): trigonometric functions of x
- abs(x): absolute value |x| of x
Other object classes
So far, we have used one object class, numeric. Four other important object classes are character, logical, integer and function.
The class character consists of objects that are strings of symbols (e.g. words but also something like A?7Fd).
The class logical is reserved for the logical expressions TRUE and FALSE. Such objects are often useful in programming.
integer is a class which only allows integer-valued entries.
NA is reserved for missing values. NA is not really an object class, but gets its class information from the object class the value is missing from. If no such information is available, NA is treated as logical.
word <- "hello"
word2 <- "A?7Fd"
mv <- NA
Bool <- TRUE
To display the class of an object, one can use a function like mode():
mode(x)
mode(sin)
mode(word)
mode(Peter)
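Missing values deserve a quick illustration, because they affect calculations. In the following sketch the vector measurements contains made-up numbers with one missing entry; is.na() and the na.rm argument are standard R but have not been introduced above.

```r
measurements <- c(2.5, NA, 3.5)       # made-up values, one of them missing

is.na(measurements)                   # FALSE TRUE FALSE
mean(measurements)                    # NA: the result is unknown if a value is missing
mean(measurements, na.rm = TRUE)      # 3: missing values are removed first

class(NA)                             # "logical" when nothing else is known
class(measurements)                   # "numeric": here NA stands for a missing number
```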
Plots
R can make graphs. This is a very simple example:
x <- rnorm(100)
plot(x)
- In the first line, 100 random numbers are assigned to the variable x, which becomes a vector by this operation.
- In the second line, all these values are plotted in the plots window.
Exercise
Plot 100 normal random numbers.
Help and documentation
There is a large amount of free documentation and help available. Some help is automatically installed. Typing in the console window the command
help(rnorm)
gives help on the rnorm function. It gives a description of the function, possible arguments and the values that are used as default for optional arguments. Typing
example(rnorm)
gives some examples of how the function can be used. An HTML-based global help can be called with:
help.start()
or by going to the help window.
The following links can also be useful:
- https://cran.r-project.org/doc/manuals/R-intro.pdf – A full manual.
- http://cran.r-project.org/doc/contrib/Short-refcard.pdf – A short reference card.
- www.statmethods.net – Also called Quick-R. Gives very productive direct help. Also for users coming from other programming languages.
- mathesaurus.sourceforge.net – Dictionary for programming languages (e.g. R for Matlab users).
- Just using Google (type e.g. “R rnorm” in the search field) can also be very productive.
Exercise
Find help for the sqrt function.
Scripts
R is an interpreter that uses a command line based environment. This means that you have to type commands, rather than use the mouse and menus. This has the advantage that you do not always have to retype all commands and are less likely to get complaints of arms, neck and shoulders.
You can store your commands in files, the so-called scripts. These scripts typically have file names with the extension .R, e.g. foo.R. You can open an editor window to edit these files by clicking File and New or Open file..., where the options Save and Save as are also available.
You can run (send to the console window) part of the code by selecting lines and pressing CTRL+ENTER or clicking Run in the editor window. If you do not select anything, R will run the line your cursor is on. You can always run the whole script with the console command source, so e.g. for the script in the file foo.R you type:
source("foo.R")
You can also click Run all in the editor window or type CTRL+SHIFT+S to run the whole script at once.
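What source() does can be tried out without leaving the console. The sketch below writes a tiny script to a temporary file and then runs it; the file location and the variable names squares and total are assumptions made for this illustration.

```r
script_file <- tempfile(fileext = ".R")   # illustrative script location

# Write three lines of R code into the script file
writeLines(c("# tiny example script",
             "squares <- (1:5)^2",
             "total <- sum(squares)"),
           script_file)

source(script_file)   # runs every line of the script
total                 # objects created by the script are now available: 55
```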
Exercise
Make a file called firstscript.R containing R-code that generates 100 random numbers and plots them, and run this script several times.
Note: You can add comments to the script explaining what the code does. Type these comments behind a # sign, so that they are not evaluated as code by R.
Data structures
If you are unfamiliar with R, it makes sense to just retype the commands listed in this section. Maybe you will not need all these structures in the beginning, but it is always good to have at least a first glimpse of the terminology and possible applications.
Vectors
Vectors were already introduced, but they can do more:
> vec1 <- c(1, 4, 6, 8, 10)
> vec1
[1] 1 4 6 8 10
> vec1[5]
[1] 10
> vec1[3] <- 12
> vec1
[1] 1 4 12 8 10
> vec2 <- seq(from = 0, to = 1, by = 0.25)
> vec2
[1] 0.00 0.25 0.50 0.75 1.00
> sum(vec1)
[1] 35
> vec1 + vec2
[1] 1.00 4.25 12.50 8.75 11.00
- In line 1, a vector vec1 is explicitly constructed by the concatenation function c(), which was introduced before. Elements in vectors can be addressed by standard [i] indexing, as shown in lines 4-5.
- In line 6, one of the elements is replaced with a new number. The result is shown in line 8.
- Line 9 demonstrates another useful way of constructing a vector: the seq() (sequence) function.
- Lines 10-15 show some typical vector oriented calculations. If you add two vectors of the same length, the first elements of both vectors are summed, and the second elements, etc., leading to a new vector of length 5 (just like in regular vector calculus). Note that the function sum sums up the elements within a vector, leading to one number (a scalar).
Matrices
Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the function matrix:
> mat <- matrix(data = c(9,2,3,4,5,6), ncol = 3)
> mat
     [,1] [,2] [,3]
[1,]    9    3    5
[2,]    2    4    6
The argument data specifies which numbers should be in the matrix. Use either ncol to specify the number of columns or nrow to specify the number of rows.
More formally defined, a matrix is a rectangular scheme of \(n\cdot m\) values, where \(n\) is the number of rows and \(m\) is the number of columns. A matrix can be defined by listing all entries in a vector, specifying the number of rows and stating whether the entries are ordered by rows or columns. The \(n\times n\) identity matrix can be defined by diag(n).
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE) # Ordered by rows
matrix2 <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = FALSE) # Ordered by columns
D <- diag(2)
Note that since the first argument of matrix is a vector, you can use the commands written down in the chapter about vectors to easily build matrices with specific patterns in their entries. To access an entry of a matrix, you have to specify its row and column. As in the case of vectors, you can also access several entries at once.
matrix1[1,2] # Accesses the entry in the 1st row, 2nd column
matrix1[ ,1] # Accesses the first column
matrix1[1, ] # Accesses the first row
There are many operations available to manipulate matrices. The following list shows some important commands for matrices:
v3 <- c(1, 1) # Defines a 2-dimensional vector
t(matrix1) # Transposes matrix1 (switches rows with columns)
matrix1 %*% matrix2 # Multiplies matrix1 with matrix2
matrix1 %*% v3 # Multiplies matrix with vector
eigen(matrix1) # Computes eigenvalues and eigenvectors
solve(matrix1) # Inverts the matrix
solve(matrix1, v3) # Solves the system of linear equations matrix1 %*% x = v3
cbind(matrix1, v3) # Adds v3 as a new column (also works for adding matrices)
rbind(matrix1, v3) # Adds v3 as a new row (also works for adding matrices)
An expansion of the concept of matrix for more than two dimensions is array.
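To see that solve() really does what the comments claim, the results can be checked by multiplying back. This is a small sketch with its own matrix A and right-hand side rhs (both names are illustrative); all.equal() compares numbers while tolerating tiny rounding differences.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
rhs <- c(1, 1)

x_sol <- solve(A, rhs)   # solves the linear system A %*% x = rhs
x_sol                    # -1  1
A %*% x_sol              # multiplying back reproduces rhs (as a one-column matrix)

A_inv <- solve(A)        # the inverse of A
all.equal(A %*% A_inv, diag(2))   # a matrix times its inverse is the identity matrix
```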
Exercise
Put the numbers 31 to 60 in a vector named P and in a matrix with 6 rows and 5 columns named Q. Tip: use the function seq. Look at the different ways scalars, vectors and matrices are denoted in the workspace window.
Matrix operations are similar to vector operations:
> mat[1,2]
[1] 3
> mat[2,]
[1] 2 4 6
> mean(mat)
[1] 4.8333
- Elements of a matrix can be addressed in the usual way: [row,column] (line 1).
- Line 3: When you want to select a whole row, you leave the spot for the column number empty (the other way around for columns of course).
- Line 5 shows that many functions also work with matrices as argument.
Data frames
Time series are often ordered in data frames. A data frame is a matrix with names above the columns. This is nice, because you can call and use one of the columns without knowing in which position it is.
> t <- data.frame(x = c(11,12,14), y = c(19,20,21), z = c(10,9,7))
> t
   x  y  z
1 11 19 10
2 12 20  9
3 14 21  7
> mean(t$z)
[1] 8.666667
> mean(t[["z"]])
[1] 8.666667
- In line 1 a typical data frame called t is constructed. Its columns have the names x, y and z.
- Lines 7-10 show two ways of how you can select the column called z from the data frame called t.
Normally, we will deal with data collected from different individuals. This data can be seen as a scheme with rows and columns (similar to a matrix), where the rows stand for the individuals and the columns stand for each measured variable or some information about experimental factors. Note here that in contrast to a matrix, the columns may have any object as entry (but the type of object is equal for all rows/individuals). This type of data structure is called a data frame in R. It can be defined by the command data.frame as follows:
data1 <- data.frame(height = c(3, 4, 5, 3),
                    earlength = c(5, 5, 4, 2),
                    treatment = c("c", "a", "a", "b"),
                    stringsAsFactors = TRUE) # Needed since R 4.0.0, where text columns stay character by default
# Note that the third column is referred to as a factor
str(data1)
If we compare defining a data frame to defining a matrix, we see that we enter each column as a separate vector and we can name the columns (similar naming is possible for matrices and vectors). Non-numerical values are mostly experimental conditions in a data frame and will be treated as factors. To access entries, there are two possibilities: We can do as with matrices or directly address the columns.
data1[1, 2] # Accesses the entry in the 1st row, second column
data1$earlength[1] # Does the same
data1$earlength # The column earlength
data1[[1]] # The first column
The benefit of using the column names is that you don’t have to memorize the exact structure of the data frame, but just the column names (thus, use reasonable column names). For programming though, it’s often easier to address the columns by number and not by name. If you are about to work with one data frame a lot, you can use attach() to add the data frame to the search path of R. This means that R knows that if you type in a column name, it’s from said data frame. You can detach by using detach().
data1$height
height # Doesn't work
attach(data1)
height # Works now
data1$height
detach(data1)
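A different technique from attach() that is often used for the same purpose is the function with(), which makes the column names available only for the duration of a single expression. This is a sketch with its own small data frame plants (an illustrative name and made-up values), not a replacement for the example above.

```r
plants <- data.frame(height = c(3, 4, 5, 3),
                     earlength = c(5, 5, 4, 2))

# Column names can be used directly inside with(); nothing needs to be detached
with(plants, mean(height))                  # 3.75
ratio <- with(plants, height / earlength)
ratio
```

Because nothing is added to the search path, with() avoids confusion when a column and another variable happen to share a name.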
We know how to access different columns of a data frame. However, we will often be interested to work with a subset of data, for example only data from individuals/rows under a certain experimental condition (e.g., a certain factor level of an experimental factor). Such subsets of data can be accessed by subset.
subset(data1, treatment == "a") # Chooses all rows with treatment a
data_sub <- subset(data1, treatment %in% c("a", "b")) # Chooses all rows with treatment a or b
str(data_sub)
table(data_sub$treatment) # Unused factor levels are kept by subset
data_sub2 <- droplevels(data_sub$treatment) # Kicks out unused factor levels
table(data_sub2)
subset(data1, treatment == "a", select = height) # Shows the height values for all individuals with treatment a
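The same selections can be written with square brackets and a logical condition, which ties data frames back to the indexing rules for vectors and matrices. The sketch below builds its own data frame plants (an illustrative name) and creates the factor column explicitly with factor().

```r
plants <- data.frame(height = c(3, 4, 5, 3),
                     earlength = c(5, 5, 4, 2),
                     treatment = factor(c("c", "a", "a", "b")))

# Rows are selected by a logical vector in front of the comma
plants[plants$treatment == "a", ]           # all columns for treatment a
plants[plants$treatment == "a", "height"]   # only the height values: 4 5

# Unused factor levels survive the selection until droplevels() removes them
plants_ab <- plants[plants$treatment %in% c("a", "b"), ]
nlevels(plants_ab$treatment)                # 3: level c is still known
nlevels(droplevels(plants_ab)$treatment)    # 2
```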
As with matrices, rbind and cbind can be used to glue data sets together. A more flexible command is merge (which we don’t cover here); to learn about it, type ?merge. A similar, but more flexible class for such data is list. All defined objects are displayed in a list in the upper right corner in RStudio.
Exercise
Make a script file which constructs three random normal vectors of length 100. Call these vectors x1, x2 and x3. Make a data frame called t with three columns (called a, b and c) containing respectively x1, x1+x2 and x1+x2+x3. Call plot(t) for this data frame. Can you understand the results? Re-run this script a few times.
Lists
Another basic structure in R is a list. The main advantage of lists is that the “columns” (they’re not really ordered in columns any more, but are more a collection of vectors) don’t have to be of the same length, unlike matrices and data frames.
> L <- list(one = 1, two = c(1,2), five = seq(0, 1, length.out = 5))
> L
$one
[1] 1
$two
[1] 1 2
$five
[1] 0.00 0.25 0.50 0.75 1.00
> names(L)
[1] "one" "two" "five"
> L$five + 10
[1] 10.00 10.25 10.50 10.75 11.00
- Line 1 constructs a list with names and values. The list also appears in the workspace window.
- Lines 2-8 show a typical printing (after pressing L ENTER).
- Line 9 illustrates how to find out what is in the list.
- Line 11 shows how to use the numbers.
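Besides the $ notation, list elements can be reached with double and single square brackets, which behave differently. The following sketch uses its own list called samples (an illustrative name); lengths() reports the length of every element.

```r
samples <- list(one = 1, two = c(1, 2), five = seq(0, 1, length.out = 5))

samples[["two"]]          # double brackets return the element itself: 1 2
class(samples[["two"]])   # "numeric"
class(samples["two"])     # single brackets return a shorter list: "list"

lengths(samples)          # elements may have different lengths: 1 2 5

samples$label <- "demo"   # assigning to a new name adds an element
names(samples)
```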
Functions
To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.
v2 <- c("a", 1, 2, 3)
str(v2)
So if we go back to this example from before, we now know that v2 is an object. str(), on the other hand, is a function that provides us with some information on v2.
How to make a function?
All R functions have three parts:
- the body, the code inside the function.
- the formals, the list of arguments which controls how you can call the function.
- the environment, the map of the location of the function's variables.
Here we will focus on the first two components.
To define a new function, we have to specify the arguments of the function, a function name and the function itself. For example, we can define the function that calculates \(2b + 2\) by:
a <- function(b) {
  2 * b + 2
}
formals(a)
body(a)
mode(a) # shows the mode of a
a(4) # computes the value of the function a for an input number 4
A function may have more than one argument, and the arguments don’t necessarily have to be objects of the class numeric. For example, R can also plot mathematical functions by using the function curve. curve has many possible arguments. Type ?curve to get an overview. Note that some arguments have a predefined default value, meaning that if you don’t specify a value for such an argument, the default value is used. For starters, we will focus on the arguments:
- expr: The function to plot
- from: Lower bound of the \(x\)-coordinate of the plot
- to: Upper bound of the \(x\)-coordinate of the plot
- xlab: Label of the \(x\)-axis, can either be written in text (“text”) or as a mathematical expression (using expression())
- ylab: Label of the \(y\)-axis, can either be written in text (“text”) or as a mathematical expression (using expression())
Here’s the command to let R plot the function sin2x from \(-2\pi\) to \(2\pi\) (with labelled axes).
sin2x <- function(x) {
  sin(2 * x)
}
curve(sin2x, from = -2 * pi, to = 2 * pi, xlab = "x",
      ylab = expression(sin(2 * x)))
Note that R doesn’t keep track of objects defined in a function unless you force it to return their values. By default, just the last evaluated expression is returned (as seen). Using `return` at the end of your function, you can specify the return values. Here’s an example:
testf <- function(x) {y <- 2*x; x+y}
testf(2) # returns only the value of the function, y is not returned
testf <- function(x) {y <- 2*x; z <- x+y; return(c(y, z))}
testf(2) # y and z are returned
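If a function should hand back several values that you want to access by name, a named list is a common alternative to `c()`. This is only a sketch; the name `testf_list` and the element names are chosen here for illustration.

```r
# Hypothetical variant of testf that returns a named list
testf_list <- function(x) {
  y <- 2 * x
  z <- x + y
  return(list(y = y, z = z))
}

res <- testf_list(2)
res$y  # access one returned element by name
res$z
```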
Exercises:
- Plot a function `x*x` or `x^2` (both will of course give the same result) for `x` from -100 to 100.
- Write a function that computes the mean of all negative and the mean of all positive values of a vector.
- Consider a very big (infinite) population of diploid individuals. A locus with alleles \(A_1\) and \(A_2\) is in Hardy-Weinberg equilibrium if the genotype frequencies are

Genotype | \(A_1A_1\) | \(A_1A_2\) | \(A_2A_2\) |
---|---|---|---|
Frequency | \(p^2\) | \(2pq\) | \(q^2\) |

where \(p\) is the frequency of \(A_1\) and \(q=1-p\) is the frequency of \(A_2\).
Create an R script and write a function `HWfreq` that returns the genotype frequencies for a given \(A_1\) allele frequency \(p\).
Graphics
One of the main strengths of R is its graphical capabilities. Here we will just learn the basics of the plotting functions; it is worth looking into online plotting tutorials if you want to learn more.
The following lines show a simple plot
plot(rnorm(100), type = "l", col = "gold")
One hundred random numbers are plotted in gold by connecting the points with lines (the symbol between quotes after `type =` is the letter l, not the number 1).
Another very simple example is the classical histogram plot, generated by the simple command
hist(rnorm(100))
The following few lines create a plot using the data frame `t` constructed in the previous exercise:
plot(t$a, type = "l", ylim = range(t), lwd = 3, col = rgb(1, 0, 0, 0.3))
lines(t$b, type="s", lwd=2, col = rgb(0.3, 0.4, 0.3, 0.9))
points(t$c, pch = 20, cex = 4, col = rgb(0, 0, 1, 0.3))
Note that with plot you get a new plot window while points and lines add to the previous plot.
Add these lines to the script file of the previous section. Try to find out, either by experimenting or by using the help, what the meaning is of `rgb`, the last argument of `rgb`, `lwd`, `pch`, `cex`.
To learn more about formatting plots, search for `par` in the R help. Google “R color chart” for a pdf file with a wealth of color options.
To copy your plot to a document, go to the plots window, click the “Export” button, choose the nicest width and height and click `Copy` or `Save`.
More advanced plotting
#| eval: false
data(iris) # loading a plant dataset already existing in R
class(iris) # lets see what type of data this is
summary(iris) # summary of the dataset
plot(iris$Sepal.Length, iris$Sepal.Width) # plots sepal length against sepal width
?plot # shows us all the options we can use with the plot function
plot(iris$Petal.Length, iris$Petal.Width, pch = 21,
     bg = c("red", "green3", "blue")[unclass(iris$Species)],
     main = "Iris_Data",
     xlab = "Petal_length", ylab = "Petal_Width")
Basic linear regression models
To fit a linear model to a data set, we just have to specify the linear model we want to use. First, we plot the data using the `plot()` function.
#| eval: false
x <- c(1,2,3,4,5)
y <- c(1.6,4,6.5,7.5,10)
plot(x,y)
Here, you can again add graphical arguments to plot. Now we want to add a regression line. We define the regression of \(y\) on \(x\) as:
#| eval: false
reg <- lm(y~x)
reg
str(reg)
abline(reg) # draws the regression line into the plot
Function `abline` just draws a line with a certain \(y\)-axis intercept and slope (here given by the object `reg`). See `?abline` for details. The function `lm` uses the given formula to fit a linear model. This model can have several independent variables (we used one, namely `x`) and may also include interactions between the independent variables (nested designs are also possible). Let’s go through a little example. Denote again with `y` the dependent variable and with `x1`, `x2`, `x3` the independent variables.
If you want to include
- no interactions, the model is `y ~ x1 + x2 + x3`
- only all interactions of two variables, the model is `y ~ (x1 + x2 + x3)^2`
- all interactions, the model is `y ~ x1 * x2 * x3`
To get more information, type `?lm`. A good analysis of the `iris` dataset using linear models (`lm`) can be found here: https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_lm/
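To see how the three model formulas differ, one can count the coefficients that `lm` estimates for each of them. The sketch below uses simulated data (the data frame `d` and the simulated relationship are assumptions made only for this illustration).

```r
# Simulated data, only for illustration
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(50, sd = 0.1)

m_main <- lm(y ~ x1 + x2 + x3, data = d)      # no interactions
m_two  <- lm(y ~ (x1 + x2 + x3)^2, data = d)  # main effects and two-way interactions
m_all  <- lm(y ~ x1 * x2 * x3, data = d)      # all interactions

# Number of estimated coefficients (including the intercept)
length(coef(m_main))  # 4
length(coef(m_two))   # 7
length(coef(m_all))   # 8
names(coef(m_all))    # shows which terms are in the full model
```

The full model has one coefficient more than the two-way model, namely the three-way interaction `x1:x2:x3`.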
Reading and writing data files
Importing data from external sources such as tables from spreadsheet programs like Excel, text files of measurement, output data from other programs, or data from big databases is a frequent task in data analysis.
We start first with a data set in the format `.txt`, which we will create using the built-in editor in RStudio. Open a new `.txt` file by clicking on the button marked with + in the upper right corner in the GUI and choosing a new text file.
Type in:
height;earlength;experiment
150;15;a
120;11;b
135;11;c
Note here that we have a heading containing the column names and semicolons which separate different values (imagine the data coming from measuring the traits in crop plants grown under different experimental conditions). Save the file as `data2.txt` in your working directory. We will now import this data set as a data frame in R. This can be done either by clicking on `Import Dataset` in RStudio in the upper right corner (and specifying the header, the separation symbol etc.) or by using the command `read.csv` with suitable parameters.
#| eval: false
data2 <- read.csv("data2.txt", sep = ";")
data2
The argument `header` has the default value `TRUE`, meaning that the program reads in the first row of the text as the header containing column names. For numbers, `read.csv` expects the decimal symbol `.`. The same command also works for Excel files if you export the files from Excel as csv files.
A similar command is `read.table`, but it has different default values for the arguments.
Data frames can be saved as text files with `write.table`, which shares several arguments with `read.table` (for example `sep`), although some default values differ. In the simplest case, you only have to specify the data frame and the name of the output file:
#| eval: false
write.table(data1, "data1.txt", sep = ";") # Separation symbol ;
You can view the output file in the built-in editor of RStudio.
Another command for writing to text files is `write.csv`. It fixes some arguments (such as the separator) and does not let you change them, so that the export to Excel works without problems.
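Writing a data frame to a file and reading it back is a quick way to check that the separator and header settings match. The sketch below uses a temporary file and a small made-up data frame `d`; `row.names = FALSE` is an added choice here so that the file contains only the data columns.

```r
# Small example data frame (made up for illustration)
d <- data.frame(height = c(150, 120, 135),
                earlength = c(15, 11, 11),
                experiment = c("a", "b", "c"))

f <- tempfile(fileext = ".txt")                  # temporary file, deleted below
write.table(d, f, sep = ";", row.names = FALSE)  # write with ; as separator
d2 <- read.csv(f, sep = ";")                     # header = TRUE is the default
d2
unlink(f)                                        # remove the temporary file
```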
To save an R object as an R object, which is not a text file but a binary object, you can use the command `save`. Here is an example that writes the R objects `v1` and `v2` into a file called `vector.RData` with `save`. The ending `.RData` is not mandatory but is used to indicate that the file is a binary file containing R objects, which can be imported into R with `load`.
#| eval: false
save(v1, v2, file = "vector.RData")
v1 <- 0 # Change v1
v2 <- 0 # Change v2
load("vector.RData") # Load the previous definitions of the vectors
v1
v2
The workspace including all defined variables can be saved by `save.image("file_name")` and loaded by `load("file_name")`.
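Because `load` silently overwrites objects of the same name, it can help to look at what it restored. A minimal sketch, using a temporary file instead of a file in the working directory (the object `v1` is redefined here only for the example):

```r
v1 <- c(1, 2, 3)
f <- tempfile(fileext = ".RData")  # temporary file, deleted below
save(v1, file = f)

v1 <- 0               # overwrite v1
loaded <- load(f)     # load() returns the names of the restored objects
loaded                # "v1"
v1                    # the saved values are back
unlink(f)
```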
Exercise
Make a file called `tst1.txt` in a `Text File` window of RStudio similar to the above text file and store it in your working directory. Write a script to read it, to multiply the column called `g` by 5 and to store it as `tst2.txt`.
Not available (or missing) data
As already mentioned, missing data should be coded as `NA`. One way to exclude missing data is to only keep data rows that are complete for all variables. This can be done by `na.exclude`. For vectors and data frames, it marks all missing values to be ignored in further analyses. To show the positions of `NA`, use `is.na`. It gives the same object format with logical values indicating whether there is a missing value (`TRUE`) or not (`FALSE`). To check whether there is any missing data at all, use `any(is.na())`.
v_na <- c(NA, 2, 4, NA)
data_na <- data.frame(b1 = 1:4, b2 = c(NA, 3, 3, NA))
data_na
is.na(v_na)
any(is.na(v_na))
is.na(data_na)
data_good <- na.exclude(data_na) # Remove all rows with NA
data_good
str(data_good)
mean(na.exclude(v_na)) # Compute the mean of present values, no permanent change in v_na
Exercise
Compute the mean of the square root of a vector of 100 random numbers. What happens?
When you work with real data, you will encounter missing values because instrumentation failed or because you didn’t want to measure on the weekend. When a data point is not available, you write `NA` instead of a number.
j <- c(1,2,NA)
Calculating statistics of incomplete data sets is strictly speaking not possible. Maybe the largest value occurred during the weekend when you didn’t measure. Therefore, R will say that it doesn’t know what the largest value of j is:
> max(j)
[1] NA
If you don’t mind about the missing data and want to compute the statistics anyway, you can add the argument `na.rm = TRUE` (Should I remove the `NA`s? Yes!).
> max(j, na.rm=TRUE)
[1] 2
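The argument `na.rm` is not specific to `max`; other summary functions such as `mean` and `sum` accept it as well. A short sketch with the same vector:

```r
j <- c(1, 2, NA)
mean(j)                # NA, because one value is unknown
mean(j, na.rm = TRUE)  # 1.5
sum(j, na.rm = TRUE)   # 3
sum(is.na(j))          # number of missing values: 1
```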
Classes
The exercises you did before were nearly all with numbers. Sometimes you want to specify something which is not a number, for example the name of a measurement station or data file. In that case you want the variable to be a character string instead of a number.
An object in R can have several so-called classes. The most important three are numeric, character and POSIX (date-time combinations). You can ask R what class a certain variable is by typing `class(...)`.
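A few calls to `class` illustrate these three classes. `Sys.time()` is used here only as a convenient way to obtain a date-time object; the variable names are arbitrary.

```r
num <- 3.14
txt <- "apples"
now <- Sys.time()   # current date and time

class(num)          # "numeric"
class(txt)          # "character"
class(now)          # "POSIXct" "POSIXt"
```

The last result shows that an object can indeed carry more than one class at the same time.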
Characters
To tell R that something is a character string, you should type the text between quotation marks, otherwise R will start looking for a defined variable with the same name:
> m <- "apples"
> m
[1] "apples"
> n <- pears
Error: object ‘pears’ not found
Of course, you cannot do computations with character strings:
> m + 2
Error in m + 2 : non-numeric argument to binary operator
Dates
Dates and times are complicated. R has to know that 3 o’clock comes after 2:59 and that February has 29 days in some years. The easiest way to tell R that something is a date-time combination is with the function strptime:
> date1 <- strptime( c("20170225230000",
+ "20170226000000", "20170226010000"),
+ format="%Y%m%d%H%M%S")
>
> date1
[1] "2017-02-25 23:00:00"
[2] "2017-02-26 00:00:00"
[3] "2017-02-26 01:00:00"
- In lines 1-2 you create a vector with `c(...)`. The numbers in the vector are between quotation marks because the function strptime needs character strings as input.
- In line 3 the argument format specifies how the character string should be read. In this case the year is denoted first (`%Y`), then the month (`%m`), day (`%d`), hour (`%H`), minute (`%M`) and second (`%S`). You don’t have to specify all of them, as long as the format corresponds to the character string.
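Once R knows that the values are date-times, they can be compared, subtracted and printed in another format. The sketch below repeats the vector from above; `tz = "UTC"` is an added assumption so that the result does not depend on the local time zone.

```r
# tz = "UTC" is an assumption added to make the example reproducible
date1 <- strptime(c("20170225230000", "20170226000000", "20170226010000"),
                  format = "%Y%m%d%H%M%S", tz = "UTC")

date1[2] > date1[1]                            # later times compare as larger
difftime(date1[3], date1[1], units = "hours")  # time difference in hours
format(date1[1], "%d.%m.%Y")                   # print the date in another format
```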
Exercise
Make a graph with on the x-axis: today, the next end-of-year day and your next birthday and on the y-axis the number of presents you expect on each of these days. Tip: make two vectors first.
Programming tools
When you are building a larger program than in the examples above or if you’re using someone else’s scripts, you may encounter some programming statements. In this Section we describe a few tips and tricks.
To write complex functions or to use commands repeatedly in an automatic fashion, it is necessary to use R as a programming language. The two most important ingredients of programming are conditions (`if`, `else`) and loops (repetition of commands).
Conditional execution of commands
The `if`-statement is used when certain computations should only be done when a certain condition is met (and maybe something else should be done when the condition is not met). An example:
> w <- 3
> if( w < 5 ){
+ d = 2
+ } else {
+ d = 10
+ }
> d
[1] 2
- In line 2 a condition is specified: w should be less than 5.
- If the condition is met, R will execute what is between the first brackets, in line 3.
- If the condition is not met, R will execute what is between the second brackets, after the else, in line 5. You can leave the else{…}-part out if you don’t need it.
- In this case, the condition is met and d has been assigned the value 2 (lines 7-8).
The syntax of the commands is
if (logical condition) {commands} else {commands} # The else part is optional
If the logical condition equals `TRUE`, then the commands in the first brackets are used. If `else` is used, the second set of commands will be used if the logical condition equals `FALSE`. We will now define a function which states whether a numerical value is positive.
#| eval: false
pos <- function(x){if (x > 0){print("positive")} else {print("not positive")}}
Try out the newly defined function `pos`!
`if`-conditions can be nested, for example we can give the sign of a numerical value by programming
#| eval: false
sign2 <- function(x){
  if (x > 0){
    print("positive")
  }else{
    if (x == 0){
      print("value is 0")
    }else{
      print("negative")
    }
  }
}
There is a (somewhat) similar function to `if` in R called `switch`.
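As a rough illustration of `switch`, the sketch below selects one of several computations by name. The function `center` and its argument `type` are hypothetical names chosen for this example; the last, unnamed argument of `switch` acts as the fallback if no name matches.

```r
# Hypothetical helper: choose a summary statistic by name
center <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         stop("unknown type"))  # fallback if no name matches
}

center(c(1, 2, 10), "mean")
center(c(1, 2, 10), "median")
```

Compared to nested `if`-statements, this stays readable when there are many alternatives.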
To get a subset of points in a vector for which a certain condition holds, you can use a shorter method:
> a <- c(1,2,3,4)
> b <- c(5,6,7,8)
> f <- a[b == 5 | b == 8]
> f
[1] 1 4
- In line 1 and 2 two vectors are made.
- In line 3 you say that f is composed of those elements of vector `a` for which `b` equals 5 or 8. Note the double `=` in the condition. Other conditions (also called logical or Boolean operators) are <, >, !=, <= and >=. To test more than one condition in one if-statement, use `&` if both conditions have to be met (“and”) and `|` if at least one of the conditions has to be met (“or”).
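The operators can be tried out directly on the two vectors from above. The last line of this sketch uses `&&`, which compares single values only and is the usual choice inside an `if`-statement.

```r
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)

both   <- a[b > 5 & b < 8]    # both conditions must hold: 2 3
either <- a[b == 5 | b == 8]  # at least one condition holds: 1 4
not6   <- a[b != 6]           # all elements where b is not 6: 1 3 4

# Inside if(), single values are compared, here with &&
if (a[1] == 1 && b[1] == 5) print("both true")
```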
Repeated execution of commands
Repeated use of a command can be achieved by using one of the following commands:
- `repeat {commands}`: repeats the commands until it hits `break`
- `while (condition) {commands}`: repeats the commands as long as the condition is fulfilled
- `for (range) {commands}`: repeats the commands for as many times as specified by range
These commands manipulate loops:
- `next`: jumps right to the beginning of the loop
- `break`: ends the loop
The exact syntax of these loop commands will be shown for the example of summing up all entries of a vector `v`.
With `repeat`:
v <- 1:10 # An arbitrary vector
sumv <- 0
i <- 0
repeat {
  i <- i + 1
  sumv <- sumv + v[i]
  if (i < length(v)) {next}
  print(sumv)
  break}
With `while`:
sumv <- 0 # Reset value
i <- 0 # Reset value
while (i < length(v)) {
  i <- i + 1
  sumv <- sumv + v[i]
}
print(sumv)
With a `for` loop:
sumv <- 0 # Set back value
for (i in seq(along = v)){sumv <- sumv + v[i]}
print(sumv)
For the `for`-loop, we used `seq(along = v)` instead of `1:length(v)` because of possible problems if `length(v)` were zero. Summing up a vector is already implemented in R, just type `sum(v)`.
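The problem with a vector of length zero can be made visible in a few lines. In this sketch, `v0` and `n_runs` are names chosen for illustration.

```r
v0 <- numeric(0)           # a vector of length zero

idx_colon <- 1:length(v0)  # 1:0 counts down: 1 0, so a loop would run twice
idx_seq <- seq(along = v0) # empty sequence, so a loop would not run at all

n_runs <- 0
for (i in seq(along = v0)) n_runs <- n_runs + 1
n_runs                     # 0
```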
Exercise
For any numerical vector `v`, write a function/R-script that computes the sum of all positive and all negative values and shows the results on the screen. Try to use loops and/or conditions. Test it with several vectors. Do you have to use loops and/or conditions?
Remark: An output consisting of text and variables can be displayed by using `cat`. Try `?cat`. A small example:
x <- 42
cat("The answer is", x, "\n")
If you want to model a time series, you usually do the computations for one time step and then for the next and the next, etc. Because nobody wants to type the same commands over and over again, these computations are automated in for-loops.
In a for-loop you specify what has to be done and how many times. To tell “how many times”, you specify a so-called counter. An example:
> h <- seq(from = 1, to = 8)
> s <- c()
> for (i in 2:10)
+ {
+ s[i] = h[i] * 10
+ }
> s
[1] NA 20 30 40 50 60 70 80 NA NA
- First the vector h is made.
- In line 2 an empty vector (s) is created. This is necessary because R can only assign a value to `s[i]` inside the loop if the vector `s` already exists.
- In line 3 the `for`-loop starts. In this case, `i` is the counter and runs from 2 to 10.
- Everything between the curly brackets (line 5) is processed 9 times. The first time `i = 2`, the second element of h is multiplied with 10 and placed in the second position of the vector `s`. The second time `i = 3`, etc. In the last two runs, the 9th and 10th elements of h are requested, which do not exist. Note that these statements are evaluated without any explicit error messages.
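The `NA` values in the example can be avoided by letting the counter run only over positions that exist and by creating the result vector with its final length beforehand. A minimal sketch of this variant:

```r
h <- seq(from = 1, to = 8)
s <- numeric(length(h))      # vector of zeros with the same length as h
for (i in seq(along = h)) {  # counter covers only existing positions
  s[i] <- h[i] * 10
}
s

h * 10                       # the same result without a loop
```

The last line is a reminder that many simple loops in R can be replaced by one vectorized expression.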
Exercise
Make a vector from 1 to 100. Make a for-loop which runs through the whole vector. Multiply the elements which are smaller than 5 or larger than 90 with 10 and the other elements with 0.1.
Writing your own functions
Functions you program yourself work in the same way as pre-programmed R functions.
fun1 <- function(arg1, arg2 )
{
w = arg1 ^ 2
return(arg2 + w)
}
mod <- fun1(arg1 = 3, arg2 = 5)
mod
- In line 1 the function name (fun1) and its arguments (arg1 and arg2) are defined.
- Lines 2-5 specify what the function should do if it is called. The return value (arg2+w) is given as output.
- In line 6 the function is called with arguments 3 and 5 and the answer is stored in the variable mod.
- In line 7 the answer is shown on the screen.
Exercise
Write a function for the previous exercise, so that you can feed it any vector you like (as argument). Use a for-loop in the function to do the computation with each element. Use the standard R function length in the specification of the counter.
OPTIONAL: Vector-oriented programming
If you want to apply a function on each entry of a vector, this can either be done by writing a loop as described in the last section or by using `sapply` (the latter being faster in most cases). For example, we want to square each entry of a numerical vector.
v <- 1:10
square <- function(x) {x^2} # Define the square function
sapply(v, square) # Arguments: vector/data frame, function
If used with a data frame, the function is applied to each column of the data frame. To apply the same function to either rows or columns of a matrix, use `apply`. We will search for the maximum value in each row and each column of a matrix.
matrix3 <- matrix(1:25, nrow = 5)
matrix3
apply(matrix3, 1, max) # Computes the maximal values in each row
apply(matrix3, 2, max) # Computes the maximal values in each column
The second argument of `apply` specifies whether the function is applied to rows or columns. Often, we want to apply a function to a data frame, but only to all individuals/rows that carry a certain experimental condition (= column value). This can be done with `tapply`. Let’s imagine an experiment with conditions a and b. We want to compute the mean (command `mean`) for the values of each condition.
data3 <- data.frame(values = 1:10, condition = c(rep("a", 5), rep("b", 5)))
data3
attach(data3)
tapply(values, condition, mean)
detach(data3)
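The same computation can be written without `attach` and `detach` by using `with`, which makes the columns of a data frame available for a single expression only. This is shown as an alternative sketch, not as a replacement of the approach above.

```r
data3 <- data.frame(values = 1:10, condition = c(rep("a", 5), rep("b", 5)))

# Evaluate tapply() with the columns of data3 visible, no attach() needed
means <- with(data3, tapply(values, condition, mean))
means
means["b"]   # mean of condition b only
```

Since nothing stays attached afterwards, there is no risk of forgetting the `detach` call.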
Reference section: Useful commands and functions in R
Data creation
- `read.table`: read a table from file. Arguments: `header=TRUE`: read first line as titles of the columns; `sep=","`: numbers are separated by commas; `skip = n`: don’t read the first n lines.
- `write.table`: write a table to file
- `c`: paste numbers together to create a vector
- `array`: create an array. Arguments: `dim`: length of each dimension
- `matrix`: create a matrix. Arguments: `ncol` and/or `nrow`: number of columns/rows
- `data.frame`: create a data frame
- `list`: create a list
- `rbind` and `cbind`: combine vectors into a matrix by row or column
Extracting data
- `x[n]`: the `n`-th element of a vector
- `x[m:n]`: the `m`-th to `n`-th element
- `x[c(k,m,n)]`: specific elements
- `x[x>m & x<n]`: elements between `m` and `n`
- `x$n`: element of list or data frame named n
- `x[["n"]]`: idem
- `[i,j]`: element at `i`-th row and `j`-th column
- `[i,]`: row `i` in a matrix
Information on variables
- `length`: length of a vector
- `ncol` or `nrow`: number of columns or rows in a matrix
- `class`: class of a variable
- `names`: names of objects in a list
- `print`: show variable or character string on the screen (used in scripts or for-loops)
- `return`: hand a value back as the result of a function (used in functions)
- `is.na`: test if variable is `NA`
- `as.numeric` or `as.character`: change class to number or character string
- `strptime`: change class from character to date-time (POSIX)
Plotting
- `plot(x)`: plot `x` (y-axis) versus index number (x-axis) in a new window
- `plot(x,y)`: plot `y` (y-axis) versus `x` (x-axis) in a new window
- `image(x,y,z)`: plot `z` (color scale) versus `x` (x-axis) and `y` (y-axis) in a new window
- `lines` or `points`: add lines or points to a previous plot
- `hist`: plot histogram of the numbers in a vector
- `barplot`: bar plot of vector or data frame
- `contour(x,y,z)`: contour plot
- `abline`: draw line (segment). Arguments: `a,b` for intercept `a` and slope `b`; or `h = y` for horizontal line at `y`; or `v = x` for vertical line at `x`.
- `curve`: add function to plot. Needs to have an `x` in the expression. Example: `curve(x^2)`
- `legend`: add legend with given symbols (`lty` or `pch` and `col`) and text (`legend`) at location (`x = "topright"`)
- `axis`: add axis. Arguments: side, with `1` = bottom, `2` = left, `3` = top, `4` = right
- `mtext`: add text on axis. Arguments: text (character string) and side
- `grid`: add grid
- `par`: plotting parameters to be specified before the plots. Arguments: e.g. `mfrow=c(1,3)`: number of figures per page (1 row, 3 columns); `new = TRUE`: draw plot over previous plot.
Plotting parameters
These can be added as arguments to plot, lines, image, etc. For help see par.
- `type`: "l" = lines, "p" = points, etc.
- `col`: color, e.g. "blue", "red", etc.
- `lty`: line type, 1 = solid, 2 = dashed, etc.
- `pch`: point type, 1 = circle, 2 = triangle, etc.
- `main`: title, given as a character string
- `xlab` and `ylab`: axis labels, given as character strings
- `xlim` and `ylim`: range of axes, e.g. `c(1,10)`
- `log`: logarithmic axis, "x", "y" or "xy"
Statistics
- `sum`: sum of a vector (or matrix)
- `mean`: mean of a vector
- `sd`: standard deviation of a vector
- `max` or `min`: largest or smallest element
- `rowSums` (or `rowMeans`, `colSums` and `colMeans`): sums (or means) of all numbers in each row (or column) of a matrix. The result is a vector.
- `quantile(x,c(0.1,0.5))`: the 0.1 and 0.5 sample quantiles of vector `x`
Data processing
- `seq`: create a vector with equal steps between the numbers
- `rnorm`: create a vector with random numbers with normal distribution (other distributions are also available)
- `sort`: sort elements in increasing order
- `t`: transpose a matrix
- `aggregate(x,by=list(y),FUN="mean")`: split data set `x` into subsets (defined by `y`) and compute the means of the subsets. Result: a new data frame.
- `na.approx`: interpolate (in the `zoo` package). Argument: vector with NAs. Result: vector without NAs.
- `cumsum`: cumulative sum. Result is a vector.
- `rollmean`: moving average (in the `zoo` package)
- `paste`: paste character strings together
- `substr`: extract part of a character string
Fitting
- `lm(v1 ~ v2)`: linear fit (regression line) between vector `v1` on the y-axis and `v2` on the x-axis
- `nls(v1 ~ a + b * v2, start = list(a = 1, b = 0))`: nonlinear fit. Should contain an equation with variables (here `v1` and `v2`) and parameters (here `a` and `b`) with starting values
- `coef`: returns coefficients from a fit
- `summary`: returns all results from a fit
Programming
- `function (arglist) {expr}`: function definition: do expr with list of arguments arglist
- `if (cond) {expr1} else {expr2}`: if-statement: if cond is true, then expr1, else expr2
- `for (var in vec) {expr}`: for-loop: the counter var runs through the vector vec and does expr each run
- `while (cond) {expr}`: while-loop: while cond is true, do expr each run
Keyboard shortcuts
There are several useful keyboard shortcuts for RStudio (see Help \(\rightarrow\) Keyboard Shortcuts):
- `CTRL+ENTER`: send commands from script window to command window
- ↑ or ↓ in command window: previous or next command
- `CTRL+1`, `CTRL+2`, etc.: change between the windows
Not R-specific, but very useful keyboard shortcuts:
- `CTRL+C`, `CTRL+X` and `CTRL+V`: copy, cut and paste
- `ALT+TAB`: change to another program window
- ↑ ↓ ← or →: move cursor
- `HOME` or `END`: move cursor to begin or end of line
- `Page Up` or `Page Down`: move cursor one page up or down
- `SHIFT+↑/↓/←/→/HOME/END/PgUp/PgDn`: select
Error messages
- `No such file or directory` or `Cannot change working directory`: Make sure the working directory and file names are correct.
- `Object ‘x’ not found`: The variable x has not been defined yet. Define `x` or write quotation marks if `x` should be a character string.
- `Argument ‘x’ is missing without default`: You didn’t specify the compulsory argument x.
- `+`: R is still busy with something or you forgot closing brackets. Wait, type } or ) or press ESC.
- `Unexpected ‘)’ in ")"` or `Unexpected ‘}’ in "}"`: The opposite of the previous. You try to close something which hasn’t been opened yet. Add opening brackets.
- `Unexpected ‘else’ in "else"`: Put the else of an if-statement on the same line as the last bracket of the “then”-part: `}else{`.
- `Missing value where TRUE/FALSE needed`: Something goes wrong in the condition-part (`if(x==1)`) of an if-statement. Is `x` `NA`?
- `The condition has length > 1 and only the first element will be used`: In the condition-part (`if(x==1)`) of an if-statement, a vector is compared with a scalar. Is x a vector? Did you mean `x[i]`?
- `Non-numeric argument to binary operator`: You are trying to do computations with something which is not a number. Use `class(...)` to find out what went wrong or use `as.numeric(...)` to transform the variable to a number.
- `Argument is of length zero` or `Replacement is of length zero`: The variable in question is `NULL`, which means that it is empty, for example created by `c()`. Check the definition of the variable.
Further literature
There’s an extensive list of books about R (with book descriptions) available at http://www.r-project.org/doc/bib/R-books.html. From our lab, the recommendations of books would be
- M. J. Crawley, “The R Book”, Wiley
- M. J. Crawley, “Statistics: An Introduction using R”, Wiley
- U. Ligges, “Programmieren mit R”, Springer (in German, this course is based on it)
There are also some manuals available at the R-project website http://cran.r-project.org/.