3  Variables and Indexing

The first advantage of R over many simple calculators is the ability to save results in variables. Variables are like little boxes in which a slip of paper can be put. The paper might have a word, a sentence, an entire text, a number, or anything else on it. Similarly, a variable can hold a string, which is just text, or a numerical value, or any other objects. Each variable must have a name, which is like a little label on the box that allows us to reuse the variable later on by referring to it. This also implies that each box must have a unique label - each variable must have a unique name.

Instead of always creating the matrix of the previous section from scratch, we could have also saved it to a variable called A. Object can be saved to variables by using the assignment operator <-. While most programming languages use the = as an assignment operator, there is a slight difference between = and <- in R. So, to be on the safe side, always use <- for assignment to variables.

A <- matrix(c(1, 2, 3, 4), nrow = 2)
A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
B <- matrix(c(5, 6, 7, 8, 9, 10), nrow = 2)
B
     [,1] [,2] [,3]
[1,]    5    7    9
[2,]    6    8   10

As the two examples above show, the content of variables can be printed to the console by simply calling the variable. However, variables can also be used in calculations by referring to their names. For example, the product of the matrix A and B, which we then save as matrix C, can be obtained in the following way

C <- A %*% B
C
     [,1] [,2] [,3]
[1,]   23   31   39
[2,]   34   46   58

Similarly, any other operations introduced in the previous section can be applied to variables, as long as the variable is holding an appropriate object. So for mathematical operators, the variables must correspond to correct matrices and vectors or scalars.

We can double check the above matrix multiplication by manually calculating the first entry of C. From Linear Algebra you should know that

\[ C_{1,1} = A_{1,1}B_{1,1} + A_{1,2}B_{2,1}. \]

Two obtain the elements in the matrix, we can index into the matrix. Indexing is performed using square brackets. Since matrices are two dimensional arrays, indexing into matrices requires two numbers. The first number corresponds to the row, and the second to the column. Thus, the calculation above can be performed in the following way:

C11_manual <- A[1, 1] * B[1, 1] + A[1, 2] * B[2, 1]
C11_manual
[1] 23

Although we said above that indexing into matrices requries two numbers, this is not technically correct if one is not interested in a single element of the matrix. If we are interested in an entire row or column of a matrix, then the column or row element within the square brackets can be left empty.

From linear algebra you should also know that

\[ C_{1,1} = A_{1,\cdot}B_{\cdot,2}. \] Thus, we can double check the result in the following way

C11_vector_product <- A[1, ] %*% B[, 1]
C11_vector_product
     [,1]
[1,]   23

Indexing for vectors works similar to indexing matrices with the only difference being that vectors only have a single dimension. To demonstrate this, we first create a vector containing the integers from 1 to 10, which is created using 1:10.

x <- 1:10
x
 [1]  1  2  3  4  5  6  7  8  9 10

The third element of this vector is then equal to 3.

x[3]
[1] 3

We can also return a subset of the vector by specifying multiple indices. If the indices of interest are consecutive, we can use : to specify the indices. For example, if we are interested in the first five elements of the vector, we can use 1:5 within square brackets.

x[1:5]
[1] 1 2 3 4 5

R comes with some handy pre-loaded vectors, including the letters vector, which contains all letters from a to z.

letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

To obtain the first, third, fifth, and seventh letter in the alphabet, we can use c(1, 3, 5, 7) within square brackets after letters to index into the first, third, fifth, and seventh element in the vector.

letters[c(1, 3, 5, 7)]
[1] "a" "c" "e" "g"

If you carefully followed the above, you should have noticed that we used c(1, 3, 5, 7) to index into the letters vector. Thus, we technically used a vector to index into another vector. This indeed works, as long as the vector used for indexing contains only integers of Booleans. Additionally, the vector used for indexing can only contain elements between 1 and the length of the vector that is being indexed. Since thereSince there are more than ten letters in the alphabet and since x contains all integers from one to ten, we can use x to index into the letters vector.

letters[x]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

As pointed out above, we can also use a vector of Booleans to index into a vector. A Boolean is either zero (FALSE) or one (TRUE). Although the vector of Booleans must not technically be of the same length as the vector that is being indexed, we highly recommend to always choose vectors of same length since the results otherwise rely on vector recycling which, as we discussed in the point-wise multiplication case, does not always return what you might expect.

When using a Boolan vector for indexing, a TRUE within the Boolean vector implies, that the corresponding element in the vector that is being indexed, is being returned, while a FALSE implies, that the corresponding entry will not be returned.

x[c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)]
[1]  1  2  6  7 10

Boolean indexing combined with conditional statements can often be useful. While we introduce conditionals only later, we will shortly touch on them here to show the power of Boolean indexing.

The goal below is to obtain a vector of all even numbers between 1 and 100. To do so, we are using the modulus, %%, operator. The modulus returns the remainder after division. Since all even numbers can be divided by two without a remainder, the modulus of an even number with two will always be zero. On the other hand, an odd number is always one more than an even number and thus, the modulus between an odd number of two will always be one.

10 %% 2
[1] 0
11 %% 2
[1] 1

We can combine this with the conditional == which returns TRUE if the left-hand-side is the same as the right-hand-side, while it returns FALSE otherwise.

10 %% 2 == 0
[1] TRUE
11 %% 2 == 0
[1] FALSE

Like any operator in R, the modulus and the conditional can be applied point-wise to a vector of numbers. Thus, after creating the vector x which contains all numbers from 1 to 100, we can take the modulus of all numbers in x with two and check whether it is zero. We save this result in the new variable x_even_selector. Note, that we tried to give the new variable a descriptive name. In general, this is good practice, and bad variable names usually lead to mistakes later on.

x <- 1:100
x_even_selector <- x %% 2 == 0
x_even_selector
  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [13] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [25] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [37] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [49] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [61] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [73] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [85] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [97] FALSE  TRUE FALSE  TRUE

The x_even_selector vector is now a Boolean vector, having a TRUE at entry i if and only if the ith entry of x is even. Thus, we can use the x_even_selector vector to index into x and obtain all even numbers between 1 and 100.

x[x_even_selector]
 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
[20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
[39]  78  80  82  84  86  88  90  92  94  96  98 100

Another way to obtain all even numbers between 1 and 100 would have been to use the seq function, starting at 2 and incrementing by 2 until 100 is reached.

seq(2, 100, 2)
 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
[20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
[39]  78  80  82  84  86  88  90  92  94  96  98 100