Functions

We have now used many functions that come with R, for example c, matrix, read.csv, and sum. Functions are always used (‘called’) by typing their name, followed by parentheses. In most, but not all, cases you supply ‘arguments’ within the parentheses. If you do not type the parentheses, the function is not called. Instead, either the function definition, or some type of reference to it, is shown.

Existing functions

To see the content of a function, type its name:

nrow
## function (x)
## dim(x)[1L]
## <bytecode: 0x000002d243994a78>
## <environment: namespace:base>

We see that nrow has a single argument called x. It calls another function, dim, to which it passes the same argument (x), and it returns the first element of the result, selected with the index 1L (recall that adding L (‘literal’) is a way to create an integer). Can you guess how ncol is implemented? (See for yourself if you are right!) Now, let’s see what dim looks like.

dim
## function (x)  .Primitive("dim")

It is a ‘primitive’ (low level) R function that we cannot easily learn more about. Well, you could, by looking at the source code of R, but that is well beyond the scope of this tutorial.

To run (instead of inspect) nrow we add parentheses:

nrow()
## Error in nrow(): argument "x" is missing, with no default

This fails, because the function requires a valid argument, like this:

m <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
nrow(m)
## [1] 2

Note that nrow(m) is equivalent to

nrow(x=m)
## [1] 2

because the first argument of nrow is called x.

Writing functions

R comes with thousands of functions for you to use. Nevertheless, it is often necessary to write your own functions. For example, you may want to write a function to:

  • more clearly describe and isolate a particular task in your data analysis workflow.

  • re-use code. Rather than repeating the same steps several times (e.g. for each of 200 cases you are analysing), you can write a function that gets called 200 times. This should lead to faster development of scripts and to fewer mistakes. And if there is a mistake it only needs to be fixed in one place.

  • create a function that is an argument to another function (!). This is quite commonly done when using ‘apply’ type functions (see next chapter).
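The last point can be sketched with functions we already know. This is a minimal example that assumes only base R: sapply (covered in the next chapter) takes a function as its second argument and calls it once for each element of a list. Note that length and sum are passed without parentheses, because we want to hand over the function itself rather than call it.

```r
# a list with two vectors of different length
v <- list(1:3, 4:10)
# the function 'length' is an argument to sapply
sapply(v, length)
## [1] 3 7
# the same with 'sum'
sapply(v, sum)
## [1]  6 49
```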

For these reasons, writing functions is one of the most important coding skills to learn. Writing your own functions is not difficult. Below is a very simple function. It is called f. This is an entirely arbitrary name; you could also call it myFirstFunction. It takes no arguments, and always returns ‘hello’.

f <- function() {
    return('hello')
}

Look carefully at how we assign a function to the name f using the function keyword followed by parentheses that enclose the arguments (there are none in this case). The body of the function is enclosed in braces (also known as “curly brackets” or “squiggly brackets”).

Now that we have the function, we can inspect it, and use it.

#inspect
f
## function ()
## {
##     return("hello")
## }
## <environment: 0x000002d244ea4848>
#use 2 times
f()
## [1] "hello"
f()
## [1] "hello"

f is a very boring function. It takes no arguments and always returns the same result. Let’s make it more interesting.

f <- function(name) {
    x <- paste("hello", name)
    return(x)
}
f('Jasmin')
## [1] "hello Jasmin"

Note the return statement. This indicates that variable x (which is only known inside of the function) is returned to the caller of the function. Simply typing x would also suffice, and ending the function with paste("hello", name) would also do! So the below is equivalent but shorter, at the expense of being less explicit.

f <- function(name) {
    paste("hello", name)
}
f("Sviatoslav")
## [1] "hello Sviatoslav"
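To illustrate that a variable created inside a function is not visible outside of it, here is a small sketch. The names g and greeting_inside are made up for this example, and it assumes that no object called greeting_inside already exists in your workspace.

```r
g <- function(name) {
    # greeting_inside only exists while g is running
    greeting_inside <- paste("hello", name)
    return(greeting_inside)
}
g("Ana")
## [1] "hello Ana"
# after the call, the variable is not known outside of the function
exists("greeting_inside")
## [1] FALSE
```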

Here is a function that returns a sequence of letters. The length is determined by argument n.

frs <- function(n) {
    s <- sample(letters, n, replace=TRUE)
    r <- paste0(s, collapse="")
    return(r)
}

Because the function uses randomization, I use set.seed to always get the same result (as we discussed earlier).

set.seed(0)
frs(5)
## [1] "nydga"
frs(5)
## [1] "bwknr"
x <- frs(10)
x
## [1] "sauujvnjgi"
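Here is a short check of what set.seed does for us, written as a sketch that repeats the definition of frs so that it can be run on its own. If we set the same seed before each call, the function returns the same string. The names a and b are only used for this illustration.

```r
frs <- function(n) {
    s <- sample(letters, n, replace=TRUE)
    r <- paste0(s, collapse="")
    return(r)
}
# same seed, same sequence of random letters
set.seed(0)
a <- frs(5)
set.seed(0)
b <- frs(5)
identical(a, b)
## [1] TRUE
```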

Now an example of a function that manipulates numbers. This function squares the sum of two numbers.

sumsquare <- function(a, b) {
    d <- a + b
    dd <- d * d
    return(dd)
}

We can now use the sumsquare function. Note that it is vectorized (each argument can be more than one number).

sumsquare(1,2)
## [1] 9
x <- 1:3
y <- 5
sumsquare(x,y)
## [1] 36 49 64
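To make the vectorization a bit more concrete, here is a sketch that repeats the definition of sumsquare so that it runs on its own. It shows that two vectors of the same length are combined element by element, and that a single number behaves as if it were repeated for each element.

```r
sumsquare <- function(a, b) {
    d <- a + b
    dd <- d * d
    return(dd)
}
# element by element: (1+4)^2, (2+5)^2, (3+6)^2
sumsquare(1:3, 4:6)
## [1] 25 49 81
# a single number is recycled, so this gives the same result as sumsquare(1:3, 5)
sumsquare(1:3, c(5, 5, 5))
## [1] 36 49 64
```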

You can name the arguments when using a function; that often makes your intentions clearer.

sumsquare(a=1, b=2)
## [1] 9

But the names must match:

sumsquare(a=1, d=2)
## Error in sumsquare(a = 1, d = 2): unused argument (d = 2)

And both arguments need to be present:

sumsquare(1:5)
## Error in sumsquare(1:5): argument "b" is missing, with no default

We can avoid this error by defining the function with default arguments, which are used if a value for the argument is not provided.

sumsquareD <- function(a=0, b=1) {
    d <- a + b
    dd <- d * d
    return(dd)
}
sumsquareD(1:5, 2)
## [1]  9 16 25 36 49

As both arguments have a default value, we can call sumsquareD without providing arguments:

sumsquareD()
## [1] 1

Or with a single argument:

sumsquareD(5)
## [1] 36

Above, the value 5 was assigned to argument a because the argument was matched “by position”. If we only want to provide a value for b, we need to match it “by name”.

sumsquareD(b=3)
## [1] 9
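Because sumsquareD adds its two arguments, swapping them does not change the result. The difference between matching by position and by name is easier to see with a function where the order matters. The function raise below is a made-up example, not something that comes with R.

```r
# hypothetical example: raise 'base' to the power 'exponent'
raise <- function(base, exponent=2) {
    base^exponent
}
# by position: 3 is matched to base, exponent uses its default
raise(3)
## [1] 9
# by position: base=2, exponent=3
raise(2, 3)
## [1] 8
# by name: the order in which we write the arguments no longer matters
raise(exponent=3, base=2)
## [1] 8
```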

One more example: a function that computes the number of unique values in a vector.

nunique <- function(x) {
    length(unique(x))
}
data <- c("a", "b", "a", "c", "b")
nunique(data)
## [1] 3

Of course, these were toy examples, but if you understand these, you should be able to write much longer and more useful functions. It can be difficult to “debug” (find errors in) a function. It is often best to first write the sequence of commands that you need outside a function, and only when it all works, wrap that code inside of a function block (function( ) { }).
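This workflow could look as follows. It is only a sketch; the task (the difference between the largest and the smallest value) and the names v, lo, hi and valuerange are made up for this illustration.

```r
# Step 1: write and test the commands outside of a function
v <- c(3, 8, 1, 10)
lo <- min(v)
hi <- max(v)
hi - lo
## [1] 9

# Step 2: once this works, wrap the same commands in a function
valuerange <- function(v) {
    lo <- min(v)
    hi <- max(v)
    hi - lo
}
valuerange(c(3, 8, 1, 10))
## [1] 9
```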

Ellipses (…)

The ellipsis (...) is a special argument of many functions. It allows you to pass optional additional arguments, and/or arguments that are passed on to other functions. Consider these two functions (this is a bit advanced).

f1 <- function(x, y=10) {
    x * y
}
# f2 calls f1
f2 <- function(x, ...) {
    f1(x, ...)
}
f2(5)
## [1] 50
f2(5, y=5)
## [1] 25

Even though f2 does not have an argument y, it can be provided, and it is passed on to f1. However, this call returns an error:

f2(5, z=5)
## Error in f1(x, ...): unused argument (z = 5)

because f1 does not have an argument z.
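The same mechanism works when the function that is called is one that comes with R. In this sketch, mymean is a made-up wrapper around mean. It does not have an argument na.rm itself, but passes it on through the ellipsis.

```r
# mymean calls mean and passes on any additional arguments
mymean <- function(x, ...) {
    mean(x, ...)
}
# without extra arguments, a missing value gives NA
mymean(c(1, NA, 3))
## [1] NA
# na.rm is passed on to mean, which then ignores the NA
mymean(c(1, NA, 3), na.rm=TRUE)
## [1] 2
```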

Functions overview

A list of frequently used functions that we discuss in this introduction to R:

c, cbind, rbind
length, dim, nrow, ncol

sum, mean, prod, sqrt

apply, sapply, tapply, aggregate
rowSums, rowMeans

merge, reshape

Also see this cheatsheet