Functions
We now have used many functions that come with R. For example c
,
matrix
, read.csv
, and sum
. Functions are always used
(‘called’) by typing their name, followed by parenthesis. In most, but
not all, cases you supply ‘arguments’ within the parenthesis. If you do
not type the parenthesis the function is not called. Instead, either the
function definition, or some of type of reference to it, is shown.
Existing functions
To see the content of a function, type its name:
nrow
## function (x)
## dim(x)[1L]
## <bytecode: 0x000002d243994a78>
## <environment: namespace:base>
We see that nrow
has a single argument called x
. It calls
another function, dim
to which it provides the same argument (x
)
and returns its first element (1L
) (recall that adding L
(‘literal’) is a way to create an integer). Can you guess how ncol
is implemented? (See for yourself if you are right!). Now, let’s see
what dim
looks like.
dim
## function (x) .Primitive("dim")
It is a ‘primitive’ (low level) R function that we cannot easily learn more about. Well, you could, by looking at the source code of R — but that is way out of scope for this tutorial.
To run (instead of inspect) nrow
we add parentheses:
nrow()
## Error in nrow(): argument "x" is missing, with no default
This fails, because the function requires a valid argument, like this:
m <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
nrow(m)
## [1] 2
Note nrow(m)
and that this is equivalent to
nrow(x=m)
## [1] 2
because the first argument of nrow
is called x
.
Writing functions
R comes with thousands of functions for you to use. Nevertheless, it is often necessary to write your own functions. For example, you may want to write a function to:
more clearly describe and isolate a particular task in your data analysis workflow.
re-use code. Rather than repeating the same steps several times (e.g. for each of 200 cases you are analysing), you can write a function that gets called 200 times. This should lead to faster development of scripts and to fewer mistakes. And if there is a mistake it only needs to be fixed in one place.
create a function that is an argument to another function (!). This is quite commonly done when using ‘apply’ type functions (see next chapter).
For these reasons, writing functions is one of the most important coding
skills to learn. Writing your own functions is not difficult. The below
is a very simple function. It is called f
. This is an entirely
arbitrary name. You can also call it myFirstFunction
. It takes no
arguments, and always returns ‘hello’.
f <- function() {
return('hello')
}
Look carefully how we assign a function to name f
using the
function
keyword followed by parenthesis that enclose the arguments
(there are none in this case). The body of the function is enclosed in
braces (also known as “curly brackets” or “squiggly brackets”).
Now that we have the function, we can inspect it, and use it.
#inspect
f
## function ()
## {
## return("hello")
## }
## <environment: 0x000002d244ea4848>
#use 2 times
f()
## [1] "hello"
f()
## [1] "hello"
f
is a very boring function. It takes no arguments and always
returns the same result. Let’s make it more interesting.
f <- function(name) {
x <- paste("hello", name)
return(x)
}
f('Jasmin')
## [1] "hello Jasmin"
Note the return
statement. This indicates that variable x
(which
is only known inside of the function) is returned to the caller of the
function. Simply typing x
would also suffice, and ending the
function with paste("hello", name)
would also do! So the below is
equivalent but shorter, at the expense of being less explicit.
f <- function(name) {
paste("hello", name)
}
f("Sviatoslav")
## [1] "hello Sviatoslav"
Here is a function that returns a sequence of letters. The length is
determined by argument n
.
frs <- function(n) {
s <- sample(letters, n, replace=TRUE)
r <- paste0(s, collapse="")
return(r)
}
Because the function uses randomization, I use set.seed
to always
get the same result (as we discussed
here.
set.seed(0)
frs(5)
## [1] "nydga"
frs(5)
## [1] "bwknr"
x <- frs(10)
x
## [1] "sauujvnjgi"
Now an example of a functions that manipulates numbers. This function squares the sum of two numbers.
sumsquare <- function(a, b) {
d <- a + b
dd <- d * d
return(dd)
}
We can now use the sumsquare function. Note that it is vectorized (each argument can be more than one number)
sumsquare(1,2)
## [1] 9
x <- 1:3
y <- 5
sumsquare(x,y)
## [1] 36 49 64
You can name the arguments when using a function; that often makes your intentions clearer.
sumsquare(a=1, b=2)
## [1] 9
But the names must match
sumsquare(a=1, d=2)
## Error in sumsquare(a = 1, d = 2): unused argument (d = 2)
And both arguments need to be present
sumsquare(1:5)
## Error in sumsquare(1:5): argument "b" is missing, with no default
Unless we redefine the function with default arguments that will be used if a value for the argument is not provided.
sumsquareD <- function(a=0, b=1) {
d <- a + b
dd <- d * d
return(dd)
}
sumsquareD(1:5, 2)
## [1] 9 16 25 36 49
As both arguments have a default value, we can call sumsquareD
without providing arguments
sumsquareD()
## [1] 1
Or with a single argument
sumsquareD(5)
## [1] 36
Above the value 5
was assigned to argument a
because the
argument was matched “by position”. If we only wanted to provide a value
for b
, we need to match “by name”.
sumsquareD(b=3)
## [1] 9
Just another example, a function to compute the number of unique values in a vector:
nunique <- function(x) {
length(unique(x))
}
data <- c("a", "b", "a", "c", "b")
nunique(data)
## [1] 3
Of course, these were toy examples, but if you understand these, you
should be able to write much longer and more useful functions. It can be
difficult to “debug” (find errors in) a function. It is often best to
first write the sequence of commands that you need outside a function,
and only when it all works, wrap that code inside of a function block
(function( ) { }
).
Ellipses (…)
Ellipses ...
are a special argument to many functions. It allows to
pass optional additional arguments and/or arguments that are passed on
to other functions. Consider these two functions (this is a bit
advanced).
f1 <- function(x, y=10) {
x * y
}
# f2 calls f1
f2 <- function(x, ...) {
f1(x, ...)
}
f2(5)
## [1] 50
f2(5, y=5)
## [1] 25
Even though f2
does not have an argument y
it can be provided
and it is passed on to f1
. This call returns an error :
f2(5, z=5)
## Error in f1(x, ...): unused argument (z = 5)
because f1
does not have an argument z
.
Functions overview
A list of much used functions that we discuss in this introduction to R:
c
, cbind
, rbind
length
, dim
, nrow
, ncol
sum
, mean
, prod
, sqrt
apply
, sapply
, tapply
, aggregate
rowSums
,
rowMeans
merge
, reshape
Also see this cheatsheet