24 Iterating Loops Over Data Structures
The previous chapter introduced the loop forms. This one shows how to walk over R’s main containers (vectors, lists, matrices and data frames) using both hand-written for loops and the apply family (apply, sapply, lapply, vapply, mapply). You’ll see why the apply functions are usually preferred, when a for loop still wins, and how to choose the right helper for the shape of the data you’re processing.
24.1 A common shape: iterate, transform, collect
Almost every loop in data work has the same shape:
- step through a container,
- compute something for each element,
- collect the results into a new structure.
R offers two ways to express this shape:
- Explicit: a for loop that fills a pre-allocated output.
- Implicit: an apply function that bundles iteration and collection into one call.
We’ll see both, but in idiomatic R the apply family is the default.
24.2 Iterating over vectors
A for loop works element-by-element:
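Here is a minimal sketch that squares each element of an illustrative numeric vector `x`. The values are made up for this example. The output is pre-allocated with `numeric()` and filled by index. The loop uses `seq_along(x)` rather than `1:length(x)`, because `1:0` would produce `c(1, 0)` if `x` were empty:

```r
# Illustrative input (made up for this sketch)
x <- c(2, 4, 6, 8)

# Pre-allocate the output so the loop doesn't grow a vector on each step
squares <- numeric(length(x))

# seq_along(x) yields 1, 2, ..., length(x), and nothing at all if x is empty
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}

squares
```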
The vectorised version does the same thing in a single expression:
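Using an illustrative vector `x`, the vectorised form needs no index, no pre-allocation and no loop body, because `^` already operates on every element:

```r
x <- c(2, 4, 6, 8)   # illustrative input

# ^ is vectorised: it squares every element in a single call
squares <- x^2
squares
```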
When the operation is per-element and already vectorised, don’t loop. Use a loop only when each step depends on the previous one or has a side effect.
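The following example has a genuine dependency. It computes a running balance in which each step needs the previous result. The deposits and the 5% growth rate are hypothetical values chosen for illustration:

```r
# Illustrative inputs (hypothetical values)
deposits <- c(100, 50, 50, 0)
rate <- 0.05

balance <- numeric(length(deposits))
balance[1] <- deposits[1]

# Each step reads balance[i - 1], so the steps must run in order
for (i in seq_along(deposits)[-1]) {
  balance[i] <- balance[i - 1] * (1 + rate) + deposits[i]
}

balance
```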
24.3 Iterating over lists
Lists hold heterogeneous elements, so a per-element transform is a real iteration job. Here’s the loop form:
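The sketch below uses an illustrative list of score vectors with different lengths (the data are made up). It computes each element’s mean into a pre-allocated, named result:

```r
# Illustrative list (made up): vectors of different lengths
scores <- list(
  math  = c(80, 90, 70),
  art   = c(60, 75),
  music = c(95, 85, 90, 100)
)

# Pre-allocate a numeric result and copy the list's names onto it
avg <- numeric(length(scores))
names(avg) <- names(scores)

for (i in seq_along(scores)) {
  avg[i] <- mean(scores[[i]])   # [[ ]] extracts the element itself, not a sub-list
}

avg
```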
The same with sapply() is a single line:
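With an illustrative list of score vectors (made-up data), the whole loop collapses to one call:

```r
scores <- list(
  math  = c(80, 90, 70),
  art   = c(60, 75),
  music = c(95, 85, 90, 100)
)

avg <- sapply(scores, mean)   # named numeric vector: math, art, music
avg
```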
sapply() walks the list, applies mean() to each element, and simplifies the result to a named numeric vector. Names are preserved automatically.
24.4 The apply family at a glance
| Function | Input | Output | Use when |
|---|---|---|---|
| lapply(x, f) | vector / list | list (always) | safe default, even when outputs vary in shape |
| sapply(x, f) | vector / list | vector / matrix if all results have the same shape; otherwise list | quick interactive use |
| vapply(x, f, FUN.VALUE) | vector / list | vector / matrix matching FUN.VALUE | production code, type-safe |
| apply(m, MARGIN, f) | matrix / data frame | vector / matrix | collapse rows or columns |
| mapply(f, …) | several vectors | vector / matrix (list if results differ in shape) | iterate over parallel arguments |
| Map(f, …) | several vectors / lists | list | like mapply but always a list |
Three of these (lapply, sapply and vapply) do the same job and differ only in what they return. Pick by how predictable the output is.
24.5 lapply: always returns a list
lapply() is the safe default. It returns a list of the same length as the input, regardless of what the function returns.
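In this small sketch, `seq_len(n)` returns a vector of length `n`, so the three results have different lengths. lapply() keeps each one as a separate list element:

```r
res <- lapply(1:3, seq_len)

# res[[1]] is 1, res[[2]] is 1:2, res[[3]] is 1:3
str(res)
```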
Getting a list back means no surprises: lapply never tries to be clever about merging the outputs.
24.6 sapply: simplify when possible
sapply() calls lapply() and then tries to simplify:
- if every result is a single value → returns a vector
- if every result is the same length > 1 → returns a matrix
- otherwise → falls back to a list (just like lapply)
Convenient interactively, risky in scripts: if one element happens to return a different shape, your code’s output type silently changes.
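This sketch shows how the return type shifts with the shape of the results. The last call shows a quieter trap: an empty input gives back an empty list rather than an empty numeric vector.

```r
vec   <- sapply(1:3, function(i) i^2)        # every result length 1 -> numeric vector
mat   <- sapply(1:3, function(i) c(i, i^2))  # every result length 2 -> 2 x 3 matrix
lst   <- sapply(1:3, seq_len)                # lengths 1, 2, 3       -> list
empty <- sapply(integer(0), function(i) i^2) # empty input           -> list(), not numeric(0)

str(list(vec = vec, mat = mat, lst = lst, empty = empty))
```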
24.7 vapply: type-safe simplify
vapply() is sapply() plus a contract: you state up front what one result should look like, and R errors if any iteration disagrees. Use it when the type matters.
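The sketch below uses an illustrative list of scores (made-up data). The second call deliberately breaks the contract: range() returns two numbers per element, so vapply() raises an error. tryCatch() catches the error so the example keeps running.

```r
scores <- list(
  math  = c(80, 90, 70),
  art   = c(60, 75),
  music = c(95, 85, 90, 100)
)

# Contract: each call to mean() must return exactly one number
avg <- vapply(scores, mean, FUN.VALUE = numeric(1))
avg

# range() returns length-2 results, which violates numeric(1)
msg <- tryCatch(
  vapply(scores, range, FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
msg
```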
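The FUN.VALUE = numeric(1) template says “every result must be a length-1 numeric.” If any iteration returned, say, a character string or a length-2 vector, vapply() would stop with a clear error instead of silently shifting type. Lower types are promoted rather than rejected: an integer or logical result is accepted and stored as a double.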
24.8 apply: for matrices and data frames
apply() walks along one dimension of a matrix and collapses the other. The MARGIN argument is 1 for rows, 2 for columns.
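This is a minimal sketch on a small illustrative matrix. `matrix(1:6, nrow = 2)` fills column by column, so the rows are 1 3 5 and 2 4 6:

```r
m <- matrix(1:6, nrow = 2)   # rows: 1 3 5 and 2 4 6

row_max <- apply(m, 1, max)  # MARGIN = 1: one result per row
col_max <- apply(m, 2, max)  # MARGIN = 2: one result per column

row_max
col_max
```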
For the common cases (sums and means of rows or columns), the dedicated helpers rowSums(), colSums(), rowMeans() and colMeans() are faster and clearer. Reach for apply() when the function isn’t one of those.
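The sketch below compares the dedicated helpers with apply() on an illustrative double matrix. It then uses apply() for something the helpers don’t cover, the spread (max minus min) of each column:

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # rows: 1 3 5 and 2 4 6

rowSums(m)     # same values as apply(m, 1, sum)
colMeans(m)    # same values as apply(m, 2, mean)

# No dedicated helper for this one, so apply() is the tool
col_spread <- apply(m, 2, function(col) max(col) - min(col))
col_spread
```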
apply() also works on data frames, but it converts them to a matrix first, and a matrix holds a single type. If any column is character or a factor, every value becomes character. For data frames, prefer column-wise iteration with lapply()/sapply().
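The coercion is easy to observe with a small illustrative data frame (made-up data) that mixes a numeric column and a character column:

```r
df <- data.frame(id = c(1, 10), name = c("a", "b"))   # illustrative data

# apply() turns df into a character matrix first, so each row arrives as text
row_types <- apply(df, 1, function(row) class(row))
row_types

# Column-wise iteration keeps each column's own type
col_types <- sapply(df, class)
col_types
```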
24.9 Iterating over a data frame’s columns
Because a data frame is a list of columns, lapply() and sapply() walk over its columns by default:
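Here is a sketch on an illustrative data frame (made-up values). sapply() gives one named mean per column. lapply() keeps a list because range() returns two numbers per column:

```r
# Illustrative data frame (made up)
df <- data.frame(
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

col_means <- sapply(df, mean)   # one mean per column, named by column
col_means

lapply(df, range)               # list: min and max of each column
```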
To filter to numeric columns first:
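One way to filter, sketched here with made-up data, is to build a logical column index with vapply() and subset the frame with it before summarising:

```r
df <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Logical index: TRUE for each numeric column
is_num <- vapply(df, is.numeric, logical(1))

num_means <- sapply(df[is_num], mean)   # the name column is skipped
num_means
```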
24.10 Iterating over rows of a data frame
Row-wise iteration is unusual in base R, because most analyses are column-wise. When you do need it, there are two common patterns:
By index with a for loop:
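Here is a minimal sketch on a hypothetical orders table (made-up data). It builds one message per row by indexing each column at position `i`:

```r
# Illustrative orders table (made up)
orders <- data.frame(
  item  = c("apple", "pear"),
  price = c(2.5, 3),
  qty   = c(4, 2)
)

msgs <- character(nrow(orders))

# seq_len(nrow(...)) gives one index per row (and none for an empty frame)
for (i in seq_len(nrow(orders))) {
  total <- orders$price[i] * orders$qty[i]
  msgs[i] <- paste0(orders$item[i], " costs ", total)
}

msgs
```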
With apply(df, 1, …), keeping the matrix-coercion gotcha in mind: if any column is non-numeric, every value in each row arrives as character.
For heavier row-wise work, dplyr::rowwise() is one option. In base R, you can split the frame with split() (for example, one piece per row) and then lapply() over the pieces.
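Here is a base-R sketch of the split() route on a hypothetical orders table (made-up data). `split(orders, seq_len(nrow(orders)))` gives a list of one-row data frames, so each call inside lapply() sees a row with its column types intact:

```r
orders <- data.frame(
  item  = c("apple", "pear"),
  price = c(2.5, 3),
  qty   = c(4, 2)
)

# One list element per row, each a one-row data frame
rows <- split(orders, seq_len(nrow(orders)))

totals <- lapply(rows, function(r) r$price * r$qty)
unlist(totals)
```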
24.11 mapply and Map: parallel iteration
mapply() is sapply() with several inputs walked in parallel: element 1 of every argument, then element 2, and so on.
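A sketch with two made-up parallel vectors, named prices and qty to match the text that follows:

```r
# Illustrative parallel vectors (made up)
prices <- c(2.5, 3, 10)
qty    <- c(4, 2, 1)

# Call 1: f(2.5, 4); call 2: f(3, 2); call 3: f(10, 1)
line_totals <- mapply(function(p, q) p * q, prices, qty)
line_totals
```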
For this operation, the vectorised prices * qty is shorter; mapply() shines when the per-element function does something genuinely non-vectorisable.
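seq() is one function that needs this treatment, because its from and to arguments must each have length 1. The sketch below builds one sequence per (from, to) pair from illustrative values. The sequences have different lengths, so mapply() cannot simplify them and falls back to a list, just as sapply() would:

```r
from <- c(1, 5)
to   <- c(3, 6)

ranges <- mapply(seq, from, to)   # seq(1, 3), then seq(5, 6)
ranges
```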
Map() is mapply() without simplification: it always returns a list, the way lapply() does for a single input.
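The difference is easy to see in a side-by-side sketch with the same inputs. Even when every result has length 1, Map() still returns a list:

```r
# Map() never simplifies: three length-1 results still come back as a list
sums <- Map(`+`, 1:3, 4:6)
str(sums)

# mapply() with the same inputs simplifies to an integer vector
simplified <- mapply(`+`, 1:3, 4:6)
simplified
```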
24.12 Anonymous functions
You don’t need to name a function before passing it to an apply call. You can write it inline in either of two equivalent forms:
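Both calls in this sketch pass the same unnamed function, written once in each form. The backslash form needs R 4.1.0 or later:

```r
# Both calls pass an unnamed function that multiplies its input by 10
long_form  <- sapply(1:3, function(x) x * 10)
short_form <- sapply(1:3, \(x) x * 10)   # backslash syntax needs R >= 4.1.0

identical(long_form, short_form)
```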
The \(x) … form (available since R 4.1.0) is just sugar for function(x) … and reads cleanly inside an apply call.
24.13 Worked example: exam summary by subject
Five students, three subjects each. Compute per-subject mean and standard deviation, classify each subject as “tight” or “wide” based on the standard deviation, and label every individual score as Pass/Fail.
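A possible sketch follows. The scores are made up, and the two cut-offs are assumptions not given in the text: a standard deviation below 10 counts as “tight”, and a pass mark is 60 or above.

```r
# Illustrative scores (made up): five students (rows), three subjects (columns)
exam <- data.frame(
  math    = c(72, 85, 80, 74, 79),
  physics = c(55, 91, 48, 77, 60),
  english = c(80, 82, 78, 85, 81)
)

# 1. sapply() collapses each column to one number
subject_mean <- sapply(exam, mean)
subject_sd   <- sapply(exam, sd)

# 2. ifelse() over the named sd vector; the cut-off of 10 is an assumption
spread <- ifelse(subject_sd < 10, "tight", "wide")

# 3. lapply() maps each column to a label vector of the same length;
#    the pass mark of 60 is also an assumption
pass_fail <- lapply(exam, function(col) ifelse(col >= 60, "Pass", "Fail"))

data.frame(mean = subject_mean, sd = round(subject_sd, 2), spread = spread)
as.data.frame(pass_fail)
```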
Three different iteration patterns in one example:
- sapply() to collapse each column to a single number,
- ifelse() over the resulting vector to derive a label,
- lapply() to apply a per-column transform that returns a vector of the same length.
Between them, these patterns cover much of everyday descriptive-analytics work.
Summary
| Concept | Description |
|---|---|
| Iteration Pattern | |
| Iterate-Transform-Collect Pattern | The most common shape: walk a structure, transform, collect results |
| Iterating Over Vectors | for with seq_along(); prefer a vectorised expression when one exists |
| Iterating Over Lists | Lists support iteration over named or positional elements |
| Apply Family and Tabular Data | |
| lapply() | Always returns a list; the safe and predictable choice |
| sapply() | Simplifies output to vector or matrix when possible |
| vapply() | Type-safe simplify; you state the expected output shape |
| apply() for Matrices | Apply a function over rows (1) or columns (2) of a matrix |
| Iterating Over Data Frame Columns and Rows | lapply over df is column-wise; apply by row is rarely the right choice |