flowchart LR
B["Build <br> array(data, dim, dimnames)"] --> A["An Array"]
A --> I["Index <br> x[i, j, k] / x[i, , ] / negative / logical"]
A --> AX["Collapse <br> apply(x, MARGIN, FUN)"]
A --> R["Reshape <br> dim()<- / aperm()"]
classDef default fill:#2e4057,color:#ffffff,stroke:#ff9933,stroke-width:3px,rx:10px,ry:10px;
15 Arrays in R
This chapter introduces arrays, R’s general n-dimensional container for data that shares a single atomic type. A matrix is a 2-D array; an array can have any number of dimensions you like. You will learn how to build one with array(), how to name each axis with dimnames, how to slice it with x[i, j, k, ...], and how to collapse any axis down to a summary with apply(). You will see a three-way contingency-table use case that is impossible to express cleanly as a matrix, and you will learn the two safe ways to iterate over slices. By the end of this chapter you will know when to reach for an array over a matrix or a data frame and how to write code that generalises across any number of dimensions.
15.1 From Matrix to Array
Every cell of an array shares the same atomic type (for example double, integer, logical, or character). The array’s shape is described by its dim attribute, an integer vector with one entry per axis: it has length 2 for a matrix and length 3, 4, or more for a higher-dimensional array.
| Dimensions | Shape Name | Built With |
|---|---|---|
| 1 | Vector | c() or numeric(n) |
| 2 | Matrix | matrix() or a 2-length dim |
| 3 | Cube / stack | array() with a 3-length dim |
| 4+ | Higher-dimensional array | array() with a longer dim |
A 2-D matrix has two axes: rows and columns. A 3-D array adds a third, typically “layer” or “time”. Higher-dimensional arrays add more axes still. Any operation you write should be expressed in terms of “which axes am I collapsing” and “which axes am I keeping”, rather than pictorial descriptions that only make sense in 2-D.
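The relationship between vectors, matrices, and arrays can be checked directly by inspecting the dim attribute. The objects v, m, and a below are illustrative names, not part of the chapter's data.

```r
v <- 1:6                             # plain vector: no dim attribute
m <- matrix(1:6, nrow = 2)           # 2 axes
a <- array(1:24, dim = c(2, 3, 4))   # 3 axes

dim(v)          # NULL
dim(m)          # 2 3
dim(a)          # 2 3 4

is.array(m)     # TRUE: a matrix is an array with two axes
length(dim(a))  # 3, the number of axes
typeof(a)       # "integer": one atomic type for every cell
```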
15.2 Building an Array
array() Takes Data, a Dimension Vector, and Optional Names
dimnames is a list with one element per axis, each either NULL or a character vector of the axis length. Named axes make the code self-documenting.
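A minimal sketch of such a call, assuming made-up sales figures (the values 1 to 24 stand in for real data, and the name sales and the axis labels are hypothetical):

```r
# 4 quarters x 3 regions x 2 years = 24 cells, filled in column-major order
sales <- array(
  data = 1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

dim(sales)                     # 4 3 2
names(dimnames(sales))         # "quarter" "region" "year"
sales["Q2", "South", "2024"]   # 18: a single cell addressed by name
```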
This is a classic three-way data cube: quarter × region × year.
If prod(dim) does not match the length of data, R silently recycles the data to fill the array when there are too few values and silently discards the extras when there are too many. Always check that the product of the dimensions equals the length of the data you pass in.
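The mismatch is easy to reproduce, and a small wrapper can turn it into an error. The helper make_array() below is a hypothetical name introduced for illustration; it is not part of base R.

```r
short <- array(1:5, dim = c(2, 3))    # 5 values for 6 cells
short[2, 3]                           # 1: the data wrapped around to fill the last cell

long <- array(1:10, dim = c(2, 3))    # 10 values for 6 cells
length(long)                          # 6: only the first six values were used

# Hypothetical guard: refuse to build the array unless the sizes agree
make_array <- function(data, dim, dimnames = NULL) {
  stopifnot(length(data) == prod(dim))
  array(data, dim = dim, dimnames = dimnames)
}

ok <- make_array(1:6, c(2, 3))        # sizes agree, so this succeeds
```

Calling make_array(1:5, c(2, 3)) stops with an error instead of recycling.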
15.3 Indexing an Array
Indexing an array uses the same [ ] notation as a matrix, but with one index per dimension, separated by commas. Leaving an axis blank means “all of it”.
Negative indices exclude positions from the corresponding axis; logical vectors filter along an axis.
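The indexing forms above, sketched on a hypothetical quarter × region × year cube (the data and labels are invented for illustration):

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

sales[2, 3, 1]        # one cell by position: Q2, West, 2023
sales[, , "2024"]     # blank axes mean "all": a 4 x 3 matrix for 2024
sales["Q1", , ]       # Q1 across every region and year: 3 x 2
sales[-1, , ]         # negative index: everything except Q1, 3 x 3 x 2

keep <- c(TRUE, FALSE, TRUE)
sales[, keep, ]       # logical filter on the region axis: 4 x 2 x 2
```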
drop = FALSE to Keep the Shape
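Like matrices, arrays drop singleton axes by default, so selecting a single layer of a 3-D array returns a matrix and selecting a single row of that returns a plain vector. Use drop = FALSE to keep every axis, including those of length 1.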
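A short comparison on a hypothetical sales cube shows the difference in the resulting dim:

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

layer <- sales[, , "2024"]                 # year axis is dropped
dim(layer)                                 # 4 3

kept <- sales[, , "2024", drop = FALSE]    # year axis survives with length 1
dim(kept)                                  # 4 3 1

cell_series <- sales["Q1", "North", ]      # two axes dropped: a named vector
dim(cell_series)                           # NULL
```

Keeping the axis matters most inside functions and loops, where later code may assume the object still has three dimensions.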
15.4 Modifying an Array
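Assignment uses the same bracket notation on the left-hand side of <-. A minimal sketch, assuming a hypothetical sales cube with invented values:

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

sales["Q1", "North", "2023"] <- 100L   # overwrite a single cell
sales[, "West", "2024"] <- 0L          # overwrite a whole slice; the scalar is recycled
sales[sales > 50L] <- NA               # logical mask over every cell at once

dim(sales)                             # still 4 3 2: assignment never changes the shape
```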
15.5 Collapsing Dimensions with apply()
apply(x, MARGIN, FUN) calls FUN once for each index along the chosen margin(s), passing it all the values from the remaining axes; when FUN returns a single value, those remaining axes are collapsed away. MARGIN = 1 keeps axis 1 (rows in a matrix), MARGIN = 2 keeps axis 2, and so on. You can pass a vector of margins to keep more than one.
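The effect of MARGIN is easiest to see from the shape of the result. The sketch below uses a hypothetical quarter × region × year cube filled with the integers 1 to 24:

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

apply(sales, 1, sum)                  # keep quarter: totals 66 72 78 84
apply(sales, 2, mean)                 # keep region: means 8.5 12.5 16.5
apply(sales, 3, sd)                   # keep year: spread within each year

by_qr <- apply(sales, c(1, 2), sum)   # keep quarter and region, sum over years
dim(by_qr)                            # 4 3
```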
apply() Is How Arrays Earn Their Keep
A 3-way array is just numbers until you start collapsing it. Almost every useful calculation on an array is an apply() call with the right MARGIN argument: per-quarter totals, per-region means, per-year variability. Writing those as apply(a, 1, ...), apply(a, c(1, 2), ...), etc. is the array idiom.
15.6 Reshaping and Permuting Axes
Assigning to dim(x) replaces the dimension vector in place; the total number of cells must stay the same, and the values keep their existing column-major order. aperm() permutes the order of the axes and rearranges the values to match, for example, swapping rows and columns in a matrix or moving the year axis to the front of a 3-way array.
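A small sketch of both operations on an unnamed 4 × 3 × 2 array (x, y, and z are illustrative names):

```r
x <- array(1:24, dim = c(4, 3, 2))

# Reshape: same 24 values, reinterpreted as a 6 x 4 matrix
y <- x
dim(y) <- c(6, 4)
identical(as.vector(y), as.vector(x))   # TRUE: the stored order is untouched

# Permute: move axis 3 to the front, so z[k, i, j] equals x[i, j, k]
z <- aperm(x, c(3, 1, 2))
dim(z)                                  # 2 4 3
z[2, 1, 3] == x[1, 3, 2]                # TRUE
```

Assigning a dim whose product is not 24, such as c(5, 5), raises an error rather than recycling.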
15.7 A 3-Way Contingency Table
Contingency tables count how often each combination of categorical variables occurs. A 2-way table (like “treatment × outcome”) is a matrix; a 3-way table (like “treatment × outcome × site”) is a 3-D array. R’s table() function returns one whenever you cross-tabulate three factors.
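A sketch of a 3-way table built from simulated observations. The data frame patients and its columns gender, age_band, and outcome are invented for illustration, and the values are random draws rather than real results.

```r
set.seed(1)
n <- 200

# Simulated observations: one row per person, three categorical columns
patients <- data.frame(
  gender   = sample(c("F", "M"), n, replace = TRUE),
  age_band = sample(c("18-39", "40-64", "65+"), n, replace = TRUE),
  outcome  = sample(c("improved", "unchanged"), n, replace = TRUE)
)

tab <- table(patients)        # cross-tabulates every column of the data frame
length(dim(tab))              # 3 axes: gender, age_band, outcome
sum(tab)                      # 200: every observation lands in exactly one cell

tab["F", , ]                  # age_band x outcome counts for one gender
margin.table(tab, c(1, 3))    # collapse age_band: gender x outcome counts
```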
Most analysts today store the same information in a long data frame (one row per combination, with columns gender, age_band, outcome, and a count) because the tidyverse toolchain is built around that shape. Use an array when you genuinely need fast n-D numeric work or when a function expects or returns one: mantelhaen.test(), for example, takes a 3-D table, and image data is commonly held as a height × width × channel array.
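Moving between the two shapes is a one-liner in each direction. The sketch below assumes a small hypothetical table with counts 1 to 12; as.data.frame() and xtabs() are base R.

```r
# Hypothetical 2 x 3 x 2 table of counts
tab <- as.table(array(
  1:12,
  dim = c(2, 3, 2),
  dimnames = list(
    gender   = c("F", "M"),
    age_band = c("18-39", "40-64", "65+"),
    outcome  = c("improved", "unchanged")
  )
))

# Array to long data frame: one row per cell, counts in a column named "count"
long <- as.data.frame(tab, responseName = "count")
nrow(long)     # 12
names(long)    # "gender" "age_band" "outcome" "count"

# Long data frame back to a 3-way table
back <- xtabs(count ~ gender + age_band + outcome, data = long)
all(back == tab)   # TRUE
```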
15.8 Iterating Over Array Slices
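Two patterns for visiting each slice, sketched on a hypothetical sales cube. Treating these as the two safe approaches is an assumption: the first loops over seq_len() of the axis length and slices with drop = FALSE, the second lets apply() do the slicing.

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

# Pattern 1: explicit loop. seq_len() yields an empty sequence for a
# zero-length axis, and drop = FALSE keeps each slice three-dimensional.
n_years <- dim(sales)[3]
totals_loop <- numeric(n_years)
for (k in seq_len(n_years)) {
  layer <- sales[, , k, drop = FALSE]
  totals_loop[k] <- sum(layer)
}

# Pattern 2: apply() hands each year's slice to the function in turn
totals_apply <- apply(sales, 3, sum)

totals_loop     # 78 222
totals_apply    # same totals, labelled "2023" and "2024"
```

Writing 1:dim(sales)[3] instead of seq_len() would iterate over c(1, 0) if the axis were empty, which is why the loop avoids it.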
15.9 A Worked Example: Year-over-Year Growth
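One way to compute year-over-year growth is to divide every year after the first by the year before it, slice for slice. The sketch below assumes a quarter × region × year array named revenue; the figures are invented for illustration.

```r
revenue <- array(
  c(100, 120, 110, 130,    # North, 2023
     80,  90,  85,  95,    # South, 2023
    110, 126, 121, 156,    # North, 2024
     88,  90, 102,  95),   # South, 2024
  dim = c(4, 2, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South"),
    year    = c("2023", "2024")
  )
)

n_years <- dim(revenue)[3]

# Later years divided by earlier years, cell by cell; drop = FALSE keeps
# the year axis even though only one growth period exists here.
growth <- revenue[, , -1, drop = FALSE] /
  revenue[, , -n_years, drop = FALSE] - 1

dim(growth)                       # 4 2 1
avg_growth <- apply(growth, c(2, 3), mean)   # mean growth per region and year
avg_growth
```

The same code works unchanged for any number of years, because the two negative indices always align each year with its predecessor.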
Summary
| Concept | Description |
|---|---|
| Definition and Building | |
| Array Definition | An array generalises a matrix to any number of dimensions |
| From Matrix to Array | A matrix is a 2-D array; an array adds more dimensions of the same type |
| array() Constructor | array(data, dim = c(...), dimnames = ...) builds an array |
| Naming the Axes | Provide dimnames so you can refer to slices by name |
| Indexing | |
| Indexing per Axis | Use one comma-separated index per axis: array[i, j, k] |
| Negative Indexing | Drop slices using negative indices on any axis |
| Logical Indexing | Boolean masks work on each axis independently |
| Common Mistakes | Mismatched data length and dim, or forgetting drop = FALSE |