15 Arrays in R

What This Chapter Covers

This chapter introduces arrays, R’s general n-dimensional container for data that shares a single atomic type. A matrix is a 2-D array; an array can have any number of dimensions you like. You will learn how to build one with array(), how to name each axis with dimnames, how to slice it with x[i, j, k, ...], and how to collapse any axis down to a summary with apply(). You will see a three-way contingency-table use case that is impossible to express cleanly as a matrix, and you will learn the two safe ways to iterate over slices. By the end of this chapter you will know when to reach for an array over a matrix or a data frame and how to write code that generalises across any number of dimensions.

```mermaid
flowchart LR
    B["Build <br> array(data, dim, dimnames)"] --> A["An Array"]
    A --> I["Index <br> x[i, j, k] / x[i, , ] / negative / logical"]
    A --> AX["Collapse <br> apply(x, MARGIN, FUN)"]
    A --> R["Reshape <br> dim()<- / aperm()"]
    classDef default fill:#2e4057,color:#ffffff,stroke:#ff9933,stroke-width:3px,rx:10px,ry:10px;
```

15.1 From Matrix to Array

Core Concept: Any Number of Dimensions, One Type

Every cell of an array shares the same atomic type (such as double, integer, logical, or character). The array’s shape is stored in its dim attribute, an integer vector with one entry per axis: length 2 for a matrix, and length 3, 4, or more for a higher-dimensional array.

| Dimensions | Shape Name | Built With |
|---|---|---|
| 1 | Vector | `c()` or `numeric(n)` |
| 2 | Matrix | `matrix()` or a length-2 `dim` |
| 3 | Cube / stack | `array()` with a length-3 `dim` |
| 4+ | Higher-dimensional array | `array()` with a longer `dim` |

Expert Insight: Think in Axes

A 2-D matrix has two axes: rows and columns. A 3-D array adds a third axis, typically a “layer” or “time” axis, and higher-dimensional arrays add more axes still. Express every operation in terms of which axes you are collapsing and which axes you are keeping, rather than in row-and-column pictures that only make sense in 2-D.
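To make the dim attribute concrete, here is a minimal sketch (the object names m and a are arbitrary). A matrix and a 3-D array differ only in how many entries their dim vector has.

```r
# A matrix is a vector with a length-2 dim attribute
m <- matrix(1:6, nrow = 2)
dim(m)          # 2 3
length(dim(m))  # 2 axes: rows, columns

# Adding a third axis ("layer") gives a length-3 dim attribute
a <- array(1:24, dim = c(2, 3, 4))
dim(a)          # 2 3 4
length(dim(a))  # 3 axes

# Both are arrays; only the 2-D one is also a matrix
is.array(m); is.matrix(m)   # TRUE TRUE
is.array(a); is.matrix(a)   # TRUE FALSE
typeof(a)                   # "integer": one atomic type for every cell
```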

15.2 Building an Array

array() Takes Data, a Dimension Vector, and Optional Names
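A minimal sketch of array(data, dim): the 24 values below are placeholders. The point to notice is the fill order. R fills in column-major order, so the first axis varies fastest, then the second, then the third.

```r
# 24 values into 2 rows x 3 columns x 4 layers
a <- array(1:24, dim = c(2, 3, 4))

a[, , 1]    # layer 1 holds 1..6, filled down the columns
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

a[1, 1, 2]  # the first cell of layer 2 is 7
a[2, 3, 4]  # the last cell of the array is 24
```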


Naming the Axes

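dimnames is a list with one element per axis; each element is either NULL or a character vector whose length equals the length of that axis. Naming the list elements as well, as in list(quarter = ..., region = ...), names the axes themselves. Named axes make the code self-documenting.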
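A sketch of a named three-way cube. The values 1:24 are placeholders rather than real sales data, and the region labels are hypothetical; naming the list elements (quarter, region, year) is what gives each axis its own name.

```r
# Placeholder sales figures: 4 quarters x 3 regions x 2 years = 24 cells
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

names(dimnames(sales))          # "quarter" "region" "year"
sales["Q2", "South", "2024"]    # a single cell addressed by name: 18
```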


This is a classic three-way data cube: quarter × region × year.

Common Mistake: Mismatched Data Length

If prod(dim) does not match length(data), array() silently recycles shorter data (or truncates longer data) to fill the array, with no warning or error. Always check that the product of the dimensions equals the length of the data you pass in.
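The sketch below shows the silent recycling, then wraps array() in a small guard. The helper name safe_array() is hypothetical, not part of base R; it simply refuses to build an array when the lengths disagree.

```r
# 10 values cannot fill 2 x 3 x 2 = 12 cells; array() recycles without complaint
bad <- array(1:10, dim = c(2, 3, 2))
bad[, , 2]   # the last two cells quietly repeat 1 and 2

# Hypothetical helper: stop instead of recycling or truncating
safe_array <- function(data, dim, dimnames = NULL) {
  if (length(data) != prod(dim)) {
    stop("length(data) = ", length(data),
         " but prod(dim) = ", prod(dim))
  }
  array(data, dim = dim, dimnames = dimnames)
}

ok <- safe_array(1:12, dim = c(2, 3, 2))   # lengths agree: builds normally
try(safe_array(1:10, dim = c(2, 3, 2)))    # reports the mismatch as an error
```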


15.3 Indexing an Array

The Bracket Pattern: One Index Per Axis

Indexing an array uses the same [ ] notation as a matrix, but with one index per dimension, separated by commas. Leaving an axis blank means “all of it”.
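A few indexing patterns on a placeholder 2 x 3 x 4 array. Each blank position keeps that whole axis, and the comments give the shape of each result.

```r
a <- array(1:24, dim = c(2, 3, 4))

a[2, 3, 4]      # a single cell: 24
a[1, , ]        # row 1 of every layer: a 3 x 4 matrix
a[, 2, ]        # column 2 of every layer: a 2 x 4 matrix
a[, , 3]        # all of layer 3: a 2 x 3 matrix
a[1:2, 2:3, 1]  # a 2 x 2 sub-block of layer 1
```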


Negative and Logical Indexing

Negative indices exclude positions from the corresponding axis; logical vectors filter along an axis.
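A sketch of negative and logical indices on the same placeholder array. Each index applies only to its own axis, so you can mix them freely.

```r
a <- array(1:24, dim = c(2, 3, 4))

a[, -1, ]           # drop column 1 from every layer: 2 x 2 x 4
a[-2, , -c(1, 2)]   # drop row 2 and layers 1-2: a 3 x 2 matrix

# A full-length logical vector per axis: columns 1 and 3, layers 1 and 3
a[, c(TRUE, FALSE, TRUE), c(TRUE, FALSE, TRUE, FALSE)]   # 2 x 2 x 2

# A condition computed along one axis: keep layers whose first cell exceeds 10
a[, , a[1, 1, ] > 10]   # layers 3 and 4: 2 x 3 x 2
```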


Common Mistake: drop = FALSE to Keep the Shape

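Like matrices, arrays drop any axis of length 1 from the result by default, so selecting a single layer of a 3-D array returns a matrix. Use drop = FALSE to keep every axis, including the length-1 one, so the result is still a 3-D array.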
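Comparing dim() with and without drop = FALSE shows exactly which axes survive. The array here is a placeholder.

```r
a <- array(1:24, dim = c(2, 3, 4))

dim(a[, , 1])                 # 2 3    (the layer axis is dropped)
dim(a[, , 1, drop = FALSE])   # 2 3 1  (still a 3-D array)
dim(a[1, 1, ])                # NULL   (a plain vector of length 4)
dim(a[1, 1, , drop = FALSE])  # 1 1 4
```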


15.4 Modifying an Array

Assignment Works the Same Way
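Any index that works for reading also works on the left-hand side of <-. A minimal sketch on a small array of zeros with hypothetical labels r1, c1, L1, and so on:

```r
a <- array(0, dim = c(2, 3, 2),
           dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"), c("L1", "L2")))

a["r1", "c2", "L1"] <- 5            # one cell, by name
a[, , "L2"] <- 1:6                  # a whole layer, filled column-major
a[, "c3", ] <- a[, "c3", ] * 10     # rescale column c3 in every layer
a[a > 40] <- 40                     # a logical mask over every cell: cap at 40
```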


15.5 Collapsing Dimensions with `apply()`

Core Concept: Pick the Axes You Keep

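apply(x, MARGIN, FUN) calls FUN once for each position along the axis or axes named in MARGIN, passing it the slice formed by every other axis; those other axes are collapsed. MARGIN = 1 keeps axis 1 (rows in a matrix), MARGIN = 2 keeps axis 2, and so on. Pass a vector such as c(1, 2) to keep more than one axis, or pass axis names when the array has named dimnames.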
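A sketch on a placeholder quarter × region × year cube (the numbers 1:24 are not real data). Each MARGIN value names the axes that survive; everything else is summed or averaged away.

```r
sales <- array(1:24, dim = c(4, 3, 2),
               dimnames = list(quarter = paste0("Q", 1:4),
                               region  = c("North", "South", "West"),
                               year    = c("2023", "2024")))

apply(sales, 1, sum)          # keep quarter: 66 72 78 84
apply(sales, 3, sum)          # keep year: 2023 = 78, 2024 = 222
apply(sales, c(1, 2), sum)    # keep quarter and region, collapse year: 4 x 3
apply(sales, "region", mean)  # margin given by axis name: 8.5 12.5 16.5
```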


Expert Insight: apply() Is How Arrays Earn Their Keep

A 3-way array is just numbers until you start collapsing it. Almost every useful calculation on an array is an apply() call with the right MARGIN argument: per-quarter totals, per-region means, per-year variability. Writing those as apply(a, 1, ...), apply(a, c(1, 2), ...), etc. is the array idiom.

15.6 Reshaping and Permuting Axes

dim()<- Reshapes, aperm() Rearranges

dim()<- replaces the dimension vector, so the same cells are reinterpreted under a new shape; the total number of cells must stay the same, and any existing dimnames are discarded. aperm() permutes the order of the axes (carrying their dimnames along), for example transposing a matrix or moving the year axis to the front of a 3-way array.
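A sketch of both tools on placeholder data. Note the argument perm = c(3, 1, 2): the result’s axes are the old axes 3, 1, and 2, in that order, so each cell moves to a correspondingly permuted index.

```r
x <- 1:24

# dim()<- reshapes: 24 cells can become 4 x 6 or 2 x 3 x 4, never 5 x 5
dim(x) <- c(2, 3, 4)
dim(x)                      # 2 3 4

# aperm(): old axis 3 comes first, then old axes 1 and 2
y <- aperm(x, perm = c(3, 1, 2))
dim(y)                      # 4 2 3
y[4, 2, 3] == x[2, 3, 4]    # TRUE: same cell, new axis order

# For a matrix, aperm() with the default perm is the transpose
m <- matrix(1:6, nrow = 2)
all(aperm(m) == t(m))       # TRUE

# dim()<- drops existing dimnames
named <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("u", "v")))
dim(named) <- c(4, 1)
is.null(dimnames(named))    # TRUE
```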


15.7 A 3-Way Contingency Table

When a Matrix Is Not Enough

Contingency tables count how often each combination of categorical variables occurs. A 2-way table (like treatment × outcome) is a matrix; a 3-way table (like treatment × outcome × site) is a 3-D array. Given three categorical vectors, R’s table() returns one directly.
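A sketch with eight hypothetical patients (made-up records, not real data). table() cross-tabulates the three vectors into a 2 × 2 × 2 array, which you can then slice and collapse like any other array.

```r
# Hypothetical patient-level records: one entry per patient
treatment <- c("drug", "drug", "placebo", "drug", "placebo", "placebo", "drug", "placebo")
outcome   <- c("better", "same", "same", "better", "better", "same", "better", "same")
site      <- c("A", "A", "A", "B", "B", "B", "B", "A")

tab <- table(treatment, outcome, site)
dim(tab)        # 2 2 2
is.array(tab)   # TRUE: a "table" is an array with a class attribute

tab[, , "A"]                # the 2-way treatment x outcome table for site A
apply(tab, c(1, 2), sum)    # collapse site to get the overall 2-way table
```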


Best Practice: Reach for a Data Frame First

Most analysts today store the same information in a long data frame (one row per combination of categories, with columns gender, age_band, outcome, and a count) because the tidyverse toolchain is built around that shape. Use an array when you genuinely need fast n-D numeric work or when a function explicitly expects or returns one; for example, mantelhaen.test() accepts a 3-D contingency table.
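Moving between the two shapes takes one call in each direction. In this sketch the counts are hypothetical; as.data.frame() on a table produces the long form, and xtabs() rebuilds the array from it.

```r
# Hypothetical 2 x 2 x 2 array of counts with named axes
counts <- array(c(5, 3, 4, 6, 2, 7, 1, 8), dim = c(2, 2, 2),
                dimnames = list(gender   = c("F", "M"),
                                age_band = c("<40", "40+"),
                                outcome  = c("no", "yes")))

# Array -> long data frame: one row per combination of categories, plus a count
long <- as.data.frame(as.table(counts), responseName = "count")
head(long)

# Long data frame -> array again
back <- xtabs(count ~ gender + age_band + outcome, data = long)
all(back == counts)   # TRUE
```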

15.8 Iterating Over Array Slices

Looping Safely Across an Axis

When you need to step through the layers of an array (for example, running the same analysis on every year of sales data), use apply() when each slice should collapse to a summary, and a plain for loop or lapply() over the axis names or indices when you need to keep each slice whole.
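A sketch of both patterns on the placeholder sales cube. Looping over dimnames(sales)$year means the loop variable is a name you can index with directly.

```r
sales <- array(1:24, dim = c(4, 3, 2),
               dimnames = list(quarter = paste0("Q", 1:4),
                               region  = c("North", "South", "West"),
                               year    = c("2023", "2024")))

# Collapse: one number per year
apply(sales, "year", sum)

# Keep each slice: a for loop over the year axis, by name
for (yr in dimnames(sales)$year) {
  layer <- sales[, , yr]                      # a quarter x region matrix
  best  <- names(which.max(colSums(layer)))   # region with the largest total
  cat(yr, "best region:", best, "\n")
}

# Keep each slice and collect results in a named list
per_year <- lapply(dimnames(sales)$year, function(yr) colSums(sales[, , yr]))
names(per_year) <- dimnames(sales)$year
per_year$`2024`   # North 58, South 74, West 90
```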


15.9 A Worked Example: Year-over-Year Growth

Putting the Array Tools Together
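A sketch of a year-over-year comparison, assuming a quarter × region × year revenue cube. The figures are made up for illustration and the object names are hypothetical.

```r
# Made-up revenue figures: 4 quarters x 3 regions x 2 years
revenue <- array(
  c(100, 110, 120, 130,  80, 85, 90, 95,  60, 60, 70, 75,   # 2023
    110, 121, 126, 143,  84, 85, 99, 95,  66, 63, 70, 90),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = paste0("Q", 1:4),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024"))
)

# Slab indexing by name, then element-wise arithmetic between two slabs
growth_pct <- 100 * (revenue[, , "2024"] - revenue[, , "2023"]) / revenue[, , "2023"]
round(growth_pct, 1)

# colSums() on each slab gives annual totals per region
totals_2023 <- colSums(revenue[, , "2023"])
totals_2024 <- colSums(revenue[, , "2024"])
annual_growth_pct <- 100 * (totals_2024 - totals_2023) / totals_2023
round(annual_growth_pct, 1)

# apply() with MARGIN = c(1, 2) collapses the year axis: two-year average
two_year_mean <- apply(revenue, c(1, 2), mean)
```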


Every technique from the chapter shows up: construction with named dimnames, slab indexing with character names, element-wise arithmetic between two slabs, colSums() on a slab, and apply() with MARGIN = c(1, 2) to collapse the year axis.

Summary

| Concept | Description |
|---|---|
| **Definition and Building** | |
| Array Definition | An array generalises a matrix to any number of dimensions |
| From Matrix to Array | A matrix is a 2-D array; an array adds more dimensions of the same type |
| `array()` Constructor | `array(data, dim = c(...), dimnames = ...)` builds an array |
| Naming the Axes | Provide `dimnames` so you can refer to slices by name |
| **Indexing** | |
| Indexing per Axis | Use one comma-separated index per axis: `array[i, j, k]` |
| Negative Indexing | Drop slices using negative indices on any axis |
| Logical Indexing | Boolean masks work on each axis independently |
| Common Mistakes | Mismatched data length and `dim`, or forgetting `drop = FALSE` |