# Areal interpolation

Vincent Thorne · Posted 18 Jun 2021 · Last edited 31 Jan 2022

Areal interpolation lets us “distribute” variables between spatial features overlapping but with different borders, which we call incongruent. Importantly, we assume that the variable is homogeneously distributed across a given spatial feature.

In R, the excellent `areal`

package lets us do areal interpolation and control the parameters described below. The (also great) `sf`

package does have an `st_interpolate_aw`

method, but it lacks some features `areal`

implements.

The “distribution” of values across spatial units takes two parameters (i.e., takes place in two dimensions), described below.

## Extensive vs Intensive

**Extensive** distribution **spreads** the value of a variable across the overlapping features. This used for **count** variables (population, number of trees, etc).

**Intensive** distribution produces a **spatially weighted average** of the variable across the overlapping features. This is used for rates, averages and other **already transformed** variables (asthma rate, median income, etc)

## For extensive only: sum vs total

**Sum** assumes that 100% of the source data should be distributed to the target features. The total area of the source is thus $A_j=\sum A_{ij}$, the sum of all the overlapping (intersected) areas.

In practice, that method is used when features do not perfectly overlap, but one still wishes to distribute the entirety of a feature’s value to the overlapping features (because of an imperfectly matching datasets, for example).

**Total** assumes that, if a source feature is not 100% overlapped by target features, then only the overlapped proportion should be distributed. For example, “if a source feature is only covered by 99.88% of the target features, only 99.88% of the source target’s data should be allocated to target features in the interpolation”. $A_j$ is thus the original area of the source feature, not the sum of the intersected areas as in sum.

In practice, that method is used when one is certain the features overlap perfectly, or when only a proportion of the variable (relative to the overlapped area) needs to be allocated, because that’s what makes sense in that particular context.

This is the “distribution dimension” where `areal`

’s `aw_interpolate()`

differs from `sf`

’s `st_interpolate_aw()`

: the former offers both “sum” and “total” options, while the latter only supports the “total” distribution. `areal`

has the advantage of making the distinction explicit, but will yield the same results as `sf`

if the “total” option is selected.

## More details

The `areal`

homepage has detailed explanations and visual descriptions of the steps involved in areal interpolation. See also the reference on `st_interpolate_aw()`

for more details on `sf`

’s implementation.