Upper Midwest Environmental Sciences Center

**Selecting a distributional assumption for modelling
relative densities of benthic macroinvertebrates**

Gray, B. R., 2005, Selecting a distributional assumption for modelling relative densities of benthic macroinvertebrates: Ecological Modelling, v. 185, p. 1-12.

Abstract

The selection of a distributional assumption suitable for modelling macroinvertebrate
density data is typically challenging. Macroinvertebrate data often exhibit substantially
larger variances than expected under a standard count assumption, that of the
Poisson distribution. Such overdispersion may derive from multiple sources, including
heterogeneity of habitat (historically and spatially), differing life histories
for organisms collected within a single collection in space and time, and autocorrelation.
Taken to extreme, heterogeneity of habitat may be argued to explain the frequent
large proportions of zero observations in macroinvertebrate data. Sampling locations
may consist of habitats defined qualitatively as either suitable or unsuitable.
The former category may yield random or stochastic zeroes and the latter structural
zeroes. Heterogeneity among counts may be accommodated by treating the count mean
itself as a random variable, while extra zeroes may be accommodated using zero-modified
count assumptions, including zero-inflated and two-stage (or hurdle) approaches.
These and linear assumptions (following log- and square root-transformations)
were evaluated using 9 years of mayfly density data from a 52 km, ninth-order reach
of the Upper Mississippi River (*n* = 959). The data exhibited substantial
overdispersion relative to that expected under a Poisson assumption (i.e. variance:mean
ratio = 23 ≫ 1), and 43% of the sampling locations yielded zero mayflies.
Based on the Akaike Information Criterion (AIC), count models were improved most
by treating the count mean as a random variable (via a Poisson-gamma distributional
assumption) and secondarily by zero modification (i.e. improvements in AIC values
= 9184 units and 47.48 units, respectively). Zeroes were underestimated by the
Poisson, log-transform and square root-transform models, slightly by the standard
negative binomial model but not by the zero-modified models (61%, 24%, 32%, 7%,
and 0%, respectively). However, the zero-modified Poisson models underestimated
small counts (1≤*y*≤4) and overestimated intermediate counts
(7≤*y*≤23). Counts greater than zero were estimated well by zero-modified
negative binomial models, while counts greater than one were also estimated well
by the standard negative binomial model. Based on AIC and percent zero estimation
criteria, the two-stage and zero-inflated models performed similarly. The above
inferences were largely confirmed when the models were used to predict values
from a separate, evaluation data set (*n* = 110). An exception was that,
using the evaluation data set, the standard negative binomial model appeared superior
to its zero-modified counterparts using the AIC (but not percent zero criteria).
This and other evidence suggest that a negative binomial distributional assumption
should be routinely considered when modelling benthic macroinvertebrate data from
low flow environments. Whether negative binomial models should themselves be routinely
examined for extra zeroes requires, from a statistical perspective, more investigation.
However, this question may best be answered by ecological arguments that may be
specific to the sampled species and locations.
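
As an illustration of the count assumptions compared above, the following minimal sketch writes out the Poisson, Poisson-gamma (negative binomial), zero-inflated and two-stage (hurdle) probability functions, together with the variance:mean ratio and the AIC. It is not the paper's code: the function names and the `counts` values are hypothetical, and the negative binomial shape is set by a crude moment estimate rather than by maximum likelihood, so the AIC values it prints are only indicative.

```python
import math
from statistics import mean, variance


def poisson_pmf(y, mu):
    """Poisson probability of count y given mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def negbin_pmf(y, mu, k):
    """Poisson-gamma (negative binomial) probability of count y.

    Parameterised by mean mu and gamma shape k, so variance = mu + mu**2 / k.
    """
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def zero_inflated_pmf(y, pi, base_pmf):
    """Zero-inflated mixture: a structural zero with probability pi,
    otherwise a draw from the base count distribution (which may also be 0)."""
    p = (1 - pi) * base_pmf(y)
    return pi + p if y == 0 else p


def hurdle_pmf(y, p_zero, base_pmf):
    """Two-stage (hurdle) model: all zeroes have probability p_zero;
    positive counts follow the base distribution truncated at zero."""
    if y == 0:
        return p_zero
    return (1 - p_zero) * base_pmf(y) / (1 - base_pmf(0))


def aic(log_lik, n_params):
    """Akaike Information Criterion; smaller values indicate a better model."""
    return 2 * n_params - 2 * log_lik


# Hypothetical counts (NOT the mayfly data): many zeroes plus a few large values.
counts = [0, 0, 0, 0, 0, 1, 2, 4, 9, 24]

mu_hat = mean(counts)
ratio = variance(counts) / mu_hat  # variance:mean ratio; about 1 under a Poisson

# Moment estimate of the shape k (an assumption made for brevity, not ML).
k_hat = mu_hat ** 2 / (variance(counts) - mu_hat)

ll_pois = sum(math.log(poisson_pmf(y, mu_hat)) for y in counts)
ll_nb = sum(math.log(negbin_pmf(y, mu_hat, k_hat)) for y in counts)
aic_pois = aic(ll_pois, 1)  # one parameter: mu
aic_nb = aic(ll_nb, 2)      # two parameters: mu and k

print(f"variance:mean ratio = {ratio:.2f}")
print(f"AIC Poisson = {aic_pois:.1f}, AIC negative binomial = {aic_nb:.1f}")
print(f"P(0): Poisson = {poisson_pmf(0, mu_hat):.3f}, "
      f"negative binomial = {negbin_pmf(0, mu_hat, k_hat):.3f}")
```

The two zero-modified forms differ in how they treat zeroes: the zero-inflated mixture lets zeroes arise from either component, whereas the hurdle form assigns every zero to the first stage and models only positive counts in the second. Either wrapper can be applied to the Poisson or the negative binomial function, for example `zero_inflated_pmf(0, 0.3, lambda y: negbin_pmf(y, mu_hat, k_hat))`, where the value 0.3 is again hypothetical.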

Keywords

*Hexagenia*;
Hurdle models; LTRMP; Mayflies; Negative binomial distribution; Two-stage models;
Zero-inflated count models