Upper Mississippi River Restoration Program

Long Term Resource Monitoring

LTRM Statistics

Estimating Variance Components using LTRM Survey Data

Introduction
Fixed vs. Random Effects
Unbalanced Designs
Parametric Methods for Estimating Variance Components
Spatial and Temporal Correlation
Nonparametric Methods
Corrections for Nonproportional Sampling
Will Variance Components Inferences Complement LTRM Status and Trend Information?
Multivariate Responses
References

Introduction

Variance components analysis is often used to assess the proportions of variance attributable to random components within an experimental design setting. Such analyses are used to estimate the relative importance of variance components and may also be useful for power analyses. Variance components analyses using LTRM data may need to address the presence of fixed effects, strata, unbalanced designs, spatial and temporal correlation, skewed data, and nonproportional sampling. These issues are addressed below.

Fixed vs. Random Effects

Efforts to extend variance components analysis to data sets containing variances associated with both random and fixed effects must, at minimum, ensure that the interpretations of variance estimates from these two types of effects are not confused. Fixed effects refer to specific, deliberately selected effects, while random effects represent effects that are arguably interchangeable with a larger population of effects. Consequently, the variance estimates for fixed effects may more properly be termed finite variances.

For the LTRM, fixed effects include effects associated with field station, possibly spatial strata (these could also be viewed purely as restrictions on randomization), season, and their interactions. By contrast, components associated with annual sampling events are often treated as random. This is based on the assumptions that we have no a priori interest in particular years and that, apart from some specific model, sampled years are interchangeable with some larger set of years. Components associated with year include year and any interaction with year (e.g., year * season). Less commonly considered sources of variation may also be treated as random (e.g., backwater lakes within a given backwater stratum, sampling day within a multiday sampling period, observer effects) but are not considered further in this document.

Unbalanced Designs

Variance component estimates from unbalanced designs are generally approximate. As the design underlying virtually any analysis of LTRM data will be unbalanced, all LTRM variance component estimates should be viewed as approximate. Exceptions may include cases where inferences are confined to a single stratum within reaches and, for the macroinvertebrate component, year effects within single reaches (with strata effects ignored).

Parametric Methods for Estimating Variance Components

Likelihood-based methods of estimating variance components are well developed but, as popularly used, are appropriate only for estimating variance components from normally distributed data. For the LTRM, typically only water quality data will appear essentially normal, and then only where missing (below-detection) values are trivial in proportion or have been imputed. Data from other sources cannot be made normal without compromising the information contained in those data.

Parametric methods of estimating variance components using discrete data have traditionally been viewed as challenging for all but the simplest models (Searle et al. 1992). More recently, some advances have been made for estimating variance components for binary and binomial data (Goldstein et al. 2002; Browne et al. 2005; Snijders and Bosker 2012). Implementation and interpretation using these methods will nonetheless often be challenging, particularly in the presence of fixed effects, nonproportional sampling, stratification, or spatial or temporal correlation.
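As an illustration of one approach discussed in the multilevel modeling literature for binary data, the sketch below computes a latent-variable variance partition coefficient for a random-intercept logistic model. It assumes the random-effect variance (for example, a year variance) has already been estimated on the logit scale, and it treats the observation-level variance as that of the standard logistic distribution, pi^2/3. The function name and the input value are hypothetical, and the calculation ignores fixed effects, strata, sampling weights, and correlation.

```python
import math


def latent_vpc_logit(random_intercept_variance):
    """Proportion of latent-scale variance attributable to the random effect.

    Assumes a random-intercept logistic model in which the observation-level
    latent variance equals that of the standard logistic distribution.
    """
    if random_intercept_variance < 0:
        raise ValueError("variance must be non-negative")
    level1_variance = math.pi ** 2 / 3  # variance of the standard logistic
    return random_intercept_variance / (random_intercept_variance + level1_variance)


# Hypothetical year-level variance of 0.5 on the logit scale.
print(round(latent_vpc_logit(0.5), 3))
```

The resulting proportion refers to the latent (logit) scale only. Proportions on the probability scale depend on the fixed part of the model and will generally differ.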

Spatial and Temporal Correlation

Spatial and temporal correlation may occur at any spatial or temporal scale. For example, spatial correlation may occur among stratum-specific and/or field-station-specific means, while temporal correlation may occur among repeated annual observations on aquatic vegetation from the same site (Pool 8, years 2001-2004 only) and among stratum- and field-station-specific annual means. Failing to account for spatial and temporal correlation, if present, may lead to misleading variance component estimates. Spatial correlation will be ignorable if a wholly design-based analysis is proposed.
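A simple first check for temporal correlation among annual means is the lag-1 sample autocorrelation. The sketch below is a minimal illustration using a hypothetical series of annual means; the function name and data are assumptions rather than part of any LTRM procedure. With short series (a handful of years), the estimate will be very imprecise and should be read only as a rough diagnostic.

```python
def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of an equally spaced series."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant")
    # Sum of products of successive deviations from the overall mean.
    numerator = sum(deviations[t] * deviations[t + 1] for t in range(n - 1))
    return numerator / denominator


# Hypothetical annual means for one stratum (arbitrary units).
annual_means = [4.1, 4.6, 5.0, 4.8, 5.5, 5.9]
print(lag1_autocorrelation(annual_means))
```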

Nonparametric Methods

Where data are nonnormal and unbalanced, variance components analysis typically proceeds by equating observed mean squares terms with expected mean squares terms. This method presumes data and means are independent (i.e., not spatially or temporally correlated). Expected mean squares values may differ depending on whether they derive from fixed or random effects (see above).
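The idea of equating observed and expected mean squares can be sketched for the simplest case: a single random factor (here, year) with unequal numbers of observations per year. The sketch assumes independent observations and no fixed effects, strata, or sampling weights, so it is far simpler than a realistic LTRM analysis. The function name and data are hypothetical. Under these assumptions, the expected within-year mean square is the residual variance, and the expected between-year mean square is the residual variance plus n0 times the year variance, where n0 = (N - sum(n_i^2)/N) / (a - 1) for a years and N total observations.

```python
def anova_variance_components(groups):
    """Method-of-moments variance components for a one-way random-effects layout.

    groups maps a label (e.g., year) to a list of observations; group sizes
    may differ. Negative estimates of the between-group variance are set to 0.
    """
    sizes = [len(values) for values in groups.values()]
    n_groups = len(sizes)
    n_total = sum(sizes)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least two groups and some replication")
    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in values)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Coefficient of the between-group variance in E[MS_between].
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return {"var_between": var_between, "var_within": ms_within, "n0": n0}


# Hypothetical observations from three years with unequal sample sizes.
data = {2001: [1, 3], 2002: [5, 7, 9], 2003: [4, 6]}
estimates = anova_variance_components(data)
print(estimates)  # var_between is about 5.25 and var_within is 3.0
```

Truncating a negative estimate at zero is a common convention rather than a requirement; some analysts prefer to report the negative value as a signal of model inadequacy.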

Corrections for Nonproportional Sampling

If inference to the sampled population is desired, variance component estimation using data derived from designs that include nonproportional sampling, such as are used in the LTRM designs, must weight observations by sampling weights (see Estimating Means and Standard Errors from LTRM Survey Data; Carle 2009).
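To show what sampling weights look like under nonproportional stratified sampling, the sketch below forms each weight as the number of population units in a stratum divided by the number sampled, and then compares weighted and unweighted means. It does not itself perform weighted variance component estimation; it only illustrates how the weights that such an analysis would need might be constructed. The stratum names, stratum sizes, and observations are hypothetical.

```python
def design_weights(stratum_sizes, samples):
    """Weight per sampled unit: population units in stratum / units sampled."""
    return {s: stratum_sizes[s] / len(samples[s]) for s in samples}


def weighted_mean(samples, weights):
    """Weighted mean across strata using one weight per stratum."""
    numerator = sum(weights[s] * y for s, values in samples.items() for y in values)
    denominator = sum(weights[s] * len(values) for s, values in samples.items())
    return numerator / denominator


# Hypothetical strata: the backwater stratum is sampled more intensively
# relative to its size than the main channel stratum.
stratum_sizes = {"backwater": 100, "main_channel": 300}
samples = {"backwater": [10, 12], "main_channel": [2, 4, 6]}

weights = design_weights(stratum_sizes, samples)
all_values = [y for values in samples.values() for y in values]
print(weighted_mean(samples, weights))   # weighted estimate
print(sum(all_values) / len(all_values)) # unweighted mean, for comparison
```

In this example the unweighted mean overstates the population mean because the oversampled stratum has the larger values; an unweighted variance components analysis can be distorted in an analogous way.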

Will Variance Components Inferences Complement LTRM Status and Trend Information?

Maybe. A major impediment to treating variance components inferences as complementary to status and trend information is that all three sets of estimates may be derived using different methods and under different assumptions. Comparability among methods for both variance components and status and trend estimation may be poorly known. Another impediment is that, for nonlinear data, variances at the sampling and aggregated (e.g., year) scales are typically presumed to vary on different distributional scales (e.g., for counts, on count and log scales, respectively). The relationship between variance components at these different scales may be complex. Inferences for data that are ostensibly normally distributed may be expected to be qualitatively complementary.
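The difficulty of relating variances on different scales can be illustrated with a deliberately simple model, which is an assumption made here for illustration and not a model proposed for LTRM data: counts are Poisson given a year-specific mean, and the log of that mean is normally distributed across years. The sketch below converts a log-scale year variance into count-scale quantities. The function name and inputs are hypothetical.

```python
import math


def poisson_lognormal_moments(log_mean, log_variance):
    """Count-scale moments when log(year mean) is normal and counts are Poisson."""
    if log_variance < 0:
        raise ValueError("variance must be non-negative")
    # Mean of a lognormal year-specific expected count.
    mean_count = math.exp(log_mean + log_variance / 2)
    # Variance of the lognormal year-specific expected count (count scale).
    between_year = (math.exp(log_variance) - 1) * math.exp(2 * log_mean + log_variance)
    # Average Poisson sampling variance equals the average expected count.
    sampling = mean_count
    return {
        "mean": mean_count,
        "between_year_variance": between_year,
        "sampling_variance": sampling,
        "total_variance": between_year + sampling,
    }


# The same log-scale year variance gives different count-scale proportions
# at different mean abundances (hypothetical values).
for log_mean in (0.0, 2.0):
    m = poisson_lognormal_moments(log_mean, 0.25)
    print(log_mean, m["between_year_variance"] / m["total_variance"])
```

Under this model the count-scale proportion of variance attributed to year depends on mean abundance as well as on the log-scale year variance, so a single log-scale variance component does not translate into a single count-scale proportion.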

Note that Kincaid et al. (2004) published estimated variance proportions for proportion and count monitoring data. The focus of that study, however, was strictly monitoring, rather than the monitoring and research foci that typically characterize LTRM analyses. In addition, the data used in that study did not derive from a stratified design and did not include fixed effects.

Multivariate Responses

The above comments were written with univariate outcomes in mind. Multivariate responses may be addressable using methods described by Borcard et al. (2004) or in references therein.

References

Borcard, D., P. Legendre, C. Avois-Jacquet, and H. Tuomisto. 2004. Dissecting the spatial structure of ecological data at multiple scales. Ecology 85:1826-1832.

Browne, W. J., S. V. Subramanian, K. Jones, and H. Goldstein. 2005. Variance partitioning in multilevel logistic models that exhibit over-dispersion. Journal of the Royal Statistical Society, Series A 168:599-614.

Carle, A. C. 2009. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9:49. doi:10.1186/1471-2288-9-49.

Goldstein, H., W. Browne, and J. Rasbash. 2002. Partitioning variation in multilevel models. Understanding Statistics 1:223-232.

Kincaid, T. M., D. P. Larsen, and N. S. Urquhart. 2004. The structure of variation and its influence on the estimation of status: indicators of condition on lakes in the Northeast, U.S.A. Environmental Monitoring and Assessment 98:1-21.

Lohr, S. L. 1999. Sampling: Design and analysis. Duxbury Press Publishing Company, Pacific Grove, California.

Searle, S. R., G. S. Casella, and C. E. McCulloch. 1992. Binary and discrete data. Pages 367-377 in S. R. Searle, G. S. Casella, and C. E. McCulloch, editors. Variance components. John Wiley & Sons, New York.

Snijders, T. A. B., and R. J. Bosker. 2012. Multilevel analysis, 2nd ed. Sage, London. 354 pp.

Contact: Further information about variance component analysis using LTRMP data may be obtained from Brian Gray, LTRMP statistician, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, at brgray@usgs.gov.

Page Last Modified: August 15, 2016