This item has been officially peer reviewed.

Statistical & Mapping Procedures

Authored By: M. V. Warwell, G. E. Rehfeldt, N. L. Crookston

Statistical Procedures

The Random Forests classification and regression tree package (Breiman 2001, Liaw and Wiener 2002, R Development Core Team 2004) was used to model species presence and absence. This tree-based method of regression is nonparametric and resistant to overfitting, and multicollinearity and spatial correlation of residuals are not issues (Breiman 2001). Consequently, the algorithm was well suited to our analyses, which used variables among which intercorrelations could be pronounced.

An analysis data set that initially included all predictor variables was constructed for each species. The data set included all observations with presence=yes, each weighted by a factor of 3. These observations made up 40 percent of the total for a species (Table: Climate Envelope Statistics). The remaining observations (60 percent), with presence=no, were selected by a stratified random sample of locations from two strata constructed using threshold values of the 33 predictor variables that define a climatic envelope for the species. The first stratum is the space formed by an expanded climatic envelope, where the expansion is defined by increasing the range for the threshold values of each climatic variable. The climate envelopes were expanded by factors large enough to produce about 20 times the number of locations sampled. The second stratum included locations outside the expanded envelope.
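The envelope and stratum construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the symmetric expansion about each variable's midpoint, and the reading that the weighted presences (rather than the raw counts) make up 40 percent of the total are all assumptions. How the absence sample was divided between the two strata is not stated in the text and is left out.

```python
def climatic_envelope(presence_rows):
    """Per-variable (min, max) thresholds over locations where the species is present."""
    n_vars = len(presence_rows[0])
    return [(min(r[j] for r in presence_rows), max(r[j] for r in presence_rows))
            for j in range(n_vars)]

def expand_envelope(envelope, factor):
    """Widen each variable's range about its midpoint (assumed expansion rule)."""
    expanded = []
    for lo, hi in envelope:
        mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
        expanded.append((mid - half * factor, mid + half * factor))
    return expanded

def in_envelope(row, envelope):
    """True if every climate variable falls within its threshold range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(row, envelope))

def stratify_absences(absence_rows, expanded):
    """Stratum 1: inside the expanded envelope. Stratum 2: outside it."""
    inside = [r for r in absence_rows if in_envelope(r, expanded)]
    outside = [r for r in absence_rows if not in_envelope(r, expanded)]
    return inside, outside

def absence_sample_size(n_presence, weight=3, presence_share=0.40):
    """Absence count needed so that weighted presences form 40 percent of the total."""
    weighted = n_presence * weight
    return round(weighted / presence_share - weighted)
```

Under these assumptions, a species with 100 presence observations would contribute 300 weighted observations and call for 450 absence observations.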

Regression analysis used 10 independent forests of 100 independent regression trees. Random Forests builds each tree using a separate bootstrap sample of the analysis data, which leaves about 36 percent of the observations out of each sample; these out-of-bag observations are used to compute classification error.
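The figure of about 36 percent follows from sampling with replacement: the chance that a given observation is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e, or roughly 0.37, as n grows. The short Python sketch below checks this by simulation; the function names are illustrative and not part of any Random Forests package.

```python
import random

def expected_oob_fraction(n_obs):
    """Probability that one observation is missed by n_obs draws with replacement."""
    return (1.0 - 1.0 / n_obs) ** n_obs

def simulated_oob_fraction(n_obs, seed=0):
    """Draw one bootstrap sample and return the share of observations left out."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_obs) for _ in range(n_obs)}
    return 1.0 - len(drawn) / n_obs
```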

A set of regressions was run. The first regression used all 33 climate predictors. Random Forests produces indices of variable importance (mean decrease in accuracy), and the 12 least important variables were dropped after the first run. The regression procedure was then rerun nine more times with the remaining predictors, with the 1 to 3 least important predictors dropped at each run until classification errors began to increase. The Random Forests run with the fewest variables prior to the detected increase in classification error was considered the most parsimonious bioclimatic model for the species.
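The stepwise elimination can be sketched as a loop. The Python below is a simplified illustration under stated assumptions: `fit` is a hypothetical caller-supplied function that trains a model on the given predictors and returns the classification error together with a dictionary of importances, and the number dropped per run is fixed by `step`, whereas the text describes dropping 1 to 3 predictors per run.

```python
def select_predictors(predictors, fit, first_drop=12, step=1, max_runs=9):
    """Backward elimination by importance; `fit` is a hypothetical stand-in.

    fit(predictors) must return (classification_error, {predictor: importance}).
    """
    def drop_least_important(current, importance, k):
        ranked = sorted(current, key=lambda p: importance[p])  # least important first
        return ranked[k:]

    # First run uses every predictor, then drops the least important ones.
    _, importance = fit(list(predictors))
    current = drop_least_important(list(predictors), importance, first_drop)

    best_vars, best_error = None, None
    for _ in range(max_runs):
        error, importance = fit(current)
        if best_error is not None and error > best_error:
            break  # error began to increase: keep the previous run
        best_vars, best_error = list(current), error
        if len(current) <= step:
            break
        current = drop_least_important(current, importance, step)
    return best_vars, best_error
```

The function returns the predictor set from the last run before the error rose, which corresponds to the most parsimonious model in the sense used above.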

Mapping Procedures

Rehfeldt’s climate surfaces (2006) and those updated to convey global warming were used to estimate the climate for nearly 5.9 million pixels (1 km² resolution) representing the terrestrial portion of the study area. The average altitude was obtained from Globe (1999). The estimated climate and projected climates of each pixel were run down the 100 regression trees in the final set, and the number of trees predicting that the species is present and the number predicting that it is absent were tabulated. A single-tree prediction is termed a vote. The votes were grouped into 6 categories: < 50, 50-60, 60-70, 70-80, 80-90, and 90-100 percent. We consider any pixel in the first group as not having suitable climate for the species and define pixels in the other 5 categories as the species’ realized climatic niche space. The fit of the mapped projections was assessed visually by comparing them with locations where the species was observed, with Little’s range maps (1971, 1976), which are available as digitized files (USGS 2005), or with both.
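The vote tabulation and grouping can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of boundary values (a pixel with exactly 60 percent of votes is placed in the 60-70 category) is an assumption, since the text does not say which category the boundaries belong to.

```python
def vote_percent(tree_predictions):
    """Percentage of trees voting 'present' (True) for one pixel."""
    present = sum(1 for vote in tree_predictions if vote)
    return 100.0 * present / len(tree_predictions)

def vote_category(percent):
    """Assign a vote percentage to one of the six mapped categories."""
    if percent < 50:
        return "< 50"
    lower = min(int(percent // 10) * 10, 90)  # 100 percent falls in 90-100
    return f"{lower}-{lower + 10}"

def in_niche(percent):
    """Pixels outside the '< 50' category count as realized climatic niche space."""
    return percent >= 50
```

For example, a pixel for which 73 of the 100 trees predict presence has a vote percentage of 73, falls in the 70-80 category, and is counted as part of the niche space.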



Encyclopedia ID: p3665


