From Observation to Prediction: Modeling Species Distributions in the Mojave Desert

As I’ve written before, my original internship period, which focused on developing priority species lists for restoration sites in the Mojave and Sonoran deserts, has been extended into the fall in order to work on the next phase of this project. In this extension period, I have dramatically switched gears: from a mode of observation of how species in the low desert of southern Nevada and California operate to one of prediction. What is the scope of distribution for these species? What environmental variables impose limits on the breadth of their occurrence? And how can we make this information as accessible and useful to land managers as possible?

From the list of priority restoration species for the Mojave Desert, my Principal Investigators and I chose 50 species with which to take the next step: creating species distribution models (SDMs) to be incorporated into an adaptive management tool for BLM land managers. This tool would further expedite the restoration process by allowing BLM agents to create “seed menus” for recently disturbed sites. The idea is that land managers would be able to input coordinates for the site in need of restoration into this tool, and up will pop not just one, but a whole suite of plant species tailored to the restoration needs of that location, as well as viable seed source locations and information on the ecosystem services (specifically for desert tortoise and pollinators) that those plants provide.

My main task in this endeavor has been to gather and vet species occurrence data to use as presence points in our models. My main sources for this information have been unpublished data sets from vegetation surveys taken across the Mojave and herbarium records from public databases such as the Consortium of California Herbaria and the Southwest Environmental Information Network (SEINet). After a few weeks of gathering a robust number of points and giving them a thorough cleaning, we are ready to actually make some SDMs!

Ephedra nevadensis, one of the species for which we are producing distribution models for our Seed Menus project.

Our process involves three algorithms: 1) a Generalized Additive Model (GAM), a crossbreed between Generalized Linear Models and Additive Models, 2) Random Forest, which is basically a decision tree on steroids (it combines the predictions of many decision trees), and 3) MaxEnt, the famed maximum entropy algorithm. We first produce an equal number of pseudoabsences (randomly generated points from likely habitat) to go with our presence points. True absence data are rare in vegetation data, so generating these pseudoabsences is necessary to provide a comparison to the presence data. To reduce bias in the data, we thin the presence points to one per grid cell (size of grid cells) and weight points that are highly isolated from any neighbors. A further check we do is cross-validation, in which different models are tested with 75 randomly selected points for a preliminary analysis of goodness-of-fit. After that, we go through each algorithm and formulate response curves of our points to different environmental variables, which helps us determine which variables best explain variance in the data. We then choose a few preliminary models with the best-fitting response curves, and take the mean of these models for each algorithm. After going through all three, we take our top model choices for each algorithm and combine them into what is known as an ensemble mean. Once this is done, we conduct a last evaluation of performance using the Boyce Index and mask any impervious surfaces in our layers. And voilà! We have a robust distribution map for one of our species.
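For readers who like to see the bookkeeping, here is a minimal Python sketch of the thinning and pseudoabsence steps. It is not our actual workflow code: the function names, the toy coordinates, and the 1.0-unit cell size are all made up for illustration, and a plain bounding box stands in for "likely habitat." Skipping cells that already hold a presence point is also an assumption of this sketch.

```python
import random

def cell_of(point, cell_size):
    """Return the (column, row) index of the grid cell containing a point."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))

def thin_to_grid(points, cell_size):
    """Keep only the first presence point encountered in each grid cell."""
    kept = {}
    for p in points:
        kept.setdefault(cell_of(p, cell_size), p)  # first point in a cell wins
    return list(kept.values())

def sample_pseudoabsences(n, bounds, occupied, cell_size, rng):
    """Draw n random points inside bounds, avoiding cells with a presence.

    Assumes at least one cell inside bounds is unoccupied.
    """
    xmin, ymin, xmax, ymax = bounds
    out = []
    while len(out) < n:
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if cell_of(p, cell_size) not in occupied:
            out.append(p)
    return out

# Hypothetical presence points on a 3 x 3 study area (arbitrary units).
presences = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (1.6, 0.4), (2.7, 2.9)]
cell_size = 1.0  # illustrative value only

thinned = thin_to_grid(presences, cell_size)
occupied = {cell_of(p, cell_size) for p in thinned}

# One pseudoabsence per remaining presence point, as described above.
pseudoabsences = sample_pseudoabsences(
    len(thinned), (0.0, 0.0, 3.0, 3.0), occupied, cell_size, random.Random(42)
)
print(len(thinned), "presences,", len(pseudoabsences), "pseudoabsences")
```

In this toy example the five raw points fall into three grid cells, so three presences survive thinning and three pseudoabsences are drawn to match.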

Models of sample data produced with GAM, Random Forest, and MaxEnt algorithms, as well as an ensemble model (the mean of the previous three outputs).
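The ensemble step itself is simple arithmetic: each grid cell in the ensemble map is the mean of that cell's suitability across the three algorithm outputs. The sketch below shows the idea on tiny 2 x 2 grids with invented suitability values; the function names and the use of `None` for masked cells are assumptions for illustration, not how our raster layers are actually stored.

```python
def ensemble_mean(*grids):
    """Cell-by-cell mean of equally shaped suitability grids (lists of rows)."""
    n = len(grids)
    return [
        [sum(cells) / n for cells in zip(*rows)]
        for rows in zip(*grids)
    ]

def apply_mask(grid, mask):
    """Replace cells flagged True in mask (e.g. impervious surface) with None."""
    return [
        [None if flagged else value for value, flagged in zip(row, mask_row)]
        for row, mask_row in zip(grid, mask)
    ]

# Invented suitability scores (0 to 1) from each algorithm.
gam = [[0.2, 0.8], [0.6, 0.4]]
random_forest = [[0.4, 0.6], [0.9, 0.1]]
maxent = [[0.3, 0.7], [0.3, 0.1]]

ensemble = ensemble_mean(gam, random_forest, maxent)

# Hypothetical mask: the top-right cell is paved, so it is dropped from the map.
impervious = [[False, True], [False, False]]
final_map = apply_mask(ensemble, impervious)
print(final_map)
```

A plain mean treats all three algorithms as equally trustworthy; weighting each model by its evaluation score is a common alternative, but that is not what is described here.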

This one-at-a-time approach takes quite a while, but it’s worth it to get sound results for each species. There are alternative modeling methods that are faster (such as Canonical Correspondence Analysis), but the results they produce are not as robust at the level of individual species. Our method aims to produce the most accurate and useful information possible for land managers in the Mojave Desert. With more disturbance happening in the Desert Southwest than ever before, it’s imperative that we have the tools to make sound, on-the-fly decisions. I’m excited to see this tool put to use in the coming years, and to get a better understanding of its strengths and limitations.

CLM Internship Program Blog

Adventures of a CLM Intern
