

Goal: Create monthly species distribution models, and graphical representations of them, for North American butterfly species, based on data from citizen science efforts in eButterfly.


  1. The resource could be used to identify under-sampled areas of high predicted species diversity

  2. Citizen scientists would use the maps to guide their efforts to areas of high diversity or for targeted species

General approach:

  1. Retrieve historical climate data

  2. Get a list of all species in databases (eButterfly & iNaturalist)

  3. Get lat/long data for one species from databases

  4. Extract data for one month

  5. Perform quality check (minimum number of observations)

  6. Run SDM (Species Distribution Model)

    1. Any chance that future climate data could be used for future range projections?

  7. Create graphic with standardized name for use on web resource

Repeat steps 4-7 for remaining months
Repeat steps 3-7 for remaining species
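
A minimal sketch of steps 4, 5 and 7 for one species is given here in Python; it does not run an SDM, it only decides which months qualify and what the output file would be called. The record fields (`month`, `lat`, `lon`), the threshold of 10 observations, and the file-name pattern are all assumptions for illustration; none of them is specified above.

```python
from collections import defaultdict

# Assumed threshold; the notes above do not give a minimum number.
MIN_OBSERVATIONS = 10


def group_by_month(records):
    """Group (lat, lon) pairs by calendar month (1-12)."""
    by_month = defaultdict(list)
    for rec in records:
        by_month[rec["month"]].append((rec["lat"], rec["lon"]))
    return by_month


def output_name(species, month):
    """Build a standardized file name (hypothetical pattern: genus_species-MM.png)."""
    slug = "_".join(species.lower().split())
    return f"{slug}-{month:02d}.png"


def plan_runs(species, records, min_obs=MIN_OBSERVATIONS):
    """Return (month, file name, points) for each month passing the quality check."""
    runs = []
    for month, points in sorted(group_by_month(records).items()):
        if len(points) < min_obs:
            continue  # too few observations this month; skip the SDM
        runs.append((month, output_name(species, month), points))
    return runs
```

Calling `plan_runs` once per species and passing each returned month on to the SDM step would cover the two "repeat" loops above.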

Computational / data bottlenecks:

  • Over 250,000 observations in eButterfly and 120,000 observations in iNaturalist

    • Queries for a species' data might take seconds (or more?)

    • Access to the live eButterfly database is not currently possible; we would need a copy hosted somewhere and work from that (a long-term solution would use the live database, but not right now)

    • iNaturalist database can be queried via their API (

  • Data quality

    • Lat/long data in eButterfly are stored as text (not numeric), with varying formats (decimal degrees, degrees-minutes-seconds, some combination of the two). Will need to cull the data of everything that is not in decimal degree format (the majority of records), or ideally detect the format and convert to decimal degrees

    • There will be sampling biases in these data - observations will be clustered around urban areas (because that's where people are); collaborator R. Hutchinson (Oregon State University) might be able to help with this.

  • Will need to run at least one SDM per species every month, to keep it up to date with latest observations

    • Could do a check to determine whether records were actually added or updated, to decide whether the SDM needs to run again; but the check would need to use fewer resources than rerunning the SDM
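
Two of the points above can be sketched in Python: converting lat/long text to decimal degrees, and checking cheaply whether a species' records have changed. This is a rough sketch under assumptions, not a tested solution for the real data: it assumes coordinates use symbols or spaces as separators (not letters such as "d" or "m"), that a hemisphere letter, if present, comes last, and that each record can be represented as a tuple of comparable values. The function names are hypothetical.

```python
import hashlib
import re


def to_decimal_degrees(text):
    """Convert a coordinate string to decimal degrees; return None if unparseable."""
    cleaned = text.strip().upper()
    hemi = ""
    if cleaned and cleaned[-1] in "NSEW":
        hemi, cleaned = cleaned[-1], cleaned[:-1]
    if re.search(r"[A-Z]", cleaned):
        return None  # unexpected letters; leave for manual review
    numbers = re.findall(r"\d+(?:\.\d+)?", cleaned)
    if not 1 <= len(numbers) <= 3:
        return None
    if any("." in n for n in numbers[:-1]):
        return None  # only the last component may carry a fraction
    deg, minutes, seconds = (list(map(float, numbers)) + [0.0, 0.0])[:3]
    if minutes >= 60 or seconds >= 60:
        return None
    value = deg + minutes / 60 + seconds / 3600
    if value > 180:
        return None
    negative = cleaned.startswith("-") or hemi in ("S", "W")
    return -value if negative else value


def records_fingerprint(records):
    """Hash a species' records so a later run can tell whether anything changed."""
    digest = hashlib.sha256()
    for rec in sorted(records):  # sort so record order does not matter
        digest.update(repr(rec).encode("utf-8"))
    return digest.hexdigest()
```

Storing the fingerprint alongside each species' latest model and comparing it before the monthly run would let unchanged species be skipped. Whether that is cheaper than the SDM itself depends on how quickly the records can be pulled.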
