Goal: Create monthly species distribution models, and graphical representations of them, for North American butterfly species, based on data from citizen science efforts in eButterfly.
The resource could be used to identify under-sampled areas of high predicted species diversity.
Citizen scientists could use the maps to guide their efforts toward areas of high diversity or toward targeted species.
1. Retrieve historical climate data from WorldClim (http://www.worldclim.org)
2. Get a list of all species in the databases (eButterfly & iNaturalist)
3. Get lat/long data for one species from the databases
4. Extract data for one month
5. Perform quality check (minimum # observations)
6. Run SDM (Species Distribution Model)
   Open question: could future climate data be used for future range projections?
7. Create graphic with standardized name for use on eButterfly.org web resource
8. Repeat steps 4-7 for remaining months
9. Repeat steps 3-7 for remaining species
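
The loop over species and months described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the threshold of 20 observations, the record layout (lat, lon, date), the function names, and the file naming scheme are all hypothetical and not taken from eButterfly. The SDM itself is left out; the sketch only decides which species/month combinations have enough data to be modelled and what each graphic would be called.

```python
from collections import defaultdict
from datetime import date

MIN_OBSERVATIONS = 20  # assumed threshold; the notes do not specify one


def group_by_month(observations):
    """Group (lat, lon, date) records by calendar month (1-12)."""
    by_month = defaultdict(list)
    for lat, lon, observed_on in observations:
        by_month[observed_on.month].append((lat, lon))
    return by_month


def graphic_name(species, month):
    """Build a standardized file name (hypothetical scheme: genus_species-MM.png)."""
    slug = "_".join(species.lower().split())
    return f"{slug}-{month:02d}.png"


def plan_runs(observations_by_species, min_obs=MIN_OBSERVATIONS):
    """Return (species, month, points, file name) for each month that passes the check."""
    runs = []
    for species in sorted(observations_by_species):
        by_month = group_by_month(observations_by_species[species])
        for month in sorted(by_month):
            points = by_month[month]
            if len(points) < min_obs:
                continue  # quality check: too few observations to model
            runs.append((species, month, points, graphic_name(species, month)))
    return runs


if __name__ == "__main__":
    # Made-up coordinates and dates, for illustration only.
    demo = {
        "Danaus plexippus": [
            (32.2, -110.9, date(2015, 6, 1)),
            (32.3, -110.8, date(2016, 6, 12)),
            (45.5, -73.6, date(2016, 7, 4)),
        ]
    }
    for species, month, points, name in plan_runs(demo, min_obs=2):
        print(species, month, len(points), name)
```

Each tuple returned by plan_runs would then be handed to whatever SDM routine is chosen, and the resulting map saved under the given file name.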
Computational / data bottlenecks:
Over 250,000 observations in eButterfly and 120,000 observations in iNaturalist
Queries for a species' data might take seconds (or more?)
Access to the live eButterfly database is not currently possible; we would have to keep a copy somewhere and work from that (the long-term solution would use the live database, but not right now)
iNaturalist database can be queried via their API (https://api.inaturalist.org/v1/)
Lat/long data in eButterfly are stored as text (not numeric), in varying formats (decimal degrees, degrees-minutes-seconds, or some combination of the two). Will need either to drop every record that is not in decimal degree format (the format used by the majority of records) or, ideally, to detect the format and convert to decimal degrees
There will be sampling biases in these data - observations will be clustered around urban areas (because that's where people are); collaborator R. Hutchinson (Oregon State University) might be able to help with this.
Will need to run at least one SDM per species every month, to keep it up to date with latest observations
Could check whether records were actually added / updated to decide whether the SDM needs to run again; but this check would need to be efficient enough to take fewer resources than another SDM run
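
Two of the items above, converting text coordinates to decimal degrees and cheaply checking whether a species' records have changed, might be sketched as follows. This is a rough sketch under assumptions, not the eButterfly implementation: the accepted input formats, the record layout, and both function names are hypothetical. The change check hashes the records and compares the digest with one stored from the previous run, which should cost far less than fitting an SDM.

```python
import hashlib
import re

_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_decimal_degrees(text, max_abs=180.0):
    """Convert one coordinate string to decimal degrees, or return None.

    Handles decimal degrees ("-110.93"), degrees-minutes-seconds
    ("32° 13' 18\" N"), and degrees with decimal minutes ("110 55.8 W").
    Use max_abs=90.0 for latitudes.
    """
    cleaned = text.strip().upper()
    if not cleaned:
        return None
    parts = [float(p) for p in _NUMBER.findall(cleaned)]
    if not 1 <= len(parts) <= 3:
        return None  # no numbers, or too many to be one coordinate
    degrees, minutes, seconds = (parts + [0.0, 0.0])[:3]
    if minutes >= 60 or seconds >= 60:
        return None
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative, as is a leading minus sign.
    if cleaned.startswith("-") or cleaned[0] in "SW" or cleaned[-1] in "SW":
        value = -value
    return value if abs(value) <= max_abs else None


def records_fingerprint(records):
    """Hash a species' records; any added or updated record changes the digest."""
    lines = sorted("|".join(str(field) for field in record) for record in records)
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


if __name__ == "__main__":
    for raw in ["-110.93", "32° 13' 18\" N", "110 55.8 W", "N/A"]:
        print(repr(raw), "->", to_decimal_degrees(raw))

    # Made-up records: (record id, lat, lon, date).
    old = [("a1", 32.2, -110.9, "2016-06-12")]
    new = old + [("a2", 32.3, -110.8, "2016-06-20")]
    print("rerun SDM:", records_fingerprint(old) != records_fingerprint(new))
```

Strings that cannot be interpreted return None, so such records can be set aside for manual review rather than silently dropped.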