2014.06.24 BIEN3 Range Modeling

June 24, 2014

Participants

Nathan Casler, Naia Moureta Holme, Cory Merow, Brian McGill, Martha Narro

Agenda

  • SWD creation
  • Species for the test run (1pt,2pt,5pt, mix)
  • What can be modeled for ESA in August

Notes

SWD creation

  • Dismo SWD removes duplicate points and creates reference to the grid squares.
    • A unique ID index may also be helping.
  • The R package Cory wrote didn’t really seem to run faster.
  • SWD is a bit faster than dismo: 20X vs 21X faster.
  • Could go either way. It depends on how fast it is to make our own SWD files. Doing that would mean moving smaller files around. Producing our own SWD files would also let us use the same background sampling for each species.
  • What is most important for the first pass?
  • Due to time constraints, Cory will need to make some recommendations without exhaustively exploring them.
  • Decision: Use SWD and call maxent from the command line.
  • Nathan has almost everything in place to run.
  • Just dealing with a Java issue with John Fonner.
  • Cory will send Nathan latest code with comments.
  • Cory is comfortable as long as he knows a re-run will be done in the future.
    • First a test run will be done, so all is fine. This isn't the final run.
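
A minimal sketch of what producing our own SWD file could look like, assuming one occurrence point is kept per grid square and the predictor values have already been extracted per square. The function names, the `predictors` lookup, and the `longitude`/`latitude` header names are illustrative assumptions, not the code Cory or Nathan are using.

```python
import csv
import io


def cell_id(x, y, cell_size):
    """Return the (column, row) index of the grid square containing (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def build_swd_rows(species, points, predictors, cell_size):
    """Keep one occurrence point per grid square and attach predictor values.

    points: list of (x, y) tuples in the same units as cell_size.
    predictors: dict mapping a cell id to its list of predictor values
                (hypothetical lookup, assumed to be precomputed).
    """
    seen = set()
    rows = []
    for x, y in points:
        cid = cell_id(x, y, cell_size)
        if cid in seen or cid not in predictors:
            continue  # duplicate within the square, or no predictor data
        seen.add(cid)
        rows.append([species, x, y] + list(predictors[cid]))
    return rows


def write_swd(handle, predictor_names, rows):
    """Write rows as CSV: species, coordinates, then one column per predictor."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"] + list(predictor_names))
    writer.writerows(rows)
```

Because the background sample can be written once in the same layout, every species can share it, which is the advantage noted above.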

Which species should be run?

  • Bounding box computation for species having only 1, 2, or 3 occurrence points doesn’t need to be run on TACC.
  • Only maxent models for species having 4 and >=5 occurrence points will be run on TACC.
  • Need a clean version of files without duplicates.
  • Would it be difficult to overlay obs pts on landscape?
  • Extracting the number of points per cell isn’t difficult. Depends more on how many species have over a few hundred points.
  • It won’t be hard.
  • Still need to compute the convex hull on all points if a species drops from, say, 12 points to 1-3 points after duplicates are removed.
  • Cell size is 100 x 100 km. Or is it 10 x 10 km? (The latter is for forest modeling only.)
  • Nathan will write a script to determine which species go into which model based on the number of points after removing duplicates within a cell.
  • Create SWD without duplicates for maxent modeling.
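
The sorting step described above could be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be in a projected system with the same units as `cell_size` (the notes leave the cell size open), and the `hull_on_all_points` flag reflects one reading of the convex hull rule, namely that it applies when a species has 4 or more raw points but 1-3 after duplicates within a cell are removed. All names are hypothetical.

```python
from collections import defaultdict


def classify_species(records, cell_size):
    """Assign each species to a model from its de-duplicated point count.

    records: iterable of (species, x, y) tuples.
    Returns a dict: species -> {"raw", "unique", "model", "hull_on_all_points"}.
    """
    raw_counts = defaultdict(int)
    cells = defaultdict(set)
    for species, x, y in records:
        raw_counts[species] += 1
        # Points falling in the same grid square count once.
        cells[species].add((int(x // cell_size), int(y // cell_size)))

    result = {}
    for species, occupied in cells.items():
        unique = len(occupied)
        raw = raw_counts[species]
        result[species] = {
            "raw": raw,
            "unique": unique,
            # 4 or more unique points: maxent (TACC); 1-3: bounding box (local).
            "model": "maxent" if unique >= 4 else "bounding_box",
            # Assumed rule: hull uses all points when de-duplication
            # pushed the species down into the 1-3 point group.
            "hull_on_all_points": raw >= 4 and unique <= 3,
        }
    return result
```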

What about using smaller sets of predictors (instead of the full WorldClim set)?

  • Impacts run time (fewer variables, shorter run time).
  • Need more background points to get the model to converge, but could reduce the number of predictors.
  • Temperature, precipitation, minimum temperature, rainfall during warmest quarter (growing season), seasonality (of temp and precip) are really what’s useful (most biologically relevant) from WorldClim.
  • Use those four variables and the top four spatial eigenvectors.
  • Cory will check for over-fitting.
  • When comparing climate vs space, need equal number of climate and landscape predictors. This is not necessary for general good prediction.

Outliers

  • Naia: Do outliers need to be cut out? Yes.
  • Was done in John’s code.
  • Points that were more than 1000 km away.
  • Naia will resend her email message about this.
  • For first test run, just change number of background points? Yes.
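
For illustration only, a distance-based outlier filter might look like the sketch below. The notes do not say what the 1000 km is measured from, and John's code is not shown here, so this version assumes a point is an outlier when its nearest neighbor of the same species is more than 1000 km away. The function names and the nearest-neighbor rule are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def drop_outliers(points, max_km=1000.0):
    """Keep points whose nearest neighbor is within max_km (assumed rule).

    points: list of (lon, lat) tuples for one species.
    A species with fewer than two points is returned unchanged.
    """
    if len(points) < 2:
        return list(points)
    kept = []
    for i, (lon, lat) in enumerate(points):
        nearest = min(
            haversine_km(lon, lat, lon2, lat2)
            for j, (lon2, lat2) in enumerate(points)
            if j != i
        )
        if nearest <= max_km:
            kept.append((lon, lat))
    return kept
```

The pairwise comparison is quadratic in the number of points, which is acceptable for a sketch but would need a spatial index for species with many thousands of records.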

What can be modeled for ESA in August

  • Is the "year" data stored in VegBIEN for resampled plots?
    • Brian Mc will check.
    • Would removing duplicates take care of the date problem?
  • The group could present the results of a test run instead of the full run if all the data aren't ready (QC).
  • Planning to do the North American trees which are the high quality data.
  • If there is a list of things to check, Brian could send it out. Naia and others could help.

Wrap up

  • Next call in 2 weeks, on July 8th.
  • Nathan is free the beginning of next week.
  • Set maximum background points to 30K.

Decisions

  • Use SWD and call maxent from the command line.
    • Reasons: Though SWD is only a bit faster than dismo (20X vs 21X faster), it isn't compute intensive to create our own SWD files. Doing that would mean moving smaller files around, and producing our own SWD files would also let us use the same background sampling for each species.
  • Only maxent models for species having 4 and >=5 occurrence points will be run on TACC. Models of species with 1-3 points will be run on local resources.
    • Determine which species go into which model based on the number of points after removing duplicates within a cell.
    • Regarding removal of duplicates, compute the convex hull on all points if a species drops from, say, 12 points to 1-3 points after duplicates are removed.
    • Create SWD without duplicates for maxent modeling.
  • WorldClim predictors to use: Temperature, precipitation, minimum temperature, rainfall during warmest quarter (growing season), seasonality (of temp and precip).
  • Set the maximum number of background points to 30,000 for maxent.
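
As a rough sketch of how the first and last decisions fit together, the helper below builds a maxent command line that reads SWD files and caps background points at 30,000. The file names, the jar location, and the helper itself are hypothetical; the `key=value` arguments follow the usual Maxent command-line conventions but should be checked against the help for the Maxent version installed on TACC.

```python
def maxent_command(swd_samples, swd_background, out_dir,
                   max_background=30000, jar="maxent.jar"):
    """Build the argument list for one maxent run on SWD input.

    swd_samples: SWD file of occurrence points (duplicates already removed).
    swd_background: SWD file of background points shared by all species.
    """
    return [
        "java", "-jar", jar,
        f"samplesfile={swd_samples}",
        f"environmentallayers={swd_background}",
        f"outputdirectory={out_dir}",
        f"maximumbackground={max_background}",
        "redoifexists",  # overwrite results from an earlier run
        "autorun",       # start without waiting for the GUI
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` on a node where Java and the jar are available.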

Next Steps

Nathan

  • Write a script to determine which species go into which model based on the number of points after removing duplicates within a cell.
  • Do a test run!

Cory 

  • Share the latest code with Nathan.
  • Check for over-fitting of models.

Naia 

  • Resend the email message describing how to handle outliers. (DONE)

Brian Mc

  • Check to see if data on "year" is stored in the normalized VegBIEN database (BIEN3) for resampled plots.
  • Send out a list of data to be checked in VegBIEN so others can help check it.