2014.04.22 BIEN3 Range Modeling

Range Modeling

April 22, 2014

Participants

Peder Bocher, Cory Merow, Nathan Casler, Naia Morueta Holme, Brian Enquist, Martha Narro

Agenda

  • Progress getting rolling at TACC (Nathan, Peder)
  • Modeling (All)

Action Items (NOT YET DONE)

  • Brian McGill: Apply for an XSEDE account for the production runs (see April 15 email "Apply for XSEDE account").
  • Brian Enquist: Summarize and circulate the plan for the Western Forest modeling so the larger group can provide input.
  • Both Brians: Send Nicole and Martha examples of tools and workflows that would enable the community to make use of the range models (maps). This was requested by the committee that reviewed the Extended Collaborative Support request.

Notes

Getting started on TACC systems

  • Nathan is in TACC systems and going through tutorials.
  • He's parsing out the R scripts.
    • Nathan should ask John Fonner if questions come up about how best to do so.
  • Nathan will put together documentation about how to run at TACC for Peder and Naia.
  • John Fonner will install the parametric launcher on Maverick.
    • Martha will keep on top of the parametric launcher installation.
  • Next week Nathan and Peder will have a call to get Peder up to speed on what Nathan is doing to run at TACC.
    • They will use Google Hangouts so they can share desktops.

Modeling

  • Cory has been working on a couple of things to speed up run times.
  • Dismo (R package) writes out presence and absence files and does the computations on them. Maxent on the command line runs twice as fast on the files generated by dismo. Cory can't figure out what is different about the files dismo writes out.
    • Cory wrote to the dismo authors. Jane didn't know of anything that would make those dismo files different.
    • Jane gave Cory another (not yet released) version of Maxent that works with the GRD file format, which will likely be faster.
  • Sampling of background is fast, which surprised Cory, so the existing code is fine.
  • Cory wants to read in the environmental and background files only once per batch.
  • A larger background sample improves the quality of the model. Cory is still experimenting with it.
  • Cory is starting with a couple of species with large and small sample sizes, running locally. After those, he'll run the 100 western forest species.
  • Cory is working out the optimal background size (number of samples) and the optimal regularization multiplier.
  • Cory wants to be able to run on local resources since his work will usually be on a small set of species, so he wants the scripts to be efficient.
  • Efficiency is important both for running locally and for running on TACC systems since compute cycles are expensive on those systems and they are in high demand.
  • Sample size variation
    • A lot of species have very few observations.
    • Approximately 40% of species have sample size <=5.
    • Need to think about how the methods apply to species with low sample sizes.
    • Spatial autocorrelation will also need to be looked at.
    • The issue is how to deal with large differences in sample size (number of observations) across species, especially for forecasting.
    • Cory will send a document around with his thoughts on how to address (I think) size variation across species.
    • Nathan will send information out about running at TACC.
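
A minimal sketch of the batching idea discussed above: read the shared background data once per batch, then reuse it for every species while trying several background sizes and regularization multipliers. This is not Cory's code. The function names, the toy species, and the grid values are all hypothetical, and `fit_model` is only a placeholder where a real Maxent run would go.

```python
import itertools
import random


def load_background_pool(n_cells, seed=0):
    """Stand-in for the expensive read of environmental/background files."""
    load_background_pool.calls += 1  # count reads to show it happens once
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_cells)]


load_background_pool.calls = 0


def fit_model(presences, background, reg_multiplier):
    """Hypothetical placeholder for a Maxent run; returns a summary only."""
    return {
        "n_presence": len(presences),
        "n_background": len(background),
        "reg_multiplier": reg_multiplier,
    }


def run_batch(species_presences, background_sizes, reg_multipliers,
              n_cells=10000, seed=0):
    """Run every species over the settings grid with a single shared read."""
    pool = load_background_pool(n_cells, seed)  # read once per batch
    rng = random.Random(seed)
    results = []
    for species, presences in species_presences.items():
        for n_bg, reg in itertools.product(background_sizes, reg_multipliers):
            # Draw a background sample of the requested size from the pool.
            background = rng.sample(pool, min(n_bg, len(pool)))
            summary = fit_model(presences, background, reg)
            summary["species"] = species
            results.append(summary)
    return results


# Toy batch: one species with few observations, one with many.
batch = {
    "species_a": [(0.1, 0.2)] * 3,
    "species_b": [(0.5, 0.5)] * 250,
}
results = run_batch(batch, background_sizes=[1000, 5000],
                    reg_multipliers=[0.5, 1.0, 2.0])
print(len(results), "runs;", load_background_pool.calls, "read of shared data")
```

Keeping the read outside the species loop means its cost is paid once per batch rather than once per species, which matters both locally and at TACC.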

Next Steps

  • Nathan: Put together and send out documentation about how to run at TACC.
  • Nathan and Peder: Have a call next week to get Peder up to speed on running at TACC. STATUS: Scheduling with Peder.
  • Cory: Send a document around with thoughts on how to address (I think) size variation across species.
    Next week we'll devote some time to discussion of sample size variation across species. STATUS: too few participants on 4-29 call.
  • Martha: Check on parametric launcher installation on Maverick. No word yet.
  • All: Next call in one week, April 29th.