MT_20091118

iPG2P Modeling Tools Working Group

November 18th, 2009 - 4pm EST

Attendees: Karla Gendler, Adam Kubach, Ann Stapleton, Chris Myers, Matt Vaughn, Jeff White, Liya Wang, Melanie Correll, Sanjoy Das, Steve Welch

Action Items:

  • All: Provide list of modeling tools currently used
  • Talk with other working groups to see how modeling connects with use cases from the steering committee.
  • Adam and Liya: look at SBML, BioModels.net, and OpenMI
  • Gendler: send out email with Confluence introduction, how organized, where to find meeting notes, etc
  • Gendler: send out poll to establish a regular meeting time

Notes/Agenda:

  1. Tools: what commonalities should we focus on, when should we let 1000 flowers bloom, and how do we connect the two?
    1. Myers asked what the sticking points are in the workflows we execute now and whether there are ways to join forces to help solve some of these problems.  White suggested that perhaps we don't want to build a general solution but instead should be interested in standardizing interfaces.  Das pointed out that everyone writes in their own language and asked whether there will be a way to integrate the models.  Welch commented that the goal would be to work towards a strategy that lets people share, using SBML as an example.  Myers said that the issues that arise in modeling are different from creating a centralized workflow as the other working groups are doing.  Vaughn commented that people tend to use their own code but could be limited by access to data; iPlant should be considered a big tent, and while others are taking a common/centralized approach, this group should think about how to democratize modeling.  Myers said that there will be a need for enhanced computational resources and/or development efforts, and asked how we should prioritize development on certain tools.  The discussion was tabled with an action item that everyone should provide a list of the modeling tools they currently use.
  2. Formats, standards, and interchange: what are they good for?
    1. White stated that in crop modeling, formats and standards are all over the place, which makes it extremely hard to compare models.  He proposed moving to a much more standard interface.  Myers pointed out that SBML can be used to exchange models and work in the same format.  Vaughn asked if SBML makes assumptions about the way models are executed.  Myers answered that you can take SBML and do Monte Carlo or deterministic modeling; there is no specification as to how a model is executed.  Welch added that SBML Level 3 will expand its capabilities and that this group might want to contact an SBML representative to help draft standards.  Myers said that the BioModels.net group in Europe would be a good group to partner with.  Correll added that BioModels.net would be a good framework to look at, with its repository for linking models, and it would at least be something to consider in the iPlant modeling group.
  3. What types of modeling problems will best make use of unique iPlant/TACC resources?
    1. In working with NAM populations, White said that just simulating 500 genotypes can add up in a hurry.  Myers said it would be good to have an infrastructure to manage related but not identical big runs.  White asked if it was possible to put a wrapper around an executable, and how you deal with all of the data this generates.  Welch asked if there are tools that help and whether there are issues on both the input and output sides.  Vaughn said that that is a cyberinfrastructure problem, not necessarily a data integration issue.  Welch said we can help the data integration group by identifying needs.  (A rough sketch of wrapping an executable for per-genotype runs appears after these notes.)
  4. Data integration drives us nuts: how can we convey useful requests and specifications to the Data Integration group?
  5. Personnel: what tasks can we hand to iPlant developers now, and when we find a group postdoc, what will he/she work on?
  6. Use Cases
    1. Welch said that the use cases should be a litmus test: if we are meeting the identified needs, then we are making progress, and with these use cases, larger groups can be involved.  It would be good to begin cataloguing tools now.  Myers sent a request to the group to start listing tools.  White said that he is reluctant to go outside of the group until the group is more focused.  However, with the work on photosynthesis/phenology, both Visual Analytics and Statistical Inference are looking at the NAM lines, and maybe the question is how one would model phenology in maize and perhaps to start mapping out that process.  What data do we have to work with?  Myers suggested RNAseq data.  He also asked where there are connections with other iPlant activities; with photosynthesis there is work with Tom Brutnell.  White would like to look at wheat phenology data.  Myers suggested that the group also work top down, to see how modeling connects to other parts of iPlant.  Correll asked what the other groups are doing and was pointed to Confluence.
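
  A rough, illustrative sketch of the "wrapper around an executable" idea from item 3: the script below launches one run of an existing model executable per genotype input file and keeps each run's output in its own directory.  The binary name (crop_model), its flags, and the directory layout are all hypothetical placeholders; a production version at iPlant/TACC would more likely submit these as batch jobs rather than run them locally.

    # Rough sketch: wrap an existing model executable and run it once per
    # genotype input file.  "crop_model", its flags, and the directory layout
    # are hypothetical placeholders.
    import subprocess
    from pathlib import Path

    input_dir = Path("genotypes")        # one input file per genotype
    output_dir = Path("runs")
    output_dir.mkdir(exist_ok=True)

    for geno_file in sorted(input_dir.glob("*.txt")):
        run_dir = output_dir / geno_file.stem
        run_dir.mkdir(exist_ok=True)
        # Each run writes into its own directory so outputs stay separable.
        subprocess.run(
            ["./crop_model", "--input", str(geno_file), "--outdir", str(run_dir)],
            check=True,
        )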

Expanded agenda:

Modeling Tools group,
Regarding this afternoon's working group teleconference, I've elaborated
a bit on the draft agenda that was circulated previously (included
below).  Whether or not such an elaboration is useful remains to be seen.
Talk to you later,
Chris

  1. Tools: what commonalities should we focus on, when should we let
    1000 flowers bloom, and how do we connect the two?
    Some thoughts on tools from the Steve & Steve Trip Report:
    Selecting and/or developing modeling tool sets.  Tools are needed
    for parameter estimation, sensitivity analysis, verification, and
    model comparison.  Because modeling is such a diversified activity, it
    may be useful for the members of the work group to identify items from
    their own workflows and seek commonality.
    A few general points on each are:
    1. Parameter estimation.  This really equates to the need to optimize
      one or more goodness-of-fit functions [e.g., least squares, maximum
      likelihood, maximum entropy (possibly), or hand-crafted objectives].
      So the real need is for optimizers that can be readily used in a
      generalized fashion.  This need is shared by Statistical Inference.
      As these problems are numerically intensive, parallel approaches
      should be investigated.  Also, both nondeterministic (e.g. particle
      swarm optimization) and deterministic (e.g. DIRECT) algorithms should
      be considered.
    2. Sensitivity analysis.  In principle, three types of sensitivities
      can be investigated, namely to (i) initial conditions, (ii) parameter
      values, and (iii) input values.  Of these, sensitivity to
      parameters is probably most important in the near term.  Tools are
      needed that can explore model responses near an optimally fitting set
      of parameters.  These responses include both the values of model
      outputs and of functions thereof (e.g. least squares values).  Both
      numeric and symbolic derivatives are probably needed, with the latter
      including derivatives of computer source code.  The ability to take
      complicated derivatives will be of assistance in parameter estimation.
      The need exists to visualize the results of sensitivity analyses.
      Sensitivity regions can be expected to extend orders of magnitude
      further in some directions than others.  (A small worked sketch of
      points 1 and 2 follows this list.)
    3. Verification.  Sometimes referred to as "model validation", the
      basic question is whether there exist grounds to reject a model based
      on observations.  There is a large literature on how this might best
      be done.  The question is complicated by the fact that verification
      should be considered in the context of some proposed model use.  In
      research contexts the focus is heavily on model falsification but in
      applied contexts model acceptance may be related to 'acceptable levels
      of error'.
    4. Model comparisons.  The question in this context is generally which
      of two or more models better represents a given set of data.  Again,
      there is literature of various methods from which to choose.  This
      topic is also of relevance to "model selection" in Statistical
      Inference.
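
    As a minimal, illustrative sketch of points 1 and 2 above (with an AIC line touching on point 4), the Python snippet below fits a hypothetical two-parameter logistic curve to synthetic observations by least squares and then probes local parameter sensitivity with finite differences.  The model, data, and parameter names are placeholders, and scipy's least_squares is only one of many possible optimizers; particle swarm or DIRECT would be conceptual alternatives.

      # Minimal sketch: least-squares parameter estimation plus a crude
      # finite-difference sensitivity probe around the fitted optimum.
      # The model and "observations" are placeholders, not iPlant tools.
      import numpy as np
      from scipy.optimize import least_squares

      def logistic_model(params, t):
          """Hypothetical two-parameter logistic growth curve."""
          r, K = params
          return K / (1.0 + (K - 1.0) * np.exp(-r * t))

      def residuals(params, t, observed):
          return logistic_model(params, t) - observed

      # Synthetic data standing in for real growth/phenology observations.
      t = np.linspace(0.0, 10.0, 25)
      observed = logistic_model([0.8, 50.0], t) + np.random.normal(0.0, 1.0, t.size)

      # 1. Parameter estimation: minimize the sum of squared residuals.
      fit = least_squares(residuals, x0=[0.5, 30.0], args=(t, observed))
      print("fitted parameters:", fit.x)

      # 2. Sensitivity analysis: change in the objective for a 1% bump in each
      #    fitted parameter.
      def sse(params):
          return float(np.sum(residuals(params, t, observed) ** 2))

      base = sse(fit.x)
      for i, name in enumerate(["r", "K"]):
          bumped = fit.x.copy()
          step = 0.01 * fit.x[i]
          bumped[i] += step
          print(name, "sensitivity ~", (sse(bumped) - base) / step)

      # 4. Model comparison (sketch): the same residuals feed a least-squares
      #    AIC, which could be compared against an alternative model's AIC.
      n, k = t.size, fit.x.size
      print("AIC:", n * np.log(base / n) + 2 * k)
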
  2. Formats, standards, and interchange: what are they good for?
    A useful discussion of at least some standards, model formats, and
    ontologies is being developed at BioModels.net (e.g., SBML, MIRIAM, SBGN).
    On a related point, it might make sense for iPlant to partner with
    BioModels.net to (a) provide a home/portal for plant-specific models
    and (b) provide more substantial computational resources for online
    simulation.
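
    As a rough illustration of consuming such an interchange format, the snippet below reads an SBML file and lists its species and reactions, assuming the python-libsbml bindings are installed; the file name is a placeholder.  Consistent with the discussion above, nothing here prescribes how the model is simulated.

      # Minimal sketch: inspect an SBML model with the libSBML Python bindings.
      # "plant_model.xml" is a placeholder file name.
      import libsbml

      doc = libsbml.readSBML("plant_model.xml")
      if doc.getNumErrors() > 0:
          doc.printErrors()                  # report parse/consistency problems
      else:
          model = doc.getModel()
          print("species:")
          for i in range(model.getNumSpecies()):
              print("  ", model.getSpecies(i).getId())
          print("reactions:")
          for i in range(model.getNumReactions()):
              print("  ", model.getReaction(i).getId())
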
  3. What types of modeling problems will best make use of unique
    iPlant/TACC resources?
    There is generally a sense (among those of us who have been discussing
    it) that modeling problems of current interest become "big" when we
    consider explorations across spaces of parameters, initial conditions,
    and populations.  Among other things, there are data management and data
    integration problems that arise in coordinating sets of simulations.
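
    To make the point concrete, a minimal sketch of such an exploration: the snippet below runs a small grid of parameter combinations in parallel and records each output next to the parameter values that produced it.  The model function and parameter names are placeholders; at realistic scales this bookkeeping is exactly where the data management and integration problems appear.

      # Minimal sketch: a parallel parameter sweep whose outputs are recorded
      # together with the parameter values that produced them.  The model
      # function and parameter names are placeholders.
      import csv
      import itertools
      from multiprocessing import Pool

      def run_model(params):
          """Stand-in for a real simulation; returns one summary output."""
          growth_rate, capacity = params
          return growth_rate * capacity      # placeholder computation

      if __name__ == "__main__":
          grid = list(itertools.product([0.2, 0.4, 0.8], [10.0, 50.0, 100.0]))

          with Pool() as pool:
              outputs = pool.map(run_model, grid)

          # Keep parameters and outputs together so runs remain identifiable.
          with open("sweep_results.csv", "w", newline="") as f:
              writer = csv.writer(f)
              writer.writerow(["growth_rate", "capacity", "output"])
              for (rate, cap), out in zip(grid, outputs):
                  writer.writerow([rate, cap, out])
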
  4. Data integration drives us nuts: how can we convey useful requests
    and specifications to the Data Integration group?
  5. Personnel: what tasks can we hand to iPlant developers now, and when
    we find a group postdoc, what will he/she work on?
  6. Use cases:
    1. the intersection of photosynthesis/carbon metabolism
      and flowering time
    2. hypothesis-generation through data-mining,
      processing, and visualization
    3. lignin biosynthesis (interest from a group at NCSU working to develop models from detailed 'omics datasets).