iPG2P Modeling Tools Working Group

November 18th, 2009 - 4pm EST

**Attendees:** Karla Gendler, Adam Kubach, Ann Stapleton, Chris Myers, Matt Vaughn, Jeff White, Liya Wang, Melanie Correll, Sanjoy Das, Steve Welch

**Action Items:**

- All: Provide list of modeling tools currently used
- Talk with other working groups to see how modeling connects with use cases from steering committee.
- Adam and Liya: look at SBML, BioModels.net, and OpenMI
- Gendler: send out email with Confluence introduction, how organized, where to find meeting notes, etc
- Gendler: send out poll to establish a regular meeting time

**Notes/Agenda**:

- Tools: what commonalities should we focus on, when should we let 1000 flowers bloom, and how do we connect the two?
- Myers asked what the sticking points are in the workflows we execute now, and whether there are ways to join forces to help solve some of these problems. White suggested that perhaps we don't want to build a general solution but instead should be interested in standardizing interfaces. Das pointed out that everyone writes in their own language, so will there be a way to integrate the models? Welch commented that the goal would be to work towards a strategy that lets people share, using SBML as an example. Myers said that the issues that arise in modeling are different from creating a centralized workflow like the other working groups are doing. Vaughn commented that people tend to use their own code but could be limited by access to data. Consider iPlant as a big tent; while others are taking a common/centralized approach, this group should think about how to democratize modeling. Myers said that there will be enhanced computational and/or development needs, and asked how we prioritize development on certain tools. The discussion was tabled with an action item that everyone should provide a list of the modeling tools they currently use.

- Formats, standards, and interchange: what are they good for?
- White stated that crop modeling is all over the place in formats and standards, which makes it extremely hard to compare models. He proposed moving to a much more standard interface. Myers pointed out that SBML can be used to exchange models and work in the same format. Vaughn asked if SBML makes assumptions about the way models are executed. Myers answered that you can take SBML and do Monte Carlo or deterministic modeling; there is no specification as to how a model is executed. Welch added that SBML version 3 will expand its capabilities and that this group might want to contact someone from the SBML community to help draft standards. Myers said that the BioModels.net group in Europe presents a good partnering opportunity. Correll added that BioModels.net, with its repository for linking models, would be a good framework to look at and at least something to consider in the iPlant modeling group.
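To make the interchange discussion concrete, here is a minimal sketch of what an SBML document looks like and how its species can be listed using only Python's standard library. Real pipelines would use libsbml; the toy model id and species names below are invented for illustration.

```python
# Sketch: inspecting an SBML document with only the standard library.
# (Real work would use libsbml; the model and species names are invented.)
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2"  # SBML Level 2 Version 1 namespace

# A toy SBML Level 2 document: one model, one compartment, two species.
sbml_text = """<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_phenology">
    <listOfCompartments>
      <compartment id="leaf"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="FT_protein" compartment="leaf" initialAmount="0"/>
      <species id="sucrose" compartment="leaf" initialAmount="10"/>
    </listOfSpecies>
  </model>
</sbml>"""

def list_species(sbml_string):
    """Return the ids of all species declared in an SBML string."""
    root = ET.fromstring(sbml_string)
    return [s.get("id") for s in root.iter("{%s}species" % SBML_NS)]

print(list_species(sbml_text))  # ['FT_protein', 'sucrose']
```

Because SBML describes the model's structure (species, compartments, reactions) rather than a solution procedure, the same document can feed either stochastic or deterministic simulators, which is the point Myers raised above.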

- What types of modeling problems will best make use of unique iPlant/TACC resources?
- In working with NAM populations, White said that just simulating 500 genotypes can add up in a hurry. Myers said it would be good to have an infrastructure to manage related but not identical big runs. White asked if it was possible to put a wrapper around an executable, and how do you deal with all of the data this generates? Welch asked if there are tools that help, and whether there are issues on both the input and output sides. Vaughn said that that is a cyberinfrastructure problem, not necessarily a data integration issue. Welch said we can help the data integration group by identifying needs.
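The "500 genotypes" problem above can be sketched as a coordinated sweep of one model over many parameter sets. The logistic-growth function here is a hypothetical stand-in for a crop model, and the genotype names and parameters are invented; in practice each run might instead wrap an external executable via subprocess, which is the kind of wrapper White asked about.

```python
# Sketch: coordinating many related-but-not-identical runs, as in simulating
# hundreds of NAM genotypes. The growth model is an invented stand-in; a real
# sweep might wrap an external crop-model executable instead.
from concurrent.futures import ProcessPoolExecutor

def simulate_genotype(params):
    """Toy stand-in for one crop-model run: logistic growth over 100 days."""
    rate, capacity = params["rate"], params["capacity"]
    biomass = 1.0
    for _ in range(100):
        biomass += rate * biomass * (1 - biomass / capacity)
    return params["name"], round(biomass, 2)

# One parameter set per genotype (three here; 500+ in the use case above).
genotypes = [
    {"name": "NAM-001", "rate": 0.05, "capacity": 120.0},
    {"name": "NAM-002", "rate": 0.07, "capacity": 100.0},
    {"name": "NAM-003", "rate": 0.04, "capacity": 150.0},
]

if __name__ == "__main__":
    # Fan the runs out across local cores; on an HPC resource each entry
    # would instead become a batch job, with outputs catalogued per genotype.
    with ProcessPoolExecutor() as pool:
        results = dict(pool.map(simulate_genotype, genotypes))
    print(results)
```

Even this toy version surfaces the data-management question raised above: every run produces an output keyed to its inputs, and those pairings must be tracked somewhere.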

- Data integration drives us nuts: how can we convey useful requests and specifications to the Data Integration group?
- Personnel: what tasks can we hand to iPlant developers now, and when we find a group postdoc, what will he/she work on?
- Use Cases
- Welch said that the use cases should be a litmus test; if we are meeting the identified needs, then we are making progress, and with these use cases, larger groups can be involved. It would be good to begin cataloguing tools now. Myers sent a request to the group to start listing tools. White said that he is reluctant to go outside of the group until the group is more focused. However, with the work on photosynthesis/phenology, both Visual Analytics and Statistical Inference are looking at the NAM lines, and maybe the question is how one would model phenology in maize, and perhaps to start mapping out that process. What data do we have to work with? Myers suggested RNAseq data. He also asked where there are connections with other iPlant activities; with photosynthesis there is work with Tom Brutnell. White would like to look at wheat phenology data. Myers suggested that the group also work top down, to see how modeling connects to other parts of iPlant. Correll asked what the other groups are doing and was pointed to Confluence.

Expanded agenda:

Modeling Tools group,

Regarding this afternoon's working group teleconference, I've elaborated

a bit on the draft agenda that was circulated previously (included

below). Whether or not such an elaboration is useful remains to be seen.

Talk to you later,

Chris

- Tools: what commonalities should we focus on, when should we let 1000 flowers bloom, and how do we connect the two?

**Some thoughts on tools from the Steve & Steve Trip Report:** Selecting and/or developing modeling tool sets. Tools are needed for parameter estimation, sensitivity analysis, verification, and model comparison. Because modeling is such a diversified activity, it may be useful for the members of the work group to identify items from their own workflows and seek commonality.

A few general points on each are:

- Parameter estimation. This really equates to the need to optimize one or more goodness-of-fit functions [e.g., least squares, maximum likelihood, maximum entropy (possibly), or hand-crafted objectives]. So the real need is for optimizers that can be readily used in a generalized fashion. This need is shared by Statistical Inference. As these problems are numerically intensive, parallel approaches should be investigated. Also, both nondeterministic (e.g., particle swarm optimization) and deterministic (e.g., DIRECT) algorithms should be considered.
- Sensitivity analysis. In principle, three types of sensitivities can be investigated, namely to (i) initial conditions, (ii) parameter values, and (iii) input values. Of these, sensitivity to parameters is probably most important in the near term. Tools are needed that can explore model responses near an optimally fitting set of parameters. These responses include both the values of model outputs and of functions thereof (e.g., least-squares values). Both numeric and symbolic derivatives are probably needed, with the latter including derivatives of computer source code. The ability to take complicated derivatives will be of assistance in parameter estimation. The need exists to visualize the results of sensitivity analyses. Sensitivity regions can be expected to extend orders of magnitude further in some directions than others.
- Verification. Sometimes referred to as "model validation", the basic question is whether there exist grounds to reject a model based on observations. There is a large literature on how this might best be done. The question is complicated by the fact that verification should be considered in the context of some proposed model use. In research contexts the focus is heavily on model falsification, but in applied contexts model acceptance may be related to 'acceptable levels of error'.
- Model comparisons. The question in this context is generally which of two or more models better represents a given set of data. Again, there is a literature of various methods from which to choose. This topic is also of relevance to "model selection" in Statistical Inference.
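As a concrete illustration of the parameter-estimation point, the sketch below minimizes a least-squares objective by coarse grid search, a deliberately simple stand-in for the optimizers mentioned above (particle swarm, DIRECT). The exponential model and "observations" are invented for the example.

```python
# Sketch: parameter estimation as minimization of a least-squares objective.
# Grid search stands in for real optimizers (e.g., particle swarm, DIRECT);
# the model and observations are invented.

def model(t, rate):
    """Toy exponential-growth model: biomass at time t for a given rate."""
    return 1.0 * (1.0 + rate) ** t

# Synthetic observations, generated from the model with rate = 0.1.
observed = [(0, 1.0), (1, 1.1), (2, 1.21), (3, 1.331)]

def sum_squares(rate):
    """Goodness-of-fit objective: sum of squared residuals."""
    return sum((model(t, rate) - y) ** 2 for t, y in observed)

# Evaluate the objective over a grid of candidate rates and keep the best.
# A real tool would use a finer (and parallel) search; these evaluations
# are independent, which is what makes the problem parallelizable.
candidates = [i / 1000 for i in range(0, 301)]
best = min(candidates, key=sum_squares)
print(best)  # 0.1
```

The same objective function is what a sensitivity analysis would probe near `best`, which is why the two needs above share infrastructure.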
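For the model-comparison point, one widely used selection criterion is AIC; for least-squares fits it takes the form AIC = n·ln(RSS/n) + 2k, where n is the number of observations, RSS the residual sum of squares, k the parameter count, and lower is better. The RSS values and parameter counts below are invented for illustration.

```python
# Sketch: comparing two fitted models with AIC (least-squares form).
# All numbers are invented; the point is that a better raw fit (lower RSS)
# does not automatically win once extra parameters are penalized.
import math

def aic(n_obs, rss, n_params):
    """Akaike information criterion for a least-squares fit."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

n = 50  # observations (invented)
score_simple = aic(n, rss=12.0, n_params=3)   # e.g., a 3-parameter phenology model
score_complex = aic(n, rss=10.5, n_params=9)  # richer model: better fit, more params

preferred = "simple" if score_simple < score_complex else "complex"
print(preferred)
```

With these particular numbers the simpler model is preferred despite its higher RSS; the complexity penalty outweighs the fit improvement.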

- Formats, standards, and interchange: what are they good for?

A useful discussion of at least some standards, model formats, and ontologies is being developed at BioModels.net (e.g., SBML, MIRIAM, SBGN). On a related point, it might make sense for iPlant to partner with BioModels.net to (a) provide a home/portal for plant-specific models and (b) provide more substantial computational resources for online simulation.

- What types of modeling problems will best make use of unique iPlant/TACC resources?

There is generally a sense (among those of us who have been discussing it) that modeling problems of current interest become "big" when we consider explorations across spaces of parameters, initial conditions, and populations. Among other things, there are data management and data integration problems that arise in coordinating sets of simulations.

- Data integration drives us nuts: how can we convey useful requests and specifications to the Data Integration group?
- Personnel: what tasks can we hand to iPlant developers now, and when we find a group postdoc, what will he/she work on?
- Use cases:
  - the intersection of photosynthesis/carbon metabolism and flowering time
  - hypothesis-generation through data-mining, processing, and visualization
  - lignin biosynthesis (interest from a group at NCSU working to develop models from detailed 'omics datasets)
