SI_20091020
iPG2P Statistical Inference
October 20, 2009
Attendees: Ed Buckler, Jean-Luc Jannink, Chris Myers, Peter Bradbury, Scott Menor, Liya Wang, Steve Welch, Barb Stranger, Dan Kliebenstein and Matt Vaughn (locked out)
Action Items:
[SECTION I] Please comment on the Data outline so that I can forward it to the Data Integration group for their consideration.
[SECTION III] Please come up with specific algorithms for comparing many genotypes against an individual phenotype (these algorithms will then be run against other phenotypes for the same population).
Notes/Agenda:
- Personalities
- For iPlant, there is a general process to identify potential users and develop their profiles to help with the computational development (Know your audience).
- Matt/Karla will update us about setting up the computational postdoc profile for editing
- The computational postdoc profile (Andres) looked like a good beginning and there were no real suggestions for improvement.
- Other users that we feel this package should support?
- Biologically savvy graduate student
- Generates phenotypic data and just wants to find a list of genotypes that might influence the phenotype
- Mathematical or Statistical scientist
- Wants to access the data and computational resources to test their algorithm for genotype to phenotype linkage
- Would want the ability to insert new algorithms into the package for use.
- Would want the ability to get performance measures on the computational time per test, etc.
- High school or undergraduate lab student
- Might have a mapping population and do a phenotyping analysis in a lab course.
- Would then want to be able to do the genotype to phenotype tests and get some generally informative answer.
- Likely involve links to other modules
- Lab Instructor
- Ability to understand what their students are doing with the module.
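The requests from the mathematical or statistical scientist profile (inserting new algorithms and getting computational time per test) could be sketched roughly as below. This is only an illustration, not a design the group agreed on: the names `REGISTRY`, `register`, `run_timed`, and the toy `mean_difference` test are all hypothetical, and genotypes are assumed to be coded as 0/1 allele classes, one list per marker.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical plug-in signature: (genotypes, phenotype) -> one score per marker.
# genotypes[i][j] is the allele class (0 or 1) of line j at marker i.
Algorithm = Callable[[List[List[int]], List[float]], List[float]]

# Hypothetical registry mapping an algorithm name to its implementation.
REGISTRY: Dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Decorator that adds a user-supplied algorithm to the registry."""
    def decorator(func: Algorithm) -> Algorithm:
        REGISTRY[name] = func
        return func
    return decorator


@register("mean_difference")
def mean_difference(genotypes: List[List[int]], phenotype: List[float]) -> List[float]:
    """Toy test: absolute difference in phenotype means between allele classes."""
    scores = []
    for marker in genotypes:
        class0 = [p for g, p in zip(marker, phenotype) if g == 0]
        class1 = [p for g, p in zip(marker, phenotype) if g == 1]
        if not class0 or not class1:
            scores.append(0.0)  # marker is monomorphic in this sample
        else:
            scores.append(abs(sum(class0) / len(class0) - sum(class1) / len(class1)))
    return scores


def run_timed(name: str, genotypes: List[List[int]],
              phenotype: List[float]) -> Tuple[List[float], float]:
    """Run a registered algorithm and report average wall-clock seconds per test."""
    start = time.perf_counter()
    scores = REGISTRY[name](genotypes, phenotype)
    elapsed = time.perf_counter() - start
    per_test = elapsed / len(genotypes) if genotypes else 0.0
    return scores, per_test


scores, seconds_per_test = run_timed(
    "mean_difference",
    [[0, 0, 1, 1], [0, 1, 0, 1]],
    [1.0, 1.0, 3.0, 3.0],
)
```

A decorator-based registry is only one way to let users insert algorithms; it was chosen here because the timing wrapper can then treat every algorithm the same way.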
- Data
- Previous meeting came up with the following request for a universal data format for Genotype and Phenotype in mapping populations (structured and non-structured)
- Outline
- Line data
- Geographic position of collection
- Environmental data
- Population design
- Genotypic Data
- Physical position within the organism's genome
- Genetic position within the map
- Class variable for allelic state
- Basepair
- Quantitative value for allelic state
- Allows for polyploidy
- Allows for copy number variation
- Genotyping Assay
- Error rate
- Phenotypic Information
- Phenotype classification
- Nucleic acid based
- Gene id from which originated
- Metabolic based
- Metacyc identity
- Physiological based
- ??
- Developmental based
- Plant ontology id
- Phenotype value
- Individual replications?
- Means?
- Covariates per line per measurement
- Environment or treatment
- Experimental Database
- Environment descriptors
- Treatment descriptor
- Tissue descriptor
- Time descriptor
- The predominant setup will be where there is one main genotypic data source per population and then potentially multiple independent sources for phenotypic information
- e.g., QTL mapping populations are generated once by one lab and then phenotyped independently by a large number of labs
- These additional labs may also have separately scored genotypes, but it is not yet known whether it is desirable to allow these into the main database
- We need to finalize this and see if we can forward it to the Data Integration group
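To make the Data outline easier to comment on, here is a minimal sketch of how its fields might map onto record types. Nothing here is a finalized format: the class names, field names, units (base pairs, centimorgans), and the `chromosome` field are assumptions added for illustration, and items the outline leaves open (the "??" under physiological phenotypes, replications versus means) stay open here too.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Line:
    """Line data: where a line was collected and how the population was built."""
    line_id: str
    latitude: Optional[float] = None           # geographic position of collection
    longitude: Optional[float] = None
    environment: Dict[str, str] = field(default_factory=dict)
    population_design: Optional[str] = None    # free-text label (assumption)


@dataclass
class GenotypeCall:
    """Genotypic data for one line at one marker."""
    line_id: str
    marker_id: str
    chromosome: Optional[str] = None           # assumed part of physical position
    physical_position_bp: Optional[int] = None
    genetic_position_cm: Optional[float] = None
    allele_class: Optional[str] = None         # class variable, e.g. a base pair
    allele_value: Optional[float] = None       # quantitative state (polyploidy, CNV)
    assay: Optional[str] = None
    error_rate: Optional[float] = None


@dataclass
class PhenotypeRecord:
    """Phenotypic information for one line in one experiment."""
    line_id: str
    trait: str
    classification: str                        # nucleic acid, metabolic, ...
    classification_id: Optional[str] = None    # gene id, MetaCyc id, ontology id
    replicates: List[float] = field(default_factory=list)
    covariates: Dict[str, float] = field(default_factory=dict)
    environment: Optional[str] = None
    treatment: Optional[str] = None
    tissue: Optional[str] = None
    time: Optional[str] = None

    def mean(self) -> Optional[float]:
        """Mean of the individual replications, or None if none were recorded."""
        if not self.replicates:
            return None
        return sum(self.replicates) / len(self.replicates)


# Hypothetical example values, not real data.
call = GenotypeCall(line_id="L001", marker_id="m1", allele_value=3.0)
record = PhenotypeRecord(line_id="L001", trait="example_trait",
                         classification="metabolic", replicates=[1.0, 2.0, 3.0])
```

Storing individual replications and deriving means on request keeps both options from the outline available. Keying every record on `line_id` reflects the expected setup of one main genotypic source joined to many independent phenotypic sources.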
- Algorithms
- Which base algorithms should we start with for linking genotypes to phenotypes under each of these approaches?
- GLM (ANOVA)
- Maybe more suited to QTL populations, where population structure is less of a worry
- Which algorithms run F-tests while searching through the genotypic space? An iterative search might be nice
- ??
- ??
- Mixed Model
- Maybe more suited to GWA
- Iteration
- Technology demonstration
- Algorithms
- EMMA with two SNPs or more on moderate datasets
- EMMA with one SNP on monster datasets
- ??
- Bayesian
- Would be nice to have it work for both structured (QTL) and non-structured (GWA) populations
- Algorithms
- ??
- ??
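As a starting point for the GLM (ANOVA) discussion, the sketch below runs a one-way ANOVA F-test at each marker in turn and reports the F statistic per marker. It is a simplified illustration under stated assumptions, not an algorithm the group selected: the function names and the input layout (a dict of marker name to per-line allele classes) are hypothetical, there is no correction for population structure, no covariates, and no p-value (that would need an F distribution, for example from a statistics library).

```python
from typing import Dict, List, Optional, Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for phenotype values grouped by allele class."""
    groups = [g for g in groups if len(g) > 0]
    k = len(groups)                      # number of allele classes observed
    n = sum(len(g) for g in groups)      # number of lines with data
    if k < 2 or n <= k:
        return 0.0                       # not enough data to test this marker
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def scan_markers(genotypes: Dict[str, List[Optional[str]]],
                 phenotype: List[float]) -> Dict[str, float]:
    """F statistic for every marker against a single phenotype."""
    results: Dict[str, float] = {}
    for marker, calls in genotypes.items():
        by_class: Dict[str, List[float]] = {}
        for allele, value in zip(calls, phenotype):
            if allele is None:
                continue                 # skip missing genotype calls
            by_class.setdefault(allele, []).append(value)
        results[marker] = anova_f(list(by_class.values()))
    return results


# Hypothetical toy data: six lines, two markers, one phenotype.
toy_genotypes = {
    "m1": ["A", "A", "A", "B", "B", "B"],
    "m2": ["A", "B", "A", "B", "A", "B"],
}
toy_phenotype = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
f_stats = scan_markers(toy_genotypes, toy_phenotype)
best_marker = max(f_stats, key=f_stats.get)
```

Because `scan_markers` takes one phenotype at a time, the same genotype table can be reused to scan other phenotypes from the same population. An iterative version could add the best marker as a covariate and rescan, but that is beyond this sketch.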
- Next Meeting
- Dan K will send around a Doodle poll for the next meeting in two weeks' time. We will try to use this poll to set up a standing time for the rest of the year.
- Other Topics
WebEx:
Topic: iPG2P Statistical Inference Meeting
Date: Tuesday, October 20, 2009
Time: 10:00 am, Mountain Standard Time (GMT -07:00, Arizona)
Meeting Number: 759 336 365
Meeting Password: iPC123
Please click the link below to see more information, or to join the meeting.
-------------------------------------------------------
To join the online meeting (Now from iPhones too!)
-------------------------------------------------------
1. Go to https://ua.webex.com/ua/j.php?ED=118154482&UID=1064553707&PW=f591390f3a2f5a441f
2. Enter your name and email address.
3. Enter the meeting password: iPC123
4. Click "Join Now".
-------------------------------------------------------
To join the teleconference only
-------------------------------------------------------
Call-in toll-free number (US/Canada): 866-699-3239
Call-in toll number (US/Canada): 1-408-792-6300
Toll-free dialing restrictions: http://www.webex.com/pdf/tollfree_restrictions.pdf