
Attendees: Doreen Ware, Jerry Lu, Chris Jordan, Karla Gendler

Action Items:

  • Karla: send out flow charts
  • All: review NGS documents, meet at 1pm EDT on Thursday to discuss data integration needs
  • Karla: set up and send out a Doodle poll for the next meeting


Discussed using Confluence for working groups; make sure all participants have logins and passwords before the meeting this Thursday

Reviewed Steve and Steve's document

Working Group Update

Next Generation Sequencing (NGS)

  • Have a pipeline worked out and have determined what to offer for their first pass
  • Would like to have options at every step of the way, where possible
  • Data format issues: What will be the input? How to track provenance? How to keep metadata with the data as it flows through the pipeline and is then fed into a viz or statistical tool? How will data persist? How will data be shared? What data will be shared?
  • SAM format: will this suffice?
  • What kinds of reports should be generated? Strandedness, SNPs, missing reads
  • For RNA: the output is RPKM; where do we go from here?

Visual Analytics

  • Mutual Understanding meeting is scheduled for the first week of November; the group hopes to have concrete goals and issues identified after this meeting
  • RG would like a 3D tool to visualize cells and subcellular parts
  • Tom Brutnell has immediate viz needs from NGS working group
    • Data Integration: whatever format comes out of the NGS pipeline (SAM format) will need to work with the tool the viz group develops for viewing NGS data
    • The viz group is looking at MinSeq as a file format; is this sufficient, and can it be generated from SAM format?

Statistical Inference

  • Conducting 2nd meeting as we speak
  • There is no universal data format; the group is discussing what types of data they would like to represent in a data structure concept, noting the dichotomy between genotype and phenotype
    • Hope to finish this ASAP to get to DI group
  • What types of summary output formats are necessary for running algorithms?
  • Have identified data sets that might benefit from parallelizing existing algorithms using GPUs
    • Ed Buckler's NAM (5000 lines x 30M genotypes x 10s of phenotypes)
    • eQTL within Arabidopsis (220 lines x 500 genotypes x 25K phenotypes)
    • Association mapping sets (100s of lines x 250K genotypes x 100s of phenotypes)

Modeling

  • Have not met yet but have started email discussions
  • Metadata for modeling: what sort of ancillary data is needed to make primary measurements useful in a modeling context?

Decision: target NGS for the first iteration with the DI group, with the objective of having it working and functioning by mid to late January

  • Will need to look at analysis tools, available reference genomes, and data exchange formats

Agenda for Thursday's Call

  1. Introduction of participants
  2. Introduce Confluence as collaboration environment
  3. iPlant support staff (to deal with problems)
  4. WG review
  5. Preliminary review of NGS needs based on workflows presented
  6. Identification of Action Items
  7. Set date for next meeting