1KP_Pilot_Study_DeepGreen

1kp data analysis working group

In November 2009, a NESCent/iPlant-sponsored 1KP analysis workshop was held in Phoenix to bring together members of the iPlant Tree of Life (iPToL) Tree Reconciliation working group, the 1KP project and experts in phylogeny estimation using large, multi-gene data sets. The 1000 Plant Transcriptome Sequencing Initiative (www.onekp.com) aims to resolve relationships across the green plant phylogeny in order to elucidate processes contributing to diversification and biological innovations, including the origins of multicellularity, the colonization of land, the evolution of vascular systems and the origins of seeds and flowers. In pursuit of these goals, the project is generating an unparalleled plant sequence database for investigating the evolution of gene families, regulatory networks and biosynthetic pathways. In collaboration with iPlant, the 1KP project plans to make all transcriptome sequences, gene trees, species trees and tree reconciliations available to the plant science community through an iPlant Discovery Environment.

As part of the iPlant Assembling the Tree of Life (iPToL) grand challenge project, the Tree Reconciliation Working Group and collaborators are developing pipelines for analysis of gene families within the context of organismal phylogenies. The working group will first analyze a pilot set of strategically placed streptophytic algae and land plant transcriptomes (see table) being generated by the 1KP project, as a focal point for developing workflows aimed at circumscribing gene families and estimating gene trees and species trees. Results will be shared with the larger community of iPToL and 1KP collaborators and prepared for publication.


Major systematics questions to be addressed in the pilot project:

  • Which algal lineage(s) are sister to the land plants?
  • Do mosses, hornworts and liverworts form a clade?
  • To what degree is the timing of gene duplication events correlated across gene families?
    • Do we see diversification of gene families and/or functional groups associated with the origin of land plants, vascular plants, seed plants and/or flowering plants?
  • Are gene trees affected by sparse taxon sampling?
  • Given computational limitations (see below), is it feasible to go even further back in time to resolve the earliest branching events in the history of green algae?


Some of the major computational questions include:

  • Given existing tools, how can we best infer species trees from reconciliation of unrooted and often poorly resolved (very deeply branching) gene trees?
  • What are the limits of the power of currently available gene tree/species tree reconciliation methods?
  • How do we summarize the uncertainty in the gene tree reconciliation in a way that captures uncertainty in the species and gene tree topologies as well as the rooting of both?
  • How do we visualize the results of gene tree/species tree reconciliations?
  • Where are the computational bottlenecks and how do we scale these analyses up?


Our experience with the pilot analysis will form the foundation for much larger analyses of the full 1000-transcriptome data set to be generated over the next 24 months. Most importantly, we will aim to address the computational challenges anticipated with analyses of the complete 1KP data set.

Table: Taxa to be included in the pilot dataset from the 1KP project

| No. | Clade | Order | Family (class for algae) | Species | 1KP Code | Comments |
|---|---|---|---|---|---|---|
| 1 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Amborella trichopoda | URDJ | Combine 1KP and AAGP data; update 1KP assembly |
| 2 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Nuphar advena | WTKZ | Combine 1KP and AAGP data |
| 3 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Kadsura heteroclita | NWMY | |
| 4 | Magnoliid | Piperales | Piperaceae | Houttuynia cordata | CSSK | |
| 5 | Magnoliid | Piperales | Aristolochiaceae | Saruma henryi | QDVW | Combine 1KP and FGP data |
| 6 | Magnoliid | Magnoliales | Magnoliaceae | Liriodendron tulipifera | - | AAGP data |
| 7 | Magnoliid | Magnoliales | Magnoliaceae | Persea americana | - | AAGP data |
| 8 | Chloranthales | Chloranthales | Chloranthaceae | Sarcandra glabra | OSHQ | |
| 9 | Basal Eudicots | Ranunculales | Berberidaceae | Podophyllum peltatum | WFBF | |
| 10 | Basal Eudicots | Ranunculales | Papaveraceae | Eschscholzia californica | multi-library | Combine 1KP and FGP data; multi-1kp-library assembly in progress |
| 11 | Basal Eudicots | Ranunculales | Ranunculaceae | Aquilegia formosa X pubescens | - | ESTs in GenBank |
| 12 | Core Eudicots | Caryophyllales | Amaranthaceae | Kochia scoparia | WGET | |
| 13 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis thaliana | - | Annotated Genome |
| 14 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis lyrata | - | Annotated Genome |
| 15 | Core Eudicots/Rosids | Brassicales | Caricaceae | Carica papaya | - | Annotated Genome |
| 16 | Core Eudicots/Rosids | Malpighiales | Salicaceae | Populus trichocarpa | - | Annotated Genome |
| 17 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Ricinus communis | - | Annotated Genome |
| 18 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Manihot esculenta | - | Annotated Genome |
| 19 | Core Eudicots/Rosids | Fabales | Fabaceae | Medicago truncatula | - | Annotated Genome |
| 20 | Core Eudicots/Rosids | Fabales | Fabaceae | Glycine max | - | Annotated Genome |
| 21 | Core Eudicots/Rosids | Cucurbitales | Cucurbitaceae | Cucumis sativus | - | Annotated Genome |
| 22 | Core Eudicots/Rosids | Vitales | Vitaceae | Vitis vinifera | - | Annotated Genome |
| 23 | Core Eudicots/Rosids | Zygophyllales | Zygophyllaceae | Larrea divaricata | UDUT | |
| 24 | Core Eudicots/Rosids | Rosales | Urticaceae | Boehmeria nivea | ACFP | |
| 25 | Core Eudicots/Rosids | Malvales | Malvaceae | Hibiscus cannabinus | OLXF | |
| 26 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Allamanda cathartica | MGVU | awaiting assembly of top-off |
| 27 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Catharanthus roseus | UOYN | |
| 28 | Core Eudicots/Asterids | Lamiales | Lamiaceae | Rosmarinus officinalis | FDMM | |
| 29 | Core Eudicots/Asterids | Lamiales | Phrymaceae | Mimulus guttatus | - | Annotated Genome |
| 30 | Core Eudicots/Asterids | Solanales | Convolvulaceae | Ipomoea purpurea | multi-library | multi-library assembly in progress |
| 31 | Core Eudicots/Asterids | Ericales | Ebenaceae | Diospyros malabarica | KVFU | |
| 32 | Core Eudicots/Asterids | Asterales | Asteraceae | Inula helenium | AFQQ | |
| 33 | Core Eudicots/Asterids | Asterales | Asteraceae | Tanacetum parthenium | DUQG | |
| 34 | Monocots | Acorales | Acoraceae | Acorus americanus | - | MonAToL |
| 35 | Monocots | Dioscoreales | Dioscoreaceae | Dioscorea villosa | OCWZ | |
| 36 | Monocots | Liliales | Colchicaceae | Colchicum autumnale | NHIX | |
| 37 | Monocots | Liliales | Smilacaceae | Smilax bona-nox | MWYQ | Sequencing in progress |
| 38 | Monocots | Asparagales | Asparagaceae | Yucca filamentosa | ICNN | |
| 39 | Monocots/Commelinids | Arecales | Arecaceae | Chamaedorea seifrizii | - | MonAToL |
| 40 | Monocots/Commelinids | Poales | Poaceae | Zea mays | - | Annotated Genome |
| 41 | Monocots/Commelinids | Poales | Poaceae | Sorghum bicolor | - | Annotated Genome |
| 42 | Monocots/Commelinids | Poales | Poaceae | Brachypodium distachyon | - | Annotated Genome |
| 43 | Monocots/Commelinids | Poales | Poaceae | Oryza sativa | - | Annotated Genome |
| 44 | Gymnosperms | Pinales | Taxaceae | Taxus baccata | WWSS | |
| 45 | Gymnosperms | Pinales | Podocarpaceae | Prumnopitys andina | EGLZ | |
| 46 | Gymnosperms | Pinales | Sciadopityaceae | Sciadopitys verticillata | YFZK | |
| 47 | Gymnosperms | Pinales | Cupressaceae | Juniperus scopulorum | XMGP | |
| 48 | Gymnosperms | Pinales | Cupressaceae | Cunninghamia lanceolata | OUOI | |
| 49 | Gymnosperms | Pinales | Pinaceae | Pinus taeda | - | Dendrome |
| 50 | Gymnosperms | Pinales | Pinaceae | Cedrus libani | GGEA | |
| 51 | Gymnosperms | Gnetales | Gnetaceae | Gnetum montanum | GTHK | combine with NY Consortium data? same species? |
| 52 | Gymnosperms | Gnetales | Welwitschiaceae | Welwitschia mirabilis | - | FGP set |
| 53 | Gymnosperms | Ephedrales | Ephedraceae | Ephedra sinica | VDAO | additional reads? |
| 54 | Gymnosperms | Cycadales | Cycadaceae | Cycas micholitzii | XZUY | NY Consortium data? same species? |
| 55 | Gymnosperms | Cycadales | Zamiaceae | Zamia vazquezii | - | FGP/AAGP data |
| 56 | Gymnosperms | Ginkgoales | Ginkgoaceae | Ginkgo biloba | SGTW | Combine with NY Consortium data |
| 57 | Moniliformopses | Osmundales | Osmundaceae | Osmunda cinnamomea | | No RNA at BGI; Replacement? Barker data? |
| 58 | Moniliformopses | Marattiales | Marattiaceae | Angiopteris evecta | NHCM | |
| 59 | Moniliformopses | Psilotales | Psilotaceae | Psilotum nudum | QVMR | |
| 60 | Moniliformopses | Filicales | Cyatheaceae | Cyathea (=Alsophila) spinulosa | GANB | |
| 61 | Moniliformopses | Polypodiales | Pteridaceae | Cryptogramma acrostichoides | | No RNA at BGI; Replacement? Barker data? |
| 62 | Moniliformopses | Polypodiales | Pteridaceae | Asplenium rhizophyllum | KJZG | update 1KP assembly |
| 63 | Moniliformopses | Equisetales | Equisetaceae | Equisetum sp. | CAPN | awaiting assembly of top-off |
| 64 | Lycopods | Lycopodiales | Lycopodiaceae | Huperzia squarrosa | GAON | |
| 65 | Lycopods | Selaginellales | Selaginellaceae | Selaginella moellendorffii | - | Annotated Genome |
| 66 | Bryophyta | Polytrichales | Polytrichaceae | Polytrichum commune | SZYG | gametophyte |
| 67 | Bryophyta | Sphagnales | Sphagnaceae | Sphagnum lescurii | GOWD | |
| 68 | Bryophyta | Funariales | Funariaceae | Physcomitrella patens | - | Annotated Genome |
| 69 | Marchantiophyta | Marchantiales | Marchantiaceae | Marchantia polymorpha | JPYU | |
| 70 | Marchantiophyta | Marchantiales | Marchantiaceae | Marchantia emarginata | TFYI | |
| 71 | Anthocerotophyta | Anthocerotales | Anthocerotaceae | Nothoceros aenigmaticus | DXOU | |
| 72 | Anthocerotophyta | Anthocerotales | Anthocerotaceae | Anthoceros | IQJU | very few large scaffolds |
| 73 | Streptophytic Green Algae | | Mesostigmatophyceae | Mesostigma viride | KYIO | |
| 74 | Streptophytic Green Algae | | Mesostigmatophyceae | Chlorokybus atmophyticus | AZZW | update 1KP assembly |
| 75 | Streptophytic Green Algae | | Mesostigmatophyceae | Spirotaenia minuta | NNHQ | |
| 76 | Streptophytic Green Algae | | Klebsormidiophyceae | Klebsormidium subtile | FQLP | |
| 77 | Streptophytic Green Algae | | Klebsormidiophyceae | Hormidiella sp. | | New |
| 78 | Streptophytic Green Algae | | Charophyceae | Chara vulgaris | MWXT | |
| 79 | Streptophytic Green Algae | | Coleochaetophyceae | Chaetosphaeridium globosum | DRGY | |
| 80 | Streptophytic Green Algae | | Coleochaetophyceae | Coleochaete scutata | VQBJ | |
| 81 | Streptophytic Green Algae | | Coleochaetophyceae | Coleochaete orbicularis | - | Timme & Delwiche paper |
| 82 | Streptophytic Green Algae | | Zygnematophyceae | Cosmarium broomei | HIDG | awaiting assembly of top-off |
| 83 | Streptophytic Green Algae | | Zygnematophyceae | Netrium digitus | FFGR | |
| 84 | Streptophytic Green Algae | | Zygnematophyceae | Spirogyra sp. | HAOX | |
| 85 | Streptophytic Green Algae | | Zygnematophyceae | Spirogyra pratensis | - | Timme & Delwiche paper |


Analysis Pipeline and Data Access

Analysis of transcriptome sets for individual species - Collaborators have access to web-accessible databases for blast and annotation term searches.



Phylogenomic Analyses - Coding sequences extracted from transcriptome sets passing quality control will be sorted into gene families and used for estimations of gene trees, species trees and tree reconciliations. Collaborators will have access to alignments and trees.



Transcript Annotation

In support of all 1kp subprojects, all transcript assemblies are or will be accessible in the following ways:

  • Assemblies and reads are available to contributors through download sites at the Universities of Texas (TACC) and Alberta (Westgrid).
  • The results of prerun BLAST searches will be available through a project website. These results can be searched for annotation terms.
  • BLAST searches can be performed on assemblies for each taxon through a project website.
  • As described above (see Phylogenomic Analyses), assemblies will be sorted into gene families based on similarity to plant genes in the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/) and to genes from annotated plant genomes that are not in RefSeq. Sequence alignments and gene trees for each family will be available to all 1kp collaborators for ortholog identification.
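
The similarity-based sorting described in the last item could look roughly like the sketch below. It assumes, without confirmation from the project, that hits are available in BLAST+ tabular format (`-outfmt 6`, whose default columns end in e-value and bit score), that each reference gene already has a family label, and that each transcript joins the family of its highest-scoring hit. The e-value cutoff, the identifiers and the function name `assign_families` are all hypothetical.

```python
from collections import defaultdict

def assign_families(blast_lines, family_of, max_evalue=1e-5):
    """Assign each query to the family of its best-scoring reference hit.

    blast_lines: lines in BLAST+ tabular format (-outfmt 6, default columns).
    family_of:   dict mapping reference sequence ID -> family label (assumed given).
    max_evalue:  illustrative cutoff; the project's real threshold is not stated.
    """
    best = {}  # query ID -> (bit score, reference ID)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue or subject not in family_of:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)

    families = defaultdict(list)
    for query, (_, subject) in best.items():
        families[family_of[subject]].append(query)
    return dict(families)

# Made-up hits for illustration; these are not real 1KP contigs or RefSeq IDs.
hits = [
    "sp1_contig1\tref_A\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t200",
    "sp1_contig1\tref_B\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t90",
    "sp2_contig7\tref_B\t75.0\t120\t30\t0\t1\t120\t1\t120\t1e-25\t180",
    "sp2_contig9\tref_A\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]
family_of = {"ref_A": "fam1", "ref_B": "fam2"}

families = assign_families(hits, family_of)
print(families)
```

Best-hit assignment is the simplest option and can misplace members of large or fast-evolving families; clustering-based approaches are a common alternative, but the text does not say which the project uses.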