The 1000 plants (oneKP or 1KP) initiative is an international multi-disciplinary consortium that has generated large-scale gene sequencing data for over 1000 species of plants. Major supporters include the Alberta Ministry of Innovation and Advanced Education, Musea Ventures (Somekh Family Foundation), Beijing Genomics Institute in Shenzhen (BGI-Shenzhen), China National GeneBank (CNGB), iPlant Tree-of-Life (iPToL) Grand Challenge, Compute Canada (Westgrid), and Alberta Innovates Technology Futures (AITF-iCORE Strategic Chair). The sample selection was originally based on a series of overlapping sub-projects with scientific objectives that could be addressed by sequencing multiple plant species -- descriptions of these sub-projects and the associated species list can be found at http://www.onekp.com. As more collaborators joined 1KP, the objectives evolved and are now exemplified by a diverse collection of papers. Here we describe the plans for a final capstone paper.

1000 Plants Data Set

We generated an average of 2Gb of RNA-seq data per sample on the Illumina sequencing platform (GA2 or HiSeq). For the most part, we sequenced one tissue sample per species, although exceptions were made when the scientific objectives required it. Paired-end data were assembled by SOAPdenovo-trans (http://soap.genomics.org.cn/SOAPdenovo-Trans.html). Each sample typically yielded 10k scaffolds with lengths of greater than 1kb. The results are released on password-protected repositories at Westgrid (http://onekp.westgrid.ca/1kp-data) and TACC (http://web.corral.tacc.utexas.edu/OneKP). These repositories contain raw unassembled reads, assembled transcriptomes, and as an estimate of gene expression levels, averaged read depths computed across each scaffold.
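As a rough illustration of the two per-sample summaries mentioned above (scaffolds longer than 1kb, and read depth averaged across each scaffold), the following Python sketch may help. The function names and the input layout (FASTA text plus a list of per-base depths) are assumptions made for this example and do not describe the actual 1KP scripts or repository file formats.

```python
def parse_fasta(text):
    """Return {scaffold_id: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the scaffold ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def long_scaffolds(records, min_len=1000):
    """IDs of scaffolds longer than min_len bases (assumed '>1kb' cut-off)."""
    return [key for key, seq in records.items() if len(seq) > min_len]


def mean_depth(per_base_depth):
    """Average read depth across one scaffold, given one depth per base."""
    if not per_base_depth:
        return 0.0
    return sum(per_base_depth) / len(per_base_depth)


# Hypothetical toy input: one 1200-base scaffold and one 300-base scaffold.
toy_fasta = ">s1 example\n" + "A" * 1200 + "\n>s2\n" + "C" * 300 + "\n"
toy_records = parse_fasta(toy_fasta)
toy_long = long_scaffolds(toy_records)
toy_depth = mean_depth([10] * 600 + [20] * 600)
```

Averaged depth is only a crude proxy for expression level, which is how the text above presents it.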

One of the distinguishing characteristics of 1KP is the fact that we had no restrictions on the species that had to be sequenced. Although the majority of our sub-projects had applications-focused objectives, the majority of our samples were chosen to represent every species known to science, across the plant kingdom, at some phylogenetically or taxonomically defensible level. For example, we sequenced a representative of nearly all of the 415 known angiosperm families, and about a fifth of our sequenced species are algae. Most of these species have never been subjected to large-scale gene sequencing, and as a result, our cumulative efforts have now sequenced approximately 2 orders of magnitude more genes than the totality of the public databases.

Figure 1: Number of genes sequenced by 1KP as compared to the entirety of the NCBI databases as of March 2012. The branches of the phylogenetic tree are weighted by gene count. For artistic reasons, the scale is not uniform along the vertical axis. Black bars at the right indicate the approximate widths for one million genes. Notice that the 1KP counts were based on the initial SOAPdenovo assemblies, which contain roughly half as many genes as the new SOAPdenovo-trans assemblies that will be published.

Many bioinformatic analyses are being carried out for project-wide consumption. A phylogenomics pipeline (https://pods.iplantcollaborative.org/wiki/display/iptol/1KP_Phyloinformatics_Pipeline) has been developed and, in the process of computing the species tree, it will also provide the consortium with corrected reading frames, multiple sequence alignments, and gene family clusters. These results will be available for consortium use well before the species trees are computed. Ultimately, every gene tree will be reconciled against the species tree, to determine the timing of the gene duplications within each gene family relative to the speciation events. We are working with iPlant developers to provide web-based visualizations. The following describes the analyses being done.

Sorting transcripts into gene families (Naim Matasci): Finished; but automatically generated orthoMCL clusters are only approximations to gene families and some manual curation will be required.

Alignment and gene tree estimation and reconciliation with NCBI taxon tree (Tandy Warnow, Jim Leebens-Mack, and iPlant): Scripts have been written and validated for gene families with fewer than 2000 genes. We can try pushing this to 10,000, but it will be difficult. Pipelines for doing the reconciliations and placing them into the viewer are in place. The fact that we will eventually replace the NCBI taxon tree should not be a bottleneck, as it is mostly correct.

Reconciliation viewer (Naim Matasci): Software components are in place. The plan is to showcase the iPlant website as part of our pilot publication on the phylogenomics of the first ~100 species.

Protein-protein interactions (Ling-Hong Hung): Based on homology to known interacting pairs, http://cando.compbio.washington.edu/wiki/CANDO.
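To make the gene tree reconciliation step described above more concrete, here is a minimal Python sketch of the textbook LCA (last common ancestor) mapping. This is not necessarily the method used by the 1KP pipeline, which the text does not specify. Trees are written as nested tuples, and all names (leaf_set, all_clades, lca, reconcile, the toy species A, B, C) are hypothetical. Each gene tree node is mapped to the smallest species tree clade that contains all of its species, and a node is called a duplication when one of its children maps to the same clade.

```python
def leaf_set(tree):
    """Set of leaf names under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Leaf sets of every node (including leaves) in the species tree."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(all_clades(child))
    return clades


def lca(species, clades):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree, gene_to_species, clades, dups):
    """Map a gene-tree node to a species clade; append duplications to dups."""
    if isinstance(gene_tree, str):
        return lca(frozenset([gene_to_species[gene_tree]]), clades)
    child_maps = [reconcile(c, gene_to_species, clades, dups) for c in gene_tree]
    here = lca(frozenset().union(*child_maps), clades)
    if here in child_maps:
        # A child maps to the same species node, so this node is a duplication.
        dups.append(here)
    return here


# Hypothetical toy data: species A and B are sisters, with C as outgroup.
species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}

dups = []
root_clade = reconcile(gene_tree, gene_to_species, all_clades(species_tree), dups)
```

In this toy case the single inferred duplication maps to the clade containing A and B, which places it after the split from C and before the A/B speciation.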

Publications Strategy

All of the data will be released with the publication of a high-impact "capstone" paper that we will submit in 2015, along with multiple companion papers arising from the many 1KP sub-projects and the many analyses performed for the capstone. Even before the capstone is submitted, we expect to have published at least 3 methodology papers: on the RNA extractions, on the SOAPdenovo-trans assembler, and on the phylogenomics for a pilot data set of about 100 species. Many papers are already being published in advance of the capstone, because they require little of the data to be released prematurely, and their objectives are so different that early publication will not dilute the capstone's impact. All papers are tracked on the consortium wiki.

The capstone will emphasize the diversity of the species we chose. It will have 2 major components, starting first with a phylogenomics analysis of all 1000 species. A species tree will be computed from the subset of "low copy" genes, and on this tree, all of the gene families (low and high copy number, functionally characterized or not) will be attached and displayed within the iPlant framework. We expect in the process to resolve some important questions on single to multi-cellular evolution. To expand our readership beyond the systematics community, we will also perform an analysis of the gene changes associated with major evolutionary transitions across the plant kingdom. A core group of the major 1KP contributors has developed a working list (summarized below) of important evolutionary transitions and well-studied gene families.
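As an illustration of what selecting the "low copy" subset could look like, the Python sketch below keeps gene families in which no species contributes more than a fixed number of genes. The thresholds (max_copies, min_species), the function name, and the input layout are assumptions made for this example; the text does not define the criteria that will actually be applied.

```python
from collections import Counter


def low_copy_families(families, max_copies=1, min_species=2):
    """Return IDs of families with at most max_copies genes per species.

    families maps a family ID to a list of (species, gene_id) pairs.
    Both thresholds are illustrative assumptions.
    """
    selected = []
    for family_id, members in families.items():
        copies = Counter(species for species, _gene in members)
        if len(copies) >= min_species and max(copies.values()) <= max_copies:
            selected.append(family_id)
    return selected


# Hypothetical toy clusters.
toy_families = {
    "fam1": [("A", "a1"), ("B", "b1"), ("C", "c1")],  # single copy everywhere
    "fam2": [("A", "a1"), ("A", "a2"), ("B", "b1")],  # two copies in species A
    "fam3": [("A", "a9")],                            # only one species
}
toy_low_copy = low_copy_families(toy_families)
```

Relaxing max_copies would admit more families into the species tree analysis at the cost of a harder orthology problem.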

Central to this plan is the recruitment of gene family experts to help interpret our results. We do not expect that all of the recruited experts will discover something truly novel, and neither is it necessary (or feasible) for us to analyze every gene family known to science. The idea is to cast a wide net, so that when we write the capstone, we can cherry-pick the most interesting and compelling results to incorporate into that paper. Given the inevitable page limits, it is unlikely that we will have enough space to discuss more than a few evolutionary transitions and gene families. Any unused analyses will go into the expected companion papers. All collaborators are encouraged to publish their own papers with controlling authorship and on their own timeline.

The capstone will be published under a consortium byline, "the 1000 plants (1KP) consortium", similar to the ENCODE project consortium on the September 6th 2012 cover of Nature. Everyone who contributed to the data collection or the data analysis will be included in the author list; individual contributions will be clearly stated. Although it is our intention and desire that people use these data, we cannot sabotage the capstone by allowing too much data to be released prematurely, or by allowing scientific conclusions destined for the capstone to be published in advance. In general, analyses of sufficient phylogenetic breadth run the risk of potential conflict with the capstone. We trust our collaborators to exercise the appropriate self-restraint, and also to appreciate that we want their best results to be incorporated into the capstone itself.

Additional papers, strong enough to justify a separate publication independent of the capstone, are most welcome. However, the objectives must be sufficiently different (e.g. correlation of polyploidy and angiosperm diversification) to prevent the journal editors from asking us to merge everything together. Any paper that has a conceptual overlap with the capstone should be submitted to another journal and/or submitted after the capstone.

Selected Transitions

The green plant tree of life is marked by numerous innovations including the evolution of multi-cellularity, transitions from marine to freshwater and terrestrial environments, maternal retention of zygotes and embryos, the evolution of complex life histories including haploid and diploid phases, and the origin of vascular systems, the seed and the flower. These innovations define key transitions in the history of green plants and the origin of diverse groups of plants that form the foundation of our local and global biota. Having sampled transcriptomes across the green tree of life, 1KP is in an unprecedented position to assess changes in gene content or gene expression associated with each of these key transitions. Our phylogenomics pipeline (https://pods.iplantcollaborative.org/wiki/display/iptol/1KP_Phyloinformatics_Pipeline) is placing assembled transcripts into gene family alignments and gene trees. All 1KP consortium members will be given access to these analyses. To assist consortium members who are unfamiliar with the sequenced plant species and their evolutionary history, species/taxon consultants have been assigned to each transition. There are other important transitions (e.g., stomatophytes between embryophytes and tracheophytes) that we did not list. In those instances, the relevant expert(s) would be the ones for the next largest group.

Origin of Viridiplantae (Green Plants)

Species/Taxon Consultant: Michael Melkonian, E-mail michael.melkonian@uni-koeln.de

Characteristic Innovations: origin of plastids with Chl a+b, intraplastidial starch synthesis, nuclear localization of the gene for RbcS, origin of whiplash flagella

Origin of Streptophyta

Species/Taxon Consultant: Michael Melkonian, E-mail michael.melkonian@uni-koeln.de

Characteristic Innovations: possible transition to freshwater, change in cell division, origin of phragmoplast and comparison to origin of phycoplast, flagella asymmetrically attached, origin of plasmodesmata

Origin of Embryophytes (Land Plants)

Species/Taxon Consultant: Michael Melkonian, E-mail michael.melkonian@uni-koeln.de; Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org

Characteristic Innovations: cuticle, poikilohydry (desiccation tolerance/water control in somatic protoplasm), food-conducting cells, lignin and lignin-precursors (non-mechanical functions) -- any evidence of these in algae?

Origin of Tracheophytes (Vascular Plants)

Species/Taxon Consultant: Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org

Characteristic Innovations: homoiohydry (stable water supply to tissues) including less desiccation control and origin of 'true' vascular tissue -- xylem with lignified tracheids, specialized food-conducting apparatus -- phloem tissue; autonomous sporophytes, reduced thalloid gametophytes

Origin of Euphyllophytes

Species/Taxon Consultant: Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org

Characteristic Innovations: overtopping growth form

Origin of Spermatophytes (Seed Plants)

Species/Taxon Consultant: Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org

Characteristic Innovations: axillary shoot branching

Origin of Angiosperms (Flowering Plants)

Species/Taxon Consultant: Doug Soltis, E-mail dsoltis@botany.ufl.edu; Pam Soltis, E-mail psoltis@flmnh.ufl.edu; Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org

Characteristic Innovations: extremely reduced gametophytes

Diversification of Mesangiosperms (including Monocots, Eudicots, Magnoliids)

Species/Taxon Consultant: Doug Soltis, E-mail dsoltis@botany.ufl.edu; Pam Soltis, E-mail psoltis@flmnh.ufl.edu; Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org; Jim Leebens-Mack, E-mail jleebensmack@plantbio.uga.edu

Characteristic Innovations: (origin of monocots) calcium oxalate raphides, no vessels in leaves, steroidal saponins, diffuse vascular bundles, (origin of core eudicots) ellagic and gallic acids

Gene Family Experts

The following table lists the gene family experts who have accepted our invitation to join the capstone analysis. We are continually updating the table as more people join 1KP. To avoid conflict we prefer to assign one expert to each gene family, but we do make exceptions if people indicate a willingness to collaborate on a particular gene family. Everyone is encouraged to publish their findings as they see fit and on their own timeline, but they should appreciate that once the capstone is published all of the sequences will be public.

BIOLOGICAL PROCESS OR GENE FAMILY	FIRSTNAME	SURNAME	AFFILIATION
ABC transporters	Neal	Stewart	University of Tennessee
ABC1 kinases and Clp proteases	Klaas	van Wijk	Cornell University
AGP and GT2 genes	Tony	Bacic	University of Melbourne
AGP and GT2 genes	Monika	Doblin	University of Melbourne
ammonium and phosphate transporters	Pierre-Emmanuel	Courty	University of Basel
AP2 domain proteins	Michael	Holdsworth	University of Nottingham
auxin network and F-box genes	Markus	Geisler	University of Fribourg
auxin network and F-box genes	Ivo	Grosse	Martin Luther Universität Halle-Wittenberg
auxin network and F-box genes	Martin	Porsch	Martin Luther Universität Halle-Wittenberg
auxin network and F-box genes	Marcel	Quint	Leibniz Institute of Plant Biochemistry
BAHD	John	D'Auria	Texas Tech University
bHLH and TCP genes	Victor	Albert	SUNY University at Buffalo
bHLH and TCP genes	Lorenzo	Carretero-Paulet	SUNY University at Buffalo
BR signaling pathway	Zhi-Yong	Wang	Carnegie Institution for Science
chromatin methylation	Robert	Schmitz	University of Georgia
chromatin methylation	Adam	Bewick	University of Georgia
ciliome biology	Steven	Kelly	Oxford University
ciliome biology	Jane	Langdale	Oxford University
circadian clock genes	Ulf	Lagercrantz	Uppsala University
cullin-RING ubiquitin protein ligases	Richard	Vierstra	University of Wisconsin
cuticle biology and wax synthesis	Ljerka	Kunst	University of British Columbia
cuticle biology and wax synthesis	Jocelyn	Rose	Cornell University
defense peptides	Christian	Gruber	Medical University of Vienna
folate synthesis and other B vitamins	Andrew	Hanson	University of Florida
glucosinolate biosynthesis	Barbara	Halkier	University of Copenhagen
glycosyltransferase families GT47 and GT77	Jesper	Harholt	University of Copenhagen
glycosyltransferase families GT47 and GT77	Peter	Ulvskov	University of Copenhagen
glycosyltransferase family 1 and glycoside hydrolase family 28	Luiz-Eduardo	Del-Bem	Harvard School of Public Health
GSK3/Shaggy-like kinases and cell adhesion	Juliet	Coates	University of Birmingham
GSK3/Shaggy-like kinases and cell adhesion	Younousse	Saidi	University of Birmingham
HDZ3/ZPR	Pamela	Soltis	University of Florida
histone deacetylases	Stéphane	Bourque	Université de Bourgogne
isoprenyl diphosphate synthase	Feng	Chen	University of Tennessee
kinases	Shin-Han	Shiu	Michigan State University
leaf and fruit development	Barbara	Ambrose	New York Botanical Garden
LysM RKs	Thorsten	Nürnberger	University of Tübingen
MADS-box	Guenter	Theissen	Friedrich Schiller University of Jena
mycorrhizal and rhizobial associations	Giles	Oldroyd	John Innes Centre
nitric oxide synthase	Sylvain	Jeandroz	Université de Bourgogne
nitric oxide synthase	David	Wendehenne	Université de Bourgogne
P450	David	Nelson	University of Tennessee
P450	Danièle	Werck-Reichhart	Institut de Biologie Moléculaire des Plantes
peroxidase, class III	Christophe	Dunand	University Paul Sabatier (Toulouse III)
phenylpropanoid	Clint	Chapple	Purdue University
photosynthesis	Xinguang	Zhu	CAS-MPG Partner Institute for Computational Biology
phytochrome	Sarah	Mathews	Arnold Arboretum (Harvard University)
PP2C phosphatases	Christian	Doerig	Monash University
PPR proteins	Patrick	Finnegan	University of Western Australia
PPR proteins	Ian	Small	University of Western Australia
PYR/PYL/RCAR ABA receptors and DNA demethylation pathway	Shaojun	Xie	Purdue University
PYR/PYL/RCAR ABA receptors and DNA demethylation pathway	Jian-Kang	Zhu	Purdue University
retinoblastoma-related (and isoprenoid synthesis)	Wilhelm	Gruissem	Eidgenössische Technische Hochschule Zürich
SABATH methyltransferases	Todd	Barkman	Western Michigan University
secondary growth and wood formation	Andrew	Groover	US Forest Service at Davis
sugar/sucrose transporters	Daniel	Wipf	Université de Bourgogne
sulphate transporters	Leonardo	Casieri	Université de Bourgogne
terpene synthase	Feng	Chen	University of Tennessee
transcription factors	Stefan	Rensing	University of Marburg
tubulin	Jack	Tuszynski	University of Alberta