DNA Subway & Microbiome Workshop July 30-31, 2018 Austin Community College, Highland Campus

Dave Micklos and Bruce Nash, DNA Learning Center, Cold Spring Harbor Laboratory

Uwe Hilgert, BIO5 Institute, University of Arizona 


David Micklos, Bruce Nash, and Jason Williams

Local Point of Contact:

Poornima Rao


David Micklos, Bruce Nash, and Uwe Hilgert


July 30-31, 2018


Austin Community College - Highland Campus, 6101 Highland Campus Dr., Austin, TX, 78752

(Use North Entrance)

Workshop Prep

1. CyVerse Account

2. Laptop

Please bring your own Wi-Fi-enabled laptop to the workshop. Make sure your laptop has the following:

  • Internet browser: Please have an up-to-date web browser (we strongly recommend Firefox, Safari, or Chrome; others may not work properly).

3. DNA Subway

Please ensure that you can log on to DNA Subway:

Please check that you can sign in to the Purple Line (which runs on a separate server from DNA Subway):

4. DNA barcoding sample

Bring a small sample of plant tissue (a leaf, flower, etc.) or collect one on campus before the experiment.

Additional Readings and Assignments

  • If you did not have samples sequenced through us but wish to bring your own sequence data for the microbiome analysis, please do so!

If you do, and your sequence files are in Illumina Casava format, you can upload them to the CyVerse Data Store to make them available for analysis in DNA Subway. Cyberduck, a free application for Mac and PC, is a fast and relatively straightforward way to handle uploads and downloads:

    • You will also need a metadata file that maps your metadata to your samples. We can help correct errors in your files, but getting a head start on your metadata is a good idea.
      • QIIME 2 requires a specific metadata format, which is explained here: A sample file can also be downloaded from the tutorial. Note that the file must follow this format, which does not allow spaces; including spaces between words is a very common reason that mapping files fail to validate. QIIME 2 requires a tab-delimited file (.tsv). Such files can be saved from Excel (using "Save As") and many other spreadsheet or text editors, but you may need to change the file extension to .tsv. This will be covered during the workshop.
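
The metadata advice above can be checked before the workshop with a short script. The following is a minimal Python sketch, not a QIIME 2 tool: the function name `check_metadata` is hypothetical, and its checks simply follow the rules on this page (tab-delimited, .tsv extension, no spaces) plus two assumed sanity checks (every row has the same number of columns, and sample IDs in the first column are unique and non-empty). The no-spaces check may be stricter than QIIME 2's own validator.

```python
import csv
import io


def check_metadata(text, filename="metadata.tsv"):
    """Return a list of problems found in a sample metadata (mapping) file.

    Hypothetical helper: the checks follow this page's advice (tab-delimited,
    .tsv extension, no spaces) plus two assumed sanity checks (consistent
    column counts, unique non-empty sample IDs in the first column).
    """
    problems = []
    if not filename.lower().endswith(".tsv"):
        problems.append(f"{filename}: extension is not .tsv")

    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return problems + ["file is empty"]

    width = len(rows[0])  # the header row defines the expected column count
    if width < 2:
        problems.append("header has fewer than 2 columns; is the file tab-delimited?")

    seen_ids = set()
    for line_no, row in enumerate(rows, start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != width:
            problems.append(f"line {line_no}: {len(row)} columns, expected {width}")
        for cell in row:
            if " " in cell:  # this page's no-spaces rule
                problems.append(f"line {line_no}: space in {cell!r}")
        # Sample rows start after the header; rows starting with '#' are treated as comments.
        if line_no > 1 and not row[0].startswith("#"):
            sample_id = row[0]
            if not sample_id:
                problems.append(f"line {line_no}: empty sample ID")
            elif sample_id in seen_ids:
                problems.append(f"line {line_no}: duplicate sample ID {sample_id!r}")
            seen_ids.add(sample_id)
    return problems


if __name__ == "__main__":
    example = "sample-id\tbody-site\nS1\tgut\nS2\tleft palm\n"
    for problem in check_metadata(example, "metadata.txt"):
        print(problem)
```

Run on the example, the script reports that `metadata.txt` does not have a .tsv extension and that line 3 contains a space in `'left palm'`, the two most common problems described above.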

Workshop PowerPoint Presentations:

DNA barcoding and Blue Line resources:

Microbiome and Purple Line resources:



RNAseq and Green Line resources:

Maize annotation and Red Line resources:

Maize Version 4 and RNA Evidence

MaizeCODE Apollo Track Legends:

B73v4_protein_coding_genes.gff: protein-coding genes with AED (annotation edit distance) and QI (quality index) scores

est2genome_FLC.gff: Full-length cDNAs from GenBank

est2genome_GZT.gff: aligned transcripts from the v3 annotation (not used to generate the v4 annotations)

est2genome_ISO.gff: Iso-Seq data, full-length isoform sequencing using the PacBio single-molecule sequencer

est2genome_MS.gff: Trinity-assembled Illumina RNA-Seq data from seedlings (high depth)

est2genome_TR.gff: 95 Trinity-assembled Illumina RNA-Seq experiments (complexity reduced using CD-HIT)

protein2genome_AT.gff: Arabidopsis proteins

protein2genome_BD.gff: Brachypodium proteins

protein2genome_GZP.gff: v3 proteins (not used to generate the v4 annotations)

protein2genome_OS.gff: Rice proteins

protein2genome_SB.gff: Sorghum proteins

protein2genome_SI.gff: Setaria proteins

augustus_masked.gff: Gene predictions from AUGUSTUS

fgenesh_masked.gff: Gene predictions from FGENESH
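
Each of the track files listed above is in GFF (General Feature Format). In GFF3, every feature line has nine tab-separated columns, and the ninth holds semicolon-separated tag=value attributes. The minimal Python sketch below splits one line into those fields. The function name `parse_gff_line` and the example line are hypothetical, and the `_AED` and `_QI` tag names are an assumption based on how the MAKER annotation pipeline usually records these scores; check the actual track files before relying on them.

```python
from typing import Optional
from urllib.parse import unquote

# The nine GFF3 columns, in order.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff_line(line: str) -> Optional[dict]:
    """Parse one GFF3 feature line into a dict; return None for comments/blanks."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None  # '##' directives and '#' comments carry no feature
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive coordinates
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attributes[tag] = unquote(value)  # GFF3 percent-encodes reserved characters
    record["attributes"] = attributes
    return record


if __name__ == "__main__":
    # Hypothetical MAKER-style mRNA line; real track files may use other tags.
    example = ("chr1\tmaker\tmRNA\t1000\t2500\t.\t+\t.\t"
               "ID=gene1-mRNA-1;Parent=gene1;_AED=0.12;_QI=0|0|0|1|1|1|4|0|350")
    rec = parse_gff_line(example)
    print(rec["type"], rec["start"], rec["end"], float(rec["attributes"]["_AED"]))
```

AED ranges from 0 (a gene model in perfect agreement with its supporting evidence) to 1 (no evidence support), so lower values generally indicate better-supported models.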

More detail:

  • The est2genome_MS.gff file contains polished alignments of assembled mRNA-seq data. Here is a URL for the original publication.
  • The est2genome_TR.gff file is a little more involved. We started with 95 mRNA-seq experiments that were publicly available in GenBank and assembled each one individually using Trinity, as described in this publication. Given the large number of experiments, there were many redundant transcripts, which took a long time to align, so I used CD-HIT to filter them out. This file contains the polished alignments of the non-redundant transcripts.
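
The redundancy-filtering step described above can be illustrated with a toy example. CD-HIT itself sorts sequences from longest to shortest and greedily assigns each one to an existing representative when their sequence identity exceeds a chosen threshold. The Python sketch below keeps that longest-first greedy idea but, as a simplification, treats a transcript as redundant only when it is an exact substring of a longer kept transcript; it ignores identity thresholds and reverse complements, and `drop_contained` is a hypothetical name, not part of CD-HIT.

```python
def drop_contained(transcripts):
    """Toy redundancy filter (NOT CD-HIT): keep transcripts that are not an
    exact substring of a longer transcript already kept.

    `transcripts` maps names to nucleotide sequences. Like CD-HIT, it walks
    sequences longest-first and compares each one only against the
    representatives kept so far; unlike CD-HIT, it uses exact containment
    instead of an identity threshold and ignores reverse complements.
    """
    kept = {}
    # Longest first; ties keep their input order because sorted() is stable.
    by_length = sorted(transcripts.items(), key=lambda item: len(item[1]), reverse=True)
    for name, seq in by_length:
        seq = seq.upper()
        if not any(seq in rep for rep in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    example = {"a": "ATGCCGTA", "b": "GCCG", "c": "TTTT", "d": "atgccgta"}
    print(sorted(drop_contained(example)))  # ['a', 'c']
```

Here `b` is contained in `a`, and `d` is identical to `a` apart from letter case, so only `a` and `c` remain to be aligned.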

Workshop Agenda

Monday, July 30

9:00 am Welcome, Logistics, Introductions

9:30 am Workshop Objective: Course-Based Undergraduate Research with CyVerse and DNA Subway (Micklos)

10:00 am Blue Line 1: DNA Barcoding, DNA Extraction and PCR (Micklos)

11:00 am Purple Line 1: Microbiomes, Data Set-up, De-Multiplexing (Nash)

12:45 pm Lunch

1:30 pm Blue Line 2: DNA Barcoding, Gel Electrophoresis and Analysis (Micklos) [1-10, 11-20, 21-25]

3:00 pm Red Line 1: MaizeCODE, Guided Gene Annotation (Micklos and Hilgert)

4:45 pm Purple Line 2: Trimming

5:00 pm Dismissal

Tuesday, July 31

9:00 am Purple Line 3: Rarefaction, Core Analysis (Nash)

10:15 am Red Line 2: Independent Gene Annotation (Micklos and Hilgert)

12:15 pm Lunch

1:00 pm Purple Line 4: Analysis (Nash)

2:30 pm Green Line: Differential Gene Expression: RNA-Seq Analysis (Nash)

3:30 pm Evaluation and Online Follow-up

4:00 pm Dismissal

Other Websites, Links, and Resources


Post Workshop Survey

Please complete this survey at the conclusion of the workshop:

Post Workshop DNA Sequencing Results

Instructions on retrieving your results will be posted here after the workshop.






