Introduction

Author: Amanda Cooksey, CyVerse/University of Arizona

Goal

The goal of this tutorial is to use the Velvet-1.2.10 workflow to assemble a genome from paired-end Illumina reads.

Rationale and Background

De Novo Sequencing: A process in which a novel genome is sequenced for the first time, which requires specialized assembly of the sequencing reads. In this tutorial the Velvet assembler is used to assemble the genome, following the recommended approach of testing different k-mer settings.
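
As a rough illustration of what "testing different k-mer settings" can mean in practice, the sketch below lists candidate k-mer sizes using the guidance given later in the Method section (an odd number between 17 and 31, approximately half the read length). The function name `candidate_kmers` and the ordering rule are assumptions made for this example; they are not part of Velvet or the Discovery Environment.

```python
def candidate_kmers(read_length, k_min=17, k_max=31):
    """Return odd k-mer sizes in [k_min, k_max], closest to half the read length first.

    Hypothetical helper for illustration only; the 17-31 range and the
    "half the read length" rule of thumb come from this tutorial.
    """
    target = read_length / 2
    odd_values = [k for k in range(k_min, k_max + 1) if k % 2 == 1]
    # Sort by distance from the target; ties are broken by the smaller k.
    return sorted(odd_values, key=lambda k: (abs(k - target), k))


# For ~36 bp reads, the first candidates are the odd values nearest 18.
print(candidate_kmers(36)[:3])  # [17, 19, 21]
```

Each candidate would then be run as its own analysis, and the resulting assemblies compared.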

Velvet: Velvet is a short-read de novo assembler that uses de Bruijn graphs. More information about Velvet (including the manual) can be found here: http://www.ebi.ac.uk/~zerbino/velvet/
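
For readers unfamiliar with de Bruijn graphs, the following toy sketch shows the basic idea: every k-mer in a read links its first k-1 bases to its last k-1 bases. This is a simplified textbook form, not Velvet's internal data structure (it ignores reverse complements, coverage, and error correction), and the function name and reads are made up for the example.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


reads = ["ACGTAC", "CGTACG"]  # toy reads, not real data
graph = de_bruijn_edges(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

An assembler then looks for paths through such a graph to reconstruct longer sequences; the choice of k controls how much overlap is required between neighbouring nodes.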

Workflow

This workflow is composed of two apps.

1. Velveth-1.2.10 takes one or more sequence files, builds a hash table, and writes two files (Sequences and Roadmaps) to an output directory. These files are then supplied to the second app, 'Velvetg-1.2.10'.

2. Velvetg-1.2.10 is the core of Velvet, where the de Bruijn graph is built and then manipulated.

Approximate analysis durations for the sample data are provided in each step. Other datasets, depending on size, could take more or less time. 

Test/sample data

This tutorial uses paired-end Illumina sequencing data that are stored in the Data Store. The data were trimmed and cleaned up with the applications Scythe and Sickle.

The data to be used in this tutorial are available in the Data Store in this folder:  Community Data -> iplantcollaborative -> example_data -> Velvet

Method

  1. Search for Velvet in the 'Apps' window of the Discovery Environment and open 'Velvet-1.2.10'. There are several sections of the app that will need to be completed. Each is detailed in the following steps. 
  2. Analysis Name: No changes are required in this section. However, if you like, you can choose an analysis name (other than the default), add comments to be associated with your analysis, or choose an output folder (other than the default). Since we are running this as a workflow, you do not need to retain inputs. If we were to run the Velveth app separately, we would need to select the 'retain inputs' option.
  3. VelvetH-1.2.10-Input Files: 
    1. Select the file format of your input files. Velveth accepts either unmapped reads or alignment files, and there are many file types to choose from. Note that FASTA or FASTQ files may be gzipped. For the example data you should select 'FASTQ'.
    2. Supply your input files to the app by clicking the 'Add' button at the right side of the window and navigating to your files. The example file for this step can be found at this path in the Discovery Environment: community_data -> iplantcollaborative -> example_data -> Velvet -> velveth -> interlaced.fq
  4. VelvetH-1.2.10-Required Arguments:
    1. Choose a name for your output directory, or leave the default name.
    2. Choose a k-mer size for your assembly. It should be an odd number between 17 and 31 and, as a general rule, approximately half the read length. For the example we will use 19.
    3. For read type, short reads are ~35 bp and long reads are ~75 bp. If your data are paired-end, be sure to select the appropriate 'paired' option. For the example we will choose 'short paired'.
    4. If your data are paired, indicate whether they are interleaved (both forward and reverse in a single file) or separate. For the example we will use 'interleaved'.
  5. VelvetH-1.2.10-Optional Arguments
    1. Strand specific–choose if your sequencing is strand specific. For example data leave unchecked.
    2. No hash–test various k-mer lengths without doing redundant computations. For example data leave unchecked. 
  6. VelvetG-1.2.10-README: This section provides information about the VelvetG-1.2.10 app. There are no selections to be made here.
  7. VelvetG-1.2.10-Coverage Parameters: Coverage parameters may be customized but do not need to be. For the example data, leave all coverage parameters as default.
  8. VelvetG-1.2.10-Paired Read Only Parameters: These parameters may be set or left as default. For the example data, set the following options:
    1. Set the insert length for short paired reads (first box) to 300.
    2. Set the insert length standard deviation (second box) to 60.
  9. VelvetG-1.2.10-Output Parameters: There are several options here. You can get more information about each by hovering over the 'i' at the right side of the window. For the example data, select 'read tracking'.
  10. VelvetG-1.2.10-Parameters Best Left Untouched: While these parameters are available to you, you should probably leave them alone unless you know what the result will be. More information about each is available under the 'i' at the right or in the Velvet manual (the link at the top of this page).
  11. Launch Analysis: Click 'launch analysis' at the bottom right of the window. When your analysis has finished, you will receive a notification saying it has 'completed' under the bell in the top right of the screen.
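
For orientation, the settings chosen above correspond roughly to a pair of Velvet command lines. The sketch below only builds and prints those commands; it does not run Velvet. It assumes the standard Velvet 1.2.10 command-line flags (`-fastq`, `-shortPaired`, `-interleaved`, `-ins_length`, `-ins_length_sd`, `-read_trkg`); the output directory name `velvet_out` and both function names are hypothetical, and the exact commands issued by the Discovery Environment app may differ.

```python
import shlex


def velveth_command(out_dir, kmer, reads_file, file_format="-fastq",
                    read_type="-shortPaired", layout="-interleaved"):
    """Build the velveth argument list: directory, hash length, read description."""
    if kmer % 2 == 0:
        raise ValueError("k-mer size should be an odd number")
    return ["velveth", out_dir, str(kmer), file_format, read_type, layout, reads_file]


def velvetg_command(out_dir, ins_length=300, ins_length_sd=60, read_tracking=True):
    """Build the velvetg argument list using the example's paired-read settings."""
    cmd = ["velvetg", out_dir,
           "-ins_length", str(ins_length),
           "-ins_length_sd", str(ins_length_sd)]
    if read_tracking:
        cmd += ["-read_trkg", "yes"]
    return cmd


# "velvet_out" is a placeholder directory name chosen for this example.
print(shlex.join(velveth_command("velvet_out", 19, "interlaced.fq")))
print(shlex.join(velvetg_command("velvet_out")))
```

Both commands point at the same directory: velveth writes its files there and velvetg reads them, which is why the workflow does not need you to pass files between the two apps by hand.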

Results

  1. When run with the example data, the workflow takes approximately 4 minutes to complete.
  2. All outputs from the workflow can be found in the 'Velveth' folder, inside the folder you specified for your outputs.
  3. Your assembled contigs can be found in the file 'contigs.fa'.
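
When comparing assemblies made with different k-mer sizes, it can help to summarize each 'contigs.fa'. The sketch below computes the number of contigs, the total length, and the N50 from FASTA-formatted lines. The helper names and the toy records are assumptions made for this example, and it uses only the Python standard library.

```python
def read_fasta_lengths(lines):
    """Return the sequence length of each FASTA record in an iterable of lines."""
    lengths = []
    current = None  # length of the record being read, or None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy records standing in for a real contigs.fa (headers are made up).
toy = [">contig_a", "ACGTA", ">contig_b", "ACGT", ">contig_c", "ACG", ">contig_d", "AC"]
lengths = read_fasta_lengths(toy)
print(len(lengths), sum(lengths), n50(lengths))  # 4 14 4
```

To use it on a downloaded result, pass an open file handle for 'contigs.fa' to `read_fasta_lengths` in place of the toy list.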