The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or email them to upendra@cyverse.org. Thank you.

Rationale and background: 

Trim Galore is an app that automates quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions from RRBS sequence files (for directional, non-directional, or paired-end sequencing).

Trim Galore uses the publicly available adapter-trimming tool Cutadapt, and optionally FastQC for quality control once trimming has completed. Although Trim Galore works for any (base-space) high-throughput dataset (e.g., data downloaded from the SRA), this section describes its use mainly with respect to RRBS libraries.
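Trim Galore hands the actual trimming to Cutadapt. To make the quality parameter (default Phred score 20) and the phred33/phred64 encodings listed under Parameters more concrete, the Python sketch below decodes a FASTQ quality string and trims the low-quality 3' end using the running-sum rule that Cutadapt documents as the same algorithm BWA uses. It is an illustration only, not Cutadapt's code; the function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    offset=33 corresponds to phred33 (Sanger/Illumina 1.9+),
    offset=64 corresponds to phred64 (Illumina 1.5).
    """
    return [ord(ch) - offset for ch in quality_string]


def quality_trim_3prime(sequence, quality_string, cutoff=20, offset=33):
    """Return `sequence` with its low-quality 3' end removed (illustrative sketch).

    Walk from the 3' end, adding (cutoff - score) for each base, and cut
    at the position where that running sum reaches its maximum.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have the same length")
    scores = phred_scores(quality_string, offset)
    running_sum = 0
    best_sum = 0
    cut = len(scores)  # keep the whole read unless a better cut point is found
    for i in reversed(range(len(scores))):
        running_sum += cutoff - scores[i]
        if running_sum < 0:
            break  # the bases scanned so far are good enough on balance; stop
        if running_sum > best_sum:
            best_sum = running_sum
            cut = i  # everything from position i onward would be removed
    return sequence[:cut]
```

For example, `quality_trim_3prime("ACGTACGTAC", "IIIIIIII##")` returns `"ACGTACGT"`: in Phred+33, `I` is Q40 and `#` is Q2, so only the two poor 3' bases are dropped. Passing `offset=64` reads the same string as Illumina 1.5 (phred64) qualities.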

Prerequisites

  1. A CyVerse account (register at user.cyverse.org).

  2. Input

    1. Sequence files in FASTQ format (single-end or paired-end reads).

    2. Paired-end specific options: 
      1. Paired (Performs length trimming of quality-, adapter- and RRBS-trimmed reads for paired-end files. To pass the validation test, both reads of a pair must reach a minimum length, which is set by the Length parameter).
      2. Retain unpaired reads (If only one of the two paired-end reads became too short, the longer read is written to either the '.unpaired_1.fq' or the '.unpaired_2.fq' output file).
      3. Unpaired single-end read length cut-off for read 1 (Minimum length an unpaired read 1 must have to be written to the '.unpaired_1.fq' output file. These reads may be mapped in single-end mode. Default: 35 bp).
      4. Unpaired single-end read length cut-off for read 2 (Minimum length an unpaired read 2 must have to be written to the '.unpaired_2.fq' output file. These reads may be mapped in single-end mode. Default: 35 bp).
      5. Trim 1 bp from 3' end (Trims 1 bp off the 3' end of every read. This may be needed for FastQ files that are to be aligned as paired-end data with Bowtie).

  3. Parameters
    1. quality (Trim low-quality ends from reads in addition to adapter removal. Default Phred score: 20).
    2. phred33 (Sanger/Illumina 1.9+ encoding).
    3. phred64 (Illumina 1.5 encoding).
    4. fastqc (Run FastQC in the default mode on the FastQ file once trimming is complete).
    5. Adapter sequence to be trimmed (If not specified explicitly, the first 13 bp of the Illumina adapter 'AGATCGGAAGAGC' are used by default).
    6. adapter2 (Optional adapter sequence to be trimmed off read 2 of paired-end files. This option requires '--paired' to be specified as well).
    7. stringency (Overlap with the adapter sequence required to trim a sequence. Default: a very stringent setting of '1', i.e. even a single bp of overlapping sequence will be trimmed off the 3' end of any read).
    8. Maximum allowed error rate (number of errors divided by the length of the matching region; default: 0.1).
    9. Length (Discard reads that became shorter than this length because of either quality or adapter trimming. A value of '0' effectively disables this behavior. Default: 20 bp). For paired-end files, neither read of a read pair may be shorter than this length for the pair to be written to the validated paired-end files (see option --paired). If only one read became too short, such unpaired single-end reads can optionally be kept (see --retain_unpaired). Default pair cut-off: 20 bp.
    10. Clip R1 (Instructs Trim Galore to remove a specified number of bp from the 5' end of read 1 (or single-end reads). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end. Default: OFF).
    11. Clip R2 (Instructs Trim Galore to remove a specified number of bp from the 5' end of read 2 (paired-end reads only). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end. For paired-end BS-Seq, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation. Please refer to the M-bias plot section in the Bismark User Guide for some examples. Default: OFF).
    12. 3' Clip R1 (Instructs Trim Galore to remove a specified number of bp from the 3' end of read 1 (or single-end reads) AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality. Default: OFF).
    13. 3' Clip R2 (Instructs Trim Galore to remove a specified number of bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality. Default: OFF).
    14. RRBS-specific options (MspI digested material): 
      1. rrbs (Specifies that the input file was an MspI-digested RRBS sample (recognition site: CCGG). Sequences that were adapter-trimmed will have a further 2 bp removed from their 3' end. This prevents the filled-in C close to the second MspI site in a sequence from being used for methylation calls. Sequences that were merely trimmed because of poor quality will not be shortened further).
      2. Non directional (Selecting this option for non-directional RRBS libraries will screen quality-trimmed sequences for 'CAA' or 'CGA' at the start of the read and, if found, remove the first two base pairs. Like the option '--rrbs', this avoids using cytosine positions that were filled in during the end-repair step. '--non_directional' requires '--rrbs' to be specified as well).
      3. Keep (Keep the quality-trimmed intermediate file. Default: off, i.e. the temporary file is deleted after adapter trimming. This only has an effect for RRBS samples, since other FastQ files are not trimmed for poor qualities separately).
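The stringency and maximum error rate parameters work together when the adapter is searched for at the 3' end of a read. The Python sketch below illustrates the idea: it looks for the leftmost position where the read overlaps the start of the adapter by at least `stringency` bases with no more than `max_error_rate × overlap` mismatches, and cuts the read there. This is a deliberate simplification and an assumption on our part: Cutadapt's real alignment also allows insertions and deletions, so its results will not always agree. The function names are hypothetical; the default adapter is the 13 bp Illumina sequence listed above.

```python
def find_adapter_start(read, adapter="AGATCGGAAGAGC", stringency=1, max_error_rate=0.1):
    """Return the index where the adapter (or its 5' part) starts in `read`, or None.

    Simplification (assumption): only mismatches count as errors; no indels.
    """
    for start in range(len(read)):
        # The overlap shrinks as we approach the 3' end of the read.
        overlap = min(len(adapter), len(read) - start)
        if overlap < stringency:
            break  # all remaining overlaps are shorter than the required minimum
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        # Error rate = errors / length of the matching region.
        if mismatches <= int(max_error_rate * overlap):
            return start
    return None


def trim_adapter(read, **kwargs):
    """Cut the read at the adapter position found by find_adapter_start."""
    start = find_adapter_start(read, **kwargs)
    return read if start is None else read[:start]
```

With the default `stringency=1`, a read ending in a single `A` (the first base of the adapter) loses that base, matching the "very stringent" behavior described for the stringency option; raising `stringency` to 3 leaves such a read untouched.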

If your DNA material was digested with MseI (recognition motif: TTAA) instead of MspI, it is NOT necessary to specify --rrbs or --non_directional, since virtually all reads should start with the sequence 'TAA', and this holds true for both directional and non-directional libraries. As the end-repair of 'TAA'-restricted sites does not involve any cytosines, these reads do not need special treatment. Instead, simply run Trim Galore in the standard (i.e. non-RRBS) mode.
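The two extra RRBS trimming rules for MspI libraries can be summarized in a few lines. The Python sketch below applies them to a read that has already been quality/adapter trimmed: adapter-trimmed reads lose 2 more bp from the 3' end, and in non-directional mode reads starting with CAA or CGA lose their first 2 bp. The function name and the `adapter_trimmed` flag are hypothetical; in Trim Galore these rules are applied internally when rrbs (and Non directional) are selected, and for MseI libraries no such step is needed.

```python
def rrbs_trim(read, adapter_trimmed, non_directional=False):
    """Illustrative sketch of the extra MspI/RRBS trimming rules.

    Assumes `read` has already been quality- and adapter-trimmed.
    """
    if non_directional and read[:3] in ("CAA", "CGA"):
        # Drop the 2 bp at the 5' end whose cytosine was filled in during end repair.
        read = read[2:]
    if adapter_trimmed:
        # Drop 2 bp at the 3' end so the filled-in C next to the
        # second MspI site is not used for methylation calls.
        read = read[:-2]
    return read
```

Reads that were only quality-trimmed (`adapter_trimmed=False`) keep their 3' end, as described for the rrbs option.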

Test/sample data:


The test data for trim_galore-0.4.1 are in /iplant/home/shared/iplantcollaborative/example_data/trim_galore.

Use the following inputs/outputs and parameters for testing trim_galore-0.4.1:

  1. Input/Outputs 

    1. ATreads.fq

    2. ATreads2.fq
  2. Optional arguments/parameters
    1. Paired (select the Paired check box)
    2. Quality type: sanger (Phred+33 encoding)
    3. fastqc (select the fastqc check box)

    Leave all other options at their defaults.
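Because Paired is selected in this test, each read pair is validated after trimming. The Python sketch below models the routing rules described under Prerequisites (Length, Retain unpaired reads, and the two unpaired cut-offs) for a single pair, given the lengths of both reads after trimming. The function name and its return values are hypothetical, and treating a read exactly at the cut-off length as passing is an assumption.

```python
def route_pair(len1, len2, length=20, retain_unpaired=False, length_1=35, length_2=35):
    """Return where read 1 and read 2 end up: 'val', 'unpaired', or None (discarded)."""
    if len1 >= length and len2 >= length:
        # Both reads pass: written to the validated *_val_1.fq / *_val_2.fq files.
        return ("val", "val")
    if not retain_unpaired:
        return (None, None)  # the whole pair is discarded
    # Pair failed validation; keep any read that meets its own unpaired cut-off.
    dest1 = "unpaired" if len1 >= length_1 else None
    dest2 = "unpaired" if len2 >= length_2 else None
    return (dest1, dest2)
```

This is why a paired-end run with unpaired reads retained can produce *_unpaired_1.fq and *_unpaired_2.fq files alongside the validated *_val_1.fq and *_val_2.fq files.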

Output Reports:

After successful completion of the run, expect the following files as output:

  1. ATreads_unpaired_1.fq 
  2. ATreads_val_1_fastqc.html 
  3. ATreads_val_1_fastqc.zip 
  4. ATreads_val_1.fq 
  5. ATreads.fq_trimming_report.txt 
  6. ATreads2_unpaired_2.fq 
  7. ATreads2_val_2_fastqc.html 
  8. ATreads2_val_2_fastqc.zip 
  9. ATreads2_val_2.fq 
  10. ATreads2.fq_trimming_report.txt
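The file names above follow a regular pattern derived from the input names, which makes it easy to check that a downloaded results folder is complete. The Python sketch below reproduces the pattern visible in this list for a paired-end run; the helper names are hypothetical, and the handling of inputs that do not end in .fq (e.g. .fastq or compressed files) is an assumption not confirmed by this page.

```python
import os


def expected_outputs(read1, read2, fastqc=True, retain_unpaired=True):
    """Predict output file names for a paired-end run (pattern taken from the test output)."""
    names = []
    for mate, path in ((1, read1), (2, read2)):
        base = os.path.basename(path)
        stem = base
        for ext in (".fastq", ".fq"):  # assumption: strip a plain FASTQ extension
            if base.endswith(ext):
                stem = base[: -len(ext)]
                break
        names.append(base + "_trimming_report.txt")  # report keeps the full input name
        names.append(f"{stem}_val_{mate}.fq")
        if retain_unpaired:
            names.append(f"{stem}_unpaired_{mate}.fq")
        if fastqc:
            names.append(f"{stem}_val_{mate}_fastqc.html")
            names.append(f"{stem}_val_{mate}_fastqc.zip")
    return names


def missing_outputs(outdir, read1, read2, **kwargs):
    """List expected output files that are not present in `outdir`."""
    return [name for name in expected_outputs(read1, read2, **kwargs)
            if not os.path.exists(os.path.join(outdir, name))]
```

For the test data, `expected_outputs("ATreads.fq", "ATreads2.fq")` yields the same ten names as the list above.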

More information about Trim Galore can be found in the Trim Galore manual.
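To reproduce the test run outside the DE (for example on a machine with trim_galore, Cutadapt and FastQC installed), the test settings map roughly onto Trim Galore's command-line options --paired, --phred33, --fastqc and --quality. The Python sketch below builds such a command as an argument list; the helper names are hypothetical and cover only these few options. The DE app may pass additional options of its own (the unpaired files in the expected output suggest that unpaired reads are retained), so results from a hand-built command may differ.

```python
import shlex


def trim_galore_command(read1, read2=None, *, paired=False, quality=20,
                        phred=33, fastqc=False, retain_unpaired=False):
    """Build a trim_galore argument list for a few common options (illustrative sketch)."""
    if phred not in (33, 64):
        raise ValueError("phred must be 33 or 64")
    if paired and read2 is None:
        raise ValueError("--paired needs two input files")
    cmd = ["trim_galore", "--quality", str(quality), f"--phred{phred}"]
    if fastqc:
        cmd.append("--fastqc")  # run FastQC on the trimmed output
    if paired:
        cmd.append("--paired")
        if retain_unpaired:
            cmd.append("--retain_unpaired")
    cmd.append(read1)
    if read2 is not None:
        cmd.append(read2)
    return cmd


def as_shell_string(cmd):
    """Quote the argument list so it can be pasted into a shell."""
    return shlex.join(cmd)
```

The list returned by `trim_galore_command("ATreads.fq", "ATreads2.fq", paired=True, fastqc=True)` can be passed directly to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell-quoting problems with unusual file names.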