
FastQC

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

Quick Start

NOTE: FastQC determines the quality encoding of your FASTQ reads (PHRED33, Illumina, etc.). The detected encoding is shown on the output graphs. Note also that PHRED33 is the same encoding that FastQC reports as Sanger / Illumina 1.9, so if you use these FASTQ files in downstream apps such as the FASTX toolkit, select PHRED33 as the format type when your reads are reported as Sanger / Illumina 1.9.
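To illustrate why the encoding matters downstream, here is a minimal Python sketch of how the same quality character decodes to different scores under an offset of 33 versus 64. The function names are hypothetical, and the offset guess is a rough heuristic for illustration only; it is not FastQC's own detection logic.

```python
def decode_qualities(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of integer quality scores."""
    return [ord(ch) - offset for ch in quality_line]


def guess_offset(quality_lines):
    """Guess the ASCII offset from the lowest quality character seen.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in
    offset-64 data, so their presence implies offset 33. High-quality
    offset-33 data with no low characters would be misclassified.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    return 33 if lowest < 59 else 64


# The character 'I' means quality 40 under offset 33, but only 9 under offset 64.
print(decode_qualities("I", offset=33))  # [40]
print(decode_qualities("I", offset=64))  # [9]
```

Choosing the wrong format type in a downstream app shifts every score by 31, which is why the detected encoding is worth checking before further analysis.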

Test Data

All files are located in the Community Data directory of the iPlant Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > fastqc

Input File(s)

Use SRR070572_hy5.fastq as test data.

Parameters Used in App

When running the app in the Discovery Environment, use the following parameters with the above input file(s) to reproduce the output described in the next section.

  • Default parameters only, no further configuration needed.

Output File(s)

All outputs can be found in the directory Community Data > iplantcollaborative > example_data > fastqc

  • Expect the following outputs (in addition to the logs generated for all analyses):
    • A directory named after the input file, with the suffix _fastqc
    • A zipped copy of this directory
  • The generated directory (SRR070572_hy5_fastqc for the above example) contains two subdirectories and several files.
    • The subdirectories are icons (not scientifically necessary) and images.
    • The files generated in this directory are fastqc_data.txt, fastqc_report.html, and summary.txt.
  • The images directory should contain the following files:
    • duplication_levels.png
    • kmer_profiles.png
    • per_base_gc_content.png
    • per_base_n_content.png
    • per_base_quality.png
    • per_base_sequence_content.png
    • per_sequence_gc_content.png
    • per_sequence_quality.png
    • sequence_length_distribution.png
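Of the files listed above, summary.txt is the quickest one to check programmatically. The Python sketch below assumes each line of summary.txt holds three tab-separated fields (status, module name, input file name). The function names and the sample text are illustrative and are not the actual results for the test data.

```python
def parse_summary(text):
    """Map each module name to its status (PASS, WARN, or FAIL).

    Assumes one module per line: status<TAB>module name<TAB>file name.
    """
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, _filename = line.split("\t", 2)
        results[module] = status
    return results


def flagged_modules(results):
    """Return the modules whose status is WARN or FAIL, in file order."""
    return [module for module, status in results.items()
            if status in ("WARN", "FAIL")]


# Illustrative content only; read the real file with open(path).read().
sample = (
    "PASS\tBasic Statistics\texample.fastq\n"
    "WARN\tPer base sequence content\texample.fastq\n"
    "FAIL\tKmer Content\texample.fastq\n"
)
print(flagged_modules(parse_summary(sample)))
# ['Per base sequence content', 'Kmer Content']
```

A check like this can help decide which of the PNG graphs to open first when reviewing many samples.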

Tool Source for App

6 Comments

  1. ★★★☆☆ Would be nice to be able to view the report w/ graphics from the DE w/o having to download it (by mcrusoe)

  2. ★★★★★ null (by ccgoller)

  3. ★★★★★ Great tool (by nareshvasani)

  4. ★☆☆☆☆ null (by sanjeevsingh)

  5. ★★★★☆ useful quality control assessment for fastq files (by cbuddenhagen)

  6. Would be nice to have some documentation that discusses what to do when FastQC finds issues with your data. For example, "What do you do if you see possible PCR contamination?" "How do you remove Kmers or over-represented sequences? Or do you just toss the data away and start anew?"