...

  1. Log into the Discovery Environment.
  2. Open FastQC 0.10.1 (multi-file) (Apps > Public Apps > NGS > QC and Processing > FastQC > FastQC 0.10.1 (multi-file)).
    1. Name your analysis.
  3. Click on "Select input data" and enter your sequence data file(s).
    1. To analyze the sample data, enter all 4 sequence read files (.fastq) provided in Community Data > iplant_training > read_cleanup > sampledata.
  4. Launch the analysis.
  5. Once the analysis is completed (approx. 10 min. with the sample data), click on "Analyses," and then click on the analysis name to open the output folder.
  6. Examine the results:
    1. Within the output job file, there should be one directory for each of the original read files entered into the app. For the 4 test data files there should be four directories, each with 3 files and 2 folders that contain the information to build a web page listing comprehensive information on the sequence read files.
    2. Most of the evaluation information is provided in graphical formats in "Images" sub-directories. They hold .png files that depict the graphs produced during the analyses and that can be viewed directly in the DE by clicking on the file names.
    3. For a comprehensive comparison of the output data, download (using Simple download) the zipped output file to the desktop. (If the download takes too long, or if you only wish to examine specific files, you can open them in the "Images" directories in the DE.)
    4. Examine the results for each of the four sequence read sets analyzed by opening the webpages (fastqc.report.html files) for each of them.
    5. Also examine the .png files in the Images directories.
    6. Notice that the data quality differs from read set to read set. Some sets retain better scores at the longer read lengths ("Per base quality"). Some sets may lose a significant portion of their reads if a high cutoff score is applied ("per-sequence-quality"). The quality of a sequence also varies with the position of the assessed units in the sequence ("Per base sequence content," "Kmer content").
  7. Adapters: Based on your analysis, determine whether the read sets show signs of contamination with adapter sequences. (For argument's sake, we will treat the sequences as if they are contaminated with adapters and use Scythe in Operation #2 to remove them. However, see the Further Considerations section at the bottom for further discussion of how to interpret the FastQC analysis results.)
  8. Read Quality: Based on your analysis, estimate a reasonable quality score cutoff for each sequence read file. (You will use 12 and 15 when treating the read sets with Sickle in Operation #3, below. However, see the Further Considerations section at the bottom for further discussion of how to interpret the FastQC analysis results.)
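To get a feel for how a cutoff affects read retention (step 8), the following minimal Python sketch computes the fraction of reads whose mean quality meets a given cutoff. It assumes well-formed four-line FASTQ records in sanger (Phred+33) encoding; the function names and the three example reads are illustrative and are not part of FastQC, Sickle, or the sample data. Note that this per-read mean is only a rough guide and is not how Sickle itself applies its threshold.

```python
from statistics import mean

def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip(), seq.strip(), qual.strip()

def phred_scores(qual, offset=33):
    """Decode a quality string into integer scores (offset 33 = sanger)."""
    return [ord(c) - offset for c in qual]

def fraction_passing(records, cutoff, offset=33):
    """Fraction of reads whose mean quality is at least `cutoff`."""
    total = passing = 0
    for _header, _seq, qual in records:
        total += 1
        if mean(phred_scores(qual, offset)) >= cutoff:
            passing += 1
    return passing / total if total else 0.0

# Made-up reads for illustration only (not from the sample data).
example = [
    "@read1", "ACGT", "+", "IIII",  # scores 40, 40, 40, 40
    "@read2", "ACGT", "+", "5555",  # scores 20, 20, 20, 20
    "@read3", "ACGT", "+", "+++#",  # scores 10, 10, 10, 2
]

for cutoff in (12, 15, 30):
    print(cutoff, fraction_passing(read_fastq(example), cutoff))
```

To try it on a real file, pass an open file handle instead of the `example` list, for instance `fraction_passing(read_fastq(open(path)), 15)`, where `path` points to a downloaded .fastq file.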

Operation 2: Remove adapter sequences (app: Scythe-

...

0.991)

The Scythe-adapter-trimming app identifies adapter or primer sequences in your reads and removes them, using a fasta file of expected adapter/primer sequences. Ideally, this file should be a comprehensive list of the primers and adapters used in preparing the sequencing library. If you aren’t sure which were used, a broad file covering many possible adapters and primers can be appropriate, but the settings should then be adjusted to reduce the chance of randomly cutting real, organism-specific sequence. For the tutorial we use a large file of Illumina primer and adapter sequences, representing the most commonly expected contaminants from preparation of Illumina sequencing libraries. (Basic documentation: https://pods.iplantcollaborative.org/wiki/display/DEapps/Scythe-adapter-trimming)
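As a rough illustration of what "adapter sequences in your reads" means, the Python sketch below loads a fasta file of adapters and counts the reads that contain the first few bases of any adapter. This is a naive exact-match check written for this tutorial, not Scythe's actual matching method; the function names, the `min_match` parameter, and the example adapter and reads are all hypothetical.

```python
def read_fasta(lines):
    """Return {name: sequence} from an iterable of fasta-formatted lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs

def count_reads_with_adapter(reads, adapters, min_match=10):
    """Count reads containing the first `min_match` bases of any adapter.

    Exact substring matching only; a real trimmer also tolerates mismatches.
    """
    seeds = {seq[:min_match] for seq in adapters.values() if len(seq) >= min_match}
    return sum(any(seed in read.upper() for seed in seeds) for read in reads)

# Made-up adapter and reads, for illustration only.
adapters = read_fasta([">example_adapter", "ACGTACGTACGGTTAACC"])
reads = [
    "TTTTGGGGCCCCACGTACGTACGGTT",  # ends with the start of the adapter
    "TTTTGGGGCCCCAAAA",            # no adapter sequence
]
print(count_reads_with_adapter(reads, adapters))
```

A count that is a small fraction of the total number of reads would be in line with minor contamination, but only the trimming tool's own output should be used to draw conclusions.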

  1. Open the Scythe-adapter-trimming 0.991 app (Public Apps > NGS > QC and Processing > Scythe-adapter-trimming).
    1. Name your analysis.
  2. Click on the "Settings" tab.
    1. Enter the appropriate fasta-formatted “Adapter file.” (For the sample data enter "illumina_adapters.fa" at Community Data > iplant_training > read_cleanup > sampledata > illumina_adapters.fa.)
    2. As “Input file” enter a fastq-formatted sequence file. (Use one of the sample sequence files at Community Data > iplant_training > read_cleanup > sampledata.)
    3. Enter a unique name for the "Output file."
  3. Click on the "Options" tab.
    1. As "Quality format" enter "sanger". (Click on the "i" button to see the choices available. Acceptable formats for fastq files are "solexa", "illumina", or "sanger". The default setting for the app is “illumina”, but even Illumina sequences are predominantly presented in sanger format.)
    2. Enter an appropriate "Minimum match" cut-off. (For the sample data enter "10". With sequences of your own you may have to experiment to identify the optimal setting.)
    3. Enter a name for the "matches file" if you wish to monitor the matches Scythe finds. (Retaining matches provides a good way to judge the level of primer contamination of the sequences.)
    4. Enter an appropriate value for "Prior"; smaller values establish stronger matching requirements. (For the sample data enter “0.005”. For your own data you may want to experiment to identify the optimal setting.)
  4. Click "Launch Analysis."
  5. Repeat the Scythe trimming procedure with the other 3 .fq sequence files.
  6. Once an analysis is completed (approx. 5 min. with the sample data), click on "Analyses," and then click on the analysis name to open the output folder.
  7. Examine the results:
    1. The matches files are 2% - 2.5% of the size of the .fq output files, which is a normal level and does not indicate major contamination. This is consistent with the FastQC results, which suggested some contamination but not necessarily a major issue.
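If you are unsure which "Quality format" applies to your own files (step 3 above), the range of characters in the quality lines gives a hint: sanger encoding uses an ASCII offset of 33, while the older illumina and solexa encodings use an offset of 64, with solexa scores reaching as low as -5 (character ";"). The Python sketch below is a simple heuristic built on those offsets. The function name, the cap of "J" for typical sanger-encoded Illumina data, and the "ambiguous" label are assumptions made for this example; it is not a feature of Scythe or Sickle.

```python
def guess_quality_format(quality_strings):
    """Guess the FASTQ quality encoding from the characters observed.

    Returns "sanger", "solexa", "illumina", or "ambiguous".
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        raise ValueError("no quality characters supplied")
    lo, hi = min(codes), max(codes)
    if lo < 59:
        # Characters below ';' (59) only occur with offset 33.
        return "sanger"
    if hi <= 74:
        # Everything fits within '!'..'J', so high-quality sanger data
        # cannot be ruled out (assumed cap for this sketch).
        return "ambiguous"
    # Offset 64: solexa scores may drop below '@' (64), illumina scores do not.
    return "solexa" if lo < 64 else "illumina"

# Made-up quality strings, for illustration only.
print(guess_quality_format(["IIII", "5555", "+++#"]))
print(guess_quality_format(["hhhh", "ffff", "BBBB"]))
print(guess_quality_format(["hhhh", ";;;;"]))
print(guess_quality_format(["IIII"]))
```

The more reads are examined, the more reliable the guess; a handful of high-quality reads can look the same under either offset.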

...

  1. Open the Sickle-quality-based-trimming_version_1.0 app (Public Apps > NGS > QC and Processing > Sickle-quality-based-trimming).
    1. Name your analysis.
  2. Click on the "Settings" tab.
    1. Select the appropriate sequence type. (For the sample data select "Paired.")
    2. Enter the first/only sequence reads file into “Reads 1.” (For the sample data, use the .fq file ending in "_1.fq" at Community Data > iplant_training > read_cleanup > scythe_output_from_sampledata.)
    3. If your data consists of paired reads, enter the second of the paired sequence files into “Reads 2.” (For the sample data, use the .fq file ending in "_2.fq" at Community Data > iplant_training > read_cleanup > scythe_output_from_sampledata.)
    4. Provide a unique "Output file 1" name.
    5. Provide a name for "Output file 2 (for pairs)."
    6. Provide a name for the "Single Read Output" file. (Single reads result from orphaned reads formed by the removal of low-quality reads from pairs.)
  3. Click on the "Options" tab.
    1. As "Quality format" enter "sanger". (Click on the "i" button to see the choices available. Acceptable formats for fastq files are "solexa", "illumina", or "sanger". The default setting for the app is “illumina”, but even Illumina sequences are predominantly presented in sanger format.)
    2. Enter an appropriate "Quality Threshold" as determined by a FastQC or equivalent assessment method. (For the sample data enter "15" for the "fragSC" sample data and "12" for the "shrtjmpSC" sample data.)
    3. Enter an appropriate "Minimum length" for sequences to be retained. (For the sample data enter "40".)
    4. Checking the "No N's" check box establishes more stringent conditions, as it removes all reads that contain N’s in the sequence. (For the sample data check the "No N's" check box.)
  4. Click "Launch Analysis."
  5. Repeat the Sickle trimming/filtering procedure with the two "shrtjmpSC" sequence files in Community Data > iplant_training > read_cleanup > scythe_output_from_sampledata. (Set "Options" > "Quality Threshold" to "12".)
  6. Once an analysis is completed (approx. 5 min. with the sample data), click on "Analyses," and then click on the analysis name to open the output folder.
  7. Examine the results:
    1. Comparing the output for the fragSC and shrtjmpSC sequence samples, it is apparent that Sickle generated some single reads from both sets of paired read files. Single reads arise when only one read of a pair is removed. A read may have been removed because it contained Ns, or because it was of poor quality, so that large portions were trimmed and the remaining sequence was shorter than the "Minimum length" threshold. Unless both reads of a pair were removed, either of these operations would have generated a single read.
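The way single reads arise from pairs can be illustrated with a short Python sketch. It uses a deliberately simplified trimming rule (cut each read at the first base below the threshold) rather than Sickle's own trimming method, and assumes sanger (Phred+33) qualities. All function names and the example reads are hypothetical, and the example uses a minimum length of 4 only to keep the reads short.

```python
def trim_3prime(seq, qual, threshold, offset=33):
    """Cut the read at the first base with quality below `threshold`.

    Simplified stand-in for quality trimming; not Sickle's algorithm.
    """
    for i, c in enumerate(qual):
        if ord(c) - offset < threshold:
            return seq[:i], qual[:i]
    return seq, qual

def keep(seq, min_length, no_ns):
    """A read survives if it is long enough and, optionally, free of Ns."""
    return len(seq) >= min_length and not (no_ns and "N" in seq.upper())

def filter_pairs(pairs, threshold=15, min_length=40, no_ns=True):
    """Split read pairs into surviving pairs and orphaned single reads."""
    paired, singles = [], []
    for (s1, q1), (s2, q2) in pairs:
        t1, _ = trim_3prime(s1, q1, threshold)
        t2, _ = trim_3prime(s2, q2, threshold)
        ok1 = keep(t1, min_length, no_ns)
        ok2 = keep(t2, min_length, no_ns)
        if ok1 and ok2:
            paired.append((t1, t2))
        elif ok1 or ok2:
            singles.append(t1 if ok1 else t2)
        # If neither read survives, the pair is dropped entirely.
    return paired, singles

# Made-up read pairs, for illustration only.
pairs = [
    (("ACGTACGT", "IIIIIIII"), ("TTGGCCAA", "IIIIIIII")),  # both reads pass
    (("ACGTACGT", "IIIIIIII"), ("TTNGCCAA", "IIIIIIII")),  # mate contains an N
    (("ACGTACGT", "II######"), ("TTGGCCAA", "I#######")),  # both too short after trimming
]
paired, singles = filter_pairs(pairs, threshold=15, min_length=4)
print(len(paired), "pairs kept;", len(singles), "single reads")
```

Here the second pair contributes one single read because only its mate is removed, while the third pair contributes nothing because both reads fall below the minimum length.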

...