The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
BWA: Fast and accurate short read alignment with Burrows-Wheeler Transform
Li H. and Durbin R.
Bioinformatics 2009; 25:1754-60. [PMID: 19451168]
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It first constructs the FM-index for the reference genome (the index command). Alignment is then invoked with a different sub-command for each of three algorithms: BWA-backtrack, BWA-SW, and BWA-MEM. BWA-MEM is the latest algorithm and is generally recommended for high-quality queries because it is faster and more accurate. It supports both single (SR) and paired-end (PE) reads and performs chimeric alignment. It is applicable to a wide range of query sequences, from 70 bp to 1 Mbp, and performs better than BWA-backtrack on 70-100 bp Illumina reads.
This AGAVE/DE app wraps the bwa-index and bwa-mem modules of BWA. It was built for ChIP-Seq workflows but is not limited to them. It takes FASTQ files as input and produces alignments in SAM/BAM format.
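The two-step flow described above (index the reference once, then align with BWA-MEM) can be sketched as plain command lines. This is a minimal sketch and not the app's actual wrapper script: the function names and file names are hypothetical, and the commands are only built as argument lists, not executed.

```python
from typing import List, Optional


def bwa_index_command(reference_fasta: str) -> List[str]:
    """Build the command that constructs the FM-index for a reference FASTA."""
    return ["bwa", "index", reference_fasta]


def bwa_mem_command(reference_fasta: str, reads_1: str,
                    reads_2: Optional[str] = None) -> List[str]:
    """Build a BWA-MEM command for single (SR) or paired-end (PE) reads.

    bwa mem writes SAM to standard output, so the caller is expected to
    redirect that output to a file.
    """
    command = ["bwa", "mem", reference_fasta, reads_1]
    if reads_2 is not None:  # paired-end: pass the mate file as well
        command.append(reads_2)
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(bwa_index_command("reference.fa")))
    print(" ".join(bwa_mem_command("reference.fa", "test_R1.fq", "test_R2.fq")))
```

Building the commands as lists keeps file names with spaces intact if they are later passed to a process launcher such as subprocess.run.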
- A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
- Mandatory arguments:
- Sequences folder for protein of interest (Note: the files may be in FASTA or FASTQ format; for PE reads, the file names must include the read-end information, e.g., test_R1.fq and test_R2.fq)
- Sequences folder for background control (same format and naming requirements as the protein-of-interest folder)
- Reference genome sequence in FASTA format
- Read type: SR vs PE
- Optional arguments:
- Minimum score: Don’t output alignments with score lower than INT
- Type of sequencing reads: Illumina, PacBio, Oxford Nanopore, Intra-species contigs to ref
- Sort method for BAM: Sort alignments by leftmost coordinates, or by read name
- Mark shorter split: Mark shorter split hits as secondary (for Picard compatibility)
- SAM output: keep or purge the alignments in SAM format
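As a rough illustration of how the optional arguments above could map onto command-line flags, the sketch below uses the `-T`, `-x`, and `-M` options of `bwa mem`. Two things here are assumptions and not taken from the app itself: that BAM sorting is done with `samtools sort` (where `-n` sorts by read name), and the short read-type labels used as dictionary keys. The function names are hypothetical.

```python
from typing import List, Optional

# Presets accepted by `bwa mem -x`. Illumina reads use the defaults (no preset).
# The dictionary keys are hypothetical labels chosen for this sketch.
READ_TYPE_PRESETS = {
    "illumina": None,
    "pacbio": "pacbio",
    "nanopore": "ont2d",
    "intractg": "intractg",
}


def mem_option_flags(min_score: Optional[int] = None,
                     read_type: str = "illumina",
                     mark_shorter_split: bool = False) -> List[str]:
    """Translate the optional arguments into bwa mem flags."""
    flags: List[str] = []
    if min_score is not None:
        flags += ["-T", str(min_score)]  # drop alignments scoring below INT
    preset = READ_TYPE_PRESETS[read_type]
    if preset is not None:
        flags += ["-x", preset]  # read-type preset
    if mark_shorter_split:
        flags.append("-M")  # mark shorter split hits as secondary
    return flags


def sort_command(sam_path: str, bam_path: str, by_name: bool = False) -> List[str]:
    """Build a samtools sort command (assumed tool): coordinate order by default."""
    command = ["samtools", "sort"]
    if by_name:
        command.append("-n")  # sort by read name instead of leftmost coordinate
    return command + ["-o", bam_path, sam_path]
```

For example, `mem_option_flags(min_score=30, read_type="pacbio", mark_shorter_split=True)` yields the flags `-T 30 -x pacbio -M`, which would be placed between `bwa mem` and the reference path.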
The following test data for BWA-index-mem are provided at /iplant/home/xiaofei_iplant/Sorghum_chr8/chr8_test:
G3_P_K4me3_rep1_chr8_R1.fq and G3_P_K4me3_rep1_chr8_R2.fq
G3_P_K4me3_rep2_chr8_R1.fq and G3_P_K4me3_rep2_chr8_R2.fq
G3_P_H3_rep1_chr8_R1.fq and G3_P_H3_rep1_chr8_R2.fq
G3_P_H3_rep2_chr8_R1.fq and G3_P_H3_rep2_chr8_R2.fq
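The _R1/_R2 naming convention is what allows paired-end files in a folder to be matched to each other. The sketch below shows one way such pairing could work. It is an illustration under stated assumptions and not the app's own logic: the function name, the regular expression, and the accepted extensions (.fq and .fastq) are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed naming pattern: <sample>_R1.fq / <sample>_R2.fq (or .fastq).
_MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.(?:fq|fastq)$")


def pair_reads(file_names: List[str]) -> Dict[str, Tuple[str, str]]:
    """Group paired-end files by sample name and return (R1, R2) per sample."""
    mates: Dict[str, Dict[str, str]] = {}
    for name in file_names:
        match = _MATE_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the naming convention
        mates.setdefault(match["sample"], {})[match["mate"]] = name

    pairs: Dict[str, Tuple[str, str]] = {}
    for sample, found in sorted(mates.items()):
        if "1" not in found or "2" not in found:
            raise ValueError(f"sample {sample!r} is missing an R1 or R2 file")
        pairs[sample] = (found["1"], found["2"])
    return pairs


if __name__ == "__main__":
    files = ["G3_P_H3_rep1_chr8_R1.fq", "G3_P_H3_rep1_chr8_R2.fq"]
    for sample, (r1, r2) in pair_reads(files).items():
        print(sample, r1, r2)
```

Raising an error on an incomplete pair surfaces a misnamed or missing mate file before any alignment is attempted.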
Successful execution of the BWA-index-mem pipeline will create a directory named out for each sample. The directory will contain SAM/BAM files for both the protein-of-interest samples and the background input samples, which can be processed further for downstream analysis and visualization.
*Keeping the SAM folders is optional.