TEMP for transposable element detection

Rationale and background:


TEMP is a software package that detects transposable element (TE) insertions and absences, pinpoints their junctions with genomic DNA at base-pair resolution, and estimates their frequencies in the population. The TEMP insertion and absence algorithms are available in the CyVerse Discovery Environment as two separate applications:

  1. TEMP-insertions - for TE insertion analysis
  2. TEMP-absence - for TE absence analysis

TEMP: a computational method for analyzing transposable element polymorphism in populations. Jiali Zhuang, Jie Wang, William Theurkauf, Zhiping Weng. Nucleic Acids Research, Volume 42, Issue 11, 17 June 2014, Pages 6826–6838, https://doi.org/10.1093/nar/gku323


Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account here - https://user.cyverse.org/register)
  2. Mandatory arguments for TEMP-insertions
    1. Input file in BAM format
    2. Transposon consensus sequences in FASTA format
    3. Annotated transposon positions in the genome
    4. Number of mismatches allowed when mapping to TE consensus sequences
    5. An integer specifying the length of the fragments (inserts) of the library
  3. Mandatory arguments for TEMP-absence
    1. Input file in BAM format
    2. Annotated transposon positions in the genome (e.g., from RepeatMasker) in BED6 format, given with the full path
    3. 2bit file for the reference genome
    4. An integer specifying the length of the fragments (inserts) of the library

Refer to the TEMP manual for more details: https://github.com/JialiUMassWengLab/TEMP/blob/master/Manual
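Because the annotation file is expected in BED6 format, it can help to check it before launching a job. The following is a minimal sketch, assuming the generic BED6 convention of six tab-separated columns (chromosome, start, end, name, score, strand). The function names is_bed6_record and find_bad_lines are hypothetical helpers, not part of TEMP, and TEMP may place further requirements on the columns that this check does not cover.

```python
def is_bed6_record(line):
    """Return True if the line looks like a generic BED6 record.

    Assumption: six tab-separated fields, integer start/end with
    0 <= start <= end, and a strand of '+', '-' or '.'.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        return False
    _chrom, start, end, _name, _score, strand = fields
    try:
        start_pos, end_pos = int(start), int(end)
    except ValueError:
        return False
    if start_pos < 0 or end_pos < start_pos:
        return False
    return strand in ("+", "-", ".")


def find_bad_lines(lines):
    """Return the 1-based numbers of lines that fail the BED6 check.

    Blank lines and comment/track/browser header lines are skipped.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        if not is_bed6_record(line):
            bad.append(number)
    return bad
```

Passing an open file handle for the annotation file to find_bad_lines returns an empty list when every record has the expected shape. A common cause of failures is a file delimited by spaces instead of tabs.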

Test with sample data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> temp. The test data is a simulated set generated using Drosophila melanogaster chromosome 2L as the template. See the TEMP manual on GitHub for more details about this dataset: https://github.com/JialiUMassWengLab/TEMP/blob/master/Manual

  1. Input BAM file - test_chromosome.sorted.bam
  2. Transposon consensus sequence - test_concensus.fa
  3. Annotated transposon positions in the genome - test_TE_annotation.bed
  4. 2bit file for the reference genome - dm3_chr2L.2bit

Output

  1. For TE insertion analysis, the summary output file has the suffix .insertion.refined.bp.summary.
  2. For TE absence analysis, the summary output file has the suffix .absence.refined.bp.summary.
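
Once an analysis has finished and its output folder has been downloaded, the summary files can be located by their suffixes. The following is a minimal sketch. The function name find_summaries and the idea of searching a local copy of the output folder are assumptions for illustration. Only the two suffixes come from the description above.

```python
from pathlib import Path

# Suffixes of the summary files, as listed in the Output section.
SUMMARY_SUFFIXES = {
    "insertion": ".insertion.refined.bp.summary",
    "absence": ".absence.refined.bp.summary",
}


def find_summaries(output_dir):
    """Search output_dir recursively and group summary files by analysis type.

    Returns a dict mapping 'insertion' and 'absence' to sorted lists of paths.
    """
    found = {kind: [] for kind in SUMMARY_SUFFIXES}
    for path in sorted(Path(output_dir).rglob("*")):
        if not path.is_file():
            continue
        for kind, suffix in SUMMARY_SUFFIXES.items():
            if path.name.endswith(suffix):
                found[kind].append(path)
    return found
```

An empty list for either key suggests that the corresponding analysis did not produce a summary, in which case the job logs are the next place to look.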