

Please work through the documentation and add your comments on the bottom of this page, or email comments to support@cyverse.org. Thank you.

Rationale and Background

CNVnator is a tool for copy number variation (CNV) discovery and genotyping from the depth of coverage of mapped reads, that is, read-depth (RD) analysis of personal genome sequencing. CNV in the genome is a complex phenomenon and is not completely understood. The method combines the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs.


Some useful information about CNVnator from this blog:

CNVnator can identify CNVs ranging from a few hundred bases to megabases in length. Its breakpoint precision is also good: within 200 bp for 90% of the breakpoints in a test case studied in the CNVnator paper (using a bin size of 100 bp). The higher your coverage, the smaller the bin size you can use, and a smaller bin size gives greater precision. The authors recommend ~100-bp bins for 20-30x coverage, ~500-bp bins for 4-6x coverage, and ~30-bp bins for 100x coverage. However, they say that the bin size should not be shorter than the read length in your data.
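The bin-size guidance above can be written down as a small helper. This is only an illustrative sketch, not part of CNVnator or the DE app: the function name `suggest_bin_size` is hypothetical, and the cut-offs used for coverages that fall between the quoted ranges (for example 10x or 50x) are assumptions.

```python
def suggest_bin_size(coverage, read_length):
    """Return a rough bin size in bp for a given coverage and read length.

    Anchor points come from the text above: ~500 bp at 4-6x,
    ~100 bp at 20-30x and ~30 bp at 100x. How coverages between
    those ranges are handled is an assumption of this sketch.
    """
    if coverage >= 100:
        bin_size = 30
    elif coverage >= 20:
        bin_size = 100
    else:
        bin_size = 500
    # The bin size should not be shorter than the read length.
    return max(bin_size, read_length)


if __name__ == "__main__":
    # 25x coverage with 100-bp reads
    print(suggest_bin_size(25, 100))
```

The same value would then be entered for the four bin size parameters of the app, as in the test runs on this page.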


Mandatory arguments

  • Input(s)
    • Custom Reference genome or Reference genome from DE: You must select one of these options; otherwise the app will fail.
    • BAM files: Make sure the BAM files were generated by mapping reads to the reference genome selected above.
    • Chromosome id or Chromosome ids from file: Chromosome names must be written exactly as they appear in the BAM header, e.g., chrX or X. Either enter a single chromosome id (for example, 10) or upload a file that lists multiple chromosome ids, one per line. You must select one of these options; otherwise the app will fail.
  • Parameter(s)
    • Histogram bin size: The bin size (window size) for generating the histogram for all the windows in your genome assembly. For example, 100.
    • Stat bin size: The bin size (window size) for calculating statistical significance (p-values) for the windows that have unusual read depth. For example, 100.
    • Partition bin size: The bin size (window size) for partitioning the chromosomes/scaffolds into long regions (each of which can be longer than the window size) that have similar read depth and so, presumably, similar copy number. For example, 100.
    • Call bin size: The bin size (window size) for calling CNVs. For example, 100.
    • Prefix: The prefix that will be added to the vcf file column when converting the CNVnator output to a vcf file.
  • Output
    • The name of the output file: For example, result.
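Because chromosome ids must match the BAM header exactly, it can help to check them before launching the app. The sketch below is one possible way to do that outside the DE. It assumes you have already saved the header as text (for example with `samtools view -H`), and the function names and the example header, including its sequence lengths, are made up for illustration.

```python
def sequence_names(header_text):
    """Collect the SN values of all @SQ lines in a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_chromosome_ids(ids, header_text):
    """Return the ids that do not appear in the header, in input order."""
    known = set(sequence_names(header_text))
    return [chrom for chrom in ids if chrom not in known]


# Made-up header for illustration only; the lengths are not real.
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:1000\n"
    "@SQ\tSN:10\tLN:2000\n"
)

if __name__ == "__main__":
    # "chr10" is reported because this header names the sequence "10".
    print(missing_chromosome_ids(["10", "chr10"], example_header))
```

Ids that pass the check can be written one per line to a plain text file and supplied as the chromosome id file.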

Test Run using a single chromosome id

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator) 

Mandatory arguments

  • Input(s)
    • Custom Reference genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel.fa
    • BAM files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
    • Chromosome id: 10
  • Parameter(s)
    • Histogram bin size: 100
    • Stat bin size: 100
    • Partition bin size: 100
    • Call bin size: 100
    • Prefix: test
  • Output
    • The name of the output file: result


Test Run using a chromosome id file

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator) 

Mandatory arguments

  • Input(s)
    • Custom Reference genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel.fa
    • BAM files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
    • Chromosome ids from file: chr_list.txt
  • Parameter(s)
    • Histogram bin size: 100
    • Stat bin size: 100
    • Partition bin size: 100
    • Call bin size: 100
    • Prefix: test
  • Output
    • The name of the output file: result

Output files generated

  1. cnvnator.root: Output ROOT file. This is a binary file, so do not try to open it as text.
  2. result.cnvnator: The final output from CNVnator.
  3. result.vcf: The final output from CNVnator in vcf format.

According to the CNVnator README file, the columns of the output file are:

CNV_type coordinates CNV_size normalized_RD e-val1 e-val2 e-val3 e-val4 q0

normalized_RD -- normalized to 1.
e-val1        -- is calculated using t-test statistics.
e-val2        -- is from the probability of RD values within the region to be in the tails of a Gaussian distribution describing frequencies of RD values in bins.
e-val3        -- same as e-val1 but for the middle of CNV
e-val4        -- same as e-val2 but for the middle of CNV
q0            -- fraction of reads mapped with q0 quality
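To work with result.cnvnator outside the DE, each line can be split into the nine columns listed above. The following is a minimal sketch, assuming the columns are whitespace-separated and appear in the order given by the README. The function name `parse_cnvnator_line` and the example line are invented for illustration and are not real results.

```python
COLUMNS = [
    "CNV_type", "coordinates", "CNV_size", "normalized_RD",
    "e-val1", "e-val2", "e-val3", "e-val4", "q0",
]


def parse_cnvnator_line(line):
    """Turn one line of CNVnator output into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(fields)))
    record = dict(zip(COLUMNS, fields))
    # CNV_size is a length in bp; the remaining columns are numeric values.
    record["CNV_size"] = int(float(record["CNV_size"]))
    for name in COLUMNS[3:]:
        record[name] = float(record[name])
    return record


# Invented example line, not taken from a real run.
example_line = "deletion\t10:1001-2000\t1000\t0.45\t1e-05\t2e-08\t3e-04\t4e-07\t0.02"

if __name__ == "__main__":
    rec = parse_cnvnator_line(example_line)
    print(rec["CNV_type"], rec["coordinates"], rec["CNV_size"])
```

The coordinates column is kept as a plain string here, since its exact layout is not described on this page.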

More information about CNVnator can be found here: https://github.com/abyzovlab/CNVnator
