The applications listed here are available for use in the Discovery Environment and are documented in the Discovery Environment Manual.


Please work through the documentation and add your comments at the bottom of this page, or email them to support@cyverse.org. Thank you.

What is Medusa?

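MeDuSa (Multi-Draft based Scaffolder) is a draft genome scaffolder. It uses one or more related genomes as references to order and orient the contigs of a target draft genome, combining the information from all references in a graph-based approach.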

Reference:

E Bosi, B Donati, M Galardini, S Brunetti, MF Sagot, P Lió, P Crescenzi, R Fani, and M Fondi. MeDuSa: a multi-draft based scaffolder. Bioinformatics (2015): btv171.
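To make the graph-based idea concrete, here is a toy sketch. It is a simplified illustration, not MeDuSa's code: it assumes each reference has already been reduced to an ordered list of target contig names (MeDuSa derives this from alignments), it ignores contig orientation, and the function names `build_contig_network` and `greedy_path_cover` are hypothetical. Contigs are nodes, an edge's weight counts how many references place two contigs next to each other, and a simple greedy heuristic (not necessarily the one MeDuSa implements) keeps the heaviest edges that form simple paths. Each resulting path is one scaffold.

```python
from collections import Counter


def build_contig_network(reference_orders):
    """Weight each unordered contig pair by the number of references in which
    the two contigs appear next to each other (orientation is ignored)."""
    weights = Counter()
    for order in reference_orders:
        for a, b in zip(order, order[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1
    return weights


def greedy_path_cover(contigs, weights):
    """Greedily keep the heaviest edges such that every contig has at most two
    neighbours and no cycle is formed; each resulting path is one scaffold."""
    parent = {c: c for c in contigs}  # union-find forest used to reject cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbours = {c: [] for c in contigs}
    # Heaviest edges first; ties broken by contig names so the result is deterministic.
    for edge, _ in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        if len(neighbours[a]) < 2 and len(neighbours[b]) < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Walk each path starting from an endpoint (a contig with fewer than two neighbours).
    paths, seen = [], set()
    for start in contigs:
        if start in seen or len(neighbours[start]) == 2:
            continue
        path, prev, cur = [start], None, start
        seen.add(start)
        while True:
            nxt = [n for n in neighbours[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            seen.add(cur)
        paths.append(path)
    return paths
```

For example, with three hypothetical references `[["c1", "c2", "c3"], ["c1", "c2", "c4", "c3"], ["c2", "c1"]]`, the pair c1/c2 gets weight 3 and `greedy_path_cover(["c1", "c2", "c3", "c4"], network)` returns `[["c1", "c2", "c3", "c4"]]`. Requiring at most two neighbours per contig and rejecting cycles is what guarantees every connected piece is a linear scaffold.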

Input and Output

The following inputs are required:

Mandatory

  1. Target genome file: A draft genome in FASTA format. This is the genome you want to scaffold.
  2. Comparison drafts folder: A folder containing any number of auxiliary draft genomes in FASTA format. The more closely these organisms are related to the target, the better the results will be. All comparison drafts must be collected in this one folder.
  3. Scripts folder: A folder with the Python scripts needed to run the program (medusa_scripts)

Optional

  1. Output fasta file: Name of the output file (Default is output.fa)
  2. Number of cleaning rounds: This option allows the user to run a given number of cleaning rounds and keep the best solution. Since the variability is small, 5 rounds are usually sufficient to find the best score (Default is 5)
  3. N50 stat of fasta file: This option calculates the N50 statistic of a FASTA file. All other options are ignored if you choose this option
  4. Sequence similarity based weighting scheme: This option enables a weighting scheme based on sequence similarity. Using a different weighting scheme may lead to better results
  5. Estimation of the distance between pairs of contigs based on the reference genome: This option estimates the distance between pairs of contigs from the reference genome(s). The scaffolded contigs are then separated by a number of N characters equal to this estimate, and the estimated distances are also saved in the "*_distanceTable" file. By default, scaffolded contigs are separated by 100 Ns
  6. gexf format of the contig network: The contig network and the final path cover are also written in gexf format
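The N50 option can be cross-checked locally. The sketch below is an independent Python illustration, not MeDuSa's implementation, and the function names `fasta_lengths` and `n50` are hypothetical. It uses the usual definition: sort sequence lengths in decreasing order and report the length at which the running total first reaches half of the total assembly length.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length at which the running total of the lengths, sorted in decreasing
    order, first reaches half of the total length (0 for empty input)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Usage: `with open("Rhodobacter_target.fna") as handle: print(n50(fasta_lengths(handle)))`. Because `fasta_lengths` accepts any iterable of lines, it works directly on an open file without loading the whole genome into memory.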

 

The following output files will be produced.

  1. targetGenome_SUMMARY: a text file summarizing your results: number of scaffolds, N50 value, etc.
  2. targetGenomeScaffold.fasta: a FASTA file with the sequences grouped in scaffolds. Contigs in the same scaffold are separated by 100 Ns by default, or by a variable number of Ns (the estimated distance between the contigs) if the distance estimation option ("-d") is used.
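The gap rule for the scaffold file can be sketched in a few lines of Python. This is an illustration of the behavior described above, not MeDuSa's code, and `build_scaffold` is a hypothetical name. By default every pair of consecutive contigs is joined with 100 Ns; if a list of estimated distances is supplied, each join uses its own number of Ns. How MeDuSa treats negative estimates (overlapping contigs) is not described on this page; in this sketch they simply produce no Ns.

```python
def build_scaffold(contigs, gaps=None, default_gap=100):
    """Concatenate ordered contig sequences into one scaffold, inserting a run of
    N characters between consecutive contigs."""
    n_joins = max(len(contigs) - 1, 0)
    if gaps is None:
        gaps = [default_gap] * n_joins
    if len(gaps) != n_joins:
        raise ValueError("need exactly one gap length per pair of consecutive contigs")
    parts = [contigs[0]] if contigs else []
    for gap, seq in zip(gaps, contigs[1:]):
        # A negative estimate yields no Ns here, since "N" * -5 == "".
        parts.append("N" * gap)
        parts.append(seq)
    return "".join(parts)
```

For example, `build_scaffold(["AA", "CC", "TT"], gaps=[2, 0])` returns `"AANNCCTT"`, while omitting `gaps` inserts 100 Ns at each join.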

The following output files can optionally be produced.

  1. targetGenome_distanceTable: a tabular file with the estimation of the distance between successive contigs (bp).
  2. targetGenome_network.gexf: the contig network in gexf format.
  3. targetGenome_cover.gexf: the final path cover in gexf format.
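GEXF is an XML graph format (the native format of the Gephi viewer), so the optional network and cover files can also be inspected with Python's standard library. The sketch below is hypothetical (the name `summarize_gexf` is illustrative) and relies only on the standard GEXF element names `node` and `edge`. It does not assume a particular GEXF namespace version, and it does not assume MeDuSa writes a `weight` attribute: a missing attribute comes back as `None`.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_gexf(root):
    """Return node ids and (source, target, weight) edge tuples from a GEXF tree."""
    nodes = [el.get("id") for el in root.iter() if _local_name(el.tag) == "node"]
    edges = [
        (el.get("source"), el.get("target"), el.get("weight"))
        for el in root.iter()
        if _local_name(el.tag) == "edge"
    ]
    return nodes, edges
```

Usage: `nodes, edges = summarize_gexf(ET.parse("targetGenome_network.gexf").getroot())`, then `len(nodes)` and `len(edges)` give the size of the contig network.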

Test Run

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > medusa  (/iplant/home/shared/iplantcollaborative/example_data/medusa)

Step 1: After logging into the Discovery Environment, open the Apps window and enter medusa in the search box.

Step 2: Click on the medusa-1.6 app and enter "Medusa-1.6_analysis1_test_run" as the Analysis Name.

Mandatory Inputs: 

  1. Use Rhodobacter_target.fna for Target genome file
  2. Use reference_genomes for Comparison drafts folder
  3. Use medusa_scripts for Medusa scripts folder. Note: the medusa_scripts folder is available at /iplant/home/shared/iplantcollaborative/example_data/medusa/medusa_script

Leave the optional arguments as they are.

Outputs:

Two output files will be generated.

  1. output.fa
  2. Rhodobacter_target.fna_SUMMARY


Parallel execution of Medusa

If you have many target files to run with the same set of Medusa-1.6 parameters, you can use the "HT Analysis Path List file" option in the DE to launch one analysis per target file. The detailed steps are:

Step 1: Create a new HT Analysis Path List file.

Step 2: Drag and drop the target files into the newly created HT analysis path list file

Step 3: Save the newly created HT Analysis Path List file as medusa_ht_path (you can name it whatever you want).


Step 4: Click on the medusa-1.6 app and enter "Medusa-1.6_analysis1_ht_path" as the Analysis Name.


Mandatory Inputs: 

  1. Use medusa_ht_path for Target genome file
  2. Use reference_genomes for Comparison drafts folder
  3. Use medusa_scripts for Medusa scripts folder. Note: the medusa_scripts folder is available at /iplant/home/shared/iplantcollaborative/example_data/medusa/medusa_script

Leave the optional arguments as they are. 

 

Once the app has finished running, you will find one analysis output folder per target file. Since three target files were listed in the HT path list file in this example, you will find three analysis output folders.
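For many targets, the path list file can also be prepared as plain text outside the DE and then uploaded. The sketch below assumes that an HT Analysis Path List file is a text file whose first line is the header `# application/vnd.de.path-list+csv; version=1`, followed by one absolute Data Store path per line. Please verify this header against the current DE documentation. The function name and the example paths (including `your_username`) are hypothetical.

```python
# Assumed first line of a DE HT Analysis Path List file; verify against the DE docs.
HT_PATH_LIST_HEADER = "# application/vnd.de.path-list+csv; version=1"


def make_ht_path_list(paths):
    """Return the text of an HT Analysis Path List file, one Data Store path per line."""
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute Data Store path, got {path!r}")
    return "\n".join([HT_PATH_LIST_HEADER, *paths]) + "\n"
```

Usage: build `targets = [f"/iplant/home/your_username/medusa_targets/target{i}.fna" for i in (1, 2, 3)]`, write `make_ht_path_list(targets)` to a file named medusa_ht_path, upload it, and select it as the Target genome file. Rejecting relative paths early avoids launching a batch in which every analysis fails to find its input.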



Tool Source for App