
fastaRename

A utility that replaces the headers in a fasta file with simple, incremental sequence IDs, which helps avoid header-related problems when the file is used as input to other apps. fastaRename is intended as a step in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow, but it can also be used to rename fasta sequences for other purposes. Sequences are renamed based on a user-defined two-letter genus-species abbreviation. Three files are produced:

1- .fasta - renamed sequences
2- .gg - new sequence names (for downstream OrthoMCL input in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow.)
3- .map - maps new sequence names to original fasta headers.  This will be useful to associate sequences with original fasta headers if needed.
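
To make the three outputs concrete, here is a minimal Python sketch of this kind of renaming. It is not the app's actual code (the app is adapted from a Perl script), and the details are assumptions made for illustration: the function name rename_fasta, the ID pattern (abbreviation, underscore, zero-padded counter), a tab-separated .map file, and a single-line .gg file of the form "abbreviation: id1 id2 ...". The real app may format its IDs and files differently.

```python
from pathlib import Path


def rename_fasta(in_path, abbrev, out_dir="."):
    """Replace each FASTA header with an incremental ID.

    Hypothetical helper: the ID pattern and the .map/.gg layouts
    below are assumptions, not the app's documented formats.
    """
    out = Path(out_dir)
    ids = []
    with open(in_path) as src, \
            open(out / f"{abbrev}.fasta", "w") as fasta_out, \
            open(out / f"{abbrev}.map", "w") as map_out:
        for line in src:
            if line.startswith(">"):
                # Assumed ID pattern: PF_000001, PF_000002, ...
                new_id = f"{abbrev}_{len(ids) + 1:06d}"
                ids.append(new_id)
                fasta_out.write(f">{new_id}\n")
                # Assumed .map layout: new ID, tab, original header
                map_out.write(f"{new_id}\t{line[1:].strip()}\n")
            else:
                # Sequence lines are copied through unchanged
                fasta_out.write(line)
    with open(out / f"{abbrev}.gg", "w") as gg_out:
        # Assumed .gg layout: one line listing every new ID
        gg_out.write(f"{abbrev}: {' '.join(ids)}\n")
    return ids
```

Calling rename_fasta("input.fasta", "PF") in this sketch would write PF.fasta, PF.gg, and PF.map to the current directory and return the list of new IDs.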

Notes: 

  • As part of the workflow referenced above, each input fasta file is expected to contain sequences from a single species, presumably its entire protein-encoding gene repertoire. This is why a two-letter abbreviation is suggested. If you use the app to rename sequences outside this workflow, choose an abbreviation that makes sense for your experimental design.
  • It is a good idea to record the number of sequences and headers in your input files and compare them with the outputs, to confirm that the output faithfully represents the input.
  • Please visit Cluster Orthologs and Paralogs and Assemble Custom Gene Sets to see how fastaRename fits into the larger workflow.
  • flattenClusters 1.0 can be used to map renamed sequences back to the original FASTA headers.
  • App adapted from a Perl script originally written by Chih-Horng Kuo.
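
The input-versus-output check suggested in the notes can be sketched in a few lines of Python. This is only an illustration: the function names count_headers and outputs_match are hypothetical, and the sketch assumes the .map file holds exactly one non-empty line per renamed sequence, which should be confirmed against real output before relying on it.

```python
def count_headers(path):
    """Count FASTA records by counting lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def outputs_match(input_fasta, output_fasta, map_file):
    """Return True if input, renamed output, and map agree on record count.

    Assumption: the .map file has one non-empty line per sequence.
    """
    n_in = count_headers(input_fasta)
    n_out = count_headers(output_fasta)
    with open(map_file) as handle:
        n_map = sum(1 for line in handle if line.strip())
    return n_in == n_out == n_map
```

A False result from outputs_match would be a signal to inspect the input file for malformed or duplicated headers before continuing with the workflow.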

Quick Start

  • To use fastaRename, import your data in fasta format, enter a two-letter abbreviation, and choose the output directory.

Test Data

Input and output test data for this app appear directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 2-fastaRename_input and 3-fastaRename_output.

Input File(s)

Use any of the files from the input directory above for testing. To better understand the output, run the app several times with different test files, using a different two-letter abbreviation for each species. Test input files are from four species: Plasmodium falciparum, Toxoplasma gondii, Neospora caninum, and Theileria annulata. Each file represents a 'complete' protein-encoding gene repertoire for that species. These are Apicomplexan species, chosen for their relatively small gene repertoires. The data are for testing purposes only; the latest data for each species can be found at EuPathDB.

Parameters Used in App

When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output provided in the next section below.

  • Use these parameters within the DE app interface:
    •  User-defined two-letter taxon abbreviation: PF
      • Above is a suggested example for a fasta file containing sequences from Plasmodium falciparum.

Output File(s)

Expect 3 output files, named for the two-letter abbreviation used as input.

1- .fasta - renamed sequences

2- .gg - new sequence names (for downstream OrthoMCL input in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow.)

3- .map - maps new sequence names to original fasta headers.  This will be useful to associate sequences with original fasta headers if needed.
