The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List




OrthoMCL v1.4


OrthoMCL version 1.4 uses parsed BLASTp input to cluster homologs based on a Markov Clustering Algorithm.  It is part of a larger Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow.

Notes: 

  • Please visit Cluster Orthologs and Paralogs and Assemble Custom Gene Sets to see how this app fits into the larger workflow.
  • Note that this is a stable version of the OrthoMCL algorithm, but it is not the latest version.  Version 2.0 was re-engineered for large-scale analyses involving hundreds of genomes.  Unless you plan to analyze on this scale, this version (1.4) will likely meet your needs.  Please visit the OrthoMCL Website for more information, and to see if your species of interest has already been included in the freely available OrthoMCL DB.

Quick Start

Test Data


Input test data for this app appears directly in the Discovery Environment in the Data window under
Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 4_Concatenate_Multiple_Files_output -> GG_Combined.txt

and
Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 7_parseBlastBpo_output -> parsedBLASToutput_bpo.txt

Output test data for this app appears directly in the Discovery Environment in the Data window under
Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 8_OrthoMCL_output

Input File(s)

Use GG_Combined.txt and parsedBLASToutput_bpo.txt from the directory above as test input.

Parameters Used in App

For your first attempt, default parameters are strongly encouraged unless you are an advanced user. See app interface information callouts for parameter explanations. See the OrthoMCL Web Site for pertinent publications and v1.4-specific release notes. If you wish to alter parameters to attempt to optimize results, the Inflation Index (aka -I flag) should likely be your first stop. See this page and section 7 of this page to get started.

Output File(s)

This explanation is based on the Output test data shown above in the 'Test Data' section.  The main output directory contains:

  • 'logs' directory: Contains the job submission standard output and standard error files generated by CyVerse systems.  Usually this will only be important for troubleshooting if your job does not run.
  • 'OrthoMCL_homolog_clustering_workflow_example.conf' file: Contains a record of the input files and parameters used.  Use this to verify that your run executed as you intended.  Some of this information is duplicated in files below, but is presented at this level for ease of use.
  • 'OrthoMCL_homolog_clustering_workflow_example.log' file: Contains a log record of OrthoMCL output.  This contains valuable information for you to ensure that the correct number of species and sequences went into the analysis and to see how many clusters were generated.  While some lines of the file may not be meaningful to you, it is worth your time to have a look at this file as part of examining your output. Some of this information is duplicated in files below, but is presented at this level for ease of use.
  • 'parsedBLASToutput_bpo.txt_bpo.idx': Contains OrthoMCL's index of the input BPO file.  Unless you have a specific interest, you can ignore this file.
  • 'Nov_14' directory: Contains the main program output and other accessory files needed for downstream processing in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow.
    • 'all_orthomcl.out' file:  This is the main output file for the program.  Each line of the text file contains one OrthoMCL cluster, i.e., a set of genes detected as evolutionarily related orthologs and paralogs.  Each line begins with an arbitrary ID, followed by the number of genes and the number of taxa in that cluster, followed by a list of gene IDs, each with the 2-letter taxon abbreviation chosen in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow.  The number of lines in this file should match the number of clusters generated, which is recorded in the OrthoMCL_homolog_clustering_workflow_example.log file. For example, consider cluster number 610 from the example all_orthomcl.out file:

ORTHOMCL610(4 genes,4 taxa): NC01972(NC) PF02630(PF) TA00120(TA) TG07927(TG)

This cluster, number 610, has 4 genes from 4 taxa, one from each of the species used as input.

    • 'all_orthomcl.pat' file: A pattern image file that summarizes the number of clusters, genes and taxa.  You may ignore this file for the larger workflow.
    • 'mcl' folder: Contains MCL program output.  These files are used in later steps of the larger workflow, to add unclustered sequences to OrthoMCL output if desired.
    • 'mtx' folder: Contains edge and weight output for each pairwise combination of species.  These can be ignored at this step.
    • 'orthomcl.log' file: Contains a log record of OrthoMCL output.  Can be ignored as all information here is also in OrthoMCL_homolog_clustering_workflow_example.log.
    • 'orthomcl.rbh' file: Contains reciprocal best hit data used by OrthoMCL to cluster homologs.  
    • 'orthomcl.setting' file:  Contains a summary of the inputs, outputs, and parameters used for the analysis.  There is some overlap between this file and OrthoMCL_homolog_clustering_workflow_example.conf above.
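
If you want to work with the clusters outside the Discovery Environment, the all_orthomcl.out line format described above can be read with a short script. The following Python sketch is illustrative only and is not part of OrthoMCL or the app: the function and class names are hypothetical, and the line pattern is inferred from the single example line for cluster 610, so it may need adjusting for your own files.

```python
import re
from typing import List, NamedTuple, Tuple

# Patterns inferred from the one example line shown on this page
# (an assumption, not an official format specification).
CLUSTER_RE = re.compile(r"^(ORTHOMCL\d+)\((\d+) genes?,\s*(\d+) taxa\):\s*(.*)$")
MEMBER_RE = re.compile(r"([^\s()]+)\((\w+)\)")


class Cluster(NamedTuple):
    cluster_id: str
    n_genes: int
    n_taxa: int
    members: List[Tuple[str, str]]  # (gene ID, taxon abbreviation) pairs


def parse_cluster_line(line: str) -> Cluster:
    """Split one cluster line into its ID, counts, and gene/taxon pairs."""
    match = CLUSTER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized cluster line: {line!r}")
    cluster_id, n_genes, n_taxa, rest = match.groups()
    return Cluster(cluster_id, int(n_genes), int(n_taxa), MEMBER_RE.findall(rest))


def read_clusters(path: str) -> List[Cluster]:
    """Parse every non-empty line of a file laid out like all_orthomcl.out."""
    with open(path) as handle:
        return [parse_cluster_line(line) for line in handle if line.strip()]


# The example cluster from this page.
example = "ORTHOMCL610(4 genes,4 taxa): NC01972(NC) PF02630(PF) TA00120(TA) TG07927(TG)"
cluster = parse_cluster_line(example)
print(cluster.cluster_id, cluster.n_genes, cluster.n_taxa)
print(cluster.members)
```

Calling len(read_clusters(...)) on a downloaded copy of all_orthomcl.out gives a line count that you can compare with the number of clusters reported in the .log file.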

Tool Source for App:

See the Downloads section of the OrthoMCL Website.
