A utility that generates custom homolog sets based on the copy number of genes in each species. Search criteria are supplied in a text file created by the user.
- Please visit Cluster Orthologs and Paralogs and Assemble Custom Gene Sets to see how this app fits into the larger workflow. Based on this workflow, this app can be run directly on OrthoMCL v1.4 output, or on the output of the appendUnclustered app. Both input options are shown below and are referred to respectively as 'without unclustered added' and 'with unclustered added' throughout.
- App adapted from a Perl script originally written by Chih-Horng Kuo
- To use clusterReport you will need the orthomcl.index and orthomcl.mclout files produced by either the OrthoMCL v1.4 app or the appendUnclustered app, the GG file created as part of the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow, and a user-created text file of search criteria described below.
Use the following example input files:
- GG file: Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 4_Concatenate_Multiple_Files_output -> GG_Combined.txt
- Search criteria file: Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 11_queryOrthoMCL_input -> minMax.txt
- Without unclustered added: Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 8_OrthoMCL_output -> Nov_14 -> mcl -> orthomcl.index and orthomcl.mclout
- With unclustered added: Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 9_appendUnclustered_output -> orthomcl.index and orthomcl.mclout
Parameters Used in App
There are no parameters for this app.
Expect 2 output files:
- A log file with a summary of the search criteria
- A '.group' file with the clusters that meet the search criteria, one per line. Each line follows the format: ortholog cluster # (# of species in cluster:# of sequences in cluster, comma-delimited list of the # of sequences per species), followed by a tab-delimited list of protein-encoding gene IDs (each followed by its 2-letter species abbreviation). For example, consider the file Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 12_queryOrthoMCL_output -> Query.group. Line 3 shows OrthoMCL cluster ID 67. This cluster contains 8 sequences from 4 species, 2 per species. There are 8 sequence IDs:
67(4:8,NC:2,PF:2,TA:2,TG:2) NC03984(NC) NC04882(NC) PF03069(PF) PF04899(PF) TA00779(TA) TA01773(TA) TG00643(TG) TG01355(TG)
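For downstream scripting, a line in this format can be split into its parts. The following is a minimal Python sketch and is not part of the app. The function name parse_group_line and the returned dictionary keys are hypothetical, chosen for illustration. It assumes that fields are separated by tabs or spaces and that gene IDs contain no whitespace.

```python
import re

# Header looks like: 67(4:8,NC:2,PF:2,TA:2,TG:2)
HEADER_RE = re.compile(r"^(\d+)\((\d+):(\d+),(.*)\)$")
# Each member looks like: NC03984(NC)
MEMBER_RE = re.compile(r"^(.+)\(([^()]+)\)$")


def parse_group_line(line):
    """Parse one line of a '.group' file into a dictionary (hypothetical helper)."""
    fields = line.split()  # splits on tabs or spaces (an assumption)
    if not fields:
        raise ValueError("empty line")
    header = HEADER_RE.match(fields[0])
    if header is None:
        raise ValueError("unrecognized cluster header: " + fields[0])

    # Per-species sequence counts, e.g. {"NC": 2, "PF": 2, ...}
    per_species = {}
    for item in header.group(4).split(","):
        species, count = item.split(":")
        per_species[species] = int(count)

    # Gene IDs paired with their species abbreviation
    genes = []
    for field in fields[1:]:
        member = MEMBER_RE.match(field)
        if member is None:
            raise ValueError("unrecognized gene entry: " + field)
        genes.append((member.group(1), member.group(2)))

    return {
        "cluster_id": int(header.group(1)),
        "n_species": int(header.group(2)),
        "n_sequences": int(header.group(3)),
        "per_species": per_species,
        "genes": genes,
    }


example = (
    "67(4:8,NC:2,PF:2,TA:2,TG:2)\tNC03984(NC)\tNC04882(NC)\tPF03069(PF)\t"
    "PF04899(PF)\tTA00779(TA)\tTA01773(TA)\tTG00643(TG)\tTG01355(TG)"
)
cluster = parse_group_line(example)
print(cluster["cluster_id"], cluster["n_species"], cluster["n_sequences"])
```

A quick sanity check on any parsed line is that the per-species counts sum to the stated number of sequences and that the number of gene IDs matches it as well.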