InterProScan_Results_Function

The InterProScan_Results_Function app parses InterProScan XML result files and generates readable tables, functional data count files, and a gene association file (GAF) that can be used in subsequent GO enrichment analyses. For this app to work, InterProScan must have been run with the GO annotation and pathway annotation parameters checked (the default setting).

Quick Start

  • To use InterProScan_Results_Function, import your XML output file from InterProScan.

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> interproscan_results_function.

Output Tables

All tables are tab-separated, with multiple values within a column separated by a semicolon. Tables are txt files that can be opened in a text editor or loaded into Excel.

The tables produced are:

1. *_acc_interpro_counts
This table lists each input accession, the number of InterPro IDs for that accession, the InterPro IDs assigned to the sequence and the InterPro ID names.
Example:
ENSGALP00000006626    1    IPR006121    DOMAIN:HeavyMe-assoc_HMA
ENSGALP00000004419    2    IPR016135;IPR017986    DOMAIN:UBQ-conjugating_enzyme/RWD;DOMAIN:WD40_repeat_dom
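
For downstream scripting, a row of this table can be split into its four columns. The following is a minimal Python sketch, assuming the file has no header row and that InterPro names never contain a semicolon; the function name parse_acc_interpro_row is hypothetical and not part of the app.

```python
def parse_acc_interpro_row(line):
    """Split one tab-separated row into the four columns described above."""
    accession, count, ids, names = line.rstrip("\n").split("\t")
    return {
        "accession": accession,
        "count": int(count),
        # Multiple values within a column are separated by a semicolon.
        "interpro_ids": ids.split(";") if ids else [],
        "names": names.split(";") if names else [],
    }


row = parse_acc_interpro_row(
    "ENSGALP00000004419\t2\tIPR016135;IPR017986\t"
    "DOMAIN:UBQ-conjugating_enzyme/RWD;DOMAIN:WD40_repeat_dom"
)
print(row["interpro_ids"])
```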

2. *_acc_go_counts
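This table includes input accessions, the number of GO IDs assigned to each accession, the GO IDs and the GO ID names. GO IDs and names are split into BP (Biological Process), MF (Molecular Function) and CC (Cellular Component) columns; a column is left empty when an accession has no term in that category.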
Example:
ENSGALP00000043106    1            GO:0008270    zinc ion binding
ENSGALP00000006626    2    GO:0030001    metal ion transport    GO:0046872    metal ion binding        
ENSGALP00000034620    3    GO:0042773;GO:0055114    ATP synthesis coupled electron transport;oxidation-reduction process    GO:0016651    oxidoreductase activity, acting on NAD(P)H

3. *_acc_pathway_counts
This table includes input accessions, the number of pathway IDs for each accession, the pathway IDs and the pathway names. Multiple values are separated by a semicolon.
Example:
ENSGALP00000002985    1    Reactome: REACT_14797    Signaling by GPCR
ENSGALP00000020373    2    KEGG: 00920+2.8.1.1;MetaCyc: PWY-5350    Sulfur metabolism;Thiosulfate disproportionation III (rhodanese)
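
Each pathway ID in the examples above carries its source database as a prefix before the colon. The sketch below shows one way to separate the two in Python. It assumes every entry follows the "Database: identifier" pattern seen in the examples; the function names are hypothetical.

```python
def split_pathway_id(entry):
    """Split 'KEGG: 00920+2.8.1.1' into ('KEGG', '00920+2.8.1.1')."""
    database, _, identifier = entry.partition(":")
    return database.strip(), identifier.strip()


def parse_pathway_ids(cell):
    """Parse one semicolon-separated pathway ID cell into (database, id) pairs."""
    return [split_pathway_id(entry) for entry in cell.split(";") if entry]


pairs = parse_pathway_ids("KEGG: 00920+2.8.1.1;MetaCyc: PWY-5350")
print(pairs)
```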

4. *_gaf
This table follows the formatting of a gene association file (GAF) and can be used in GO enrichment analyses. However, the exact format that enrichment tools expect varies, so please check each tool's requirements prior to use. For more information about the GAF format please see:
http://geneontology.org/page/go-annotation-file-gaf-format-21

NOTE: this GAF file includes the GO term name. This column is not part of the standard GAF format.
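
If an enrichment tool requires the standard column layout, the extra GO term name column can be removed before use. The Python sketch below is one possible approach, not part of the app. The position of the GO term name column is not stated here, so the index argument is a placeholder that must be set after inspecting your own file. Lines starting with "!" are passed through unchanged because GAF files use them for header and comment lines.

```python
def drop_column(lines, index):
    """Yield tab-separated lines with the column at `index` (0-based) removed.

    `index` is an assumption supplied by the caller: check the file to find
    which column holds the GO term name before using this.
    """
    for line in lines:
        if line.startswith("!"):
            # GAF header/comment line: keep as-is.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= index:
            # Row too short to contain the column: keep as-is.
            yield line
            continue
        del fields[index]
        yield "\t".join(fields) + "\n"


# Synthetic three-column input, used only to show the behaviour.
cleaned = list(drop_column(["!gaf-version: 2.1\n", "a\tb\tc\n"], 1))
print(cleaned)
```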

5. *_go_counts
This table counts the numbers of sequences assigned to each GO ID so that the user can quickly identify all genes assigned to a particular function.
Example:
GO:0000381    regulation of alternative mRNA splicing, via spliceosome    Biological_Process    1    ENSGALP00000001460
GO:0006421    asparaginyl-tRNA aminoacylation    Biological_Process    2    ENSGALP00000004871;ENSGALP00000027851
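
To look up the genes assigned to a given GO ID programmatically, the table can be loaded into a dictionary. This is a minimal Python sketch, assuming five tab-separated columns as in the example and no header row; load_go_counts is a hypothetical helper name.

```python
import io


def load_go_counts(handle):
    """Map each GO ID to its name, category, count and list of accessions."""
    lookup = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        go_id, name, category, count, accessions = fields[:5]
        lookup[go_id] = {
            "name": name,
            "category": category,
            "count": int(count),
            "accessions": accessions.split(";"),
        }
    return lookup


# In practice, pass an open file handle for the *_go_counts table instead.
sample = io.StringIO(
    "GO:0000381\tregulation of alternative mRNA splicing, via spliceosome\t"
    "Biological_Process\t1\tENSGALP00000001460\n"
    "GO:0006421\tasparaginyl-tRNA aminoacylation\t"
    "Biological_Process\t2\tENSGALP00000004871;ENSGALP00000027851\n"
)
go_counts = load_go_counts(sample)
print(go_counts["GO:0006421"]["accessions"])
```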

6. *_interpro_counts
This table counts the numbers of sequences assigned to each InterPro ID so that the user can quickly identify all genes with a particular motif.
Example:
IPR019495    FAMILY:EXOSC1    1    ENSGALP00000032597
IPR026622    FAMILY:Mxra7    2    ENSGALP00000002786;ENSGALP00000042423

7. *_pathway_counts
This table counts the numbers of sequences assigned to each Pathway ID so that the user can quickly identify all genes assigned to a pathway.
Example:
KEGG: 00232+1.17.3.2    Caffeine metabolism    1    ENSGALP00000014144
MetaCyc: PWY-6369    Inositol pyrophosphates biosynthesis    2    ENSGALP00000013649;ENSGALP00000007450


8. *.err
This file lists any sequences that InterProScan could not analyze. Examples of sequences that cause an error are sequences with a large run of Xs and sequences longer than 10,000 aa.
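
Sequences likely to end up in this file can be flagged before running InterProScan. The Python sketch below checks the two cases mentioned above. The 10,000 aa limit comes from the text; the text does not define how long a "large run of Xs" is, so MIN_X_RUN is a hypothetical threshold that you should adjust.

```python
import re

MAX_LENGTH = 10_000  # from the text: sequences >10,000 aa cause an error
MIN_X_RUN = 50       # hypothetical: the text does not define a "large run"


def screen_sequence(seq, max_length=MAX_LENGTH, min_x_run=MIN_X_RUN):
    """Return a list of reasons why a protein sequence may fail."""
    problems = []
    if len(seq) > max_length:
        problems.append("too_long")
    # Look for min_x_run or more consecutive X residues (either case).
    if re.search(r"[Xx]{%d,}" % min_x_run, seq):
        problems.append("x_run")
    return problems


print(screen_sequence("MK" + "X" * 60 + "LV"))
```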