What is OposSOM?
Rationale and background
High-throughput technologies such as whole-genome transcriptional profiling have revolutionized molecular biology. At the same time, the huge and ever-increasing volume of data they produce poses methodological challenges. Researchers need adequate tools to extract the information content of the data effectively. This includes algorithmic tasks such as data compression and filtering, feature selection, linkage to functional context, and proper visualization. Visualization is particularly important because an intuitive view of massive amounts of data supports quality control, the discovery of the data's intrinsic structure, functional data mining and, finally, the generation of hypotheses.
Introduction and overview
OposSOM bundles a series of sophisticated analysis methods with intuitive visualization options to study high-dimensional data, with a special focus on gene-centered expression data. The algorithm transforms the whole-genome expression patterns of genes into a SOM coordinate system, which allows the transcriptional activity of each sample to be visualized intuitively as a mosaic portrait. This approach simultaneously searches for features that are differentially expressed and whose profiles are correlated across the set of samples studied.
Wirth H, Loffler M, von Bergen M, Binder H (2011). "Expression cartography of human tissues using self organizing maps" BMC Bioinformatics.
A brief overview of the oposSOM workflow:
Single gene expression values are clustered to metagenes using a self-organizing map.
Based on the identified metagenes, visualizations (e.g. expression portraits), downstream sample similarity analyses (e.g. hierarchical clustering, ICA) and functional enrichment analyses are performed.
The results are given within a separate folder and can be browsed using the summary HTML file.
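To make the first workflow step more concrete, here is a minimal NumPy sketch of how a self-organizing map can group gene expression profiles into metagenes. It is an illustration only, not the oposSOM implementation: the function name `train_som`, the 4 x 4 grid, the decay schedules and the random toy data are all assumptions made for this example.

```python
import numpy as np

def train_som(profiles, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; rows of `profiles` are genes, columns are samples.

    Returns (metagenes, assignment): metagenes has one row per map unit,
    assignment[i] is the index of the unit that gene i is mapped to.
    """
    rng = np.random.default_rng(seed)
    n_genes = profiles.shape[0]
    n_units = grid[0] * grid[1]
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array(
        [(r, c) for r in range(grid[0]) for c in range(grid[1])], dtype=float
    )
    # Initialise the unit profiles (metagenes) from randomly chosen genes.
    metagenes = profiles[rng.integers(0, n_genes, size=n_units)].astype(float)

    n_steps = epochs * n_genes
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_genes):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            x = profiles[i]
            # Best-matching unit: the metagene closest to this gene's profile.
            bmu = np.argmin(((metagenes - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood on the grid around the best-matching unit.
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            metagenes += lr * h[:, None] * (x - metagenes)
            step += 1

    # Final assignment of each gene to its nearest metagene.
    dists = ((profiles[:, None, :] - metagenes[None, :, :]) ** 2).sum(axis=2)
    assignment = dists.argmin(axis=1)
    return metagenes, assignment

# Toy data (assumption): 200 genes x 6 samples of random expression values.
toy = np.random.default_rng(1).normal(size=(200, 6))
metagenes, assignment = train_som(toy)
```

Each row of `metagenes` is the profile of one map unit across the samples; colouring the grid by a single column of `metagenes` gives the kind of per-sample mosaic portrait described above.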
- A CyVerse account. (Register for a CyVerse account at https://user.cyverse.org/.)
- An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using icommands, Chrome is not suitable due to its issues in utilizing 64-bit Java.)
Normalized RNA-seq counts file - A text file containing normalized counts generated with edgeR or DESeq2.
Sample metadata file - A text file containing the labels and colors of your samples.
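Before uploading, it can help to confirm that the two input files describe the same samples. The sketch below is one way to do that in Python. The file layout is an assumption, since this page does not specify it: both files are taken to be tab-delimited with a header row, the counts file is taken to have gene IDs in its first column and one column per sample, and the metadata file is taken to have the sample name in its first column. The function name `check_samples` and the in-memory example data are hypothetical.

```python
import csv
import io

def check_samples(counts_handle, metadata_handle, delimiter="\t"):
    """Return (missing_in_metadata, missing_in_counts) as sorted lists."""
    counts_header = next(csv.reader(counts_handle, delimiter=delimiter))
    # Assumption: the first column of the counts file holds gene IDs.
    count_samples = counts_header[1:]
    meta_rows = list(csv.reader(metadata_handle, delimiter=delimiter))
    # Assumption: the metadata file has a header row and the sample name comes first.
    meta_samples = [row[0] for row in meta_rows[1:] if row]
    missing_in_metadata = sorted(set(count_samples) - set(meta_samples))
    missing_in_counts = sorted(set(meta_samples) - set(count_samples))
    return missing_in_metadata, missing_in_counts

# Hypothetical in-memory files standing in for the two inputs.
counts = io.StringIO("gene\tS1\tS2\tS3\nAT1G01010\t5.1\t4.8\t6.0\n")
metadata = io.StringIO("sample\tlabel\tcolor\nS1\tcontrol\tblue\nS2\ttreated\tred\n")
missing_in_metadata, missing_in_counts = check_samples(counts, metadata)
print(missing_in_metadata, missing_in_counts)  # ['S3'] []
```

To check real files, pass handles from `open(path, newline="")` in place of the `io.StringIO` objects.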
Dataset name (character): Name of the dataset. Used to name the results folder and environment image (default: "Unnamed").
Database Biomart (character): Type of Ensembl dataset queried using the biomaRt interface (default: "auto"). Use "auto" to detect database parameters automatically.
- Database host (character): Host of Ensembl
- Database dataset (character)
- Database Id Type (character): Type of rowname identifier in the biomaRt database (default: ""). Obsolete if database.dataset="auto".
Species (Select your species from the list): Select your species from the list. If your species is not in the list, note that we are currently working on a separate app that lets users provide their own custom file.
- Log transformation (Check this box if you want the app to log-transform your data before running OposSOM)
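As an illustration of what a log transformation does to expression values, here is a small Python sketch. This page does not state which base or pseudocount the app uses, so the choice of log2 with a pseudocount of 1 is an assumption, and `log_transform` is a hypothetical helper rather than part of the app.

```python
import math

def log_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each non-negative value."""
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    # The pseudocount keeps zero counts finite: log2(0 + 1) = 0.
    return [math.log2(v + pseudocount) for v in values]

transformed = log_transform([0.0, 1.0, 3.0, 7.0])
print(transformed)  # [0.0, 1.0, 2.0, 3.0]
```

If your counts file already contains log-scale values, leave the box unchecked so the data are not transformed twice.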
Run an OposSOM-2.0.1-biomart job in the DE (Test run)
The example data for the test run is located at: /iplant/home/shared/iplantcollaborative/example_data/OposSOM
- In the DE Apps window, search for and open OposSOM-2.0.1-biomart.
- In the Analysis Name field:
a. Change the name for your analysis (optional).
b. Enter any comments (optional).
c. In the Select output folder field, click Browse and navigate to the folder of your choice, or leave the default name of iplant/home/username/analyses.
d. To retain copies of the input files in your analysis results output folder, click this check box (not recommended).
- Click to open the Input files panel:
a. For the Normalized RNA-seq counts file, click Browse and navigate to your normalized RNA-seq counts file. For the test run, use Atha_normalized_counts.txt.
b. For the Sample metadata file, click Browse and navigate to the metadata file for your samples. For the test run, use Atha_label_color.txt.
- Click to open the Biomart parameters panel:
Dataset name (character)
a. Species (Select your species from the list): Atha
Database Biomart (character): plants_mart
Database host (character): plants.ensembl.org
Database dataset (character): athaliana_eg_gene
Database Id Type (character): ensembl_gene_id
- Optional parameters:
a. Log transformation (check this box)
- Click Launch Analysis.
- After the app completes successfully, the following files and figures are generated.
The results comprise a variety of PDF documents with plots and images addressing the input data, supplementary descriptions of the SOM generated, the metadata obtained by the SOM algorithm, the sample similarity structures and also functional annotations.