Metadata Mapper

Metadata Mapper Concept, September 2010

Bernice Rogowitz

A common thread that has emerged from our discussions with scientists in the G2P and iPToL projects is the need for tools that let biologists combine different types of data. Scientists should be able to adorn the phylogenetic tree with images, gene expression data, and trait data, and experiments testing phenotypic hypotheses should combine genetic, metabolomic, and physiological data. Many existing tools let the scientist map data from one domain onto data in another. For phylogenetic trees, the Mesquite system allows the user to paint trait and genetic data onto the edges of the tree and onto glyphs at the nodes. The Ondex tool allows the G2P scientist to map genetic and metabolomic data onto network diagrams. The eFP browser allows the scientist to select variables and color morphological regions of Arabidopsis accordingly. And the Visualization and Visual Analysis (ViVA) workbench lets the user select and color a region in one visualization and have that coloring reflected in the corresponding regions of other representations of the same data.

Currently, however, these packages are all separate, and the interactive mapping of values from one domain to another happens only within each package. No mechanism lets the components of one system interact with the components of another. For example, there is currently no way to select a set of traits in Mesquite or ViVA and map them onto clades in the iPToL tree of life viewer.

In iPLANT, the architecture supports independent components that the user can assemble to meet his or her analysis needs. Several working groups have identified the need to create visualizations in which metadata from one component or visualization can be mapped onto another. Doing so requires communication between components. The goal of the Metadata Mapper is to allow scientists to map metadata from various sources, including other visualizations, onto any of the visualizations in the environment.
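One way to let independent components communicate without calling each other directly is a publish/subscribe hub. The sketch below is only an illustration of that idea, not part of the iPLANT design: the class name MetadataHub, the topic strings, and the payload format (values keyed by a shared identifier such as a species name) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical payload: metadata values keyed by a shared identifier,
# e.g. a species name that both the source and target components know.
Metadata = Dict[str, float]
Listener = Callable[[str, Metadata], None]


class MetadataHub:
    """Minimal publish/subscribe hub linking independent components.

    Components never reference each other; a source publishes metadata on a
    named topic, and any visualization subscribed to that topic receives it.
    """

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Listener) -> None:
        """Register a component callback for one topic."""
        self._listeners[topic].append(listener)

    def publish(self, source: str, topic: str, metadata: Metadata) -> int:
        """Deliver metadata to every subscriber; return how many received it."""
        listeners = self._listeners.get(topic, [])
        for listener in listeners:
            listener(source, metadata)
        return len(listeners)
```

With this arrangement, adding a new visualization only requires subscribing it to the topics it can display; the publishing component does not change.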

Another important feature to consider is giving the scientist control over how the metadata mapping is done. For example, in a gene array experiment, certain genes may be identified as highly expressed under certain conditions in certain species. The user may want simply to color the names of all species with this characteristic, to color each clade in proportion to the percentage of responding species, or to create a glyph that encodes this information. The user may also want to choose which range of a variable to map, or how many variables to map at the same time. For example, in a protein-protein interaction graph, the user might map one variable onto the size of a glyph, another onto its color, and a third onto a color scale representing the magnitude of that variable.
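The mapping choices described above can be sketched as small, independent functions. This is an illustrative sketch under stated assumptions, not a specified implementation: the function names, the RGB tuple color representation, and the choice of min-max normalization to [0, 1] for glyph channels are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

RGB = Tuple[int, int, int]


def highlight(species: List[str], responders: Set[str],
              color: RGB = (255, 0, 0)) -> Dict[str, Optional[RGB]]:
    """Color only the species names that have the characteristic."""
    return {s: (color if s in responders else None) for s in species}


def clade_fraction(clades: Dict[str, List[str]],
                   responders: Set[str]) -> Dict[str, float]:
    """Fraction of responding species per clade, for proportional coloring."""
    return {
        name: (sum(s in responders for s in members) / len(members)
               if members else 0.0)
        for name, members in clades.items()
    }


def in_range(values: Dict[str, float], lo: float, hi: float) -> Dict[str, float]:
    """Keep only values inside the user-selected range [lo, hi]."""
    return {k: v for k, v in values.items() if lo <= v <= hi}


def encode_glyphs(records: Dict[str, Dict[str, float]],
                  channels: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Map several variables onto separate glyph channels at once.

    channels maps a visual channel (e.g. "size") to a variable name; each
    variable is min-max normalized to [0, 1] across all nodes.
    """
    encoded: Dict[str, Dict[str, float]] = {node: {} for node in records}
    for channel, variable in channels.items():
        vals = [r[variable] for r in records.values()]
        if not vals:
            continue
        lo, hi = min(vals), max(vals)
        span = hi - lo
        for node, r in records.items():
            encoded[node][channel] = (r[variable] - lo) / span if span else 0.0
    return encoded
```

For a protein-protein interaction graph, a call such as `encode_glyphs(nodes, {"size": "degree", "color": "expression"})` would give each node a normalized size and color value that a renderer could then turn into pixels and hues.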

Looking forward, it will also be important to let the user connect to new visual representations, such as a 3-D model of a cell or a geographic map of a region. The iPLANT system should ideally be built so that new modules can participate in this metadata mapping with minimal custom code. The goal, then, is to create a mechanism for handling the mapping of metadata from one module to another in a way that minimizes the burden on individual components.

We propose creating a web service that provides metadata mapping, so that metadata from any module or source can be sent to any other module, and that gives the scientist all the controls needed to select the metadata and control how it is mapped.
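The core request handling of such a service might look like the sketch below, written without an HTTP layer so that any web framework could wrap it. Everything here is an assumption for illustration: the JSON field names (`source`, `target`, `channel`, `range`), the module names, and the `TARGET_CHANNELS` registry in which each module declares the visual channels it accepts. Such a registry is one way a new module could join by adding a single entry rather than writing custom mapping code.

```python
import json
from typing import Dict, List

# Hypothetical registry: each target module declares the visual channels it
# accepts. A new module participates by adding one entry here.
TARGET_CHANNELS: Dict[str, List[str]] = {
    "tree_viewer": ["label_color", "clade_color"],
    "network_viewer": ["node_size", "node_color"],
}


def handle_mapping_request(body: str,
                           metadata: Dict[str, Dict[str, float]]) -> str:
    """Validate a JSON mapping request and return the values to apply.

    metadata maps a source name to {item id: value}; it stands in for data
    the service would fetch from the source module.
    """
    req = json.loads(body)
    target, channel = req["target"], req["channel"]
    # Reject mappings the target module has not declared it can display.
    if channel not in TARGET_CHANNELS.get(target, []):
        return json.dumps({"error": f"{target} does not accept channel {channel}"})
    values = metadata.get(req["source"], {})
    # Optional user-selected range; default keeps every value.
    lo, hi = req.get("range", [float("-inf"), float("inf")])
    selected = {k: v for k, v in values.items() if lo <= v <= hi}
    return json.dumps({"target": target, "channel": channel, "values": selected})
```

Keeping validation and range selection in the service, rather than in each visualization, is what keeps the burden on individual components small.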

The following PowerPoint presentation shows an initial design that embodies this concept: https://pods.iplantcollaborative.org/wiki/download/attachments/4526446/Metadata+Mapper.ppt