Schema Overview

This schema for storing information related to gene tree reconciliation keeps data related to species trees separate from data related to gene trees. This makes minimal changes to the existing tables in Ensemble Compara and adds information for species trees, reconciliations, and a controlled vocabulary.

Advantages of this schema:

Minimizes changes to the existing Ensemble/Compara schema
Follows weak typing paradigm, attribute tables can hold any type of specialized information
Makes use of controlled vocabularies and ontologies store data types
Use Chaod tables for controlled vocabularies(alternatively could use Chado cv module - http://gmod.org/wiki/Chado_CV_Module) allows for import of CV terms in OBO format using existing tools

Disadvantages of this schema:

Not the best schema for representing uncertainty in tree topology
Uncertainty in root not easily supported, The E/C schema assumes a gene tree with a single root, other roots would require a replication of the tree topology
This requires that ontologies exist and are loaded into the database before attribute values can be stored

The tables in this database fall into a few major groups:

Protein Tree Tables - Generally these tables already exist in the E/C schema
Species Tree Tables - Tables related to the information in the species tree
Reconciliation Tables - Tables related to the mapping of the nodes of the gene tree to the nodes and edges of the species tree
Controlled Vocabulary Tables - Tables storing information for ontologies and other controlled vocabularies

Protein Tables

Unknown macro: {mockup}

member

Information related to the an individual locus

Name	Type	Description
member_id	INTEGER(8)	Primary key
stable_id	VARCHAR(64)	Stable identifier for the locus such as the gene name or gene identifier.
version	INTEGER(8)	Version of the stable ID (see EnsEMBL core DB)
source_name	CHAR(17)	The source of the member.
taxon_id	INTEGER(8)	Source taxon the protein is derived from. Reference to ncbi_taxa_node.taxon_id
genome_db_id	INTEGER(8)	Reference to geneome_db.genome_db_id
sequence_id	INTEGER(8)	Identifier for the sequence used to represent this protein. Reference to sequence.sequence_id.
gene_member_id	INTEGER(8)	Reference to member.member_id to allow linkage from peptides to genes.
description	TEXT	The description of the gene/protein as described in the core database or from the Uniprot entry.
chr_name	CHAR(40)	The name of the chromosome this sequence maps to. Many assemblies use letters or other strings to name their chromosomes, so an integer would not work here.
chr_start	INTEGER(8)	The start of the location of the member on the chromosome.
chr_end	INTEGER(8)	The end of the location of the member on the chromosome.
chr_strand	TINYINT(1)	Strand of the chromosome.
display_label	VARCHAR(128)	This information not described in the E/C schema description.

member_attribute

This table allows for additional information to be stored in the member attribute table using the controlled vocabulary terms in the database.

Name	Type	Description
member_attribute_id	INTEGER(10)	Primary key
member_id	INTEGER(8)	Foreign key to member table
cvterm_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	TEXT	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id
source_id	INTEGER(10)

sequence

The protein sequence.

Name	Type	Description
sequence_id	INTEGER(10)	Primary key
length	INTEGER(10)	The length of the protein sequence
sequence	LONGTEXT	The protein sequence.

Protein Family Tables

Unknown macro: {mockup}

family

The protein family. Contains all the group homologies found. There are several family_member entries for each family entry.

Name	Type	Description
family_id	INTEGER(10)	Primary key.
stable_id	VARCHAR(40)	stable family ID
version	INTEGER(10)	version of the stable_id (may only change when members migrate between this family and another one; stays the same otherwise)
method_link_species_set_id	INTEGER(10)	external reference to method_link_species_set.method_link_species_set_id
description	VARCHAR(255)	description of the family as found using the Longest Common String (LCS) of the descriptions of the member proteins.
description_score	DOUBLE	Scores the accuracy of the annotation (max. 100)

family_attribute

Attributes of the protein family.

Name	Type	Description
family_attribute_id	INTEGER(10)	Primary Key, Unique Identifier
family_id	INTEGER(10)
cvterm_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id
source_id	INTEGER(10)

family_member

Contains the proteins corresponding to protein family relationship found. There are several family_member entries for each family entry.

Name	Type	Description
family_id	INTEGER(10)	external reference to family.family_id
member_id	INTEGER(10)	external reference to member.member_id
cigar_line	MEDIUMTEXT	internal description of the multiple alignment (see homology_member table)

Protein Tree Tables

These are mostly relevant tables from the Ensemble Compara schema.

Unknown macro: {mockup}

protein_tree

Information related to an entire protein tree.

Name	Type	Description
protein_tree_id	INTEGER(10)	Unique identifier for the protein tree
family_id	INTEGER(10)
root_node_id	INTEGER(10)	ID of the root node for the tree.

protein_tree_attribute

This is a table to store information about the protein tree.

Name	Type	Description
protein_tree_attribute_id	INTEGER(10)	Primary Key, Unique Identifier
protein_tree_id	INTEGER(10)
cvterm_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id
source_id	INTEGER(10)

protein_tree_member

The information for the individual proteins in the tree. I believe this holds the alignment information for the tree in the cigar lines

Name	Type	Description
node_id	INTEGER(10)	The node_id in protein tree this refers to
root_id	INTEGER(8)	The root_id of the tree this refers to. I think that this is the way that E/C refers to the set of nodes the comprise a tree
member_id	INTEGER(8)	Foreign key, The protein that is being represented
method_link_species_set_id	INTEGER(8)	External reference to method_link_species_set.method_link_species_set_id
cigar_line	TEXT	The alignment in CIGAR format
cigar_start	INTEGER(8)	The start of the alignment
cigar_end	INTEGER(8)	The end of the alignment

protein_tree_node

This table represents the topology of the gene tree.

Name	Type	Description
node_id	INTEGER(8)	Unique identifier for nodes in the protein tree
protein_tree_id	INTEGER(10)	This is an addition to the existing E/C schema. This is a unique identifier for the protein tree that the nodes are a member of.
parent_id	INTEGER(8)	This is the parent node for the node representend by node_id
root_id	INTEGER(8)	The id for the root node in the protein tree. By my reading of the E/C schema, this is the only way to refer to the gene tree as a group
clusterset_id	INTEGER(8)	This is the set of clusters (the gene family) represented by the gene tree
left_index	INTEGER(8)	Left indexing value, this can be used with right_index to select the child nodes of a given internal node in the gene tree.
right_index	INTEGER(8)	Right indexing value, this can be used with left_index to select the child nodes of a given internal node in the gene tree.
distance_to_parent	DOUBLE	This is essentially the edge length. This is the way that the data are currently represented in E/C, but it would be possible to make this information available as a protein_tree_node attribute value

protein_tree_node_attribute

This is a table to store information about the individual nodes in the protein tree.

Name	Type	Description
protein_node_attribute_id	INTEGER(10)	Primary Key, Unique Identifier
node_id	INTEGER(8)	The identifier of the node the attribute refers to.
cvterm_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id
source_id	INTEGER(10)

protein_tree_node_path

This table holds information or transitive closure queries, ie the reachability and distance from an ancestral node to all of its children nodes. This table will be used for least common ancestor queries, and to select a subtree from a larger species tree.

Name	Type	Description
protein_tree_node_path_id	INTEGER(10)	Primary key.
parent_node_id	INTEGER(10)	The parent node being referenced
child_node_id	INTEGER(10)	The child node being referenced (not necessatily the direct descendent, but a child node
path	TEXT	This may be considered optional. This would store the node path from the parent to the child
distance	INTEGER(10)	The distance in number of edges from the parent node to the child node. This data can be used to minimize the distance for Least Common Ancestor (LCA) queries.

Species Tree Tables

These are the tables related to the species tree data.

Unknown macro: {mockup}

species_tree

Name	Type	Description
species_tree_id	INTEGER(10)	Primary key
species_tree_name	TEXT	The name of the species tree.
root_node_id	INTEGER(10)	The root node of the species tree. The current version of the TR database required rooted trees.
version	SMALLINT(3)	I am assuming that there will be different versions of species trees. The version information will make sure that the reconciled data match to the species tree being referred to here. This can be null

Example of data in this table:

mysql> select * from species_tree LIMIT 1;
+-----------------+-------------------+--------------+---------+
| species_tree_id | species_tree_name | root_node_id | version |
+-----------------+-------------------+--------------+---------+
|               1 | bowers_rosids     |            1 |    NULL |
+-----------------+-------------------+--------------+---------+
1 row in set (0.00 sec)

species_tree_attribute

Name	Type	Description
species_tree_attribute_id	INTEGER(10)	Primary key
species_tree_id	INTEGER(10)	Foreign Key, The species tree being referenced
term_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id

species_tree_node

Name	Type	Description
species_tree_node_id	INTEGER(10)	Primary Key
species_tree_id	INTEGER(10)	The species tree this node is a member of
parent_id	INTEGER(10)	The id of the parent node to this species_tree_node
left_index	INTEGER(10)	A left index value used to select clades
right_index	INTEGER(10)	A right index value used to select clades

Example of species tree nodes from the bowers_rosids tree:

mysql> SELECT * FROM species_tree_node WHERE species_tree_id = 1;
+----------------------+-----------------+-----------+-------------+------------+-------------+
| species_tree_node_id | species_tree_id | parent_id | label       | left_index | right_index |
+----------------------+-----------------+-----------+-------------+------------+-------------+
|                    1 |               1 |         0 | NULL        |          1 |          22 |
|                    2 |               1 |         1 | grape       |          2 |           3 |
|                    3 |               1 |         1 | NULL        |          4 |          21 |
|                    4 |               1 |         3 | NULL        |          5 |          14 |
|                    5 |               1 |         4 | poplar      |          6 |           7 |
|                    6 |               1 |         4 | NULL        |          8 |          13 |
|                    7 |               1 |         6 | cucumber    |          9 |          10 |
|                    8 |               1 |         6 | soybean     |         11 |          12 |
|                    9 |               1 |         3 | NULL        |         15 |          20 |
|                   10 |               1 |         9 | Arabidopsis |         16 |          17 |
|                   11 |               1 |         9 | papaya      |         18 |          19 |
+----------------------+-----------------+-----------+-------------+------------+-------------+
11 rows in set (0.03 sec)

species_tree_node_attribute

Name	Type	Description
species_tree_node_attribute_id	INTEGER(10)	Primary key
species_tree_node	INTEGER(10)	Foreign Key, The species tree node being referenced
term_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id

species_tree_node_path

This table holds information or transitive closure queries, ie the reachability and distance from an ancestral node to all of its children nodes. This table will be used for least common ancestor queries, and to select a subtree from a larger species tree.

Name	Type	Description
parent_node_id	INTEGER(10)	The parent node being referenced
child_node_id	INTEGER(10)	The child node being referenced (not necessatily the direct descendent, but a child node
path	VARCHAR(255)	This may be considered optional. This would store the node path from the parent to the child
distance	INTEGER(10)	The distance in number of edges from the parent node to the child node. This data can be used to minimize the distance for Least Common Ancestor (LCA) queries.

Reconciliation Tables

The purpose of the reconciliation tables is to map the nodes of a gene tree to the nodes and edges of a species tree.

Unknown macro: {mockup}

reconciliation

This table describes the gene_tree and species tree that are being reconciled. This also allows for a subset of the species to be selected from the species tree with the species_set column.

Name	Type	Description
reconciliation_id	INTEGER(10)	Primary Key
species_tree_id	INTEGER(10)	The species tree that is serving as the host for the mapping of the protein tree
species_set_id	INTEGER(10)	The species from the large species tree that are being used in the reconciliation, this would probably best be represented as the list of nodes in the species tree that are the leaf nodes being represented
protein_tree_id	INTEGER(10)	The protein tree that is being mapped onto the species tree

reconciliation_attribute

This table contains information related to the overall mapping of the reconciliation itself.

Name	Type	Description
reconciliation_attribute_id	INTEGER(10)	Primary Key
reconciliation_id	INTEGER(10)	The id for the specific reconciliation being referenced
term_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id

reconciliation_node

The mapping of an individual node from a gene tree to a node or edge in the species tree.

Name	Type	Description
reconciliation_node_id	INTEGER(10)	Primary Key, unique identifier for each row in the table.
reconciliation_id	INTEGER(10)	Foreign Key, The individual reconciliation the node maps is a member of
node_id	INTEGER(10)	Foreign Key, The node_id in the gene tree that this reconciliation mapping refers to.
host_parent_node_id	INTEGER(10)	The parent node in the species tree, this will be an ID from the species_tree_node_table
host_child_node_id	INTEGER(10)	The child node in the species tree, tis will be an ID from the species_tree_node table. NOTE: It is not required that the parent_node of this child be the parent node of the host_parent_node ID in the species tree.
is_on_node	BOOLEAN	TRUE if the node on the gene tree is located on a node in the species tree.

reconciliation_node_attribute

An attribute related to the node mapping of a node in the gene tree to a node in the species tree.

Name	Type	Description
reconciliation_node_attribute_id	INTEGER(10)	Primary Key
reconciliation_node_id	INTEGER(10)	Foreign Key, The reconciled node the attribute refers to
term_id	INTEGER(10)	The id for term using the controlled vocabulary tables
value	VARHCHAR(255)	The value for the specific term_id
rank	SMALLINT(3)	Default is 0, This allows for lists of data for a given term id

Controlled Vocabulary Tables

The controlled vocabulary tables incorporated into this schema were taken from the Chado CV module and converted from PostgreSQL syntax to MySQL tables. The following documentation is taken from the GMOD description of these tables.

Unknown macro: {mockup}

cv

The controlled vocabulary groups. Individual controlled vocabulary terms belong to individual controlled vocabularies as indicated by the cv_id in the cvterm table.

Name	Type	Description
cv_id	INTEGER(10)
name	VARCHAR(255)
definition	TEXT

For example, the ontologies loaded in the current implementation of the tree reconciliation database:

mysql> SELECT * FROM cv;
+-------+------------------------------------------+------------+
| cv_id | name                                     | definition |
+-------+------------------------------------------+------------+
|     1 | relationship                             |            |
|     2 | synonym_type                             |            |
|     3 | cvterm_property_type                     |            |
|     4 | anonymous                                |            |
|     5 | homology_and_related_concepts_in_biology |            |
|     6 | nhx_tag                                  |            |
|     7 | prime_tag                                |            |
|     8 | phyloXML_element                         |            |
|     9 | biological_process                       |            |
|    10 | molecular_function                       |            |
|    11 | cellular_component                       |            |
|    12 | gene_ontology                            |            |
+-------+------------------------------------------+------------+
12 rows in set (0.11 sec)

cvterm

A term, class, universal or type within an ontology or controlled vocabulary. This table is also used for relations and properties. cvterms constitute nodes in the graph defined by the collection of cvterms and cvterm_relationships.

Name	Type	Description
cvterm_id	INTEGER(10)	Primary Key
cv_id	INTEGER(10)	The cv or ontology or namespace to which this cvterm belongs.
name	VARCHAR(950)	A concise human-readable name or label for the cvterm. Uniquely identifies a cvterm within a cv.
definition	TEXT	A human-readable text definition.
dbxref_id	INTEGER(10)	Primary identifier dbxref - The unique global OBO identifier for this cvterm. Note that a cvterm may have multiple secondary dbxrefs - see also table: cvterm_dbxref.
is_obsolete	TINYINT(1)	Boolean 0=false,1=true; see GO documentation for details of obsoletion. Note that two terms with different primary dbxrefs may exist if one is obsolete.
is_relationshiptype	TINYINT(1)	Boolean 0=false,1=true relations or relationship types (also known as Typedefs in OBO format, or as properties or slots) form a cv/ontology in themselves. We use this flag to indicate whether this cvterm is an actual term/class/universal or a relation. Relations may be drawn from the OBO Relations ontology, but are not exclusively drawn from there.

Note that relationship types among cvterms are also stored as cvterms:

mysql> select * from cvterm where cv_id = 1;
+-----------+-------+-------------------+------------+-----------+-------------+---------------------+
| cvterm_id | cv_id | name              | definition | dbxref_id | is_obsolete | is_relationshiptype |
+-----------+-------+-------------------+------------+-----------+-------------+---------------------+
|         3 |     1 | is_a              |            |         3 |           0 |                   1 |
|         8 |     1 | part_of           |            |        10 |           0 |                   1 |
|         9 |     1 | has_part          | NULL       |        12 |           0 |                   1 |
|        10 |     1 | integral_part_of  |            |        14 |           0 |                   1 |
|        11 |     1 | has_integral_part | NULL       |        16 |           0 |                   1 |
|        12 |     1 | proper_part_of    |            |        18 |           0 |                   1 |
|        13 |     1 | has_proper_part   | NULL       |        20 |           0 |                   1 |
|        14 |     1 | located_in        |            |        22 |           0 |                   1 |
|        15 |     1 | location_of       | NULL       |        24 |           0 |                   1 |
|        16 |     1 | contained_in      |            |        26 |           0 |                   1 |
|        17 |     1 | contains          | NULL       |        28 |           0 |                   1 |
|        18 |     1 | adjacent_to       |            |        30 |           0 |                   1 |
|        19 |     1 | transformation_of |            |        32 |           0 |                   1 |
|        20 |     1 | transformed_into  | NULL       |        34 |           1 |                   1 |
|        21 |     1 | derives_from      |            |        36 |           0 |                   1 |
|        22 |     1 | derived_into      | NULL       |        38 |           0 |                   1 |
|        23 |     1 | preceded_by       |            |        40 |           0 |                   1 |
|        24 |     1 | precedes          | NULL       |        42 |           0 |                   1 |
|        25 |     1 | has_participant   |            |        44 |           0 |                   1 |
|        26 |     1 | participates_in   | NULL       |        46 |           0 |                   1 |
|        27 |     1 | has_agent         |            |        48 |           0 |                   1 |
|        28 |     1 | agent_in          | NULL       |        50 |           0 |                   1 |
|        29 |     1 | instance_of       |            |        52 |           0 |                   1 |
|        30 |     1 | has_improper_part | NULL       |        54 |           1 |                   1 |
|        31 |     1 | improper_part_of  |            |        56 |           1 |                   1 |
|        32 |     1 | relationship      |            |        58 |           1 |                   1 |
+-----------+-------+-------------------+------------+-----------+-------------+---------------------+
26 rows in set (0.20 sec)

The tree reconciliation database includes commonly use ontologies such as GO terms, but also controlled vocabularies for systems of tagging trees such as the PRIME format of representing reconciled trees. For example:

mysql> select cvterm_id,name,definition from cvterm where cv_id = 7 limit 5\G
*************************** 1. row ***************************
 cvterm_id: 104
      name: prime_tag
definition: NULL
*************************** 2. row ***************************
 cvterm_id: 159
      name: AC
definition: Anti-chain on edge. Used to define a reconciliation.
*************************** 3. row ***************************
 cvterm_id: 160
      name: D
definition: A Duplication.
*************************** 4. row ***************************
 cvterm_id: 161
      name: S
definition: Species
*************************** 5. row ***************************
 cvterm_id: 162
      name: ID
definition: The node identifier.
5 rows in set (0.00 sec)

cvterm_dbxref

In addition to the primary identifier (cvterm.dbxref_id) a cvterm can have zero or more secondary identifiers/dbxrefs, which may refer to records in external databases. The exact semantics of cvterm_dbxref are not fixed. For example: the dbxref could be a pubmed ID that is pertinent to the cvterm, or it could be an equivalent or similar term in another ontology. For example, GO cvterms are typically linked to InterPro IDs, even though the nature of the relationship between them is largely one of statistical association. The dbxref may be have data records attached in the same database instance, or it could be a "hanging" dbxref pointing to some external database. NOTE: If the desired objective is to link two cvterms together, and the nature of the relation is known and holds for all instances of the subject cvterm then consider instead using cvterm_relationship together with a well-defined relation.

Name	Type	Description
cvterm_dbxref_id	INTEGER
cvterm_id	INTEGER
dbxref_id	INTEGER
is_for_definition	INTEGER	A cvterm.definition should be supported by one or more references. If this column is true, the dbxref is not for a term in an external database - it is a dbxref for provenance information for the definition.

cvterm_relationship

A relationship linking two cvterms. Each cvterm_relationship constitutes an edge in the graph defined by the collection of cvterms and cvterm_relationships. The meaning of the cvterm_relationship depends on the definition of the cvterm R refered to by type_id. However, in general the definitions are such that the statement "all SUBJs REL some OBJ" is true. The cvterm_relationship statement is about the subject, not the object. For example "insect wing part_of thorax".

Name	Type	Description
cvterm_relationship_id	INTEGER
type_id	INTEGER	The nature of the relationship between subject and object. Note that relations are also housed in the cvterm table, typically from the OBO relationship ontology, although other relationship types are allowed.
subject_id	INTEGER	The subject of the subj-predicate-obj sentence. The cvterm_relationship is about the subject. In a graph, this typically corresponds to the child node.
object_id	INTEGER	The object of the subj-predicate-obj sentence. The cvterm_relationship refers to the object. In a graph, this typically corresponds to the parent node.

cvtermpath

The reflexive transitive closure of the cvterm_relationship relation.

Name	Type	Description
cvterm_relationship_id	INTEGER
type_id	INTEGER	The nature of the relationship between subject and object. Note that relations are also housed in the cvterm table, typically from the OBO relationship ontology, although other relationship types are allowed.
subject_id	INTEGER	The subject of the subj-predicate-obj sentence. The cvterm_relationship is about the subject. In a graph, this typically corresponds to the child node.
object_id	INTEGER	The object of the subj-predicate-obj sentence. The cvterm_relationship refers to the object. In a graph, this typically corresponds to the parent node.
cv_id	INTEGER	Closures will mostly be within one cv. If the closure of a relationship traverses a cv, then this refers to the cv of the object_id cvterm.
pathdistance	INTEGER	The number of steps required to get from the subject cvterm to the object cvterm, counting from zero (reflexive relationship).

cvtermprop

Additional extensible properties can be attached to a cvterm using this table. Corresponds to AnnotationProperty in W3C OWL format.

Name	Type	Description
cvtermprop_id	INTEGER	Primay key
cvterm_id	INTEGER
type_id		The name of the property or slot is a cvterm. The meaning of the property is defined in that cvterm.
value	INTEGER	The value of the property, represented as text. Numeric values are converted to their text representation.
rank	TEXT	Property-Value ordering. Any cvterm can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.

cvtermsynonym

A cvterm actually represents a distinct class or concept. A concept can be refered to by different phrases or names. In addition to the primary name (cvterm.name) there can be a number of alternative aliases or synonyms. For example, "T cell" as a synonym for "T lymphocyte".

Name	Type	Description
cvtermsynonym_id	INTEGER	Primay key
cvterm_id	INTEGER
synonym_id
type_id	INTEGER	A synonym can be exact, narrower, or broader than.

db

A database authority. Typical databases in bioinformatics are FlyBase, GO, UniProt, NCBI, MGI, etc. The authority is generally known by this shortened form, which is unique within the bioinformatics and biomedical realm. To Do - add support for URIs, URNs (e.g. LSIDs). We can do this by treating the URL as a URI - however, some applications may expect this to be resolvable - to be decided.

Name	Type	Description
db_id	INTEGER
name	VARCHAR(255)
description	TEXT
urlprefix
url

dbxref

A unique, global, public, stable identifier. Not necessarily an external reference - can reference data items inside the particular chado instance being used. Typically a row in a table can be uniquely identified with a primary identifier (called dbxref_id); a table may also have secondary identifiers (in a linking table <T>_dbxref). A dbxref is generally written as <DB>:<ACCESSION> or as <DB>:<ACCESSION>:<VERSION>

Name	Type	Description
dbxref_id	INTEGER(10)
db_id	INTEGER(10)
accession	VARCHAR(255)	The local part of the identifier. Guaranteed by the db authority to be unique for that db.
version	VARCHAR(255)
description	TEXT

dbxrefprop

Metadata about a dbxref. Note that this is not defined in the dbxref module, as it depends on the cvterm table. This table has a structure analagous to cvtermprop.

Name	Type	Description
dbxrefprop_id	INTEGER	Primay key
dbxref_id	INTEGER
type_id	INTEGER
value	TEXT
rank	INTEGER

iPToL

Database Schema

Schema Overview

Protein Tables

member

member_attribute

sequence

Protein Family Tables

family

family_attribute

family_member

Protein Tree Tables

protein_tree

protein_tree_attribute

protein_tree_member

protein_tree_node

protein_tree_node_attribute

protein_tree_node_path

Species Tree Tables

species_tree

species_tree_attribute

species_tree_node

species_tree_node_attribute

species_tree_node_path

Reconciliation Tables

reconciliation

reconciliation_attribute

reconciliation_node

reconciliation_node_attribute

Controlled Vocabulary Tables

cv

cvterm

cvterm_dbxref

cvterm_relationship

cvtermpath

cvtermprop

cvtermsynonym

db

dbxref

dbxrefprop