Database Schema
Schema Overview
This schema for storing information related to gene tree reconciliation keeps data related to species trees separate from data related to gene trees. This makes minimal changes to the existing tables in Ensemble Compara and adds information for species trees, reconciliations, and a controlled vocabulary.
Advantages of this schema:
- Minimizes changes to the existing Ensemble/Compara schema
- Follows weak typing paradigm, attribute tables can hold any type of specialized information
- Makes use of controlled vocabularies and ontologies store data types
- Use Chaod tables for controlled vocabularies(alternatively could use Chado cv module - http://gmod.org/wiki/Chado_CV_Module) allows for import of CV terms in OBO format using existing tools
Disadvantages of this schema:
- Not the best schema for representing uncertainty in tree topology
- Uncertainty in root not easily supported, The E/C schema assumes a gene tree with a single root, other roots would require a replication of the tree topology
- This requires that ontologies exist and are loaded into the database before attribute values can be stored
The tables in this database fall into a few major groups:
- Protein Tree Tables - Generally these tables already exist in the E/C schema
- Species Tree Tables - Tables related to the information in the species tree
- Reconciliation Tables - Tables related to the mapping of the nodes of the gene tree to the nodes and edges of the species tree
- Controlled Vocabulary Tables - Tables storing information for ontologies and other controlled vocabularies
Protein Tables
member
Information related to the an individual locus
Name |
Type |
Description |
---|---|---|
member_id |
INTEGER(8) |
Primary key |
stable_id |
VARCHAR(64) |
Stable identifier for the locus such as the gene name or gene identifier. |
version |
INTEGER(8) |
Version of the stable ID (see EnsEMBL core DB) |
source_name |
CHAR(17) |
The source of the member. |
taxon_id |
INTEGER(8) |
Source taxon the protein is derived from. Reference to ncbi_taxa_node.taxon_id |
genome_db_id |
INTEGER(8) |
Reference to geneome_db.genome_db_id |
sequence_id |
INTEGER(8) |
Identifier for the sequence used to represent this protein. Reference to sequence.sequence_id. |
gene_member_id |
INTEGER(8) |
Reference to member.member_id to allow linkage from peptides to genes. |
description |
TEXT |
The description of the gene/protein as described in the core database or from the Uniprot entry. |
chr_name |
CHAR(40) |
The name of the chromosome this sequence maps to. Many assemblies use letters or other strings to name their chromosomes, so an integer would not work here. |
chr_start |
INTEGER(8) |
The start of the location of the member on the chromosome. |
chr_end |
INTEGER(8) |
The end of the location of the member on the chromosome. |
chr_strand |
TINYINT(1) |
Strand of the chromosome. |
display_label |
VARCHAR(128) |
This information not described in the E/C schema description. |
member_attribute
This table allows for additional information to be stored in the member attribute table using the controlled vocabulary terms in the database.
Name |
Type |
Description |
---|---|---|
member_attribute_id |
INTEGER(10) |
Primary key |
member_id |
INTEGER(8) |
Foreign key to member table |
cvterm_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
TEXT |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
source_id |
INTEGER(10) |
|
sequence
The protein sequence.
Name |
Type |
Description |
---|---|---|
sequence_id |
INTEGER(10) |
Primary key |
length |
INTEGER(10) |
The length of the protein sequence |
sequence |
LONGTEXT |
The protein sequence. |
Protein Family Tables
family
The protein family. Contains all the group homologies found. There are several family_member entries for each family entry.
Name |
Type |
Description |
---|---|---|
family_id |
INTEGER(10) |
Primary key. |
stable_id |
VARCHAR(40) |
stable family ID |
version |
INTEGER(10) |
version of the stable_id (may only change when members migrate between this family and another one; stays the same otherwise) |
method_link_species_set_id |
INTEGER(10) |
external reference to method_link_species_set.method_link_species_set_id |
description |
VARCHAR(255) |
description of the family as found using the Longest Common String (LCS) of the descriptions of the member proteins. |
description_score |
DOUBLE |
Scores the accuracy of the annotation (max. 100) |
family_attribute
Attributes of the protein family.
Name |
Type |
Description |
---|---|---|
family_attribute_id |
INTEGER(10) |
Primary Key, Unique Identifier |
family_id |
INTEGER(10) |
|
cvterm_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
source_id |
INTEGER(10) |
|
family_member
Contains the proteins corresponding to protein family relationship found. There are several family_member entries for each family entry.
Name |
Type |
Description |
---|---|---|
family_id |
INTEGER(10) |
external reference to family.family_id |
member_id |
INTEGER(10) |
external reference to member.member_id |
cigar_line |
MEDIUMTEXT |
internal description of the multiple alignment (see homology_member table) |
Protein Tree Tables
These are mostly relevant tables from the Ensemble Compara schema.
protein_tree
Information related to an entire protein tree.
Name |
Type |
Description |
---|---|---|
protein_tree_id |
INTEGER(10) |
Unique identifier for the protein tree |
family_id |
INTEGER(10) |
|
root_node_id |
INTEGER(10) |
ID of the root node for the tree. |
protein_tree_attribute
This is a table to store information about the protein tree.
Name |
Type |
Description |
---|---|---|
protein_tree_attribute_id |
INTEGER(10) |
Primary Key, Unique Identifier |
protein_tree_id |
INTEGER(10) |
|
cvterm_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
source_id |
INTEGER(10) |
|
protein_tree_member
The information for the individual proteins in the tree. I believe this holds the alignment information for the tree in the cigar lines
Name |
Type |
Description |
---|---|---|
node_id |
INTEGER(10) |
The node_id in protein tree this refers to |
root_id |
INTEGER(8) |
The root_id of the tree this refers to. I think that this is the way that E/C refers to the set of nodes the comprise a tree |
member_id |
INTEGER(8) |
Foreign key, The protein that is being represented |
method_link_species_set_id |
INTEGER(8) |
External reference to method_link_species_set.method_link_species_set_id |
cigar_line |
TEXT |
The alignment in CIGAR format |
cigar_start |
INTEGER(8) |
The start of the alignment |
cigar_end |
INTEGER(8) |
The end of the alignment |
protein_tree_node
This table represents the topology of the gene tree.
Name |
Type |
Description |
---|---|---|
node_id |
INTEGER(8) |
Unique identifier for nodes in the protein tree |
protein_tree_id |
INTEGER(10) |
This is an addition to the existing E/C schema. This is a unique identifier for the protein tree that the nodes are a member of. |
parent_id |
INTEGER(8) |
This is the parent node for the node representend by node_id |
root_id |
INTEGER(8) |
The id for the root node in the protein tree. By my reading of the E/C schema, this is the only way to refer to the gene tree as a group |
clusterset_id |
INTEGER(8) |
This is the set of clusters (the gene family) represented by the gene tree |
left_index |
INTEGER(8) |
Left indexing value, this can be used with right_index to select the child nodes of a given internal node in the gene tree. |
right_index |
INTEGER(8) |
Right indexing value, this can be used with left_index to select the child nodes of a given internal node in the gene tree. |
distance_to_parent |
DOUBLE |
This is essentially the edge length. This is the way that the data are currently represented in E/C, but it would be possible to make this information available as a protein_tree_node attribute value |
protein_tree_node_attribute
This is a table to store information about the individual nodes in the protein tree.
Name |
Type |
Description |
---|---|---|
protein_node_attribute_id |
INTEGER(10) |
Primary Key, Unique Identifier |
node_id |
INTEGER(8) |
The identifier of the node the attribute refers to. |
cvterm_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
source_id |
INTEGER(10) |
|
protein_tree_node_path
This table holds information or transitive closure queries, ie the reachability and distance from an ancestral node to all of its children nodes. This table will be used for least common ancestor queries, and to select a subtree from a larger species tree.
Name |
Type |
Description |
---|---|---|
protein_tree_node_path_id |
INTEGER(10) |
Primary key. |
parent_node_id |
INTEGER(10) |
The parent node being referenced |
child_node_id |
INTEGER(10) |
The child node being referenced (not necessatily the direct descendent, but a child node |
path |
TEXT |
This may be considered optional. This would store the node path from the parent to the child |
distance |
INTEGER(10) |
The distance in number of edges from the parent node to the child node. This data can be used to minimize the distance for Least Common Ancestor (LCA) queries. |
Species Tree Tables
These are the tables related to the species tree data.
species_tree
Name |
Type |
Description |
---|---|---|
species_tree_id |
INTEGER(10) |
Primary key |
species_tree_name |
TEXT |
The name of the species tree. |
root_node_id |
INTEGER(10) |
The root node of the species tree. The current version of the TR database required rooted trees. |
version |
SMALLINT(3) |
I am assuming that there will be different versions of species trees. The version information will make sure that the reconciled data match to the species tree being referred to here. This can be null |
Example of data in this table:
mysql> select * from species_tree LIMIT 1; +-----------------+-------------------+--------------+---------+ | species_tree_id | species_tree_name | root_node_id | version | +-----------------+-------------------+--------------+---------+ | 1 | bowers_rosids | 1 | NULL | +-----------------+-------------------+--------------+---------+ 1 row in set (0.00 sec)
species_tree_attribute
Name |
Type |
Description |
---|---|---|
species_tree_attribute_id |
INTEGER(10) |
Primary key |
species_tree_id |
INTEGER(10) |
Foreign Key, The species tree being referenced |
term_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
species_tree_node
Name |
Type |
Description |
---|---|---|
species_tree_node_id |
INTEGER(10) |
Primary Key |
species_tree_id |
INTEGER(10) |
The species tree this node is a member of |
parent_id |
INTEGER(10) |
The id of the parent node to this species_tree_node |
left_index |
INTEGER(10) |
A left index value used to select clades |
right_index |
INTEGER(10) |
A right index value used to select clades |
Example of species tree nodes from the bowers_rosids tree:
mysql> SELECT * FROM species_tree_node WHERE species_tree_id = 1; +----------------------+-----------------+-----------+-------------+------------+-------------+ | species_tree_node_id | species_tree_id | parent_id | label | left_index | right_index | +----------------------+-----------------+-----------+-------------+------------+-------------+ | 1 | 1 | 0 | NULL | 1 | 22 | | 2 | 1 | 1 | grape | 2 | 3 | | 3 | 1 | 1 | NULL | 4 | 21 | | 4 | 1 | 3 | NULL | 5 | 14 | | 5 | 1 | 4 | poplar | 6 | 7 | | 6 | 1 | 4 | NULL | 8 | 13 | | 7 | 1 | 6 | cucumber | 9 | 10 | | 8 | 1 | 6 | soybean | 11 | 12 | | 9 | 1 | 3 | NULL | 15 | 20 | | 10 | 1 | 9 | Arabidopsis | 16 | 17 | | 11 | 1 | 9 | papaya | 18 | 19 | +----------------------+-----------------+-----------+-------------+------------+-------------+ 11 rows in set (0.03 sec)
species_tree_node_attribute
Name |
Type |
Description |
---|---|---|
species_tree_node_attribute_id |
INTEGER(10) |
Primary key |
species_tree_node |
INTEGER(10) |
Foreign Key, The species tree node being referenced |
term_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
species_tree_node_path
This table holds information or transitive closure queries, ie the reachability and distance from an ancestral node to all of its children nodes. This table will be used for least common ancestor queries, and to select a subtree from a larger species tree.
Name |
Type |
Description |
---|---|---|
parent_node_id |
INTEGER(10) |
The parent node being referenced |
child_node_id |
INTEGER(10) |
The child node being referenced (not necessatily the direct descendent, but a child node |
path |
VARCHAR(255) |
This may be considered optional. This would store the node path from the parent to the child |
distance |
INTEGER(10) |
The distance in number of edges from the parent node to the child node. This data can be used to minimize the distance for Least Common Ancestor (LCA) queries. |
Reconciliation Tables
The purpose of the reconciliation tables is to map the nodes of a gene tree to the nodes and edges of a species tree.
reconciliation
This table describes the gene_tree and species tree that are being reconciled. This also allows for a subset of the species to be selected from the species tree with the species_set column.
Name |
Type |
Description |
---|---|---|
reconciliation_id |
INTEGER(10) |
Primary Key |
species_tree_id |
INTEGER(10) |
The species tree that is serving as the host for the mapping of the protein tree |
species_set_id |
INTEGER(10) |
The species from the large species tree that are being used in the reconciliation, this would probably best be represented as the list of nodes in the species tree that are the leaf nodes being represented |
protein_tree_id |
INTEGER(10) |
The protein tree that is being mapped onto the species tree |
reconciliation_attribute
This table contains information related to the overall mapping of the reconciliation itself.
Name |
Type |
Description |
---|---|---|
reconciliation_attribute_id |
INTEGER(10) |
Primary Key |
reconciliation_id |
INTEGER(10) |
The id for the specific reconciliation being referenced |
term_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
reconciliation_node
The mapping of an individual node from a gene tree to a node or edge in the species tree.
Name |
Type |
Description |
---|---|---|
reconciliation_node_id |
INTEGER(10) |
Primary Key, unique identifier for each row in the table. |
reconciliation_id |
INTEGER(10) |
Foreign Key, The individual reconciliation the node maps is a member of |
node_id |
INTEGER(10) |
Foreign Key, The node_id in the gene tree that this reconciliation mapping refers to. |
host_parent_node_id |
INTEGER(10) |
The parent node in the species tree, this will be an ID from the species_tree_node_table |
host_child_node_id |
INTEGER(10) |
The child node in the species tree, tis will be an ID from the species_tree_node table. NOTE: It is not required that the parent_node of this child be the parent node of the host_parent_node ID in the species tree. |
is_on_node |
BOOLEAN |
TRUE if the node on the gene tree is located on a node in the species tree. |
reconciliation_node_attribute
An attribute related to the node mapping of a node in the gene tree to a node in the species tree.
Name |
Type |
Description |
---|---|---|
reconciliation_node_attribute_id |
INTEGER(10) |
Primary Key |
reconciliation_node_id |
INTEGER(10) |
Foreign Key, The reconciled node the attribute refers to |
term_id |
INTEGER(10) |
The id for term using the controlled vocabulary tables |
value |
VARHCHAR(255) |
The value for the specific term_id |
rank |
SMALLINT(3) |
Default is 0, This allows for lists of data for a given term id |
Controlled Vocabulary Tables
The controlled vocabulary tables incorporated into this schema were taken from the Chado CV module and converted from PostgreSQL syntax to MySQL tables. The following documentation is taken from the GMOD description of these tables.
cv
The controlled vocabulary groups. Individual controlled vocabulary terms belong to individual controlled vocabularies as indicated by the cv_id in the cvterm table.
Name |
Type |
Description |
---|---|---|
cv_id |
INTEGER(10) |
|
name |
VARCHAR(255) |
|
definition |
TEXT |
|
For example, the ontologies loaded in the current implementation of the tree reconciliation database:
mysql> SELECT * FROM cv; +-------+------------------------------------------+------------+ | cv_id | name | definition | +-------+------------------------------------------+------------+ | 1 | relationship | | | 2 | synonym_type | | | 3 | cvterm_property_type | | | 4 | anonymous | | | 5 | homology_and_related_concepts_in_biology | | | 6 | nhx_tag | | | 7 | prime_tag | | | 8 | phyloXML_element | | | 9 | biological_process | | | 10 | molecular_function | | | 11 | cellular_component | | | 12 | gene_ontology | | +-------+------------------------------------------+------------+ 12 rows in set (0.11 sec)
cvterm
A term, class, universal or type within an ontology or controlled vocabulary. This table is also used for relations and properties. cvterms constitute nodes in the graph defined by the collection of cvterms and cvterm_relationships.
Name |
Type |
Description |
---|---|---|
cvterm_id |
INTEGER(10) |
Primary Key |
cv_id |
INTEGER(10) |
The cv or ontology or namespace to which this cvterm belongs. |
name |
VARCHAR(950) |
A concise human-readable name or label for the cvterm. Uniquely identifies a cvterm within a cv. |
definition |
TEXT |
A human-readable text definition. |
dbxref_id |
INTEGER(10) |
Primary identifier dbxref - The unique global OBO identifier for this cvterm. Note that a cvterm may have multiple secondary dbxrefs - see also table: cvterm_dbxref. |
is_obsolete |
TINYINT(1) |
Boolean 0=false,1=true; see GO documentation for details of obsoletion. Note that two terms with different primary dbxrefs may exist if one is obsolete. |
is_relationshiptype |
TINYINT(1) |
Boolean 0=false,1=true relations or relationship types (also known as Typedefs in OBO format, or as properties or slots) form a cv/ontology in themselves. We use this flag to indicate whether this cvterm is an actual term/class/universal or a relation. Relations may be drawn from the OBO Relations ontology, but are not exclusively drawn from there. |
Note that relationship types among cvterms are also stored as cvterms:
mysql> select * from cvterm where cv_id = 1; +-----------+-------+-------------------+------------+-----------+-------------+---------------------+ | cvterm_id | cv_id | name | definition | dbxref_id | is_obsolete | is_relationshiptype | +-----------+-------+-------------------+------------+-----------+-------------+---------------------+ | 3 | 1 | is_a | | 3 | 0 | 1 | | 8 | 1 | part_of | | 10 | 0 | 1 | | 9 | 1 | has_part | NULL | 12 | 0 | 1 | | 10 | 1 | integral_part_of | | 14 | 0 | 1 | | 11 | 1 | has_integral_part | NULL | 16 | 0 | 1 | | 12 | 1 | proper_part_of | | 18 | 0 | 1 | | 13 | 1 | has_proper_part | NULL | 20 | 0 | 1 | | 14 | 1 | located_in | | 22 | 0 | 1 | | 15 | 1 | location_of | NULL | 24 | 0 | 1 | | 16 | 1 | contained_in | | 26 | 0 | 1 | | 17 | 1 | contains | NULL | 28 | 0 | 1 | | 18 | 1 | adjacent_to | | 30 | 0 | 1 | | 19 | 1 | transformation_of | | 32 | 0 | 1 | | 20 | 1 | transformed_into | NULL | 34 | 1 | 1 | | 21 | 1 | derives_from | | 36 | 0 | 1 | | 22 | 1 | derived_into | NULL | 38 | 0 | 1 | | 23 | 1 | preceded_by | | 40 | 0 | 1 | | 24 | 1 | precedes | NULL | 42 | 0 | 1 | | 25 | 1 | has_participant | | 44 | 0 | 1 | | 26 | 1 | participates_in | NULL | 46 | 0 | 1 | | 27 | 1 | has_agent | | 48 | 0 | 1 | | 28 | 1 | agent_in | NULL | 50 | 0 | 1 | | 29 | 1 | instance_of | | 52 | 0 | 1 | | 30 | 1 | has_improper_part | NULL | 54 | 1 | 1 | | 31 | 1 | improper_part_of | | 56 | 1 | 1 | | 32 | 1 | relationship | | 58 | 1 | 1 | +-----------+-------+-------------------+------------+-----------+-------------+---------------------+ 26 rows in set (0.20 sec)
The tree reconciliation database includes commonly use ontologies such as GO terms, but also controlled vocabularies for systems of tagging trees such as the PRIME format of representing reconciled trees. For example:
mysql> select cvterm_id,name,definition from cvterm where cv_id = 7 limit 5\G *************************** 1. row *************************** cvterm_id: 104 name: prime_tag definition: NULL *************************** 2. row *************************** cvterm_id: 159 name: AC definition: Anti-chain on edge. Used to define a reconciliation. *************************** 3. row *************************** cvterm_id: 160 name: D definition: A Duplication. *************************** 4. row *************************** cvterm_id: 161 name: S definition: Species *************************** 5. row *************************** cvterm_id: 162 name: ID definition: The node identifier. 5 rows in set (0.00 sec)
cvterm_dbxref
In addition to the primary identifier (cvterm.dbxref_id) a cvterm can have zero or more secondary identifiers/dbxrefs, which may refer to records in external databases. The exact semantics of cvterm_dbxref are not fixed. For example: the dbxref could be a pubmed ID that is pertinent to the cvterm, or it could be an equivalent or similar term in another ontology. For example, GO cvterms are typically linked to InterPro IDs, even though the nature of the relationship between them is largely one of statistical association. The dbxref may be have data records attached in the same database instance, or it could be a "hanging" dbxref pointing to some external database. NOTE: If the desired objective is to link two cvterms together, and the nature of the relation is known and holds for all instances of the subject cvterm then consider instead using cvterm_relationship together with a well-defined relation.
Name |
Type |
Description |
---|---|---|
cvterm_dbxref_id |
INTEGER |
|
cvterm_id |
INTEGER |
|
dbxref_id |
INTEGER |
|
is_for_definition |
INTEGER |
A cvterm.definition should be supported by one or more references. If this column is true, the dbxref is not for a term in an external database - it is a dbxref for provenance information for the definition. |
cvterm_relationship
A relationship linking two cvterms. Each cvterm_relationship constitutes an edge in the graph defined by the collection of cvterms and cvterm_relationships. The meaning of the cvterm_relationship depends on the definition of the cvterm R refered to by type_id. However, in general the definitions are such that the statement "all SUBJs REL some OBJ" is true. The cvterm_relationship statement is about the subject, not the object. For example "insect wing part_of thorax".
Name |
Type |
Description |
---|---|---|
cvterm_relationship_id |
INTEGER |
|
type_id |
INTEGER |
The nature of the relationship between subject and object. Note that relations are also housed in the cvterm table, typically from the OBO relationship ontology, although other relationship types are allowed. |
subject_id |
INTEGER |
The subject of the subj-predicate-obj sentence. The cvterm_relationship is about the subject. In a graph, this typically corresponds to the child node. |
object_id |
INTEGER |
The object of the subj-predicate-obj sentence. The cvterm_relationship refers to the object. In a graph, this typically corresponds to the parent node. |
cvtermpath
The reflexive transitive closure of the cvterm_relationship relation.
Name |
Type |
Description |
---|---|---|
cvterm_relationship_id |
INTEGER |
|
type_id |
INTEGER |
The nature of the relationship between subject and object. Note that relations are also housed in the cvterm table, typically from the OBO relationship ontology, although other relationship types are allowed. |
subject_id |
INTEGER |
The subject of the subj-predicate-obj sentence. The cvterm_relationship is about the subject. In a graph, this typically corresponds to the child node. |
object_id |
INTEGER |
The object of the subj-predicate-obj sentence. The cvterm_relationship refers to the object. In a graph, this typically corresponds to the parent node. |
cv_id |
INTEGER |
Closures will mostly be within one cv. If the closure of a relationship traverses a cv, then this refers to the cv of the object_id cvterm. |
pathdistance |
INTEGER |
The number of steps required to get from the subject cvterm to the object cvterm, counting from zero (reflexive relationship). |
cvtermprop
Additional extensible properties can be attached to a cvterm using this table. Corresponds to AnnotationProperty in W3C OWL format.
Name |
Type |
Description |
---|---|---|
cvtermprop_id |
INTEGER |
Primay key |
cvterm_id |
INTEGER |
|
type_id |
|
The name of the property or slot is a cvterm. The meaning of the property is defined in that cvterm. |
value |
INTEGER |
The value of the property, represented as text. Numeric values are converted to their text representation. |
rank |
TEXT |
Property-Value ordering. Any cvterm can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used. |
cvtermsynonym
A cvterm actually represents a distinct class or concept. A concept can be refered to by different phrases or names. In addition to the primary name (cvterm.name) there can be a number of alternative aliases or synonyms. For example, "T cell" as a synonym for "T lymphocyte".
Name |
Type |
Description |
---|---|---|
cvtermsynonym_id |
INTEGER |
Primay key |
cvterm_id |
INTEGER |
|
synonym_id |
|
|
type_id |
INTEGER |
A synonym can be exact, narrower, or broader than. |
db
A database authority. Typical databases in bioinformatics are FlyBase, GO, UniProt, NCBI, MGI, etc. The authority is generally known by this shortened form, which is unique within the bioinformatics and biomedical realm. To Do - add support for URIs, URNs (e.g. LSIDs). We can do this by treating the URL as a URI - however, some applications may expect this to be resolvable - to be decided.
Name |
Type |
Description |
---|---|---|
db_id |
INTEGER |
|
name |
VARCHAR(255) |
|
description |
TEXT |
|
urlprefix |
|
|
url |
|
|
dbxref
A unique, global, public, stable identifier. Not necessarily an external reference - can reference data items inside the particular chado instance being used. Typically a row in a table can be uniquely identified with a primary identifier (called dbxref_id); a table may also have secondary identifiers (in a linking table <T>_dbxref). A dbxref is generally written as <DB>:<ACCESSION> or as <DB>:<ACCESSION>:<VERSION>
Name |
Type |
Description |
---|---|---|
dbxref_id |
INTEGER(10) |
|
db_id |
INTEGER(10) |
|
accession |
VARCHAR(255) |
The local part of the identifier. Guaranteed by the db authority to be unique for that db. |
version |
VARCHAR(255) |
|
description |
TEXT |
|
dbxrefprop
Metadata about a dbxref. Note that this is not defined in the dbxref module, as it depends on the cvterm table. This table has a structure analagous to cvtermprop.
Name |
Type |
Description |
---|---|---|
dbxrefprop_id |
INTEGER |
Primay key |
dbxref_id |
INTEGER |
|
type_id |
INTEGER |
|
value |
TEXT |
|
rank |
INTEGER |
|