Compara_database

Some encouraging news.

According to Albert Vilella, the existing Compara database should support the four high-priority use cases. See also the attached PDF.

  • Use Case 2 - Show all gene duplications in a gene family tree.
  • Use Case 3 - Find all gene trees showing a duplication event at a specific point on a species tree.
  • Use Case 12 - Show sampled gene copy number for all taxa, given a gene family identifier.
  • Use Case 4 - Find points on a species tree where a set of genes (e.g. all genes within a pathway or GO category) originated and/or diversified.

Email exchange:

Hi Sheldon,

Yes, pretty much all the queries you describe can be done with our
system. Apart from the Perl API, which will let you traverse the trees
programmatically, we store the trees in a nested-set structure, and add
left and right indexes, so that a variety of queries can directly be
done in SQL.

The GO information is not stored in Compara, but in the core dbs, so for
the last query you would need to query the core db and then run the
compara query on the resulting list.

Cheers,

Albert.

On Mon, 2010-05-03 at 03:22 -0400, Sheldon McKay wrote:
> Hi Albert,
>
> I am looking into a gene/species tree reconciliation pipeline and
> we are considering the compara database schema for storing our data.
> I was wondering if you might have an opinion on how suitable it might
> be for the following use-cases/queries:
>
> - Show all gene duplications in a gene family tree.
> - Find all gene trees showing a duplication event at a specific point
> on a species tree.
> - Show sampled gene copy number for all taxa gene family identifier
> - Find points on a species tree where a set of genes (e.g. all genes
> within a pathway or GO category) originated and/or diversified.
>
>
>
> Best Regards,
> Sheldon
>
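
Note on the nested-set storage mentioned in Albert's reply: the sketch below illustrates why left and right indexes make tree queries such as Use Case 2 possible in plain SQL. It is only an illustration under stated assumptions. The table tree_node, its columns (including the is_duplication flag) and the gene labels are hypothetical and are not the actual Compara schema, and it uses an in-memory SQLite database rather than a real Compara instance. It also assumes one tree per table, so index ranges never collide between trees.

```python
import sqlite3


def build_toy_tree():
    """Create a hypothetical nested-set table holding one small gene tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tree_node (
            node_id        INTEGER PRIMARY KEY,
            parent_id      INTEGER,
            left_index     INTEGER NOT NULL,
            right_index    INTEGER NOT NULL,
            label          TEXT,
            is_duplication INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    # Toy tree: root(1) -> [dup(2) -> [geneA1(3), geneA2(4)], geneB(5)]
    # Each node's (left, right) interval encloses those of its descendants.
    rows = [
        (1, None, 1, 10, "root", 0),
        (2, 1, 2, 7, "dup", 1),
        (3, 2, 3, 4, "geneA1", 0),
        (4, 2, 5, 6, "geneA2", 0),
        (5, 1, 8, 9, "geneB", 0),
    ]
    conn.executemany("INSERT INTO tree_node VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def subtree_ids(conn, node_id):
    """Return ids of node_id and all its descendants, with no recursion."""
    cur = conn.execute(
        """
        SELECT d.node_id
        FROM tree_node AS a
        JOIN tree_node AS d
          ON d.left_index BETWEEN a.left_index AND a.right_index
        WHERE a.node_id = ?
        ORDER BY d.left_index
        """,
        (node_id,),
    )
    return [row[0] for row in cur]


def duplication_ids(conn, root_id):
    """Return ids of duplication nodes at or below root_id."""
    cur = conn.execute(
        """
        SELECT d.node_id
        FROM tree_node AS a
        JOIN tree_node AS d
          ON d.left_index BETWEEN a.left_index AND a.right_index
        WHERE a.node_id = ? AND d.is_duplication = 1
        ORDER BY d.left_index
        """,
        (root_id,),
    )
    return [row[0] for row in cur]


def leaf_labels(conn, root_id):
    """Return labels of leaves under root_id (a leaf has right = left + 1)."""
    cur = conn.execute(
        """
        SELECT d.label
        FROM tree_node AS a
        JOIN tree_node AS d
          ON d.left_index BETWEEN a.left_index AND a.right_index
        WHERE a.node_id = ? AND d.right_index = d.left_index + 1
        ORDER BY d.left_index
        """,
        (root_id,),
    )
    return [row[0] for row in cur]
```

Calling duplication_ids(conn, 1) on the toy tree lists every duplication node below the root with a single range comparison, which is the kind of query the nested-set layout is meant to make cheap. Against a real Compara database the table and column names, and the way duplication events are recorded, would need to be taken from the schema documentation for the release in use.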