2009 Tech Talk

Date	Presenter	Contact	Title	Host	URL/Link	Abstract/Notes
Aug 19 2009	Andres Varon		POY (Phylogenetic Analysis of DNA and other Data using Dynamic Homology)	Andy Lenard	src /website iPC presentation iPC followup	POY 4 is an open source, phylogenetic analysis program for molecular and morphological data. Version 4 supports Maximum Parsimony as its optimality criterion, analyzing the standard non-additive, additive, and matrix characters, commonly found in other phylogenetic analysis programs, and most importantly dynamic homology characters (DH) which allow the use of unaligned sequences as characters.
Sep 2 2009	Andy, Edwin, Sriram, Nirav, Sonya	Nirav	Topics of Interest to iPC from OSCON 2009	Nirav	iPC trip report OSCON 2009	OSCON is the "Open Source Convention" and has an impressive lineup of speakers from various open source projects, along with opportunities to learn best practices from tutorials and talks. We will explore ways to incorporate and leverage what was learned into iPC projects.
Sept 30 2009	Stephen Kobourov	Nirav	GMap, Putting Data on the Map	Nirav		Information visualization can be invaluable in making sense out of large data sets. However, traditional graph visualization methods often fail to capture the underlying structural information, clustering, and neighborhoods. GMap, an algorithm for visualizing graphs as maps, provides a way to overcome some of the shortcomings with the help of the geographic map metaphor. While graphs, charts, and tables often require considerable effort to comprehend, a map representation is more intuitive, as most people are very familiar with maps and even enjoy carefully examining maps. The effectiveness of GMap is illustrated with examples from several domains, namely TV shows and Amazon books.
Oct 14 2009	Rutger Vos	Nirav	NeXML, Treebase, PhyloWS			Discussion with the iPC team on the use of triple stores. We will also review the roadmap and progress of Treebase, NeXML, and PhyloWS.
Dec 2nd 2009	Sheldon McKay	Nirav	A Survey of Genome and Comparative Genome Browsers	Nirav	Copy of presentation Book Chapter	The need to visualize genome-scale data has been addressed by genome browser applications, which typically present a graphical rendering of a reference sequence along with annotations such as gene models, experimental data from expressed sequence tags, microarray experiments, etc. Increasing availability of newly sequenced genomes has also led to growth in the field of comparative genomics and, with it, an emerging class of software known as comparative genome or synteny browsers. There is currently an embarrassment of riches in web-based software for visualizing genome annotations, alignment and co-linearity, with attendant heterogeneity in approaches to processing and displaying the data. I will review examples of commonly used genome and comparative genome browsers, with an emphasis on Generic Model Organism Database (GMOD) supported software and recent improvements to deal with very dense information from high throughput microarray and next-generation sequencing experiments.
Dec 9th 2009	Damian Gessler	Nirav	An Introduction to the Semantic Web			Dr. Damian Gessler, iPlant Semantic Web Architect, will present an introduction to the semantic web. He will discuss what it is, how it differs from web services, how it fits into NSF's multi-$100 million efforts in data and service persistence, access, and integration, and how it fits into iPlant's unique set of challenges. Today, the only thing greater than the plethora of technology choices available to us is the gap between any single technology and its ability to solve the challenging data and service integration problems ahead. Semantic web technologies offer unique assets by allowing internet-scalability over semantically difficult problems. Many problems in plant science resist solution via lexical and syntactical aggregation. These problems require that contextual information be made amenable to high-throughput reasoning and discovery. Dr. Gessler will present the results of research aimed specifically at addressing this problem.
Dec 16th 2009	Eric Lyons	Nirav	CoGe: A new kind of Comparative Genomics			Transforming genomes of information into knowledge continues to present a significant challenge. This transformative process often requires the trained brain of a biologist relying on pre-built computational systems to access, analyze, and visualize genomic data. At least four steps are involved: data acquisition, analysis, data and results visualization, and experimental validation and refinement. Equally daunting are two chronic computational infrastructure challenges: updating existing genomic resources with new data, and deploying new analytical tools. CoGe is a web-based software system designed to meet all of these challenges. CoGe currently stores genomes from over 7,000 organisms, comprising over 130 billion base pairs of genomic sequence data and their overlying annotations. It uses a novel genomic visualization system and a suite of interconnected and interactive tools permitting researchers anywhere in the world to quickly identify and validate genomes and genomic regions of interest, and characterize many patterns of genome evolution including synteny, whole genome duplication events, post-polyploid fractionation, subfunctionalization, deletions, local duplications, inversions, translocations, misannotations, motif patterns, and conserved noncoding sequence. Because the system is web-based, it is trivial to link into CoGe’s analytical subsystems. This has proven to be an efficient way to visually proof and validate large datasets derived from automated computational pipelines, thus avoiding lists of data that cannot easily be evaluated by an end user. Additionally, CoGe’s computational infrastructure substantially decreases the time for professionals to integrate new genomic data and analytical tools. Because the database schema is theoretically scalable to many hundreds of thousands of genomes, deployment of new genomic information is seamless with tool integration.
Likewise, when new analytical tools are developed for studying a particular set of genomes, they are seamlessly integrated with all genomes residing in CoGe. CoGe’s ability to allow researchers to rapidly identify genes and genomic regions of interest, and visualize their evolution in comparison to any number of other genomes and genomic regions, constitutes a powerful new tool for any biologist. CoGe is publicly available at http://synteny.cnr.berkeley.edu/CoGe. Current database statistics: 7,400 organisms; 7,950 genomes; 135,000,000,000 nucleotides.