2015 07 11 ISMB/ECCB Applied Knowledge Exchange Session on Cyberinfrastructure

iPlant Trainers

John Fonner, Jason Williams

Guest Panelists on Bioinformatics Training

Vicky Schneider - The Genome Analysis Centre, Norwich, UK

David Clements - Johns Hopkins University, Baltimore MD, USA

Date

Saturday July 11th, 2015 - 1:30 PM

Location

Room Change: Liffey Room 1

Workshop Checklist

1. iPlant Account

2. Atmosphere Access

3. Laptop

Please bring your own WiFi-enabled laptop to the workshop. Make sure your laptop has the following:

4. Pre-Workshop Guide and Homework

Download and complete the exercises in this guide to get the most out of the workshop: PRE-WORKSHOP PACKET (Updated)

Office Hour

We will be online for live assistance with anything you need before the workshop (e.g., if you run into trouble setting up your computer or uploading data). The live help session for this workshop will be:

Wednesday July 8th from 12:00-12:30 PM Dublin time (the session will end at 12:15 if no one has joined or requested help by then)

URL: http://cshl.adobeconnect.com/iplant_livehelp

Instructional Materials

You can download a copy of the workshop packet to follow along:

WORKSHOP PACKET

Etherpad

https://etherpad.mozilla.org/akes-ci-workshop

Draft Workshop Program (Updates in Progress)

Time | Description | Slides | Links | Presenter
Pre-Workshop | Arrive / Sign-in / Verify iPlant Accounts | | |
01:30 PM | Self-intros and overview of Biological Cyberinfrastructure | slides | | Jason
01:40 PM | Scaling Data: Overview of challenges | slides | | John
01:50 PM | Scaling Data: Tutorial – Data Sharing Basics | | tutorial/demo | Jason
02:05 PM | Scaling Compute: Overview of the challenges | slides | | Jason
02:10 PM | Scaling Compute: Demo – RNA-Seq | | tutorial/demo | Jason
03:30 PM | Coffee/Tea Break | | |
03:55 PM | Scaling Compute: Cloud Computing with Atmosphere | slides | | John
04:05 PM | Scaling Compute: Tutorial – Visualizing genome annotations with Atmosphere | | tutorial/demo | John
04:30 PM | Scaling Compute: Agave API (demo) | slides | | John
04:45 PM | Scaling People: How training networks and collaborations help users scale capabilities | slides | | All
05:30 PM | End | | |

Session Abstract:

Cyberinfrastructure (CI) is a powerful enabler for data-intensive biology. Although much investigation originates in organism-centered communities (plant, animal, microbes…), there are unifying similarities across types of datasets (sequencing, imaging, geospatial…), algorithms (assembly, alignment, association…), and personal objectives (student, faculty, industry…). Despite these commonalities, communities often split along domain lines as independently developed tools, unshareable datasets, and uncommunicated experience result in isolation and needless redundancy. Common CI lets users analyze and share data and experience efficiently: communities can leverage pre-built CI solutions and add only the application-specific components they need to reach a customized endpoint.

Navigating and connecting to CI is a critical component of computational thinking and essential to 21st-century biology. Using the CI developed by the iPlant Collaborative, the national biological cyberinfrastructure funded by the U.S. National Science Foundation, this session will include tutorials, demos, and discussions that illustrate how CI allows science and people to scale in three topical areas. Originally serving the plant science community, iPlant CI now supports challenges across the life sciences, and the resulting open-source CI integrates proven foundational platforms and technologies. Lessons learned will apply far beyond iPlant CI.

Scaling Data: The lifecycle of data necessitates interdisciplinary collaborations and team-science approaches spanning multiple departments, institutes, and even continents. We will demonstrate how the Data Store uses iRODS technology to make sharing of large biological datasets routine. Users will learn how to upload data and manage metadata, and how the Data Commons can power collaborations. We will demonstrate image analysis as an exemplar use case for metadata.
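iRODS, which underlies the Data Store, attaches metadata to files as attribute-value-unit (AVU) triples, and that is what allows data such as images to be found by what they describe rather than by file name. The sketch below is a toy in-memory model of that idea in Python, not the iRODS or Data Store API; the `DataObject` class, the `find_by_metadata` helper, and the `/iplant/home/demo_user/...` paths are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AVU:
    """One iRODS-style metadata triple: attribute, value, optional unit."""
    attribute: str
    value: str
    unit: str = ""


@dataclass
class DataObject:
    """Hypothetical stand-in for a file in the Data Store (not the iRODS API)."""
    path: str
    metadata: set[AVU] = field(default_factory=set)

    def add_meta(self, attribute: str, value: str, unit: str = "") -> None:
        # Frozen AVUs are hashable, so duplicate triples collapse in the set.
        self.metadata.add(AVU(attribute, value, unit))


def find_by_metadata(objects: list[DataObject], attribute: str, value: str) -> list[str]:
    """Return the sorted paths of objects carrying a matching attribute/value pair."""
    return sorted(
        obj.path
        for obj in objects
        if any(m.attribute == attribute and m.value == value for m in obj.metadata)
    )


# Usage: tag two hypothetical leaf images, then query by species.
images = [
    DataObject("/iplant/home/demo_user/leaf_001.tif"),
    DataObject("/iplant/home/demo_user/leaf_002.tif"),
]
images[0].add_meta("species", "Arabidopsis thaliana")
images[0].add_meta("leaf_area", "12.4", "cm^2")
images[1].add_meta("species", "Zea mays")

print(find_by_metadata(images, "species", "Arabidopsis thaliana"))
```

Keeping the unit as a separate field mirrors the AVU convention, so a value like `12.4` stays unambiguous when a collaborator queries the same collection.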

Scaling Compute: Web-accessible tools and application interfaces for data analysis and management leverage federated data and consume resources from multiple providers, such as the NSF-funded XSEDE (eXtreme Science and Engineering Discovery Environment), campus clusters, and commercial clouds. Communities can access an array of tools and services and, if required, extend the CI to accommodate specific needs. We will cover three methods of scaling compute through discussion and hands-on demos: 1) Discovery Environment: a web-based interface to bioinformatics applications and HPC; we will cover an RNA-Seq tutorial as a popular workflow. 2) Atmosphere Cloud Compute: we will give an overview of the Atmosphere cloud, demo data visualization applications, and cover developer resources, including virtual machines. 3) Science APIs: web-based Application Programming Interfaces (APIs) that support automation and integration of tools and services in other applications and third-party platforms; tutorials will cover authentication and job management and introduce the developer toolkit.
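The Science APIs tutorial covers authentication and job management. A minimal Python sketch of the two requests involved might look like the code below, assuming a standard OAuth2 resource-owner password grant (RFC 6749, section 4.3) followed by a bearer-token JSON job submission. The token URL, the `PRODUCTION` scope, the job field names (`name`, `appId`, `inputs`, `parameters`), and the app ID are assumptions for illustration only; the functions build the requests but do not send them.

```python
import base64
import json
from urllib.parse import urlencode

# Hypothetical endpoint; consult the Agave documentation for the real tenant URL.
TOKEN_URL = "https://agave.example.org/token"


def build_token_request(client_key, client_secret, username, password, scope="PRODUCTION"):
    """Build an OAuth2 password-grant request; the default scope is an assumption."""
    # Client credentials travel in an HTTP Basic Authorization header.
    credentials = f"{client_key}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # User credentials go in the form-encoded body.
    body = urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
        "scope": scope,
    })
    return TOKEN_URL, headers, body


def build_job_request(access_token, app_id, inputs, parameters, name="rnaseq-demo"):
    """Build a hypothetical job-submission request; all field names are assumptions."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    job = {"name": name, "appId": app_id, "inputs": inputs, "parameters": parameters}
    return headers, json.dumps(job, sort_keys=True)


# Usage with placeholder credentials and a hypothetical app ID.
url, token_headers, token_body = build_token_request("my_key", "my_secret", "demo_user", "s3cret")
print(token_body)

job_headers, job_payload = build_job_request(
    "abc123",
    "hypothetical-rnaseq-app-1.0",
    {"reads": "/iplant/home/demo_user/sample.fastq"},
    {"threads": 4},
)
print(job_payload)
```

Separating request construction from sending keeps the sketch testable offline; a real client would pass these headers and bodies to an HTTP library and read the access token from the token response.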

Scaling People: People are by definition a component of cyberinfrastructure; learning must engage users at all levels, from beginner to expert. iPlant CI’s orientation and learning materials are available as asynchronous online tutorials, onsite workshops, webcasts, and support forums. A discussion with panelists from several bioinformatics-related projects will focus on educational applications of CI, best practices in training, how to develop self-sustaining training efforts (through training trainers), and how collaborating with efforts such as the Software Carpentry and Data Carpentry organizations accelerates user capabilities.