2015 07 11 ISMB/ECCB Applied Knowledge Exchange Session on Cyberinfrastructure

iPlant Trainers	John Fonner, Jason Williams
Guest Panelists on Bioinformatics Training	Vicky Schneider - The Genome Analysis Centre, Norwich, UK; David Clements - Johns Hopkins University, Baltimore, MD, USA
Date	Saturday July 11th, 2015 - 1:30 PM
Location	Room Change: Liffey Room 1

Workshop Checklist

1. iPlant Account

Get your free account at http://user.iplantcollaborative.org.

2. Atmosphere Access

Once you have your account, go to https://user.iplantcollaborative.org/dashboard/ and under Available Services, request access to Atmosphere. Indicate that you are attending a workshop when you are asked for justification.

3. Laptop

Please bring your own WiFi-enabled laptop to the workshop. Make sure your laptop has the following:

VNC Viewer: Download the DMG (for Mac) or EXE (for PC): http://www.realvnc.com/download/viewer/
Java: Please have Java installed and enabled (help)
Browser: Please have an up-to-date web browser (Firefox or Safari recommended)
Windows Users: Please install PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html)

4. Pre-Workshop Guide and Homework

Download and complete the exercises in this guide to get the most out of the workshop: PRE-WORKSHOP PACKET (Updated)

Office Hour

We will be online live for any special assistance you need prior to the workshop (e.g. if you run into trouble setting up your computer or uploading data). The live help session for this workshop will be:

Wednesday, July 8th, from 12:00-12:30 PM. The session ends early if no one joins or requests help by 12:15 Dublin time.

URL: http://cshl.adobeconnect.com/iplant_livehelp

Instructional Materials

You can download a copy of the workshop packet to follow along:

WORKSHOP PACKET

Etherpad

https://etherpad.mozilla.org/akes-ci-workshop

Draft Workshop Program (Updates in Progress)

Time	Description	Slides	Links	Presenter
Pre-Workshop	Arrive / Sign-in / Verify iPlant Accounts
01:30 PM	Self-intros and overview of Biological Cyberinfrastructure	slides		Jason
01:40 PM	Scaling Data: Overview of challenges	slides		John
01:50 PM	Scaling Data: Tutorial – Data Sharing Basics		tutorial/demo	Jason
02:05 PM	Scaling Compute: Overview of the challenges	slides		Jason
02:10 PM	Scaling Compute: Demo – RNA-Seq		tutorial/demo	Jason
03:30 PM	Coffee/Tea Break
03:55 PM	Scaling Compute: Cloud Computing with Atmosphere	slides		John
04:05 PM	Scaling Compute: Tutorial – Visualizing genome annotations with Atmosphere		tutorial/demo	John
04:30 PM	Scaling Compute: Agave API (demo)	slides		John
04:45 PM	Scaling People: How training networks and collaborations help users scale capabilities	slides		All
05:30 PM	End

Session Abstract:

Cyberinfrastructure (CI) is a powerful enabler for data-intensive biology. Although much investigation originates in organism-centered communities (plant, animal, microbes…), there are unifying similarities across types of datasets (sequencing, imaging, geospatial…), algorithms (assembly, alignment, association…), and personal objectives (student, faculty, industry…). Despite these commonalities, communities often split across domains as independently developed tools, unshareable datasets, and uncommunicated experience result in isolation and needless redundancy. A common CI lets users analyze and share data and experience efficiently: communities can leverage pre-built CI solutions and develop application-specific components toward a customized endpoint.

Navigating and connecting to CI is a critical component of computational thinking and essential to 21st century biology. Utilizing the CI developed by the iPlant Collaborative, the national biological cyberinfrastructure funded by the U.S. National Science Foundation, this session will include tutorials, demos, and discussions that illustrate how CI allows science and people to scale along three topical areas. Originally serving the plant science community, iPlant CI supports challenges in all life sciences, and the resulting open source CI integrates proven foundational platforms and technologies. Lessons learned will apply far beyond iPlant CI.

Scaling Data: The lifecycle of data necessitates interdisciplinary collaborations and team science approaches spanning multiple departments, institutes, and even continents. We will demonstrate how the Data Store utilizes iRODS technology to make sharing of large biological datasets routine. Users will learn how to upload data and manage metadata, and how the Data Commons can power collaborations. We will demonstrate image analysis as an exemplar use case for metadata.

Scaling Compute: Web-accessible tools and application interfaces for data analysis and management leverage federated data and consumption of resources from multiple providers, such as the NSF-funded XSEDE (eXtreme Science and Engineering Discovery Environment), campus clusters, and commercial clouds. Communities can access an array of tools and services and, if required, extend the CI to accommodate specific needs. We will cover three methods of scaling compute through discussion and hands-on demos: 1) Discovery Environment - Web-based interface to bioinformatics applications and HPC; we will cover an RNA-Seq tutorial as a popular workflow. 2) Atmosphere Cloud Compute - We will give an overview of the Atmosphere cloud, demo data visualization applications, and cover developer resources including virtual machines. 3) Science APIs - Web-based Application Programming Interfaces (APIs) to support automation and integration of tools and services in other applications and third-party platforms; tutorials will cover authentication and job management as well as introduce the developer toolkit.

Scaling People: People are by definition a component of cyberinfrastructure; learning must engage all levels of users (from beginner to expert). iPlant CI’s orientation and learning materials are available in the form of asynchronous online tutorials, onsite workshops, webcasts, and support forums. A discussion with panelists from several bioinformatics-related projects will focus on educational applications of CI, best practices in training, how to develop self-sustaining training efforts (through training trainers), and how collaborating with efforts such as the Software Carpentry and Data Carpentry organizations accelerates user capabilities.