2015 07 11 ISMB/ECCB Applied Knowledge Exchange Session on Cyberinfrastructure
iPlant Trainers | |
---|---|
Guest Panelists on Bioinformatics Training | Vicky Schneider - The Genome Analysis Centre, Norwich, UK; David Clements - Johns Hopkins University, Baltimore, MD, USA |
Date | Saturday July 11th, 2015 - 1:30 PM |
Location | Room Change: Liffey Room 1 |
Workshop Checklist
1. iPlant Account
- Get your free account at http://user.iplantcollaborative.org.
2. Atmosphere Access
- Once you have your account, go to https://user.iplantcollaborative.org/dashboard/ and under Available Services, request access to Atmosphere. Indicate that you are attending a workshop when you are asked for justification.
3. Laptop
Please bring your own WiFi-enabled laptop to the workshop. Make sure your laptop has the following:
- VNC Viewer: Download the DMG (for Mac) or .exe (for PC): http://www.realvnc.com/download/viewer/
- Java: Please have Java installed and enabled (help)
- Browser: Please have an up-to-date web browser (Firefox or Safari recommended)
- Windows Users: Please install PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html)
4. Pre-Workshop Guide and Homework
Download and complete the exercises in this guide to get the most out of the workshop: PRE-WORKSHOP PACKET (Updated)
Office Hour
We will be online live for any special assistance you need prior to the workshop (e.g. if you ran into trouble setting up your computer or uploading data). The live help for this workshop will be:
Wednesday, July 8th, from 12:00 to 12:30 PM Dublin time. The session will end early if no one joins or requests help by 12:15.
URL: http://cshl.adobeconnect.com/iplant_livehelp
Instructional Materials
You can download a copy of the workshop packet to follow along:
Etherpad
https://etherpad.mozilla.org/akes-ci-workshop
Draft Workshop Program (Updates in Progress)
Time | Description | Slides | Links | Presenter |
---|---|---|---|---|
Pre-Workshop | Arrive / Sign-in / Verify iPlant Accounts | | | |
01:30 PM | Self-intros and overview of Biological Cyberinfrastructure | slides | | Jason |
01:40 PM | Scaling Data: Overview of challenges | | | John |
01:50 PM | Scaling Data: Tutorial – Data Sharing Basics | | tutorial/demo | Jason |
02:05 PM | Scaling Compute: Overview of the challenges | slides | | Jason |
02:10 PM | Scaling Compute: Demo – RNA-Seq | | tutorial/demo | Jason |
03:30 PM | Coffee/Tea Break | | | |
03:55 PM | Scaling Compute: Cloud Computing with Atmosphere | | | John |
04:05 PM | Scaling Compute: Tutorial – Visualizing genome annotations with Atmosphere | | tutorial/demo | John |
04:30 PM | Scaling Compute: Agave API (demo) | slides | | John |
04:45 PM | Scaling People: How training networks and collaborations help users scale capabilities | slides | | All |
05:30 PM | End | | | |
Session Abstract:
Cyberinfrastructure (CI) is a powerful enabler for data-intensive biology. Although much investigation originates in organism-centered communities (plant, animal, microbes…), there are unifying similarities across types of datasets (sequencing, imaging, geospatial…), algorithms (assembly, alignment, association…), and personal objectives (student, faculty, industry…). Despite these commonalities, communities often split across domains, as independently developed tools, unshareable datasets, and uncommunicated experience result in isolation and needless redundancy. A common CI lets users analyze and share data and experience efficiently: communities can leverage pre-built CI solutions and develop application-specific components to reach a customized endpoint.
Navigating and connecting to CI is a critical component of computational thinking and essential to 21st century biology. Utilizing the CI developed by the iPlant Collaborative, the national biological cyberinfrastructure funded by the U.S. National Science Foundation, this session will include tutorials, demos, and discussions that illustrate how CI allows science and people to scale along three topical areas. Originally serving the plant science community, iPlant CI now supports challenges in all life sciences, and the resulting open source CI integrates proven foundational platforms and technologies. Lessons learned will apply far beyond iPlant CI.
Scaling Data: The lifecycle of data necessitates interdisciplinary collaborations and team science approaches spanning multiple departments, institutes, and even continents. We will demonstrate how the Data Store utilizes iRODS technology to make sharing of large biological datasets routine. Users will learn how to upload data, how to manage metadata, and how the Data Commons can power collaborations. We will demonstrate image analysis as an exemplar use case for metadata.
Scaling Compute: Web-accessible tools and application interfaces for data analysis and management leverage federated data and consumption of resources from multiple providers (such as NSF-funded XSEDE: eXtreme Science and Engineering Discovery Environment), campus clusters, and commercial clouds. Communities can access an array of tools and services and, if required, extend the CI to accommodate specific needs. We will cover three methods of scaling compute through discussion and hands-on demos: 1) Discovery Environment - Web-based interface to bioinformatics applications and HPC; we will cover an RNA-Seq tutorial as a popular workflow. 2) Atmosphere Cloud Compute - We will give an overview of the Atmosphere cloud, demo data visualization applications, and cover developer resources including virtual machines. 3) Science APIs - Web-based Application Programming Interfaces (APIs) to support automation and integration of tools and services in other applications and third-party platforms; tutorials will cover authentication and job management as well as introduce the developer toolkit.
Scaling People: People are by definition a component of cyberinfrastructure; learning must engage all levels of users (from beginner to expert). iPlant CI’s orientation and learning materials are available in the form of asynchronous online tutorials, onsite workshops, webcasts, and support forums. A discussion with panelists from several bioinformatics-related projects will focus on educational applications of CI, best practices in training, how to develop self-sustaining training efforts (through training trainers), and how collaborating with efforts such as the Software Carpentry and Data Carpentry organizations accelerates user capabilities.