# Make your life (and analysis) easier with containers
--
## Audience
- Are you a biologist?
- Have you heard of Docker?
- Not sure where to start?
--
### You've come to the right place!
--
## Me
- Software Engineer
- Build software infrastructure for researchers
- Help researchers to use computational tools
- Was a 'container skeptic'
--
## CyVerse
Helps researchers:
1. Learn about, and
2. Productively use
new technologies like containers
---
## Analysis is getting complex
- Multiple software packages (R, Python, etc.)
- With specific versions
- Have to work together
- On different platforms
--
## The pain
- Hard to install one-by-one
- Wasted effort and time
- Fragile, hard-to-reproduce analyses
--
## Help! Make it stop!
How do we make it easy to install & use things consistently?
--
## Containers! *
New packages & apps are increasingly available as containers (BioContainers, etc.)
Note:
- "BioContainers is an open source and community-driven framework which provides system-agnostic executable environments for bioinformatics software. BioContainers framework allows software to be installed and executed under an isolated and controllable environment."
- There will be a webinar specifically on BioContainers in the near future
---
## Concepts & Terms
Note:
- These are broad strokes
--
## Image
A self-contained, read-only 'snapshot' of your applications and packages, with all their dependencies
--
## Dockerfile (or Singularity recipe)
Executable instructions (script) for:
- Creating an image
- Specifying the 'entry point' for the container
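
A minimal sketch of what such a file can look like, written out from the shell. The base image (`ubuntu:22.04`), the `ncbi-blast+` package, the `blast-image` directory, and the `my-blast:local` tag are assumptions chosen for illustration; they are not taken from the demos in this deck.

```bash
# Hypothetical project directory for this sketch
mkdir -p blast-image

# Write a small Dockerfile; the quoted 'EOF' keeps the text literal
cat > blast-image/Dockerfile <<'EOF'
# Start from a base image (assumption: Ubuntu 22.04)
FROM ubuntu:22.04
# Install the software and its dependencies into the image
RUN apt-get update && apt-get install -y ncbi-blast+ && rm -rf /var/lib/apt/lists/*
# Directory where data is expected to be mounted
WORKDIR /data
# The 'entry point': the program that runs when a container starts
ENTRYPOINT ["blastp"]
EOF

# Building the image requires Docker:
#   docker build --tag my-blast:local blast-image
```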
--
## Container
- A 'running image'
Note:
- The entry point is executed
- From Matt Rich's Singularity tutorial: The running container will have exactly the environment defined in the image.
--
## Docker
- A server (sometimes called a daemon): A program that runs in the background and handles the life cycle of images and containers
- A command-line client: You use it to tell the server what to do
Download from:
Note:
The server and client are separate so that the server program can run on a different machine from the client
--
## Singularity
A way to run containers on HPC
Find out more:
Note:
- For security reasons, HPC administrators usually don't allow Docker
- It is easy to create Singularity images from Docker images
- With Singularity there is no separate server and client
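
A hedged sketch of creating a Singularity image from a Docker image, assuming Singularity 3.x is installed. It reuses the BLAST image from the demo; the output file name `blast.sif` is an illustrative choice. If `singularity` is not on the `PATH`, the script only prints a message.

```bash
# Docker image from the BLAST demo, and a hypothetical output file name
docker_image="biocontainers/blast:v2.2.31_cv2"
sif_file="blast.sif"

if command -v singularity >/dev/null 2>&1; then
  # Convert the Docker image into a Singularity image file
  singularity pull "$sif_file" "docker://${docker_image}"
  # Run a command inside it; no background server is involved
  singularity exec "$sif_file" blastp -help
else
  echo "singularity is not installed; nothing to do" >&2
fi
```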
--
## What about my data?
Do not put your data in the image!
- Local data: 'Mount' it into a container when you start it
- Remote data: Pull into the container once it's running (e.g. CyVerse Data Store, S3, etc.)
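
As a sketch of the local-data case: the `-v` option of `docker run` takes a `host_dir:container_dir` pair. The helper name `bind_mount_arg` and the `/data` default are assumptions made for this example.

```bash
# Hypothetical helper: build the host:container pair for `docker run -v`
bind_mount_arg() {
  local host_dir container_dir
  # Resolve the host directory to an absolute path
  host_dir="$(cd "$1" && pwd)" || return 1
  # Default mount point inside the container (assumption: /data)
  container_dir="${2:-/data}"
  printf '%s:%s\n' "$host_dir" "$container_dir"
}

# Show the mapping for the current directory
bind_mount_arg .

# With Docker installed, the files then appear inside the container:
#   docker run -v "$(bind_mount_arg .)" biocontainers/blast:v2.2.31_cv2 ls /data
```

The data stays on the host; the container only sees it through the mount, so the image itself remains small and reusable.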
Note:
- "Bind mounts" make the host's filesystem accessible inside the container.
--
## Compute resources
I need more!
Talk to us. There are a few options, and it depends on what you need.
--
## Sharing containers
Image registries
Note:
- Singularity Hub and Docker Hub
---
## Using Containers
Note:
- These are pre-recorded
--
## Demo: Command line app
```bash
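# Work in a dedicated directory
mkdir -p ~/blast
cd ~/blast
# Download the BLAST image and check that it runs
docker pull biocontainers/blast:v2.2.31_cv2
docker run biocontainers/blast:v2.2.31_cv2 blastp -help
# Fetch the query sequence and the zebrafish protein set
wget http://www.uniprot.org/uniprot/P04156.fasta
curl -O ftp://ftp.ncbi.nih.gov/\
refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz
gunzip zebrafish.1.protein.faa.gz
# Build a BLAST database; the current directory is mounted at /data
docker run -v "$PWD":/data/ biocontainers/blast:v2.2.31_cv2 \
makeblastdb -in zebrafish.1.protein.faa -dbtype prot
# Search the database with the query and write the results to the host
docker run -v "$PWD":/data/ biocontainers/blast:v2.2.31_cv2 \
blastp -query P04156.fasta -db zebrafish.1.protein.faa \
-out results.txt
cat results.txt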
```
Note:
- See the BioContainers example: https://biocontainers.pro/docs/101/running-example/
- The commands above assume that you have Docker installed
--
## Demo: Web app (Jupyter)
```bash
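cd ~
# Get the example repository, which contains a Dockerfile
git clone https://github.com/plyte/blastn-jupyter-docker.git
cd blastn-jupyter-docker/
# Build an image from the Dockerfile in the current directory
docker build --tag blastn-jupyter-docker:local .
# Publish container port 8888 on port 8888 of your machine
docker run -p 8888:8888 blastn-jupyter-docker:local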
```
Note:
- See example: https://github.com/plyte/blastn-jupyter-docker
- Explain local IP & port
---
## CyVerse support for containers
1. Command line (Atmosphere)
2. Interactive apps (VICE)
3. HPC (XSEDE & OSG)
Note:
- On Atmosphere, run `ezd` or `ezs`
- The first installs Docker; the second installs Singularity
---
## Summary
- Package your analysis pipeline in a single container
- Everyone in your lab can have a consistent environment
--
## Next time
- How to build containers
- Running on different platforms
- Science applications
--
## Links & references
- [Docker](https://www.docker.com/)
- [Singularity](https://www.sylabs.io/singularity/)
- [Play with Docker Classroom](https://training.play-with-docker.com/)
- [Katacoda - Learn Docker](https://www.katacoda.com/courses/docker/)
- [CyVerse Container Camp materials](https://cyverse-container-camp-workshop-2018.readthedocs-hosted.com/en/latest/)
- [Reproducible research with containers](http://typingducks.com/blog/reproducible_research_with_containers/)
- [Upendra's Cybercarpentry workshop notes](https://cyverse-cybercarpentry-container-workshop-2018.readthedocs-hosted.com/en/latest/)
- [Tyson Swetnam's Container Camp Presentation](https://gitpitch.com/tyson-swetnam/cc-camp)
- [Matthew Rich's Singularity workshop](https://nuitrcs.github.io/singularity-workshop/)
- [BioContainers](https://biocontainers.pro/)
--
## Thanks!
- Nirav Merchant
- Upendra Devisetty
- Tyson Swetnam
- Blake Joyce
- Eric Lyons
- Ariella Gladstein
- Tina Lee
- Shelley Littin
--
![CyVerse Webinar](cyverse_webinar_end_slide.png "CyVerse Webinar")