# Make your life (and analysis) easier with containers

--

## Audience

- Are you a biologist?
- Have you heard of Docker?
- Not sure where to start?

--

### You've come to the right place!

--

## Me

- Software Engineer
- Build software infrastructure for researchers
- Help researchers to use computational tools
- Was a 'container skeptic'

--

## CyVerse

Helps researchers:

1. Learn about, and
2. Productively use

New tech like containers

---

## Analysis is getting complex

- Multiple software packages (R, Python, etc.)
- With specific versions
- Have to work together
- On different platforms

--

## The pain

- Hard to install one-by-one
- Wasted effort and time
- Fragile, hard-to-reproduce analyses

--

## Help! Make it stop!

How do we make it easy to install & use things consistently?

--

## Containers!

* New packages & apps are increasingly available as containers (BioContainers, etc.)

Note:
- "BioContainers is an open source and community-driven framework which provides system-agnostic executable environments for bioinformatics software. BioContainers framework allows software to be installed and executed under an isolated and controllable environment."
- There will be a webinar specifically on BioContainers in the near future

---

## Concepts & Terms

Note:
- These are broad strokes

--

## Image

A self-contained, read-only 'snapshot' of your applications and packages, with all their dependencies

--

## Dockerfile (or Singularity recipe)

Executable instructions (script) for:

- Creating an image
- Specifying the 'entry point' for the container

--

## Container

- A 'running image'

Note:
- The entry point is executed
- From Matt Rich's Singularity tutorial: The running container will have exactly the environment defined in the image.
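
--

## Sketch: from Dockerfile to container

The three terms above can be tied together in a few commands. This is a minimal sketch, not a recommended build: the base image `debian:stable-slim`, the Debian package `ncbi-blast+`, and the tag `my-blast:local` are assumptions chosen for illustration. If Docker is not installed, the script only writes the Dockerfile.

```bash
# Scratch directory, so only the Dockerfile is sent to the build.
build_dir=$(mktemp -d)

# Dockerfile: the instructions for creating the image.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM debian:stable-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends ncbi-blast+ \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /data
ENTRYPOINT ["blastp"]
EOF

if command -v docker >/dev/null 2>&1; then
  # Image: built once from the Dockerfile (tag name is hypothetical).
  docker build --tag my-blast:local "$build_dir" &&
  # Container: a running image; "-version" is passed to the entry point.
  docker run --rm my-blast:local -version
else
  echo "Docker not found; Dockerfile written to $build_dir/Dockerfile"
fi
```

Note:
- Because the entry point is `blastp`, anything after the image name in `docker run` becomes an argument to `blastp`.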

--

## Docker

- A server (sometimes called a daemon): A program that runs in the background and handles the life cycle of images and containers
- A command-line client: You use it to tell the server what to do

Download from:

Note: The server and client are separate programs so that the server can run on a different machine from the client

--

## Singularity

A way to run containers on HPC

Find out more:

Note:
- For security reasons, HPC administrators usually don't allow Docker
- It is easy to create Singularity images from Docker images
- With Singularity there is no separate server and client

--

## What about my data?

Do not put your data in the image!

- Local data: 'Mount' it into a container when you start it
- Remote data: Pull into the container once it's running (e.g. CyVerse Data Store, S3, etc.)

Note:
- "Bind mounts" make a directory on the host's filesystem accessible inside the container.

--

## Compute resources

I need more!

Talk to us. There are a few options, and it depends on what you need.
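
--

## Sketch: mounting local data

A bind mount, as described for local data, can be sketched as follows. The temporary directory and the file `input.fasta` are stand-ins for real data, and `/data` is assumed to exist as a directory inside the image. The same `HOST_PATH:CONTAINER_PATH` pair is passed to `-v` for Docker or `--bind` for Singularity, whichever is installed.

```bash
# Stand-in for a directory of local data on the host (hypothetical content).
data_dir=$(mktemp -d)
chmod 755 "$data_dir"   # let a non-root user inside the container read it
printf '>example\nMANLGCWMLV\n' > "$data_dir/input.fasta"

image="biocontainers/blast:v2.2.31_cv2"
mount_spec="$data_dir:/data"   # HOST_PATH:CONTAINER_PATH

if command -v docker >/dev/null 2>&1; then
  # Docker: -v bind-mounts the host directory at /data in the container.
  docker run --rm -v "$mount_spec" "$image" ls /data
elif command -v singularity >/dev/null 2>&1; then
  # Singularity: run the same Docker image, using --bind for the mount.
  singularity exec --bind "$mount_spec" "docker://$image" ls /data
else
  echo "No container runtime found; would mount $mount_spec"
fi
```

Note:
- The data stays on the host; the image itself is unchanged, so it can be shared without the data.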

--

## Sharing containers

Image registries

Note:
- Singularity Hub and Docker Hub

---

## Using Containers

Note:
- The demos are pre-recorded

--

## Demo: Command line app

```bash
# Work in a dedicated directory
mkdir -p ~/blast
cd ~/blast

# Get the BLAST image and check that it runs
docker pull biocontainers/blast:v2.2.31_cv2
docker run biocontainers/blast:v2.2.31_cv2 blastp -help

# Download the query protein and the zebrafish protein set
wget http://www.uniprot.org/uniprot/P04156.fasta
curl -O ftp://ftp.ncbi.nih.gov/\
refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz
gunzip zebrafish.1.protein.faa.gz

# Mount the current directory at /data, then build the database and search it
docker run -v "$PWD":/data/ biocontainers/blast:v2.2.31_cv2 \
    makeblastdb -in zebrafish.1.protein.faa -dbtype prot
docker run -v "$PWD":/data/ biocontainers/blast:v2.2.31_cv2 \
    blastp -query P04156.fasta -db zebrafish.1.protein.faa \
    -out results.txt
cat results.txt
```

Note:
- See the BioContainers example: https://biocontainers.pro/docs/101/running-example/
- The above commands assume that you have Docker installed

--

## Demo: Web app (Jupyter)

```bash
cd ~
git clone https://github.com/plyte/blastn-jupyter-docker.git
cd blastn-jupyter-docker/
# Build the image from the Dockerfile in this directory
docker build --tag blastn-jupyter-docker:local .
# Publish container port 8888 on host port 8888
docker run -p 8888:8888 blastn-jupyter-docker:local
```

Note:
- See example: https://github.com/plyte/blastn-jupyter-docker
- Explain local IP & port

---

## CyVerse support for containers

1. Command line (Atmosphere)
2. Interactive apps (VICE)
3. HPC (XSEDE & OSG)

Note:
- On Atmosphere run: `ezd` or `ezs`
- The first installs Docker, the second installs Singularity

---

## Summary

- Package your analysis pipeline in a single container
- Everyone in your lab can have a consistent environment

--

## Next time

- How to build containers
- Running on different platforms
- Science applications

--

## Links & references

- [Docker](https://www.docker.com/)
- [Singularity](https://www.sylabs.io/singularity/)
- [Play with Docker Classroom](https://training.play-with-docker.com/)
- [Katacoda - Learn Docker](https://www.katacoda.com/courses/docker/)
- [CyVerse Container Camp materials](https://cyverse-container-camp-workshop-2018.readthedocs-hosted.com/en/latest/)
- [Reproducible research with containers](http://typingducks.com/blog/reproducible_research_with_containers/)
- [Upendra's Cybercarpentry workshop notes](https://cyverse-cybercarpentry-container-workshop-2018.readthedocs-hosted.com/en/latest/)
- [Tyson Swetnam's Container Camp Presentation](https://gitpitch.com/tyson-swetnam/cc-camp)
- [Matthew Rich's Singularity workshop](https://nuitrcs.github.io/singularity-workshop/)
- [BioContainers](https://biocontainers.pro/)

--

## Thanks!

- Nirav Merchant
- Upendra Devisetty
- Tyson Swetnam
- Blake Joyce
- Eric Lyons
- Ariella Gladstein
- Tina Lee
- Shelley Littin

--

![CyVerse Webinar](cyverse_webinar_end_slide.png "CyVerse Webinar")