The following steps serve as a guide for Dockerizing a tool in the DE (see Figure 2 below).
Sample Dockerfile 1: hisat2
Dockerfile for installing hisat2 in a Docker container based on the ubuntu:14.04.3 image:
FROM ubuntu:14.04.3
MAINTAINER Eric Lyons
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    python
ENV BINPATH /usr/bin
ENV SRCPATH /usr/src
ENV HISAT2GIT https://github.com/infphilo/hisat2.git
ENV HISAT2PATH $SRCPATH/hisat2
RUN mkdir -p $SRCPATH
WORKDIR $SRCPATH
# Clone and checkout the 2.0.3-beta release version of the git repo
RUN git clone "$HISAT2GIT" \
    && cd $HISAT2PATH \
    && git checkout 3f8c81375700d4107fdfd1caeaec01b5719ae4b8
RUN make -C $HISAT2PATH \
    && cp $HISAT2PATH/hisat2 $BINPATH \
    && cp $HISAT2PATH/hisat2-* $BINPATH
ENTRYPOINT ["/usr/bin/hisat2"]
Sample Dockerfile 2: NCBI SRA Submission pipeline
Dockerfile for installing NCBI SRA Submission in a Docker container based on the python:2.7-slim base image:
FROM python:2.7-slim
WORKDIR /root
COPY requirements.txt ./
RUN set -x \
    && apt-get update \
    && apt-get install -y gcc libxml2-dev libxslt1-dev lib32z1-dev --no-install-recommends \
    && rm -rf /var/lib/apt/lists/* \
    && pip install -r requirements.txt \
    && apt-get purge -y --auto-remove gcc lib32z1-dev
# Download Aspera Connect client from http://downloads.asperasoft.com/connect2/
ADD http://download.asperasoft.com/download/sw/connect/3.5/aspera-connect-126.96.36.199523-linux-64.sh aspera-connect-install.sh
# Install Aspera Connect client to ~/.aspera
RUN chmod 755 aspera-connect-install.sh
RUN ./aspera-connect-install.sh \
    && rm aspera-connect-install.sh
COPY ncbi_sra_submit.py metadata_client.py ncbi_sra_report_download.py ./
VOLUME [ "/root/config", "/root/templates", "/root/schemas" ]
ENTRYPOINT [ "python", "/root/ncbi_sra_submit.py" ]
CMD [ "--help" ]
Why are these good Dockerfiles? Both Dockerfiles adhere to best practices by using official Docker images and installing the tool from a reliable source.
Here is a hypothetical example of what a poor Dockerfile looks like. Let's assume we have a tool named FooBar.
FROM test/Ubuntu
RUN apt-get upgrade
RUN apt-get update
RUN apt-get install -y emboss python
RUN wget https://someuniversity.edu/test_tool/test.py
Why is this a poor Dockerfile? It uses an untested, unofficial Docker image, and it contains a step that fetches binaries from a server at a university.
In addition, the Dockerfile has no fail-fast checks. Suppose the server is later taken offline, despite assurances that it would remain online forever. The image build then fails because the binaries cannot be retrieved, and nothing in the Dockerfile reports a clear error.
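One way to make a download step fail fast is to verify the downloaded file against a known checksum before using it. The sketch below is only an illustration and is not taken from the Dockerfiles above: `verify_sha256` is a hypothetical helper, it assumes GNU `sha256sum` is available, and the expected digest must come from a source you trust.

```shell
# Sketch: stop with a clear message when a downloaded file is missing or altered.
# verify_sha256 is a hypothetical helper name, not part of Docker or the DE.
# Call it from a script that runs with `set -e` so a failed check aborts the script.
verify_sha256() {
    local file="$1"
    local expected="$2"
    local actual

    # A missing file is an immediate failure.
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 1
    fi

    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        return 1
    fi
}
```

In a Dockerfile, a check like this could run in the same RUN step that downloads the file, so that a missing or altered download stops the build with an explicit message.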
docker build -t <your/docker-image> .
docker run --rm <your/docker-image> <entrypoint arguments>
docker run --rm -v ~/my-scratch-dir:/working-dir -w /working-dir your/docker-image user-input-1 user-input-2 ...
The -v option mounts the scratch directory on the host machine at /working-dir inside the container, and the -w option sets the working directory inside the container to that same /working-dir directory.
If the tool's container produces outputs in the scratch directory on the host, the tool is ready for the next step (Request installation of the Dockerized tool in the DE).
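The check described above can be scripted. The following is a minimal sketch, assuming you know the names of the files your tool should write; `check_outputs` is a hypothetical helper, and `expected-output.txt` in the comments is a placeholder, not a file from this guide.

```shell
# Sketch: confirm that a test run left non-empty output files in the scratch directory.
# check_outputs is a hypothetical helper name.
#
# Intended use after a test run such as:
#   docker run --rm -v ~/my-scratch-dir:/working-dir -w /working-dir your/docker-image user-input-1
#   check_outputs ~/my-scratch-dir expected-output.txt
check_outputs() {
    local dir="$1"
    shift
    local name

    for name in "$@"; do
        # -s is true only if the file exists and is not empty.
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty output: $dir/$name" >&2
            return 1
        fi
    done
}
```

The helper returns a non-zero status on the first missing or empty file, so it can gate the decision to request installation.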
Read the main steps in the DE user manual for submitting your request for installation of the new tool (executable) in the DE. Once the tool is installed, you will receive an email notification.
As noted in the previous step, if your tool requires Reference Genome/Sequence/Annotation input arguments, note that in your request.
Once the Dockerized tool is installed, go to the CyVerse wiki to learn how to design a new interface, preview it, and save the new app within the DE.
After creating the new app according to your design, test your app in the DE to make sure it works properly.
Complete the additional optional steps as needed for your tool.
Once the app is working to your satisfaction and you have published it, it is immediately available in your personal workspace in the DE and you can begin using it to run your own analyses. If you want to share it with other users, you can either keep it in your personal workspace and share it with selected users (including defining their permissions in the app), or share it with the public. For more information, see Sharing your App or Workflow and Editing the User Manual.
If you have not yet shared the app with the public (that is, it is still listed in your Apps under development folder in your personal workspace), you can still edit the file and create a new Dockerfile. Then email CyVerse Support to replace the Dockerfile.
Once you have shared an app with the public, it cannot be deleted because of CyVerse's commitment to supporting reproducible science. Because public apps cannot be edited once they have been made public, if you need to change the app you must create a new version of the app and then create a new Dockerfile. Learn more about editing apps.
When you share your app with the public, you will indicate the category or categories into which you think it should be placed. To request that your app be moved or added to a different or additional category, email CyVerse Support with the app name, current category or categories, and desired target category or categories.
Before you Dockerize a tool, it is important that you understand program dependencies (check the program documentation/manual thoroughly).
The Kallisto Docker image was built on a 64-bit Ubuntu virtual machine using VirtualBox.
1. Install Docker:
wget -qO- https://get.docker.com/ | sudo sh
2. Create a Dockerfile:
FROM ubuntu:14.04.3
MAINTAINER Kapeel Chougule
LABEL Description="This image is used for running Kallisto RNA seq quantification tool"
# Install dependencies
RUN apt-get update && apt-get install -y build-essential cmake zlib1g-dev libhdf5-dev
# Install git and clone the kallisto tool
RUN apt-get install --yes git
RUN git clone https://github.com/pachterlab/kallisto.git \
    && cd kallisto \
    && git checkout 5c5ee8a45d6afce65adf4ab18048b40d527fcf5c \
    && mkdir build \
    && cd build \
    && cmake .. \
    && make \
    && make install
ENTRYPOINT ["kallisto"]
3. Build a Docker image:
docker build -t ubuntu/kallisto .
4. Test the built Kallisto image:
docker run --rm -v /Users/kchougul/Downloads:/kallisto_data -w /kallisto_data ubuntu/kallisto index -i transcripts.idx transcripts.fasta.gz
5. Tag the built Kallisto image:
docker tag ubuntu/kallisto:latest kapeel/kallisto:latest
6. Push the Kallisto image to Dockerhub (optional):
docker login
docker push kapeel/kallisto:latest
The ParaAT Docker image was built on Mac OS X using Docker Toolbox (Quickstart Terminal).
1. Install Docker Toolbox for Mac OS X (see step 1 in Example 1).
2. Create a paraAT git repo on GitHub.
3. Create a paraAT git repo locally on your computer:
git init paraAT
4. Clone the paraAT git repo to local:
git clone https://github.com/jdebarry/paraat.git
5. Download the paraAT files, and then add and commit them in the local paraAT repo folder:
git add . && git commit -m "adding paraat files"
6. Push the local paraAT repo to the remote repo:
git push -u origin master
7. Create a Dockerfile:
FROM ubuntu:14.04
MAINTAINER Jeremy DeBarry email@example.com
# Get ParaAT.pl and Epal2nal.pl, place them in /usr/bin so they are on $PATH, then make them executable
ADD https://raw.githubusercontent.com/jdebarry/paraat/master/ParaAT2.0/ParaAT.pl /usr/bin/
ADD https://raw.githubusercontent.com/jdebarry/paraat/master/ParaAT2.0/Epal2nal.pl /usr/bin/
RUN [ "chmod", "+x", "/usr/bin/ParaAT.pl" ]
RUN [ "chmod", "+x", "/usr/bin/Epal2nal.pl" ]
# Install aligners, then rename the clustalw executable so ParaAT.pl will use it (a band-aid, but it works)
RUN echo "deb http://archive.ubuntu.com/ubuntu trusty multiverse" >> /etc/apt/sources.list
RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y clustalw
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y mafft
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y muscle
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y t-coffee
RUN [ "mv", "/usr/bin/clustalw", "/usr/bin/clustalw2" ]
ENTRYPOINT ["ParaAT.pl"]
8. Build the paraAT image:
docker build -t paraat .
9. Test the paraAT image:
docker run --rm -v=/Users/jdebarry/Dropbox/Docker/paraat/Development/input/:/data paraat -h /data/test.homologs -n /data/test.cds -a /data/test.pep -p proc -o
Start with Google. Also check out the websites Stack Overflow or Biostars, as well as the Docker cheat sheet for help troubleshooting the build and Dockerfile-related issues. For local testing of Dockerfiles, please refer to step 3.
The base image depends on the tool or script. You can peruse available images by searching Docker Hub for the domain (e.g., “biology”, “science”) and reading the documentation for specific images.
Most frequently used CyVerse base images:
Other CyVerse base images:
Licensing is an important factor to consider in Docker image creation. Many programs want to count downloads or have other restrictions, and creating a Dockerfile means you are essentially automating access to the software. Due diligence is required to ensure that the onus is on the user and not on CyVerse. An example license follows:
Copyright 2015. The Regents of the University of California (Regents). All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for educational and research not-for-profit purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following two paragraphs appear in all copies, modifications, and distributions. Contact The Office of Technology Licensing, UC Berkeley, 2150 Shattuck Avenue, Suite 510, Berkeley, CA 94720-1620, (510) 643-7201, for commercial licensing opportunities. Created by Nicolas Bray, Harold Pimentel, Pall Melsted and Lior Pachter, University of California, Berkeley IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
Dockerfile creators should store their Dockerfiles and dependencies (wrapper scripts, etc.) in a reliable source such as GitHub or Bitbucket. CyVerse has its own GitHub repository where sources and Dockerfiles are saved for tools installed in the DE.
The DE does not specifically cap the disk size for each Docker container running in Condor nodes, so we are using the default Docker cap of 10 GB per image. This limit applies only to the tool's Docker images and data containers, not to any inputs or outputs used when running the analysis (which are mounted from the node's working directory).
Although a tool's Docker image may contain up to 10 GB of data if required, users should still strive to keep their Docker images small. This will ensure that jobs run faster for DE users when the tool's image needs to be pulled from our registry to a Condor node before running an analysis.
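Before requesting installation, you can compare a local image's size with the cap. This is a minimal sketch under stated assumptions: `within_cap` and `image_size_bytes` are hypothetical helper names, and the 10 GB cap is interpreted here as 10 GiB (10 × 1024³ bytes), which may not match how the DE measures it. `docker image inspect` reports the image size in bytes.

```shell
# Sketch: compare a local image's size with the 10 GB cap described above.
# Assumption: the cap is treated as 10 GiB; adjust CAP_BYTES if the DE counts differently.
CAP_BYTES=$((10 * 1024 * 1024 * 1024))

# Succeeds if the given size in bytes does not exceed the cap.
within_cap() {
    local size_bytes="$1"
    [ "$size_bytes" -le "$CAP_BYTES" ]
}

# Prints the size of a local image in bytes (requires a running Docker daemon).
image_size_bytes() {
    docker image inspect --format '{{.Size}}' "$1"
}
```

For example, `within_cap "$(image_size_bytes your/docker-image)"` returns a non-zero status when the image is over the cap. Staying well below it also keeps pulls to the Condor nodes fast.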
The DE has a number of reference genomes uploaded and available for use. View the list here. If the genome you want to use is not listed, contact CyVerse Support to ask that it be added.
No, due to security issues, currently Docker is not available to apps that require HPC resources, i.e., apps integrated using the Agave API.
Future enhancements include the ability to: