Team Members:

Sateesh Peri:

Tanner Campbell:

Naomi Yescas:

Mohammad Moghaddam:

Sahil Brahmankar:



    blastEasy is a framework of virtual machine images intended to improve genomics training and research for students and professors in the Genomics Education Alliance. blastEasy provides an easy and scalable way to run protein and nucleotide searches by modifying sequenceserver, an existing tool that launches a server through which users can run the Basic Local Alignment Search Tool (BLAST). SequenceServer provides a user-friendly interface that lets those with limited command-line or computer science experience conduct protein and nucleotide database searches using BLAST. On its own, however, SequenceServer is limited in the number of users it can support without significant loss in search speed. blastEasy addresses this by intercepting each BLAST search and using the Cooperative Computing Lab’s Work Queue to distribute the workload across multiple machines. This allows for scalability and improved search times when a classroom of students conducts searches at the same time.

Description and Technical Objectives

    blastEasy harnesses the power and flexibility of Cyverse’s Atmosphere cloud computing platform by packaging sequenceserver, cctools, and BLAST databases into virtual machine images. Using it requires a Cyverse account and some familiarity with launching virtual machines (VMs) on Atmosphere. The instructor creates a Master-VM using the TeamBLASTEasy SeqServer 1.0.12 image and one or more Worker-VMs using the CCTools_7.0.19 image, then launches sequenceserver on the Master-VM and connects as many Worker-VMs as needed. Students conduct BLAST searches using sequenceserver as they normally would, and the blastEasy framework handles distributing searches across all available workers.
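The description above does not say how a search is divided before it is handed to the workers. As a rough illustration only, the sketch below shows one common approach, splitting a multi-sequence FASTA query into chunks round-robin so each worker receives a similar number of records. The function name `split_fasta` and the round-robin strategy are assumptions for illustration, not taken from the blastEasy source.

```python
from typing import List


def split_fasta(fasta_text: str, n_chunks: int) -> List[str]:
    """Split multi-FASTA text into at most n_chunks pieces, round-robin by record.

    Hypothetical helper: blastEasy's real splitting logic may differ.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")

    # Collect records; each record starts at a line beginning with '>'.
    records: List[List[str]] = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif line.strip() and records:
            records[-1].append(line.strip())

    # Never create more chunks than there are records.
    chunks: List[List[str]] = [[] for _ in range(min(n_chunks, len(records)))]

    # Deal records out round-robin so chunk sizes differ by at most one record.
    for i, record in enumerate(records):
        chunks[i % len(chunks)].extend(record)

    return ["\n".join(chunk) + "\n" for chunk in chunks]
```

Each returned chunk is itself valid FASTA text, so it could be written to a file and searched independently, with the per-chunk results combined afterwards.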

blastEasy is Scalable:

blastEasy’s framework is not limited to Atmosphere VMs: any machine with access to the blastEasy containers can supply computational power.

blastEasy is Customizable:

blastEasy is Easy:

GOAL: Create a GEA BLAST service to support genomics training and research for undergraduate students

+ Develop a system that divides and distributes BLAST searches across multiple nodes and processors to obtain results faster. [critical]

+ Host BLAST implementations that support multiple classes of at least 100 students. [critical]

+ Ensure that the ~100 jobs finish at approximately the same time. [critical]

+ Support faculty in creating custom BLAST databases and adjusting BLAST search parameters. [nice-to-have]

+ Support caching of BLAST results. [nice-to-have]

+ Provide authentication for security. [nice-to-have]

+ Provide a video tutorial demo of the end product. [nice-to-have]



blasteasy source code:


Atmosphere VMs:

    Master Image: TeamBLASTEasy SeqServer 1.0.12

    Worker Image: CCTools_7.0.19

blastEasy Setup Instructions

Instructions to Instructor:

Note: Setup takes around half an hour and should be done before class.

Blast Databases


  1. Launch a Master (small) instance, which will broadcast as the Master, using the Master image (TeamBLASTEasy SeqServer 1.0.12).

  2. Launch a Worker (medium to large) instance using the Worker image (CCTools_7.0.19).

  3. On the Master VM, launch sequenceserver as follows: sequenceserver -d /path_to_databases

Note: Take note of the Master VM's IP_ADDRESS and the port on which sequenceserver is listening; both are needed in the next steps.

  4. You or your students can now open a web browser and go to IP_ADDRESS_of_Master_VM:PORT to access the sequenceserver front end.

  5. Before submitting BLAST jobs, connect a work_queue_factory to the Master VM: work_queue_factory IP PORT -T local -w Min_NUM_OF_Workers

Note: The PORT used to connect work_queue_factory is the sequenceserver port number plus one (Sequence_Server_PORT_NUM + 1).

Note: Any number of work_queue_factory instances can be connected in this way, but make sure the BLAST databases are at the same path on every worker as on the Master.

  6. Once a worker factory is connected, BLAST queries can be submitted and results viewed through the front end. The time taken by each BLAST query is printed on the Master VM's backend terminal for benchmarking.
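Because the factory port is derived from the sequenceserver port, it is easy to connect to the wrong one. The sketch below assembles the two command lines from the instructions above so the "port plus one" rule is applied in a single place. The helper names are hypothetical, and only the flags already shown in the instructions (-d, -T local, -w) are used.

```python
import shlex
from typing import List


def sequenceserver_command(database_dir: str) -> List[str]:
    """Command to start sequenceserver on the Master VM."""
    return ["sequenceserver", "-d", database_dir]


def work_queue_port(sequenceserver_port: int) -> int:
    """Port for work_queue_factory: the sequenceserver port plus one."""
    return sequenceserver_port + 1


def factory_command(master_ip: str, sequenceserver_port: int,
                    min_workers: int) -> List[str]:
    """Command to connect a work_queue_factory to the Master VM."""
    if min_workers < 1:
        raise ValueError("min_workers must be at least 1")
    return [
        "work_queue_factory",
        master_ip,
        str(work_queue_port(sequenceserver_port)),
        "-T", "local",
        "-w", str(min_workers),
    ]


if __name__ == "__main__":
    # Example values only: 192.0.2.10 is a documentation address and 4567
    # stands in for whatever port sequenceserver reports on startup.
    print(shlex.join(sequenceserver_command("/path_to_databases")))
    print(shlex.join(factory_command("192.0.2.10", 4567, 2)))
```

The commands are returned as argument lists so they can be printed for copy and paste or passed to `subprocess.run` without shell quoting issues.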

Team Members

Sateesh Peri:

    Role: Team lead, backend-design

    Expertise: Bioinformatics, Genetics, Cyverse & Cloud Computing

Tanner Campbell:

    Role: sequenceserver-reverse-engineering, code, backend-design

    Expertise: Celestial Mechanics/Spacecraft GNC/ Machine Learning

Mohammad Moghaddam:

    Role: Benchmarking, Testing

    Expertise: Hydrology/MIS, Machine Learning, Statistics

Sahil Brahmankar:

    Role: Benchmarking, Testing

    Expertise: Information Science

Naomi Yescas:

    Role: Documentation, Concept Map

    Expertise: Information Science, Machine Learning

Project Timeline:



Identify Stakeholders, Preliminary Planning and Concept Map


Use and test sequenceserver docker container


Benchmark sequenceserver on 1, 2, 4, 8, and 16 CPU-virtual machines


Implement single BLAST queue parallelization using Makeflow and Workqueue


Create sequenceserver Atmosphere Image modified to wait for workers


Launch sequenceserver with workers across multiple VMs



  • blastEasy GitHub source code

  • blastEasy DockerHub container

  • Master Atmosphere Image 

  • Worker Atmosphere Image 


Benchmarking Part-1: Initial testing


Table-1: Benchmark results; multiple Atmosphere virtual machines with 1, 2, 4, 8, and 16 CPU cores


The benchmarking was done by launching virtual machines from sequenceserver images with access to 1, 2, 4, 8, and 16 CPUs. The NCBI BLAST nt and refseq_protein databases were downloaded, and random DNA and protein sequences were generated. Nucleotide sequences of length 1000, 2000, 5000, 10000, 50000, 100000, and 500000 were tested on each virtual machine. The protein queries were tested using multiple sequences: 1, 5, 10, 50, 100, and 500. Timing was taken from sequenceserver’s debug mode (-D), which logs when each search begins and when it ends and a new process starts. Sequenceserver was run as sequenceserver -n 14 -D -d blast_dbs/ and the output was checked for the displayed times: 

Blast Begins:

[Date&Time] DEBUG Executing: blastn -db …

Blast Ends:

[Date&Time] DEBUG Executing: blast_formatter …
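Reading these timestamps by eye gets tedious over many runs. The sketch below pairs each search line with the next blast_formatter line and reports the elapsed seconds. The exact timestamp layout is not shown above, so the format `YYYY-MM-DD HH:MM:SS` is an assumption that would need adjusting to match the real log, and `blast_durations` is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional

# ASSUMPTION: timestamps look like "[2019-11-05 14:03:21]"; change TS_FORMAT
# to match what sequenceserver actually prints.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+DEBUG\s+Executing:\s+(?P<prog>\S+)"
)

# BLAST+ search programs that mark the start of a search.
SEARCH_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def blast_durations(log_lines: Iterable[str]) -> List[float]:
    """Return elapsed seconds from each search start to the next blast_formatter."""
    durations: List[float] = []
    started: Optional[datetime] = None
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an "Executing:" debug line
        when = datetime.strptime(match.group("ts"), TS_FORMAT)
        program = match.group("prog")
        if program in SEARCH_PROGRAMS:
            started = when  # search begins
        elif program == "blast_formatter" and started is not None:
            durations.append((when - started).total_seconds())  # search ends
            started = None  # ignore further formatter calls for this search
    return durations
```

Passing the lines of a saved debug log, for example `blast_durations(open("debug.log"))` with a hypothetical file name, gives one duration per completed search.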

Benchmarking Part-2: Prototype Testing


Additional Links and Resources:



    More info and Instructions:







Presentation Slides:

Midterm Demo:

Human GAPDH sequence:


>NR_152150.2 Homo sapiens glyceraldehyde-3-phosphate dehydrogenase (GAPDH), transcript variant 6, non-coding RNA


Random DNA generator:
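For readers who want to produce comparable test queries locally, the sketch below generates a random sequence in FASTA format. It is not the generator used for the benchmarks; the function name `random_fasta`, the uniform residue frequencies, and the 60-character line width are all assumptions made for illustration.

```python
import random
from typing import Optional

DNA_ALPHABET = "ACGT"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_fasta(length: int, alphabet: str = DNA_ALPHABET,
                 name: str = "random_seq", seed: Optional[int] = None,
                 width: int = 60) -> str:
    """Return one random sequence of the given length as FASTA text.

    Residues are drawn uniformly from the alphabet (an assumption; real
    genomes are not uniform). Passing a seed makes the output repeatable.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    rng = random.Random(seed)
    sequence = "".join(rng.choice(alphabet) for _ in range(length))
    # Wrap the sequence at a fixed width, as is conventional in FASTA files.
    lines = [sequence[i:i + width] for i in range(0, length, width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A 1000-base nucleotide query, the smallest size used in the benchmark.
    print(random_fasta(1000, name="random_dna_1000", seed=42), end="")
```

Protein queries can be produced the same way by passing `alphabet=PROTEIN_ALPHABET`.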

Post-Mortem Analysis

What worked well:

What didn't work well:

What could have been done differently: