Team Star's solution is to integrate SequenceServer and WorkQueue so that workers run on an HPC. To do this, we initialized an Atmosphere Virtual Machine instance with CCTools installed, ensuring that we could use WorkQueue. Team blastEasy provided a SeqInit.py script, which starts SequenceServer and initiates the WorkQueue master. Once we adjusted the script to meet our needs, we were able to establish a connection between the Virtual Machine, which acts as the WorkQueue master, and the HPC. From there, we could bring up the SequenceServer GUI in a browser using the VM's IP address and port number.
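The master-side startup described above can be sketched as follows; this is only an illustration, since SeqInit.py is team blastEasy's script and its exact flags are not shown, and the IP address and port are placeholders:

```shell
# Sketch of starting the WorkQueue master on the Atmosphere VM.
# SeqInit.py comes from team blastEasy; invocation details may differ.
python SeqInit.py        # starts SequenceServer and the WorkQueue master

# The SequenceServer GUI is then reachable in a browser at, e.g.:
#   http://<VM-IP-address>:<port>
```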
Team blastEasy's integration of WorkQueue limits the number of BLAST searches that can run at the same time. Team Star takes this a step further, improving scalability by harnessing the power of an HPC.
In our solution, team blastEasy's sequenceserver2.0 Virtual Machine instance is used as the master: once SequenceServer starts, the master is up and running, waiting for workers to accept jobs.
At the same time, on an HPC, a flexible .pbs file allows the HPC to be populated with workers. This .pbs file can be modified to scale with class size and required compute power. Once the .pbs file is submitted and the workers are running, the master VM submits jobs to the workers on the HPC.
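A minimal worker .pbs script might look like the sketch below. The resource line mirrors the benchmark configuration (1 node, 6 CPUs, 24 GB), but the job name, walltime, module name, and the master's address are placeholders that depend on the cluster and the VM:

```shell
#!/bin/bash
#PBS -N wq-workers
#PBS -l select=1:ncpus=6:mem=24gb
#PBS -l walltime=01:00:00

# Assumes the cluster provides CCTools as a module; adjust as needed.
module load cctools

# Connect a worker to the master VM; -t 300 shuts the worker down
# after 300 s of idleness so the allocation is not wasted.
work_queue_worker -t 300 <master-VM-IP> <master-port>
```

Submitting more copies of this job (or turning it into a job array) is what scales the setup up or down for a given class size.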
The beauty of this method is its flexibility: although designed to support BLAST searches for large classrooms, it can be scaled down as one prefers.
Team Star is thankful for the work of team blastEasy, the SequenceServer team, the CCTools team, and all the students and instructors of the UArizona ACIC course.
For detailed instructions (requirements, installation, running) please visit our GitHub page.
Benchmarking was performed using the team blastEasy sequenceserver2.0 VM image as the master and the HPC as the workers.
Database used: Mouse protein database (46 Mb), custom.
The benchmarks were performed with 4 different computers simultaneously running the same query search. Each computer had 4 SequenceServer tabs open, and every query was submitted at the same time (16 simultaneous BLAST searches in total). Trial times were recorded as the time until SequenceServer displayed the results.
Notes: more benchmarking could be done; scaling the workers' available resources (CPUs, RAM, nodes) would have produced more informative data.
Worker configuration: nodes = 1, cpus = 6, mem = 24 GB

| | time (trial 1) | time (trial 2) | time (trial 3) | time (trial 4) | time (trial 5) | time (trial 6) |
|---|---|---|---|---|---|---|
| Average sec / # runs | n/a | 1.634023438 | 2.9078125 | 2.110039063 | 11.70930556 | 59.035625 |
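The "Average sec / # runs" figure above is simply the mean wall-clock time over the runs that completed in a trial. A small sketch of that calculation (the sample times below are made up, not the real trial data):

```python
def average_seconds_per_run(times_sec):
    """Return mean seconds per completed run, or None if no runs finished
    (matching the 'n/a' entry in the table above)."""
    if not times_sec:
        return None
    return sum(times_sec) / len(times_sec)

# Made-up sample times for illustration only:
print(average_seconds_per_run([2.0, 1.5, 1.0, 2.0]))  # 1.625
print(average_seconds_per_run([]))                     # None
```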
1- Successful implementation of the master-worker setup within the same machine (tested with work_queue_example.py)
2- Successful implementation of the master-worker setup between two virtual machines (tested with work_queue_example.py)
3- Successful implementation of the master-worker setup between a virtual machine and the HPC (tested with work_queue_example.py)
4- Successfully ran a BLAST search between the master virtual machine and the HPC workers
5- Successful implementation of the master-worker setup between the blastEasy SequenceServer virtual machine and the HPC
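As a rough illustration of the master-worker pattern tested in the milestones above, a minimal WorkQueue master in the spirit of CCTools' work_queue_example.py might look like this. The port, file names, and BLAST command line are placeholders, and the sketch assumes the CCTools Python bindings are installed:

```python
# Minimal WorkQueue master sketch (assumes CCTools Python bindings).
import work_queue as wq

q = wq.WorkQueue(port=9123)           # workers connect to this port
print("master listening on port", q.port)

# Placeholder BLAST task; file names and flags are illustrative only.
t = wq.Task("blastp -query query.fa -db mouse.faa -out result.txt")
t.specify_input_file("query.fa")
t.specify_output_file("result.txt")
q.submit(t)

while not q.empty():
    t = q.wait(5)                     # wait up to 5 s for a finished task
    if t:
        print("task", t.id, "exited with status", t.return_status)
```

In the full system, SequenceServer generates the BLAST command and the .pbs-launched workers on the HPC pick the tasks up.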
Team member contributions:
Michele Cosi: Terminal, Documentation, PBS script
Emmanuel Gonzalez: Presentation, Terminal, PBS script
Anthony Dominguez: Coding, Terminal, PBS script
TJ Lippincott: Coding, Terminal, PBS script
Brandi Diesso: Presentation, Documentation
The Good: pride in the final product; learning challenge.
The Bad: communication. More communication was needed between all parties.
The Ugly: Timing. Should have started earlier.
What to do differently: communicate, listen, and start earlier.