Reetu and Friends
Reetu Tuteja - Created the Sequenceserver2.0 app on Discovery Environment, presentation, video tutorial
Jennen Maryniak - Presentation, plots
Jiatian Wang - Initial Makeflow and Work Queue scripts
Erika Tapia - Benchmarking and documentation
Nick Reppe - Benchmarking, Demo/Procedure
CyVerse provides life scientists with a powerful computational infrastructure to handle huge datasets and complex analyses that enable data-driven discovery. CyVerse's extensible platforms provide data storage, bioinformatics tools, image analyses, cloud services, APIs, and more.
The Discovery Environment (DE) is a key product of the CyVerse cyber-infrastructure, providing a modern web interface for powerful computing. The visual and interactive computing environment (VICE) is a recently introduced feature within the DE for running interactive apps.
Our midterm project applies CyVerse's Discovery Environment to sequence analysis, using an app we created via VICE within CyVerse.
The sequences used were provided by Team BlastEasy:
We did our best to learn and implement distributed computing using HPC systems and the Work Queue platform, but ultimately decided on CyVerse's Discovery Environment as a possible solution.
Everything starts with a CyVerse account; once logged in, you can access the Discovery Environment. An existing app, "Create BLAST database-2.6.0+", can be used to generate a database. We created an app called "sequenceserver" that uses the output of that app as the database to test a query against. If you already have a database, you do not need to run "Create BLAST database-2.6.0+".
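For readers who want to prepare a database outside the DE, here is a minimal sketch of the equivalent command-line step. It assumes the DE app wraps the `makeblastdb` tool from NCBI BLAST+ (suggested by the app name, not confirmed here). The file name `proteins.fasta` and the database name `my_protein_db` are hypothetical placeholders. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def makeblastdb_command(fasta_path, db_name, dbtype="prot"):
    """Build the argument list for NCBI BLAST+ `makeblastdb`.

    fasta_path and db_name are caller-supplied; the values used below
    are hypothetical placeholders, not files from this project.
    """
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


# Hypothetical input and output names, for illustration only.
cmd = makeblastdb_command("proteins.fasta", "my_protein_db")
print(shlex.join(cmd))
```

With BLAST+ installed locally, the printed command could be pasted into a shell, or the list could be passed to `subprocess.run(cmd, check=True)`.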
The instructions are in the PDF: Procedures for DE sequence analysis.pdf
Benchmarking was performed by running the sequenceserver app we created in the Discovery Environment (DE) within CyVerse. Before launching the app, the number of cores was specified in the drop-down field "Number of threads":
Once the analysis was running, we opened it and pasted the protein sequences into Sequence Server to run BLAST:
All protein sequences run through Sequence Server were 100 residues long, and the number of cores was selected before launching the app analysis. Run-time decreased significantly as more cores were used to run queries: running queries on 8 cores clearly reduced run-time compared with running them on only 1 core.
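To put a number on this kind of comparison, run-times at different core counts can be converted to speedup and parallel efficiency. The sketch below shows the arithmetic only. The timings in `example_runtimes` are made-up placeholders, not measurements from our benchmarks, and the function names are our own for illustration.

```python
def speedup(t_one_core, t_n_cores):
    """Speedup = run-time on 1 core divided by run-time on n cores."""
    if t_one_core <= 0 or t_n_cores <= 0:
        raise ValueError("run-times must be positive")
    return t_one_core / t_n_cores


def efficiency(t_one_core, t_n_cores, n_cores):
    """Parallel efficiency = speedup divided by the number of cores."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup(t_one_core, t_n_cores) / n_cores


# PLACEHOLDER timings in seconds (cores -> run-time); NOT project results.
example_runtimes = {1: 80.0, 2: 44.0, 4: 25.0, 8: 16.0}

baseline = example_runtimes[1]
for n_cores, seconds in sorted(example_runtimes.items()):
    s = speedup(baseline, seconds)
    e = efficiency(baseline, seconds, n_cores)
    print(f"{n_cores} core(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

An efficiency well below 100% at 8 cores would suggest that part of each query does not parallelize, so adding more cores gives diminishing returns.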
Number of Protein Sequences