Critical: The client has asked us to keep their existing SequenceServer UI and optimize the back-end querying process against NCBI’s database. The solution must be able to support multiple classrooms of 100 people each, possibly all querying at the same time for about 2-3 minutes.
Nice-to-haves: Ideally, the client would like caching to be enabled to make querying quicker. Functionality updates to their UI would also be welcome.
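As a rough illustration of the caching nice-to-have, here is a minimal sketch of an in-memory result cache keyed on the query sequence, database, and search program. It is not part of SequenceServer; `QueryCache`, `cache_key`, and the `run_search` callback are hypothetical names we made up for the example, and the callback stands in for whatever actually runs the search on the back end.

```python
import hashlib
from typing import Callable, Dict


def cache_key(sequence: str, database: str, program: str = "blastn") -> str:
    """Build a stable key from the normalized query, database, and program."""
    # Strip whitespace and upper-case the sequence so that trivially
    # different submissions of the same query map to the same key.
    normalized = "".join(sequence.split()).upper()
    raw = f"{program}|{database}|{normalized}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class QueryCache:
    """In-memory cache placed in front of an expensive search function."""

    def __init__(self, run_search: Callable[[str, str, str], str]) -> None:
        self._run_search = run_search  # hypothetical back-end call (assumption)
        self._results: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def search(self, sequence: str, database: str, program: str = "blastn") -> str:
        key = cache_key(sequence, database, program)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        # Cache miss: run the real search once and remember the result.
        self.misses += 1
        result = self._run_search(sequence, database, program)
        self._results[key] = result
        return result
```

In a classroom where many students submit the same example sequence, only the first submission would reach the back end. A real deployment would also need an eviction policy and invalidation when a database is updated, which this sketch leaves out.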
Plan to Achieve Deliverables
I’m envisioning a Discovery Environment app that can be launched on a Virtual Machine with a series of checkboxes allowing the instructor to determine the database and search characteristics of each instance before the VM is launched.
Implement a round-robin load balancer, provided through Azure, AWS, or another cloud provider, that spins up another SequenceServer instance to help manage the load once the first instance reaches maximum capacity. Incoming queries would then be rotated across these instances using the round-robin method. The instances would maintain continuity through a master-slave relationship, where a master copy of the database would reside above the others, and the ‘slaves’, so to speak, would all report back to and sync with the master copy.
Technology Requirements - Schools would need an account with the cloud provider, as well as a budget to accommodate the additional traffic handled by that provider.
Strengths - A cloud provider can add capacity on demand, which makes this approach highly scalable. Master-slave architecture is much easier to implement than multi-master architecture, and because all writes go through the single master copy, it is simpler to maintain atomicity, consistency, isolation and durability (ACID) properties.
Weaknesses - We’d have to find a cloud provider that supports load balancing with SequenceServer, or implement our own. There would be an added cost with using a third-party cloud provider. A master-slave architecture also has a single point of failure: if the master copy goes down, the slaves have nothing to sync with.
Potential Unknowns/Problems - This is a huge undertaking to implement. In theory it makes sense, but implementing such an architecture is likely to hit many obstacles along the way.
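To make the round-robin idea in this plan more concrete, here is a minimal sketch of the rotation and scale-out logic only. It assumes each instance can handle a fixed number of concurrent queries; `RoundRobinPool`, `Instance`, and the `launch` callback are hypothetical names, and the callback stands in for whatever cloud API would actually start a new SequenceServer instance. Master-slave synchronization is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str
    capacity: int   # assumed max concurrent queries per instance
    active: int = 0  # queries currently being served


class RoundRobinPool:
    """Rotate queries across instances; add one when all are full."""

    def __init__(self, capacity: int, launch: Callable[[int], str],
                 max_instances: int) -> None:
        self._capacity = capacity
        self._launch = launch  # hypothetical hook that starts a VM/container
        self._max_instances = max_instances
        self.instances: List[Instance] = []
        self._next = 0
        self._add_instance()

    def _add_instance(self) -> Instance:
        name = self._launch(len(self.instances))
        inst = Instance(name=name, capacity=self._capacity)
        self.instances.append(inst)
        return inst

    def acquire(self) -> Instance:
        """Pick the next instance with spare capacity, in rotation."""
        n = len(self.instances)
        for offset in range(n):
            inst = self.instances[(self._next + offset) % n]
            if inst.active < inst.capacity:
                self._next = (self._next + offset + 1) % n
                inst.active += 1
                return inst
        # Every instance is full: scale out if the budget cap allows it.
        if n >= self._max_instances:
            raise RuntimeError("all instances are at capacity")
        inst = self._add_instance()
        inst.active += 1
        self._next = 0
        return inst

    def release(self, inst: Instance) -> None:
        """Mark one query on this instance as finished."""
        inst.active = max(0, inst.active - 1)
```

The `max_instances` cap is our own addition to reflect the cost concern raised under Weaknesses; a managed load balancer from the cloud provider would likely replace most of this logic.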
List of Questions
How big are the submitted queries?
Is cost a constraint we should account for as we design a solution?
Are there any glaring shortcomings of the current system that you would like to avoid in the new solution design?
Does the client need to be able to supply a customized database?
Do different institutions need to be able to launch multiple custom versions of SequenceServer? Example: One classroom uses the fly database, whereas another classroom wants to query multiple databases.
What levels of privilege need to be supported?
Is there an Administrator role (read: Instructor) that needs to be able to launch bespoke instances of SequenceServer depending on individual class requirements?
If so, what level of customization should the Administrator have relative to the student?
Description of Development Process (Agile Methodology)
Design - Once we have a strong basis for what the client expects of us, including the applications and their functionality, we will design a basic data flow diagram (DFD) for our solution. The team will have to pick one of our proposed solutions to go with. We will use a collaborative tool, such as draw.io, to build the diagram. If given the opportunity, we will consult with the client to get their feedback on our solution.
Develop - This is where the majority of the heavy lifting will be done, as we will have to make all of our applications talk to each other. Based on our current proposed solutions, we will have to make use of some combination of virtual machines, a load balancer, virtual servers, etc. to bring this solution to fruition.
Test - After we build the basic infrastructure for our solution, this is the stage where we double back to fix any issues that persisted through the development stage. To really test our solution, we would have to find a way to test it with 100+ simultaneous connections to see if it fulfills the client’s needs. We should also try to virtualize and add multiple instances of SequenceServer to make sure it is indeed scalable.
Deploy - After rigorous testing, this is the stage where the final product is proposed and implemented. Hopefully it interacts with the live environment properly, but as with most software and architectures, there may be a reason to go back to the “Develop” stage and try again.
Review - The final stage is where we consult with the client to make sure we met all of their needs and to decide if any further tweaking needs to be done. As the final stage, it is important to determine if the implemented solution is something that requires extended maintenance; if so, consult with the client.
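For the Test stage above, one way to approximate 100+ simultaneous connections is a small script that fires many queries at once and records failures and latency. The sketch below is only an outline under our own assumptions: `run_load_test` and the `submit_query` callback are hypothetical, and the callback would need to be replaced with a real request to the deployed system.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def run_load_test(submit_query: Callable[[int], bool], users: int = 100) -> dict:
    """Call submit_query once per simulated user, all at the same time."""

    def timed(user_id: int) -> Tuple[bool, float]:
        start = time.perf_counter()
        try:
            ok = bool(submit_query(user_id))
        except Exception:
            # Treat any error (timeout, refused connection, ...) as a failure.
            ok = False
        return ok, time.perf_counter() - start

    # One worker thread per simulated user so requests overlap in time.
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed, range(users)))

    latencies = sorted(elapsed for _, elapsed in results)
    failures = sum(1 for ok, _ in results if not ok)
    return {
        "users": users,
        "failures": failures,
        "max_latency_s": latencies[-1],
        "median_latency_s": latencies[len(latencies) // 2],
    }
```

Running this with increasing values of `users`, before and after adding instances, would give a simple before/after comparison for the scalability check.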
1. Download SequenceServer with BLAST
2. Run the existing system
3. Rebuild the system using new tools
4. Use containers
5. Use HPC for scaling