...

At the beginning of the semester, Wilson came to us with a problem: how can we create a scalable solution capable of supporting up to hundreds of people using SequenceServer to query NCBI BLAST databases? Moreover, Wilson needed us to also include:

...

Asiedu Owusu-Kyereko - Contributed to building the Docker container, and to the write-up, benchmarking, and presentation.


Special Thanks to John Xu, Sateesh Peri, and Team BLASTEasy.

...

For the first nucleotide benchmark, we had the random generator create sequences from 1000 to 500000 nucleotides long. What we found at first was expected: the smaller the query, the quicker the time. We were not able to break the solution or crash the browser; everything ran fine for the nucleotide sequences. This was not the case with the protein sequences.
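Random nucleotide queries like these can be produced in a few lines. The following is a minimal sketch, not the generator we actually used: the names `random_nucleotide_sequence` and `build_queries`, the uniform base frequencies, and the fixed seed are all assumptions made for illustration. The lengths are the ones listed in the nucleotide results.

```python
import random

def random_nucleotide_sequence(length, seed=None):
    """Return a random DNA string of the given length (uniform over A, C, G, T)."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))

# Query lengths used in the nucleotide benchmark.
LENGTHS = [1000, 2000, 5000, 15000, 50000, 100000, 500000]

def build_queries(lengths=LENGTHS, seed=0):
    """Map each length to one FASTA-formatted random query (hypothetical header names)."""
    queries = {}
    for i, n in enumerate(lengths):
        seq = random_nucleotide_sequence(n, seed=seed + i)
        queries[n] = f">random_nt_{n}\n{seq}\n"
    return queries
```

Each value returned by `build_queries()` can be pasted into the query box as a single FASTA record.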

...


For all benchmarks (nucleotide and protein), three 4-core VMs and two 8-core VMs served as dedicated workers (28 threads in total), with a 2-core VM set as the master node.

Times are in seconds and are for simultaneous runs of randomly generated (non-similar) queries. They measure how long the rendered results took to appear on the end user's screen, not when the backend server finished.

For this benchmark, we each used a simple stopwatch.

 

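| Nucleotide Sequence Length | Nasser | Josh | Jaeden | Ace | Derek | Average |
| --- | --- | --- | --- | --- | --- | --- |
| 1000 | 0.55 | 0.4 | 1 | 2 | 0.19 | 0.83 |
| 2000 | 0.28 | 0.67 | 1.8 | 1.87 | 0.14 | 0.95 |
| 5000 | 2.49 | 1.32 | 2.1 | 2.5 | 0.17 | 1.72 |
| 15000 | 2.44 | 2.9 | 3.2 | 3.5 | 2.9 | 2.99 |
| 50000 | 7.04 | 6.41 | 7.5 | 7 | 7.6 | 7.11 |
| 100000 | 15.49 | 15.57 | 15.5 | 15.5 | 15.59 | 15.53 |
| 500000 | 56.1 | 56.43 | 56.6 | 57 | 56.8 | 56.59 |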
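The Average column can be reproduced from the individual times. The sketch below is only a cross-check of the nucleotide table: the names `NUCLEOTIDE_TIMES` and `average_times` are hypothetical, and the values are copied by hand from the table.

```python
from statistics import mean

# Stopwatch times in seconds, copied from the nucleotide table
# (order: Nasser, Josh, Jaeden, Ace, Derek).
NUCLEOTIDE_TIMES = {
    1000: [0.55, 0.4, 1, 2, 0.19],
    2000: [0.28, 0.67, 1.8, 1.87, 0.14],
    5000: [2.49, 1.32, 2.1, 2.5, 0.17],
    15000: [2.44, 2.9, 3.2, 3.5, 2.9],
    50000: [7.04, 6.41, 7.5, 7, 7.6],
    100000: [15.49, 15.57, 15.5, 15.5, 15.59],
    500000: [56.1, 56.43, 56.6, 57, 56.8],
}

def average_times(times):
    """Return the mean time per sequence length, rounded to two decimals."""
    return {length: round(mean(values), 2) for length, values in times.items()}

if __name__ == "__main__":
    for length, avg in average_times(NUCLEOTIDE_TIMES).items():
        print(f"{length:>7} nt: {avg:.2f} s")
```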

 

    

 

To benchmark the protein sequences, we used an online random protein sequence generator that could only generate between 1 and 100 sequences. In this test, times varied more and more widely depending on our devices. Those with more RAM in their machines had quicker times, which suggests that part of the processing happens in local memory on the client. The following table shows our results.

 

| Protein (length 1000) x n | Nasser | Josh | Jaeden | Ace | Derek | Average |
| --- | --- | --- | --- | --- | --- | --- |
| n = 1 | 2.65 | 3.02 | 2.71 | 3.18 | 8.37 | 3.99 |
| n = 10 | 17.49 | 17.47 | 16.82 | 21 | 53.35 | 25.23 |
| n = 50 | 88.8 | 89.4 | 90 | 125 | 202 | 119.04 |
| n = 100 | 181 | 183.4 | 185 | 325.3 | 468 | 268.54 |

Derek was running his benchmarks on a 4-year-old Surface tablet and had much higher response times, which skewed our average. Josh was running a hardware monitor on his machine and noticed a spike (>20%) in CPU activity before the results were rendered. This would indicate that times can vary greatly depending on local CPU power (on 1 thread). It would be worth testing whether John's solution of integrating nginx alleviates this bottleneck.
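To see how much the tablet skews the protein averages, the mean can be compared with the mean of the other four machines and with the median. This is a rough sketch under stated assumptions: the names `PROTEIN_TIMES` and `summarize` are hypothetical, the values are copied by hand from the protein table, and the last entry in each row is assumed to be Derek's.

```python
from statistics import mean, median

# Protein benchmark times in seconds, copied from the protein table
# (order: Nasser, Josh, Jaeden, Ace, Derek).
PROTEIN_TIMES = {
    1: [2.65, 3.02, 2.71, 3.18, 8.37],
    10: [17.49, 17.47, 16.82, 21, 53.35],
    50: [88.8, 89.4, 90, 125, 202],
    100: [181, 183.4, 185, 325.3, 468],
}

def summarize(times):
    """For each n, return (mean of all five, mean without the last entry, median)."""
    summary = {}
    for n, values in times.items():
        summary[n] = (
            round(mean(values), 2),       # matches the table's Average column
            round(mean(values[:-1]), 2),  # same average with Derek's time left out
            round(median(values), 2),     # less sensitive to a single slow machine
        )
    return summary

if __name__ == "__main__":
    for n, (all_five, without_last, mid) in summarize(PROTEIN_TIMES).items():
        print(f"n = {n:>3}: mean {all_five}, mean without Derek {without_last}, median {mid}")
```

For n = 50, for example, the mean drops from 119.04 to 98.3 seconds once the tablet is left out, and the median is 90 seconds.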

Presentation:

 Google Slides -  https://docs.google.com/presentation/d/1SOUsKjVtZrnL7GM0E0m_KDS_d3FSJZITMnHaoGh_rFs/edit?pli=1#slide=id.g73ce69d719_2_5

...