

Midterm Project (191003 Update)

Presentation

https://docs.google.com/presentation/d/104FhLIzcydNqGo7366uvJzXhYHMVCVsILLqL68nHId4/edit?usp=sharing

Team Star

Anthony Dominguez

Brandi Diesso

Emmanuel Gonzales

Michele Cosi

TJ Lippincott

Concept Map

 

Deliverables required from the Client

  • Priority 
    • Image/deployment can scale as required by the number of students in the class
    • Instructors can add or build their own datasets in SequenceServer
  • Nice to have
    • Caching results for the same query
    • Updates to UI (need more specification from client)
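The "caching results for the same query" item could be sketched roughly as follows. This is only an illustration under stated assumptions, not a SequenceServer feature: `QueryCache` and the `run_search` callable are hypothetical names, the cache lives in memory only, and the query is treated as a raw sequence string.

```python
import hashlib
from typing import Callable, Dict


class QueryCache:
    """In-memory cache keyed by a hash of (database, normalized query)."""

    def __init__(self, run_search: Callable[[str, str], str]) -> None:
        # run_search is a hypothetical backend: (database, sequence) -> result text.
        self._run_search = run_search
        self._results: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(database: str, sequence: str) -> str:
        # Drop whitespace and ignore case so trivially different copies
        # of the same sequence share one cache entry.
        normalized = "".join(sequence.split()).upper()
        return hashlib.sha256(f"{database}\n{normalized}".encode()).hexdigest()

    def search(self, database: str, sequence: str) -> str:
        key = self._key(database, sequence)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        # First time this query is seen: run the search and remember the result.
        self.misses += 1
        result = self._run_search(database, sequence)
        self._results[key] = result
        return result
```

With many students submitting the same class exercise, a repeated query would be answered from the dictionary instead of rerunning the search. A real deployment would also need to invalidate entries whenever an instructor rebuilds a dataset.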

Plans

  • Docker
    • Technology requirements
      • Computer and some tutorials
    • Strengths
      • Allows us to use and contain a lot of the tools we'll need
      • Environment standardization and version control
      • Applications and resources are isolated and segregated
      • Easily portable
    • Weaknesses
      • Doesn't run at bare-metal speeds
      • Graphical applications don't work well
      • No seamless way of backing up data
    • Potential unknowns and problems
      • Most of us don't have much experience with Docker, so something is bound to go sideways at some point

Questions for the client

(We checked the other shared questions; these are two simple questions that, in addition to the others, we believe we need to address.)

  • Is there a preference on how the client would like to access SequenceServer?

  • SequenceServer requires a list of genomes to function; will those genomes be added by the user or by the developer? If by the developer, we would need a list of the required genomes.

 

 

Description of your development process (e.g., Agile)


Before beginning the project, all members will work to become familiar with all the technology needed to successfully complete the project. We will ask Dr. Wilson clarifying questions to accurately assess his needs. Our team will strive to work on the project frequently and do our best to communicate effectively so that we are all aware of the project's progress. Furthermore, we will keep simplicity in mind because the simpler something is, the more agile it will be. We will assess our progress through analyzing the state of the project, its agility and simplicity, as well as each member's contributions. 

 

Your project plan according to your development process

Install SequenceServer and required genomes → Optimize for scalability → Containerize the SequenceServer/genome instance →  Upload to public repository as container 

 

Homework 8 Update 

  • Installed and tested SequenceServer (SS) on 4 local VMs:

    • 1 core, 4 GB, Xubuntu VM

    • 2 core, 4 GB, Xubuntu VM

    • 4 core, 4 GB, Ubuntu VM

    • 7 core, 4 GB, Xubuntu VM
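One way timings like the ones below could be collected on each VM is a small wall-clock harness. This is a minimal sketch and not the procedure the team actually used: `time_command` is a hypothetical helper, and the command being timed is a placeholder.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(command: Sequence[str], repeats: int = 3) -> List[float]:
    """Run `command` `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command that does nothing. On a VM with BLAST+ installed it
    # could be swapped for something like (file and database names hypothetical):
    #   ["blastn", "-query", "query_1kb.fa", "-db", "genomes", "-num_threads", "4"]
    command = [sys.executable, "-c", "pass"]
    runs = time_command(command)
    print(f"median {statistics.median(runs):.2f} s over {len(runs)} runs")
```

Repeating each measurement and reporting the median reduces the effect of one-off delays such as a cold disk cache.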

Nucleotide benchmark 

Times are in seconds unless marked (min).

DNA       1 Core       2 Core   4 Core   7 Core
1 kb      5.4          1.47     0.89     3.57
2 kb      5.1          1.31     1.07     2.57
5 kb      8.4          1.85     1.38     3.34
10 kb     17.9         3.38     1.81     6.59
50 kb     50.5         10.89    4.82     18.54
100 kb    1:40 (min)   17.5     9.21     31.83
500 kb    7:59 (min)   58.09    32.1     2:23 (min)

 

Protein benchmark

Times are in seconds unless marked (min).

Protein (300 aa x n seq)   1 Core    2 Core      4 Core   7 Core
1                          3.13      0.97        0.86     2
5                          5         3.22        1.60     4.09
10                         8         4.29        2.42     7.52
50                         47.8      27.24       14.75    49.11
100                        2 (min)   79.97       42.63    1:45 (min)
500                        crashed   20+ (min)   error    error

 

Note: the 7-core benchmark was run on a VM hosted on an 8-core system, so the host may have been a bottleneck during the 7-core runs.


Results (holding for the 1-, 2-, and 4-core runs): processing time decreases as the number of cores increases, which we characterize as a roughly linear improvement.


More cores = faster processing. 
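The scaling claim can be checked by computing speedup and parallel efficiency from a few rows of the nucleotide benchmark. This is a minimal sketch that assumes the table values are seconds, with the 1:40 and 7:59 entries converted to 100 s and 479 s; `speedups` and `efficiencies` are illustrative helper names.

```python
from typing import Dict, List

CORES: List[int] = [1, 2, 4]

# Selected nucleotide benchmark rows (1, 2 and 4 cores), assumed to be in seconds.
NUCLEOTIDE_SECONDS: Dict[str, List[float]] = {
    "1 kb": [5.4, 1.47, 0.89],
    "10 kb": [17.9, 3.38, 1.81],
    "100 kb": [100.0, 17.5, 9.21],   # 1:40 (min) converted to 100 s
    "500 kb": [479.0, 58.09, 32.1],  # 7:59 (min) converted to 479 s
}


def speedups(times: List[float]) -> List[float]:
    """Speedup of each run relative to the first (single-core) run."""
    baseline = times[0]
    return [baseline / t for t in times]


def efficiencies(times: List[float], cores: List[int]) -> List[float]:
    """Speedup divided by core count; 1.0 means perfectly linear scaling."""
    return [s / c for s, c in zip(speedups(times), cores)]


if __name__ == "__main__":
    for label, times in NUCLEOTIDE_SECONDS.items():
        s = ", ".join(f"{x:.2f}x" for x in speedups(times))
        e = ", ".join(f"{x:.2f}" for x in efficiencies(times, CORES))
        print(f"{label:>7}: speedup [{s}]  efficiency [{e}]")
```

An efficiency close to 1.0 at every core count would support a linear reading. Values well above or below 1.0 would suggest that something other than core count, such as differences between the VMs, also affected the timings.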

 

Post Mortem

The Good: Although Atmosphere was down, we had a backup plan to which everyone contributed.

The Bad: Bad timing.

The Ugly: Atmosphere went down.

What could have been done differently? Start earlier.

 
