
This is just a short note to say that I've ported Salika's multithreaded spot detector onto iPlant machines, so now it runs in Bisque.  Thus the module we call the Pollen Tube Tracker now runs the multithreaded code.

It turns out that the iPlant machine used for Bisque does not have as many thread resources as our development systems.  In retrospect this is unsurprising:  the development systems we use are heavy-duty CPU servers with no other purpose in life except to run Computer Vision programs, whereas iPlant's Bisque server is not as beefy (less CPU and less memory) and is juggling a lot of tasks, including a fairly complicated Nginx webserver, the Bisque code, and everybody else's Bisque modules.  So, although we could achieve something like a twenty-fold speedup in the lab, on the Bisque server we will have to content ourselves with less.  I've configured the spot detector to use eight threads, and my tests indicate we get approximately a 4.5-fold speedup:  spot detection formerly took about thirty minutes, and now it takes a bit less than seven minutes.  I'd call that a huge improvement.
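As a quick sanity check on the numbers above, here is a small Python sketch of the arithmetic.  The function names are my own, made up for illustration, and the only inputs are the figures already quoted (thirty minutes, eight threads, a 4.5-fold speedup).  Parallel efficiency here just means speedup divided by thread count.

```python
def parallel_minutes(serial_minutes, speedup_factor):
    """Running time after the speedup, in minutes."""
    return serial_minutes / speedup_factor


def parallel_efficiency(speedup_factor, num_threads):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup_factor / num_threads


if __name__ == "__main__":
    # Figures quoted in the note: 30 minutes serial, 8 threads, 4.5x speedup.
    print(f"{parallel_minutes(30.0, 4.5):.1f} minutes")           # 6.7 minutes
    print(f"{parallel_efficiency(4.5, 8):.0%} of linear speedup")  # 56% of linear speedup
```

So eight threads buy a bit more than half of what ideal linear scaling would give, which fits the picture of a server with other constraints besides CPU.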

As I said last month, even in a lab setting, one of the bottlenecks is file IO.  There's an irreducible time expense just to read the images off the disk.  I believe more than one hardware constraint limits performance on the Bisque server, and the second big one is RAM.  If I crank up the number of threads on the Bisque server (say, to 40), the detection process uses up all its RAM allowance, and the system swaps the detector's memory pages out to disk.  In other words, it's like the data has to commute through the IO bottleneck numerous times.  That really hurts performance.  So, more threads can actually be worse.  Again, in retrospect this is not surprising, but I did not see it coming; I didn't realize that RAM was quite as tight as all that.  We might want to do some tuning to find the best choice for the number of threads.
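The tuning mentioned above could be as simple as timing the detector at a handful of thread counts and keeping the fastest.  Below is a minimal Python sketch of that idea, not the actual module code:  `sweep_thread_counts` and `run_detector` are hypothetical names, and `run_detector` is assumed to be any callable that runs one complete detection pass with the given number of threads (for instance, by launching the real detector as a subprocess).

```python
import time


def sweep_thread_counts(run_detector, candidates):
    """Time run_detector(n) for each candidate thread count n.

    Returns (best_n, timings), where timings maps each n to elapsed
    wall-clock seconds and best_n is the fastest candidate.
    """
    timings = {}
    for n in candidates:
        start = time.perf_counter()
        run_detector(n)  # one full detection pass using n threads
        timings[n] = time.perf_counter() - start
    best_n = min(timings, key=timings.get)
    return best_n, timings


if __name__ == "__main__":
    # Synthetic stand-in for the detector, NOT measured data: it only
    # mimics the shape of the problem (too few or too many threads is slow).
    def fake_detector(num_threads):
        time.sleep(0.01 + 0.002 * abs(num_threads - 8))

    best, timings = sweep_thread_counts(fake_detector, [2, 4, 8, 16, 40])
    for n, seconds in sorted(timings.items()):
        print(f"{n:3d} threads: {seconds:.3f} s")
    print("fastest:", best)
```

Wall-clock time is the right thing to measure here, since swapping shows up as elapsed time rather than CPU time.  The sweep should be run on the machine that actually executes the job, because the best thread count depends on that machine's CPU and RAM.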

However, it's probably not worth it to glean much more performance from this first stage of the module, the detector.  The second stage of the module, the tracker, is now the one we probably should try to accelerate.  Improvements here depend on work done by Ernesto Brau and Jinyan Guan, who've done lots of tracker work and, I think, have found ways to accelerate the tracker used by our module.  I hope to be able to say more about this soon.


2 Comments

  1. Hi Andrew,

    Great work on the optimization. At some point, adding threads gives diminishing returns, and finding the sweet spot is always hard when you don't know the hardware capabilities of the execution platform.

    The execution should be happening on a Condor execute node and not on the Bisque app server. We can verify the specs, and you can set the number of threads to suit that platform.

    Please check with the Bisque team to make sure the job is getting executed on the appropriate platform.

    Regards,

    Nirav

    1. predoehl AUTHOR

      Ok, thanks -- I didn't know that.  I'm sure we can get the module running in condor without too much trouble.