NINJA_Java_Code

NINJA source code

You can extract the contents of the jar file with the following command:
jar xf Ninja.jar

This will leave you with two directories:

  • one is "gnu", and contains the code for gathering options from the
    command line. This code can be safely ignored, as its functionality
    will be replaced by whatever option-capturing code the porters like to
    use, e.g. the GetOpt class in C++.
  • the other is "com", which is where the meat is.

Inside "com" you'll find the directories "bluemarsh" and
"traviswheeler". The former is of little importance. Open the latter.

Inside "traviswheeler", you'll find two directories of interest:

  • "libs" contains implementations of a Binary Heap and an External
    Memory Array Heap, both of which are fairly specifically tied to the
    kind of input they'll get from Ninja.
  • "ninja" holds the core of the application.

The code has some extra layers of abstraction that were necessary to
plug it into another piece of software called Mesquite, but it's not
too much to deal with. The file "Ninja.java" runs the show, but the
real work happens in TreeBuilderManager, in the method "doJob".

The logical flow of that function depends on which method is being
used, based on the value given to the "-m" flag at the command line.
The core options are "cand" (which drives the program to use the
external-memory data structures) and "bin" (which drives the use of
in-memory structures). The option "ext" was for testing, and need not
be reimplemented. The "standard" option is a pathetic hack I put in
place to account for the fact that the external-memory version is a
good deal slower than the in-memory version: it first tries to do the
job in memory, then falls back to the external-memory method if that
fails.
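
For porters, the dispatch described above could be sketched roughly as
follows. This is not the code in "doJob"; the class, enum and method
names are hypothetical, the two tree-building paths are passed in as
plain suppliers, and treating an OutOfMemoryError or a runtime
exception as "failure" is an assumption about what triggers the
fallback.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "-m" dispatch; none of these names come from Ninja.
class MethodDispatchSketch {
    // The four values of the "-m" flag mentioned in the text.
    enum Method { CAND, BIN, EXT, STANDARD }

    // Map the flag's string value onto the enum.
    static Method parseMethod(String flagValue) {
        switch (flagValue) {
            case "cand":     return Method.CAND;
            case "bin":      return Method.BIN;
            case "ext":      return Method.EXT;
            case "standard": return Method.STANDARD;
            default:
                throw new IllegalArgumentException("unknown method: " + flagValue);
        }
    }

    // Run the in-memory path, the external-memory path, or try one then the other.
    static String buildTree(Method method,
                            Supplier<String> inMemory,
                            Supplier<String> externalMemory) {
        switch (method) {
            case BIN:
                return inMemory.get();
            case CAND:
                return externalMemory.get();
            case STANDARD:
                try {
                    return inMemory.get();
                } catch (OutOfMemoryError | RuntimeException e) {
                    // Assumption: any such failure means "fall back to external memory".
                    return externalMemory.get();
                }
            default:
                // "ext" was a testing option and is deliberately left out of this sketch.
                throw new UnsupportedOperationException("not implemented: " + method);
        }
    }

    public static void main(String[] args) {
        // Simulate an in-memory attempt that runs out of memory.
        Supplier<String> inMemory = () -> { throw new OutOfMemoryError("simulated"); };
        Supplier<String> externalMemory = () -> "tree built with external-memory structures";
        System.out.println(buildTree(parseMethod("standard"), inMemory, externalMemory));
    }
}
```

In a port, the two suppliers would wrap the in-memory and
external-memory read-and-build paths, so the fallback logic stays in
one place.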

If in-memory structures are being used, DistanceFileReader reads
distances, then TreeBuilderBinHeap makes the tree.
If external-memory structures are being used, DistanceFileReaderExtMem
reads the distances, then TreeBuilderExtMem makes the tree.
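
Both paths have the same two-stage shape: a reader produces the
distances, then a builder turns them into a tree. One way a port might
capture that shape is sketched below. The interfaces and the toy
reader and builder are hypothetical and are not the actual signatures
of DistanceFileReader or TreeBuilderBinHeap; the float[][] matrix and
the string result are placeholders chosen only to keep the example
small.

```java
// Hypothetical sketch of the read-then-build pipeline; names are illustrative only.
class PipelineSketch {
    // Stage 1: turn an input path into some distance structure D.
    interface DistanceReader<D> {
        D read(String path);
    }

    // Stage 2: turn that distance structure into a tree (a string here).
    interface TreeBuilder<D> {
        String build(D distances);
    }

    // The same driver serves the in-memory and the external-memory pairing.
    static <D> String run(String path, DistanceReader<D> reader, TreeBuilder<D> builder) {
        D distances = reader.read(path);
        return builder.build(distances);
    }

    public static void main(String[] args) {
        // Toy stand-ins: a fixed 2x2 matrix and a builder that only reports its size.
        DistanceReader<float[][]> reader = path -> new float[][] {{0f, 1f}, {1f, 0f}};
        TreeBuilder<float[][]> builder = d -> "tree over " + d.length + " items";
        System.out.println(run("distances.txt", reader, builder));
    }
}
```

With this shape, the external-memory variant only has to swap in a
different reader and builder pair; the driver is unchanged.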

I'd suggest that anyone porting this work through the in-memory path
first, to understand the core concepts, and only then go after the
external-memory path, which works the same way but has more complex
code handling the heaps and the distance matrix.