Data Management - Save the Raw Data

In any kind of scientific analysis, it is extremely important to preserve the raw collected data. At a high level this is obvious: a central aim of the scientific method is to make results replicable, which is impossible without pristine copies of the initial inputs. In computational and data science specifically, raw data should be saved not only for the sake of reproducibility but also because analysis is quite often a destructive process. If your workflow reads data in and modifies it in any way, it is good practice to keep a separate copy of the raw data, or at least to set read-only permissions on it, so that a typo or careless mistake cannot ruin your experiment.
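As a minimal sketch of the read-only suggestion, the Python snippet below strips write permission from every file under a raw-data directory. The function name `make_read_only` and the directory passed to it are hypothetical, chosen only for illustration; on Windows, `chmod` only toggles the read-only flag.

```python
import stat
from pathlib import Path


def make_read_only(raw_dir):
    """Remove write permission from every file under raw_dir.

    Returns the list of files whose permissions were changed.
    """
    protected = []
    for path in sorted(Path(raw_dir).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Clear the write bits for owner, group, and others.
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            protected.append(path)
    return protected
```

Calling `make_read_only("data/raw")` once, right after data collection, means any later script that tries to overwrite a raw file fails loudly instead of silently changing it (unless it runs with administrator privileges).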

 

Software - Make Dependencies and Requirements Explicit

It is very important to keep explicit track of any library dependencies or other software requirements your project may have. Again, this falls under reproducibility of results by other researchers, but it also matters for ease of use by your future self and any collaborators. Imagine trying to install a complicated software package from source on a fresh system with no prior knowledge of its system requirements. Some of us have lived this and know that it takes hours of trial and error to get everything right, if you succeed at all. You may also have to set your software up again in a new environment mid-research: computers break, hardware is upgraded, and so on. If you have not kept track of how you got it working in your original environment, it may be difficult to reproduce.
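One lightweight way to record requirements for a Python project is to snapshot the packages installed in the working environment. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8 or later); the function names and the `requirements.txt` file name are assumptions made for illustration, and non-Python system libraries would still need to be listed by hand.

```python
import sys
from importlib import metadata


def snapshot_environment():
    """Return pip-style 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the interpreter version and the pinned packages to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        # Record the Python version as a comment so the file stays pip-readable.
        handle.write(f"# Python {sys.version.split()[0]}\n")
        handle.write("\n".join(snapshot_environment()) + "\n")
```

Committing the resulting file alongside the code gives a collaborator, or a rebuilt machine, a concrete list to install from.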

 

Collaboration - Create an Overview of your Project

Even when everyone is devoting their full bandwidth to a single project, it can be very difficult to get the whole team on the same page about the scope and specifics of the work. Unifying a vision becomes difficult as soon as a team grows beyond one person. The much more realistic scenario, unfortunately, is that each team member is balancing multiple projects. In that case a clearly defined top-level view of the project, with objectives and even role assignments, can be extremely beneficial: it keeps everyone on task and ensures they understand the goals.

 

Project Organization - All

For this section I selected all the sub-categories instead of just one because I feel they can all be summarized as "subscribe to standard Unix project structure". It works, it is simple, and it has been vetted and used for almost 50 years for a reason.
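To make this concrete, the sketch below creates a small Unix-style project skeleton. The directory names (`bin`, `data`, `doc`, `results`, `src`) are an assumed, commonly used convention rather than a layout prescribed on this page, and `create_skeleton` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed layout: a common Unix-style convention, adjust to your project.
LAYOUT = ("bin", "data", "doc", "results", "src")


def create_skeleton(root):
    """Create the top-level directories and a placeholder README.txt.

    Existing directories and an existing README.txt are left untouched.
    Returns the sorted names of the entries directly under root.
    """
    root = Path(root)
    for name in LAYOUT:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text("Project overview, installation, and usage go here.\n")
    return sorted(p.name for p in root.iterdir())
```

Because the function never overwrites anything, it is safe to rerun on an existing project to fill in whichever directories are missing.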

 

Tracking Changes - Keep a Change Log

The paper recommends keeping a dedicated changelog.txt file that tracks changes to the software. For very large projects with hundreds of changes in each update, this is a very good idea. For smaller projects, it is often more convenient to keep a version history, with a brief description of each update, in the header of the software itself. A simple example is below.

 

Project Layout

The project below is a Python software package that can be built and installed via the standard "configure, make all, make install" commands. The "README.txt" file contains installation instructions for Windows, OS X, and Linux, as well as usage examples and other documentation that would usually go in the "doc" directory. Since it is very long, only the first section of "README.txt" is given below.

 

 

README.txt

 

Software License

This project is licensed under the GNU GPL because it makes use of other software licensed under the GNU GPL, which constrains it to the same license.

 
