Validation of Scaffolds with Transrate

Thousand Plants Assemblies


One quick observation is that very short 1kP scaffolds tend to be classified as bad, but the fraction of good scaffolds increases with length (see TransratePassRate.png).  Scaffolds of 300 bp or longer are overwhelmingly (89%) rated as good.  We previously recommended ignoring sub-300 bp scaffolds for most analyses; this result offers an independent argument for doing so.

Steven Kelly cautions that the automated scaffold cut-off is great for rejecting bad assemblies, but it does discard some correct assemblies.  He suggests that for OneKP it should be generally fine, as most SOAPdenovo-Trans errors result from multiple fragments of larger scaffolds already present in the assembly.  A very conservative alternative cut-off is to throw away only scaffolds with the minimum score (0.01), which indicates that the scaffold has no supporting reads.
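
As a rough illustration of the two simpler filters mentioned above (the 300 bp length threshold and the minimum-score floor of 0.01), here is a minimal Python sketch.  The function names and the example values are hypothetical and are not part of Transrate or of the 1kP pipeline; the sketch also assumes that each scaffold's length and score have already been read as numbers.

```python
MIN_LENGTH = 300   # bp; scaffolds shorter than this were recommended to be ignored
FLOOR_SCORE = 0.01  # minimum Transrate score; indicates no supporting reads


def is_long_enough(length, min_length=MIN_LENGTH):
    """True if the scaffold meets the length threshold (hypothetical helper)."""
    return length >= min_length


def has_read_support(score, floor_score=FLOOR_SCORE):
    """True if the score is above the minimum, i.e. the scaffold is not
    flagged as having no supporting reads (hypothetical helper)."""
    return score > floor_score


def keep_scaffold(length, score):
    """Apply both filters; either one could equally be used on its own."""
    return is_long_enough(length) and has_read_support(score)
```

The two checks are kept separate because the text describes them as distinct recommendations: the length filter applies to most analyses, while the score floor is the very conservative alternative to the automated cut-off.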

Accessing the Results


Transrate produces three tables of per-scaffold statistics.  I have combined these into a single tab-separated file per sample, with some sample-specific details as a header.  The files are named CODE-SOAPdenovo-Trans-Transrate-stats.tsv.gz and are now in the "assembly" directories on http://onekp.westgrid.ca/1kp-data/
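
For anyone loading these files in a script, the following Python sketch shows one possible way to read a gzipped, tab-separated file while skipping the header.  It assumes that the sample-specific header lines begin with "#" and that the first remaining line holds the column names; neither detail is confirmed here, so check a file and adjust comment_prefix as needed.  The function name read_stats is hypothetical.

```python
import csv
import gzip


def read_stats(path, comment_prefix="#"):
    """Read a gzipped TSV into a list of dicts, one per scaffold.

    Assumption: header lines start with comment_prefix, and the first
    non-header line contains the column names.
    """
    with gzip.open(path, "rt", newline="") as handle:
        # Drop the sample-specific header lines before parsing the table.
        data_lines = (line for line in handle
                      if not line.startswith(comment_prefix))
        reader = csv.DictReader(data_lines, delimiter="\t")
        return list(reader)
```

All values are returned as strings, so numeric columns such as lengths or scores need to be converted (for example with int or float) before filtering.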

If there is a problem with using these modified files, please contact Eric Carpenter.  The originals have been retained, but have not been posted for reasons of brevity.