Comment: Added comments on alignment with human genome

...

Our analysis relies on ribosomal small-subunit material being assembled from each sample. Because a significant fraction of a cell's RNA is ribosomal, this is likely to be a sensitive detector of contamination. However, if the contamination comes from a closely related species, the sequences will co-assemble. Experimentally, we have found that this can happen when the ribosomal sequences differ by 2% or less. Such contamination will not be reported by our methodology.

Comparison with Other Results - 1. Barkman

Todd Barkman has constructed trees with SABATH methyltransferase sequences and then manually decided whether samples are taxonomically misplaced. When his results are compared with the 18S RNA taxonomic validation, they agree for 94% of samples.

...

His detailed report, with an assessment for each assembly, is available in 1kp-Barkman.xlsx. The category codes are explained on the second sheet of the workbook. The above table groups categories 1-3 and 4-5. Also available is a spreadsheet listing samples which failed either source validation: Sample Source Issues.xlsx.

Comparison with Other Results - 2. Mirarab

A number of samples have noticeably odd locations in the capstone test MAFFT tree produced by Siavash Mirarab. These are:

LVNW    Basal Eudicots                Cocculus laurifolius
WPYJ    Magnoliids                    Frankenia laevis
DYFF    Core Eudicots/Asterids        Pycnanthemum tenuifolium
XMQO    Basal Eudicots                Gunnera manicata
JLLY    Core Eudicots/Rosids          Melaleuca quinquenervia
CYVA    Basal Eudicots                Cimicifuga racemosa
QJXB    Core Eudicots/Rosids          Wikstroemia indica
FWBF    Core Eudicots                 Alangium chinense
FONV    Core Eudicots/Rosids          Greyia sutherlandii
NPND    Basalmost angiosperms         Ceratophyllum demersum
ULGV    Core Eudicots/Asterids        Morinda citrifolia
JBGU    Core Eudicots                 Amaranthus palmeri
YMES    Monocots/Commelinids          Typhonium blumei
JBLI    Eusporangiate Monilophytes    Bolbitis repanda
FITN    Liverworts                    Treubia lacunosa
NIJU    Core Eudicots/Rosids          Heteropyxis natalensis
UZNH    Core Eudicots/Asterids        Curtisia dentata
IQJU    Hornworts                     Anthoceros formosae
FANS    Hornworts                     Leiosporoceros dussii

Comparison with Other Results - 3. Human Genome

Each of the datasets was mapped to a human genome reference (available at https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.29 ) using Bowtie 2 (version 2.2.4). The number of read-pairs that aligned cleanly was then counted.

This provides a count of human-like reads in the library. For most samples these reads are a small fraction of the total. However, a few samples have much larger counts, suggesting that substantial contamination with human material may have occurred. A spreadsheet with details is here.
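To put a count in context, it can be expressed as a percentage of the read-pairs in the library. The sketch below is only an illustration and was not part of the original analysis: the function names `count_fastq_records` and `human_percent` are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Hypothetical helpers, not part of the original analysis.

# Number of records in an uncompressed FASTQ file.
# Assumption: every record occupies exactly 4 lines.
count_fastq_records() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Percentage of read-pairs that aligned to the human reference.
# $1 = count of aligned read-pairs, $2 = total read-pairs in the library.
human_percent() {
    awk -v h="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "NA"; exit }
        printf "%.2f\n", 100 * h / t
    }'
}
```

For example, `human_percent 250 10000` prints `2.50`. The total can be taken from one of the two read files, since each read-pair contributes one record to each file.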

This technique is not intended to be perfect, but it provides a rapid estimate. For RNA contamination the result will be an under-count, because reads that span splice junctions are interrupted by introns in the genomic reference and fail to align, so they are not counted. Similarly, read-ends that do not align in the expected paired-end fashion are not counted.

Example of the commands used:

# align reads to the genome reference - output temporary file (AALA.sam)
bowtie2 --phred64 --no-unal -x GCF_000001405.29_GRCh38.p3_genomic \
   -1 AALA-read_1.fq -2 AALA-read_2.fq -S AALA.sam

# print first read of properly-mapped (flag 64+2) read-pairs and count (lines)
samtools view -f 66 AALA.sam | wc -l
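
The value 66 passed to `-f` is the sum of two SAM flag bits: 2 (each segment properly aligned) and 64 (first segment in the template). `samtools view -f` keeps a record only when all of the requested bits are set in its FLAG field. As a minimal illustration of that bit test, using the hypothetical helper name `has_all_flags`:

```shell
# Hypothetical helper illustrating the bit test behind `samtools view -f`.
# Succeeds when every bit in $2 (required) is set in $1 (the record's FLAG).
has_all_flags() {
    [ $(( $1 & $2 )) -eq "$2" ]
}

# 99 = 1 (paired) + 2 (proper pair) + 32 (mate reverse) + 64 (first in pair)
has_all_flags 99 66 && echo "99: counted"
# 147 = 1 + 2 + 16 (reverse) + 128 (second in pair): second read, so skipped
has_all_flags 147 66 || echo "147: not counted"
```

Because only the first read of each properly aligned pair passes the filter, each such pair contributes exactly one line to the count.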

SUMMARY OF RESULTS

Here are the latest results, BEFORE manual inspection by our plant experts. Detailed analysis reports are available: 1328_statistics_final.xls and 1328_blast_info_2.xls.

...