Rationale and background:
MaxBin-2.2 is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can recover the underlying bins (genomes) of the microbes in their metagenomes by providing assembled metagenomic sequences together with either read coverage information or the sequencing reads. For convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content and genome size, in the binning summary page.
Users can use MEGAN or similar software on MaxBin bins to find the taxonomy of each bin after the binning process is finished.
Wu YW, Tang YH, Tringe SG, Simmons BA, and Singer SW, "MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm", Microbiome, 2:26, 2014.
- A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
- Mandatory arguments
- Contig file name
- Output file name
- At least one of the following
Contig abundance files / a list file of all contig abundance files.
Reads file in fasta or fastq format / a list file of all reads files.
- Optional arguments
Reassembly (this option is still highly experimental. To use it, you need to provide MaxBin with an "interleaved paired-end" fastq or fasta reads file)
Prob_threshold (minimum probability for EM algorithm; default 0.8)
Markerset (By default MaxBin will look for 107 marker genes present in >95% of bacteria. Alternatively, you can choose the 40 marker gene set that is universal among bacteria and archaea (Wu et al., PLoS ONE 2013). This option may be better suited for environments dominated by archaea; however, it tends to split genomes into more bins. You can try both marker gene sets and see which one works better).
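For readers who run MaxBin from a command line rather than through the CyVerse app, the arguments above can be collected into a command as in the minimal Python sketch below. The script name run_MaxBin.pl and the flag spellings -contig, -out, -abund, -reads, -prob_threshold and -markerset are assumptions based on the argument names listed here (only -reassembly and -plotmarker are named on this page), so check them against your installation's help output. The helper build_maxbin_command is hypothetical and only builds the argument list; it does not run anything.

```python
from typing import List, Optional


def build_maxbin_command(
    contig: str,
    out: str,
    abund: Optional[str] = None,
    reads: Optional[str] = None,
    reassembly: bool = False,
    prob_threshold: float = 0.8,
    markerset: int = 107,
    executable: str = "run_MaxBin.pl",  # assumed script name
) -> List[str]:
    """Build an argument list for a MaxBin run (nothing is executed here)."""
    # At least one of the abundance file or the reads file is mandatory.
    if abund is None and reads is None:
        raise ValueError("provide an abundance file, a reads file, or both")
    # Only the 107-gene and 40-gene marker sets are described.
    if markerset not in (107, 40):
        raise ValueError("markerset must be 107 or 40")
    if not 0.0 < prob_threshold <= 1.0:
        raise ValueError("prob_threshold must be in (0, 1]")
    # Reassembly needs an interleaved paired-end reads file.
    if reassembly and reads is None:
        raise ValueError("reassembly requires a reads file")

    cmd = [executable, "-contig", contig, "-out", out]  # assumed flag names
    if abund is not None:
        cmd += ["-abund", abund]
    if reads is not None:
        cmd += ["-reads", reads]
    if reassembly:
        cmd.append("-reassembly")
    cmd += ["-prob_threshold", str(prob_threshold)]
    cmd += ["-markerset", str(markerset)]
    return cmd


if __name__ == "__main__":
    # File names taken from the test data set described on this page.
    print(" ".join(build_maxbin_command("20x.scaffold", "out", abund="20x.abund")))
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting problems.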
The following test data are provided for testing MaxBin-2.2 at /iplant/home/shared/iplantcollaborative/example_data/Maxbin.sample.data:
- Contigs file (20x.scaffold)
- Abundance file (20x.abund)
- Reads file (20x.reads)
Assume your output file header is (out). MaxBin will generate the following files using this header:
- (out).0XX.fasta -- the sequences of bin number XX, e.g. out.001.fasta
- (out).summary -- a summary file describing which contigs are classified into which bin.
- (out).log -- a log file recording the core steps of the MaxBin algorithm
- (out).marker -- marker gene presence numbers for each bin. This table is ready to be plotted by R or other 3rd-party software.
- (out).marker.pdf -- visualization of the marker gene presence numbers using R. Will only appear if -plotmarker is specified.
- (out).noclass -- this file stores all sequences that pass the minimum length threshold but are not classified successfully.
- (out).tooshort -- this file stores all sequences that do not meet the minimum length threshold.
- (out).marker_of_each_gene.tar.gz -- this tarball file stores all markers predicted from the individual bins. Use "tar -zxvf (out).marker_of_each_gene.tar.gz" to extract the markers [(out).0XX.marker.fasta].
- (out)_reassem/(out).reads.0XX -- the collected reads for bin 0XX. Will only appear if -reassembly is specified.
- (out)_reassem/(out).reads.noclass -- reads that cannot be assigned to any bin. Will only appear if -reassembly is specified.
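Once a run has finished, the per-bin FASTA files can be inspected with a short script. The sketch below is a minimal illustration, not part of MaxBin: the helper names fasta_stats and summarize_bins are hypothetical, and it assumes only the (out).0XX.fasta naming described above. It counts sequences, total bases and GC percentage for each bin, and skips files such as (out).0XX.marker.fasta. GC percentage is computed over all bases in the file, including any N characters, so it may differ slightly from the value MaxBin reports.

```python
import glob
import os
import re
from typing import Dict, Tuple


def fasta_stats(path: str) -> Tuple[int, int, float]:
    """Return (sequence count, total bases, GC percent) for one FASTA file."""
    n_seqs = total = gc = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_seqs += 1  # header line starts a new sequence
            else:
                seq = line.upper()
                total += len(seq)
                gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / total if total else 0.0
    return n_seqs, total, gc_percent


def summarize_bins(out_prefix: str) -> Dict[str, Tuple[int, int, float]]:
    """Collect stats for every (out).NNN.fasta file sharing out_prefix."""
    base = os.path.basename(out_prefix)
    # Match only "<base>.<digits>.fasta", not "<base>.<digits>.marker.fasta".
    pattern = re.compile(re.escape(base) + r"\.(\d+)\.fasta")
    stats = {}
    for path in sorted(glob.glob(glob.escape(out_prefix) + ".*.fasta")):
        match = pattern.fullmatch(os.path.basename(path))
        if match:
            stats[match.group(1)] = fasta_stats(path)
    return stats


if __name__ == "__main__":
    for bin_id, (n, size, gc) in summarize_bins("out").items():
        print(f"bin {bin_id}: {n} contigs, {size} bp, GC {gc:.1f}%")
```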