MUSCLE

How to download the tool or source code including installation and usage instructions as well as any source code that might be associated with the executable. This should also include a listing of any dependencies for this tool or script.
Copy the appropriate muscle binary file from http://www.drive5.com/muscle/downloads.htm to a directory.
Extract the file using tar -zxvf filename
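
The two steps above can be wrapped in a small helper so that the extracted binary is reachable under the plain name `muscle`. This is only a sketch: the function name `install_muscle` is hypothetical, and it assumes the archive holds a single file whose name starts with `muscle`.

```shell
# Hypothetical helper: unpack a downloaded MUSCLE archive and expose the
# binary under the plain name "muscle".
#   $1 = path to the downloaded .tar.gz file
#   $2 = directory to install into
install_muscle() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Same extraction step as above, but into the chosen directory.
    tar -zxf "$archive" -C "$dest" || return 1
    # Assumption: the archive holds one regular file named muscle*.
    binary=$(find "$dest" -maxdepth 1 -type f -name 'muscle*' | head -n 1)
    if [ -z "$binary" ]; then
        echo "no muscle binary found in $archive" >&2
        return 1
    fi
    chmod +x "$binary"
    # Relative symlink so the install directory can be moved as a whole.
    ln -sf "$(basename "$binary")" "$dest/muscle"
}
```

After calling `install_muscle filename "$HOME/bin"`, adding `$HOME/bin` to `PATH` lets the example commands in this document run unchanged.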

Required version of the program necessary to perform the desired task
muscle3.8.31
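
A quick way to confirm the installed release is to inspect the string printed by the `-version` flag. The sketch below assumes that string contains the release as `v3.8.31`; the helper name `is_required_muscle` is hypothetical.

```shell
# Succeed only if the given version line mentions release 3.8.31.
#   $1 = version line, e.g. the output of `muscle -version`
# Assumption: the release appears in the line as "v3.8.31".
is_required_muscle() {
    case "$1" in
        *v3.8.31*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical use: `is_required_muscle "$(muscle -version)" || echo "unexpected MUSCLE version" >&2`.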

Sample dataset and expected results to be output

Excerpt from sample input file ago.fa (FASTA format sequence file, either protein or nucleic acid)

>AGO1 | Arabidopsis thaliana | protein sequence | AGO1 Group (Dicots and Monocots)
MVRKRRTDAPSEGGEGSGSREAGPVSGGGRGSQRGGFQQGGGQHQGGRGYTPQPQQGGRG
GRGYGQPPQQQQQYGGPQEYQGRGRGGPPHQGGRGGYGGGRGGGPSSGPPQRQSVPELHQ
ATSPTYQAVSSQPTLSEVSPTQVPEPTVLAQQFEQLSVEQGAPSQAIQPIPSSSKAFKFP
MRPGKGQSGKRCIVKANHFFAELPDKDLHHYDVTITPEVTSRGVNRAVMKQLVDNYRDSH
LGSRLPAYDGRKSLYTAGPLPFNSKEFRINLLDEEVGAGGQRREREFKVVIKLVARADLH
HLGMFLEGKQSDAPQEALQVLDIVLRELPTSRYIPVGRSFYSPDIGKKQSLGDGLESWRG
FYQSIRPTQMGLSLNIDMSSTAFIEANPVIQFVCDLLNRDISSRPLSDADRVKIKKALRG
VKVEVTHRGNMRRKYRISGLTAVATRELTFPVDERNTQKSVVEYFHETYGFRIQHTQLPC
LQVGNSNRPNYLPMEVCKIVEGQRYSKRLNERQITALLKVTCQRPIDREKDILQTVQLND
YAKDNYAQEFGIKISTSLASVEARILPPPWLKYHESGREGTCLPQVGQWNMMNKKMINGG
TVNNWICINFSRQVQDNLARTFCQELAQMCYVSGMAFNPEPVLPPVSARPEQVEKVLKTR
YHDATSKLSQGKEIDLLIVILPDNNGSLYGDLKRICETELGIVSQCCLTKHVFKMSKQYM
ANVALKINVKVGGRNTVLVDALSRRIPLVSDRPTIIFGADVTHPHPGEDSSPSIAAVVAS
QDWPEITKYAGLVCAQAHRQELIQDLFKEWKDPQKGVVTGGMIKELLIAFRRSTGHKPLR
IIFYRDGVSEGQFYQVLLYELDAIRKACASLEAGYQPPVTFVVVQKRHHTRLFAQNHNDR
HSVDRSGNILPGTVVDSKICHPTEFDFYLCSHAGIQGTSRPAHYHVLWDENNFTADGLQS
LTNNLCYTYARCTRSVSIVPPAYYAHLAAFRARFYMEPETSDSGSMASGSMARGGGMAGR
STRGPNVNAAVRPLPALKENVKRVMFYC
>AGO704 | Oryza sativa ssp. japonica | protein sequence | AGO1 Group (Dicots and Monocots)
MEGGGGRGGYRGDGDGGYGRGGGGYHGDGERGYGRGGGGGGGGGGGYRGDDEGRSSYGRA
RGGGGGGGGYHGDGEAGYGRGRGGRDYDGGRGGGGRRGGRGGGGSSYHQQPPPDLPQAPE
PRLAAQYAREIDIAALRAQFKGLTTTTPGAASSQFPARPGFGAAGEECLVKVNHFFVGLK
NDNFHHYDVAIAPDPVLKGLFRTIISKLVTERRHTDFGGRLPVYDGRANLYTAGELPFRS

Excerpt from sample output file ago_aligned.fa (aligned FASTA format) generated using muscle -in ago.fa -out ago_aligned.fa

>AGO704 | Oryza sativa ssp. japonica | protein sequence | AGO1 Group (Dicots and Monocots)
----MEGGGGRGGYRGDGDGGYGRGGGGYHGDGERGYGRGGGGGGGGGGGYRG------D
DEGRSSYGRARGGGGGGGG------------------------YHGDGEAGY--------
--------------G---RGRGGRDYDGGRG---GGGRRGGRGGGGSSYHQQ--PPPDLP
QAPEPRLAAQYA----------------REIDIAALRAQFKGLTTTTPGAAS--------
------SQFPARPGFGAAGEECLVKVNHFFVGL---KNDNFHHYDVAIAPDPVLKGLFRT
IISKLVTERRHTDFGGRLPVYDGRANLYTAGELPFRSRELEVEL--------------SG
SRKFKVAIRHVAPVSLQDLRMVMAGCPAGIPSQALQLLDIVLRDMVLAERNDMGYVAFGR
SYFSPGLGSRE-LDKGIFAWKGFYQSCRVTQQGLSLNIDMSSTAFIEPGRVLNFVEKAIG
RRITNAITV-GYFLNNYGNELMRTLKGVKVEVTHRGNLRKKYRIAGFTEQSADVQTFTSS
DG--IKTVKEYFNKKYNLKLAFGYLPCLQVGSKERPNYLPMELCNIVPGQRYKNRLSPTQ
VSNLINITNDRPCDRESSIRQTVSSNQYNSTERADEFGIEVDSYPTTLKARVLKAPMLKY
HDSGRVRVCTPEDGAWNMKDKKVVNGATIKSWACVNLCEGLDNRVVEAFCLQLVRTSKIT
GLDFA-NVSLPILKADPHNVKTDLPMRYQEACSWSRDNK---ID-LLLVVMTDDKNNASL
YGDVKRICETEIGVLSQCCRAKQVYKERNVQYCANVALKINAKAGGRNSVFLN-VEASLP
VVSKSPTIIFGADVTHPGSFDESTPSIASVVASADWPEVTKYNSVVRMQASRKEIIQDL-
------------DSIVRELLNAFKRDSKMEPKQLIFYRDGVSEGQFQQVVESEIPEIEKA
WKSLYAG-KPRITFIVVQKRHHTRLFPNNYNDPRGMDGTGNVRPGTVVDTVICHPREFDF
FLCSQAGIKGTSRPSHYHVLRDDNNFTADQLQSVTNNLCYLYTSCTRSVSIPPPVYYAHK
LAFRARFYLTQVPVAGG----------------------DPGAAKFQWVLPEIKEEVKKS
MFFC
>AGO716 | Oryza sativa ssp. japonica | protein sequence | AGO1 Group (Dicots and Monocots)
MESQRMT-------------------------------------------WLY------D
RHHSLKHNKAER------------------------------------------------
------------------------------------------------------------
QAILSTYRLAKR------------------------------------------------
---------PNLSSEGMIGESCIVRTNCFSVHLESLDDQTIYEYDVCVTPEV---GINRA
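
In an aligned FASTA file every record, gaps included, should have the same length. The following sketch checks that property on an output file such as ago_aligned.fa; the function names are hypothetical and the check is not part of MUSCLE itself.

```shell
# Print "name<TAB>length" for each record of a FASTA file.
# Gap characters count toward the length.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($0, 2)
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Succeed only if all records in the file share one length.
is_rectangular() {
    n=$(fasta_lengths "$1" | awk -F '\t' '{ print $NF }' | sort -u | wc -l | tr -d ' ')
    [ "$n" -eq 1 ]
}
```

For example, `is_rectangular ago_aligned.fa && echo "alignment is rectangular"`.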

Set of parameters and command line switches that match the expected execution of the tool including the possible command line definitions according to the occurrence of optional parameters. Also, validation instructions for parameters are requested.
There are two types of command-line options: value options and flag options. Value options are followed by the value of the given parameter, for example -in <filename>; flag options just stand for themselves, such as -msf. All options consist of a single dash (not two dashes) followed by a long name; there are no single-letter equivalents. Value options must be separated from their values by white space on the command line. Thus, muscle does not follow Unix, Linux or POSIX standards, for which we apologize. The order in which options are given is irrelevant unless two options contradict each other, in which case the right-most option silently wins.
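
Because these conventions differ from most Unix tools, a wrapper script may want to catch the two most likely slips before calling muscle: a double dash, and a value glued to its option with `=`. The sketch below is illustrative only; `lint_muscle_args` is a hypothetical name and the check is not part of MUSCLE.

```shell
# Report arguments written in a style that the option rules above exclude.
# Returns non-zero if any such argument is found.
lint_muscle_args() {
    status=0
    for arg in "$@"; do
        case "$arg" in
            --*)
                echo "use a single dash: $arg" >&2
                status=1 ;;
            -*=*)
                echo "separate option and value with white space: $arg" >&2
                status=1 ;;
        esac
    done
    return $status
}
```

For example, `lint_muscle_args -in ago.fa -out ago_aligned.fa -maxiters 2` succeeds, while `lint_muscle_args --in ago.fa` and `lint_muscle_args -in=ago.fa` both fail with a message.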

actual command-line parameter	name and brief description of the parameter	required	default value	text, number, or name of file	description of validation rules
anchorspacing	minimum spacing between anchor columns; used for tree-dependent refinements	Y	32	integer	>=0
center	center parameter; used when specifying a protein substitution matrix	Y	[1]	floating point	<=0
cluster1 cluster2	Clustering method; cluster 1 is used in iteration 1 and 2, cluster 2 in later iterations	Y	upgmb	text	upgma upgmb neighborjoining
clwout	Write output in CLUSTALW format to given file name	N	none	file name
diagbreak	Maximum distance between two diagonals that allows them to merge into one diagonal	Y	1	integer	>=1
diaglength	Maximum length of diagonal	Y	24	integer	>=1
diagmargin	Discard this many positions at ends of diagonal	Y	5	integer	>=0
distance1	Distance measure for iteration 1	Y	kmer6_6 (amino) kmer4_6 (nucleo)	text	kmer6_6 kmer20_3 kmer20_4 kbit20_3 kmer4_6
distance2	Distance measure for iterations 2, 3, ...	Y	pctid_kimura	text	pctid_kimura pctid_log
fastaout	Write output in FASTA format to given file	N	none	file name
gapopen	The gap open score	Y	[1]	floating point	must be negative
hydro	Window size for determining whether a region is hydrophobic	Y	5	integer	>=1
hydrofactor	Multiplier for gap open/close penalties in hydrophobic regions	Y	1.2	floating point
in	Input file; file that contains the sequences to be aligned	Y	standard input	file name
in1	Input alignment; file that contains an alignment to be refined or appended to the sequences	N	none	file name
in2	Input alignment; file that contains an alignment to be refined or appended to the sequences	N	none	file name
log	Log file name (delete existing file)	N	none	file name
loga	Log file name (append to existing file)	N	none	file name
matrix	File name for substitution matrix in NCBI or WU-BLAST format. If specified, must also specify -gapopen <g>, -gapextend <e>, -center 0.0 (<g> and <e> must be negative)	N	none	file name
maxhours	Maximum time to run in hours. The actual time may exceed the requested limit	N	none	floating point	Decimals are allowed so 1.5 means one hour and 30 minutes
maxiters	Maximum number of iterations	Y	16	integer	>=1
maxtrees	Maximum number of new trees to build in iteration 2	Y	1	integer	>=1
minbestcolscore	Minimum score a column must have to be an anchor	Y	[1]	floating point
minsmoothscore	Minimum smoothed score a column must have to be an anchor	Y	[1]	floating point
msfout	Write output to given file name in MSF format	N	none	file name
objscore	Objective score used by tree dependent refinement: sp = sum-of-pairs score; spf = sum-of-pairs score (dimer approximation); spm = sp for <100 seqs, otherwise spf; dp = dynamic programming score; ps = average profile-sequence score; xp = cross profile score	Y	spm	text	sp ps dp xp spf spm
out	Where to write the alignment	Y	standard out	file name
phyiout	Write output in Phylip interleaved format to given file name	N	none	file name
physout	Write output in Phylip sequential format to given file name	N	none	file name
refinewindow	Length of window for -refinew	Y	200	integer
root1 root2	Method used to root tree; root1 is used in iteration 1 and 2, root2 in later iterations	Y	pseudo	text	pseudo midlongestspan minavgleafdist
scorefile	File name where to write a score file. This contains one line for each column in the alignment. The line contains the letters in the column followed by the average BLOSUM62 score over pairs of letters in the column	N	none	file name
seqtype	Sequence type found in input file	Y	auto	text	protein nucleo auto
smoothscoreceil	Maximum value of column score for smoothing purposes	Y	[1]	floating point
smoothwindow	Window used for anchor column smoothing	Y	7	integer
spscore	Compute SP objective score of multiple alignment	N	none	file name
SUEFF	Constant used in UPGMB clustering. Determines the relative fraction of average linkage (SUEFF) vs. nearest-neighbor linkage (1-SUEFF)	Y	0.1	floating point	0<x<1
tree1 tree2	Save tree produced in first or second iteration to given file in Newick (Phylip-compatible) format	N	none	file name
usetree	Use given tree as guide tree. Must be in Newick (Phylip-compatible) format	N	none	file name
weight1 weight2	Sequence weighting scheme. weight1 is used in iterations 1 and 2; weight2 is used for tree-dependent refinement. none = all sequences have equal weight; henikoff = Henikoff & Henikoff weighting scheme; henikoffpb = modified Henikoff scheme as used in PSI-BLAST; clustalw = CLUSTALW method; threeway = Gotoh three-way method	Y	clustalw	text	none henikoff henikoffpb gsc clustalw threeway

[1] Default depends on the profile scoring function. To determine the default, use -verbose -log and check the log file.
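
The validation column of the table can be turned into a pre-flight check. The sketch below covers only a handful of options (integer ranges, the seqtype keywords and the SUEFF interval) and passes anything else through unchecked; `validate_muscle_option` is a hypothetical helper, not something shipped with MUSCLE.

```shell
# Validate one option/value pair against rules taken from the table above.
#   $1 = option name without the leading dash, $2 = value
# Options not listed here are accepted without checking.
validate_muscle_option() {
    opt=$1
    val=$2
    case "$opt" in
        maxiters|maxtrees|diagbreak|diaglength|hydro)
            # integer, >= 1
            case "$val" in ''|*[!0-9]*) return 1 ;; esac
            [ "$val" -ge 1 ] ;;
        anchorspacing|diagmargin)
            # integer, >= 0
            case "$val" in ''|*[!0-9]*) return 1 ;; esac ;;
        seqtype)
            case "$val" in protein|nucleo|auto) ;; *) return 1 ;; esac ;;
        SUEFF)
            # floating point, strictly between 0 and 1
            awk -v x="$val" 'BEGIN { exit !(x ~ /^[0-9]*\.?[0-9]+$/ && x > 0 && x < 1) }' ;;
        *)
            return 0 ;;
    esac
}
```

For example, `validate_muscle_option maxiters 16` succeeds and `validate_muscle_option seqtype dna` fails.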

Flags

Flag Option	Set by default?	Description
anchors	Y	Use anchor optimization in tree dependent refinement iterations
brenner	N	Use Steven Brenner's method for computing the root alignment
cluster	N	Perform fast clustering of input sequences. Use the -tree1 option to save the tree
dimer	N	Use dimer approximation for the SP score (faster, slightly less accurate)
clw	N	Write output in CLUSTALW format (default is FASTA)
clwstrict	N	Write output in CLUSTALW format with the "CLUSTAL W (1.81)" header rather than the MUSCLE version
core	Y	Do not catch exceptions
diags	N	Use diagonal optimizations. Faster, especially for closely related sequences, but may be less accurate
diags1	N	Use diagonal optimizations in first iteration
diags2	N	Use diagonal optimizations in second iteration
fasta	Y	Write output in FASTA format
group	Y	Group similar sequences together in output
html	N	Write output in HTML format (default is FASTA)
le	?	Use log-expectation profile score (VTML240). Alternatives are to use -sp or -sv. This is the default for amino acid sequences
msf	N	Write output in MSF format. Designed to be compatible with the GCG package
noanchors	N	Disable anchor optimization. Default is -anchors
nocore	N	Catch exceptions and give an error message if possible
phyi	N	Write output in Phylip interleaved format
phys	N	Write output in Phylip sequential format
profile	N	Compute profile-profile alignment. Input alignments must be given using -in1 and -in2 options
quiet	N	Do not display progress messages
refine	N	Input file is already aligned, skip first two iterations and begin tree dependent refinement
refinew	N	Refine an alignment by dividing it into non-overlapping windows and re-aligning each window. Typically used for whole-genome nucleotide alignments
sp	N	Use sum-of-pairs protein profile score (PAM200). Default is -le
spscore	N	Compute alignment score of profile-profile alignment. Input alignments must be given using -in1 and -in2 options. These must be pre-aligned with gapped columns as needed, i.e. must be of the same length (have same number of columns).
spn	?	Use sum-of-pairs nucleotide profile score. This is the only option for nucleotides, and is therefore the default. The substitution scores and gap penalty scores are "borrowed" from BLASTZ.
stable	N	Preserve input order of sequences in output file. Default is to group sequences by similarity (-group). WARNING: THIS OPTION WAS BUGGY AND IS NOT SUPPORTED IN v3.8.
sv	N	Use sum-of-pairs profile score (VTML240). Default is -le.
termgaps4	Y	Use 4-way test for treatment of terminal gaps. (Cannot be disabled in this version).
termgapsfull	N	Terminal gaps penalized with full penalty. Not fully supported in this version.
termgapshalf	Y	Terminal gaps penalized with half penalty. Not fully supported in this version.
termgapshalflonger	N	Terminal gaps penalized with half penalty if gap relative to longer sequence, otherwise with full penalty. Not fully supported in this version.
verbose	N	Write parameter settings and progress messages to log file.
version	N	Write version string to stdout and exit.

Example invocation of the command line application and its associated parameters such that it can perform an analysis
Simplest case:
```
muscle -in ago.fa -out ago_aligned.fa
```
Refine an alignment
```
muscle -in seqs.afa -out refined.afa -refine
```
Using a pre-computed guide tree
```
muscle -in seqs.fa -out seqs.afa -usetree mytree.phy
```
Output alignment to multiple file formats
```
muscle -in seqs.fa -fastaout seqs.afa -clwout seqs.aln
```
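
When several input files need aligning, the simplest case can be looped over a directory. This is a sketch under stated assumptions: inputs end in `.fa`, outputs are written next to them with the `.afa` extension used in the examples above, and the `MUSCLE` environment variable (hypothetical, not read by MUSCLE itself) lets the caller point at a specific binary.

```shell
# Align every *.fa file in a directory, writing <name>.afa beside it.
#   $1 = directory containing FASTA input files
# MUSCLE may be set to the path of the binary; it defaults to "muscle".
align_all() {
    dir=$1
    for fa in "$dir"/*.fa; do
        # Skip the unexpanded pattern when no .fa files exist.
        [ -e "$fa" ] || continue
        out="${fa%.fa}.afa"
        "${MUSCLE:-muscle}" -in "$fa" -out "$out" -quiet || return 1
    done
}
```

Writing to `.afa` rather than `.fa` keeps a second run from picking up earlier outputs as inputs.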

Reference
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797.
doi:10.1093/nar/gkh340
