MUSCLE
How to download the tool or source code including installation and usage instructions as well as any source code that might be associated with the executable. This should also include a listing of any dependencies for this tool or script.
Download the MUSCLE binary archive for your platform from http://www.drive5.com/muscle/downloads.htm into a directory of your choice.
Extract the archive using tar -zxvf <filename>.
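For scripted installs, the tar -zxvf step can be reproduced with Python's standard tarfile module. The sketch below is not part of the MUSCLE distribution. The helper name extract_archive is hypothetical. It assumes a gzip-compressed tarball and also sets the execute bits on the extracted files, like chmod +x, so the binary can be run directly.

```python
import stat
import tarfile
from pathlib import Path


def extract_archive(archive, dest="."):
    """Hypothetical helper: unpack a .tar.gz (like `tar -zxvf`) and mark files executable."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Only regular files are extracted; directory entries are skipped.
        members = [m for m in tar.getmembers() if m.isfile()]
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can refuse unsafe entries (absolute paths, "..", etc.).
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    extracted = []
    for member in members:
        path = dest / member.name
        # Equivalent of `chmod +x`: add execute permission for user, group and other.
        path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        extracted.append(path)
    return extracted
```

Usage: extract_archive(path_to_downloaded_archive, "bin") returns the paths of the extracted files.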
Required version of the program necessary to perform the desired task
muscle3.8.31
Sample dataset and expected results to be output
Excerpt from sample input file ago.fa (FASTA format sequence file, either protein or nucleic acid):
>AGO1 | Arabidopsis thaliana | protein sequence | AGO1 Group (Dicots and Monocots)
MVRKRRTDAPSEGGEGSGSREAGPVSGGGRGSQRGGFQQGGGQHQGGRGYTPQPQQGGRG
GRGYGQPPQQQQQYGGPQEYQGRGRGGPPHQGGRGGYGGGRGGGPSSGPPQRQSVPELHQ
ATSPTYQAVSSQPTLSEVSPTQVPEPTVLAQQFEQLSVEQGAPSQAIQPIPSSSKAFKFP
MRPGKGQSGKRCIVKANHFFAELPDKDLHHYDVTITPEVTSRGVNRAVMKQLVDNYRDSH
LGSRLPAYDGRKSLYTAGPLPFNSKEFRINLLDEEVGAGGQRREREFKVVIKLVARADLH
HLGMFLEGKQSDAPQEALQVLDIVLRELPTSRYIPVGRSFYSPDIGKKQSLGDGLESWRG
FYQSIRPTQMGLSLNIDMSSTAFIEANPVIQFVCDLLNRDISSRPLSDADRVKIKKALRG
VKVEVTHRGNMRRKYRISGLTAVATRELTFPVDERNTQKSVVEYFHETYGFRIQHTQLPC
LQVGNSNRPNYLPMEVCKIVEGQRYSKRLNERQITALLKVTCQRPIDREKDILQTVQLND
YAKDNYAQEFGIKISTSLASVEARILPPPWLKYHESGREGTCLPQVGQWNMMNKKMINGG
TVNNWICINFSRQVQDNLARTFCQELAQMCYVSGMAFNPEPVLPPVSARPEQVEKVLKTR
YHDATSKLSQGKEIDLLIVILPDNNGSLYGDLKRICETELGIVSQCCLTKHVFKMSKQYM
ANVALKINVKVGGRNTVLVDALSRRIPLVSDRPTIIFGADVTHPHPGEDSSPSIAAVVAS
QDWPEITKYAGLVCAQAHRQELIQDLFKEWKDPQKGVVTGGMIKELLIAFRRSTGHKPLR
IIFYRDGVSEGQFYQVLLYELDAIRKACASLEAGYQPPVTFVVVQKRHHTRLFAQNHNDR
HSVDRSGNILPGTVVDSKICHPTEFDFYLCSHAGIQGTSRPAHYHVLWDENNFTADGLQS
LTNNLCYTYARCTRSVSIVPPAYYAHLAAFRARFYMEPETSDSGSMASGSMARGGGMAGR
STRGPNVNAAVRPLPALKENVKRVMFYC
>AGO704 | Oryza sativa ssp. japonica | protein sequence | AGO1 Group (Dicots and Monocots)
MEGGGGRGGYRGDGDGGYGRGGGGYHGDGERGYGRGGGGGGGGGGGYRGDDEGRSSYGRA
RGGGGGGGGYHGDGEAGYGRGRGGRDYDGGRGGGGRRGGRGGGGSSYHQQPPPDLPQAPE
PRLAAQYAREIDIAALRAQFKGLTTTTPGAASSQFPARPGFGAAGEECLVKVNHFFVGLK
NDNFHHYDVAIAPDPVLKGLFRTIISKLVTERRHTDFGGRLPVYDGRANLYTAGELPFRS
Excerpt from sample output file ago_aligned.fa (aligned FASTA format) generated using muscle -in ago.fa -out ago_aligned.fa:
>AGO704 | Oryza sativa ssp. japonica | protein sequence | AGO1 Group (Dicots and Monocots)
----MEGGGGRGGYRGDGDGGYGRGGGGYHGDGERGYGRGGGGGGGGGGGYRG------D
DEGRSSYGRARGGGGGGGG------------------------YHGDGEAGY--------
--------------G---RGRGGRDYDGGRG---GGGRRGGRGGGGSSYHQQ--PPPDLP
QAPEPRLAAQYA----------------REIDIAALRAQFKGLTTTTPGAAS--------
------SQFPARPGFGAAGEECLVKVNHFFVGL---KNDNFHHYDVAIAPDPVLKGLFRT
IISKLVTERRHTDFGGRLPVYDGRANLYTAGELPFRSRELEVEL--------------SG
SRKFKVAIRHVAPVSLQDLRMVMAGCPAGIPSQALQLLDIVLRDMVLAERNDMGYVAFGR
SYFSPGLGSRE-LDKGIFAWKGFYQSCRVTQQGLSLNIDMSSTAFIEPGRVLNFVEKAIG
RRITNAITV-GYFLNNYGNELMRTLKGVKVEVTHRGNLRKKYRIAGFTEQSADVQTFTSS
DG--IKTVKEYFNKKYNLKLAFGYLPCLQVGSKERPNYLPMELCNIVPGQRYKNRLSPTQ
VSNLINITNDRPCDRESSIRQTVSSNQYNSTERADEFGIEVDSYPTTLKARVLKAPMLKY
HDSGRVRVCTPEDGAWNMKDKKVVNGATIKSWACVNLCEGLDNRVVEAFCLQLVRTSKIT
GLDFA-NVSLPILKADPHNVKTDLPMRYQEACSWSRDNK---ID-LLLVVMTDDKNNASL
YGDVKRICETEIGVLSQCCRAKQVYKERNVQYCANVALKINAKAGGRNSVFLN-VEASLP
VVSKSPTIIFGADVTHPGSFDESTPSIASVVASADWPEVTKYNSVVRMQASRKEIIQDL-
------------DSIVRELLNAFKRDSKMEPKQLIFYRDGVSEGQFQQVVESEIPEIEKA
WKSLYAG-KPRITFIVVQKRHHTRLFPNNYNDPRGMDGTGNVRPGTVVDTVICHPREFDF
FLCSQAGIKGTSRPSHYHVLRDDNNFTADQLQSVTNNLCYLYTSCTRSVSIPPPVYYAHK
LAFRARFYLTQVPVAGG----------------------DPGAAKFQWVLPEIKEEVKKS
MFFC
>AGO716 | Oryza sativa ssp. japonica | protein sequence | AGO1 Group (Dicots and Monocots)
MESQRMT-------------------------------------------WLY------D
RHHSLKHNKAER------------------------------------------------
------------------------------------------------------------
QAILSTYRLAKR------------------------------------------------
---------PNLSSEGMIGESCIVRTNCFSVHLESLDDQTIYEYDVCVTPEV---GINRA
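An output file such as ago_aligned.fa can be sanity-checked in two ways. First, every row should have the same length. Second, removing the gap characters should give back the input sequence with the same header. For example, the first aligned line of AGO704 above, with its dashes removed, matches the first line of AGO704 in ago.fa. The Python sketch below does both checks. The function names are hypothetical. It assumes "-" is the only gap character, matches records by their full header line (the output order may differ from the input because of -group), and compares residues case-insensitively.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs, joining wrapped lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_length(records):
    """Return the common row length of an alignment; raise if rows differ."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return lengths.pop()


def residues_preserved(unaligned, aligned):
    """True if each aligned row, with '-' removed, equals the input sequence of the same header."""
    original = {h: s.upper() for h, s in unaligned}
    return all(original.get(h) == s.replace("-", "").upper() for h, s in aligned)
```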
Set of parameters and command line switches that match the expected execution of the tool including the possible command line definitions according to the occurrence of optional parameters. Also, validation instructions for parameters are requested.
There are two types of command-line options: value options and flag options. Value options are followed by the value of the given parameter, for example -in <filename>. Flag options stand on their own, for example -msf. Every option is a single dash (not two dashes) followed by a long name; there are no single-letter equivalents. Value options must be separated from their values by white space on the command line. MUSCLE therefore does not follow Unix, Linux or POSIX option conventions, as its own manual acknowledges. The order in which options are given is irrelevant unless two options contradict, in which case the right-most option silently wins.
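It is easy to get this syntax wrong when MUSCLE is called from a script, for example by writing --in. The Python sketch below assumes the caller passes bare option names. The hypothetical helper muscle_argv then adds a single leading dash to each name and keeps every value as its own argument, which separates it from the option name when the command runs. It also rejects names that already start with a dash and single-letter names. Passing the resulting list to subprocess.run avoids shell quoting problems.

```python
import shlex


def _option(name):
    """Turn a bare option name into MUSCLE's single-dash form, rejecting common mistakes."""
    if name.startswith("-"):
        raise ValueError(f"pass the bare name, not {name!r}; exactly one dash is added")
    if len(name) < 2:
        raise ValueError(f"{name!r}: MUSCLE has no single-letter options")
    return "-" + name


def muscle_argv(values=None, flags=(), exe="muscle"):
    """Build an argument list: value options as ['-name', 'value'] pairs, then flags."""
    argv = [exe]
    for name, value in (values or {}).items():
        argv += [_option(name), str(value)]  # the value is a separate token
    for name in flags:
        argv.append(_option(name))
    return argv


if __name__ == "__main__":
    argv = muscle_argv({"in": "ago.fa", "out": "ago_aligned.fa", "maxiters": 2}, ["quiet"])
    print(shlex.join(argv))  # muscle -in ago.fa -out ago_aligned.fa -maxiters 2 -quiet
```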
Parameter | Description | Required | Default value | Type (text, number, or file name) | Validation rules |
---|---|---|---|---|---|
anchorspacing | Minimum spacing between anchor columns; used for tree-dependent refinement | Y | 32 | integer | >= 0 |
center | Center parameter; used when specifying a protein substitution matrix | Y | [1] | floating point | <= 0 |
cluster1, cluster2 | Clustering method; cluster1 is used in iterations 1 and 2, cluster2 in later iterations | Y | upgmb | text | one of upgma, upgmb, neighborjoining |
clwout | Write output in CLUSTALW format to the given file name | N | none | file name | |
diagbreak | Maximum distance between two diagonals that allows them to merge into one diagonal | Y | 1 | integer | >= 1 |
diaglength | Minimum length of diagonal | Y | 24 | integer | >= 1 |
diagmargin | Discard this many positions at the ends of a diagonal | Y | 5 | integer | >= 0 |
distance1 | Distance measure for iteration 1 | Y | kmer6_6 (amino) | text | kmer6_6 |
distance2 | Distance measure for iterations 2, 3, ... | Y | pctid_kimura | text | pctid_kimura |
fastaout | Write output in FASTA format to the given file name | N | none | file name | |
gapopen | The gap open score | Y | [1] | floating point | must be negative |
hydro | Window size for determining whether a region is hydrophobic | Y | 5 | integer | >= 1 |
hydrofactor | Multiplier for gap open/close penalties in hydrophobic regions | Y | 1.2 | floating point | |
in | Input file containing the sequences to be aligned | Y | standard input | file name | |
in1 | First input alignment; used with the -profile and -spscore flags (see Flags) | N | none | file name | |
in2 | Second input alignment; used with the -profile and -spscore flags (see Flags) | N | none | file name | |
log | Log file name (an existing file is deleted) | N | none | file name | |
loga | Log file name (appends to an existing file) | N | none | file name | |
matrix | File name for a substitution matrix in NCBI or WU-BLAST format. If specified, you must also specify -gapopen <g>, -gapextend <e> and -center 0.0 (<g> and <e> must be negative) | N | none | file name | |
maxhours | Maximum time to run in hours; the actual time may exceed the requested limit | N | none | floating point | decimals are allowed, so 1.5 means one hour and 30 minutes |
maxiters | Maximum number of iterations | Y | 16 | integer | >= 1 |
maxtrees | Maximum number of new trees to build in iteration 2 | Y | 1 | integer | >= 1 |
minbestcolscore | Minimum score a column must have to be an anchor | Y | [1] | floating point | |
minsmoothscore | Minimum smoothed score a column must have to be an anchor | Y | [1] | floating point | |
msfout | Write output in MSF format to the given file name | N | none | file name | |
objscore | Objective score used by tree-dependent refinement | Y | spm | text | one of sp, ps, dp, xp, spf, spm |
out | Where to write the alignment | Y | standard output | file name | |
phyiout | Write output in Phylip interleaved format to the given file name | N | none | file name | |
physout | Write output in Phylip sequential format to the given file name | N | none | file name | |
refinewindow | Length of window for -refinew | Y | 200 | integer | |
root1, root2 | Method used to root the tree; root1 is used in iterations 1 and 2, root2 in later iterations | Y | pseudo | text | one of pseudo, midlongestspan, minavgleafdist |
scorefile | File name where a score file is written. It contains one line for each column in the alignment; each line contains the letters in the column followed by the average BLOSUM62 score over pairs of letters in the column | N | none | file name | |
seqtype | Sequence type of the input file | Y | auto | text | one of protein, nucleo, auto |
smoothscoreceil | Maximum value of column score for smoothing purposes | Y | [1] | floating point | |
smoothwindow | Window used for anchor column smoothing | Y | 7 | integer | |
spscore | Compute SP objective score of multiple alignment | N | none | file name | |
SUEFF | Constant used in UPGMB clustering; determines the relative fraction of average linkage (SUEFF) vs. nearest-neighbor linkage (1 - SUEFF) | Y | 0.1 | floating point | 0 < x < 1 |
tree1, tree2 | Save the tree produced in the first or second iteration to the given file in Newick (Phylip-compatible) format | N | none | file name | |
usetree | Use the given tree as guide tree; must be in Newick (Phylip-compatible) format | N | none | file name | |
weight1, weight2 | Sequence weighting scheme; weight1 is used in iterations 1 and 2, weight2 for tree-dependent refinement | Y | clustalw | text | one of none, clustalw, henikoff, henikoffpb, gsc, threeway |
[1] The default depends on the profile scoring function. To determine it, run with -verbose -log <filename> and check the log file.
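The numeric validation rules in the table can be checked before MUSCLE is launched. The Python sketch below encodes only the rules the table states explicitly and leaves out parameters whose validation column is empty. The names RULES, validate and maxhours_to_hm are hypothetical, and MUSCLE's own checking may be stricter or more lenient. The maxhours_to_hm helper reproduces the worked example from the maxhours row: 1.5 hours is 1 hour and 30 minutes.

```python
# Hypothetical rule table: name -> (type, predicate, rule text), copied from the
# "Validation rules" column above; parameters with no stated rule are omitted.
RULES = {
    "anchorspacing": (int, lambda v: v >= 0, ">= 0"),
    "center": (float, lambda v: v <= 0, "<= 0"),
    "diagbreak": (int, lambda v: v >= 1, ">= 1"),
    "diaglength": (int, lambda v: v >= 1, ">= 1"),
    "diagmargin": (int, lambda v: v >= 0, ">= 0"),
    "gapopen": (float, lambda v: v < 0, "must be negative"),
    "hydro": (int, lambda v: v >= 1, ">= 1"),
    "maxiters": (int, lambda v: v >= 1, ">= 1"),
    "maxtrees": (int, lambda v: v >= 1, ">= 1"),
    "SUEFF": (float, lambda v: 0 < v < 1, "0 < x < 1"),
}


def validate(name, raw):
    """Convert a command-line string to the parameter's type and check its rule."""
    if name not in RULES:
        raise KeyError(f"no validation rule recorded for -{name}")
    kind, ok, rule = RULES[name]
    try:
        value = kind(raw)  # int("1.5") fails, so integers reject decimals
    except ValueError:
        raise ValueError(f"-{name}: {raw!r} is not a valid {kind.__name__}") from None
    if not ok(value):
        raise ValueError(f"-{name}: {value} violates rule {rule}")
    return value


def maxhours_to_hm(hours):
    """Split a fractional -maxhours value into whole hours and minutes (1.5 -> (1, 30))."""
    return divmod(round(hours * 60), 60)
```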
Flags
Flag option | Set by default? | Description |
---|---|---|
anchors | Y | Use anchor optimization in tree-dependent refinement iterations |
brenner | N | Use Steven Brenner's method for computing the root alignment |
cluster | N | Perform fast clustering of input sequences; use the -tree1 option to save the tree |
dimer | N | Use dimer approximation for the SP score (faster, slightly less accurate) |
clw | N | Write output in CLUSTALW format (default is FASTA) |
clwstrict | N | Write output in CLUSTALW format with the "CLUSTAL W (1.81)" header rather than the MUSCLE version |
core | Y | Do not catch exceptions |
diags | N | Use diagonal optimizations; faster, especially for closely related sequences, but may be less accurate |
diags1 | N | Use diagonal optimizations in the first iteration |
diags2 | N | Use diagonal optimizations in the second iteration |
fasta | Y | Write output in FASTA format |
group | Y | Group similar sequences together in output |
html | N | Write output in HTML format (default is FASTA) |
le | Y (amino acids) | Use log-expectation profile score (VTML240); alternatives are -sp or -sv. This is the default for amino acid sequences |
msf | N | Write output in MSF format, designed to be compatible with the GCG package |
noanchors | N | Disable anchor optimization; default is -anchors |
nocore | N | Catch exceptions and give an error message if possible |
phyi | N | Write output in Phylip interleaved format |
phys | N | Write output in Phylip sequential format |
profile | N | Compute profile-profile alignment; input alignments must be given using the -in1 and -in2 options |
quiet | N | Do not display progress messages |
refine | N | Input file is already aligned; skip the first two iterations and begin tree-dependent refinement |
refinew | N | Refine an alignment by dividing it into non-overlapping windows and re-aligning each window; typically used for whole-genome nucleotide alignments |
sp | N | Use sum-of-pairs protein profile score (PAM200); default is -le |
spscore | N | Compute alignment score of profile-profile alignment. Input alignments must be given using the -in1 and -in2 options; they must be pre-aligned with gapped columns as needed, i.e. have the same number of columns |
spn | Y (nucleotides) | Use sum-of-pairs nucleotide profile score. This is the only option for nucleotides and is therefore the default; the substitution and gap penalty scores are "borrowed" from BLASTZ |
stable | N | Preserve the input order of sequences in the output file; default is to group sequences by similarity (-group) |
sv | N | Use sum-of-pairs profile score (VTML240); default is -le |
termgaps4 | Y | Use 4-way test for treatment of terminal gaps (cannot be disabled in this version) |
termgapsfull | N | Terminal gaps penalized with full penalty |
termgapshalf | Y | Terminal gaps penalized with half penalty |
termgapshalflonger | N | Terminal gaps penalized with half penalty if the gap is relative to the longer sequence, otherwise with full penalty |
verbose | N | Write parameter settings and progress messages to the log file |
version | N | Write version string to stdout and exit |
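Several flags above are alternatives to one another, for example -anchors and -noanchors, or the output-format flags. The option rules say that when two options contradict, the right-most one silently wins. The Python sketch below applies that rule. For each group it reports which flag takes effect, and it lists the earlier flags that were silently overridden. The grouping in FLAG_GROUPS comes from reading the descriptions in this table, not from MUSCLE's source, and the function name is hypothetical.

```python
# Assumed groups of mutually exclusive flags, inferred from the flag descriptions.
FLAG_GROUPS = {
    "output_format": ("clw", "clwstrict", "fasta", "html", "msf", "phyi", "phys"),
    "profile_score": ("le", "sp", "sv", "spn"),
    "anchors": ("anchors", "noanchors"),
    "exceptions": ("core", "nocore"),
    "output_order": ("group", "stable"),
    "terminal_gaps": ("termgapsfull", "termgapshalf", "termgapshalflonger"),
}
_GROUP_OF = {flag: group for group, flags in FLAG_GROUPS.items() for flag in flags}


def resolve_flags(argv):
    """Apply the right-most-wins rule to the grouped flags in argv.

    Returns (winners, overridden): winners maps each group present in argv to
    its effective flag; overridden lists flags superseded by a later one.
    """
    winners, overridden = {}, []
    for token in argv:
        if not token.startswith("-"):
            continue  # program name, file names, option values
        name = token[1:]
        group = _GROUP_OF.get(name)
        if group is None:
            continue  # value options and ungrouped flags are ignored here
        previous = winners.get(group)
        if previous is not None and previous != name:
            overridden.append(previous)
        winners[group] = name
    return winners, overridden
```

For example, the argument list from muscle -in seqs.fa -clw -msf resolves to MSF output, and -clw is reported as overridden.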
Example invocation of the command line application and its associated parameters such that it can perform an analysis
Simplest case:
muscle -in ago.fa -out ago_aligned.fa
Refine an alignment:
muscle -in seqs.afa -out refined.afa -refine
Using a pre-computed guide tree:
muscle -in seqs.fa -out seqs.afa -usetree mytree.phy
Output alignment to multiple file formats:
muscle -in seqs.fa -fastaout seqs.afa -clwout seqs.aln
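The multiple-format invocation can also be driven from Python. The sketch below builds the command from a mapping of format names to output files and runs it with subprocess. FORMAT_OPTIONS covers only the file-output options listed in the value-option table (-fastaout, -clwout, -phyiout, -physout), and its keys are our own labels. The helper names are hypothetical, and the sketch assumes the executable is on PATH as muscle.

```python
import shutil
import subprocess

# File-output options from the value-option table, keyed by our own format labels.
FORMAT_OPTIONS = {
    "fasta": "-fastaout",
    "clustalw": "-clwout",
    "phylip_interleaved": "-phyiout",
    "phylip_sequential": "-physout",
}


def multi_format_command(infile, outputs, exe="muscle"):
    """Build e.g. ['muscle', '-in', 'seqs.fa', '-fastaout', 'seqs.afa', '-clwout', 'seqs.aln']."""
    cmd = [exe, "-in", str(infile)]
    for fmt, path in outputs.items():
        if fmt not in FORMAT_OPTIONS:
            raise ValueError(f"unsupported format {fmt!r}; choose from {sorted(FORMAT_OPTIONS)}")
        cmd += [FORMAT_OPTIONS[fmt], str(path)]
    return cmd


def run_muscle(cmd):
    """Run the command; fail early if the executable is missing, and on a non-zero exit."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} was not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Usage: run_muscle(multi_format_command("seqs.fa", {"fasta": "seqs.afa", "clustalw": "seqs.aln"})) corresponds to the last example above.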
Reference
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797.
doi:10.1093/nar/gkh340