CLUSTALW_Workflow_Example

Purpose

  • This example workflow demonstrates how to use CLUSTALW to perform a multiple sequence alignment from the command line. It also shows how to run the program in non-interactive mode, which is the first step toward wrapping it programmatically.
  • The starting point is a set of DNA sequences.

Prerequisites

  • Access to a Linux/Unix shell
  • This workflow assumes that you have the BioPerl libraries and the CLUSTALW binary executable compiled and installed, with the executable in your path, either by putting it in /usr/local/bin or by editing your $PATH environment variable.
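
A quick way to confirm these prerequisites is sketched below. It assumes the executable is called clustalw, as in the examples on this page (some installations may use a different name), and it tests BioPerl by loading the Bio::Seq module. The check_tool helper is a made-up name for illustration.

```shell
# Print where a program was found, or say that it is missing.
# check_tool is a hypothetical helper name, not part of CLUSTALW or BioPerl.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: $(command -v "$1")"
  else
    echo "$1: not found in PATH"
  fi
}

check_tool clustalw
check_tool perl

# BioPerl is a set of Perl modules rather than a program, so try loading one.
if perl -MBio::Seq -e 1 >/dev/null 2>&1; then
  echo "BioPerl: available"
else
  echo "BioPerl: not available to this perl"
fi
```

If clustalw is reported as missing, either copy the binary into a directory that is already on $PATH or add its directory to $PATH before continuing.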

    Note on CLUSTALW

    Note that there are many ways to do multiple sequence alignments. This is just one example.

The DNA sequences

  • I downloaded coding sequences (CDS) for actin genes from five metazoan species from NCBI.
  • A complete CDS starts at the start codon (ATG; methionine) and ends at a stop codon (TAG, TGA or TAA).
  • Codons are three nucleotide units that encode particular amino acids or the stop-translation signal.
  • This is a sample CDS for C. elegans, in FASTA format:
    >c_elegans
    ATGTGTGACGACGAGGTTGCCGCTCTTGTTGTAGACAATGGATCCGGAATGTGCAAGGCCGGATTCGCCG
    GAGACGACGCTCCACGCGCCGTGTTCCCATCCATTGTCGGAAGACCACGTCATCAAGGAGTCATGGTCGG
    TATGGGACAGAAGGACTCGTACGTCGGAGACGAGGCCCAATCCAAGAGAGGTATCCTTACCCTCAAGTAC
    CCAATTGAGCACGGTATCGTCACCAACTGGGATGATATGGAGAAGATCTGGCATCACACCTTCTACAATG
    AGCTTCGTGTTGCCCCAGAAGAGCACCCAGTCCTCCTCACTGAAGCCCCACTCAATCCAAAGGCTAACCG
    TGAAAAGATGACCCAAATCATGTTCGAGACCTTCAACACCCCAGCCATGTATGTCGCCATCCAAGCTGTC
    CTCTCCCTCTACGCTTCCGGACGTACCACCGGAGTCGTCCTCGACTCTGGAGATGGTGTCACCCACACCG
    TCCCAATCTACGAAGGATATGCCCTCCCACACGCCATCCTCCGTCTTGACTTGGCTGGACGTGATCTTAC
    TGATTACCTCATGAAGATCCTTACCGAGCGTGGTTACTCTTTCACCACCACCGCTGAGCGTGAAATCGTC
    CGTGACATCAAGGAGAAGCTCTGCTACGTCGCCCTCGACTTCGAGCAAGAAATGGCCACCGCCGCTTCTT
    CCTCTTCCCTCGAGAAGTCCTACGAACTTCCTGACGGACAAGTCATCACCGTCGGAAACGAACGTTTCCG
    TTGCCCAGAGGCTATGTTCCAGCCATCCTTCTTGGGTATGGAGTCCGCCGGAATCCACGAGACTTCTTAC
    AACTCCATCATGAAGTGCGACATTGATATCCGTAAGGACTTGTACGCCAACACTGTTCTTTCCGGAGGAA
    CCACCATGTACCCAGGAATTGCTGATCGTATGCAGAAGGAAATCACCGCTCTTGCCCCATCAACCATGAA
    GATCAAGATCATCGCCCCACCAGAGCGCAAGTACTCCGTCTGGATCGGAGGATCTATCCTCGCTTCCCTC
    TCCACCTTCCAACAGATGTGGATCTCCAAGCAAGAATACGACGAGTCCGGCCCATCCATCGTTCACCGCA
    AGTGCTTCTAA
    
  • View the whole FASTA file
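
The CDS properties described above (an ATG start codon, a stop codon at the end, and a length that is a multiple of three) can be checked before aligning. The following is a minimal sketch using awk. The file toy_cds.fa, its two sequences, and the check_cds function are invented for illustration and are not the real actin data.

```shell
# Toy input, made up for illustration (not the real actin sequences).
cat > toy_cds.fa <<'EOF'
>toy_complete
ATGGCC
AAATAA
>toy_no_stop
ATGGCCAAA
EOF

# Join the wrapped sequence lines of each record, then test the CDS properties.
check_cds() {
  awk '
    function report(   n, ok) {
      if (name == "") return
      n = length(seq)
      ok = (substr(seq, 1, 3) == "ATG") &&
           (substr(seq, n - 2) ~ /^(TAA|TAG|TGA)$/) &&
           (n % 3 == 0)
      printf "%s\tlength=%d\t%s\n", name, n, (ok ? "complete" : "incomplete")
    }
    /^>/ { report(); name = substr($1, 2); seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
    END { report() }
  ' "$1"
}

check_cds toy_cds.fa
```

Running check_cds actin.fa on the real file should report one line per species. A sequence flagged as incomplete is not necessarily wrong, but it is worth a look before aligning.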

Doing the multiple sequence alignment with CLUSTALW

Menu-driven interface

  • CLUSTALW can be run from the command line
  • It is a binary executable that uses interactive menus
  • A basic multiple sequence alignment starts with loading the file (select option 1, then enter the filename, actin.fa)
    $ clustalw 
    
     **************************************************************
     ******** CLUSTAL 2.0.9 Multiple Sequence Alignments  ********
     **************************************************************
    
    
         1. Sequence Input From Disc
         2. Multiple Alignments
         3. Profile / Structure Alignments
         4. Phylogenetic trees
    
         S. Execute a system command
         H. HELP
         X. EXIT (leave program)
    
    
    Your choice: 
    
    
  • Then choose option 2 to open the multiple alignment menu.
    ****** MULTIPLE ALIGNMENT MENU ******
    
    
        1.  Do complete multiple alignment now Slow/Accurate
        2.  Produce guide tree file only
        3.  Do alignment using old guide tree file
    
        4.  Toggle Slow/Fast pairwise alignments = SLOW
    
        5.  Pairwise alignment parameters
        6.  Multiple alignment parameters
    
        7.  Reset gaps before alignment? = OFF
        8.  Toggle screen display          = ON
        9.  Output format options
        I. Iteration = NONE
    
        S.  Execute a system command
        H.  HELP
        or press [RETURN] to go back to main menu
    
    
    Your choice: 
    
  • Then select option 1, and choose the default output file names when prompted. The alignments will be performed and saved to a file as well as printed to the screen.
    Enter a name for the CLUSTAL output file  [actin.aln]: 
    Start of Pairwise alignments
    Aligning...
    
    Sequences (1:2) Aligned. Score:  85
    Sequences (1:3) Aligned. Score:  88
    Sequences (1:4) Aligned. Score:  90
    Sequences (1:5) Aligned. Score:  89
    Sequences (2:3) Aligned. Score:  83
    Sequences (2:4) Aligned. Score:  86
    Sequences (2:5) Aligned. Score:  85
    Sequences (3:4) Aligned. Score:  86
    Sequences (3:5) Aligned. Score:  86
    Sequences (4:5) Aligned. Score:  94
    Enter name for new GUIDE TREE           file   [actin.dnd]: 
    
    Guide tree file created:   [actin.dnd]
    
    There are 4 groups
    Start of Multiple Alignment
    
    Aligning...
    Group 1: Sequences:   2      Score:19741
    Group 2: Sequences:   2      Score:20738
    Group 3: Sequences:   4      Score:19601
    Group 4: Sequences:   5      Score:19209
    Alignment Score 74162
    
    Consensus length = 1131 
    
    CLUSTAL-Alignment file created  [actin.aln]
    
    
    CLUSTAL 2.0.9 multiple sequence alignment
    
    
    b_xylophilus        ATGTGTGACGAAGAAGTTGCCGCTCTTGTTGTGGACAATGGCTCCGGTATGTGCAAAGCC
    p_magellanicus      ATGTGTGACGACGAGGTAGCAGCTTTAGTAGTAGACAATGGCTCCGGTATGTGCAAGGCC
    c_elegans           ATGTGTGACGACGAGGTTGCCGCTCTTGTTGTAGACAATGGATCCGGAATGTGCAAGGCC
    c_briggsae          ATGTGTGACGACGAGGTTGCAGCTCTCGTAGTGGACAATGGCTCCGGAATGTGCAAGGCC
    c_oncophora         ATGTGTGACGACGAGGTTGCTGCTCTTGTGGTTGACAATGGATCCGGAATGTGCAAAGCC
                        *********** ** ** ** *** * ** ** ******** ***** ******** ***
    
    b_xylophilus        GGTTTCGCCGGAGATGATGCCCCACGTGCCGTCTTCCCCTCCATTGTCGGAAGACCCCGT
    p_magellanicus      GGGTTCGCCGGAGACGATGCTCCACGCGCTGTGTTCCCCTCCATTGTTGGAAGGCCCCGT
    c_elegans           GGATTCGCCGGAGACGACGCTCCACGCGCCGTGTTCCCATCCATTGTCGGAAGACCACGT
    c_briggsae          GGATTTGCCGGAGACGATGCTCCACGCGCCGTCTTCCCATCCATCGTTGGACGCCCAAGA
    c_oncophora         GGATTTGCCGGAGATGACGCTCCTCGAGCTGTCTTCCCCTCCATCGTCGGCCGACCCCGT
                        ** ** ******** ** ** ** ** ** ** ***** ***** ** **  * **  * 
    
    b_xylophilus        CATCAAGGTGTCATGGTCGGTATGGGACAGAAGGACTCCTATGTCGGAGACGAGGCCCAG
    p_magellanicus      CACCAGGGTGTCATGGTTGGTATGGGTCAGAAAGACAGCTACGTAGGAGATGAAGCTCAG
    c_elegans           CATCAAGGAGTCATGGTCGGTATGGGACAGAAGGACTCGTACGTCGGAGACGAGGCCCAA
    c_briggsae          CATCAAGGAGTCATGGTCGGTATGGGACAGAAGGACTCGTACGTCGGAGACGAGGCTCAA
    c_oncophora         CACCAGGGTGTCATGGTTGGTATGGGACAGAAGGACTCGTACGTAGGAGACGAGGCTCAG
    
    Press [RETURN] to continue or  X  to stop:
    
  • You are done. The alignment is saved in actin.aln and the guide tree in actin.dnd.
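
As a sanity check on the output, the alignment file can be summarized from the shell. In CLUSTAL format, the line under each block marks columns where all sequences are identical with an asterisk. The sketch below counts the sequences and those columns. The file toy.aln and the summarize_aln function are made up for illustration. For the real result, pass actin.aln instead.

```shell
# Toy alignment in CLUSTAL format, made up for illustration.
cat > toy.aln <<'EOF'
CLUSTAL 2.0.9 multiple sequence alignment


seq_a      ATGTGTGACG
seq_b      ATGTGAGACG
           ***** ****
EOF

summarize_aln() {
  # The first line of a CLUSTAL file starts with the word CLUSTAL.
  head -n 1 "$1" | grep -q '^CLUSTAL' || { echo "$1: no CLUSTAL header" >&2; return 1; }
  # Sequence lines start with a name; conservation lines start with a space.
  nseq=$(awk 'NR > 1 && /^[^ ]/ { print $1 }' "$1" | sort -u | wc -l | tr -d ' ')
  nstar=$(grep '^ ' "$1" | tr -cd '*' | wc -c | tr -d ' ')
  echo "sequences=$nseq identical_columns=$nstar"
}

summarize_aln toy.aln
```

For the toy file this prints sequences=2 identical_columns=9, since the two toy sequences differ at a single position.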

Using CLUSTALW non-interactively

  • A menu-driven interface is not useful for pipelines or programmatic access.
  • Fortunately, we can run the application by passing the menu choices via STDIN.
  • This is accomplished by creating a text file that contains the sequence of commands:
    1
    actin.fa
    2
    1
    actin.aln
    actin.dnd
    X
    X
    X
    
  • Annotated version:
    • select menu option one, load the input file
      1
      actin.fa
      
    • select option 2 (multiple alignments); option 1 runs the alignment.
      2
      1
      
    • provide output file names for the alignments and guide tree files
      actin.aln
      actin.dnd
      
    • exit from the alignment display, then the alignment menu, then the main menu
      X
      X
      X
      
  • To run the program non-interactively, save the commands as clustalw_commands.txt, then run CLUSTALW using this incantation:
    $ clustalw <clustalw_commands.txt
    
  • Program output will scroll rapidly on screen, and the multiple sequence alignment will be saved in actin.aln.
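
The steps above can be put into one small script. This is a sketch rather than a definitive wrapper: it writes the same command file with a here-document, runs CLUSTALW only if the executable and actin.fa are present, and sends the scrolling output to a log file instead of the screen. The log file name clustalw.log is an arbitrary choice, not something CLUSTALW requires.

```shell
# Write the menu answers to a file, one answer per line.
cat > clustalw_commands.txt <<'EOF'
1
actin.fa
2
1
actin.aln
actin.dnd
X
X
X
EOF

# Assumption: the executable is called "clustalw", as in the examples above.
if command -v clustalw >/dev/null 2>&1 && [ -f actin.fa ]; then
  # Feed the answers on STDIN and keep the output in a log file
  # (clustalw.log is a hypothetical name chosen for this sketch).
  clustalw < clustalw_commands.txt > clustalw.log 2>&1
  # The pairwise scores can be pulled back out of the log.
  grep 'Aligned. Score' clustalw.log || true
else
  echo "clustalw or actin.fa not found; skipping the run" >&2
fi
```

Keeping the answers in a file makes the run repeatable, but it also ties the script to the exact order of the menu prompts, so the command file may need adjusting if a different CLUSTALW version asks its questions in a different order.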

Other ways to access CLUSTALW non-interactively