FALCON-formatter takes FASTQ and FASTA files from a Pacific Biosciences sequencer and formats them for de novo assembly with FALCON.
Although storing all reads in a single FASTA or FASTQ file is more convenient, Dazzler (and therefore FALCON) does not accept that kind of input. All inputs MUST be in FASTA format, with files split by barcode, set, and part number. In other words, each input file must hold the reads for exactly one combination of fields 1-6 in the example header below.
m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230
1yymmdd_hhmmss 33333 4444444444444444444444444444444444 55 66 777 8888888888
- Field 1: “m” = movie
- Field 2: Time of Run Start (yymmdd_hhmmss)
- Field 3: Instrument Serial Number
- Field 4: SMRT Cell Barcode
- Field 5: Set Number
- Field 6: Part Number
- Field 7: ZMW hole number*
- Field 8: Subread Region (start_stop using polymerase read coordinates)*

* These fields are only used in FASTA/Q headers.
More information about file formats can be found at the SMRT-Analysis wiki.
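As a rough illustration of the header layout described above, the following Python sketch breaks a subread header into its eight fields. It is not part of FALCON-formatter; the names `HEADER_RE`, `SubreadHeader`, and `parse_header` are hypothetical, and the pattern is inferred only from the single example header shown here, so real data may need a looser pattern.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example header above (an assumption, not an
# official specification of the PacBio naming scheme).
HEADER_RE = re.compile(
    r"m(?P<run_start>\d{6}_\d{6})"      # fields 1-2: "m" + yymmdd_hhmmss
    r"_(?P<instrument>[^_/]+)"          # field 3: instrument serial number
    r"_(?P<barcode>[^_/]+)"             # field 4: SMRT Cell barcode
    r"_(?P<set_number>s\d+)"            # field 5: set number
    r"_(?P<part_number>p\d+)"           # field 6: part number
    r"/(?P<zmw>\d+)"                    # field 7: ZMW hole number
    r"/(?P<start>\d+)_(?P<stop>\d+)"    # field 8: subread region
)


class SubreadHeader(NamedTuple):
    movie: str        # fields 1-6 joined, i.e. everything before the first "/"
    run_start: str
    instrument: str
    barcode: str
    set_number: str
    part_number: str
    zmw: int
    start: int
    stop: int


def parse_header(line: str) -> SubreadHeader:
    """Parse one FASTA (">") or FASTQ ("@") header line into its fields."""
    tokens = line.lstrip(">@").split()
    name = tokens[0] if tokens else ""   # ignore any trailing annotations
    match = HEADER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized subread header: {line!r}")
    f = match.groupdict()
    return SubreadHeader(
        movie=name.split("/", 1)[0],
        run_start=f["run_start"],
        instrument=f["instrument"],
        barcode=f["barcode"],
        set_number=f["set_number"],
        part_number=f["part_number"],
        zmw=int(f["zmw"]),
        start=int(f["start"]),
        stop=int(f["stop"]),
    )
```

The `movie` attribute holds fields 1-6, which is the part of the header that must be the same for every read in a given input file.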
Below is an example that demonstrates this requirement and process by correctly splitting the file Example.fasta.
In the 4 headers, there are two unique 1-6 field sets:
All subreads corresponding to these headers need to be in their own files, so Example.fasta would be split accordingly:
FALCON-formatter takes FASTA/Q files or folders of files as input, converts any FASTQ to FASTA, and writes each read to a file corresponding to fields 1 through 6.
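The splitting step can be sketched in a few lines of Python. This is only an approximation of the behavior described above, not the tool's actual code: the function names `read_records` and `split_by_movie` are hypothetical, the output naming scheme (`<fields 1-6>.fasta`) is an assumption, FASTQ records are assumed to span exactly four lines, and all reads are held in memory, which would not suit a full-size dataset.

```python
from collections import defaultdict
from pathlib import Path


def read_records(path):
    """Yield (header, sequence) pairs from a FASTA or FASTQ file.

    The format is guessed from the first character of the file.
    FASTQ records are assumed to be exactly four lines long.
    """
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    if not lines:
        return
    if lines[0].startswith("@"):
        # FASTQ: header, sequence, "+" separator, quality (dropped here)
        for i in range(0, len(lines) - 3, 4):
            yield lines[i][1:], lines[i + 1]
    else:
        # FASTA: sequences may be wrapped over several lines
        header, chunks = None, []
        for line in lines:
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)


def split_by_movie(inputs, out_dir):
    """Write each read to <out_dir>/<fields 1-6>.fasta; return read counts."""
    groups = defaultdict(list)
    for path in inputs:
        for header, seq in read_records(path):
            # Fields 1-6 are everything before the first "/" in the header.
            movie = header.split("/", 1)[0]
            groups[movie].append((header, seq))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for movie, records in groups.items():
        with open(out_dir / f"{movie}.fasta", "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
    return {movie: len(records) for movie, records in groups.items()}
```

Calling `split_by_movie(["Example.fasta"], "split")` would produce one FASTA file per unique combination of fields 1-6 found in the input.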
First, find the FALCON-formatter app in the HPC app catalog and launch it. Then, click on the “Inputs” drop-down arrow to designate your inputs.
Then, click the browse button to open up a file explorer to choose your input.
Select either a single FASTA/FASTQ file or a whole folder to process.
Click “Launch Analysis” to start your job. You’ll get notifications when the program starts and when it finishes.