HAMRLINC v1.0

Rationale and Background

HAMRLINC is a multipurpose toolbox that streamlines the analysis pipelines of the HAMR and Evolinc-i algorithms. The former was developed by Paul Ryvkin et al., and the latter by Andrew D.L. Nelson et al. HAMRLINC makes the original methods more accessible by automating their tedious pre-processing steps, and it extends their functionality with built-in post-processing steps. Users can perform transcript abundance quantification, lincRNA identification, and RNA modification prediction, and receive the results in intuitive output formats. HAMRLINC is high-throughput and performs these analyses at the scale of an entire BioProject.

Minimum requirements

Reference genome (FASTA)
Reference annotation (GFF3)
Raw reads/SRA IDs (paired-end or single-end)
Input reads description file (CSV)
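The reference genome and the reference annotation are supplied as separate files. Annotation-aware pipelines commonly fail when the sequence IDs in the FASTA headers do not match the seqid column (column 1) of the GFF3. The Python sketch below is a hypothetical pre-flight check and is not part of HAMRLINC. It assumes that HAMRLINC, like most such tools, requires every GFF3 seqid to have a matching FASTA record, and it reports any seqids that do not.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "r")


def fasta_ids(path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file."""
    ids = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff3_seqids(path):
    """Return the set of seqids (column 1) used by feature lines of a GFF3 file."""
    seqids = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more feature lines
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, directives, and comments
            seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(fasta_path, gff3_path):
    """Sorted GFF3 seqids that have no matching FASTA record."""
    return sorted(gff3_seqids(gff3_path) - fasta_ids(fasta_path))
```

For example, if unmatched_seqids("genome.fa", "annotation.gff3") returns an empty list, every annotated sequence is present in the genome. A non-empty result usually points to a naming mismatch, such as "chr1" versus "Chr1", that should be fixed before uploading. The file names are placeholders.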

Prerequisites

A CyVerse account (register at https://user.cyverse.org)
An up-to-date web browser with Java enabled
The following fields of the app launch form (required unless marked optional):
a. Analysis Name, Comments, and Output Folder
  i. Use the default analysis name or edit it to a more descriptive one.
  ii. Use the comment box to add a description of the analysis run. This is optional.
  iii. Select the folder in which the analysis output folder will be saved.

b. Inputs and Parameters

i. Select the location of the reference genome

ii. Enter the size of the reference genome in base pairs.

iii. Select the location of the reference annotation

iv. Select the location of the input reads description file

v. Enter the read length of the input reads.

vi. Check the HAMRBOX box to activate RNA modification annotation analysis.

vii. Check the Evolinc-i box to activate lincRNA identification.

viii. Check the transcript abundance quantification box to activate featureCounts.

Note: If any of the three boxes is left unchecked, the corresponding analysis arm of the pipeline is deactivated. At least one box must be checked for the pipeline to run.
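Items ii and v ask for the genome size in base pairs and the read length. If these values are not already known, they can be computed locally. The Python sketch below is a minimal helper under stated assumptions and is not part of HAMRLINC. It takes the genome size to be the total number of bases across all FASTA records, and it takes the read length to be the most common sequence length among the first 100,000 reads of a standard four-line FASTQ file, either plain or gzip-compressed. The function names are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "r")


def genome_size_bp(fasta_path):
    """Total number of bases across all FASTA records (header lines excluded)."""
    total = 0
    with _open_text(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def modal_read_length(fastq_path, max_reads=100_000):
    """Most common read length among the first `max_reads` records.

    Assumes standard four-line FASTQ records (header, sequence, '+', qualities).
    """
    lengths = Counter()
    with _open_text(fastq_path) as handle:
        for i, line in enumerate(islice(handle, max_reads * 4)):
            if i % 4 == 1:  # the second line of each record is the sequence
                lengths[len(line.strip())] += 1
    if not lengths:
        raise ValueError(f"no reads found in {fastq_path}")
    return lengths.most_common(1)[0][0]
```

For example, genome_size_bp("genome.fa") gives the value for item ii, and modal_read_length("sample_R1.fastq.gz") gives the value for item v. The file names are placeholders. For pre-trimmed or variable-length reads, the modal length is only an approximation, so compare it with how the reads were sequenced.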

c. Output folder name

i. Enter a name for the output folder of the analysis run.

d. Options

i. A few optional flags are provided so that users can control the stringency of each processing step of the pipeline. See the HAMRLINC documentation website for a description of each optional flag.

Example Run

An example run can be performed by launching the HAMRLINC app with the default selections. This example uses a subset of the RNA-Seq data generated by Yu et al. (2021); see that paper for more information on the data. The subset run takes approximately 12 hours, depending on the computing resources allocated to it.

Output

All HAMRLINC outputs are organized into subdirectories of the user-defined output folder. When run with all three core analyses enabled, HAMRLINC produces ten subdirectories in the output folder. Three subdirectories contain key intermediates, such as genome index files, sub-annotations, and trimmed FASTQ files, which can be reused in downstream processing of the user's choice. Three subdirectories contain the raw output of each of the three core analyses, and one subdirectory contains the visualizations and post-HAMR analysis results.