Mako is in active development, and not all features are supported yet. Key features still to be implemented:
  • Certain QC visualisations, such as the metagene plot
  • Transcriptome to genome mapping
  • Post-differential expression calling visualisation
Bug reports are very welcome. Please submit them to our GitHub Issues tracker.

Parameters

Input/output options

Define where the pipeline should find input data and save output data.

Parameter | Description | Default
dataset_name | The name of this dataset, used in output files and within visualisations. |
samplesheet | Path to a comma-separated file containing information about the samples in the experiment. |
outdir | The output directory where the results will be saved. You have to use absolute paths to storage on cloud infrastructure. |
publish_dir_mode | How to copy output from the work directory to the output directory (accepted: copy, symlink). | copy

Pipeline parameters

Parameter | Description | Default
differential_model | One or more models to use to call differentially modified sites, given as a comma-separated list (accepted: adaptive_binomial, homo_norm, hetero_norm, binomial, beta_binomial). | adaptive_binomial
transcriptome | Reference transcriptome, in .fasta format. |
gtf | GTF annotation file. |

Modification calling parameters

Settings that control which sites are selected for differential calling.

Parameter | Description | Default
min_reads_per_sample | How many reads each sample must have at a site for the site to be considered. | 5
mod_threshold_dorado | Passing threshold used with Dorado data to call a base as modified. | 0.5
mod_filter_dorado | Reads with a modification probability within this interval are filtered out and not used in the Dorado model. The format is a,b, e.g. 0.2,0.8. By default, no filtering is done. |
mod_threshold_m6anet | Passing threshold used with m6Anet data to call a base as modified. | 0.033379376
mod_filter_m6anet | Reads with a modification probability within this interval are filtered out and not used in the m6Anet model. The format is a,b, e.g. 0.2,0.8. By default, no filtering is done. |
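
To make the interaction between these settings concrete, here is a minimal Python sketch of how a threshold, an optional filter interval and a minimum read count could be combined at a single site. It is an illustration only, not Mako's implementation. The function names are hypothetical, and three details are assumptions the parameter descriptions do not settle: the filter interval is treated as inclusive, a read counts as modified when its probability is at or above the threshold, and the minimum read count is checked after filtering.

```python
from typing import Iterable, Optional, Sequence, Tuple


def parse_filter_interval(text: Optional[str]) -> Optional[Tuple[float, float]]:
    """Parse an 'a,b' string such as '0.2,0.8'. None or '' means no filtering."""
    if not text:
        return None
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'a,b', got {text!r}")
    low, high = float(parts[0]), float(parts[1])
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(f"interval must satisfy 0 <= a <= b <= 1, got {text!r}")
    return low, high


def count_site(
    probabilities: Iterable[float],
    threshold: float,
    interval: Optional[Tuple[float, float]] = None,
) -> Tuple[int, int]:
    """Return (modified, total) read counts for one site in one sample."""
    # Assumption: the interval is inclusive at both ends.
    kept = [
        p for p in probabilities
        if interval is None or not (interval[0] <= p <= interval[1])
    ]
    # Assumption: a read is called modified at or above the threshold.
    modified = sum(1 for p in kept if p >= threshold)
    return modified, len(kept)


def site_passes(totals_per_sample: Sequence[int], min_reads_per_sample: int = 5) -> bool:
    """A site is kept only if every sample has enough reads (assumed post-filter)."""
    return all(total >= min_reads_per_sample for total in totals_per_sample)
```

With an interval such as 0.2,0.8, ambiguous reads are dropped before counting, so a site can fall below min_reads_per_sample even though it had enough reads before filtering.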

Samplesheet

The samplesheet is a CSV file containing information about the samples to be analysed in the pipeline. It should have the following columns, and a header row is required. Optional columns that are not used should still be present but left empty.

  • name: a unique name for each sample
  • group: the experimental group or condition for each sample
  • path_dorado: (optional) path to pre-basecalled Dorado modification data for each sample.
    The file should be a .bam file in ‘modbam’ format, i.e. with the MM and ML tags. See the Dorado documentation for more information.
  • path_m6anet: (optional) path to pre-basecalled m6Anet modification data for each sample.
    The file should be a .csv file, most likely data.indiv_proba.csv, with the columns transcript_id, transcript_position, read_index and probability_modified.
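
As a quick way to check that an m6Anet file has the expected shape before running the pipeline, the following Python sketch reads a CSV with the four columns listed above and groups the read-level probabilities by site. It is not part of Mako. The function name is hypothetical, and the rows in EXAMPLE_CSV are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; not real m6Anet output.
EXAMPLE_CSV = """transcript_id,transcript_position,read_index,probability_modified
tx1,100,0,0.91
tx1,100,1,0.02
tx1,250,0,0.40
tx2,30,5,0.75
"""

REQUIRED_COLUMNS = {
    "transcript_id",
    "transcript_position",
    "read_index",
    "probability_modified",
}


def load_site_probabilities(handle):
    """Group read-level probabilities by (transcript_id, transcript_position)."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    sites = defaultdict(list)
    for row in reader:
        key = (row["transcript_id"], int(row["transcript_position"]))
        sites[key].append(float(row["probability_modified"]))
    return dict(sites)


sites = load_site_probabilities(io.StringIO(EXAMPLE_CSV))
for (transcript, position), probs in sorted(sites.items()):
    print(transcript, position, len(probs))
```

For a real file, pass an open file handle, for example open("data.indiv_proba.csv", newline=""), instead of the in-memory example.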

Two groups should be provided to call differential modifications between conditions. Group names should be alphanumeric and without spaces. The underlying models take the first group alphabetically as the reference level, and the second group alphabetically as the treatment level.

An example samplesheet is shown below:

name,group,path_dorado,path_m6anet
sample1,group1,/path/to/dorado/reads.bam,
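
The samplesheet rules described above can be checked before launching a run. The Python sketch below is one way to do that under stated assumptions and is not Mako's own validation. The function name is hypothetical, the sample2 row is invented for the example, and Python's sorted() orders strings by code point, which may differ from the ordering the underlying models use when group names mix upper and lower case.

```python
import csv
import io

# sample1 is taken from the example above; sample2 is made up for illustration.
EXAMPLE_SHEET = """name,group,path_dorado,path_m6anet
sample1,group1,/path/to/dorado/reads.bam,
sample2,group2,/path/to/other/reads.bam,
"""

REQUIRED_COLUMNS = {"name", "group", "path_dorado", "path_m6anet"}


def check_samplesheet(handle):
    """Validate a samplesheet and return (reference_group, treatment_group)."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)

    # Every sample name must be unique.
    names = [row["name"] for row in rows]
    if len(set(names)) != len(names):
        raise ValueError("sample names must be unique")

    # Group names must be alphanumeric, which also rules out spaces.
    groups = sorted({row["group"] for row in rows})
    for group in groups:
        if not group.isalnum():
            raise ValueError(f"group name is not alphanumeric: {group!r}")

    # Two groups are needed to compare conditions.
    if len(groups) != 2:
        raise ValueError(f"expected two groups, found {len(groups)}")

    # First group alphabetically is the reference, second is the treatment.
    reference, treatment = groups
    return reference, treatment


print(check_samplesheet(io.StringIO(EXAMPLE_SHEET)))
```

Because the reference level is chosen alphabetically, naming groups so that the control sorts first (for example control and treated) keeps the direction of the comparison predictable.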

Mako is maintained by the Shim Lab @ the University of Melbourne.