mako is in active development and not all features are supported yet. Key features that still need to be implemented:
- Certain QC visualisations, such as the metagene plot
- Transcriptome to genome mapping
- Post-differential expression calling visualisation

Bug reports are highly welcome; please send them to our GitHub Issues tracker.
Parameters
Input/output options
Define where the pipeline should find input data and save output data.
| Parameter | Description | Default |
|---|---|---|
| dataset_name | The name of this dataset, used in output files and within visualisations | |
| samplesheet | Path to comma-separated file containing information about the samples in the experiment. | |
| outdir | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. | |
| publish_dir_mode | How to copy output from the work directory to the output directory (accepted: copy\|symlink) | copy |
Pipeline parameters
| Parameter | Description | Default |
|---|---|---|
| differential_model | Model to use to call differentially modified sites, as a comma-separated list. (accepted: adaptive_binomial\|homo_norm\|hetero_norm\|binomial\|beta_binomial) | adaptive_binomial |
| transcriptome | Reference transcriptome, in .fasta format | |
| gtf | GTF annotation file | |
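
Because differential_model takes a comma-separated list, it can help to check the value before launching a run. The following is a minimal sketch of such a check, not part of mako itself; the function name `parse_differential_model` is hypothetical, and the accepted names are copied from the table above.

```python
# Accepted model names, as listed in the parameter table.
ACCEPTED_MODELS = {
    "adaptive_binomial",
    "homo_norm",
    "hetero_norm",
    "binomial",
    "beta_binomial",
}


def parse_differential_model(value: str) -> list[str]:
    """Split a comma-separated model string and reject unknown names.

    Hypothetical helper for illustration; mako may validate differently.
    """
    models = [part.strip() for part in value.split(",") if part.strip()]
    if not models:
        raise ValueError("At least one model is required")
    unknown = [m for m in models if m not in ACCEPTED_MODELS]
    if unknown:
        raise ValueError(f"Unknown model(s): {', '.join(unknown)}")
    return models


if __name__ == "__main__":
    print(parse_differential_model("adaptive_binomial,beta_binomial"))
```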
Modification calling parameters
Settings which control which sites are selected for differential calling.
| Parameter | Description | Default |
|---|---|---|
| min_reads_per_sample | Minimum number of reads each sample must have at a site for the site to be considered. | 5 |
| mod_threshold_dorado | Passing probability threshold used to call a base modification in Dorado data. | 0.5 |
| mod_filter_dorado | Reads with a modification probability within this interval will be filtered out and not used in the Dorado model. The format should be a,b, e.g. 0.2,0.8. By default, no filtering is done. | |
| mod_threshold_m6anet | Passing probability threshold used to call a base modification in m6Anet data. | 0.033379376 |
| mod_filter_m6anet | Reads with a modification probability within this interval will be filtered out and not used in the m6Anet model. The format should be a,b, e.g. 0.2,0.8. By default, no filtering is done. | |
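
To illustrate how a filter interval and a threshold might interact, here is a minimal sketch rather than mako's actual implementation. The function names are hypothetical, and two details are assumptions not stated above: the filter interval is treated as inclusive at both ends, and a probability equal to the threshold counts as modified.

```python
from typing import Optional


def parse_filter_interval(value: str) -> tuple[float, float]:
    """Parse an 'a,b' string such as '0.2,0.8' into (low, high)."""
    low_text, high_text = value.split(",")
    low, high = float(low_text), float(high_text)
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(f"Invalid interval: {value!r}")
    return low, high


def call_modifications(
    probabilities: list[float],
    threshold: float,
    filter_interval: Optional[tuple[float, float]] = None,
) -> list[bool]:
    """Drop reads inside the filter interval, then threshold the rest.

    Returns one boolean per retained read (True = called modified).
    """
    calls = []
    for prob in probabilities:
        if filter_interval is not None:
            low, high = filter_interval
            # Assumption: the interval is inclusive at both ends.
            if low <= prob <= high:
                continue
        # Assumption: a probability equal to the threshold passes.
        calls.append(prob >= threshold)
    return calls


if __name__ == "__main__":
    interval = parse_filter_interval("0.2,0.8")
    print(call_modifications([0.1, 0.5, 0.9], 0.5, interval))
```

With no interval supplied, every read is kept and only the threshold applies, which matches the documented default of no filtering.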
Samplesheet
The samplesheet is a CSV file containing information about the samples to be analysed by the pipeline. A header row is required, and the file should have the following columns. Leave an optional column empty for any sample it does not apply to.
- name: a unique name for each sample
- group: the experimental group or condition for each sample
- path_dorado: (optional) path to pre-basecalled Dorado modification data for each sample. The file should be a .bam file in ‘modbam’ format, i.e. with tags MM and ML. See the Dorado documentation for more information.
- path_m6anet: (optional) path to pre-basecalled m6Anet modification data for each sample. The file should be a .csv file, most likely data.indiv_proba.csv, with columns transcript_id, transcript_position, read_index, probability_modified.
Two groups should be provided to call differential modifications between conditions. Group names should be alphanumeric and without spaces. The underlying models take the first group alphabetically as the reference level, and the second group alphabetically as the treatment level.
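
The alphabetical rule for reference and treatment levels can be sketched as follows. This is an illustration under assumptions, not mako's code: the function name `assign_levels` is hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and Python's default string ordering (in which uppercase letters sort before lowercase) stands in for "alphabetically".

```python
import re


def assign_levels(groups: list[str]) -> tuple[str, str]:
    """Return (reference, treatment) from the group column values."""
    unique = sorted(set(groups))
    if len(unique) != 2:
        raise ValueError(f"Expected exactly two groups, found {len(unique)}")
    for group in unique:
        # Assumption: alphanumeric means ASCII letters and digits only.
        if not re.fullmatch(r"[A-Za-z0-9]+", group):
            raise ValueError(f"Group name is not alphanumeric: {group!r}")
    reference, treatment = unique
    return reference, treatment


if __name__ == "__main__":
    print(assign_levels(["treated", "control", "control", "treated"]))
```

Under this rule, groups named control and treated yield control as the reference level. If the intended reference would sort second, renaming the groups is one way to obtain the desired ordering.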
An example samplesheet is shown below:
name,group,path_dorado,path_m6anet
sample1,group1,/path/to/dorado/reads.bam,
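
A samplesheet can be sanity-checked before a run with a few lines of standard-library Python. The sketch below is not part of mako; `read_samplesheet` is a hypothetical helper that checks only what the description above states: the four columns are present in the header, each sample has a unique name and a group, and empty optional columns are read as missing values.

```python
import csv
import io
from typing import Optional, TextIO

COLUMNS = ["name", "group", "path_dorado", "path_m6anet"]


def read_samplesheet(handle: TextIO) -> list[dict[str, Optional[str]]]:
    """Read a samplesheet and run basic checks on its rows."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in COLUMNS if col not in header]
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(missing)}")

    rows = []
    seen_names = set()
    for row in reader:
        # Empty cells become None so optional columns are easy to test.
        clean = {col: ((row.get(col) or "").strip() or None) for col in COLUMNS}
        if clean["name"] is None or clean["group"] is None:
            raise ValueError("Every sample needs a name and a group")
        if clean["name"] in seen_names:
            raise ValueError(f"Duplicate sample name: {clean['name']}")
        seen_names.add(clean["name"])
        rows.append(clean)
    return rows


if __name__ == "__main__":
    example = (
        "name,group,path_dorado,path_m6anet\n"
        "sample1,group1,/path/to/dorado/reads.bam,\n"
    )
    for sample in read_samplesheet(io.StringIO(example)):
        print(sample)
```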