nf-core/methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel

bisulfite-sequencing, dna-methylation, em-seq, epigenome, epigenomics, methyl-seq, pbat, rrbs

These pages are for an old version of the pipeline (1.6.1). The latest stable release is 3.0.0.

https://github.com/nf-core/methylseq

Define where the pipeline should find input data and save output data.

Input FastQ files.

type: string

Use this to specify the location of your input FastQ files. For example:

--input 'path/to/data/sample_*_{1,2}.fastq'

Please note the following requirements:

The path must be enclosed in quotes
The path must have at least one * wildcard character
When using the pipeline with paired end data, the path must use {1,2} notation to specify read pairs.

If left unspecified, a default pattern is used: data/*{1,2}.fastq.gz
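As a rough illustration of how the {1,2} notation groups files into read pairs, the Python sketch below pairs file names by the text that precedes the read index. The function name group_read_pairs, the example file names, and the `_1`/`_2` suffix convention are assumptions made for this example; the pipeline relies on Nextflow for the actual matching, and this is not its implementation.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_<1 or 2>.fastq, optionally gzipped.
READ_NAME = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq(?:\.gz)?$")

def group_read_pairs(filenames):
    """Group FastQ file names into [read 1, read 2] pairs keyed by sample."""
    by_sample = defaultdict(dict)
    for name in filenames:
        match = READ_NAME.match(name)
        if match:
            by_sample[match.group("sample")][match.group("read")] = name
    # Keep only samples for which both mates were found.
    return {
        sample: [reads["1"], reads["2"]]
        for sample, reads in sorted(by_sample.items())
        if set(reads) == {"1", "2"}
    }

# Hypothetical directory listing: sample_B lacks its second mate.
files = ["sample_A_1.fastq", "sample_A_2.fastq", "sample_B_1.fastq", "notes.txt"]
pairs = group_read_pairs(files)
print(pairs)
```

In this sketch a sample with only one mate is dropped, which mirrors why paired-end input needs both files to match the pattern.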

Specifies that the input is single-end reads.

type: boolean

By default, the pipeline expects paired-end data. If you have single-end data, you need to specify --single_end on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for --input. For example:

--single_end --input '*.fastq'

It is not possible to run a mixture of single-end and paired-end files in one run.

The output directory where the results will be saved.

type: string

default: ./results

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
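The pattern above can be tried out before launching a run. The following Python sketch applies the same regular expression; the helper name is_valid_email and the sample addresses are made up for illustration, and the pipeline's own schema validation may behave differently in edge cases.

```python
import re

# Pattern copied from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)

def is_valid_email(address):
    """Return True if the address matches the documented pattern."""
    return EMAIL_PATTERN.match(address) is not None

print(is_valid_email("jane.doe@example.org"))  # address with a 3-letter suffix
print(is_valid_email("jane.doe@example"))      # no domain suffix at all
```

Note that the pattern only accepts a final domain label of two to five letters, so addresses with longer top-level domains would not match.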

Alignment tool to use.

type: string

The nf-core/methylseq package is actually two pipelines in one. The default workflow uses Bismark with Bowtie2 as the alignment tool: unless specified otherwise, nf-core/methylseq will run this pipeline.

Since Bismark v0.21.0 it is also possible to use HISAT2 as the alignment tool. To run this workflow, invoke the pipeline with the command line flag --aligner bismark_hisat. HISAT2 also supports splice-aware alignment if analysis of RNA is desired (e.g. SLAMseq experiments); a file containing a list of known splice sites can be provided with --known_splices.

The second workflow uses BWA-Meth and MethylDackel instead of Bismark. To run this workflow, run the pipeline with the command line flag --aligner bwameth.

Output information for all cytosine contexts.

type: boolean

By default, the pipeline only produces data for cytosine methylation states in CpG context. Specifying --comprehensive makes the pipeline give results for all cytosine contexts. Note that for large genomes (e.g. Human), these can be massive files. This is only recommended for small genomes (especially those that don't exhibit strong CpG context methylation specificity).

If specified, this flag instructs the Bismark methylation extractor to use the --comprehensive and --merge_non_CpG flags. This produces coverage files with information about all strands and cytosine contexts merged into two files: one for CpG context and one for non-CpG context.

If using the bwa-meth workflow, the flag makes MethylDackel report CHG and CHH contexts as well.

Save aligned intermediates to results directory

hidden

type: boolean

Presets for working with specific bisulfite library preparation methods.

Preset for working with PBAT libraries.

type: boolean

Specify this parameter when working with PBAT (Post Bisulfite Adapter Tagging) libraries.

Using this parameter sets the --pbat flag when aligning with Bismark. This tells Bismark to align complementary strands (the opposite of --directional).

Additionally, this is a trimming preset equivalent to --clip_r1 6 --clip_r2 9 --three_prime_clip_r1 6 --three_prime_clip_r2 9

Turn on if dealing with MspI digested material.

type: boolean

Use this parameter when working with RRBS (Reduced Representation Bisulfite Sequencing) data that has been digested using MspI.

Specifying --rrbs will pass on the --rrbs parameter to TrimGalore! See the TrimGalore! documentation to read more about the effects of this option.

This parameter also makes the pipeline skip the deduplication step.

Run bismark in SLAM-seq mode.

type: boolean

Specify this parameter to run Bismark with the --slam flag, which puts Bismark in SLAM-seq mode.

NB: This only works when using the bismark_hisat aligner (--aligner bismark_hisat).

Preset for EM-seq libraries.

type: boolean

Equivalent to --clip_r1 8 --clip_r2 8 --three_prime_clip_r1 8 --three_prime_clip_r2 8.

Also sets the --maxins flag to 1000 for Bismark.

Trimming preset for single-cell bisulfite libraries.

type: boolean

Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 6 --three_prime_clip_r2 6.

Also sets the --non_directional flag for Bismark.

Trimming preset for the Accel kit.

type: boolean

Equivalent to --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10

Trimming preset for the CEGX bisulfite kit.

type: boolean

Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 2 --three_prime_clip_r2 2

Trimming preset for the Epignome kit.

type: boolean

Equivalent to --clip_r1 8 --clip_r2 8 --three_prime_clip_r1 8 --three_prime_clip_r2 8

Trimming preset for the Zymo kit.

type: boolean

Equivalent to --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10.

Also sets the --non_directional flag for Bismark.
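The trimming values of the presets above can be collected in one lookup table. The Python sketch below does this; the dictionary keys are descriptive labels taken from the preset descriptions (not pipeline flag names), and the helper equivalent_flags is hypothetical. Extra effects of some presets, such as --non_directional or --maxins, are deliberately left out.

```python
# Values per preset: (clip_r1, clip_r2, three_prime_clip_r1, three_prime_clip_r2),
# copied from the preset descriptions above.
TRIM_PRESETS = {
    "PBAT": (6, 9, 6, 9),
    "EM-seq": (8, 8, 8, 8),
    "single-cell": (6, 6, 6, 6),
    "Accel": (10, 15, 10, 10),
    "CEGX": (6, 6, 2, 2),
    "Epignome": (8, 8, 8, 8),
    "Zymo": (10, 15, 10, 10),
}

FLAGS = ("--clip_r1", "--clip_r2", "--three_prime_clip_r1", "--three_prime_clip_r2")

def equivalent_flags(preset):
    """Build the explicit trimming flags that a preset label stands for."""
    values = TRIM_PRESETS[preset]
    return " ".join(f"{flag} {value}" for flag, value in zip(FLAGS, values))

print(equivalent_flags("CEGX"))
```

Laid out this way, it is easy to see that the Accel and Zymo presets trim identically, as do EM-seq and Epignome.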

Options for the reference genome indices used to align reads.

Name of iGenomes reference.

type: string

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Path to FASTA genome file.

type: string

If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible. You can use the command line option --save_reference to keep the generated references so that they can be added to your config and used again in the future.

Note that the bwa-meth workflow always needs a FASTA file, for methylation calling.

Path to Fasta index file.

type: string

The FASTA index file (.fa.fai) is only needed when using the bwa-meth aligner (--aligner bwameth). It is used by MethylDackel. If using Bismark, this parameter is ignored.

Path to a directory containing a Bismark reference index.

type: string

bwameth index filename base

type: string

The base filename for a bwa-meth genome reference index. Only used when using the bwa-meth aligner.

Note that this is not a complete path, but rather a common filename base. For example, if you have file paths such as /path/to/ref/genome.fa.bwameth.c2t.bwt, you should specify /path/to/ref/genome.fa.
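To make the "filename base" idea concrete, the Python sketch below derives the base from one index file path by cutting at the .bwameth.c2t suffix seen in the example path above. The helper name bwa_meth_index_base is hypothetical, and the assumption that every index file carries this suffix is inferred from that single example.

```python
def bwa_meth_index_base(index_file):
    """Return the common filename base for a bwa-meth index file path."""
    marker = ".bwameth.c2t"  # suffix taken from the example path above
    position = index_file.rfind(marker)
    if position == -1:
        raise ValueError(f"not a bwa-meth index file: {index_file}")
    return index_file[:position]

base = bwa_meth_index_base("/path/to/ref/genome.fa.bwameth.c2t.bwt")
print(base)
```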

Save reference(s) to results directory

type: boolean

Directory / URL base for iGenomes references.

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

hidden

type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

Bisulfite libraries often require additional base pairs to be removed from the ends of the reads before alignment.

Trim bases from the 5' end of read 1 (or single-end reads).

type: integer

Trim bases from the 5' end of read 2 (paired-end only).

type: integer

Trim bases from the 3' end of read 1 AFTER adapter/quality trimming.

type: integer

Trim bases from the 3' end of read 2 AFTER adapter/quality trimming

type: integer

Save trimmed reads to results directory.

hidden

type: boolean

By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.

Parameters specific to the Bismark workflow

Run alignment against all four possible strands.

type: boolean

By default, Bismark assumes that libraries are directional and does not align against complementary strands. If your library prep was not directional, use --non_directional to align against all four possible strands.

Note that the --single_cell and --zymo parameters both set the --non_directional workflow flag automatically.

Output stranded cytosine report during Bismark's bismark_methylation_extractor step.

type: boolean

By default, Bismark does not produce stranded calls. With this option the output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide context and methylation state.

Turn on to relax stringency for alignment (set allowed penalty with --num_mismatches).

type: boolean

By default, Bismark is pretty strict about which alignments it accepts as valid. If you have good reason to believe that your reads will contain more mismatches than normal, this flag can be used to relax the stringency that Bismark uses when accepting alignments. This can greatly improve the number of aligned reads you get back, but may negatively impact the quality of your data.

Bismark uses the Bowtie alignment scoring mechanism to filter reads. Mismatches cost -6, gap opening -5 and gap extension -2. So, a threshold of -60 would allow 10 mismatches or approximately eight 1-2 bp indels. The threshold depends on the length of the reads, so a penalty value is used where penalty * read length (bp) = threshold.

The penalty value used by Bismark by default is 0.2, so for 100bp reads this would be a threshold of -20.

If you specify the --relax_mismatches pipeline flag, Bismark instead uses 0.6, or a threshold of -60 for 100bp reads. This adds the Bismark flag --score_min L,0,-0.6 to the alignment command.

The penalty value can be modified using the --num_mismatches pipeline option.
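The arithmetic above can be written out as a small Python sketch. The function names are hypothetical, and the mismatch count assumes a read with mismatches only (no gaps), using the cost of 6 per mismatch quoted above.

```python
import math

MISMATCH_COST = 6  # each mismatch costs -6, as described above

def score_threshold(penalty, read_length):
    """Minimum alignment score: -(penalty * read length)."""
    return -round(penalty * read_length, 6)

def max_mismatches(penalty, read_length):
    """Mismatches that fit under the threshold if the read has no gaps."""
    return math.floor(abs(score_threshold(penalty, read_length)) / MISMATCH_COST)

def score_min_flag(penalty):
    """The Bismark flag that corresponds to a given penalty value."""
    return f"--score_min L,0,-{penalty}"

for penalty in (0.2, 0.6):
    print(penalty, score_threshold(penalty, 100), max_mismatches(penalty, 100))
print(score_min_flag(0.6))
```

For 100bp reads this gives a threshold of -20 with the default penalty and -60 with the relaxed penalty, matching the figures in the text.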

0.6 will allow a penalty of bp * -0.6, i.e. -60 for 100bp reads (Bismark default is 0.2).

type: number

default: 0.6

Customise the penalty in the function used to filter reads based on mismatches. The parameter --relax_mismatches must also be specified.

See the parameter documentation for --relax_mismatches for an explanation.

Save unmapped reads to FastQ files

type: boolean

Use the --unmapped pipeline flag to pass --unmapped to Bismark align and save the unmapped reads to FastQ files.

Specify a minimum read coverage to report a methylation call

type: integer

Use to discard any methylation calls with less than a given read coverage depth (in fold coverage) during Bismark's bismark_methylation_extractor step.

Supply a .gtf file containing known splice sites (bismark_hisat only).

type: string

Specify this parameter to run Bismark with the --known-splicesite-infile flag, which enables splice-aware alignment using HISAT2. A .gtf file has to be provided, from which the pipeline creates a list of known splice sites.

NB: This only works when using the bismark_hisat aligner (--aligner bismark_hisat).

Allow soft-clipping of reads (potentially useful for single-cell experiments).

type: boolean

Specify to run Bismark with the --local flag to allow soft-clipping of reads. This should only be used with care in certain single-cell applications or PBAT libraries, which may produce chimeric read pairs. (See Wu et al.).

The minimum insert size for valid paired-end alignments.

type: integer

For example, if --minins 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as --maxins is also satisfied). A 19-bp gap would not be valid in that case.

Default: no flag (Bismark default: 0).

The maximum insert size for valid paired-end alignments.

type: integer

For example, if --maxins 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as --minins is also satisfied). A 61-bp gap would not be valid in that case.

Default: not specified. Bismark default: 500.
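The two worked examples for the insert-size limits can be checked with a few lines of Python. This is only a sketch of the rule as described above: the function name is hypothetical, the insert is taken to be the two alignments plus the gap between them, and the default arguments are the Bismark defaults quoted in the text.

```python
def is_valid_insert(read1_len, gap, read2_len, minins=0, maxins=500):
    """Check a paired-end alignment against the insert-size limits."""
    insert_size = read1_len + gap + read2_len
    return minins <= insert_size <= maxins

# --minins 60: two 20-bp alignments need a gap of at least 20 bp.
print(is_valid_insert(20, 20, 20, minins=60))
print(is_valid_insert(20, 19, 20, minins=60))

# --maxins 100: two 20-bp alignments allow a gap of at most 60 bp.
print(is_valid_insert(20, 60, 20, maxins=100))
print(is_valid_insert(20, 61, 20, maxins=100))
```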

Specify how many CPUs are required per --multicore for bismark align

hidden

type: integer

The pipeline makes use of the --multicore option for Bismark align. When using this option, Bismark uses a large number of CPUs for every --multicore specified. The pipeline calculates the number of --multicore based on the resources available to the task. It divides the available CPUs by 3, or by 5 if any of --single_cell, --zymo or --non_directional are specified. This is based on usage for a typical mouse genome.

You may find when running the pipeline that Bismark is not using this many CPUs. To fine tune the usage and speed, you can specify an integer with --bismark_align_cpu_per_multicore and the pipeline will divide the available CPUs by this value instead.

See the bismark documentation for more information.

Specify how much memory is required per --multicore for bismark align

hidden

type: string

Exactly the same as with --bismark_align_cpu_per_multicore, but for memory. By default, the pipeline divides the available memory by 13.GB, or 18.GB if any of --single_cell, --zymo or --non_directional are specified.

Note that the final --multicore value is based on the lowest limiting factor of both CPUs and memory.
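The calculation described for the two parameters above can be sketched in Python as follows. The function and argument names are hypothetical, and rounding down to a whole number is an assumption of this sketch; the divisors (3 or 5 CPUs, 13 or 18 GB) and the rule of taking the lower of the two results come from the text.

```python
def multicore_value(cpus, memory_gb, non_directional=False,
                    cpu_per_multicore=None, mem_per_multicore_gb=None):
    """Estimate the --multicore value from the task's CPUs and memory.

    non_directional stands for any of --single_cell, --zymo or
    --non_directional being set.
    """
    if cpu_per_multicore is None:
        cpu_per_multicore = 5 if non_directional else 3
    if mem_per_multicore_gb is None:
        mem_per_multicore_gb = 18 if non_directional else 13
    by_cpu = cpus // cpu_per_multicore
    by_memory = int(memory_gb // mem_per_multicore_gb)
    # The lower of the two limits decides the final value.
    return min(by_cpu, by_memory)

print(multicore_value(16, 64))                        # memory is the limit here
print(multicore_value(16, 64, non_directional=True))
print(multicore_value(16, 64, cpu_per_multicore=8))   # user-supplied divisor
```

With 16 CPUs and 64 GB, memory is the limiting factor for a directional library: 16 CPUs would allow five instances, but 64 GB only allows four.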

Specify a minimum read coverage for MethylDackel to report a methylation call.

type: integer

MethylDackel - ignore SAM flags

type: boolean

Run MethylDackel with the --ignore_flags option, to ignore SAM flags.

Save files for use with methylKit

type: boolean

Run MethylDackel with the --methyl_kit option, to produce files suitable for use with the methylKit R package.

Skip read trimming.

type: boolean

Skip deduplication step after alignment.

type: boolean

Deduplication removes PCR duplicate reads after alignment. Specifying this option will skip this step, leaving duplicate reads in your data.

Note that this is turned on automatically if --rrbs is specified.

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Boolean whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

This works exactly as with --email, except emails are only sent if the workflow is not successful.

Send plain-text email instead of HTML.

hidden

type: boolean

Set to receive plain-text e-mails instead of HTML formatted.

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

If a file generated by the pipeline exceeds this threshold, it will not be attached.

Do not use coloured log outputs.

hidden

type: boolean

Set to disable colourful command line output and live life in monochrome.

Custom config file to supply to MultiQC.

hidden

type: string

Directory to keep pipeline Nextflow logs and reports.

hidden

type: string

default: ${params.outdir}/pipeline_info

Show all params when using --help

hidden

type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^[\d\.]+\s*.(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^[\d\.]+\.*(s|m|h|d)$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
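The memory and time patterns listed above can be tested against candidate values before a run. The Python sketch below applies both regular expressions as written; the helper names and sample values are illustrative only, and the pipeline's own validation may differ in details.

```python
import re

# Patterns copied from the --max_memory and --max_time descriptions above.
MEMORY_PATTERN = re.compile(r"^[\d\.]+\s*.(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^[\d\.]+\.*(s|m|h|d)$")

def is_valid_memory(value):
    """Return True if the value matches the documented memory pattern."""
    return MEMORY_PATTERN.match(value) is not None

def is_valid_time(value):
    """Return True if the value matches the documented time pattern."""
    return TIME_PATTERN.match(value) is not None

for value in ("128.GB", "8.GB", "8 gigabytes"):
    print(value, is_valid_memory(value))
for value in ("240.h", "2.h", "2 hours"):
    print(value, is_valid_time(value))
```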

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Provide git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.

## Download and use config file with following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with the --custom_config_base option. For example:

## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip

## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/

Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.

Institutional configs hostname.

hidden

type: string

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string
