nf-core/methylseq
Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
Version 1.6.1. The latest stable release is 2.7.1.
Define where the pipeline should find input data and save output data.
Input FastQ files.
string
Use this to specify the location of your input FastQ files. For example:
--input 'path/to/data/sample_*_{1,2}.fastq'
Please note the following requirements:
- The path must be enclosed in quotes
- The path must have at least one * wildcard character
- When using the pipeline with paired end data, the path must use {1,2} notation to specify read pairs.
If left unspecified, a default pattern is used: data/*{1,2}.fastq.gz
Specifies that the input is single-end reads.
boolean
By default, the pipeline expects paired-end data. If you have single-end data, you need to specify --single_end on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for --input. For example:
--single_end --input '*.fastq'
It is not possible to run a mixture of single-end and paired-end files in one run.
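The --input requirements above can be sketched as a small pre-flight check. This is only an illustration: the function name check_input_pattern is hypothetical and not part of the pipeline, and the quoting requirement cannot be checked here because quotes are consumed by the shell before the pattern reaches any program.

```python
def check_input_pattern(pattern: str, single_end: bool = False) -> list[str]:
    """Return a list of problems with an --input glob pattern (empty if none).

    Hypothetical helper, not pipeline code: it only mirrors the two
    requirements stated in the text (a * wildcard, and {1,2} for pairs).
    """
    problems = []
    if "*" not in pattern:
        problems.append("pattern needs at least one * wildcard")
    if not single_end and "{1,2}" not in pattern:
        problems.append("paired-end pattern needs {1,2} notation")
    return problems


# Paired-end pattern taken from the example in the text.
print(check_input_pattern("path/to/data/sample_*_{1,2}.fastq"))
# Single-end pattern, as used together with --single_end.
print(check_input_pattern("*.fastq", single_end=True))
# A literal file name fails both requirements for paired-end data.
print(check_input_pattern("path/to/data/sample_1.fastq"))
```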
The output directory where the results will be saved.
string
./results
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
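To see which addresses the validation pattern listed for this parameter accepts, it can be tried directly. The sketch below assumes the pattern behaves the same under Python's re module as in the pipeline's own schema validation; the helper name is_valid_email is hypothetical.

```python
import re

# Pattern copied from the parameter listing above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """True if the address matches the listed pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


print(is_valid_email("user@example.com"))
print(is_valid_email("user@example"))
# The final group allows only 2-5 letters, so longer endings are rejected.
print(is_valid_email("user@example.technology"))
```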
Alignment tool to use.
string
The nf-core/methylseq package is actually two pipelines in one. The default workflow uses Bismark with Bowtie2 as the alignment tool: unless specified otherwise, nf-core/methylseq will run this pipeline.
Since Bismark v0.21.0 it is also possible to use HISAT2 as the alignment tool. To run this workflow, invoke the pipeline with the command line flag --aligner bismark_hisat. HISAT2 also supports splice-aware alignment if analysis of RNA is desired (e.g. SLAMseq experiments). In that case, a file containing a list of known splice sites can be provided with --known_splices.
The second workflow uses BWA-Meth and MethylDackel instead of Bismark. To run this workflow, run the pipeline with the command line flag --aligner bwameth.
Output information for all cytosine contexts.
boolean
By default, the pipeline only produces data for cytosine methylation states in CpG context. Specifying --comprehensive makes the pipeline give results for all cytosine contexts. Note that for large genomes (e.g. Human), these can be massive files. This is only recommended for small genomes (especially those that don't exhibit strong CpG context methylation specificity).
If specified, this flag instructs the Bismark methylation extractor to use the --comprehensive and --merge_non_CpG flags. This produces coverage files with information from all strands and cytosine contexts merged into two files - one for CpG context and one for non-CpG context.
If using the bwa-meth workflow, the flag makes MethylDackel report CHG and CHH contexts as well.
Save aligned intermediates to results directory
boolean
Presets for working with specific bisulfite library preparation methods.
Preset for working with PBAT libraries.
boolean
Specify this parameter when working with PBAT (Post Bisulfite Adapter Tagging) libraries.
Using this parameter sets the --pbat flag when aligning with Bismark. This tells Bismark to align complementary strands (the opposite of --directional).
Additionally, this is a trimming preset equivalent to --clip_r1 6 --clip_r2 9 --three_prime_clip_r1 6 --three_prime_clip_r2 9.
Turn on if dealing with MspI digested material.
boolean
Use this parameter when working with RRBS (Reduced Representation Bisulfite Sequencing) data digested using MspI.
Specifying --rrbs passes the --rrbs parameter on to TrimGalore! (see the TrimGalore! documentation to read more about the effects of this option).
This parameter also makes the pipeline skip the deduplication step.
Run bismark in SLAM-seq mode.
boolean
Specify to run Bismark with the --slam flag, which runs Bismark in SLAM-seq mode.
NB: This only works when using the bismark_hisat aligner (--aligner bismark_hisat).
Preset for EM-seq libraries.
boolean
Equivalent to --clip_r1 8 --clip_r2 8 --three_prime_clip_r1 8 --three_prime_clip_r2 8.
Also sets the --maxins flag to 1000 for Bismark.
Trimming preset for single-cell bisulfite libraries.
boolean
Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 6 --three_prime_clip_r2 6.
Also sets the --non_directional flag for Bismark.
Trimming preset for the Accel kit.
boolean
Equivalent to --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10.
Trimming preset for the CEGX bisulfite kit.
boolean
Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 2 --three_prime_clip_r2 2.
Trimming preset for the Epignome kit.
boolean
Equivalent to --clip_r1 8 --clip_r2 8 --three_prime_clip_r1 8 --three_prime_clip_r2 8.
Trimming preset for the Zymo kit.
boolean
Equivalent to --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10.
Also sets the --non_directional flag for Bismark.
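For a side-by-side view, the trimming values of the library presets above can be collected in one lookup table. This is a sketch only: the dictionary keys are descriptive labels chosen here (not necessarily the pipeline's flag names), and Clip and trimming_args are hypothetical names used for illustration.

```python
from typing import NamedTuple


class Clip(NamedTuple):
    """Bases trimmed from each read end, as listed for each preset."""
    clip_r1: int
    clip_r2: int
    three_prime_clip_r1: int
    three_prime_clip_r2: int


# Values copied from the preset descriptions; keys are labels, not flags.
PRESETS = {
    "PBAT": Clip(6, 9, 6, 9),          # also sets --pbat for Bismark
    "EM-seq": Clip(8, 8, 8, 8),        # also sets --maxins 1000 for Bismark
    "single-cell": Clip(6, 6, 6, 6),   # also sets --non_directional
    "Accel": Clip(10, 15, 10, 10),
    "CEGX": Clip(6, 6, 2, 2),
    "Epignome": Clip(8, 8, 8, 8),
    "Zymo": Clip(10, 15, 10, 10),      # also sets --non_directional
}


def trimming_args(preset: str) -> str:
    """Spell out a preset as the equivalent explicit clipping options."""
    clip = PRESETS[preset]
    return " ".join(f"--{name} {value}" for name, value in clip._asdict().items())


print(trimming_args("CEGX"))
```

Written this way it is easy to see that the Accel and Zymo presets trim identically, as do EM-seq and Epignome; they differ only in the extra Bismark settings.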
Options for the reference genome indices used to align reads.
Name of iGenomes reference.
string
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.
See the nf-core website docs for more details.
Path to FASTA genome file.
string
If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible. You can use the command line option --save_reference to keep the generated references so that they can be added to your config and used again in the future.
Note that the bwa-meth workflow always needs a FASTA file, for methylation calling.
Path to Fasta index file.
string
The FASTA index file (.fa.fai) is only needed when using the bwa-meth aligner. It is used by MethylDackel. If using Bismark this parameter is ignored.
Path to a directory containing a Bismark reference index.
string
bwameth index filename base
string
The base filename for a bwa-meth genome reference index. Only used when using the bwa-meth aligner.
Note that this is not a complete path, but rather a common filename base. For example, if you have file paths such as /path/to/ref/genome.fa.bwameth.c2t.bwt, you should specify /path/to/ref/genome.fa.
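The relationship between an index file and its filename base can be shown with a few lines of string handling. This is a minimal sketch: bwameth_index_base is a hypothetical helper, and it assumes that every index file name contains the .bwameth.c2t marker seen in the example path.

```python
def bwameth_index_base(index_file: str) -> str:
    """Derive the common filename base from one bwa-meth index file path.

    Hypothetical helper; assumes the '.bwameth.c2t' marker from the
    example path appears in every index file name.
    """
    marker = ".bwameth.c2t"
    base, found, _ = index_file.partition(marker)
    if not found:
        raise ValueError(f"not a bwa-meth index file: {index_file}")
    return base


# The example path from the text yields the base that should be specified.
print(bwameth_index_base("/path/to/ref/genome.fa.bwameth.c2t.bwt"))
```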
Save reference(s) to results directory
boolean
Directory / URL base for iGenomes references.
string
s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
boolean
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.
Bisulfite libraries often require additional base pairs to be removed from the ends of the reads before alignment.
Trim bases from the 5' end of read 1 (or single-end reads).
integer
Trim bases from the 5' end of read 2 (paired-end only).
integer
Trim bases from the 3' end of read 1 AFTER adapter/quality trimming.
integer
Trim bases from the 3' end of read 2 AFTER adapter/quality trimming
integer
Save trimmed reads to results directory.
boolean
By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
Parameters specific to the Bismark workflow
Run alignment against all four possible strands.
boolean
By default, Bismark assumes that libraries are directional and does not align against complementary strands. If your library prep was not directional, use --non_directional to align against all four possible strands.
Note that the --single_cell and --zymo parameters both set the --non_directional workflow flag automatically.
Output stranded cytosine report during Bismark's bismark_methylation_extractor step.
boolean
By default, Bismark does not produce stranded calls. With this option the output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide context and methylation state.
Turn on to relax stringency for alignment (set allowed penalty with --num_mismatches).
boolean
By default, Bismark is pretty strict about which alignments it accepts as valid. If you have good reason to believe that your reads will contain more mismatches than normal, this flag can be used to relax the stringency that Bismark uses when accepting alignments. This can greatly improve the number of aligned reads you get back, but may negatively impact the quality of your data.
Bismark uses the Bowtie alignment scoring mechanism to filter reads. Mismatches cost -6, gap opening -5 and gap extension -2. So, a threshold of -60 would allow 10 mismatches or ~ 8 x 1-2bp indels. The threshold is dependent on the length of reads, so a penalty value is used where penalty * bp read length = threshold.
The penalty value used by Bismark by default is 0.2, so for 100bp reads this would be a threshold of -20.
If you specify the --relax_mismatches pipeline flag, Bismark instead uses 0.6, or a threshold of -60 for 100bp reads. This adds the Bismark flag --score_min L,0,-0.6 to the alignment command.
The penalty value can be modified using the --num_mismatches pipeline option.
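The threshold arithmetic can be worked through in a few lines. The sketch below is an illustration under the figures quoted in the text (mismatch cost of 6, threshold = penalty * read length); min_score and max_mismatches are hypothetical helper names, and the real filtering is done by the aligner, not by code like this.

```python
def min_score(read_length: int, penalty: float = 0.2) -> float:
    """Minimum alignment score accepted for a read of the given length.

    Mirrors 'penalty * bp read length = threshold' from the text; the
    result is negative because alignment penalties are negative scores.
    """
    return -penalty * read_length


def max_mismatches(read_length: int, penalty: float = 0.2,
                   mismatch_cost: int = 6) -> int:
    """Whole mismatches that fit in the score budget (no gaps assumed)."""
    budget = round(penalty * read_length, 6)  # guard against float noise
    return int(budget // mismatch_cost)


# 100bp reads at the Bismark default; the text gives a threshold of -20.
print(min_score(100), max_mismatches(100))
# 100bp reads with the relaxed value of 0.6; the text gives -60 and 10 mismatches.
print(min_score(100, 0.6), max_mismatches(100, 0.6))
```

Because the threshold scales with read length, the same penalty value tolerates more mismatches on longer reads, which is why the option is expressed as a per-base penalty and not a fixed count.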
A value of 0.6 allows a penalty of bp * -0.6, e.g. -60 for 100bp reads (Bismark default is 0.2).
number
0.6
Customise the penalty in the function used to filter reads based on mismatches. The parameter --relax_mismatches must also be specified.
See the parameter documentation for --relax_mismatches for an explanation.
Save unmapped reads to FastQ files
boolean
Use the --unmapped flag to pass --unmapped to Bismark align and save the unmapped reads to FastQ files.
Specify a minimum read coverage to report a methylation call
integer
Use to discard any methylation calls with less than a given read coverage depth (in fold coverage) during Bismark's bismark_methylation_extractor step.
Supply a .gtf file containing known splice sites (bismark_hisat only).
string
Specify to run Bismark with the --known-splicesite-infile flag to run splice-aware alignment using HISAT2. A .gtf file has to be provided, from which the pipeline creates a list of known splice sites.
NB: This only works when using the bismark_hisat aligner (--aligner bismark_hisat).
Allow soft-clipping of reads (potentially useful for single-cell experiments).
boolean
Specify to run Bismark with the --local
flag to allow soft-clipping of reads. This should only be used with care in certain single-cell applications or PBAT libraries, which may produce chimeric read pairs. (See Wu et al.).
The minimum insert size for valid paired-end alignments.
integer
For example, if --minins 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as --maxins is also satisfied). A 19-bp gap would not be valid in that case.
Default: no flag (Bismark default: 0).
The maximum insert size for valid paired-end alignments.
integer
For example, if --maxins 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as --minins is also satisfied). A 61-bp gap would not be valid in that case.
Default: not specified. Bismark default: 500.
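The two worked examples for --minins and --maxins come down to comparing the total fragment length (both mates plus the gap) against the limits. A simplified sketch follows; insert_size_ok is a hypothetical helper, and it ignores cases such as overlapping or soft-clipped mates that the aligner itself has to handle.

```python
def insert_size_ok(mate1_len: int, gap: int, mate2_len: int,
                   minins: int = 0, maxins: int = 500) -> bool:
    """Check a paired-end alignment against insert size limits.

    Simplified model: fragment length = mate 1 + gap + mate 2.
    Defaults are the Bismark defaults quoted in the text (0 and 500).
    """
    fragment = mate1_len + gap + mate2_len
    return minins <= fragment <= maxins


# --minins 60: a 20-bp gap gives a 60-bp fragment, a 19-bp gap only 59 bp.
print(insert_size_ok(20, 20, 20, minins=60), insert_size_ok(20, 19, 20, minins=60))
# --maxins 100: a 60-bp gap gives exactly 100 bp, a 61-bp gap gives 101 bp.
print(insert_size_ok(20, 60, 20, maxins=100), insert_size_ok(20, 61, 20, maxins=100))
```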
Specify how many CPUs are required per --multicore for bismark align
integer
The pipeline makes use of the --multicore option for Bismark align. When using this option, Bismark uses a large number of CPUs for every --multicore specified. The pipeline calculates the number of --multicore based on the resources available to the task. It divides the available CPUs by 3, or by 5 if any of --single_cell, --zymo or --non_directional are specified. This is based on usage for a typical mouse genome.
You may find when running the pipeline that Bismark is not using this many CPUs. To fine tune the usage and speed, you can specify an integer with --bismark_align_cpu_per_multicore and the pipeline will divide the available CPUs by this value instead.
See the Bismark documentation for more information.
Specify how much memory is required per --multicore for bismark align
string
Exactly the same as with --bismark_align_cpu_per_multicore, but for memory. By default, the pipeline divides the available memory by 13.GB, or 18.GB if any of --single_cell, --zymo or --non_directional are specified.
Note that the final --multicore value is based on the lowest limiting factor of both CPUs and memory.
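The way the CPU and memory divisors combine into a single --multicore value can be sketched as follows. This is not the pipeline's code: bismark_multicore is a hypothetical function, rounding down and never going below 1 are assumptions made here, and memory is taken as a plain number of gigabytes for simplicity.

```python
from typing import Optional


def bismark_multicore(cpus: int, memory_gb: float,
                      non_directional: bool = False,
                      cpu_per_multicore: Optional[int] = None,
                      mem_per_multicore_gb: Optional[float] = None) -> int:
    """Estimate the --multicore value from the resources given to a task.

    Default divisors come from the text: 3 CPUs and 13 GB per multicore,
    or 5 CPUs and 18 GB for non-directional style runs (--single_cell,
    --zymo or --non_directional). Flooring and the minimum of 1 are
    assumptions of this sketch.
    """
    if cpu_per_multicore is None:
        cpu_per_multicore = 5 if non_directional else 3
    if mem_per_multicore_gb is None:
        mem_per_multicore_gb = 18 if non_directional else 13
    by_cpu = cpus // cpu_per_multicore
    by_memory = int(memory_gb // mem_per_multicore_gb)
    # The lower of the two limits wins.
    return max(1, min(by_cpu, by_memory))


# 16 CPUs and 128 GB: CPUs are the limiting factor in both modes.
print(bismark_multicore(16, 128))
print(bismark_multicore(16, 128, non_directional=True))
# With only 20 GB, memory becomes the limiting factor.
print(bismark_multicore(16, 20))
```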
Specify a minimum read coverage for MethylDackel to report a methylation call.
integer
MethylDackel - ignore SAM flags
boolean
Run MethylDackel with the --ignore_flags option, to ignore SAM flags.
Save files for use with methylKit
boolean
Run MethylDackel with the --methyl_kit option, to produce files suitable for use with the methylKit R package.
Skip read trimming.
boolean
Skip deduplication step after alignment.
boolean
Deduplication removes PCR duplicate reads after alignment. Specifying this option will skip this step, leaving duplicate reads in your data.
Note that this is turned on automatically if --rrbs is specified.
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Boolean whether to validate parameters against the schema at runtime
boolean
true
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
This works exactly as with --email, except emails are only sent if the workflow is not successful.
Send plain-text email instead of HTML.
boolean
Set to receive plain-text e-mails instead of HTML formatted.
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
If a file generated by the pipeline exceeds the threshold, it will not be attached.
Do not use coloured log outputs.
boolean
Set to disable colourful command line output and live life in monochrome.
Custom config file to supply to MultiQC.
string
Directory to keep pipeline Nextflow logs and reports.
string
${params.outdir}/pipeline_info
Show all params when using --help
boolean
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1
Maximum amount of memory that can be requested for any single job.
string
128.GB
^[\d\.]+\s*.(K|M|G|T)?B$
Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
Maximum amount of time that can be requested for any single job.
string
240.h
^[\d\.]+\.*(s|m|h|d)$
Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
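The format patterns listed for --max_memory and --max_time can be tried against candidate values before launching a run. The sketch assumes the patterns behave the same under Python's re module as in the pipeline's schema validation; the helper names are hypothetical.

```python
import re

# Patterns copied from the parameter listings above.
MEMORY_PATTERN = re.compile(r"^[\d\.]+\s*.(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^[\d\.]+\.*(s|m|h|d)$")


def is_valid_memory(value: str) -> bool:
    """True if the value matches the listed --max_memory pattern."""
    return MEMORY_PATTERN.match(value) is not None


def is_valid_time(value: str) -> bool:
    """True if the value matches the listed --max_time pattern."""
    return TIME_PATTERN.match(value) is not None


# The defaults and the example values from the text should all pass.
for memory in ("128.GB", "8.GB", "8 gigabytes"):
    print(memory, is_valid_memory(memory))
for duration in ("240.h", "2.h", "2 hours"):
    print(duration, is_valid_time(duration))
```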
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Provide git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.
## Download and use config file with following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with the custom_config_base option. For example:
## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip
## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.
Institutional configs hostname.
string
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string