Difference between revisions of "EpiTALE"

From Jstacs

Revision as of 21:55, 10 May 2021

Tools

Bed2Bismark

Bed2Bismark converts methylation information in bedMethyl format to Bismark format.

The input of Bed2Bismark is a file in bedMethyl format.

If you experience problems using Bed2Bismark, please contact us.


Bed2Bismark may be called with

java -jar EpiTALEcli-0.1.jar bed2bismark

and has the following parameters

name comment type

b BedMethyl file (Methylation information in bedMethyl format, type = bed.gz,bed) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar bed2bismark b=<BedMethyl_file>
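
To make the relation between the two formats concrete, the following Python sketch converts a single bedMethyl record into a Bismark-style coverage record. It assumes the ENCODE bedMethyl layout (read coverage in column 10, percent methylation in column 11, 0-based start) and 1-based positions in the coverage file. The function name bedmethyl_to_bismark is hypothetical, and this is not necessarily how Bed2Bismark works internally.

```python
def bedmethyl_to_bismark(line):
    """Convert one bedMethyl line into a Bismark coverage line (illustrative sketch)."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start0 = int(fields[1])        # assumption: 0-based start, as in BED
    coverage = int(fields[9])      # assumption: column 10 holds the read coverage
    percent = float(fields[10])    # assumption: column 11 holds percent methylation
    methylated = round(coverage * percent / 100.0)
    unmethylated = coverage - methylated
    pos = start0 + 1               # assumption: coverage files use 1-based positions
    record = (chrom, pos, pos, percent, methylated, unmethylated)
    return "\t".join(str(value) for value in record)
```

The counts are reconstructed from coverage and percentage, so rounding may introduce small deviations from the original read counts.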


BismarkMerge2Files

BismarkMerge2Files merges files generated by Bismark methylation extractor (https://www.bioinformatics.babraham.ac.uk/projects/bismark/) with parameters --bedGraph --CX -p. The output is a coverage file with the tab-separated columns: chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated.

The inputs of BismarkMerge2Files are two Bismark coverage files.

If you experience problems using BismarkMerge2Files, please contact us.



BismarkMerge2Files may be called with

java -jar EpiTALEcli-0.1.jar bismerger

and has the following parameters

name comment type

b Bismark file 1 (Methylation information in Bismark format file 1, type = cov.gz,cov) FILE
bf2 Bismark file 2 (Methylation information in Bismark format file 2, type = cov.gz,cov) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar bismerger b=<Bismark_file_1> bf2=<Bismark_file_2>
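
One plausible way to merge two coverage files is sketched below in Python: counts at identical positions are summed and the methylation percentage is recomputed from the summed counts. This merging rule is an assumption for illustration, not a documented description of BismarkMerge2Files, and merge_coverage is a hypothetical helper.

```python
from collections import defaultdict

def merge_coverage(lines_a, lines_b):
    """Merge two lists of Bismark coverage lines by summing counts per position."""
    counts = defaultdict(lambda: [0, 0])
    for line in list(lines_a) + list(lines_b):
        chrom, start, end, _pct, meth, unmeth = line.rstrip("\n").split("\t")
        key = (chrom, int(start), int(end))
        counts[key][0] += int(meth)
        counts[key][1] += int(unmeth)
    merged = []
    for (chrom, start, end), (meth, unmeth) in sorted(counts.items()):
        total = meth + unmeth
        # Recompute the percentage from the summed counts (assumed merging rule).
        pct = 100.0 * meth / total if total else 0.0
        merged.append((chrom, start, end, pct, meth, unmeth))
    return merged
```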


BismarkConvertToPromoter

BismarkConvertToPromoter converts the Bismark output file to promoter coordinates.

The input of BismarkConvertToPromoter is

1. a Bismark coverage output file, which contains the tab-separated columns: chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated, and

2. the promoter sequences in FastA format with headers like: > id chromosomeName:start-end:strand, e.g., > Os01g01010.1 Chr1:2602-3102:+

If you experience problems using BismarkConvertToPromoter, please contact us.


BismarkConvertToPromoter may be called with

java -jar EpiTALEcli-0.1.jar bis2prom

and has the following parameters

name comment type

b Bismark file (Methylation information in Bismark format, type = cov.gz,cov) FILE
p Promoter fasta file (Promoter fastA file, type = fa,fasta) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar bis2prom b=<Bismark_file> p=<Promoter_fasta_file>
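
The promoter header format described above can be parsed with a short Python sketch. The helper names parse_promoter_header and to_promoter_position are hypothetical, and the convention used for promoter-relative positions (0-based offset, counted from the promoter end on the minus strand) is an assumption rather than the documented behaviour of the tool.

```python
import re

# Header layout from the text: "> id chromosomeName:start-end:strand"
HEADER = re.compile(r"^>\s*(\S+)\s+(\S+):(\d+)-(\d+):([+-])\s*$")

def parse_promoter_header(header):
    """Split a promoter FastA header into id, chromosome, start, end and strand."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"unexpected header: {header!r}")
    seq_id, chrom, start, end, strand = match.groups()
    return seq_id, chrom, int(start), int(end), strand

def to_promoter_position(chrom, pos, promoter):
    """Map a genomic position to an offset within the promoter, or None if outside."""
    _seq_id, p_chrom, start, end, strand = promoter
    if chrom != p_chrom or not (start <= pos <= end):
        return None
    # Assumption: offsets are 0-based and follow the promoter's strand.
    return pos - start if strand == "+" else end - pos
```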


Chromatin pileup

Chromatin pileup takes as input a BAM file of mapped reads from a DNase-seq or ATAC-seq experiment, computes a coverage pileup of the 5' ends of the mapped reads, and outputs a simple tab-separated file with the columns: chromosome, position, and pileup value (number of reads with a 5' end at this position).

If you experience problems using Chromatin pileup, please contact us.


Chromatin pileup may be called with

java -jar EpiTALEcli-0.1.jar pileup

and has the following parameters

name comment type

b BAM file (Mapped reads from DNase-seq or ATAC-seq experiment, type = bam) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar pileup b=<BAM_file>
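
The idea of a 5' end pileup can be illustrated with a small Python sketch. It does not read BAM files; instead, mapped reads are given as hypothetical (chromosome, start, end, strand) tuples with inclusive coordinates, and the 5' end is assumed to be the start for forward-strand reads and the end for reverse-strand reads.

```python
from collections import Counter

def five_prime_pileup(reads):
    """Count reads per 5' end position.

    reads: iterable of (chromosome, start, end, strand) with inclusive coordinates.
    Returns sorted (chromosome, position, count) tuples.
    """
    pileup = Counter()
    for chrom, start, end, strand in reads:
        # Assumption: reverse-strand reads have their 5' end at the rightmost base.
        five_prime = start if strand == "+" else end
        pileup[(chrom, five_prime)] += 1
    return sorted((chrom, pos, n) for (chrom, pos), n in pileup.items())
```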


NormalizePileupOutput

NormalizePileupOutput normalizes the pileup output file, which contains the coverage by 5' ends of ATAC-seq or DNase-seq reads at each position. The coverage is normalized relative to the mean of a 10000 bp sliding window.

The input of NormalizePileupOutput is a pileup output file from the Chromatin pileup tool.

If you experience problems using NormalizePileupOutput, please contact us.


NormalizePileupOutput may be called with

java -jar EpiTALEcli-0.1.jar normpileup

and has the following parameters

name comment type

p Pileup output file (Pileup output file., type = tsv.gz,tsv,txt) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar normpileup p=<Pileup_output_file>
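
Normalization relative to a sliding-window mean can be sketched as follows in Python. The placement of the window around each position and the handling of sequence ends and empty windows are assumptions; normalize_by_window_mean is a hypothetical helper and may differ from what NormalizePileupOutput actually computes.

```python
def normalize_by_window_mean(coverage, window=10000):
    """Divide each coverage value by the mean coverage of the window around it."""
    n = len(coverage)
    # Prefix sums allow each window sum to be computed in constant time.
    prefix = [0]
    for value in coverage:
        prefix.append(prefix[-1] + value)
    half = window // 2
    normalized = []
    for i, value in enumerate(coverage):
        lo = max(0, i - half)              # window is clipped at the sequence start
        hi = min(n, i - half + window)     # ... and at the sequence end
        mean = (prefix[hi] - prefix[lo]) / (hi - lo)
        # Assumption: positions in windows without any coverage are set to 0.
        normalized.append(value / mean if mean > 0 else 0.0)
    return normalized
```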


PileupConvertToPromoter

PileupConvertToPromoter converts the pileup output file to promoter coordinates.

The input of PileupConvertToPromoter is

1. a normalized pileup output file from the NormalizePileupOutput tool, and

2. the promoter sequences in FastA format with headers like: > id chromosomeName:start-end:strand, e.g., > Os01g01010.1 Chr1:2602-3102:+

If you experience problems using PileupConvertToPromoter, please contact us.


PileupConvertToPromoter may be called with

java -jar EpiTALEcli-0.1.jar pile2prom

and has the following parameters

name comment type

n Normalized pileup output file (Normalized pileup output file., type = tsv.gz,tsv) FILE
p Promoter fasta file (Promoter fastA file, type = fa,fasta) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar pile2prom n=<Normalized_pileup_output_file> p=<Promoter_fasta_file>


NarrowPeakConvertToPromoter

NarrowPeakConvertToPromoter converts a narrowPeak file containing peaks of chromatin accessibility to promoter coordinates.

The input of NarrowPeakConvertToPromoter is

1. a narrowPeak file, and

2. the promoter sequences in FastA format with headers like: > id chromosomeName:start-end:strand, e.g., > Os01g01010.1 Chr1:2602-3102:+

If you experience problems using NarrowPeakConvertToPromoter, please contact us.


NarrowPeakConvertToPromoter may be called with

java -jar EpiTALEcli-0.1.jar peak2Prom

and has the following parameters

name comment type

n NarrowPeak file (Peak-calling output in narrowPeak format., type = narrowPeak,narrowPeak.gz) FILE
p Promoter fasta file (Promoter fastA file, type = fa,fasta) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar peak2Prom n=<NarrowPeak_file> p=<Promoter_fasta_file>
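
For intervals such as peaks, conversion to promoter coordinates amounts to clipping the peak to the promoter and shifting it by the promoter start. The Python sketch below assumes that peak and promoter use the same inclusive coordinate system and ignores strand; peak_to_promoter is a hypothetical helper, not the tool's implementation.

```python
def peak_to_promoter(peak_start, peak_end, prom_start, prom_end):
    """Clip a genomic peak to a promoter and return promoter-relative coordinates."""
    start = max(peak_start, prom_start)
    end = min(peak_end, prom_end)
    if start > end:
        return None  # peak and promoter do not overlap
    # Assumption: promoter-relative coordinates are 0-based offsets from prom_start.
    return start - prom_start, end - prom_start
```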


EpiTALE prediction

EpiTALE predicts TALE target boxes using a novel model learned from quantitative data based on the RVD sequence of a TALE and optionally considers the methylation state of the target box during prediction, as DNA methylation affects the binding specificity of RVDs. Additionally, EpiTALE can optionally annotate its predicted target sites with chromatin accessibility, using the output of the NormalizePileupOutput tool and the results of peak calling on DNase-seq or ATAC-seq data.

As input, EpiTALE requires

1. a set of sequences that are scanned for putative TALE target boxes. These sequences could be promoters of genes but also complete genomic sequences (FastA format).

2. For computing p-values, EpiTALE additionally needs a background set of sequences, which is by default generated as a sub-sample of the original input data.

3. The prediction threshold may be defined either by means of a p-value or by an approximate number of expected sites. The latter is also converted to a p-value internally, so the defined number of expected sites is, in general, not met exactly.

4. TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format also output by the TALE Analysis tool of AnnoTALE.

5. It can be specified whether both strands or only one of the strands are scanned. In the former case, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 in case of genome-wide predictions.

6. Optionally, EpiTALE considers methylation during prediction if Bismark output is provided. Using the Bismark methylation extractor with the parameters --bedGraph --CX -p, you can generate a coverage file (file.cov.gz), which contains the tab-separated columns: chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated. Alternatively, you can use the tool Bed2Bismark, which converts data in bedMethyl format to Bismark format.

7. (i) The chromatin accessibility of the input sequences can optionally be provided in narrowPeak format. Such a file can be obtained by mapping ATAC-seq or DNase-seq data to the corresponding genome and then performing peak calling, e.g., with JAMM (https://github.com/mahmoudibrahim/JAMM). In case of promoter sequences as input, you should run the tool NarrowPeakConvertToPromoter to convert the narrowPeak file to promoter coordinates. (ii) Additionally, you can calculate a coverage pileup of 5' ends of mapped reads with Chromatin pileup and normalize it with NormalizePileupOutput. In case of promoter sequences as input, you should run the tool PileupConvertToPromoter to convert the pileup to promoter coordinates.

8. (i) In case of a genomic search, the parameter "Calculate coverage area" should be set to "surround target site". The number of positions before the target site can then be set with "Coverage before value" (default: 300) and the number of positions after the target site with "Coverage after value" (default: 200). (ii) In case of a promoter search, the parameter "Calculate coverage area" may be set to "on complete sequence" or "surround target site". The number of positions before and after the binding site in the peak profile can be set with "Peak before value" (default: 300) and "Peak after value" (default: 50).

In case of a genomic search, you can filter predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box, using the tool DerTALE of the AnnoTALE suite.
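
The TALE input described in item 4 above (a FastA file with RVDs separated by dashes) can be read with a few lines of Python. This is only a sketch of the file format; read_tale_rvds and the TALE name used in the usage note are hypothetical.

```python
def read_tale_rvds(fasta_text):
    """Parse FastA text with dash-separated RVDs into {TALE name: list of RVDs}."""
    tales = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            tales[name] = []
        elif name is not None:
            # RVDs may continue over several lines; empty fragments are skipped.
            tales[name].extend(rvd for rvd in line.split("-") if rvd)
    return tales
```

For example, the text ">TalA" followed by the line "NI-HD-NG-NN" yields one TALE named TalA with four RVDs.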

If you experience problems using EpiTALE, please contact us.




EpiTALE prediction may be called with

java -jar EpiTALEcli-0.1.jar epitale

and has the following parameters

name comment type

s Sequences (The sequences (e.g., a genome) to scan for binding sites, type = fa,fas,fasta) FILE
b Background sample (The sequences for determining the prediction threshold. Either a sub-sample of the input sequences or a dedicated background data set., range={sub-sample, background sequences}, default = sub-sample) STRING
No parameters for selection "sub-sample"
Parameters for selection "background sequences":
bs Background sequences (The sequences (e.g., a genome) for determining the prediction threshold, type = fa,fas,fasta) FILE
t Threshold specification (The way of defining the prediction threshold. Either by explicitly defining a significance level or by specifying the number of expected sites, range={significance level, number of sites}, default = significance level) STRING
Parameters for selection "significance level":
sl Significance level (The significance level for determining the prediction threshold, valid range = [0.0, 0.01], default = 1.0E-4) DOUBLE
Parameters for selection "number of sites":
n Number of sites (The number of expected binding sites for determining the prediction threshold, valid range = [1, 1000000], default = 10000) INT
TALEs TALEs (The RVD sequences of the TALE, separated by dashes, in FastA format, type = fasta,fas,fa) FILE
Strand Strand (Prediction target sites on both strands, or the forward or reverse strand, range={both strands, forward strand, reverse strand}, default = both strands) STRING
Parameters for selection "both strands":
r Reverse penalty (Penalty for predictions on the reverse strand, valid range = [0.0, 1.7976931348623157E308], default = 0.01) DOUBLE
No parameters for selection "forward strand"
No parameters for selection "reverse strand"
bf Bismark file (The bedGraph output of Bismark (file.cov.gz) containing <chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>, type = cov,cov.gz, OPTIONAL) FILE
nf NarrowPeak file (The output of a peak caller (all.peaks.narrowPeak), type = narrowPeak,narrowPeak.gz, OPTIONAL) FILE
npo Normalized pileup output (The normalized output of pileup with values larger than zero (file.txt) containing <chromosome> <position> <coverage>, type = tsv,tsv.gz, OPTIONAL) FILE
c Calculate coverage area (Calculate coverage area surround target site, or on complete sequence, range={surround target site, on complete sequence}, default = surround target site, OPTIONAL) STRING
Parameters for selection "surround target site":
cbv Coverage before value (Number of positions before target site in coverage profile, valid range = [1, 500], default = 300, OPTIONAL) INT
cav Coverage after value (Number of positions after target site in coverage profile, valid range = [1, 500], default = 200, OPTIONAL) INT
No parameters for selection "on complete sequence"
p Peak before value (Number of positions before target site in peak profile, valid range = [1, 500], default = 300, OPTIONAL) INT
pav Peak after value (Number of positions after target site in peak profile, valid range = [1, 500], default = 50, OPTIONAL) INT
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar epitale s=<Sequences> TALEs=<TALEs>
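
When EpiTALE is run from a script, the key=value arguments listed above can be assembled programmatically. The following Python sketch only builds the argument list and uses the parameter names s, TALEs, bf, nf, npo and outdir from the table; the function epitale_command and all file names are hypothetical, and parameters whose values contain spaces are deliberately left out because their quoting is not described here.

```python
def epitale_command(sequences, tales, bismark=None, narrow_peak=None,
                    normalized_pileup=None, outdir=None, jar="EpiTALEcli-0.1.jar"):
    """Build the argument list for an EpiTALE prediction run (not executed here)."""
    args = ["java", "-jar", jar, "epitale", f"s={sequences}", f"TALEs={tales}"]
    optional = (
        ("bf", bismark),              # Bismark coverage file
        ("nf", narrow_peak),          # narrowPeak file
        ("npo", normalized_pileup),   # normalized pileup output
        ("outdir", outdir),           # output directory
    )
    for key, value in optional:
        if value is not None:
            args.append(f"{key}={value}")
    return args
```

The resulting list can be passed to subprocess.run(args, check=True), which avoids shell quoting issues for file names.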