PrediTALE

From Jstacs
Revision as of 11:59, 16 January 2019 by Grau

PrediTALE predicts TALE target boxes based on the RVD sequence of a TALE, using a novel model learned from quantitative data. A pre-print describing the method behind PrediTALE and comparing its performance to other tools for TALE target prediction is available from bioRxiv (doi:). In addition to PrediTALE, we also provide DerTALE, a tool for filtering genome-wide target site predictions by mapped RNA-seq data after Xanthomonas infection. Both tools are described in more detail below.

PrediTALE and DerTALE are available as a command line application, but have also been integrated into AnnoTALE, which is available with a graphical user interface.

PrediTALE is also available as a web-application at http://galaxy.informatik.uni-halle.de.

Command line tool

PrediTALE is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

PrediTALE and DerTALE are packaged in one runnable JAR that may be run from the command line with

java -jar PrediTALE.jar

which lists the available tools and usage information:

Available tools:

	preditale - PrediTALE
	dertale - DerTALE

Syntax: java -jar PrediTALE.jar <toolname> [<parameter=value> ...]

Further info about the tools is given with
	java -jar PrediTALE.jar <toolname> info

Tool parameters are listed with
	java -jar PrediTALE.jar <toolname>

You get a list of the tool parameters by calling PrediTALE.jar with the corresponding tool name, e.g.,

java -jar PrediTALE.jar preditale

The meaning of the individual tool parameters is described below.
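
The calling convention shown above (tool name followed by parameter=value pairs) can be scripted. The following is a minimal sketch in Python that only assembles the argument list; the helper name build_command and the parameter name someParameter are placeholders invented for illustration and are not actual PrediTALE parameters. The real parameter names are those printed by java -jar PrediTALE.jar <toolname>.

```python
import shlex

KNOWN_TOOLS = ("preditale", "dertale")  # tool names as listed by the JAR

def build_command(tool, params=None, jar="PrediTALE.jar"):
    """Assemble 'java -jar <jar> <tool> key=value ...' as an argument list."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    argv = ["java", "-jar", jar, tool]
    for key, value in (params or {}).items():
        argv.append(f"{key}={value}")
    return argv

# Without parameters, the tool lists its parameters.
list_params = build_command("preditale")
# 'someParameter' is a placeholder, NOT a real PrediTALE parameter name.
with_param = build_command("preditale", {"someParameter": "input.fa"})

print(shlex.join(list_params))
print(shlex.join(with_param))
```

The resulting list can be passed to subprocess.run once the placeholder has been replaced by parameters that the tool actually reports.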

Source code

Source code of PrediTALE and DerTALE is available from GitHub.

PrediTALE

As input, PrediTALE requires a set of sequences in FastA format that are scanned for putative TALE target boxes. These sequences may be promoters of genes but also complete genomic sequences. For computing p-values, PrediTALE additionally needs a background set of sequences, which is by default generated as a sub-sample of the original input data. The prediction threshold may be defined either by means of a p-value or by an approximate number of expected sites. The latter is converted to a p-value internally, so the defined number of expected sites is, in general, not met exactly.

TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format that is output by the TALE Analysis tool of AnnoTALE. Finally, it can be specified whether both strands or only one of the strands are scanned; in the former case, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 for genome-wide predictions.
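
To illustrate the TALE input format, here is a minimal Python sketch that reads FastA text with dash-separated RVDs into lists. It is not part of PrediTALE; the function name parse_tale_fasta, the TALE names, and the RVD sequences in the example are made up for illustration.

```python
def parse_tale_fasta(text):
    """Return {tale_name: [RVD, ...]} from FastA text with dash-separated RVDs."""
    tales = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            # FastA header: everything after '>' is used as the TALE name
            name = line[1:].strip()
            tales[name] = []
        elif name is None:
            raise ValueError("sequence line before first FastA header")
        else:
            # RVDs are separated by dashes; a sequence may span several lines
            tales[name].extend(rvd for rvd in line.split("-") if rvd)
    return tales

# Hypothetical example input (names and sequences are invented).
example = """>ExampleTALE1
NI-HD-NG-NN-HD
>ExampleTALE2
NN-NS-N*-HD
"""

tales = parse_tale_fasta(example)
for tale_name, rvds in tales.items():
    print(tale_name, len(rvds), rvds)
```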

DerTALE

As input, DerTALE requires a list of target box predictions as generated by the Predict and Intersect Targets tool of AnnoTALE or by PrediTALE. In addition, DerTALE accepts the prediction output of other tools such as TALE-NT or Talvez.

For determining differentially expressed regions, DerTALE also needs mapped RNA-seq data after Xanthomonas infection (treatment) and for a control, both in BAM format, which is the standard output format of most mappers and may be generated from the SAM format using samtools. For each BAM file, DerTALE also needs an index file with the same base name as the BAM file but the additional extension .bai (as generated by samtools).
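
Index files are typically created by running samtools index on a coordinate-sorted BAM file, which writes the index next to it (for example treatment.bam.bai for treatment.bam). Before starting DerTALE, it can be useful to verify that every BAM file has its index. The following Python sketch does this under the assumption that the index is named after the full BAM file name plus .bai; the function name missing_bam_indexes and the file names are chosen for illustration only.

```python
import tempfile
from pathlib import Path

def missing_bam_indexes(bam_paths):
    """Return the BAM paths for which no '<file>.bam.bai' index exists."""
    missing = []
    for bam in map(Path, bam_paths):
        # Assumed naming: index = BAM file name with '.bai' appended
        index = bam.with_name(bam.name + ".bai")
        if not index.is_file():
            missing.append(str(bam))
    return missing

# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name in ("treatment.bam", "treatment.bam.bai", "control.bam"):
        (tmp / name).touch()
    missing = missing_bam_indexes([tmp / "treatment.bam", tmp / "control.bam"])
    missing_names = [Path(p).name for p in missing]

print("BAM files without index:", missing_names)
```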

Further parameters that can be specified include the number of predictions in the list that are considered (counting from the top), the width of the region in which differential expression is considered, the width of the window that needs to be differentially expressed, a pseudo count on the count profile, the measure for comparing replicates, and a threshold on the log (base 2) differential abundance (e.g., 1 for a two-fold induction).
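
The relation between the log (base 2) threshold and fold induction, and the effect of a pseudo count, can be illustrated with a small Python sketch. This is not DerTALE's actual computation: the assumption that the pseudo count is added to both counts before taking the ratio, the default values, and the function names are all chosen for illustration.

```python
import math

def log2_fold_change(treatment, control, pseudo_count=1.0):
    """log2 ratio of treatment over control after adding a pseudo count to both."""
    # With pseudo_count=0, a control count of 0 would cause a division by zero.
    return math.log2((treatment + pseudo_count) / (control + pseudo_count))

def is_induced(treatment, control, threshold=1.0, pseudo_count=1.0):
    """True if the log2 fold change reaches the threshold (1.0 = two-fold)."""
    return log2_fold_change(treatment, control, pseudo_count) >= threshold

# Without a pseudo count, 40 vs. 20 reads is exactly a two-fold induction.
print(log2_fold_change(40, 20, pseudo_count=0.0))
# With a pseudo count of 1, the same counts fall slightly below the threshold.
print(is_induced(40, 20))
# 63 vs. 15 reads give (63 + 1) / (15 + 1) = 4, i.e. a log2 fold change of 2.
print(is_induced(63, 15))
```

The pseudo count mainly dampens ratios computed from low counts, where small absolute differences would otherwise yield large fold changes.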