TALENoffer

From Jstacs

by Jan Grau, Jens Boch, and Stefan Posch.

TALENoffer is a tool for genome-wide prediction of TAL effector nuclease (TALEN) off-target sites. TALENoffer is based on the same statistical model as TALgetter and features a substantially improved runtime, which allows for scanning complete genomes for TALEN off-target sites within a few minutes.

We provide TALENoffer as a public web-server, as a web-application that can be installed in a local Galaxy server, and as a command line program.

Paper

If you use TALENoffer, please cite

J. Grau, J. Boch, and S. Posch. TALENoffer: genome-wide TALEN off-target prediction. Bioinformatics, 2013, doi: 10.1093/bioinformatics/btt501.

TALENoffer web-server

TALENoffer is available as a public web-server at galaxy.informatik.uni-halle.de.

Download

TALENoffer is implemented in Java using Jstacs. You can download the command line application as a Jar. In addition, we provide the Jar of the Galaxy web-application for installing it in your local Galaxy server.

TALENoffer will be part of the next public release of the Jstacs library. As (future) part of Jstacs, TALENoffer will be released under GPL 3.

Running the command line application

Running the command line application requires Java 1.6 or later.

The arguments of the command line application have the following meaning (name (type): description):

input (String): Input sequences. The sequences to scan for TALEN targets, in FastA format.
annotation (String): Annotation file. A file containing genomic annotations (e.g., genes, mRNAs, exons) in GFF, GTF, or UCSC known genes BED format. Optional.
rvdl (String): First RVD sequence. The sequence of RVDs of the first TALEN monomer, separated by '-'. Default: NI-HD-HD-NG-NN-NK-NK.
rvdr (String): Second RVD sequence. The sequence of RVDs of the second TALEN monomer, separated by '-'. Default: NI-HD-HD-NG-NN-NK-NK.
nterml (Boolean): N-terminal first. For the first RVD sequence, consider the architecture in which the endonuclease domain is fused to the N-terminus instead of the standard C-terminal architecture. Default: false.
ntermr (Boolean): N-terminal second. For the second RVD sequence, consider the architecture in which the endonuclease domain is fused to the N-terminus instead of the standard C-terminal architecture. Default: false.
heterodimers (Boolean): Hetero-dimers only. Consider only hetero-dimers of TALEN monomers instead of the standard search for both hetero- and homo-dimers. Default: false.
min (Integer): Minimum distance. Minimum distance between TALEN monomer target sites. Valid range: [0, 100]; default: 12.
max (Integer): Maximum distance. Maximum distance between TALEN monomer target sites. Valid range: [0, 100]; default: 24.
model (String): Model type. TALgetter is the default model and uses individual binding specificities for each RVD. TALgetter13 uses binding specificities that depend only on amino acid 13, i.e., the second amino acid of the RVD. While TALgetter is recommended in most cases, TALgetter13 may be beneficial if you search for target sites of TAL effectors with many rare RVDs, for instance YG, HH, or S*. Allowed values: TALgetter, TALgetter13; default: TALgetter.
addrvds (String): RVD specificities. File defining additional RVD specificities or overriding existing ones. An example file sets the specificity of position 0 to T, defines new specificities for NG and HG, and introduces a new RVD XX. Optional.
filter (String): Filter. Filters off-targets using a threshold q on the score relative to the best-matching site. Typical values are Loose (q=0.35), Medium-Loose (q=0.375), Medium (q=0.4), Medium-Strict (q=0.45), and Strict (q=0.5). Valid range: [0.35, 1.0]; default: 0.4.
top (Integer): Maximum number of targets. Limits the total number of reported targets across all input sequences, ranked by score. Valid range: [1, 100000]; default: 100.
out (String): Additional output. Path to a GFF3/GFF2 file to which predictions are written in addition to the default output; the file extension (.gff3 or .gff) determines the format. Optional.
numThreads (Integer): Number of threads. Number of threads used by TALENoffer; more than 3 threads typically do not yield a further speed-up. Valid range: [1, 8]; default: 3.
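When TALENoffer is called from scripts or pipelines, it can help to check argument values before starting a long genome scan. The following is a minimal sketch of such a check, written against the parameter descriptions above. The class TalenOfferArgs and its methods are hypothetical and not part of TALENoffer or Jstacs; the assumption that every RVD is a two-character code (letters or '*', as in NI, HD, or N*) and the rejection of min > max are our own choices, not documented TALENoffer behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: validates a few TALENoffer arguments and renders them as key=value pairs. */
public class TalenOfferArgs {

    // Assumption: each RVD is a two-character code (letters or '*'), separated by '-'.
    private static final Pattern RVD_SEQUENCE = Pattern.compile("[A-Z*]{2}(-[A-Z*]{2})*");

    static void checkRvds(String name, String rvds) {
        if (!RVD_SEQUENCE.matcher(rvds).matches()) {
            throw new IllegalArgumentException(
                    name + ": expected two-letter RVDs separated by '-', got \"" + rvds + "\"");
        }
    }

    // Checks an integer parameter against the valid range given in the parameter list.
    static void checkRange(String name, long value, long lo, long hi) {
        if (value < lo || value > hi) {
            throw new IllegalArgumentException(
                    name + " must lie in [" + lo + ", " + hi + "], got " + value);
        }
    }

    static List<String> build(String input, String rvdl, String rvdr,
                              int min, int max, int top, int numThreads) {
        checkRvds("rvdl", rvdl);
        checkRvds("rvdr", rvdr);
        checkRange("min", min, 0, 100);
        checkRange("max", max, 0, 100);
        // Assumption: a minimum distance above the maximum is treated as an error here.
        if (min > max) {
            throw new IllegalArgumentException("min (" + min + ") must not exceed max (" + max + ")");
        }
        checkRange("top", top, 1, 100000);
        checkRange("numThreads", numThreads, 1, 8);

        List<String> args = new ArrayList<>();
        args.add("input=" + input);
        args.add("rvdl=" + rvdl);
        args.add("rvdr=" + rvdr);
        args.add("min=" + min);
        args.add("max=" + max);
        args.add("top=" + top);
        args.add("numThreads=" + numThreads);
        return args;
    }

    public static void main(String[] args) {
        List<String> checked = build("path/to/myGenome.fa",
                "NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG",
                "NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG",
                10, 20, 100, 3);
        System.out.println(String.join(" ", checked));
    }
}
```

Invalid input, such as RVDs separated by spaces or max=101, raises an IllegalArgumentException before any process is started. Note that the regular expression would also reject custom RVD names longer than two characters defined via addrvds, so it should be relaxed if you use such names.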

For instance, to scan the FastA file path/to/myGenome.fa for the top 100 off-target sites of the two TALENs with RVD sequences NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG and NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG, with a distance of 10 to 20 bp between the monomer target sites, start TALENoffer with

java -jar TALENoffer.jar input=path/to/myGenome.fa rvdl="NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG" \
rvdr="NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG" min=10 max=20 top=100
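The same call can also be issued from a Java program, for example when TALENoffer is one step of a larger pipeline. The sketch below uses the standard java.lang.ProcessBuilder API; it assumes that java is on the PATH and that TALENoffer.jar lies in the current working directory (both assumptions, adjust as needed). Because ProcessBuilder passes each list element as one argument, the double quotes and the trailing backslash of the shell command are not needed: they are shell syntax, not part of the arguments TALENoffer receives.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: start the TALENoffer command line application from Java. */
public class RunTalenOffer {

    /** Assembles "java [JVM options] -jar <jar> key=value ...". */
    static List<String> command(String jar, List<String> jvmOptions, List<String> toolArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.addAll(jvmOptions);   // e.g. -Xmx2G; JVM options must come before -jar
        cmd.add("-jar");
        cmd.add(jar);
        cmd.addAll(toolArgs);     // each key=value is passed verbatim, no shell quoting
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = command("TALENoffer.jar", Collections.<String>emptyList(),
                Arrays.asList(
                        "input=path/to/myGenome.fa",
                        "rvdl=NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG",
                        "rvdr=NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG",
                        "min=10", "max=20", "top=100"));

        // Assumption: TALENoffer.jar lies in the current working directory.
        if (!new File("TALENoffer.jar").isFile()) {
            System.err.println("TALENoffer.jar not found; command would be: " + String.join(" ", cmd));
            return;
        }
        // inheritIO() forwards TALENoffer's standard output and error to this console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

The program exits with TALENoffer's own exit code, so a calling script can detect failed runs.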

If you analyze large data sets, for instance complete mammalian genomes, TALENoffer may require more memory than the Java virtual machine provides by default. You can increase the memory available to TALENoffer by passing additional parameters to the Java virtual machine. For instance, to start TALENoffer with 512 MB of memory initially, which may grow to at most 2 GB during execution, call

java -Xms512M -Xmx2G -jar TALENoffer.jar input=path/to/myGenome.fa rvdl="NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG" \
rvdr="NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG" min=10 max=20 top=100
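To check that memory options such as -Xms and -Xmx are written correctly and take effect on your machine, a small program can report the heap limits of the JVM it runs in, using the standard Runtime.maxMemory() and Runtime.totalMemory() methods. This is a diagnostic sketch only; it reports the limits of its own JVM, not of a separately started TALENoffer process.

```java
/** Prints the heap limits of the JVM this program runs in. */
public class HeapCheck {

    /** Upper bound on the heap in MiB, controlled by -Xmx (Long.MAX_VALUE bytes if unlimited). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently reserved in MiB; right after startup this is roughly the -Xms value. */
    static long currentHeapMiB() {
        return Runtime.getRuntime().totalMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("current heap: " + currentHeapMiB() + " MiB");
        System.out.println("maximum heap: " + maxHeapMiB() + " MiB");
    }
}
```

After compiling with javac HeapCheck.java, running java -Xms512M -Xmx2G HeapCheck should report a maximum close to 2048 MiB; depending on the garbage collector, the reported value may be somewhat lower.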

Installing the web-application

The command line program behind the web-application is also a Jar, so Java must be installed on the server running Galaxy. To install this program in Galaxy, copy it to the desired location in the Galaxy tools directory.

The command line application writes its own Galaxy tool definition file. From the directory containing the command line program for Galaxy, create the tool definition file by calling

java -jar TALENofferWeb.jar --create TALENofferWeb.xml

Afterwards, this directory contains the tool definition file TALENofferWeb.xml. Now you can register TALENoffer in the Galaxy tool_conf.xml file. For details, see the Galaxy tutorial for adding new tools.
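Registering a tool in Galaxy usually means adding a tool element to a section of tool_conf.xml, whose structure is a toolbox root element containing section elements that in turn contain tool elements with a file attribute. The sketch below performs that edit with the JDK's standard DOM and XSLT APIs. The section id "talen", its name "TALEN tools", and the relative path TALENoffer/TALENofferWeb.xml are hypothetical examples; use the location to which you actually copied the Jar and tool definition file, and consult the Galaxy tutorial for how your Galaxy version resolves tool paths.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Sketch: add a tool entry to the contents of a Galaxy tool_conf.xml file. */
public class RegisterTool {

    /** Appends <tool file="toolFile"/> to the section with id sectionId, creating the section if missing. */
    static Document addTool(String toolConfXml, String sectionId, String sectionName, String toolFile) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(toolConfXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse tool_conf.xml", e);
        }
        Element toolbox = doc.getDocumentElement();   // the <toolbox> root element
        Element section = null;
        NodeList sections = toolbox.getElementsByTagName("section");
        for (int i = 0; i < sections.getLength() && section == null; i++) {
            Element candidate = (Element) sections.item(i);
            if (sectionId.equals(candidate.getAttribute("id"))) {
                section = candidate;
            }
        }
        if (section == null) {
            section = doc.createElement("section");
            section.setAttribute("id", sectionId);
            section.setAttribute("name", sectionName);
            toolbox.appendChild(section);
        }
        Element tool = doc.createElement("tool");
        tool.setAttribute("file", toolFile);   // typically relative to Galaxy's tools directory
        section.appendChild(tool);
        return doc;
    }

    /** Serializes the document, e.g. to write it back to tool_conf.xml. */
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("cannot serialize tool_conf.xml", e);
        }
    }

    public static void main(String[] args) {
        String conf = "<toolbox><section id=\"textutil\" name=\"Text Manipulation\"/></toolbox>";
        Document doc = addTool(conf, "talen", "TALEN tools", "TALENoffer/TALENofferWeb.xml");
        System.out.println(toXml(doc));
    }
}
```

Keep a backup of tool_conf.xml before writing the result back, and restart Galaxy afterwards so that the new tool is loaded.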