TALgetter: Difference between revisions

From Jstacs

Revision as of 21:05, 15 August 2012

TALgetter allows you to scan input sequences for putative target sites of a given TAL (transcription activator-like) effector, as typically expressed by many Xanthomonas bacteria. TALgetter uses a local mixture model, which assumes that the nucleotide at each position of a putative target site is determined either by the binding specificity of the RVD at that position (if binding occurs at that position) or by the genomic context (if no binding occurs). The binding specificities and the importance of the individual RVDs have been trained on known pairs of TAL effectors and target sites. Nucleotide preferences of the genomic context are learned from (putative) promoter sequences of A. thaliana and O. sativa.
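
One way to read the local mixture model described above is as a two-component mixture at each position of a putative target site. The notation below is ours and is only a sketch under that reading: x_l is the nucleotide at position l, r_l is the RVD aligned to that position, lambda is an assumed per-RVD mixing weight in [0, 1] standing in for the "importance" of the RVD, and P_RVD and P_ctx stand for the binding specificity and the genomic-context model. The exact parameterization used by TALgetter is not given in this text.

```latex
P(x_\ell \mid r_\ell)
  = \lambda_{r_\ell} \, P_{\mathrm{RVD}}(x_\ell \mid r_\ell)
  + \bigl(1 - \lambda_{r_\ell}\bigr) \, P_{\mathrm{ctx}}(x_\ell)
```

Under this reading, an RVD with a weight close to 1 contributes mostly its own nucleotide preference, while an RVD with a weight close to 0 leaves the position to be explained by the genomic context.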

Web-application

TALgetter is available as a web-application at galaxy.informatik.uni-halle.de:8976. Here, you can also download a command line application that is easily scriptable.

Download

TALgetter is implemented in Java using Jstacs. Here, you can download the Jar of the command line application. In addition, we provide the Jar of the Galaxy web-application for installing it in your local Galaxy server.

Running the command line application

For running the command line application, Java v1.6 or later is required.

The arguments of the command line application have the following meaning:

name | comment | type

input | Input sequences (The sequences to scan for TAL binding sites, FastA) | String
tal | TAL sequence (Sequence of RVDs, separated by '-', default = NI-HD-HD-NG-NN-NK-NK) | String
fp | First position (First position, counted from the 5' end, considered for search, default = 0) | Integer
do | Downstream offset (Number of positions counted from the 3' end that are not considered, default = 0) | Integer
top | Top N (Limit the number of reported hits in all input sequences to at most N, valid range = [1, 10000], default = 100) | Integer
pval | PVals (Computation of p-Values, range = {NONE, COARSE, FINE}, default = COARSE) | {NONE, COARSE, FINE}
pthresh | p-Value (Filter the reported hits by a maximum p-Value. A value of 0 or 1 switches off the filter; valid range = [0.0, 1.0], default = 1.0) | Double
model | Model type (TALgetter is the default model that uses individual binding specificities for each RVD. TALgetter13 uses binding specificities that only depend on amino acid 13, i.e., the second amino acid of the RVD. While TALgetter is recommended in most cases, the use of TALgetter13 may be beneficial if you search for target sites of TAL effectors with many rare RVDs, for instance YG, HH, or S*; range = {TALgetter, TALgetter13}, default = TALgetter) | {TALgetter, TALgetter13}
train | Training sequences (The sequences to use for training the model, annotated FastA, OPTIONAL) | String

If, for instance, you want to scan the FastA-file path/to/myPromoters.fa for the top 100 target sites of the TAL effector Talc, you start TALgetter with

java -jar TALgetter.jar input=path/to/myPromoters.fa tal="NS-NG-NS-HD-NI-NG-NN-NG-HD-NI-NN-N*-NI-NN-HD-NG-NI-NN-N*-HD-NN-NG"
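
The remaining parameters from the table can be added to the call in the same key=value form. The following is a minimal sketch that assembles such a call and prints it, one argument per line, for inspection. The function name build_talgetter_cmd, the location of the Jar, and the values chosen for top, pval and pthresh are illustrative assumptions, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch: assemble a TALgetter call from the documented key=value parameters.
# The parameter names (input, tal, top, pval, pthresh) come from the table above.
# The function name, the jar location and the chosen values are assumptions.

build_talgetter_cmd() {
  local input="$1" tal="$2"
  # Print one argument per line so the call can be checked before it is run.
  printf '%s\n' java -jar TALgetter.jar \
    "input=${input}" \
    "tal=${tal}" \
    "top=20" \
    "pval=FINE" \
    "pthresh=0.01"
}

# Dry run: show the arguments that would be passed to Java.
build_talgetter_cmd path/to/myPromoters.fa "NI-HD-HD-NG-NN-NK-NK"
```

To execute the call instead of printing it, the lines can be read into an array (for example with bash's mapfile -t) and expanded as a command.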

Optionally, you can also train the TALgetter model on your own training data, supplied via the train parameter. Here, we provide an example input file. The input format is an annotated FastA file of the form

>seq:<RVD-sequence>; weight: <w>
<DNA-sequence including position 0>
...

for instance:

>seq:NI-NG-NN-NN-NI-HD-HD-NN-NG-NN-NG; weight:0.0476190476190476
TATGGACCGTGT

The specification of the weight is optional.
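
Before training, it may help to check such a file for obvious inconsistencies. The following sketch assumes, based only on the example above (11 RVDs, 12 nucleotides), that each DNA sequence has exactly one nucleotide more than its RVD sequence because it includes position 0. It also assumes one DNA line per entry. The function name check_training_fasta is hypothetical, and the check is not part of TALgetter.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an annotated FastA training file.
# Assumption (inferred from the example, not stated as a rule in the source):
# DNA length == number of RVDs + 1, since the DNA includes position 0.
# Assumption: each DNA sequence is written on a single line.

check_training_fasta() {
  awk '
    /^>/ {
      header = $0
      rvds = $0
      sub(/^>seq:[ ]*/, "", rvds)    # drop the ">seq:" prefix
      sub(/;.*$/, "", rvds)           # drop the optional "; weight: ..." part
      n = split(rvds, parts, "-")     # number of RVDs in this entry
      next
    }
    NF > 0 {
      if (length($0) != n + 1) {
        printf "MISMATCH %s: %d RVDs, %d nucleotides\n", header, n, length($0)
        bad = 1
      }
    }
    END { exit bad }                  # exit status 1 if any entry mismatched
  ' "$1"
}
```

Called as check_training_fasta path/to/training.fa, the function prints one line per inconsistent entry and returns a non-zero exit status if any entry fails the check.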

Installing the web-application

The command-line program behind the web-application is a Jar as well, so Java is required on the server running Galaxy. To install this command line program in Galaxy, copy it to the desired destination in the Galaxy tools directory.

The command line application writes its Galaxy tool definition file itself. If you are in the directory containing the command-line program for Galaxy, you can create the tool definition file by calling

java -jar TALgetterWeb.jar --create TALgetterWeb.xml

Afterwards, this directory contains the tool definition file TALgetterWeb.xml. Now you can register TALgetter in the Galaxy tool_conf.xml file. For details, see the Galaxy tutorial for adding new tools (http://wiki.g2.bx.psu.edu/Admin/Tools/Add%20Tool%20Tutorial).
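
Registration usually amounts to adding a section with a tool entry inside the toolbox element of tool_conf.xml. The sketch below prints such an entry. The section id and name and the subdirectory talgetter are hypothetical choices; the file path is assumed to be relative to the Galaxy tools directory and must match the location the Jar and the XML file were copied to. Check the Galaxy tutorial for the exact conventions of your Galaxy version.

```shell
#!/usr/bin/env bash
# Sketch: print a tool_conf.xml section for the generated tool definition file.
# The section id/name and the subdirectory are hypothetical; the path is assumed
# to be relative to the Galaxy tools directory.

tool_conf_entry() {
  local tool_dir="$1"   # subdirectory below the Galaxy tools directory
  cat <<EOF
<section id="talgetter" name="TALgetter">
  <tool file="${tool_dir}/TALgetterWeb.xml" />
</section>
EOF
}

# Print the entry for a tool copied to tools/talgetter.
tool_conf_entry talgetter
```

The printed lines are meant to be pasted into tool_conf.xml by hand; the sketch deliberately does not edit the file itself.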