Dessert: Alignments, Utils, and goodies

From Jstacs

Latest revision as of 14:26, 2 February 2012

In this section, we present a motley collection of interesting classes of Jstacs.

Alignments

In this subsection, we present how to compute alignments using Jstacs.

If we would like to compute an alignment, we first have to define the costs for match, mismatch, and gaps. In Jstacs, we provide the interface Costs that declares all methods necessary during the alignment. In this example, we restrict ourselves to simple costs: 0 for a match, 1 for a mismatch, and 0.5 for a gap.


Costs costs = new SimpleCosts( 0, 1, 0.5 );


Second, we have to provide an instance of Alignment. This instance contains all information needed for an alignment and stores, for instance, the matrices used for dynamic programming. When creating an instance, we have to specify which kind of alignment we would like to compute. Jstacs supports local, global, and semi-global alignments (cf. Alignment.AlignmentType).


Alignment align = new Alignment( AlignmentType.GLOBAL, costs );


A second constructor also allows for specifying the number of off-diagonals used in the alignment, which may speed up the computation.

Finally, we can compute the optimal alignment between two Sequences seq1 and seq2 and write the result to the standard output.


System.out.println( align.getAlignment( seq1, seq2 ) );


The alignment instance can be reused for aligning further sequences.
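To make the role of the costs and of the off-diagonals concrete, the following plain-Java sketch computes the cost of an optimal global alignment by dynamic programming with the simple costs from above (0 for a match, 1 for a mismatch, 0.5 per gap position). It is an illustration under our own assumptions, not the Jstacs implementation: it aligns Strings instead of Sequence objects, returns only the cost without a traceback, and both the class GlobalAlignmentSketch and its band parameter, a hypothetical stand-in for the number of off-diagonals, are invented for this example.

```java
import java.util.Arrays;

/** Plain-Java sketch (not the Jstacs implementation) of a banded global alignment with linear gap costs. */
final class GlobalAlignmentSketch {

    /**
     * Returns the minimal cost of a global alignment of a and b, where aligned characters cost
     * match or mismatch and every gap position costs gap. Only cells with |i - j| <= band are
     * filled (hypothetical stand-in for off-diagonals); returns +Infinity if no alignment fits.
     */
    static double cost(String a, String b, double match, double mismatch, double gap, int band) {
        int n = a.length(), m = b.length();
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) {
            Arrays.fill(row, Double.POSITIVE_INFINITY); // cells outside the band stay unreachable
        }
        d[0][0] = 0.0;
        for (int i = 0; i <= n; i++) {
            int from = Math.max(0, i - band);
            int to = (int) Math.min(m, (long) i + band);
            for (int j = from; j <= to; j++) {
                if (i == 0 && j == 0) {
                    continue;
                }
                double best = Double.POSITIVE_INFINITY;
                if (i > 0 && j > 0) { // align a[i-1] with b[j-1]
                    double sub = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                    best = Math.min(best, d[i - 1][j - 1] + sub);
                }
                if (i > 0) { // a[i-1] against a gap
                    best = Math.min(best, d[i - 1][j] + gap);
                }
                if (j > 0) { // b[j-1] against a gap
                    best = Math.min(best, d[i][j - 1] + gap);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

For instance, cost( "ACGT", "AGT", 0, 1, 0.5, Integer.MAX_VALUE ) yields 0.5 (a single gap), and a band of 1 still contains this optimum, whereas sequences whose lengths differ by more than the band cannot be aligned at all. Restricting the off-diagonals therefore saves time, but is only safe if the optimal alignment stays close to the main diagonal.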

In Jstacs, we also provide the possibility of computing optimal alignments with affine gap costs. To this end, we implement the class AffineCosts, which specifies the cost for opening a gap. The costs for gap elongation are given by the gap costs of the internally used instance of Costs.


costs = new AffineCosts( 1, costs );
align = new Alignment( AlignmentType.GLOBAL, costs );
System.out.println( align.getAlignment( seq1, seq2 ) );
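The effect of affine gap costs can again be sketched in plain Java, here with the classical three-matrix recursion of Gotoh. We assume that a gap of length k costs open + k * extend, i.e., the opening cost plus the elongation cost for every gap position; whether Jstacs counts the first gap position in exactly this way is an assumption, and AffineAlignmentSketch is a hypothetical class, not part of Jstacs.

```java
/** Plain-Java sketch (not the Jstacs implementation) of global alignment with affine gap costs (Gotoh). */
final class AffineAlignmentSketch {

    private static final double INF = Double.POSITIVE_INFINITY;

    /** Minimal global alignment cost; assumption: a gap of length k costs open + k * extend. */
    static double cost(String a, String b, double match, double mismatch, double open, double extend) {
        int n = a.length(), m = b.length();
        double[][] s = new double[n + 1][m + 1]; // alignment ends with two aligned characters
        double[][] x = new double[n + 1][m + 1]; // ends with a character of a against a gap
        double[][] y = new double[n + 1][m + 1]; // ends with a character of b against a gap
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) {
                    s[0][0] = 0.0;
                    x[0][0] = INF;
                    y[0][0] = INF;
                    continue;
                }
                s[i][j] = (i > 0 && j > 0)
                        ? min3(s[i - 1][j - 1], x[i - 1][j - 1], y[i - 1][j - 1])
                            + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch)
                        : INF;
                // either open a new gap (open + extend) or elongate the current one (extend)
                x[i][j] = i > 0
                        ? Math.min(Math.min(s[i - 1][j], y[i - 1][j]) + open + extend, x[i - 1][j] + extend)
                        : INF;
                y[i][j] = j > 0
                        ? Math.min(Math.min(s[i][j - 1], x[i][j - 1]) + open + extend, y[i][j - 1] + extend)
                        : INF;
            }
        }
        return min3(s[n][m], x[n][m], y[n][m]);
    }

    private static double min3(double p, double q, double r) {
        return Math.min(p, Math.min(q, r));
    }
}
```

With open = 1 and extend = 0.5, mirroring new AffineCosts( 1, costs ) above, aligning ACGT against AT costs 2.0 (one gap of length 2), whereas ACGT against CG costs 3.0 because two separate gaps have to be opened. Under purely linear costs of 0.5 per gap position, both pairs would cost 1.0, so affine costs favour few long gaps over many short ones.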


REnvironment: Connection to R

In this subsection, we show how to access R (cf. http://www.r-project.org/) from Jstacs. R is a project for statistical computing that allows for performing complex computations and creating nice plots.

In some cases, it is reasonable to use R from within Jstacs. To do so, we have to create a connection to R. We utilize the R package Rserve (cf. http://www.rforge.net/Rserve/), which allows for communication between Java and R. Given a running instance of Rserve, we can create a connection via


REnvironment re = new REnvironment();


However, in some cases we have to specify the login name, a password, and the port for the communication which is possible via alternative constructors.

Now, we are able to perform diverse tasks in R. Here, we only present three methods, but REnvironment provides more functionality. First, we copy the double array values from Java to R


re.createVector( "values", values );


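and second, we modify it in R; here, values is overwritten by length(values) random numbers drawn from a standard normal distribution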


re.voidEval( "values=rnorm(length(values));" );
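For readers less familiar with R, a rough plain-Java analogue of this statement is sketched below. It runs locally and involves neither REnvironment nor Rserve; RnormSketch is a hypothetical name. Since the random number generators of Java and R differ, the drawn values will not coincide with those computed in R.

```java
import java.util.Random;

/** Hypothetical plain-Java analogue of the R statement values=rnorm(length(values)). */
final class RnormSketch {

    /** Returns a new array of the same length filled with draws from the standard normal distribution. */
    static double[] rnormLike(double[] values, Random rng) {
        double[] result = new double[values.length]; // only the length of values is used
        for (int i = 0; i < result.length; i++) {
            result[i] = rng.nextGaussian(); // mean 0, standard deviation 1
        }
        return result;
    }
}
```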


Finally, REnvironment allows for creating plots as PDF, TeX, or BufferedImage. Here, we draw values as a line plot (t="l") into the file values.pdf.


re.plotToPDF( "plot(values,t=\"l\");", "values.pdf", true );


ArrayHandler: Handling arrays

In this subsection, we present a way to easily handle arrays in Java, i.e., to cast, clone, and create arrays with elements of generic type. To this end, we implement the class ArrayHandler in Jstacs.

Let's assume we have a two-dimensional array of either primitives or instances of some Java class, and we would like to create a deep clone, as is necessary for member fields in clone methods.


double[][] twodim = new double[5][5];


Traditionally, we would have to implement for-loops to do so. However, the ArrayHandler implements this functionality in a generic manner providing one method for this purpose.


double[][] clone = ArrayHandler.clone( twodim );
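For comparison, the hand-written loop that ArrayHandler.clone spares us looks roughly as follows for the two-dimensional case. Note that twodim.clone() alone would not suffice, because it copies only the outer array, so the clone would still share its rows with the original. DeepCloneSketch is a hypothetical helper restricted to double[][]; the generic behaviour of ArrayHandler.clone described above is not reproduced here.

```java
/** Hypothetical helper: the explicit loop that a generic deep-clone method spares us. */
final class DeepCloneSketch {

    static double[][] deepClone(double[][] src) {
        if (src == null) {
            return null;
        }
        double[][] copy = new double[src.length][];
        for (int i = 0; i < src.length; i++) {
            // copy every row, so the clone shares no inner arrays with the original
            copy[i] = src[i] == null ? null : src[i].clone();
        }
        return copy;
    }
}
```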


A second use case is the creation of arrays, where each and every entry is a clone of some instance.


TrainableStatisticalModel pwm = TrainableStatisticalModelFactory.createPWM( DNAAlphabetContainer.SINGLETON, 10, 4.0 );
TrainableStatisticalModel[] models = ArrayHandler.createArrayOf( pwm, 10 );
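Filling an array with java.util.Arrays.fill would not achieve the same, since all entries would then refer to one and the same object, and training one model would change all of them. The following plain-Java sketch shows the idea: it allocates an array of the template's runtime class via java.lang.reflect.Array and fills it with independent copies. Since Java offers no general public clone for arbitrary objects, the sketch takes an explicit copy function; this parameter and the class CreateArraySketch are our own assumptions, not the Jstacs signature.

```java
import java.lang.reflect.Array;
import java.util.function.UnaryOperator;

/** Hypothetical helper: an array whose entries are independent copies of one template. */
final class CreateArraySketch {

    @SuppressWarnings("unchecked")
    static <T> T[] createArrayOf(T template, int n, UnaryOperator<T> copier) {
        // component type is the template's runtime class, so copies must be of that class as well
        T[] result = (T[]) Array.newInstance(template.getClass(), n);
        for (int i = 0; i < n; i++) {
            result[i] = copier.apply(template); // a fresh copy per entry, not a shared reference
        }
        return result;
    }
}
```

For example, createArrayOf( new StringBuilder( "x" ), 3, sb -> new StringBuilder( sb ) ) returns three distinct StringBuilder objects, so appending to one of them leaves the others unchanged.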


The third use case is to cast an array. Even if all elements of the array are of the same class, the component type of the array might be different (some superclass). A simple cast will fail in those cases. However, ArrayHandler provides two methods for casting arrays. Here, we present the more important one, which allows for specifying the array component type and performs the cast operation.


Object[] m = new Object[]{
    TrainableStatisticalModelFactory.createPWM( DNAAlphabetContainer.SINGLETON, 10, 4.0 ),
    TrainableStatisticalModelFactory.createHomogeneousMarkovModel( DNAAlphabetContainer.SINGLETON, 40.0, (byte)0 )
};
TrainableStatisticalModel[] sms = ArrayHandler.cast( TrainableStatisticalModel.class, m );
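Why the simple cast fails can be reproduced with standard Java: an array created as new Object[]{ ... } has the runtime type Object[], so casting it to a more specific array type throws a ClassCastException, even if every element would fit. The sketch below, with the hypothetical class ArrayCastSketch, instead creates a new array of the requested component type and copies the elements with System.arraycopy, which checks each element and throws an ArrayStoreException for an incompatible one. We do not claim that ArrayHandler.cast is implemented this way.

```java
import java.lang.reflect.Array;

/** Hypothetical helper: casts an array by copying its elements into an array of the requested component type. */
final class ArrayCastSketch {

    @SuppressWarnings("unchecked")
    static <T> T[] cast(Class<T> componentType, Object[] src) {
        // a new array whose runtime type really is T[]
        T[] result = (T[]) Array.newInstance(componentType, src.length);
        // checks every element; throws ArrayStoreException if one is not an instance of T
        System.arraycopy(src, 0, result, 0, src.length);
        return result;
    }
}
```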


ToolBox

The class ToolBox contains several static methods for recurring tasks. For example, you can compute the maximum of an array of doubles

double max = ToolBox.max( values );

or the sum of the values

double sum = ToolBox.sum( values );

or you can obtain the index of the first maximum value in the provided array

int maxIndex = ToolBox.getMaxIndex( values );
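For illustration, straightforward plain-Java versions of these three methods might look as follows. ToolBoxSketch is a hypothetical class; the text above does not say how the Jstacs methods treat empty arrays or NaN entries, so the sketch simply assumes a non-empty array without NaN values. The strict comparison in getMaxIndex keeps the first of several equal maxima, as described for ToolBox.getMaxIndex.

```java
/** Hypothetical plain-Java counterparts of max, sum, and getMaxIndex (assumes non-empty, NaN-free input). */
final class ToolBoxSketch {

    static double max(double[] v) {
        double m = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            m = Math.max(m, x);
        }
        return m;
    }

    static double sum(double[] v) {
        double s = 0.0;
        for (double x : v) {
            s += x;
        }
        return s;
    }

    static int getMaxIndex(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) { // strict '>' keeps the first maximum on ties
                best = i;
            }
        }
        return best;
    }
}
```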


Normalisation

Another frequently needed functionality is the handling of log-values. Assume that we have an array values containing a number of log-probabilities [math]l_i[/math]. What we want to compute is the logarithm of the sum of the original probabilities, i.e., [math]\log(\sum_i \exp(l_i))[/math]. The naive computation of this sum often results in numerical problems, especially if the log-probabilities are very small or very different, because [math]\exp(l_i)[/math] may underflow to zero or overflow. A more exact solution is provided by the static method getLogSum of the class Normalisation, which can be accessed by calling

double logSum = Normalisation.getLogSum( values );

Of course, this method does not only work for probabilities, but for general log-values.
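The usual way to compute this sum stably is to factor out the largest log-value [math]m = \max_i l_i[/math] and use [math]\log(\sum_i \exp(l_i)) = m + \log(\sum_i \exp(l_i - m))[/math], so that the largest exponentiated term is exactly 1 and cannot overflow. The following sketch, with the hypothetical class LogSumSketch, implements this log-sum-exp trick in plain Java; we assume, but do not know for certain, that Normalisation.getLogSum relies on the same idea.

```java
/** Hypothetical plain-Java sketch of the log-sum-exp trick. */
final class LogSumSketch {

    /** Computes log(sum_i exp(l[i])) stably by factoring out the maximum. */
    static double logSum(double[] l) {
        double m = Double.NEGATIVE_INFINITY;
        for (double x : l) {
            m = Math.max(m, x);
        }
        if (Double.isInfinite(m)) {
            return m; // all entries -inf (sum is 0) or some entry +inf: nothing to rescale
        }
        double s = 0.0;
        for (double x : l) {
            s += Math.exp(x - m); // every term is at most exp(0) = 1
        }
        return m + Math.log(s);
    }
}
```

For example, for [math]l = (1000, 1000)[/math] the naive Math.log( Math.exp( 1000 ) + Math.exp( 1000 ) ) overflows to infinity, whereas the sketch returns [math]1000 + \log 2[/math].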

Sometimes, we also want to normalize the given probabilities. That means, given the log-probabilities [math]l_i[/math], we want to obtain normalized probabilities [math]p_i = \frac{\exp(l_i)}{\sum_j \exp(l_j)}[/math]. This normalization is performed by calling

Normalisation.logSumNormalisation( values );

and after the call, the array values contains the normalized probabilities (not log-probabilities!).
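The same rescaling by the maximum can be used for the normalisation itself, because [math]p_i = \frac{\exp(l_i - m)}{\sum_j \exp(l_j - m)}[/math] holds for any constant [math]m[/math]. A plain-Java sketch that, like logSumNormalisation, overwrites the array in place might look as follows; LogNormSketch is a hypothetical class, and we assume that at least one entry is finite and none is positive infinity.

```java
/** Hypothetical plain-Java sketch of normalising log-values into probabilities in place. */
final class LogNormSketch {

    /** Replaces log-values l[i] in place by exp(l[i]) / sum_j exp(l[j]). */
    static void logSumNormalisation(double[] l) {
        double m = Double.NEGATIVE_INFINITY;
        for (double x : l) {
            m = Math.max(m, x);
        }
        double s = 0.0;
        for (int i = 0; i < l.length; i++) {
            l[i] = Math.exp(l[i] - m); // rescaled so that the largest value becomes 1
            s += l[i];
        }
        for (int i = 0; i < l.length; i++) {
            l[i] /= s; // probabilities, not log-probabilities
        }
    }
}
```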

Finally, we might want to do the same for probabilities, i.e., given probabilities [math]q_i[/math] in an array values, we want to compute [math]p_i = \frac{q_i}{\sum_j q_j}[/math] using

Normalisation.sumNormalisation( values );

A typical application of the last two methods is the computation of conditional probabilities from (log) joint probabilities by dividing by a marginal probability.
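As a small worked example, assume the joint probabilities [math]P(x, c_1) = 0.02[/math] and [math]P(x, c_2) = 0.06[/math] of an observation [math]x[/math] and two classes. Their sum is the marginal [math]P(x) = 0.08[/math], so normalising the array yields the conditional probabilities [math]P(c_1 \mid x) = 0.25[/math] and [math]P(c_2 \mid x) = 0.75[/math]. The plain-Java sketch below mirrors what sumNormalisation is described to do; SumNormSketch is a hypothetical class and assumes a positive sum.

```java
/** Hypothetical plain-Java sketch of normalising probabilities in place. */
final class SumNormSketch {

    /** Replaces q[i] in place by q[i] / sum_j q[j]; assumes the sum is positive. */
    static void sumNormalisation(double[] q) {
        double s = 0.0;
        for (double x : q) {
            s += x; // for joint probabilities P(x, c) over all classes c, s is the marginal P(x)
        }
        for (int i = 0; i < q.length; i++) {
            q[i] /= s;
        }
    }
}
```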

Goodies

The class SafeOutputStream provides a simple way to switch between writing the output of a program to standard out, writing it to a file, or suppressing it completely. This class is basically a wrapper around another output stream, where the wrapped stream may also be null. You can create a SafeOutputStream writing to standard out with

OutputStream stream = SafeOutputStream.getSafeOutputStream( System.out );

If you passed null to the factory method instead, output would be suppressed, while code using this SafeOutputStream would require no modifications.
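The idea of a null-tolerant wrapper can be sketched with the standard java.io classes: the wrapper forwards all calls to the wrapped stream if there is one and silently discards them otherwise. NullTolerantStream is a hypothetical class, not the Jstacs SafeOutputStream; in particular, the choice never to close System.out or System.err is ours. Since Java 11, OutputStream.nullOutputStream() offers a ready-made stream that discards all output.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical null-tolerant wrapper (not the Jstacs SafeOutputStream): writes to a null target are discarded. */
final class NullTolerantStream extends OutputStream {

    private final OutputStream target; // may be null, which suppresses all output

    NullTolerantStream(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        if (target != null) {
            target.write(b);
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (target != null) {
            target.write(b, off, len);
        }
    }

    @Override
    public void flush() throws IOException {
        if (target != null) {
            target.flush();
        }
    }

    @Override
    public void close() throws IOException {
        if (target == null) {
            return;
        }
        if (target == System.out || target == System.err) {
            target.flush(); // our choice: never close the JVM-wide standard streams
        } else {
            target.close();
        }
    }
}
```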

Finally, the class SubclassFinder can be used to search for all subclasses of a given class in a specified package and its sub-packages. For example, if we want to find all concrete sub-classes of TrainableStatisticalModel, i.e., classes that are not abstract and can be instantiated, in all sub-packages of de.jstacs, we call

LinkedList<Class<? extends TrainableStatisticalModel>> list = SubclassFinder.findInstantiableSubclasses( TrainableStatisticalModel.class, "de.jstacs" );

and obtain a linked list containing all such classes. Other methods in SubclassFinder allow for searching for general sub-types including interfaces and abstract classes, or for filtering the results by further required interfaces.
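The filtering criterion used in this example, a subtype of the given class that is neither an interface nor abstract, can be expressed with standard reflection. The following sketch applies it to an explicitly given list of candidate classes; scanning a package and its sub-packages for candidates, which SubclassFinder does in addition, is omitted, and we do not check for an accessible constructor. SubclassFilterSketch is a hypothetical class; whether SubclassFinder counts the given class itself as a result is not stated above, and our sketch keeps it if it is concrete.

```java
import java.lang.reflect.Modifier;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical sketch: keeps the concrete subtypes of base among explicitly given candidates. */
final class SubclassFilterSketch {

    static <T> LinkedList<Class<? extends T>> instantiableSubclasses(Class<T> base, List<Class<?>> candidates) {
        LinkedList<Class<? extends T>> result = new LinkedList<>();
        for (Class<?> c : candidates) {
            boolean isSubtype = base.isAssignableFrom(c);
            boolean isConcrete = !c.isInterface() && !Modifier.isAbstract(c.getModifiers());
            if (isSubtype && isConcrete) {
                result.add(c.asSubclass(base)); // safe, since isAssignableFrom was checked
            }
        }
        return result;
    }
}
```

For example, filtering Integer, Number, Comparable, Double, and String with the base class Number keeps only Integer and Double, since Number is abstract, Comparable is an interface, and String is no subtype of Number.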