de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous
Class MEManager

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
          extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.InhomogeneousDGTrainSM
              extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.MEManager
All Implemented Interfaces:
InstantiableFromParameterSet, SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable
Direct Known Subclasses:
FSMEManager

public abstract class MEManager
extends InhomogeneousDGTrainSM

This class is the superclass of all maximum entropy models.

Author:
Jens Keilwagen

Field Summary
protected  MEM[] factors
          The independent maximum entropy models.
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.InhomogeneousDGTrainSM
DEFAULT_STREAM, sostream
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
params, trained
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
MEManager(MEManagerParameterSet params)
          Creates a new MEManager from a given MEManagerParameterSet.
MEManager(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
 
Method Summary
 MEManager clone()
          Follows the conventions of Object's clone()-method.
 DataSet emitDataSet(int n, int... lengths)
          This method returns a DataSet object containing artificial sequence(s).
protected  MEM[] getFactors(ArrayList<int[]> list, boolean reduce, ConstraintManager.Decomposition decomposition)
          This method returns an array of independent maximum entropy models parsed from the given constraints.
protected  MEM[] getFactors(String constraints, boolean reduce, ConstraintManager.Decomposition decomposition)
          This method returns an array of independent maximum entropy models parsed from the given constraints.
protected  StringBuffer getFurtherModelInfos()
          Returns further model information as a StringBuffer.
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
 String getStructure()
          Returns a String representation of the underlying graph.
protected  void setFurtherModelInfos(StringBuffer xml)
          This method replaces the internal model information with that from a StringBuffer.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
protected  void trainFactors(DataSet data, double[] weights)
          This method trains the internal MEM array, i.e., it optimizes the parameters of the underlying MEMConstraints.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.InhomogeneousDGTrainSM
check, set, setOutputStream
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
fromXML, getCurrentParameterSet, getDescription, getESS, getXMLTag, isInitialized, toXML
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel
train
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
getInstanceName
 

Field Detail

factors

protected MEM[] factors
The independent maximum entropy models.

Constructor Detail

MEManager

public MEManager(MEManagerParameterSet params)
          throws CloneNotSupportedException,
                 IllegalArgumentException,
                 NonParsableException
Creates a new MEManager from a given MEManagerParameterSet.

Parameters:
params - the given parameter set
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
IllegalArgumentException - if the parameter set is not instantiated
NonParsableException - if the parameter set is not parsable
See Also:
InhomogeneousDGTrainSM.InhomogeneousDGTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.parameters.IDGTrainSMParameterSet)

MEManager

public MEManager(StringBuffer stringBuff)
          throws NonParsableException
The standard constructor for the interface Storable. Creates a new MEManager out of its XML representation.

Parameters:
stringBuff - the XML representation as StringBuffer
Throws:
NonParsableException - if the MEManager could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable, InhomogeneousDGTrainSM.InhomogeneousDGTrainSM(StringBuffer)
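
The Storable convention described above (an instance writes itself to XML and a StringBuffer constructor rebuilds it) can be illustrated with a minimal, self-contained sketch. The class StorableSketch, its XML tags, and the use of IllegalArgumentException in place of NonParsableException are hypothetical stand-ins chosen for illustration; they are not part of Jstacs.

```java
// Hypothetical stand-in for a Storable model; not part of Jstacs.
final class StorableSketch {
    private static final String OPEN = "<sketch><length>";
    private static final String CLOSE = "</length></sketch>";

    private final int length;

    StorableSketch(int length) {
        this.length = length;
    }

    // Counterpart of the StringBuffer constructor: rebuilds the object from its XML.
    // IllegalArgumentException stands in for NonParsableException here.
    StorableSketch(StringBuffer xml) {
        String s = xml.toString().trim();
        if (!s.startsWith(OPEN) || !s.endsWith(CLOSE)
                || s.length() < OPEN.length() + CLOSE.length()) {
            throw new IllegalArgumentException("could not parse the XML representation");
        }
        int parsed;
        try {
            parsed = Integer.parseInt(s.substring(OPEN.length(), s.length() - CLOSE.length()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("could not parse the XML representation", e);
        }
        this.length = parsed;
    }

    // Writes the state that the StringBuffer constructor expects.
    StringBuffer toXML() {
        return new StringBuffer(OPEN).append(length).append(CLOSE);
    }

    int getLength() {
        return length;
    }
}
```

Calling new StorableSketch(model.toXML()) on an existing instance yields an equivalent instance, which is the round trip that the StringBuffer constructor is meant to support.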
Method Detail

clone

public MEManager clone()
                throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class InhomogeneousDGTrainSM
Returns:
an object that is a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone() method should work as follows:
1. Object o = (X)super.clone();
2. all additional member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
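
The three-step clone convention listed above can be sketched as follows. CloneSketch is a hypothetical class playing the role of X; its fields are invented for illustration and do not mirror the members of any Jstacs class.

```java
// Hypothetical class X used only to illustrate the convention; not part of Jstacs.
class CloneSketch implements Cloneable {
    private final String alphabet; // immutable, so sharing the reference is safe
    private double[] params;       // mutable, so it has to be deeply copied

    CloneSketch(String alphabet, double[] params) {
        this.alphabet = alphabet;
        this.params = params.clone();
    }

    @Override
    public CloneSketch clone() throws CloneNotSupportedException {
        // 1. let the superclass create a shallow copy
        CloneSketch copy = (CloneSketch) super.clone();
        // 2. deeply copy every member that is not a simple data type
        copy.params = params.clone();
        // 3. return the copy
        return copy;
    }

    void setParam(int index, double value) {
        params[index] = value;
    }

    double getParam(int index) {
        return params[index];
    }

    String getAlphabet() {
        return alphabet;
    }
}
```

Without step 2 the original and the copy would share one params array, so changing a parameter of the copy would silently change the original as well.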

emitDataSet

public DataSet emitDataSet(int n,
                           int... lengths)
                    throws NotTrainedException,
                           Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial sequence(s).

There are two different possibilities to create a data set for a model with length 0 (homogeneous models).
  1. emitDataSet( int n, int l ) should return a data set with n sequences of length l.
  2. emitDataSet( int n, int[] l ) should return a data set with n sequences whose lengths correspond to the entries of the given array l.

There are two different possibilities to create a data set for a model with length greater than 0 (inhomogeneous models).
emitDataSet( int n ) and emitDataSet( int n, null ) should both return a data set with n sequences whose length equals the length of the model (SequenceScore.getLength()).

The standard implementation throws an Exception.

Specified by:
emitDataSet in interface StatisticalModel
Overrides:
emitDataSet in class AbstractTrainableStatisticalModel
Parameters:
n - the number of sequences that should be contained in the returned data set
lengths - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
a DataSet containing the artificial sequence(s)
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the emission did not succeed
See Also:
DataSet
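
The calling conventions for the lengths parameter can be summarized in a small sketch. The helper resolveLengths is hypothetical and not part of Jstacs; it only computes which sequence lengths the rules above would imply for a model of a given length, and it assumes that any other combination of arguments is rejected.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the documented conventions; not part of Jstacs.
final class EmitLengthsSketch {
    // Returns the length of each of the n sequences that would be emitted.
    static int[] resolveLengths(int modelLength, int n, int... lengths) {
        int[] result = new int[n];
        if (modelLength > 0) {
            // inhomogeneous model: lengths has to be null or an array of size 0
            if (lengths != null && lengths.length > 0) {
                throw new IllegalArgumentException(
                        "an inhomogeneous model emits sequences of its own length");
            }
            Arrays.fill(result, modelLength);
        } else if (lengths != null && lengths.length == 1) {
            // homogeneous model, one length for all sequences
            Arrays.fill(result, lengths[0]);
        } else if (lengths != null && lengths.length == n) {
            // homogeneous model, one length per sequence
            System.arraycopy(lengths, 0, result, 0, n);
        } else {
            throw new IllegalArgumentException(
                    "a homogeneous model needs one length or n lengths");
        }
        return result;
    }
}
```

Because lengths is a varargs parameter, the call resolveLengths(5, 3) passes an empty array, while resolveLengths(5, 3, (int[]) null) passes null; the sketch treats both the same way for an inhomogeneous model.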

getLogPriorTerm

public double getLogPriorTerm()
                       throws Exception
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
                     throws NotTrainedException,
                            Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

It extends the method StatisticalModel.getLogProbFor(Sequence, int) to cover models that are, for example, homogeneous, so that the length of the sequences whose probability should be returned is not fixed. In addition to the start position, the end position of the part of the given sequence is given, and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled by the model (e.g. startpos > endpos, endpos > sequence.length, ...)
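
The meaning of the inclusive range from startpos to endpos can be illustrated with a simplified sketch. LogProbSketch is hypothetical: it uses a plain table of independent per-position probabilities and integer-encoded sequences instead of the Jstacs Sequence class, and the exact range checks are assumptions rather than the checks performed by MEManager.

```java
// Hypothetical, simplified model with independent positions; not part of Jstacs.
final class LogProbSketch {
    private final double[][] probs; // probs[position][symbol]

    LogProbSketch(double[][] probs) {
        this.probs = probs;
    }

    // Log probability of sequence[startpos..endpos], with endpos inclusive.
    double getLogProbFor(int[] sequence, int startpos, int endpos) {
        if (startpos < 0 || startpos > endpos || endpos >= sequence.length) {
            throw new IllegalArgumentException("invalid range");
        }
        // a model of fixed length can only score parts of exactly that length
        if (endpos - startpos + 1 != probs.length) {
            throw new IllegalArgumentException("range does not match the model length");
        }
        double logProb = 0;
        for (int i = 0; i < probs.length; i++) {
            logProb += Math.log(probs[i][sequence[startpos + i]]);
        }
        return logProb;
    }
}
```

For a model of length 2, the call getLogProbFor(sequence, 1, 2) scores exactly the symbols at positions 1 and 2 of the sequence.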

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().

Returns:
the numerical characteristics of the current instance

getStructure

public String getStructure()
                    throws NotTrainedException
Description copied from class: InhomogeneousDGTrainSM
Returns a String representation of the underlying graph.

Specified by:
getStructure in class InhomogeneousDGTrainSM
Returns:
a String representation of the underlying graph
Throws:
NotTrainedException - if the structure is not set; this can only be the case if the model is not trained

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Specified by:
toString in interface SequenceScore
Overrides:
toString in class DiscreteGraphicalTrainSM
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

getFactors

protected MEM[] getFactors(String constraints,
                           boolean reduce,
                           ConstraintManager.Decomposition decomposition)
This method returns an array of independent maximum entropy models parsed from the given constraints.

Parameters:
constraints - the constraints used to build the maximum entropy model
reduce - a switch indicating whether redundant constraints should be removed
decomposition - a switch that determines how the complete model is decomposed, if possible
Returns:
an array of independent maximum entropy models

getFactors

protected MEM[] getFactors(ArrayList<int[]> list,
                           boolean reduce,
                           ConstraintManager.Decomposition decomposition)
This method returns an array of independent maximum entropy models parsed from the given constraints.

Parameters:
list - a list of position arrays that define the constraints
reduce - a switch indicating whether redundant constraints should be removed
decomposition - a switch that determines how the complete model is decomposed, if possible
Returns:
an array of independent maximum entropy models
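
One plausible way to split a list of position arrays into independent parts is to group constraints that share positions, directly or transitively. The sketch below does this with a union-find structure. It is an illustration under that assumption only: DecompositionSketch and decompose are hypothetical names, every constraint is assumed to be non-empty with positions in the range 0 to length - 1, and the strategies actually offered by ConstraintManager.Decomposition may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a decomposition into unconnected parts; not part of Jstacs.
final class DecompositionSketch {
    // Finds the representative of the set that contains position i.
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }

    // Groups the constraints so that constraints in different groups share no position.
    static List<List<int[]>> decompose(List<int[]> constraints, int length) {
        int[] parent = new int[length];
        for (int i = 0; i < length; i++) {
            parent[i] = i;
        }
        // positions that occur in the same constraint belong to the same part
        for (int[] c : constraints) {
            for (int k = 1; k < c.length; k++) {
                parent[find(parent, c[k])] = find(parent, c[0]);
            }
        }
        // collect the constraints of each part, keeping the order of first appearance
        Map<Integer, List<int[]>> groups = new LinkedHashMap<>();
        for (int[] c : constraints) {
            groups.computeIfAbsent(find(parent, c[0]), r -> new ArrayList<>()).add(c);
        }
        return new ArrayList<>(groups.values());
    }
}
```

For the constraints {0}, {1,2}, {2,3}, {4} on a model of length 5, the sketch yields three groups: {0} alone, {1,2} together with {2,3} because they share position 2, and {4} alone. Each group could then be handled as a separate model.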

getFurtherModelInfos

protected StringBuffer getFurtherModelInfos()
Description copied from class: DiscreteGraphicalTrainSM
Returns further model information as a StringBuffer.

Specified by:
getFurtherModelInfos in class DiscreteGraphicalTrainSM
Returns:
further model information like parameters of the distribution etc. in XML format
See Also:
DiscreteGraphicalTrainSM.toXML()

trainFactors

protected void trainFactors(DataSet data,
                            double[] weights)
                     throws Exception
This method trains the internal MEM array, i.e., it optimizes the parameters of the underlying MEMConstraints.

Parameters:
data - the data
weights - the weights for the data, can be null
Throws:
Exception - if some error occurs in the training process
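
The role of the optional weights array can be illustrated with a sketch that accumulates weighted symbol counts per position. It assumes that a null array means weight 1 for every sequence, and it does not reproduce the parameter optimization performed during training. WeightedCountsSketch and its integer-encoded data are hypothetical and not part of Jstacs.

```java
// Hypothetical illustration of weighted data; not part of Jstacs.
final class WeightedCountsSketch {
    // Returns counts[position][symbol]; weights may be null (assumed: weight 1 each).
    static double[][] count(int[][] data, double[] weights, int alphabetSize) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("one weight per sequence is required");
        }
        int length = data.length == 0 ? 0 : data[0].length;
        double[][] counts = new double[length][alphabetSize];
        for (int n = 0; n < data.length; n++) {
            double w = weights == null ? 1.0 : weights[n];
            for (int p = 0; p < length; p++) {
                counts[p][data[n][p]] += w;
            }
        }
        return counts;
    }
}
```

With weights set to null the result equals plain counting, so callers that have unweighted data do not need to allocate an array of ones.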

setFurtherModelInfos

protected void setFurtherModelInfos(StringBuffer xml)
                             throws NonParsableException
Description copied from class: DiscreteGraphicalTrainSM
This method replaces the internal model information with that from a StringBuffer.

Specified by:
setFurtherModelInfos in class DiscreteGraphicalTrainSM
Parameters:
xml - contains the model information like parameters of the distribution etc. in XML format
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also:
DiscreteGraphicalTrainSM.fromXML(StringBuffer)