public abstract class AbstractTrainableStatisticalModel extends Object implements Cloneable, Storable, TrainableStatisticalModel
To write a StringBuffer to a file or read one from a file (fromXML(StringBuffer), Storable.toXML()), you can use the class FileManager.
See Also:
FileManager
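The round trip between a StringBuffer and a file can be sketched as follows. This page does not show FileManager's method signatures, so the sketch uses only java.nio.file; the class XmlFileRoundTrip and its methods are hypothetical stand-ins, not part of the library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; it only mimics what a FileManager-style utility would do.
class XmlFileRoundTrip {

    // Writes the content of the buffer to the file as UTF-8.
    static void write(Path file, StringBuffer xml) {
        try {
            Files.write(file, xml.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back into a new StringBuffer.
    static StringBuffer read(Path file) {
        try {
            return new StringBuffer(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file, reads it back, and deletes the file.
    static String roundTrip(String xmlText) {
        try {
            Path tmp = Files.createTempFile("model", ".xml");
            try {
                write(tmp, new StringBuffer(xmlText));
                return read(tmp).toString();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<model><length>12</length></model>";
        System.out.println(xml.equals(roundTrip(xml)));
    }
}
```

In real code the StringBuffer would come from Storable.toXML() and be handed to the StringBuffer constructor of the model after reading.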
Modifier and Type | Field and Description |
---|---|
protected AlphabetContainer | alphabets: The underlying alphabets. |
protected int | length: The length of the sequences the model can classify. |
Constructor and Description |
---|
AbstractTrainableStatisticalModel(AlphabetContainer alphabets, int length) |
AbstractTrainableStatisticalModel(StringBuffer stringBuff): The standard constructor for the interface Storable. |
Modifier and Type | Method and Description |
---|---|
protected void | check(Sequence sequence, int startpos, int endpos): This method checks all parameters before a probability can be computed for a sequence. |
AbstractTrainableStatisticalModel | clone(): Follows the conventions of Object's clone()-method. |
DataSet | emitDataSet(int numberOfSequences, int... seqLength): This method returns a DataSet object containing artificial sequence(s). |
protected abstract void | fromXML(StringBuffer xml): This method should only be used by the constructor that works on a StringBuffer. |
AlphabetContainer | getAlphabetContainer(): Returns the container of alphabets that were used when constructing the instance. |
ResultSet | getCharacteristics(): Returns some information characterizing or describing the current instance. |
int | getLength(): Returns the length of sequences this instance can score. |
double | getLogProbFor(Sequence sequence): Returns the logarithm of the probability of the given sequence given the model. |
double | getLogProbFor(Sequence sequence, int startpos): Returns the logarithm of the probability of (a part of) the given sequence given the model. |
double[] | getLogScoreFor(DataSet data): This method computes the logarithm of the scores of all sequences in the given data set. |
void | getLogScoreFor(DataSet data, double[] res): This method computes and stores the logarithm of the scores for any sequence in the data set in the given double-array. |
double | getLogScoreFor(Sequence sequence): Returns the logarithmic score for the given Sequence. |
double | getLogScoreFor(Sequence sequence, int startpos) |
double | getLogScoreFor(Sequence sequence, int startpos, int endpos) |
byte | getMaximalMarkovOrder(): This method returns the maximal used Markov order, if possible. |
String | toString(): Should give a simple representation (text) of the model as String. |
void | train(DataSet data): Trains the TrainableStatisticalModel object given the data as DataSet. |
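Methods inherited from class Object: equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface TrainableStatisticalModel: train
Methods inherited from interface StatisticalModel: getLogPriorTerm, getLogProbFor
Methods inherited from interface SequenceScore: getInstanceName, getNumericalCharacteristics, isInitialized, toString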
protected int length
protected AlphabetContainer alphabets
public AbstractTrainableStatisticalModel(AlphabetContainer alphabets, int length)
Sets the length of the model to length and the AlphabetContainer to alphabets.
length gives the length of the sequences the model can classify. Models that can only classify sequences of a defined length are, e.g., PWMs or inhomogeneous Markov models. If the model can classify sequences of arbitrary length, as homogeneous Markov models can, this parameter must be set to 0 (zero).
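The length convention can be illustrated with a small check. This is a minimal sketch under the assumption that a fixed-length model accepts only sequences of exactly that length when scoring a whole sequence; LengthConvention and canScore are hypothetical names, not library API.

```java
// Hypothetical helper illustrating the convention "0 means arbitrary length".
class LengthConvention {

    // Returns true if a model with the given length could score a whole
    // sequence of the given length (assumption: exact match for fixed lengths).
    static boolean canScore(int modelLength, int sequenceLength) {
        if (modelLength < 0 || sequenceLength < 0) {
            throw new IllegalArgumentException("lengths must be non-negative");
        }
        return modelLength == 0 || modelLength == sequenceLength;
    }

    public static void main(String[] args) {
        System.out.println(canScore(0, 30));  // homogeneous model, any length
        System.out.println(canScore(12, 12)); // fixed length, matching sequence
        System.out.println(canScore(12, 30)); // fixed length, sequence too long
    }
}
```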
length and alphabets define the type of data that can be modeled; therefore both have to be checked before any evaluation (e.g. getLogScoreFor(Sequence)).
Parameters:
alphabets - the alphabets in an AlphabetContainer
length - the length of the sequences a model can classify, 0 for arbitrary length
public AbstractTrainableStatisticalModel(StringBuffer stringBuff) throws NonParsableException
The standard constructor for the interface Storable. Creates a new AbstractTrainableStatisticalModel out of a StringBuffer.
Parameters:
stringBuff - the StringBuffer to be parsed
Throws:
NonParsableException - if the StringBuffer could not be parsed
public AbstractTrainableStatisticalModel clone() throws CloneNotSupportedException
Follows the conventions of Object's clone()-method.
Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class Object
Returns:
a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone()-method should work as follows:
1. X o = (X) super.clone();
2. all members of o defined by X that are not of simple data types like int, double, ... have to be deeply copied
3. return o
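The three steps can be sketched in a self-contained example. ToyBaseModel stands in for AbstractTrainableStatisticalModel and ToyPwm for the class X; both are hypothetical, and a String replaces the AlphabetContainer so that the example runs without the library.

```java
// Hypothetical stand-in for the abstract base class; only cloning is modelled.
abstract class ToyBaseModel implements Cloneable {
    protected final String alphabets; // immutable, so clones may share it
    protected final int length;

    ToyBaseModel(String alphabets, int length) {
        this.alphabets = alphabets;
        this.length = length;
    }

    @Override
    public ToyBaseModel clone() throws CloneNotSupportedException {
        return (ToyBaseModel) super.clone(); // shallow, field-by-field copy
    }
}

// Hypothetical subclass playing the role of X.
class ToyPwm extends ToyBaseModel {
    double[] weights; // mutable member defined by the subclass

    ToyPwm(String alphabets, int length) {
        super(alphabets, length);
        this.weights = new double[length];
    }

    @Override
    public ToyPwm clone() throws CloneNotSupportedException {
        ToyPwm o = (ToyPwm) super.clone(); // step 1: let the superclass copy
        o.weights = weights.clone();       // step 2: deep copy mutable members
        return o;                          // step 3: return the copy
    }
}

class CloneSketch {
    // Returns true if changing the clone leaves the original untouched
    // while the immutable member is still shared.
    static boolean cloneIsIndependent() {
        try {
            ToyPwm original = new ToyPwm("ACGT", 3);
            ToyPwm copy = original.clone();
            copy.weights[0] = 1.0;
            return original.weights[0] == 0.0 && original.alphabets == copy.alphabets;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cloneIsIndependent());
    }
}
```

Without step 2 both objects would share one weights array, and a change to the clone would silently alter the original.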
Throws:
CloneNotSupportedException - if something went wrong while cloning
public void train(DataSet data) throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet. The result of the series train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.
Specified by:
train in interface TrainableStatisticalModel
Parameters:
data - the given sequences as DataSet
Throws:
Exception - if the training did not succeed
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
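The requirement that a second call to train replaces the result of the first can be illustrated with a toy model. ToyFrequencyModel is hypothetical and works on plain strings instead of a DataSet; it is a sketch of the contract, not of any model in the library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: relative symbol frequencies over plain strings.
class ToyFrequencyModel {
    private final Map<Character, Integer> counts = new HashMap<>();
    private int total = 0;

    // Discards everything learned before, then estimates from the new data only.
    void train(String[] data) {
        counts.clear();
        total = 0;
        for (String sequence : data) {
            for (char symbol : sequence.toCharArray()) {
                counts.merge(symbol, 1, Integer::sum);
                total++;
            }
        }
    }

    // Relative frequency of the symbol in the data of the last training call.
    double probabilityOf(char symbol) {
        if (total == 0) {
            throw new IllegalStateException("model is not trained yet");
        }
        return counts.getOrDefault(symbol, 0) / (double) total;
    }

    public static void main(String[] args) {
        ToyFrequencyModel model = new ToyFrequencyModel();
        model.train(new String[] {"AAAA"}); // data1
        model.train(new String[] {"CCGG"}); // data2 replaces data1
        System.out.println(model.probabilityOf('A'));
        System.out.println(model.probabilityOf('C'));
    }
}
```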
public double getLogProbFor(Sequence sequence) throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of the given sequence given the model. length and the alphabets define the type of data that can be modeled; therefore both have to be checked.
Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence for which the logarithm of the probability/the value of the density function should be returned
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
StatisticalModel.getLogProbFor(Sequence, int, int)
public double getLogProbFor(Sequence sequence, int startpos) throws Exception
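Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. The part to be evaluated begins at startpos. For example, if the fixed length is 12, the length of the given sequence is 30, and startpos=15, the logarithm of the probability of the part from position 15 to 26 (inclusive) given the model should be returned.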
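The arithmetic of this example can be written down as a small helper. WindowSketch and window are hypothetical names; the sketch assumes positions are counted from 0 and only covers models with a fixed length.

```java
// Hypothetical helper computing the evaluated part for a fixed-length model.
class WindowSketch {

    // Returns {startpos, endpos}, both inclusive, of the part that a model
    // of the given fixed length would evaluate.
    static int[] window(int modelLength, int sequenceLength, int startpos) {
        if (modelLength <= 0) {
            throw new IllegalArgumentException("a fixed model length is required");
        }
        int endpos = startpos + modelLength - 1;
        if (startpos < 0 || endpos >= sequenceLength) {
            throw new IllegalArgumentException("the part does not fit into the sequence");
        }
        return new int[] {startpos, endpos};
    }

    public static void main(String[] args) {
        int[] w = window(12, 30, 15);
        System.out.println(w[0] + " to " + w[1]); // prints "15 to 26"
    }
}
```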
length and the alphabets define the type of data that can be modeled; therefore both have to be checked.
Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
StatisticalModel.getLogProbFor(Sequence, int, int)
protected void check(Sequence sequence, int startpos, int endpos) throws NotTrainedException, IllegalArgumentException
This method checks all parameters before a probability can be computed for a sequence. It is intended for use in StatisticalModel.getLogProbFor(Sequence, int, int).
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Throws:
IllegalArgumentException - if the sequence could not be handled by the model (e.g. startpos > , endpos > sequence.length, ...)
NotTrainedException - if the model is not trained yet
public double getLogScoreFor(Sequence sequence)
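Description copied from interface: SequenceScore
Returns the logarithmic score for the given Sequence.
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
sequence - the sequence
public double getLogScoreFor(Sequence sequence, int startpos)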
Description copied from interface: SequenceScore
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
sequence - the Sequence
startpos - the start position in the Sequence
See Also:
Sequence
public double getLogScoreFor(Sequence sequence, int startpos, int endpos)
Description copied from interface: SequenceScore
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
sequence - the Sequence
startpos - the start position in the Sequence
endpos - the end position (inclusive) in the Sequence
See Also:
Sequence
public double[] getLogScoreFor(DataSet data) throws Exception
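Description copied from interface: SequenceScore
This method computes the logarithm of the scores of all sequences in the given data set. The score of each sequence corresponds to SequenceScore.getLogScoreFor(Sequence).
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
data - the data set of sequences
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence)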
public void getLogScoreFor(DataSet data, double[] res) throws Exception
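Description copied from interface: SequenceScore
This method computes and stores the logarithm of the scores for any sequence in the data set in the given double-array. The score of each sequence corresponds to SequenceScore.getLogScoreFor(Sequence).
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
data - the data set of sequences
res - the array for the results; it has to have length data.getNumberOfElements() (which returns the number of sequences in the data set)
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence), SequenceScore.getLogScoreFor(DataSet)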
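How the array-returning and the array-filling variant relate can be sketched with plain strings in place of a DataSet. BatchScoringSketch and its uniform toy score are hypothetical; the point is only the length requirement on res and the per-sequence independence of the scores.

```java
// Hypothetical sketch of the two batch-scoring variants.
class BatchScoringSketch {

    // Toy score: log-probability under a uniform model over four symbols.
    static double logScore(String sequence) {
        return -sequence.length() * Math.log(4.0);
    }

    // Fills res with one score per sequence; res must have exactly one
    // slot per sequence, mirroring the documented length requirement.
    static void logScores(String[] data, double[] res) {
        if (res.length != data.length) {
            throw new IllegalArgumentException("res must have length " + data.length);
        }
        for (int i = 0; i < data.length; i++) {
            res[i] = logScore(data[i]);
        }
    }

    // Convenience variant that allocates the result array itself.
    static double[] logScores(String[] data) {
        double[] res = new double[data.length];
        logScores(data, res);
        return res;
    }

    public static void main(String[] args) {
        double[] scores = logScores(new String[] {"AC", "ACGT"});
        System.out.println(scores[0] + " " + scores[1]);
    }
}
```

The filling variant lets a caller reuse one array across many data sets of the same size instead of allocating a new one per call.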
public DataSet emitDataSet(int numberOfSequences, int... seqLength) throws NotTrainedException, Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial sequence(s).
emitDataSet( int n, int l ) should return a data set with n sequences of length l.
emitDataSet( int n, int[] l ) should return a data set with n sequences which have a sequence length corresponding to the entry in the given array l.
emitDataSet( int n ) and emitDataSet( int n, null ) should return a data set with n sequences of the length of the model (SequenceScore.getLength()).
Specified by:
emitDataSet in interface StatisticalModel
Parameters:
numberOfSequences - the number of sequences that should be contained in the returned data set
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0
Returns:
a DataSet containing the artificial sequence(s)
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the emission did not succeed
See Also:
DataSet
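The calling conventions for the varargs parameter can be sketched as follows. EmitSketch is hypothetical: it draws uniform random symbols and returns strings instead of a DataSet, and it assumes that a model length of 0 marks a homogeneous model.

```java
import java.util.Random;

// Hypothetical sketch of the emitDataSet calling conventions.
class EmitSketch {
    private static final char[] SYMBOLS = {'A', 'C', 'G', 'T'};

    // modelLength == 0 stands for a homogeneous model (lengths come from
    // seqLength); any other value stands for a fixed-length model.
    static String[] emit(Random rng, int modelLength, int numberOfSequences, int... seqLength) {
        boolean noLengthGiven = seqLength == null || seqLength.length == 0;
        String[] data = new String[numberOfSequences];
        for (int i = 0; i < numberOfSequences; i++) {
            int len;
            if (modelLength != 0) {
                if (!noLengthGiven) {
                    throw new IllegalArgumentException("a fixed-length model takes no seqLength");
                }
                len = modelLength;
            } else if (!noLengthGiven && seqLength.length == 1) {
                len = seqLength[0]; // one length for all sequences
            } else if (!noLengthGiven && seqLength.length == numberOfSequences) {
                len = seqLength[i]; // one length per sequence
            } else {
                throw new IllegalArgumentException("seqLength needs 1 or numberOfSequences entries");
            }
            StringBuilder sb = new StringBuilder(len);
            for (int j = 0; j < len; j++) {
                sb.append(SYMBOLS[rng.nextInt(SYMBOLS.length)]);
            }
            data[i] = sb.toString();
        }
        return data;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        System.out.println(emit(rng, 0, 3, 5).length);       // three sequences of length 5
        System.out.println(emit(rng, 0, 2, 3, 7)[1].length()); // second sequence has length 7
        System.out.println(emit(rng, 4, 2)[0].length());     // fixed-length model, length 4
    }
}
```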
public final AlphabetContainer getAlphabetContainer()
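Description copied from interface: SequenceScore
Returns the container of alphabets that were used when constructing the instance.
Specified by:
getAlphabetContainer in interface SequenceScore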
public final int getLength()
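Description copied from interface: SequenceScore
Returns the length of sequences this instance can score.
Specified by:
getLength in interface SequenceScore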
public byte getMaximalMarkovOrder() throws UnsupportedOperationException
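Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Throws:
UnsupportedOperationException - if the model can't give a proper answer
public ResultSet getCharacteristics() throws Exception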
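Description copied from interface: SequenceScore
Returns some information characterizing or describing the current instance.
Specified by:
getCharacteristics in interface SequenceScore
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult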
protected abstract void fromXML(StringBuffer xml) throws NonParsableException
This method should only be used by the constructor that works on a StringBuffer. It is the counterpart of Storable.toXML().
Parameters:
xml - the XML representation of the model
Throws:
NonParsableException - if the StringBuffer is not parsable or the representation is conflicting
See Also:
AbstractTrainableStatisticalModel(StringBuffer)
public final String toString()
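Description copied from interface: TrainableStatisticalModel
Should give a simple representation (text) of the model as String.
Specified by:
toString in interface TrainableStatisticalModel
Overrides:
toString in class Object
Returns:
a String representation of the model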