java.lang.Object
  de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
public abstract class AbstractTrainableStatisticalModel
Abstract class for a model for pattern recognition.
To write a StringBuffer to a file or read one from a file (see
fromXML(StringBuffer) and Storable.toXML()), you can use the class
FileManager.
See Also:
FileManager
| Field Summary | |
|---|---|
| protected AlphabetContainer | alphabets: The underlying alphabets. |
| protected int | length: The length of the sequences the model can classify. |
| Constructor Summary | |
|---|---|
| AbstractTrainableStatisticalModel(AlphabetContainer alphabets, int length) | Constructor that sets the length of the model to length and the AlphabetContainer to alphabets. |
| AbstractTrainableStatisticalModel(StringBuffer stringBuff) | The standard constructor for the interface Storable. |
| Method Summary | |
|---|---|
| protected void | check(Sequence sequence, int startpos, int endpos): This method checks all parameters before a probability can be computed for a sequence. |
| AbstractTrainableStatisticalModel | clone(): Follows the conventions of Object's clone()-method. |
| DataSet | emitDataSet(int numberOfSequences, int... seqLength): This method returns a DataSet object containing artificial sequence(s). |
| protected abstract void | fromXML(StringBuffer xml): This method should only be used by the constructor that works on a StringBuffer. |
| AlphabetContainer | getAlphabetContainer(): Returns the container of alphabets that were used when constructing the instance. |
| ResultSet | getCharacteristics(): Returns some information characterizing or describing the current instance. |
| int | getLength(): Returns the length of sequences this instance can score. |
| double | getLogProbFor(Sequence sequence): Returns the logarithm of the probability of the given sequence given the model. |
| double | getLogProbFor(Sequence sequence, int startpos): Returns the logarithm of the probability of (a part of) the given sequence given the model. |
| double[] | getLogScoreFor(DataSet data): This method computes the logarithm of the scores of all sequences in the given sample. |
| void | getLogScoreFor(DataSet data, double[] res): This method computes the logarithm of the scores of all sequences in the sample and stores them in the given double-array. |
| double | getLogScoreFor(Sequence sequence): Returns the logarithmic score for the given Sequence. |
| double | getLogScoreFor(Sequence sequence, int startpos): Returns the logarithmic score for the given Sequence beginning at position startpos. |
| double | getLogScoreFor(Sequence sequence, int startpos, int endpos): Returns the logarithmic score for the given Sequence from position startpos to position endpos (inclusive). |
| byte | getMaximalMarkovOrder(): This method returns the maximal used Markov order, if possible. |
| void | train(DataSet data): Trains the TrainableStatisticalModel object given the data as DataSet. |
| Methods inherited from class java.lang.Object |
|---|
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel |
|---|
toString, train |
| Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.StatisticalModel |
|---|
getLogPriorTerm, getLogProbFor |
| Methods inherited from interface de.jstacs.sequenceScores.SequenceScore |
|---|
getInstanceName, getNumericalCharacteristics, isInitialized |
| Methods inherited from interface de.jstacs.Storable |
|---|
toXML |
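The two DataSet overloads of getLogScoreFor listed in the Method Summary differ mainly in who provides the result array. The following is a minimal sketch of that relationship and does not use Jstacs classes: `ToyScore`, its `String`-based sequences, the constant per-symbol score, and the explicit length check are all hypothetical stand-ins chosen for illustration.

```java
import java.util.List;

// Hypothetical stand-in for a SequenceScore; not part of Jstacs.
final class ToyScore {
    // Toy per-sequence log score: every symbol contributes log(0.25).
    static double getLogScoreFor(String sequence) {
        return sequence.length() * Math.log(0.25);
    }

    // Allocating overload: creates the result array and delegates.
    static double[] getLogScoreFor(List<String> data) {
        double[] res = new double[data.size()];
        getLogScoreFor(data, res);
        return res;
    }

    // In-place overload: the caller supplies an array with one slot per sequence.
    // The explicit length check is an assumption of this sketch.
    static void getLogScoreFor(List<String> data, double[] res) {
        if (res.length != data.size()) {
            throw new IllegalArgumentException("res must have one entry per sequence");
        }
        for (int i = 0; i < data.size(); i++) {
            res[i] = getLogScoreFor(data.get(i));
        }
    }
}
```

Reusing one result array across repeated calls avoids a fresh allocation per call, which is a plausible reason for offering the in-place variant.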
| Field Detail |
|---|
protected int length
The length of the sequences the model can classify.
protected AlphabetContainer alphabets
The underlying alphabets.
| Constructor Detail |
|---|
public AbstractTrainableStatisticalModel(AlphabetContainer alphabets,
int length)
Constructor that sets the length of the model to length and
the AlphabetContainer to alphabets.
The parameter length gives the length of the sequences the
model can classify. Models that can only classify sequences of a defined
length are, e.g., PWMs or inhomogeneous Markov models. If the model can
classify sequences of arbitrary length, e.g. homogeneous Markov models,
this parameter must be set to 0 (zero).
The length and alphabets define the type of
data that can be modeled, and therefore both have to be checked before any
evaluation (e.g. getLogScoreFor(Sequence)).
Parameters:
alphabets - the alphabets in an AlphabetContainer
length - the length of the sequences a model can classify, 0 for
arbitrary length
public AbstractTrainableStatisticalModel(StringBuffer stringBuff)
throws NonParsableException
The standard constructor for the interface Storable.
Creates a new AbstractTrainableStatisticalModel out of a StringBuffer.
Parameters:
stringBuff - the StringBuffer to be parsed
Throws:
NonParsableException - if the StringBuffer could not be parsed
| Method Detail |
|---|
public AbstractTrainableStatisticalModel clone()
throws CloneNotSupportedException
Follows the conventions of Object's clone()-method.
Specified by:
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
Overrides:
clone in class Object
Returns:
a copy of the current AbstractTrainableStatisticalModel
(the member AlphabetContainer is not deeply cloned since
it is assumed to be immutable). The type of the returned object
is defined by the class X directly inherited from
AbstractTrainableStatisticalModel. Hence X's
clone()-method should work as follows:
1. X o = (X) super.clone();
2. all additional member variables of o defined by
X that are not of simple data types like
int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
public void train(DataSet data)
throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet.
Training is not incremental: the result of the series train(data1); train(data2)
should be a fully trained model over data2 and not over
data1+data2. All parameters of the model were given by the
call of the constructor.
Specified by:
train in interface TrainableStatisticalModel
Parameters:
data - the given sequences as DataSet
Throws:
Exception - if the training did not succeed
See Also:
DataSet.getElementAt(int),
DataSet.ElementEnumerator
public double getLogProbFor(Sequence sequence)
throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of the given sequence given the model.
The length and the alphabets define the type of
data that can be modeled, and therefore both have to be checked.
Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence for which the logarithm of the
probability/the value of the density function should be
returned
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
StatisticalModel.getLogProbFor(Sequence, int, int)
public double getLogProbFor(Sequence sequence,
int startpos)
throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence
given the model. If the model has a fixed length and the given sequence is
longer, the scored part begins at startpos. For example, if the fixed length
is 12, the length of the given sequence is 30, and startpos=15, the logarithm
of the probability of the part from position 15 to 26 (inclusive) given
the model should be returned.
The length and the alphabets define the type of
data that can be modeled, and therefore both have to be checked.
Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
StatisticalModel.getLogProbFor(Sequence, int, int)
protected void check(Sequence sequence,
int startpos,
int endpos)
throws NotTrainedException,
IllegalArgumentException
This method checks all parameters before a probability can be computed for a
sequence. It can be used in StatisticalModel.getLogProbFor(Sequence, int, int).
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Throws:
IllegalArgumentException - if the sequence could not be handled by the model
(e.g. startpos > , endpos > sequence.length, ...)
NotTrainedException - if the model is not trained yet
public double getLogScoreFor(Sequence sequence)
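Description copied from interface: SequenceScore
Returns the logarithmic score for the given Sequence.
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
sequence - the sequence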
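The window example given for getLogProbFor(Sequence, int) and the range checks described for check(Sequence, int, int) can be sketched without Jstacs. `WindowCheck`, `endPosFor`, 0-based positions, and the specific conditions tested here are assumptions for illustration only; the actual check method may test more or different conditions.

```java
// Hypothetical helper illustrating inclusive start/end positions; not part of Jstacs.
final class WindowCheck {
    // For a fixed-length model, the last scored position (inclusive) is
    // startpos + modelLength - 1. For modelLength 0 (arbitrary length),
    // this sketch assumes scoring runs to the end of the sequence.
    static int endPosFor(int modelLength, int sequenceLength, int startpos) {
        return modelLength == 0 ? sequenceLength - 1 : startpos + modelLength - 1;
    }

    // Assumed range checks in the spirit of check(sequence, startpos, endpos),
    // using 0-based positions and an inclusive endpos.
    static void check(int modelLength, int sequenceLength, int startpos, int endpos) {
        if (startpos < 0 || startpos > endpos || endpos >= sequenceLength) {
            throw new IllegalArgumentException(
                    "invalid range [" + startpos + ", " + endpos + "]");
        }
        if (modelLength != 0 && endpos - startpos + 1 != modelLength) {
            throw new IllegalArgumentException(
                    "range does not match model length " + modelLength);
        }
    }
}
```

With the values from the documented example (model length 12, sequence length 30, startpos 15), `endPosFor` yields 26, and `check` accepts the range 15 to 26.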
public double getLogScoreFor(Sequence sequence,
int startpos)
Description copied from interface: SequenceScore
Returns the logarithmic score for the given Sequence
beginning at position startpos.
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
sequence - the Sequence
startpos - the start position in the Sequence
Returns:
the logarithmic score for the Sequence
public double getLogScoreFor(Sequence sequence,
int startpos,
int endpos)
Description copied from interface: SequenceScore
Returns the logarithmic score for the given Sequence
from position startpos to position endpos (inclusive).
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
sequence - the Sequence
startpos - the start position in the Sequence
endpos - the end position (inclusive) in the Sequence
Returns:
the logarithmic score for the Sequence
public double[] getLogScoreFor(DataSet data)
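throws Exception
Description copied from interface: SequenceScore
This method computes the logarithm of the scores of all sequences in the
given sample. The scores correspond to SequenceScore.getLogScoreFor(Sequence).
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
data - the sample of sequences
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence)
public void getLogScoreFor(DataSet data,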
double[] res)
throws Exception
Description copied from interface: SequenceScore
This method computes the logarithm of the scores of all sequences in the
sample and stores them in the given double-array. The scores correspond to
SequenceScore.getLogScoreFor(Sequence).
Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
data - the sample of sequences
res - the array for the results; it has to have length
data.getNumberOfElements() (which returns the
number of sequences in the sample)
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence),
SequenceScore.getLogScoreFor(DataSet)
public DataSet emitDataSet(int numberOfSequences,
int... seqLength)
throws NotTrainedException,
Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial
sequence(s).
For a model of arbitrary length (homogeneous model):
emitDataSet( int n, int l ) should return a data set with
n sequences of length l.
emitDataSet( int n, int[] l ) should return a data set with
n sequences whose lengths correspond to
the entries in the given array l.
For a model of fixed length (inhomogeneous model):
emitDataSet( int n ) and
emitDataSet( int n, null ) should return a sample with
n sequences of the length of the model
(SequenceScore.getLength()).
Any other seqLength should lead to an Exception.
Specified by:
emitDataSet in interface StatisticalModel
Parameters:
numberOfSequences - the number of sequences that should be contained in the
returned sample
seqLength - the length of the sequences for a homogeneous model; for an
inhomogeneous model this parameter should be null
or an array of size 0
Returns:
a DataSet containing the artificial sequence(s)
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the emission did not succeed
See Also:
DataSet
public final AlphabetContainer getAlphabetContainer()
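Description copied from interface: SequenceScore
Returns the container of alphabets that were used when constructing the instance.
Specified by:
getAlphabetContainer in interface SequenceScore
public final int getLength()
Description copied from interface: SequenceScore
Returns the length of sequences this instance can score.
Specified by:
getLength in interface SequenceScore
public byte getMaximalMarkovOrder()
throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Throws:
UnsupportedOperationException - if the model can't give a proper answer
public ResultSet getCharacteristics()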
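throws Exception
Description copied from interface: SequenceScore
Returns some information characterizing or describing the current instance.
Specified by:
getCharacteristics in interface SequenceScore
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult
protected abstract void fromXML(StringBuffer xml)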
throws NonParsableException
This method should only be used by the constructor that works on a
StringBuffer. It is the counterpart of Storable.toXML().
Parameters:
xml - the XML representation of the model
Throws:
NonParsableException - if the StringBuffer is not parsable or the
representation is conflicting
See Also:
AbstractTrainableStatisticalModel(StringBuffer)
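The clone() convention described in the Method Detail can be sketched with plain Java classes. In this sketch, `BaseModel` and `ToyModel` are hypothetical stand-ins for AbstractTrainableStatisticalModel and a directly inheriting class X, and the `counts` field is an invented example of a member that needs a deep copy. It is an illustration under those assumptions, not the library's implementation.

```java
// Hypothetical stand-in for the abstract base class; not part of Jstacs.
abstract class BaseModel implements Cloneable {
    protected final int length;

    protected BaseModel(int length) {
        this.length = length;
    }

    @Override
    public BaseModel clone() throws CloneNotSupportedException {
        // Object.clone() makes a shallow copy; immutable members can be shared.
        return (BaseModel) super.clone();
    }
}

// Hypothetical subclass X with one mutable, non-primitive member.
final class ToyModel extends BaseModel {
    double[] counts;

    ToyModel(int length) {
        super(length);
        this.counts = new double[length];
    }

    @Override
    public ToyModel clone() throws CloneNotSupportedException {
        ToyModel o = (ToyModel) super.clone(); // shallow copy via super.clone()
        o.counts = counts.clone();             // deep copy of the array member
        return o;                              // return the independent copy
    }
}
```

After cloning, changes to the `counts` array of the copy do not affect the original, which is the property the deep-copy step is meant to guarantee.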