public abstract class HomogeneousTrainSM extends DiscreteGraphicalTrainSM
See Also: HomogeneousTrainSMParameterSet

Nested Class Summary

| Modifier and Type | Class and Description |
|---|---|
| protected class | HomogeneousTrainSM.HomCondProb: This class handles the (conditional) probabilities of a homogeneous model in a fast way. |
Field Summary

| Modifier and Type | Field and Description |
|---|---|
| protected byte | order: The order of the model. |
| protected int[] | powers: The powers of the alphabet length. |
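The page describes powers only as "the powers of the alphabet length". One plausible reading, offered here as an assumption rather than as the library's implementation, is that these powers turn a short context of symbols into a flat array index. The sketch below is plain Java with no Jstacs dependency; the class PowersSketch and both method names are hypothetical.

```java
// Hypothetical illustration; not part of Jstacs.
final class PowersSketch {

    // Returns {a^0, a^1, ..., a^order} for alphabet length a (assumed layout).
    static int[] powers(int alphabetLength, int order) {
        int[] p = new int[order + 1];
        p[0] = 1;
        for (int i = 1; i <= order; i++) {
            p[i] = p[i - 1] * alphabetLength;
        }
        return p;
    }

    // Reads seq[start .. start+len-1] as a base-a number, most significant
    // symbol first, using the precomputed powers. Requires len <= powers.length.
    static int index(int[] seq, int start, int len, int[] powers) {
        int idx = 0;
        for (int i = 0; i < len; i++) {
            idx += seq[start + i] * powers[len - 1 - i];
        }
        return idx;
    }
}
```

With a DNA-like alphabet of length 4 and order 2, every context plus the current symbol maps to a distinct index in the range 0 to 63, so one flat array can hold all conditional probabilities.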
Inherited fields: params, trained, alphabets, length

Constructor Summary

| Constructor and Description |
|---|
| HomogeneousTrainSM(HomogeneousTrainSMParameterSet params): Creates a homogeneous model from a parameter set. |
| HomogeneousTrainSM(StringBuffer stringBuff): The standard constructor for the interface Storable. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| protected void | check(Sequence sequence, int startpos, int endpos): Checks some constraints; in general these are conditions on the AlphabetContainer of a (sub)Sequence between startpos and endpos. |
| protected int | chooseFromDistr(Constraint distr, int start, int end, double randNo): Chooses a value in [0,end-start] according to the distribution encoded in the frequencies of distr between the indices start and end. |
| protected HomogeneousTrainSM.HomCondProb[] | cloneHomProb(HomogeneousTrainSM.HomCondProb[] p): Clones the given array of conditional probabilities. |
| DataSet | emitDataSet(int no, int... length) |
| double | getLogProbFor(Sequence sequence, int startpos, int endpos): Returns the logarithm of the probability of (a part of) the given sequence given the model. |
| byte | getMaximalMarkovOrder(): Returns the maximal Markov order used, if possible. |
| NumericalResultSet | getNumericalCharacteristics(): Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics(). |
| protected abstract Sequence | getRandomSequence(Random r, int length): Creates a random Sequence from a trained homogeneous model. |
| protected abstract double | logProbFor(Sequence sequence, int startpos, int endpos): Computes the logarithm of the probability of the given Sequence in the given interval. |
| protected void | set(DGTrainSMParameterSet params, boolean trained): Sets the parameters as internal parameters and does some essential computations. |
| void | train(DataSet[] data): Trains the homogeneous model on all given DataSets. |
| abstract void | train(DataSet[] data, double[][] weights): Trains the homogeneous model using an array of weighted DataSets. |
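To make the interval semantics of getLogProbFor(Sequence, int, int) concrete, here is a minimal sketch of a log probability over positions startpos to endpos (both inclusive) for a homogeneous first-order Markov chain. It is plain Java without Jstacs types: the class HomogeneousLogProbSketch, the int[] sequence encoding, and the choice to score the first position of the interval with an order-0 distribution are all assumptions made for illustration, not the library's implementation.

```java
// Hypothetical illustration; not part of Jstacs.
final class HomogeneousLogProbSketch {
    private final double[] p0;   // P(x), used for the first position of the interval
    private final double[][] p1; // P(x | previous symbol), identical at every position

    HomogeneousLogProbSketch(double[] p0, double[][] p1) {
        this.p0 = p0;
        this.p1 = p1;
    }

    // Log probability of seq[startpos..endpos], both ends inclusive.
    double logProbFor(int[] seq, int startpos, int endpos) {
        // Assumed constraint check: the interval must lie inside the sequence.
        if (startpos < 0 || startpos > endpos || endpos >= seq.length) {
            throw new IllegalArgumentException("invalid interval");
        }
        double logProb = Math.log(p0[seq[startpos]]);
        for (int i = startpos + 1; i <= endpos; i++) {
            logProb += Math.log(p1[seq[i - 1]][seq[i]]);
        }
        return logProb;
    }
}
```

Because the same conditional distribution is reused at every position, the sketch can score an interval of any length, which is the property the page attributes to homogeneous models.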
Inherited methods (grouped as in the original listing):
clone, fromXML, getCurrentParameterSet, getDescription, getESS, getFurtherModelInfos, getXMLTag, isInitialized, setFurtherModelInfos, toString, toXML
getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString, train
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
train
getLogPriorTerm
getInstanceName

Field Detail

protected int[] powers
The powers of the alphabet length.

protected byte order
The order of the model.
Constructor Detail

public HomogeneousTrainSM(HomogeneousTrainSMParameterSet params) throws CloneNotSupportedException, IllegalArgumentException, NonParsableException
Creates a homogeneous model from a parameter set.
Parameters:
params - the parameter set
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
IllegalArgumentException - if the parameter set is not instantiated
NonParsableException - if the parameter set is not parsable
See Also:
HomogeneousTrainSMParameterSet, DiscreteGraphicalTrainSM.DiscreteGraphicalTrainSM(DGTrainSMParameterSet)

public HomogeneousTrainSM(StringBuffer stringBuff) throws NonParsableException
The standard constructor for the interface Storable. Creates a new HomogeneousTrainSM out of its XML representation.
Parameters:
stringBuff - the XML representation as StringBuffer
Throws:
NonParsableException - if the HomogeneousTrainSM could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable, DiscreteGraphicalTrainSM.DiscreteGraphicalTrainSM(StringBuffer)

Method Detail

public final DataSet emitDataSet(int no, int... length) throws NotTrainedException, IllegalArgumentException, EmptyDataSetException, WrongAlphabetException, WrongSequenceTypeException
Specified by:
emitDataSet in interface StatisticalModel
Overrides:
emitDataSet in class AbstractTrainableStatisticalModel
Parameters:
no - the number of Sequences that should be in the DataSet
length - the length of all Sequences or an array of lengths with the Sequence with index i having length length[i]
Returns:
DataSet
Throws:
NotTrainedException - if the model was not trained
IllegalArgumentException - if the dimension of length is neither 1 nor no
EmptyDataSetException - if no == 0
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
WrongAlphabetException - if something is wrong with the alphabet
See Also:
DataSet.DataSet(String, Sequence...)

protected abstract Sequence getRandomSequence(Random r, int length) throws WrongAlphabetException, WrongSequenceTypeException
This method creates a random Sequence from a trained homogeneous model.
Parameters:
r - the random generator
length - the length of the Sequence
Returns:
Sequence
Throws:
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
WrongAlphabetException - if something is wrong with the alphabet

public byte getMaximalMarkovOrder()
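Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Overrides:
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel

public NumericalResultSet getNumericalCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
Throws:
Exception - if some of the characteristics could not be defined

public final double getLogProbFor(Sequence sequence, int startpos, int endpos) throws NotTrainedException, Exception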
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. This method extends StatisticalModel.getLogProbFor(Sequence, int): the model could be, e.g., homogeneous, so the length of the sequences whose probability should be returned is not fixed. Additionally, the end position of the part of the given sequence is given, and the probability of the part from position startpos to endpos (inclusive) should be returned.
length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model

public void train(DataSet[] data) throws Exception
Trains the homogeneous model on all given DataSets.
Parameters:
data - the given DataSets
Throws:
Exception - if something went wrong
See Also:
train(DataSet[], double[][])

protected void set(DGTrainSMParameterSet params, boolean trained) throws CloneNotSupportedException, NonParsableException
Description copied from class: DiscreteGraphicalTrainSM
Sets the parameters as internal parameters and does some essential computations. It is used in the fromParameterSet-methods.
Overrides:
set in class DiscreteGraphicalTrainSM
Parameters:
params - the new ParameterSet
trained - indicates if the model is trained or not
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
NonParsableException - if the parameters of the model could not be parsed

protected void check(Sequence sequence, int startpos, int endpos) throws NotTrainedException, IllegalArgumentException
Checks some constraints; in general these are conditions on the AlphabetContainer of a (sub)Sequence between startpos and endpos.
Overrides:
check in class DiscreteGraphicalTrainSM
Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Throws:
NotTrainedException - if the model is not trained
IllegalArgumentException - if some arguments are wrong
See Also:
DiscreteGraphicalTrainSM.check(Sequence, int, int)

protected final int chooseFromDistr(Constraint distr, int start, int end, double randNo)
Chooses a value in [0,end-start] according to the distribution encoded in the frequencies of distr between the indices start and end. distr is not changed in the process.
Parameters:
distr - the distribution
start - the start index
end - the end index
randNo - a random number in [0,1]
See Also:
Constraint.getFreq(int)

protected abstract double logProbFor(Sequence sequence, int startpos, int endpos)
This method computes the logarithm of the probability of the given Sequence in the given interval. The method is only used in StatisticalModel.getLogProbFor(Sequence, int, int) after the method check(Sequence, int, int) has been invoked.
Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
See Also:
check(Sequence, int, int), StatisticalModel.getLogProbFor(Sequence, int, int)

protected HomogeneousTrainSM.HomCondProb[] cloneHomProb(HomogeneousTrainSM.HomCondProb[] p)
Clones the given array of conditional probabilities.
Parameters:
p - the original conditional probabilities
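
The description of chooseFromDistr above (a value in [0,end-start] drawn according to the frequencies between start and end, from a random number in [0,1]) matches ordinary inverse-CDF sampling. The sketch below shows that technique in plain Java. It is not the Jstacs code: a double[] stands in for Constraint.getFreq(int), the class and method names are hypothetical, and normalising by the sum of the selected frequencies is an assumption.

```java
// Hypothetical illustration; not part of Jstacs.
final class ChooseFromDistrSketch {

    // Returns an offset in [0, end - start] chosen in proportion to
    // freq[start..end] (both inclusive). freq is only read, never modified.
    static int choose(double[] freq, int start, int end, double randNo) {
        double sum = 0.0;
        for (int i = start; i <= end; i++) {
            sum += freq[i];
        }
        // Scale the random number so the frequencies need not sum to 1.
        double threshold = randNo * sum;
        double cumulative = 0.0;
        for (int i = start; i < end; i++) {
            cumulative += freq[i];
            if (threshold < cumulative) {
                return i - start;
            }
        }
        // Falls through for the last index, including the case randNo == 1.
        return end - start;
    }
}
```

A getRandomSequence implementation could call such a helper once per position, restricting start and end to the block of frequencies that belongs to the current context.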