de.jstacs.classifiers.trainSMBased
Class TrainSMBasedClassifier

java.lang.Object
  extended by de.jstacs.classifiers.AbstractClassifier
      extended by de.jstacs.classifiers.AbstractScoreBasedClassifier
          extended by de.jstacs.classifiers.trainSMBased.TrainSMBasedClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:
SharedStructureClassifier

public class TrainSMBasedClassifier
extends AbstractScoreBasedClassifier

Classifier that uses one TrainableStatisticalModel per class. Calling the AbstractClassifier.train(DataSet...) method of this classifier trains each internal TrainableStatisticalModel on the DataSet of its class using the model's own TrainableStatisticalModel.train(DataSet) method. In addition, the a-priori class probabilities are estimated. After training, the method AbstractScoreBasedClassifier.getScore(Sequence, int) returns the joint probability (likelihood) of the provided Sequence and the specified class. For two-class problems, the method getScores(DataSet) returns the log-likelihood ratios for all Sequences in the provided DataSet. The methods AbstractScoreBasedClassifier.classify(Sequence) and classify(DataSet) compare these joint likelihoods across classes and report the class with the maximum likelihood.
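
The scheme described above can be illustrated with a small self-contained sketch that does not use Jstacs itself. The names GenerativeSketch and CompositionModel are hypothetical, the per-class model is a simple symbol-frequency model with a pseudocount, and scores are kept in log space for numerical convenience. None of this is taken from the actual implementation.

```java
/** Hypothetical stand-in for a classifier with one generative model per class. */
class GenerativeSketch {

    /** Hypothetical per-class model: independent symbol frequencies over A, C, G, T. */
    static class CompositionModel {
        private final double[] logProb = new double[4];

        /** Estimates symbol probabilities from counts plus a pseudocount of 1. */
        void train(String[] data) {
            double[] counts = {1, 1, 1, 1};
            double total = 4;
            for (String seq : data) {
                for (char c : seq.toCharArray()) {
                    counts["ACGT".indexOf(c)]++; // assumes only A, C, G, T occur
                    total++;
                }
            }
            for (int k = 0; k < 4; k++) {
                logProb[k] = Math.log(counts[k] / total);
            }
        }

        /** Log-likelihood of a sequence under the independent-symbol model. */
        double logLikelihood(String seq) {
            double sum = 0;
            for (char c : seq.toCharArray()) {
                sum += logProb["ACGT".indexOf(c)];
            }
            return sum;
        }
    }

    private final CompositionModel[] models;
    private final double[] logPrior;

    GenerativeSketch(int numClasses) {
        models = new CompositionModel[numClasses];
        logPrior = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            models[c] = new CompositionModel();
        }
    }

    /** Trains model c on data set c and estimates class priors from data set sizes. */
    void train(String[]... dataPerClass) {
        if (dataPerClass.length != models.length) {
            throw new IllegalArgumentException("one data set per class is required");
        }
        int n = 0;
        for (String[] d : dataPerClass) {
            n += d.length;
        }
        for (int c = 0; c < models.length; c++) {
            models[c].train(dataPerClass[c]);
            logPrior[c] = Math.log(dataPerClass[c].length / (double) n);
        }
    }

    /** Log of the joint probability of sequence and class: log prior + log-likelihood. */
    double getScore(String seq, int c) {
        return logPrior[c] + models[c].logLikelihood(seq);
    }

    /** Returns the index of the class with the highest joint score. */
    int classify(String seq) {
        int best = 0;
        for (int c = 1; c < models.length; c++) {
            if (getScore(seq, c) > getScore(seq, best)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        GenerativeSketch clf = new GenerativeSketch(2);
        clf.train(new String[] {"AAAA", "AAAT"}, new String[] {"GGGG", "GGGC"});
        System.out.println(clf.classify("AAA"));
    }
}
```

In this sketch the class prior is simply the fraction of training sequences per class, which is one plausible reading of "the a-priori class probabilities are estimated".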

Author:
Jens Keilwagen
See Also:
TrainableStatisticalModel

Nested Class Summary
 
Nested classes/interfaces inherited from class de.jstacs.classifiers.AbstractScoreBasedClassifier
AbstractScoreBasedClassifier.DoubleTableResult
 
Field Summary
protected  TrainableStatisticalModel[] models
          The internal TrainableStatisticalModels.
 
Constructor Summary
protected TrainSMBasedClassifier(boolean cloneModels, TrainableStatisticalModel... models)
          This constructor creates a new instance with the given TrainableStatisticalModels and clones these if necessary.
  TrainSMBasedClassifier(StringBuffer xml)
          The standard constructor for the interface Storable.
  TrainSMBasedClassifier(TrainableStatisticalModel... models)
          The default constructor that creates a new instance with the given TrainableStatisticalModels.
 
Method Summary
 byte[] classify(DataSet s)
          This method classifies all sequences of a data set and returns an array of class indices, one per sequence, where each index i satisfies 0 <= i < getNumberOfClasses().
 TrainSMBasedClassifier clone()
           
protected  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an XML representation.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance of the classifier.
 CategoricalResult[] getClassifierAnnotation()
          Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and about each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...
protected  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
 String getInstanceName()
          Returns a short description of the classifier.
 TrainableStatisticalModel getModel(int classIndex)
          Returns a clone of the TrainableStatisticalModel for a specified class.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().
static int getPossibleLength(TrainableStatisticalModel... models)
          This method returns the possible length of a classifier that would use the given TrainableStatisticalModels.
protected  double getScore(Sequence seq, int i, boolean check)
          This method returns the score for a given Sequence and a given class.
 double[] getScores(DataSet s)
          This method returns the scores of the classifier for each Sequence in the DataSet.
protected  String getXMLTag()
          Returns the String that is used as tag for the XML representation of the classifier.
 boolean isInitialized()
          This method gives information about the state of the classifier.
 void train(DataSet[] s, double[][] weights)
          This method trains a classifier over an array of weighted DataSets.
 
Methods inherited from class de.jstacs.classifiers.AbstractScoreBasedClassifier
check, check, classify, classify, createDefaultClassWeights, getClassWeight, getClassWeights, getMultiClassScores, getNumberOfClasses, getPValue, getPValue, getResults, getScore, setClassWeights, setClassWeights, setThresholdClassWeights
 
Methods inherited from class de.jstacs.classifiers.AbstractClassifier
evaluate, evaluate, getAlphabetContainer, getLength, toXML, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

models

protected TrainableStatisticalModel[] models
The internal TrainableStatisticalModels. TrainableStatisticalModel 0 handles class 0, TrainableStatisticalModel 1 handles class 1, and so on.

Constructor Detail

TrainSMBasedClassifier

protected TrainSMBasedClassifier(boolean cloneModels,
                                 TrainableStatisticalModel... models)
                          throws IllegalArgumentException,
                                 CloneNotSupportedException,
                                 ClassDimensionException
This constructor creates a new instance with the given TrainableStatisticalModels and clones these if necessary.

Parameters:
cloneModels - a switch to decide whether to clone the TrainableStatisticalModels or not
models - the TrainableStatisticalModels
Throws:
IllegalArgumentException - if the TrainableStatisticalModels do not describe a common domain of sequences
CloneNotSupportedException - if at least one TrainableStatisticalModel could not be cloned
ClassDimensionException - if the number of classes is below 2
See Also:
AbstractScoreBasedClassifier.AbstractScoreBasedClassifier(AlphabetContainer, int, int, double)

TrainSMBasedClassifier

public TrainSMBasedClassifier(TrainableStatisticalModel... models)
                       throws IllegalArgumentException,
                              CloneNotSupportedException,
                              ClassDimensionException
The default constructor that creates a new instance with the given TrainableStatisticalModels.

Parameters:
models - the TrainableStatisticalModels
Throws:
IllegalArgumentException - if the TrainableStatisticalModels do not describe a common domain of sequences
CloneNotSupportedException - if at least one TrainableStatisticalModel could not be cloned
ClassDimensionException - if the number of classes is below 2
See Also:
TrainSMBasedClassifier(boolean, TrainableStatisticalModel...)

TrainSMBasedClassifier

public TrainSMBasedClassifier(StringBuffer xml)
                       throws NonParsableException
The standard constructor for the interface Storable. Constructs a TrainSMBasedClassifier from its XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the TrainSMBasedClassifier could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
AbstractScoreBasedClassifier.AbstractScoreBasedClassifier(StringBuffer), Storable
Method Detail

getPossibleLength

public static int getPossibleLength(TrainableStatisticalModel... models)
                             throws IllegalArgumentException
This method returns the possible length of a classifier that would use the given TrainableStatisticalModels.

Parameters:
models - the TrainableStatisticalModels that will be tested
Returns:
the length of a classifier that would use the given models
Throws:
IllegalArgumentException - if no classifier could be created since the TrainableStatisticalModels have incompatible lengths
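
A compatibility check of this kind could look like the following sketch. It works on plain integer lengths rather than on TrainableStatisticalModels, the class name LengthCheckSketch is hypothetical, and it assumes the convention that a length of 0 means "sequences of any length", which this page does not state.

```java
/** Hypothetical sketch: derive a common sequence length from per-model lengths. */
class LengthCheckSketch {

    /**
     * @param lengths per-model sequence lengths; 0 is assumed to mean "any length"
     * @return the common fixed length, or 0 if every model accepts any length
     * @throws IllegalArgumentException if two models require different fixed lengths
     */
    static int possibleLength(int... lengths) {
        int common = 0;
        for (int len : lengths) {
            if (len < 0) {
                throw new IllegalArgumentException("negative length: " + len);
            }
            if (len == 0) {
                continue; // this model accepts any length
            }
            if (common == 0) {
                common = len; // first fixed length seen
            } else if (common != len) {
                throw new IllegalArgumentException(
                        "incompatible lengths: " + common + " and " + len);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(possibleLength(0, 10, 10));
    }
}
```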

clone

public TrainSMBasedClassifier clone()
                             throws CloneNotSupportedException
Overrides:
clone in class AbstractScoreBasedClassifier
Throws:
CloneNotSupportedException

getCharacteristics

public ResultSet getCharacteristics()
                             throws Exception
Description copied from class: AbstractClassifier
Returns some information characterizing or describing the current instance of the classifier. This could be for instance the number of edges for a Bayesian network or an image showing some representation of the model of a class. The set of characteristics should always include the XML representation of the classifier. The corresponding result type is StorableResult.

Overrides:
getCharacteristics in class AbstractClassifier
Returns:
the characteristics of the current instance of the classifier
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult, AbstractClassifier.getNumericalCharacteristics(), ResultSet.ResultSet(de.jstacs.results.Result[][])

getInstanceName

public String getInstanceName()
Description copied from class: AbstractClassifier
Returns a short description of the classifier.

Specified by:
getInstanceName in class AbstractClassifier
Returns:
a short description of the classifier

getModel

public TrainableStatisticalModel getModel(int classIndex)
                                   throws CloneNotSupportedException
Returns a clone of the TrainableStatisticalModel for a specified class.

Parameters:
classIndex - the index of the specified class
Returns:
a clone of the TrainableStatisticalModel of the specified class
Throws:
CloneNotSupportedException - if the TrainableStatisticalModel could not be cloned
See Also:
TrainableStatisticalModel.clone()
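
Returning a clone means that callers cannot change the classifier's internal state through the returned object. The sketch below shows that pattern with hypothetical classes (CloneAccessSketch, Model); it is not the Jstacs code.

```java
/** Hypothetical sketch of handing out clones instead of internal model references. */
class CloneAccessSketch {

    /** Hypothetical model with mutable parameters and a deep clone(). */
    static class Model implements Cloneable {
        double[] params;

        Model(double... params) {
            this.params = params.clone();
        }

        @Override
        public Model clone() {
            try {
                Model copy = (Model) super.clone();
                copy.params = params.clone(); // deep copy of the mutable array
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen, Model is Cloneable
            }
        }
    }

    private final Model[] models;

    CloneAccessSketch(Model... models) {
        this.models = models;
    }

    /** Returns a copy, so the caller cannot modify the internal model. */
    Model getModel(int classIndex) {
        return models[classIndex].clone();
    }

    /** Read-only view of an internal parameter, used here only for checking. */
    double internalParam(int classIndex, int k) {
        return models[classIndex].params[k];
    }
}
```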

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from class: AbstractClassifier
Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().

Specified by:
getNumericalCharacteristics in class AbstractClassifier
Returns:
the numerical characteristics
Throws:
Exception - if some of the characteristics could not be defined

isInitialized

public boolean isInitialized()
Description copied from class: AbstractClassifier
This method gives information about the state of the classifier.

Specified by:
isInitialized in class AbstractClassifier
Returns:
true if the classifier is initialized and therefore able to classify sequences, otherwise false

train

public void train(DataSet[] s,
                  double[][] weights)
           throws Exception
Description copied from class: AbstractClassifier
This method trains a classifier over an array of weighted DataSets. Like the method AbstractClassifier.train(DataSet...), it should work non-incrementally.

This method should check that the DataSets are defined over the underlying alphabet and length.

Specified by:
train in class AbstractClassifier
Parameters:
s - an array of DataSets
weights - the weights for the DataSets
Throws:
Exception - if the weights are incorrect or the training did not succeed
See Also:
AbstractClassifier.train(DataSet...)
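
One way to picture the role of the weights is to estimate class priors from summed sequence weights instead of raw counts. The sketch below does only that, under two loudly stated assumptions that this page does not confirm: a null weight array stands for "weight 1 for every sequence", and priors are proportional to the weight sums. The class name WeightedPriorSketch is hypothetical.

```java
/** Hypothetical sketch: class priors estimated from per-sequence weights. */
class WeightedPriorSketch {

    /**
     * @param sizes   number of sequences in each class's data set
     * @param weights weights[c][n] is the weight of sequence n in class c;
     *                weights[c] == null is assumed to mean weight 1 for all
     * @return class priors proportional to the summed weights
     */
    static double[] classPriors(int[] sizes, double[][] weights) {
        if (weights.length != sizes.length) {
            throw new IllegalArgumentException("one weight array per data set is required");
        }
        double[] prior = new double[sizes.length];
        double total = 0;
        for (int c = 0; c < sizes.length; c++) {
            if (weights[c] == null) {
                prior[c] = sizes[c];
            } else {
                if (weights[c].length != sizes[c]) {
                    throw new IllegalArgumentException("wrong number of weights for class " + c);
                }
                for (double w : weights[c]) {
                    if (w < 0) {
                        throw new IllegalArgumentException("negative weight");
                    }
                    prior[c] += w;
                }
            }
            total += prior[c];
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights sum to zero");
        }
        for (int c = 0; c < prior.length; c++) {
            prior[c] /= total; // normalize to probabilities
        }
        return prior;
    }
}
```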

getFurtherClassifierInfos

protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method AbstractClassifier.toXML() and should not be made public.

Overrides:
getFurtherClassifierInfos in class AbstractScoreBasedClassifier
Returns:
further information of a classifier as a StringBuffer
See Also:
AbstractClassifier.toXML()

getScore

protected double getScore(Sequence seq,
                          int i,
                          boolean check)
                   throws Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the score for a given Sequence and a given class.

Specified by:
getScore in class AbstractScoreBasedClassifier
Parameters:
seq - the Sequence
i - the index of the class
check - the switch to decide whether to check AlphabetContainer and the length of the Sequence or not
Returns:
the score for a given Sequence and a given class
Throws:
NotTrainedException - if the classifier is not trained
IllegalArgumentException - if something is wrong with the Sequence seq
Exception - if something went wrong

getScores

public double[] getScores(DataSet s)
                   throws Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the scores of the classifier for each Sequence in the DataSet. The scores are stored in the array according to the index of the Sequence in the DataSet.

This method is only defined for two-class classifiers.

Overrides:
getScores in class AbstractScoreBasedClassifier
Parameters:
s - the DataSet
Returns:
the array of scores
Throws:
Exception - if something went wrong
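
For a two-class problem, a log-likelihood ratio per sequence can be sketched as the difference of the two per-class log scores, stored in data set order. The class name TwoClassScoreSketch is hypothetical, and the sign convention (class 0 minus class 1) is an assumption, not something this page specifies.

```java
/** Hypothetical sketch: per-sequence log-likelihood ratios for two classes. */
class TwoClassScoreSketch {

    /**
     * @param jointLogScores jointLogScores[n][c] is the log joint score of
     *                       sequence n and class c; exactly two classes
     * @return ratio[n] = jointLogScores[n][0] - jointLogScores[n][1] (assumed sign)
     */
    static double[] logRatios(double[][] jointLogScores) {
        double[] ratio = new double[jointLogScores.length];
        for (int n = 0; n < jointLogScores.length; n++) {
            if (jointLogScores[n].length != 2) {
                throw new IllegalArgumentException("only defined for two classes");
            }
            ratio[n] = jointLogScores[n][0] - jointLogScores[n][1];
        }
        return ratio;
    }
}
```

Under this convention a positive value favors class 0 and a negative value favors class 1.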

classify

public byte[] classify(DataSet s)
                throws Exception
Description copied from class: AbstractClassifier
This method classifies all sequences of a data set and returns an array of class indices, one per sequence, where each index i satisfies 0 <= i < getNumberOfClasses().

Overrides:
classify in class AbstractClassifier
Parameters:
s - the data set to be classified
Returns:
an array of class assignments
Throws:
Exception - if something went wrong during the classification
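
The shape of the result can be sketched as follows: one byte per sequence, holding the index of the best-scoring class. The class name ClassifyAllSketch is hypothetical, the scores are passed in as a plain matrix instead of being computed from a DataSet, and ties are resolved in favor of the lower class index, which is an arbitrary choice made for this sketch.

```java
/** Hypothetical sketch: assign each sequence to its best-scoring class. */
class ClassifyAllSketch {

    /**
     * @param scores scores[n][c] is the score of sequence n for class c
     * @return one class index per sequence, each in 0 .. numberOfClasses - 1
     */
    static byte[] classify(double[][] scores) {
        byte[] assigned = new byte[scores.length];
        for (int n = 0; n < scores.length; n++) {
            int best = 0;
            for (int c = 1; c < scores[n].length; c++) {
                if (scores[n][c] > scores[n][best]) {
                    best = c; // strictly better, so ties keep the lower index
                }
            }
            assigned[n] = (byte) best;
        }
        return assigned;
    }
}
```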

getXMLTag

protected String getXMLTag()
Description copied from class: AbstractClassifier
Returns the String that is used as tag for the XML representation of the classifier. This method is used by the methods AbstractClassifier.fromXML(StringBuffer) and AbstractClassifier.toXML().

Specified by:
getXMLTag in class AbstractClassifier
Returns:
the String that is used as tag for the XML representation of the classifier

extractFurtherClassifierInfosFromXML

protected void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                             throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an XML representation. This method is used by the method AbstractClassifier.fromXML(StringBuffer) and should not be made public.

Overrides:
extractFurtherClassifierInfosFromXML in class AbstractScoreBasedClassifier
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
AbstractClassifier.fromXML(StringBuffer)

getClassifierAnnotation

public CategoricalResult[] getClassifierAnnotation()
Description copied from class: AbstractClassifier
Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and about each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...

Specified by:
getClassifierAnnotation in class AbstractClassifier
Returns:
an array of Results that contains information about the classifier