de.jstacs.classifier
Class AbstractClassifier

java.lang.Object
  extended by de.jstacs.classifier.AbstractClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:
AbstractScoreBasedClassifier

public abstract class AbstractClassifier
extends Object
implements Storable, Cloneable

The super class for any classifier.

The order of the classes is never changed inside the classifier. The samples passed to methods such as train, test and evaluate must always be in the same order that was used when the object was instantiated.

For two classes it is highly recommended to use the foreground as the first class and the background as the second class.

Author:
Jens Keilwagen, Jan Grau

Constructor Summary
AbstractClassifier(AlphabetContainer abc)
          The constructor for a homogeneous classifier.
AbstractClassifier(AlphabetContainer abc, int length)
          The constructor for an inhomogeneous classifier.
AbstractClassifier(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
 byte[] classify(Sample s)
          This method classifies all sequences of a sample and returns an array of the class indices to which the respective sequences are assigned, where each index i in the array satisfies 0 <= i < getNumberOfClasses().
abstract  byte classify(Sequence seq)
          This method classifies a sequence and returns the index i of the class to which the sequence is assigned, with 0 <= i < getNumberOfClasses().
 AbstractClassifier clone()
           
 NumericalResultSet evaluate(MeasureParameters params, boolean exceptionIfNotComputeable, Sample... s)
          This method evaluates the classifier and computes all numerical results, for instance the sensitivity for a given specificity or the area under the ROC curve.
 ResultSet evaluateAll(MeasureParameters params, boolean exceptionIfNotComputeable, Sample... s)
          This method evaluates the classifier and computes all results.
protected abstract  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an XML representation.
 AlphabetContainer getAlphabetContainer()
          This method returns the container of alphabets that is used in the classifier.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance of the classifier.
protected  NumericalResult getClassificationRate(Sample[] s)
          This method computes the classification rate for a given array of samples.
abstract  CategoricalResult[] getClassifierAnnotation()
          Returns an array of Results of dimension getNumberOfClasses() that contains information about the classifier and about each class.
protected abstract  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
abstract  String getInstanceName()
          Returns a short description of the classifier.
 int getLength()
          Returns the length of the sequences this classifier can handle or 0 for sequences of arbitrary length.
static MeasureParameters getMeasuresForEvaluate()
          Returns an object of the parameters for the method evaluate(MeasureParameters, boolean, Sample...).
static MeasureParameters getMeasuresForEvaluateAll()
          Returns an object of the parameters for the method evaluateAll(MeasureParameters, boolean, Sample...).
abstract  int getNumberOfClasses()
          Returns the number of classes that can be distinguished.
abstract  NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by getCharacteristics().
protected  LinkedList<? extends Result> getResults(Sample[] s, MeasureParameters params, boolean exceptionIfNotComputeable, boolean all)
          This method computes the results for any evaluation of the classifier.
protected abstract  String getXMLTag()
          Returns the String that is used as tag for the XML representation of the classifier.
abstract  boolean isTrained()
          This method gives information about the state of the classifier.
 boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
          This method tries to set a new instance of an AlphabetContainer for the current classifier.
 ConfusionMatrix test(Sample... testData)
          This method computes the confusion matrix for a given array of test data.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 void train(Sample... s)
          Trains the AbstractClassifier object given the data as Samples.
abstract  void train(Sample[] s, double[][] weights)
          This method trains a classifier over an array of weighted Samples.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractClassifier

public AbstractClassifier(AlphabetContainer abc)
The constructor for a homogeneous classifier. Such a classifier can handle sequences of arbitrary length.

Parameters:
abc - the alphabets that are used
See Also:
AbstractClassifier(AlphabetContainer, int)

AbstractClassifier

public AbstractClassifier(AlphabetContainer abc,
                          int length)
                   throws IllegalArgumentException
The constructor for an inhomogeneous classifier. Such a classifier can handle sequences of fixed length.

Parameters:
abc - the alphabets that are used
length - the length of the sequences that can be classified
Throws:
IllegalArgumentException - if the length and the possible length of the AlphabetContainer do not match

AbstractClassifier

public AbstractClassifier(StringBuffer xml)
                   throws NonParsableException
The standard constructor for the interface Storable. Creates a new AbstractClassifier out of its XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the AbstractClassifier could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable
Method Detail

getMeasuresForEvaluate

public static final MeasureParameters getMeasuresForEvaluate()
                                                      throws ParameterException
Returns an object of the parameters for the method evaluate(MeasureParameters, boolean, Sample...). The parameters can be set and the measures can be switched on or off.

Returns:
an object of the parameters for the method evaluate(MeasureParameters, boolean, Sample...)
Throws:
ParameterException - if something went wrong while constructing the parameter object
See Also:
evaluate(MeasureParameters, boolean, Sample...), MeasureParameters.MeasureParameters(boolean)

getMeasuresForEvaluateAll

public static final MeasureParameters getMeasuresForEvaluateAll()
                                                         throws ParameterException
Returns an object of the parameters for the method evaluateAll(MeasureParameters, boolean, Sample...). The parameters can be set and the measures can be switched on or off.

Returns:
an object of the parameters for the method evaluateAll(MeasureParameters, boolean, Sample...)
Throws:
ParameterException - if something went wrong while constructing the parameter object
See Also:
evaluateAll(MeasureParameters, boolean, Sample...), MeasureParameters.MeasureParameters(boolean)

classify

public abstract byte classify(Sequence seq)
                       throws Exception
This method classifies a sequence and returns the index i of the class to which the sequence is assigned, with 0 <= i < getNumberOfClasses().

This method should check that the sequence is defined over the underlying alphabet and length.

Parameters:
seq - the sequence to be classified
Returns:
the index of the class to which the sequence is assigned
Throws:
Exception - if the classifier is not trained or something is wrong with the sequence
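
The contract above (validate the sequence, then return a class index in 0 .. getNumberOfClasses() - 1) can be sketched as follows. This is a minimal sketch under stated assumptions, not the Jstacs implementation: the class ClassifySketch, the representation of a sequence as an int array of symbol codes, and the log-probability table are all hypothetical stand-ins.

```java
// Hypothetical stand-in for a classify(Sequence) implementation; none of
// these names are part of Jstacs. Sequences are int arrays of symbol codes.
class ClassifySketch {

    /**
     * Returns the index of the class with the highest summed log-probability.
     *
     * @param seq            symbol codes of the sequence
     * @param requiredLength fixed sequence length, or 0 for arbitrary length
     * @param logProb        logProb[c][a] = log-probability of symbol a in class c
     */
    static byte classify(int[] seq, int requiredLength, double[][] logProb) {
        // Check the length first; 0 means "any length", as for getLength().
        if (requiredLength != 0 && seq.length != requiredLength) {
            throw new IllegalArgumentException("wrong length: " + seq.length);
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logProb.length; c++) {
            double score = 0;
            for (int symbol : seq) {
                // Check that every symbol belongs to the alphabet.
                if (symbol < 0 || symbol >= logProb[c].length) {
                    throw new IllegalArgumentException("symbol outside alphabet: " + symbol);
                }
                score += logProb[c][symbol];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return (byte) best; // an index between 0 and logProb.length - 1
    }

    public static void main(String[] args) {
        double[][] logProb = {
            // class 0: foreground, prefers symbol 0
            { Math.log(0.7), Math.log(0.1), Math.log(0.1), Math.log(0.1) },
            // class 1: background, uniform
            { Math.log(0.25), Math.log(0.25), Math.log(0.25), Math.log(0.25) }
        };
        System.out.println(classify(new int[] { 0, 0, 0, 1 }, 4, logProb));
        System.out.println(classify(new int[] { 1, 2, 3, 3 }, 4, logProb));
    }
}
```

The foreground model is placed at index 0 and the background model at index 1, following the recommendation in the class description.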

classify

public byte[] classify(Sample s)
                throws Exception
This method classifies all sequences of a sample and returns an array of the class indices to which the respective sequences are assigned, where each index i in the array satisfies 0 <= i < getNumberOfClasses().

Parameters:
s - the sample to be classified
Returns:
an array of class assignments
Throws:
Exception - if something went wrong during the classification

clone

public AbstractClassifier clone()
                         throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

evaluate

public final NumericalResultSet evaluate(MeasureParameters params,
                                         boolean exceptionIfNotComputeable,
                                         Sample... s)
                                  throws Exception
This method evaluates the classifier and computes all numerical results, for instance the sensitivity for a given specificity or the area under the ROC curve. This method should be used in any kind of classifier assessment, such as cross-validation or hold-out sampling.

For two classes it is highly recommended to use the foreground as the first class and the background as the second class, i.e. the first sample should be the foreground sample and the second should be the background sample. See also the class description.

Parameters:
params - the current parameters
exceptionIfNotComputeable - indicates whether the method throws an Exception if a measure could not be computed
s - the array of Samples
Returns:
a set of numerical results
Throws:
Exception - if something went wrong
See Also:
NumericalResultSet.NumericalResultSet(LinkedList), getResults(Sample[], MeasureParameters, boolean, boolean), ClassifierAssessment.assess(MeasureParameters, de.jstacs.classifier.assessment.ClassifierAssessmentAssessParameterSet, de.jstacs.utils.ProgressUpdater, Sample...), ClassifierAssessment.assess(MeasureParameters, de.jstacs.classifier.assessment.ClassifierAssessmentAssessParameterSet, Sample...), ClassifierAssessment.assess(MeasureParameters, de.jstacs.classifier.assessment.ClassifierAssessmentAssessParameterSet, de.jstacs.utils.ProgressUpdater, Sample[][][])

evaluateAll

public final ResultSet evaluateAll(MeasureParameters params,
                                   boolean exceptionIfNotComputeable,
                                   Sample... s)
                            throws Exception
This method evaluates the classifier and computes all results.

Parameters:
params - the current parameters
exceptionIfNotComputeable - indicates whether the method throws an Exception if a measure could not be computed
s - the array of Samples
Returns:
a set of results
Throws:
Exception - if something went wrong
See Also:
ResultSet.ResultSet(java.util.Collection), getResults(Sample[], MeasureParameters, boolean, boolean)

getResults

protected LinkedList<? extends Result> getResults(Sample[] s,
                                                  MeasureParameters params,
                                                  boolean exceptionIfNotComputeable,
                                                  boolean all)
                                           throws Exception
This method computes the results for any evaluation of the classifier.

Parameters:
s - the array of Samples
params - the current parameters
exceptionIfNotComputeable - indicates whether the method throws an Exception if a measure could not be computed
all - indicates whether the method computes all results or only the numerical results
Returns:
a list of results
Throws:
Exception - if something went wrong
See Also:
evaluate(MeasureParameters, boolean, Sample...), evaluateAll(MeasureParameters, boolean, Sample...)

getClassificationRate

protected final NumericalResult getClassificationRate(Sample[] s)
                                               throws Exception
This method computes the classification rate for a given array of samples.

Parameters:
s - the array of Samples; Sample 0 contains only elements of class 0; Sample 1 ...
Returns:
the classification rate for the samples
Throws:
Exception - if something went wrong during the classification
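
The sample layout described above (Sample 0 contains only elements of class 0, and so on) can be illustrated with a minimal sketch. It assumes that the classification rate is the fraction of correctly classified sequences; the class ClassificationRateSketch, the int-array sequences and the ToIntFunction classifier are hypothetical stand-ins and not part of Jstacs.

```java
import java.util.function.ToIntFunction;

// Hypothetical stand-in; not the Jstacs implementation.
class ClassificationRateSketch {

    /**
     * Computes the fraction of correctly classified sequences.
     *
     * @param samples    samples[c] holds only sequences of class c
     * @param classifier maps a sequence to a predicted class index
     */
    static double classificationRate(int[][][] samples, ToIntFunction<int[]> classifier) {
        long correct = 0;
        long total = 0;
        for (int c = 0; c < samples.length; c++) {
            for (int[] seq : samples[c]) {
                // The position of the sample in the array is the true class.
                if (classifier.applyAsInt(seq) == c) {
                    correct++;
                }
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no sequences given");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        int[][] foreground = { { 0 }, { 0 }, { 1 } }; // class 0
        int[][] background = { { 1 } };               // class 1
        // Toy classifier: class 0 if the first symbol is 0, class 1 otherwise.
        ToIntFunction<int[]> toy = seq -> seq[0] == 0 ? 0 : 1;
        System.out.println(classificationRate(new int[][][] { foreground, background }, toy));
    }
}
```

Swapping the two samples in the array would change the result, which is why the sample order has to match the class order of the classifier.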

getAlphabetContainer

public final AlphabetContainer getAlphabetContainer()
This method returns the container of alphabets that is used in the classifier.

Returns:
the used container of alphabets

getCharacteristics

public ResultSet getCharacteristics()
                             throws Exception
Returns some information characterizing or describing the current instance of the classifier. This could be for instance the number of edges for a Bayesian network or an image showing some representation of the model of a class. The set of characteristics should always include the XML representation of the classifier. The corresponding result type is StorableResult.

Returns:
the characteristics of the current instance of the classifier
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult, getNumericalCharacteristics(), ResultSet.ResultSet(de.jstacs.results.Result[][])

getInstanceName

public abstract String getInstanceName()
Returns a short description of the classifier.

Returns:
a short description of the classifier

getClassifierAnnotation

public abstract CategoricalResult[] getClassifierAnnotation()
Returns an array of Results of dimension getNumberOfClasses() that contains information about the classifier and about each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...

Returns:
an array of Results that contains information about the classifier

getLength

public final int getLength()
Returns the length of the sequences this classifier can handle or 0 for sequences of arbitrary length.

Returns:
the length of the sequences the classifier can handle

getNumericalCharacteristics

public abstract NumericalResultSet getNumericalCharacteristics()
                                                        throws Exception
Returns the subset of numerical values that are also returned by getCharacteristics().

Returns:
the numerical characteristics
Throws:
Exception - if some of the characteristics could not be defined

getNumberOfClasses

public abstract int getNumberOfClasses()
Returns the number of classes that can be distinguished. For example, when distinguishing between foreground and background this method should return 2, even if a mixture model is used for either foreground or background.

Returns:
the number of classes that can be distinguished

isTrained

public abstract boolean isTrained()
This method gives information about the state of the classifier.

Returns:
true if the classifier is trained and therefore able to classify sequences, otherwise false

setNewAlphabetContainerInstance

public boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
This method tries to set a new instance of an AlphabetContainer for the current classifier.
This instance has to be consistent with the underlying instance of an AlphabetContainer.

This method can be very useful to save time.

Parameters:
abc - the alphabets
Returns:
true if the new AlphabetContainer could be set, false otherwise
See Also:
getAlphabetContainer(), AlphabetContainer.checkConsistency(AlphabetContainer)

test

public ConfusionMatrix test(Sample... testData)
                     throws Exception,
                            ClassDimensionException
This method computes the confusion matrix for a given array of test data.

Parameters:
testData - the given array of test data
Returns:
the confusion matrix
Throws:
ClassDimensionException - if the number of Samples is incorrect
Exception - if something else went wrong
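
A minimal sketch of how such a confusion matrix could be filled, assuming one test sample per class and a layout of matrix[true class][predicted class]. The class ConfusionMatrixSketch and its plain int[][] result are hypothetical stand-ins; the actual layout of the Jstacs ConfusionMatrix may differ, and IllegalArgumentException is used here in place of ClassDimensionException.

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;

// Hypothetical stand-in; not the Jstacs implementation.
class ConfusionMatrixSketch {

    /**
     * Counts, for each true class, how often each class was predicted.
     *
     * @param numberOfClasses number of classes the classifier distinguishes
     * @param classifier      maps a sequence to a predicted class index
     * @param testData        testData[c] holds only sequences of class c
     */
    static int[][] confusionMatrix(int numberOfClasses,
                                   ToIntFunction<int[]> classifier,
                                   int[][]... testData) {
        if (testData.length != numberOfClasses) {
            throw new IllegalArgumentException(
                "expected " + numberOfClasses + " samples, got " + testData.length);
        }
        int[][] matrix = new int[numberOfClasses][numberOfClasses];
        for (int trueClass = 0; trueClass < numberOfClasses; trueClass++) {
            for (int[] seq : testData[trueClass]) {
                matrix[trueClass][classifier.applyAsInt(seq)]++;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        int[][] foreground = { { 0 }, { 0 }, { 1 } }; // class 0
        int[][] background = { { 1 } };               // class 1
        ToIntFunction<int[]> toy = seq -> seq[0] == 0 ? 0 : 1;
        System.out.println(Arrays.deepToString(confusionMatrix(2, toy, foreground, background)));
    }
}
```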

train

public void train(Sample... s)
           throws Exception
Trains the AbstractClassifier object given the data as Samples.
This method should work non-incrementally. That is, after the calls train(data1); train(data2); the result should be a model trained on data2 only, not on data1 and data2 together.

This method should check that the Samples are defined over the underlying alphabet and length.

Parameters:
s - the data
  • either an array of Samples: train( new Sample[]{s1,s2,s3}) or
  • an enumeration of Samples: train(s1,s2,s3)
Throws:
Exception - if the training did not succeed
See Also:
train(Sample[], double[][])
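
The non-incremental behaviour and the two call forms can be sketched with a toy model. This is only an illustration under stated assumptions: the class NonIncrementalTrainingSketch is hypothetical, each "sample" is a plain double array, and the "model" is just the mean of each class.

```java
import java.util.Arrays;

// Hypothetical stand-in; not part of Jstacs.
class NonIncrementalTrainingSketch {

    private double[] classMeans; // null until train(...) has been called

    /** Trains from scratch; samples[c] holds the values of class c. */
    void train(double[]... samples) {
        double[] means = new double[samples.length];
        for (int c = 0; c < samples.length; c++) {
            double sum = 0;
            for (double value : samples[c]) {
                sum += value;
            }
            means[c] = sum / samples[c].length;
        }
        // Replace the old parameters instead of merging with them.
        classMeans = means;
    }

    boolean isTrained() {
        return classMeans != null;
    }

    double[] getClassMeans() {
        return classMeans.clone();
    }

    public static void main(String[] args) {
        NonIncrementalTrainingSketch model = new NonIncrementalTrainingSketch();
        double[] fg1 = { 1.0 }, bg1 = { 100.0 };
        double[] fg2 = { 2.0, 4.0 }, bg2 = { 10.0 };
        model.train(fg1, bg1);                    // enumeration form
        model.train(new double[][] { fg2, bg2 }); // array form, same method
        // Only the second data set is reflected in the model.
        System.out.println(Arrays.toString(model.getClassMeans()));
    }
}
```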

train

public abstract void train(Sample[] s,
                           double[][] weights)
                    throws Exception
This method trains a classifier over an array of weighted Samples. Like the method train(Sample...), this method should work non-incrementally.

This method should check that the Samples are defined over the underlying alphabet and length.

Parameters:
s - an array of Samples
weights - the weights for the Samples
Throws:
Exception - if the weights are incorrect or the training did not succeed
See Also:
train(Sample...)
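
A minimal sketch of weighted training with a consistency check on the weights. It assumes that weights[i][j] is the non-negative weight of element j in sample i; this layout, the class WeightedTrainingSketch and the toy "model" (a weighted mean per class) are assumptions for illustration and not taken from Jstacs.

```java
import java.util.Arrays;

// Hypothetical stand-in; not part of Jstacs.
class WeightedTrainingSketch {

    /**
     * Computes one weighted mean per class.
     *
     * @param samples samples[i] holds the values of class i
     * @param weights weights[i][j] is the weight of samples[i][j] (assumed layout)
     */
    static double[] weightedMeans(double[][] samples, double[][] weights) {
        if (weights.length != samples.length) {
            throw new IllegalArgumentException("one weight array per sample is required");
        }
        double[] means = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            if (weights[i].length != samples[i].length) {
                throw new IllegalArgumentException("wrong number of weights for sample " + i);
            }
            double sum = 0;
            double weightSum = 0;
            for (int j = 0; j < samples[i].length; j++) {
                if (weights[i][j] < 0) {
                    throw new IllegalArgumentException("negative weight in sample " + i);
                }
                sum += weights[i][j] * samples[i][j];
                weightSum += weights[i][j];
            }
            if (weightSum == 0) {
                throw new IllegalArgumentException("sample " + i + " has total weight 0");
            }
            means[i] = sum / weightSum;
        }
        return means;
    }

    public static void main(String[] args) {
        double[][] samples = { { 1.0, 3.0 }, { 5.0 } };
        double[][] weights = { { 1.0, 3.0 }, { 2.0 } };
        System.out.println(Arrays.toString(weightedMeans(samples, weights)));
    }
}
```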

getXMLTag

protected abstract String getXMLTag()
Returns the String that is used as tag for the XML representation of the classifier. This method is used by the methods fromXML(StringBuffer) and toXML().

Returns:
the String that is used as tag for the XML representation of the classifier

extractFurtherClassifierInfosFromXML

protected abstract void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                                      throws NonParsableException
Extracts further information of a classifier from an XML representation. This method is used by the method fromXML(StringBuffer) and should not be made public.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
fromXML(StringBuffer)

toXML

public final StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML representation

getFurtherClassifierInfos

protected abstract StringBuffer getFurtherClassifierInfos()
This method returns further information of a classifier as a StringBuffer. This method is used by the method toXML() and should not be made public.

Returns:
further information of a classifier as a StringBuffer
See Also:
toXML()
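
The interplay of the final method toXML() with the protected hooks getXMLTag() and getFurtherClassifierInfos() follows the template method pattern. The sketch below shows one way such a base class could be organized; the class names, the hook name getFurtherInfos() and the XML layout are assumptions for illustration and do not reproduce the actual Jstacs XML format.

```java
// Hypothetical stand-in for the base class; the XML layout is an assumption.
abstract class StorableTemplateSketch {

    /** Tag that encloses the XML representation of the subclass. */
    protected abstract String getXMLTag();

    /** Subclass-specific content; stays protected, only toXML() calls it. */
    protected abstract StringBuffer getFurtherInfos();

    /** Final, so that every subclass is stored in the same overall layout. */
    public final StringBuffer toXML() {
        String tag = getXMLTag();
        StringBuffer xml = new StringBuffer();
        xml.append('<').append(tag).append('>');
        xml.append(getFurtherInfos());
        xml.append("</").append(tag).append('>');
        return xml;
    }
}

// Hypothetical subclass that only contributes its own parameter.
class ThresholdClassifierSketch extends StorableTemplateSketch {

    private final double threshold;

    ThresholdClassifierSketch(double threshold) {
        this.threshold = threshold;
    }

    @Override
    protected String getXMLTag() {
        return "ThresholdClassifier";
    }

    @Override
    protected StringBuffer getFurtherInfos() {
        return new StringBuffer("<threshold>").append(threshold).append("</threshold>");
    }

    public static void main(String[] args) {
        System.out.println(new ThresholdClassifierSketch(0.5).toXML());
    }
}
```

Keeping the hooks protected means callers can only obtain the complete representation through toXML(), never a partial one.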