de.jstacs.classifiers
Class AbstractClassifier

java.lang.Object
  extended by de.jstacs.classifiers.AbstractClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:
AbstractScoreBasedClassifier

public abstract class AbstractClassifier
extends Object
implements Storable, Cloneable

The super class for any classifier.

The order of the classes is never changed inside the classifier. The data sets you pass to methods like train, test and evaluate should always be in the same order that you used when instantiating the object.

For two classes it is highly recommended to use the foreground as the first class and the background as the second class.

Author:
Jens Keilwagen, Jan Grau

Constructor Summary
AbstractClassifier(AlphabetContainer abc)
          The constructor for a homogeneous classifier.
AbstractClassifier(AlphabetContainer abc, int length)
          The constructor for an inhomogeneous classifier.
AbstractClassifier(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
 byte[] classify(DataSet s)
          This method classifies all sequences of a data set and returns an array of class indices, one per sequence, where each index i satisfies 0 <= i < getNumberOfClasses().
abstract  byte classify(Sequence seq)
          This method classifies a sequence and returns the index i of the class to which the sequence is assigned, with 0 <= i < getNumberOfClasses().
 AbstractClassifier clone()
           
 ResultSet evaluate(PerformanceMeasureParameterSet params, boolean exceptionIfNotComputeable, DataSet... s)
          This method evaluates the classifier and computes, for instance, the sensitivity for a given specificity, the area under the ROC curve and so on.
protected abstract  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an XML representation.
 AlphabetContainer getAlphabetContainer()
          This method returns the container of alphabets that is used in the classifier.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance of the classifier.
abstract  CategoricalResult[] getClassifierAnnotation()
          Returns an array of Results of dimension getNumberOfClasses() that contains information about the classifier and about each class.
protected abstract  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
abstract  String getInstanceName()
          Returns a short description of the classifier.
 int getLength()
          Returns the length of the sequences this classifier can handle or 0 for sequences of arbitrary length.
protected  double[][][] getMultiClassScores(DataSet[] s)
          This method returns a multidimensional array with class specific scores.
abstract  int getNumberOfClasses()
          Returns the number of classes that can be distinguished.
abstract  NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by getCharacteristics().
protected  boolean getResults(LinkedList list, DataSet[] s, PerformanceMeasureParameterSet params, boolean exceptionIfNotComputeable)
          This method computes the results for any evaluation of the classifier.
protected abstract  String getXMLTag()
          Returns the String that is used as tag for the XML representation of the classifier.
abstract  boolean isInitialized()
          This method gives information about the state of the classifier.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 void train(DataSet... s)
          Trains the AbstractClassifier object given the data as DataSets.
abstract  void train(DataSet[] s, double[][] weights)
          This method trains a classifier over an array of weighted DataSets.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractClassifier

public AbstractClassifier(AlphabetContainer abc)
The constructor for a homogeneous classifier. Such a classifier can handle sequences of arbitrary length.

Parameters:
abc - the alphabets that are used
See Also:
AbstractClassifier(AlphabetContainer, int)

AbstractClassifier

public AbstractClassifier(AlphabetContainer abc,
                          int length)
                   throws IllegalArgumentException
The constructor for an inhomogeneous classifier. Such a classifier can handle sequences of fixed length.

Parameters:
abc - the alphabets that are used
length - the length of the sequences that can be classified
Throws:
IllegalArgumentException - if the length and the possible length of the AlphabetContainer do not match

AbstractClassifier

public AbstractClassifier(StringBuffer xml)
                   throws NonParsableException
The standard constructor for the interface Storable. Creates a new AbstractClassifier out of its XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the AbstractClassifier could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable
Method Detail

classify

public abstract byte classify(Sequence seq)
                       throws Exception
This method classifies a sequence and returns the index i of the class to which the sequence is assigned, with 0 <= i < getNumberOfClasses().

This method should check that the sequence is defined over the underlying alphabet and length.

Parameters:
seq - the sequence to be classified
Returns:
the index of the class to which the sequence is assigned
Throws:
Exception - if the classifier is not trained or something is wrong with the sequence

classify

public byte[] classify(DataSet s)
                throws Exception
This method classifies all sequences of a data set and returns an array of class indices, one per sequence, where each index i satisfies 0 <= i < getNumberOfClasses().

Parameters:
s - the data set to be classified
Returns:
an array of class assignments
Throws:
Exception - if something went wrong during the classification
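
As a rough illustration of the contract described above, the following self-contained sketch classifies every sequence of a data set by delegating to the per-sequence method. The types ToyClassifier and TataClassifier, and the use of String for sequences and List<String> for data sets, are stand-ins invented for this example; they are not Jstacs classes, and the actual implementation may differ.

```java
import java.util.List;

// Hypothetical stand-in for a classifier; sequences are plain Strings here.
abstract class ToyClassifier {
    abstract int getNumberOfClasses();

    // Returns a class index i with 0 <= i < getNumberOfClasses().
    abstract byte classify(String seq);

    // Classifies each sequence in turn and collects the class indices.
    byte[] classify(List<String> dataSet) {
        byte[] result = new byte[dataSet.size()];
        for (int n = 0; n < result.length; n++) {
            result[n] = classify(dataSet.get(n));
        }
        return result;
    }
}

// Toy rule: class 0 (foreground) if the sequence contains "TATA", else class 1.
final class TataClassifier extends ToyClassifier {
    @Override
    int getNumberOfClasses() {
        return 2;
    }

    @Override
    byte classify(String seq) {
        return (byte) (seq.contains("TATA") ? 0 : 1);
    }
}
```

The returned array has one entry per sequence, in the order of the sequences in the data set.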

clone

public AbstractClassifier clone()
                         throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

evaluate

public final ResultSet evaluate(PerformanceMeasureParameterSet params,
                                boolean exceptionIfNotComputeable,
                                DataSet... s)
                         throws Exception
This method evaluates the classifier and computes, for instance, the sensitivity for a given specificity, the area under the ROC curve and so on. This method should be used in any kind of ClassifierAssessment, for instance cross-validation or hold-out sampling.

For two classes it is highly recommended to use the foreground as the first class and the background as the second class, i.e. the first data set should be the foreground data set and the second should be the background data set. See also the class description above.

Parameters:
params - the current parameters defining the set of AbstractPerformanceMeasures to be evaluated
exceptionIfNotComputeable - indicates that the method throws an Exception if a measure could not be computed
s - the array of DataSets
Returns:
a set of results; if all results are scalars, the return type is NumericalResultSet, otherwise ResultSet
Throws:
Exception - if something went wrong
See Also:
NumericalResultSet, ResultSet, getResults(LinkedList, DataSet[], PerformanceMeasureParameterSet, boolean), ClassifierAssessment, ClassifierAssessment.assess(de.jstacs.classifiers.performanceMeasures.NumericalPerformanceMeasureParameterSet, de.jstacs.classifiers.assessment.ClassifierAssessmentAssessParameterSet, DataSet...)
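
To make the recommended class order concrete, the sketch below computes two simple measures from class predictions, assuming class 0 is the foreground and class 1 the background. TwoClassMeasures is a hypothetical helper written for this example, not part of Jstacs, and it does not reproduce how evaluate computes its measures internally.

```java
// Hypothetical helper, not part of Jstacs: computes two simple measures for a
// two-class problem where class 0 is the foreground and class 1 the background.
final class TwoClassMeasures {
    // Fraction of entries in predicted that equal expectedClass (NaN if empty).
    static double fractionAssignedTo(byte[] predicted, int expectedClass) {
        int hits = 0;
        for (byte p : predicted) {
            if (p == expectedClass) {
                hits++;
            }
        }
        return predicted.length == 0 ? Double.NaN : (double) hits / predicted.length;
    }

    // Sensitivity: fraction of foreground sequences assigned to class 0.
    static double sensitivity(byte[] foregroundPredictions) {
        return fractionAssignedTo(foregroundPredictions, 0);
    }

    // Specificity: fraction of background sequences assigned to class 1.
    static double specificity(byte[] backgroundPredictions) {
        return fractionAssignedTo(backgroundPredictions, 1);
    }
}
```

If the two data sets were passed in the opposite order, the two values would swap roles, which is why a fixed foreground-first convention is helpful.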

getResults

protected boolean getResults(LinkedList list,
                             DataSet[] s,
                             PerformanceMeasureParameterSet params,
                             boolean exceptionIfNotComputeable)
                      throws Exception
This method computes the results for any evaluation of the classifier.

Parameters:
list - a list to which the results are added
s - the array of DataSets
params - the current parameters
exceptionIfNotComputeable - indicates the method throws an Exception if a measure could not be computed
Returns:
a boolean indicating if all results are numerical
Throws:
Exception - if something went wrong
See Also:
evaluate(PerformanceMeasureParameterSet, boolean, DataSet...), NumericalResult, Result

getMultiClassScores

protected double[][][] getMultiClassScores(DataSet[] s)
                                    throws Exception
This method returns a multidimensional array with class specific scores. The first dimension is for the data set, the second for the sequences, and the third for the classes. The entry result[d][n][c] returns the score of class c for sequence n of the data set s[d]. The class with the maximum score for any sequence is the predicted class of the sequence.

Parameters:
s - the data sets
Returns:
a multidimensional array with class specific scores
Throws:
Exception - if the scores can not be computed
See Also:
getResults(LinkedList, DataSet[], PerformanceMeasureParameterSet, boolean)
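
The layout result[d][n][c] and the maximum-score rule can be sketched as follows. ScoreArgMax is a hypothetical helper for illustration only and not part of Jstacs; breaking ties in favor of the lower class index is an assumption of this sketch, since the description above does not specify tie handling.

```java
// Hypothetical helper, not part of Jstacs: derives predicted classes from a
// score array laid out as scores[d][n][c] (data set, sequence, class).
final class ScoreArgMax {
    // Returns predictions[d][n], the index of the class with the maximum score.
    // Ties go to the lower class index (an assumption of this sketch).
    static byte[][] predict(double[][][] scores) {
        byte[][] predictions = new byte[scores.length][];
        for (int d = 0; d < scores.length; d++) {
            predictions[d] = new byte[scores[d].length];
            for (int n = 0; n < scores[d].length; n++) {
                int best = 0;
                for (int c = 1; c < scores[d][n].length; c++) {
                    if (scores[d][n][c] > scores[d][n][best]) {
                        best = c;
                    }
                }
                predictions[d][n] = (byte) best;
            }
        }
        return predictions;
    }
}
```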

getAlphabetContainer

public final AlphabetContainer getAlphabetContainer()
This method returns the container of alphabets that is used in the classifier.

Returns:
the used container of alphabets

getCharacteristics

public ResultSet getCharacteristics()
                             throws Exception
Returns some information characterizing or describing the current instance of the classifier. This could be for instance the number of edges for a Bayesian network or an image showing some representation of the model of a class. The set of characteristics should always include the XML representation of the classifier. The corresponding result type is StorableResult.

Returns:
the characteristics of the current instance of the classifier
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult, getNumericalCharacteristics(), ResultSet.ResultSet(de.jstacs.results.Result[][])

getInstanceName

public abstract String getInstanceName()
Returns a short description of the classifier.

Returns:
a short description of the classifier

getClassifierAnnotation

public abstract CategoricalResult[] getClassifierAnnotation()
Returns an array of Results of dimension getNumberOfClasses() that contains information about the classifier and about each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...

Returns:
an array of Results that contains information about the classifier

getLength

public final int getLength()
Returns the length of the sequences this classifier can handle or 0 for sequences of arbitrary length.

Returns:
the length of the sequences the classifier can handle

getNumericalCharacteristics

public abstract NumericalResultSet getNumericalCharacteristics()
                                                        throws Exception
Returns the subset of numerical values that are also returned by getCharacteristics().

Returns:
the numerical characteristics
Throws:
Exception - if some of the characteristics could not be defined

getNumberOfClasses

public abstract int getNumberOfClasses()
Returns the number of classes that can be distinguished. For example if distinguishing between foreground and background this method should return 2, even if you use a mixture model for either foreground or background.

Returns:
the number of classes that can be distinguished

isInitialized

public abstract boolean isInitialized()
This method gives information about the state of the classifier.

Returns:
true if the classifier is initialized and therefore able to classify sequences, otherwise false

train

public void train(DataSet... s)
           throws Exception
Trains the AbstractClassifier object given the data as DataSets.
This method should work non-incrementally. That is, the result of the call sequence train(data1); train(data2); should be a model fully trained on data2 alone, not on data1 and data2 together.

This method should check that the DataSets are defined over the underlying alphabet and length.

Parameters:
s - the data
  • either an array of DataSets: train( new DataSet[]{s1,s2,s3}) or
  • an enumeration of DataSets: train(s1,s2,s3)
Throws:
Exception - if the training did not succeed
See Also:
train(DataSet[], double[][])
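
The non-incremental requirement and the two call styles can be illustrated with a small stand-in. NonIncrementalTrainer is a hypothetical class invented for this sketch (it is not a Jstacs class), and String[] plays the role of a data set; it only records how many sequences were seen per class.

```java
// Hypothetical stand-in, not a Jstacs class: remembers how many training
// sequences each class received in the most recent call to train.
final class NonIncrementalTrainer {
    private int[] sequencesPerClass = new int[0];

    // Varargs allow both train(a, b) and train(new String[][] {a, b}).
    void train(String[]... dataSets) {
        // Start from scratch so that earlier calls leave no trace.
        int[] fresh = new int[dataSets.length];
        for (int c = 0; c < dataSets.length; c++) {
            fresh[c] = dataSets[c].length;
        }
        sequencesPerClass = fresh;
    }

    int getNumberOfTrainingSequences(int classIndex) {
        return sequencesPerClass[classIndex];
    }
}
```

After calling train twice on such an object, only the data of the second call is reflected in its state.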

train

public abstract void train(DataSet[] s,
                           double[][] weights)
                    throws Exception
This method trains a classifier over an array of weighted DataSets. Like train(DataSet...), this method should work non-incrementally.

This method should check that the DataSets are defined over the underlying alphabet and length.

Parameters:
s - an array of DataSets
weights - the weights for the DataSets
Throws:
Exception - if the weights are incorrect or the training did not succeed
See Also:
train(DataSet...)
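
One possible relationship between the two train variants is sketched below. WeightedTrainer is a hypothetical stand-in, not a Jstacs class. The sketch assumes that weights[c][n] is the weight of sequence n in data set c and that a null row means weight 1 for every sequence of that data set; neither assumption is confirmed by the description above.

```java
// Hypothetical stand-in, not a Jstacs class: accumulates the total weight of
// the training sequences per class.
final class WeightedTrainer {
    private double[] totalWeightPerClass = new double[0];

    // Unweighted variant: delegates with no weight row for any data set.
    void train(String[]... dataSets) {
        train(dataSets, new double[dataSets.length][]);
    }

    // Assumption: weights[c][n] is the weight of sequence n in dataSets[c];
    // a null row is interpreted as weight 1 for every sequence.
    void train(String[][] dataSets, double[][] weights) {
        if (weights.length != dataSets.length) {
            throw new IllegalArgumentException("need one weight row per data set");
        }
        // Non-incremental: build the new state from scratch.
        double[] fresh = new double[dataSets.length];
        for (int c = 0; c < dataSets.length; c++) {
            if (weights[c] == null) {
                fresh[c] = dataSets[c].length;
            } else if (weights[c].length != dataSets[c].length) {
                throw new IllegalArgumentException("need one weight per sequence");
            } else {
                for (double w : weights[c]) {
                    fresh[c] += w;
                }
            }
        }
        totalWeightPerClass = fresh;
    }

    double getTotalWeight(int classIndex) {
        return totalWeightPerClass[classIndex];
    }
}
```

Rejecting mismatched weight arrays with an exception mirrors the documented behavior that training fails if the weights are incorrect.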

getXMLTag

protected abstract String getXMLTag()
Returns the String that is used as tag for the XML representation of the classifier. This method is used by the methods fromXML(StringBuffer) and toXML().

Returns:
the String that is used as tag for the XML representation of the classifier

extractFurtherClassifierInfosFromXML

protected abstract void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                                      throws NonParsableException
Extracts further information of a classifier from an XML representation. This method is used by the method fromXML(StringBuffer) and should not be made public.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
fromXML(StringBuffer)

toXML

public final StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML representation

getFurtherClassifierInfos

protected abstract StringBuffer getFurtherClassifierInfos()
This method returns further information of a classifier as a StringBuffer. This method is used by the method toXML() and should not be made public.

Returns:
further information of a classifier as a StringBuffer
See Also:
toXML()
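
The interplay of toXML(), getXMLTag() and getFurtherClassifierInfos() resembles a template method: a final public method builds the outer frame and calls protected hooks for the subclass-specific parts. The sketch below shows that pattern only. ToyStorableClassifier, ToyThresholdClassifier and the XML layout are invented for illustration and do not reflect the actual Jstacs XML format.

```java
// Hypothetical stand-in, not a Jstacs class: the final toXML() wraps the
// subclass-specific content in the tag supplied by the subclass.
abstract class ToyStorableClassifier {
    protected abstract String getXMLTag();

    protected abstract StringBuffer getFurtherClassifierInfos();

    public final StringBuffer toXML() {
        String tag = getXMLTag();
        StringBuffer xml = new StringBuffer();
        xml.append('<').append(tag).append('>');
        xml.append(getFurtherClassifierInfos());
        xml.append("</").append(tag).append('>');
        return xml;
    }
}

// Toy subclass that stores a single threshold value.
final class ToyThresholdClassifier extends ToyStorableClassifier {
    private final double threshold;

    ToyThresholdClassifier(double threshold) {
        this.threshold = threshold;
    }

    @Override
    protected String getXMLTag() {
        return "ToyThresholdClassifier";
    }

    @Override
    protected StringBuffer getFurtherClassifierInfos() {
        return new StringBuffer("<threshold>").append(threshold).append("</threshold>");
    }
}
```

Keeping the hooks protected, as recommended above, means callers can only obtain the complete representation through toXML().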