de.jstacs.classifiers
Class AbstractScoreBasedClassifier

java.lang.Object
  extended by de.jstacs.classifiers.AbstractClassifier
      extended by de.jstacs.classifiers.AbstractScoreBasedClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:
MappingClassifier, SamplingScoreBasedClassifier, ScoreClassifier, TrainSMBasedClassifier

public abstract class AbstractScoreBasedClassifier
extends AbstractClassifier

This class is the base class for all score-based classifiers. Score-based classifiers make it easy to compute many different performance measures. For instance, the R package "ROCR" can be used to compute or plot many of them.

Author:
Jens Keilwagen, Jan Grau, Andre Gohr

Nested Class Summary
static class AbstractScoreBasedClassifier.DoubleTableResult
          This class is for Results given as a table of doubles.
 
Constructor Summary
AbstractScoreBasedClassifier(AlphabetContainer abc, int classes)
          The constructor for a homogeneous classifier.
AbstractScoreBasedClassifier(AlphabetContainer abc, int classes, double classWeight)
          The constructor for a homogeneous classifier.
AbstractScoreBasedClassifier(AlphabetContainer abc, int length, int classes)
          The constructor for an inhomogeneous classifier.
AbstractScoreBasedClassifier(AlphabetContainer abc, int length, int classes, double classWeight)
          The constructor for an inhomogeneous classifier.
AbstractScoreBasedClassifier(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
protected  void check(DataSet s)
          This method checks if the given DataSet can be used.
protected  void check(Sequence seq)
          This method checks if the given Sequence can be used.
 byte classify(Sequence seq)
          This method classifies a sequence and returns the index i of the class to which the sequence is assigned with 0 <= i < getNumberOfClasses().
protected  byte classify(Sequence seq, boolean check)
          This method classifies a Sequence.
 AbstractScoreBasedClassifier clone()
           
protected  void createDefaultClassWeights(int classes, double val)
          This method creates new class weights.
protected  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an XML representation.
protected  double getClassWeight(int index)
          Returns the class weight for the class with a given index.
 double[] getClassWeights()
          Returns the specific class weights of an AbstractScoreBasedClassifier.
protected  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
protected  double[][][] getMultiClassScores(DataSet[] s)
          This method returns a multidimensional array with class specific scores.
 int getNumberOfClasses()
          Returns the number of classes that can be distinguished.
 double[] getPValue(DataSet candidates, DataSet bg)
          Returns the p-values for all Sequences in the DataSet candidates with respect to a given background DataSet.
 double getPValue(Sequence candidate, DataSet bg)
          Returns the p-value for a Sequence candidate with respect to a given background DataSet.
protected  boolean getResults(LinkedList list, DataSet[] s, double[][] weights, AbstractPerformanceMeasureParameterSet<? extends PerformanceMeasure> params, boolean exceptionIfNotComputeable)
          This method computes the results for any evaluation of the classifier.
 double getScore(Sequence seq, int i)
          This method returns the score for a given Sequence and a given class.
protected abstract  double getScore(Sequence seq, int i, boolean check)
          This method returns the score for a given Sequence and a given class.
 double[] getScores(DataSet s)
          This method returns the scores of the classifier for each Sequence in the DataSet.
 void setClassWeights(boolean add, double... weights)
          Sets new class weights.
protected  void setClassWeights(boolean add, double[] weights, int start)
          Sets new class weights.
 void setThresholdClassWeights(boolean add, double t)
          Sets a new threshold for 2-class-classifiers.
Only available if this AbstractScoreBasedClassifier distinguishes between 2 classes 0 and 1.
 
Methods inherited from class de.jstacs.classifiers.AbstractClassifier
classify, evaluate, evaluate, getAlphabetContainer, getCharacteristics, getClassifierAnnotation, getInstanceName, getLength, getNumericalCharacteristics, getXMLTag, isInitialized, toXML, train, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int classes)
The constructor for a homogeneous classifier. Such a classifier can handle sequences of arbitrary length. The class weights are set initially to 0.

Parameters:
abc - the alphabets that are used
classes - the number of different classes
See Also:
AbstractScoreBasedClassifier(AlphabetContainer, int, int, double)

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int classes,
                                    double classWeight)
The constructor for a homogeneous classifier. Such a classifier can handle sequences of arbitrary length. The class weights are set initially to classWeight.

Parameters:
abc - the alphabets that are used
classes - the number of different classes
classWeight - the value of all class weights
See Also:
AbstractScoreBasedClassifier(AlphabetContainer, int, int, double)

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int length,
                                    int classes)
The constructor for an inhomogeneous classifier. Such a classifier can handle sequences of fixed length. The class weights are set initially to 0.

Parameters:
abc - the alphabets that are used
length - the length of the sequences that can be classified
classes - the number of different classes
See Also:
AbstractScoreBasedClassifier(AlphabetContainer, int, int, double)

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int length,
                                    int classes,
                                    double classWeight)
                             throws IllegalArgumentException
The constructor for an inhomogeneous classifier. Such a classifier can handle sequences of fixed length. The class weights are set initially to classWeight.

Parameters:
abc - the alphabets that are used
length - the length of the sequences that can be classified
classes - the number of different classes
classWeight - the value of all class weights
Throws:
IllegalArgumentException - if the length and the possible length of the AlphabetContainer do not match or the number of classes is less than 2
See Also:
AbstractClassifier.AbstractClassifier(AlphabetContainer, int)

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(StringBuffer xml)
                             throws NonParsableException
The standard constructor for the interface Storable. Creates a new AbstractScoreBasedClassifier out of its XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the AbstractScoreBasedClassifier could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable, AbstractClassifier.AbstractClassifier(StringBuffer)
Method Detail

clone

public AbstractScoreBasedClassifier clone()
                                   throws CloneNotSupportedException
Overrides:
clone in class AbstractClassifier
Throws:
CloneNotSupportedException

classify

public byte classify(Sequence seq)
              throws Exception
Description copied from class: AbstractClassifier
This method classifies a sequence and returns the index i of the class to which the sequence is assigned with 0 <= i < getNumberOfClasses().

This method should check that the sequence is defined over the underlying alphabet and length.

Specified by:
classify in class AbstractClassifier
Parameters:
seq - the sequence to be classified
Returns:
the index of the class to which the sequence is assigned
Throws:
Exception - if the classifier is not trained or something is wrong with the sequence

getMultiClassScores

protected double[][][] getMultiClassScores(DataSet[] s)
                                    throws Exception
Description copied from class: AbstractClassifier
This method returns a multidimensional array with class specific scores. The first dimension is for the data set, the second for the sequences, and the third for the classes. The entry result[d][n][c] returns the score of class c for sequence n of the data set s[d]. The class with the maximum score for any sequence is the predicted class of the sequence.

Overrides:
getMultiClassScores in class AbstractClassifier
Parameters:
s - the data sets
Returns:
a multidimensional array with class specific scores
Throws:
Exception - if the scores can not be computed
See Also:
AbstractClassifier.getResults(LinkedList, DataSet[], double[][], AbstractPerformanceMeasureParameterSet, boolean)
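
The array layout described above can be illustrated with a small standalone sketch. It is not Jstacs code: the class MultiClassScoresSketch and its method predict are hypothetical, and resolving ties in favour of the lowest class index is an assumption made only for this example.

```java
// Hypothetical illustration of the result[d][n][c] layout; not part of Jstacs.
final class MultiClassScoresSketch {

    // For every data set d and sequence n, returns the index of the class
    // with the maximum score. Ties go to the lowest index (an assumption).
    static int[][] predict(double[][][] scores) {
        int[][] predicted = new int[scores.length][];
        for (int d = 0; d < scores.length; d++) {
            predicted[d] = new int[scores[d].length];
            for (int n = 0; n < scores[d].length; n++) {
                int best = 0;
                for (int c = 1; c < scores[d][n].length; c++) {
                    if (scores[d][n][c] > scores[d][n][best]) {
                        best = c;
                    }
                }
                predicted[d][n] = best;
            }
        }
        return predicted;
    }
}
```

Here predicted[d][n] is the predicted class of sequence n of the data set s[d].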

getResults

protected boolean getResults(LinkedList list,
                             DataSet[] s,
                             double[][] weights,
                             AbstractPerformanceMeasureParameterSet<? extends PerformanceMeasure> params,
                             boolean exceptionIfNotComputeable)
                      throws Exception
Description copied from class: AbstractClassifier
This method computes the results for any evaluation of the classifier.

Overrides:
getResults in class AbstractClassifier
Parameters:
list - a list to which the results are added
s - the array of DataSets
weights - the weights of the sequences for each data set
params - the current parameters
exceptionIfNotComputeable - indicates whether the method throws an Exception if a measure could not be computed
Returns:
a boolean indicating if all results are numerical
Throws:
Exception - if something went wrong
See Also:
AbstractClassifier.evaluate(AbstractPerformanceMeasureParameterSet, boolean, DataSet[], double[][]), NumericalResult, Result

getClassWeights

public double[] getClassWeights()
Returns the specific class weights of an AbstractScoreBasedClassifier.

Returns:
the class weights of the classifier

getNumberOfClasses

public int getNumberOfClasses()
Description copied from class: AbstractClassifier
Returns the number of classes that can be distinguished. For example if distinguishing between foreground and background this method should return 2, even if you use a mixture model for either foreground or background.

Specified by:
getNumberOfClasses in class AbstractClassifier
Returns:
the number of classes that can be distinguished

getScore

public double getScore(Sequence seq,
                       int i)
                throws Exception
This method returns the score for a given Sequence and a given class.

Parameters:
seq - the given Sequence
i - the index of the class
Returns:
the score for a given Sequence and a given class
Throws:
Exception - if something went wrong
See Also:
getScore(Sequence, int, boolean)

setClassWeights

public final void setClassWeights(boolean add,
                                  double... weights)
                           throws ClassDimensionException
Sets new class weights.
To classify an item i, the logarithmic probabilities of i given each of the classes 0 to n are computed. The class weights are added to these logarithmic probabilities. The higher the class weight of class j relative to the other class weights, the more likely it is that an item is classified into this class.
Class weights do not have to be logarithmic probabilities. If $\sum_{j=0}^n \exp(classWeight_j) \stackrel{!}{=}1$, the class weights may be interpreted as logarithmic a priori class probabilities.

Parameters:
add - indicates if the class weights are added to the current class weights
weights - the array of weights, for each class the weight that is added in a classification
Throws:
ClassDimensionException - if something is wrong with the number of classes
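
The effect of class weights described above can be sketched in a few lines of standalone Java. This is not Jstacs code: the class ClassWeightSketch and both of its methods are hypothetical names, and the per-class log scores are simply passed in as an array.

```java
// Hypothetical illustration of class weights; not part of Jstacs.
final class ClassWeightSketch {

    // Returns the index of the class with the largest (log score + class weight).
    static int classify(double[] logScores, double[] classWeights) {
        if (logScores.length != classWeights.length) {
            throw new IllegalArgumentException("one weight per class is required");
        }
        int best = 0;
        double bestValue = logScores[0] + classWeights[0];
        for (int j = 1; j < logScores.length; j++) {
            double value = logScores[j] + classWeights[j];
            if (value > bestValue) {
                bestValue = value;
                best = j;
            }
        }
        return best;
    }

    // Checks whether sum_j exp(classWeight_j) equals 1 up to a tolerance,
    // i.e. whether the weights can be read as logarithmic prior probabilities.
    static boolean areLogPriors(double[] classWeights, double tolerance) {
        double sum = 0.0;
        for (double w : classWeights) {
            sum += Math.exp(w);
        }
        return Math.abs(sum - 1.0) <= tolerance;
    }
}
```

With log scores {-10, -11}, weights {0, 0} select class 0, while weights {0, 2} select class 1, because -11 + 2 is larger than -10 + 0.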

setClassWeights

protected final void setClassWeights(boolean add,
                                     double[] weights,
                                     int start)
Sets new class weights.

Only for internal use.

Parameters:
add - indicates if the class weights are added to the current class weights
weights - an array of weights that might have more entries than the classifier has classes
start - the start index
See Also:
setClassWeights(boolean, double...)

setThresholdClassWeights

public final void setThresholdClassWeights(boolean add,
                                           double t)
                                    throws OperationNotSupportedException
Sets a new threshold for 2-class-classifiers.
Only available if this AbstractScoreBasedClassifier distinguishes between 2 classes 0 and 1. In this case, t will be interpreted as $\log\left(\frac{P(class1)}{P(class0)}\right)$. A large t (greater than 0) makes the classifier decide more often for class 1. A small t (smaller than 0) makes the classifier decide more often for class 0.

Parameters:
add - indicates if the class weights are added to the current class weights
t - the new threshold
Throws:
OperationNotSupportedException - if the classifier is not a 2-class classifier
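
The interpretation of t as a log ratio of class priors can be made concrete with a standalone sketch. It is not Jstacs code and does not claim to show how the library stores the threshold internally: the class ThresholdSketch, its methods, the normalisation of the two weights to log priors, and the rule that ties go to class 0 are all assumptions of this example.

```java
// Hypothetical illustration of a 2-class threshold; not part of Jstacs.
final class ThresholdSketch {

    // Converts t = log(P(class1) / P(class0)) into two normalised log priors
    // {log P(class0), log P(class1)}. Whether Jstacs normalises is not stated
    // in this documentation; this is only one possible choice.
    static double[] toLogPriors(double t) {
        double logNorm = Math.log1p(Math.exp(t)); // log(1 + e^t)
        return new double[] { -logNorm, t - logNorm };
    }

    // Decides for class 1 if the log score difference plus t is positive.
    // Ties are assigned to class 0 (an assumption of this sketch).
    static int classify(double logScore0, double logScore1, double t) {
        return (logScore1 - logScore0 + t > 0) ? 1 : 0;
    }
}
```

For log scores -5 (class 0) and -6 (class 1), t = 0 yields class 0, whereas t = 2 shifts the decision to class 1.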

getFurtherClassifierInfos

protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method AbstractClassifier.toXML() and should not be made public.

Specified by:
getFurtherClassifierInfos in class AbstractClassifier
Returns:
further information of a classifier as a StringBuffer
See Also:
AbstractClassifier.toXML()

check

protected void check(DataSet s)
              throws NotTrainedException,
                     IllegalArgumentException
This method checks if the given DataSet can be used.

Parameters:
s - the DataSet to be checked
Throws:
NotTrainedException - if the classifier is not trained
IllegalArgumentException - if something is wrong with the DataSet s
See Also:
AlphabetContainer.checkConsistency(AlphabetContainer)

check

protected void check(Sequence seq)
              throws NotTrainedException,
                     IllegalArgumentException
This method checks if the given Sequence can be used.

Parameters:
seq - the Sequence to be checked
Throws:
NotTrainedException - if the classifier is not trained
IllegalArgumentException - if something is wrong with the Sequence seq

classify

protected byte classify(Sequence seq,
                        boolean check)
                 throws Exception
This method classifies a Sequence. It enables you to check the constraints (alphabets, length, AbstractClassifier.isInitialized()).

Parameters:
seq - the Sequence
check - indicates if the constraints will be checked
Returns:
the index of the class the Sequence is assigned to
Throws:
Exception - if something went wrong
See Also:
check(Sequence)

createDefaultClassWeights

protected void createDefaultClassWeights(int classes,
                                         double val)
                                  throws IllegalArgumentException
This method creates new class weights. Each class weight has the same value val. So the class weights do not have any influence on the classification.

Parameters:
classes - the number of different classes
val - the value that is used for all classes
Throws:
IllegalArgumentException - if the number of classes is below 2
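
Why identical class weights do not influence the classification can be seen in a small standalone sketch: adding the same constant to every class score does not change which score is the largest. This is not Jstacs code; the class DefaultWeightsSketch and its helper methods are hypothetical and only mirror the behaviour described above.

```java
import java.util.Arrays;

// Hypothetical illustration of equal class weights; not part of Jstacs.
final class DefaultWeightsSketch {

    // Creates one weight per class, all with the same value val.
    static double[] createDefaultClassWeights(int classes, double val) {
        if (classes < 2) {
            throw new IllegalArgumentException("at least 2 classes are required");
        }
        double[] weights = new double[classes];
        Arrays.fill(weights, val);
        return weights;
    }

    // Element-wise sum of scores and weights.
    static double[] add(double[] scores, double[] weights) {
        double[] sum = new double[scores.length];
        for (int j = 0; j < scores.length; j++) {
            sum[j] = scores[j] + weights[j];
        }
        return sum;
    }

    // Index of the largest value (first one in case of ties).
    static int argMax(double[] values) {
        int best = 0;
        for (int j = 1; j < values.length; j++) {
            if (values[j] > values[best]) {
                best = j;
            }
        }
        return best;
    }
}
```

For any score array, argMax(scores) and argMax(add(scores, createDefaultClassWeights(scores.length, val))) return the same index.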

extractFurtherClassifierInfosFromXML

protected void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                             throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an XML representation. This method is used by the method AbstractClassifier.fromXML(StringBuffer) and should not be made public.

Specified by:
extractFurtherClassifierInfosFromXML in class AbstractClassifier
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
AbstractClassifier.fromXML(StringBuffer)

getClassWeight

protected double getClassWeight(int index)
Returns the class weight for the class with a given index.

Parameters:
index - the given index of the class
Returns:
the class weight

getScore

protected abstract double getScore(Sequence seq,
                                   int i,
                                   boolean check)
                            throws IllegalArgumentException,
                                   NotTrainedException,
                                   Exception
This method returns the score for a given Sequence and a given class.

Parameters:
seq - the Sequence
i - the index of the class
check - the switch to decide whether to check AlphabetContainer and the length of the Sequence or not
Returns:
the score for a given Sequence and a given class
Throws:
NotTrainedException - if the classifier is not trained
IllegalArgumentException - if something is wrong with the Sequence seq
Exception - if something went wrong

getScores

public double[] getScores(DataSet s)
                   throws Exception
This method returns the scores of the classifier for each Sequence in the DataSet. The scores are stored in the array according to the index of the Sequence in the DataSet.

Only for 2-class-classifiers.

Parameters:
s - the DataSet
Returns:
the array of scores
Throws:
Exception - if something went wrong

getPValue

public double getPValue(Sequence candidate,
                        DataSet bg)
                 throws Exception
Returns the p-value for a Sequence candidate with respect to a given background DataSet.

The p-value is the fraction of background Sequences that have a score that is at least as high as the score for the candidate.

It is not recommended to use this method in a for-loop. In such cases one should use the method that works on two DataSets.

Parameters:
candidate - the candidate Sequence
bg - the background DataSet
Returns:
the p-value for the Sequence candidate
Throws:
Exception - if something went wrong
See Also:
getPValue(DataSet, DataSet), PValueComputation.getPValue(double[], double)

getPValue

public double[] getPValue(DataSet candidates,
                          DataSet bg)
                   throws Exception
Returns the p-values for all Sequences in the DataSet candidates with respect to a given background DataSet.

The p-value is the fraction of background Sequences that have a score that is at least as high as the score for a Sequence in candidates.

Parameters:
candidates - the DataSet with candidate sequences
bg - the background data set
Returns:
the p-values for all sequences in candidates
Throws:
Exception - if something went wrong
See Also:
getPValue(Sequence, DataSet), PValueComputation.getPValue(double[], double)
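
The empirical p-value described for both getPValue methods, and the reason the DataSet variant is preferable to a loop over single candidates, can be sketched in standalone Java. This is not Jstacs code and makes no claim about how PValueComputation is implemented: the class PValueSketch and its methods are hypothetical, scores are passed in as plain arrays, and the result is returned as a value in [0, 1], which is an assumption. The background scores are sorted once and then reused for every candidate.

```java
import java.util.Arrays;

// Hypothetical illustration of empirical p-values; not part of Jstacs.
final class PValueSketch {

    // Share of background scores that are at least as high as the given score.
    // The background array must be sorted in ascending order.
    static double pValue(double[] sortedBg, double score) {
        if (sortedBg.length == 0) {
            throw new IllegalArgumentException("background must not be empty");
        }
        // Binary search for the first index whose score is >= the given score.
        int lo = 0;
        int hi = sortedBg.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedBg[mid] < score) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (sortedBg.length - lo) / (double) sortedBg.length;
    }

    // Sorts a copy of the background scores once and reuses it for all candidates.
    static double[] pValues(double[] candidateScores, double[] bgScores) {
        double[] sorted = bgScores.clone();
        Arrays.sort(sorted);
        double[] p = new double[candidateScores.length];
        for (int i = 0; i < candidateScores.length; i++) {
            p[i] = pValue(sorted, candidateScores[i]);
        }
        return p;
    }
}
```

Calling the single-candidate computation in a loop would require the background scores to be prepared for every call, whereas the batch version prepares them only once.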