java.lang.Object
  de.jstacs.data.Sample
public class Sample
This is the class for any sample of sequences. All sequences in a sample have
to have the same AlphabetContainer. The sequences may have different
lengths.
For the internal representation the class Sequence is used, where the
external alphabet is converted to integral numerical values. The class
Sample knows about this coding via instances of the classes
AlphabetContainer and Alphabet.
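The conversion of external symbols to integral values can be pictured with a minimal sketch. It does not use the Jstacs classes: the class EncodingSketch, the constant SYMBOLS and the methods encode and decode are hypothetical names chosen for illustration, and a four-letter DNA alphabet is assumed.

```java
/** Hypothetical illustration of symbol-to-integer coding; not part of Jstacs. */
final class EncodingSketch {
    // Assumed external alphabet; a symbol's code is its index in this string.
    static final String SYMBOLS = "ACGT";

    /** Converts the external representation to integral values. */
    static int[] encode(String sequence) {
        int[] codes = new int[sequence.length()];
        for (int i = 0; i < sequence.length(); i++) {
            int code = SYMBOLS.indexOf(sequence.charAt(i));
            if (code < 0) {
                // Stand-in for the library's own alphabet exception.
                throw new IllegalArgumentException("symbol not in alphabet: " + sequence.charAt(i));
            }
            codes[i] = code;
        }
        return codes;
    }

    /** Converts integral values back to the external representation. */
    static String decode(int[] codes) {
        StringBuilder sb = new StringBuilder(codes.length);
        for (int code : codes) {
            sb.append(SYMBOLS.charAt(code));
        }
        return sb.toString();
    }
}
```

In this sketch the alphabet alone is enough to translate in both directions, which is the role the description above assigns to AlphabetContainer and Alphabet.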
There are different ways to access the elements of a Sample. For random
access, use the method getElementAt(int i). For fast sequential access, an
ElementEnumerator is recommended.
Sample is immutable.
See Also:
AlphabetContainer, Sequence

| Nested Class Summary | |
|---|---|
| static class | Sample.ElementEnumerator: This class can be used for fast sequential access to a sample. |
| static class | Sample.PartitionMethod: This enum defines different partition methods for a sample. |
| static class | Sample.WeightedSampleFactory: This class enables you to eliminate sequences that occur more than once in one or more samples. |
| Constructor Summary | |
|---|---|
| Sample(AlphabetContainer abc, AbstractStringExtractor se) | Creates a Sample from a StringExtractor using the given AlphabetContainer. |
| Sample(AlphabetContainer abc, AbstractStringExtractor se, int subsequenceLength) | Creates a Sample from a StringExtractor using the given AlphabetContainer and all overlapping windows of subsequenceLength. |
| Sample(AlphabetContainer abc, AbstractStringExtractor se, String delim) | Creates a Sample from a StringExtractor using the given AlphabetContainer and delimiter. |
| Sample(AlphabetContainer abc, AbstractStringExtractor se, String delim, int subsequenceLength) | Creates a Sample from a StringExtractor using the given AlphabetContainer, the given delimiter and all overlapping windows of subsequenceLength. |
| Sample(Sample s, int subsequenceLength) | This constructor enables you to use subsequences of the elements of a sample. |
| Sample(String annotation, Sequence... seqs) | This constructor is specially designed for the method Model.emitSample(int, int...). |
| Method Summary | |
|---|---|
| Sequence[] | getAllElements(): Returns an array of sequences containing all elements of this Sample. |
| AlphabetContainer | getAlphabetContainer(): Returns the AlphabetContainer of this Sample. |
| String | getAnnotation(): This method returns some annotation of the Sample. |
| static String | getAnnotation(Sample... s): Returns the annotation for an array of Samples. |
| Sample | getCompositeSample(int[] starts, int[] lengths): This method enables you to use only composite sequences of all elements in the current sample. |
| Sequence | getElementAt(int i): This method returns the element with index i. |
| int | getElementLength(): Returns the length of the elements in this Sample. |
| Sample | getInfixSample(int start, int length): This method enables you to use only an infix of all elements in the current sample. |
| int | getMaximalElementLength(): Returns the maximal length of an element in this Sample. |
| int | getMinimalElementLength(): Returns the minimal length of an element in this Sample. |
| int | getNumberOfElements(): Returns the number of elements in this Sample. |
| int | getNumberOfElementsWithLength(int len): Returns the number of overlapping elements of length len that can be extracted. |
| Sample | getSuffixSample(int start): This method enables you to use only a suffix of all elements in the current sample. |
| static Sample | intersection(Sample... samples): This method computes the intersection between all elements of the array, i.e. it returns a Sample containing only sequences that are contained in all Samples of the array. |
| boolean | isDiscreteSample(): This method returns true if all positions use discrete values. |
| boolean | isSimpleSample(): This method answers the question whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet. |
| Sample[] | partition(double p, Sample.PartitionMethod method, int subsequenceLength): This method partitions the elements of the sample into 2 distinct parts. |
| Sample[] | partition(int k, Sample.PartitionMethod method): This method partitions the elements of the sample into k distinct parts. |
| Sample[] | partition(Sample.PartitionMethod method, double... percentage): This method partitions the elements of the sample into distinct parts. |
| void | save(String msg, File f): This method writes a message msg and the sample to a file f. |
| Sample | subSampling(int number): Randomly samples elements (sequences) from the set of all elements (sequences) contained in this Sample. |
| String | toString() |
| static Sample | union(Sample... s): Unites all samples in s. |
| static Sample | union(Sample[] s, boolean[] in): This method unites all Samples from s that are selected by in. |
| static Sample | union(Sample[] s, boolean[] in, int subsequenceLength): This method unites all Samples from s that are selected by in and sets the element length in the united sample to subsequenceLength. |
| static Sample | union(Sample[] s, int subsequenceLength): This method unites all Samples from s and sets the element length in the united sample to subsequenceLength. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public Sample(AlphabetContainer abc,
AbstractStringExtractor se)
throws WrongAlphabetException,
EmptySampleException,
WrongLengthException
Creates a Sample from a StringExtractor using the given AlphabetContainer.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - never happens
public Sample(AlphabetContainer abc,
AbstractStringExtractor se,
int subsequenceLength)
throws WrongAlphabetException,
WrongLengthException,
EmptySampleException
Creates a Sample from a StringExtractor using the given AlphabetContainer
and all overlapping windows of subsequenceLength.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
subsequenceLength - the length of the window sliding over the Strings of
se. If subsequenceLength is 0 (zero), the sequences are used as given by
the StringExtractor
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
WrongLengthException - if the subsequence length is not supported
EmptySampleException - if the Sample would be empty
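The phrase "all overlapping windows of subsequenceLength" can be illustrated with a small sketch on plain strings. This is not the Jstacs implementation: WindowSketch and windows are hypothetical names, and treating a length of 0 as "use the whole sequence" follows the parameter description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of overlapping windows; not part of Jstacs. */
final class WindowSketch {
    /** Returns every window of the given length, shifting the start by one position each time. */
    static List<String> windows(String sequence, int subsequenceLength) {
        if (subsequenceLength < 0 || subsequenceLength > sequence.length()) {
            // Stand-in for the library's own length exception.
            throw new IllegalArgumentException("unsupported subsequence length: " + subsequenceLength);
        }
        List<String> result = new ArrayList<>();
        if (subsequenceLength == 0) {
            // Length 0 means: use the sequence as given.
            result.add(sequence);
            return result;
        }
        for (int start = 0; start + subsequenceLength <= sequence.length(); start++) {
            result.add(sequence.substring(start, start + subsequenceLength));
        }
        return result;
    }
}
```

For the string ACGTA and a window length of 3, the sketch yields ACG, CGT and GTA.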
public Sample(AlphabetContainer abc,
AbstractStringExtractor se,
String delim)
throws WrongAlphabetException,
EmptySampleException,
WrongLengthException
Creates a Sample from a StringExtractor using the given AlphabetContainer
and delimiter.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
delim - the delimiter for parsing the Strings
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - never happens
public Sample(AlphabetContainer abc,
AbstractStringExtractor se,
String delim,
int subsequenceLength)
throws EmptySampleException,
WrongAlphabetException,
WrongLengthException
Creates a Sample from a StringExtractor using the given AlphabetContainer,
the given delimiter and all overlapping windows of subsequenceLength.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
delim - the delimiter for parsing the Strings
subsequenceLength - the length of the window sliding over the Strings of
se. If subsequenceLength is 0 (zero), the sequences are used as given by
the StringExtractor
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - if the subsequence length is not supported
public Sample(Sample s,
int subsequenceLength)
throws WrongLengthException
This constructor enables you to use subsequences of the elements of a
sample. It can also be used to ensure that the elements returned by
getElementAt(int) are real objects and do not have to be created
at the invocation of the method. (The same holds for the
Sample.ElementEnumerator. In those cases both ways to access the
sequence are approximately equally fast.)
Parameters:
s - the sample
subsequenceLength - the new element length
Throws:
WrongLengthException - if something is wrong with subsequenceLength
public Sample(String annotation,
Sequence... seqs)
throws EmptySampleException,
IllegalArgumentException
This constructor is specially designed for the method Model.emitSample(int, int...).
Parameters:
annotation - the annotation of the sample
seqs - the sequence(s)
Throws:
EmptySampleException - if the array seqs is null or its length is 0
IllegalArgumentException - if the alphabets do not match

| Method Detail |
|---|
public static final String getAnnotation(Sample... s)
Returns the annotation for an array of Samples.
Parameters:
s - an array of Samples
public static final Sample intersection(Sample... samples)
throws IllegalArgumentException,
EmptySampleException
This method computes the intersection between all elements of the array,
i.e. it returns a Sample containing only sequences that are
contained in all Samples of the array.
Parameters:
samples - the array
Throws:
IllegalArgumentException - if the elements of the array are from different domains
EmptySampleException - if the intersection is empty
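The set semantics described above can be sketched on plain strings. The names IntersectionSketch and intersection are hypothetical, the sketch ignores alphabets entirely, and how the library treats duplicate sequences is not stated in this page, so collapsing duplicates here is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of a multi-way intersection; not part of Jstacs. */
final class IntersectionSketch {
    /** Keeps only the sequences that occur in every given collection. */
    @SafeVarargs
    static Set<String> intersection(List<String>... samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        // Start with the first sample and successively drop everything missing elsewhere.
        Set<String> common = new LinkedHashSet<>(samples[0]);
        for (int i = 1; i < samples.length; i++) {
            common.retainAll(samples[i]);
        }
        if (common.isEmpty()) {
            // Stand-in for the library's own empty-sample exception.
            throw new IllegalStateException("the intersection is empty");
        }
        return common;
    }
}
```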
public static final Sample union(Sample[] s,
boolean[] in)
throws IllegalArgumentException,
EmptySampleException
This method unites all Samples from s that are selected by in.
Parameters:
s - the Samples
in - an array indicating which samples are used in the union; if
in[i]==true, the sample s[i] is used
Throws:
IllegalArgumentException - if s.length != in.length or the alphabets do not
match
EmptySampleException - if the union is empty
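The role of the boolean array can be shown with a short sketch on lists of strings. UnionSketch and union are hypothetical names and the alphabet check is left out; only the selection by in[i] and the two error conditions on lengths and emptiness are mirrored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a masked union; not part of Jstacs. */
final class UnionSketch {
    /** Concatenates samples.get(i) for every i with in[i] == true. */
    static List<String> union(List<List<String>> samples, boolean[] in) {
        if (samples.size() != in.length) {
            throw new IllegalArgumentException("samples and in must have the same length");
        }
        List<String> united = new ArrayList<>();
        for (int i = 0; i < in.length; i++) {
            if (in[i]) {
                united.addAll(samples.get(i));
            }
        }
        if (united.isEmpty()) {
            // Stand-in for the library's own empty-sample exception.
            throw new IllegalStateException("the union is empty");
        }
        return united;
    }
}
```

A mask of this kind is convenient in cross validation, where all parts except one are united into the training data.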
public static final Sample union(Sample... s)
throws IllegalArgumentException
Unites all samples in s.
Parameters:
s - the samples
Throws:
IllegalArgumentException - if the alphabets do not match
See Also:
union(Sample[], boolean[])
public static final Sample union(Sample[] s,
boolean[] in,
int subsequenceLength)
throws IllegalArgumentException,
EmptySampleException,
WrongLengthException
This method unites all Samples from s that are selected by in and sets
the element length in the united sample to subsequenceLength.
Parameters:
s - the Samples
in - an array indicating which samples are used in the union; if
in[i]==true, the sample s[i] is used
subsequenceLength - the length of the elements in the united sample
Throws:
IllegalArgumentException - if s.length != in.length or the alphabets do not
match
EmptySampleException - if the union is empty
WrongLengthException - if the united sample does not support this
subsequenceLength
public static final Sample union(Sample[] s,
int subsequenceLength)
throws IllegalArgumentException,
WrongLengthException
This method unites all Samples from s and sets the
element length in the united sample to subsequenceLength.
Parameters:
s - the Samples
subsequenceLength - the length of the elements in the united sample
Throws:
IllegalArgumentException - if the alphabets do not match
WrongLengthException - if the united sample does not support this
subsequenceLength
See Also:
union(Sample[], boolean[], int)
public Sequence[] getAllElements()
Returns an array of sequences containing all elements of this Sample.
Returns:
all elements of this Sample
public final AlphabetContainer getAlphabetContainer()
Returns the AlphabetContainer of this Sample.
Returns:
the AlphabetContainer of this Sample
public final String getAnnotation()
This method returns some annotation of the Sample.
Returns:
some annotation of the Sample
public final Sample getCompositeSample(int[] starts,
int[] lengths)
throws IllegalArgumentException
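This method enables you to use only composite sequences of all elements in the current sample.
Parameters:
starts - the start positions
lengths - the lengths of the chunks
Throws:
IllegalArgumentException - if starts, lengths, or both in combination are
not suitable
public Sequence getElementAt(int i)
This method returns the element with index i. See also the comment on the constructor Sample(Sample, int).
Parameters:
i - the index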
public int getElementLength()
Returns the length of the elements in this Sample.
Returns:
the length of the elements in this Sample
public final Sample getInfixSample(int start,
int length)
throws IllegalArgumentException
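This method enables you to use only an infix of all elements in the current sample.
Parameters:
start - the start position of the infix
length - the length of the infix, has to be positive
Throws:
IllegalArgumentException - if start, length, or both in combination are
not suitable
public int getMinimalElementLength()
Returns the minimal length of an element in this Sample.
Returns:
the minimal length of an element in this Sample
public int getMaximalElementLength()
Returns the maximal length of an element in this Sample.
Returns:
the maximal length of an element in this Sample
public int getNumberOfElements()
Returns the number of elements in this Sample.
Returns:
the number of elements in this Sample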
public int getNumberOfElementsWithLength(int len)
throws WrongLengthException
Returns the number of overlapping elements that can be extracted.
Parameters:
len - the length of the elements
Throws:
WrongLengthException - if the given length is bigger than the minimal element length
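One plausible reading of this count is sketched below: a sequence of length n contributes n - len + 1 overlapping windows of length len, and the counts are summed over all sequences. That formula is an assumption about what the method computes, not something this page states, and WindowCountSketch and numberOfWindows are hypothetical names.

```java
/** Hypothetical illustration of counting overlapping windows; not part of Jstacs. */
final class WindowCountSketch {
    /** Sums (length - len + 1) over all element lengths. */
    static int numberOfWindows(int[] elementLengths, int len) {
        if (len < 1) {
            throw new IllegalArgumentException("len must be positive");
        }
        int minimal = Integer.MAX_VALUE;
        for (int length : elementLengths) {
            minimal = Math.min(minimal, length);
        }
        if (len > minimal) {
            // Mirrors the documented condition: len must not exceed the minimal element length.
            throw new IllegalArgumentException("len is bigger than the minimal element length");
        }
        int count = 0;
        for (int length : elementLengths) {
            count += length - len + 1;
        }
        return count;
    }
}
```

With element lengths 5, 4 and 6 and len = 3, the sketch counts 3 + 2 + 4 = 9 windows.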
public final Sample getSuffixSample(int start)
throws IllegalArgumentException
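This method enables you to use only a suffix of all elements in the current sample.
Parameters:
start - the start position of the suffix
Throws:
IllegalArgumentException - if start is not suitable
public final boolean isSimpleSample()
This method answers the question whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet.
Returns:
true if the sample is simple
public final boolean isDiscreteSample()
This method returns true if all positions use discrete
values.
Returns:
true if the sample is discrete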
public Sample[] partition(double p,
Sample.PartitionMethod method,
int subsequenceLength)
throws WrongLengthException,
UnsupportedOperationException,
EmptySampleException
This method partitions the elements of the sample into 2
distinct parts. The second part (test sample) holds the percentage
p of the sample, the first part (train sample) holds the rest. The first
part has the same element length as the current sample, the second has
element length subsequenceLength.
Parameters:
p - the percentage for the second part; the second part holds at
least this percentage of the full sample
method - the method used to partition the sample (partitioning
criterion)
subsequenceLength - the element length of the second part. If
subsequenceLength is 0 (zero), the sequences are used as given in this
Sample.
Throws:
WrongLengthException - if something is wrong with subsequenceLength
UnsupportedOperationException - if the sample is not simple
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS,
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
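A train/test split of this kind can be sketched on a list of strings. The sketch makes several assumptions that this page does not confirm: p is a fraction in [0,1], elements are assigned at random, and "at least this percentage" is realized by rounding up. TwoPartSketch, partition and the seed parameter are hypothetical, and the change of element length in the second part is not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of a two-part split by number of elements; not part of Jstacs. */
final class TwoPartSketch {
    /** Returns [train, test], where test holds at least the fraction p of the elements. */
    static List<List<String>> partition(List<String> elements, double p, long seed) {
        if (p < 0 || p > 1) {
            throw new IllegalArgumentException("p must be in [0,1]");
        }
        List<String> shuffled = new ArrayList<>(elements);
        Collections.shuffle(shuffled, new Random(seed));
        // Rounding up guarantees that the second part holds at least the fraction p.
        int testSize = (int) Math.ceil(p * shuffled.size());
        int trainSize = shuffled.size() - testSize;
        if (testSize == 0 || trainSize == 0) {
            // Stand-in for the library's own empty-sample exception.
            throw new IllegalStateException("one of the created partitions is empty");
        }
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }
}
```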
public Sample[] partition(Sample.PartitionMethod method,
double... percentage)
throws IllegalArgumentException,
EmptySampleException
This method partitions the elements of the sample into distinct parts.
Parameters:
method - the method used to partition the sample (partitioning
criterion)
percentage - the array of percentages for each "subsample"
Throws:
IllegalArgumentException - if something with the percentages is not correct (sum != 1 or
one value is not in [0,1])
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS,
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
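The documented conditions on the percentages (each value in [0,1], sum equal to 1) can be checked before calling the method. The following sketch shows one way to do so; PercentageCheckSketch, check and the floating-point tolerance are hypothetical choices, and the library may compare the sum differently.

```java
/** Hypothetical pre-check for partition percentages; not part of Jstacs. */
final class PercentageCheckSketch {
    // Assumed tolerance for comparing a floating-point sum with 1.
    static final double TOLERANCE = 1e-9;

    /** Throws IllegalArgumentException if a value is outside [0,1] or the sum is not 1. */
    static void check(double... percentage) {
        double sum = 0;
        for (double value : percentage) {
            if (Double.isNaN(value) || value < 0 || value > 1) {
                throw new IllegalArgumentException("value not in [0,1]: " + value);
            }
            sum += value;
        }
        if (Math.abs(sum - 1.0) > TOLERANCE) {
            throw new IllegalArgumentException("percentages do not sum to 1: " + sum);
        }
    }
}
```

Note that the values are fractions of 1, so a split into 50%, 30% and 20% is written as 0.5, 0.3, 0.2.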
public Sample[] partition(int k,
Sample.PartitionMethod method)
throws IllegalArgumentException,
EmptySampleException
This method partitions the elements of the sample into k
distinct parts.
Parameters:
k - the number of parts
method - how to split the data
Throws:
IllegalArgumentException - if k is not correct
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS,
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
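The difference between the two partitioning criteria can be made concrete with a sketch that balances k parts either by element count or by total number of symbols. The greedy assignment used here is purely an illustration and is not claimed to be the strategy of the library; KPartitionSketch, partition and the flag bySymbols are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the two balancing criteria; not part of Jstacs. */
final class KPartitionSketch {
    /** Greedily adds each element to the part with the currently smallest weight. */
    static List<List<String>> partition(List<String> elements, int k, boolean bySymbols) {
        if (k < 1 || k > elements.size()) {
            throw new IllegalArgumentException("k is not correct: " + k);
        }
        List<List<String>> parts = new ArrayList<>();
        for (int j = 0; j < k; j++) {
            parts.add(new ArrayList<>());
        }
        long[] weight = new long[k];
        for (String element : elements) {
            int lightest = 0;
            for (int j = 1; j < k; j++) {
                if (weight[j] < weight[lightest]) {
                    lightest = j;
                }
            }
            parts.get(lightest).add(element);
            // An element weighs 1 when balancing by elements, or its length when balancing by symbols.
            weight[lightest] += bySymbols ? element.length() : 1;
        }
        return parts;
    }
}
```

With one long and three short sequences and k = 2, balancing by elements puts two sequences in each part, while balancing by symbols leaves the long sequence alone in its part.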
public Sample subSampling(int number)
throws EmptySampleException
Randomly samples elements (sequences) from the set of all elements
(sequences) contained in this Sample. Depending on whether this Sample is
chosen to contain overlapping elements (windows of length
subsequenceLength) or not, those elements (overlapping windows, whole
sequences) are subsampled.
Parameters:
number - the number of Sequences that should be drawn from the contained
set of sequences (with replacement)
Returns:
a Sample containing the drawn Sequences
Throws:
EmptySampleException - if number is not positive
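Drawing with replacement means that the same element can be drawn more than once and that number may exceed the size of the sample. A minimal sketch on a list of strings, with the hypothetical names SubSamplingSketch and subSampling and an explicitly passed random generator, could look as follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of sampling with replacement; not part of Jstacs. */
final class SubSamplingSketch {
    /** Draws the given number of elements uniformly at random, with replacement. */
    static List<String> subSampling(List<String> elements, int number, Random rng) {
        if (number <= 0) {
            // Stand-in for the library's own empty-sample exception.
            throw new IllegalArgumentException("number must be positive");
        }
        if (elements.isEmpty()) {
            throw new IllegalArgumentException("nothing to draw from");
        }
        List<String> drawn = new ArrayList<>(number);
        for (int i = 0; i < number; i++) {
            // Each draw is independent, so an element can appear several times.
            drawn.add(elements.get(rng.nextInt(elements.size())));
        }
        return drawn;
    }
}
```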
public final void save(String msg,
File f)
throws IOException
This method writes a message msg and the sample to a file
f.
Parameters:
msg - the message, any information
f - the File
Throws:
IOException - if something went wrong with the file
public String toString()
Overrides:
toString in class Object