Interface DataSet


public interface DataSet
Represents a list of instances used for learning and/or evaluation. A data set also has attributes, each represented as an Attribute object.
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    add(Instance instance)
    Adds a new Instance to this data set.
    <T> List<Attribute<T>>
    attributes()
    Returns the list of all attributes (inputs and output) of this data set.
    DataSet
    filtered(Predicate<Instance> filter)
    Returns a data set containing only the instances of this data set which fulfill the passed filter.
    Iterable<Instance>
    instances()
    Returns an Iterable with this data set as its source.
    boolean
    isEmpty()
    Returns true if this data set contains no instances.
    <T> Attribute<T>
    outputAttribute()
    Returns the output attribute of this data set, i.e. the attribute whose values are to be predicted.
    boolean
    remove(Instance instance)
    Removes an Instance from this data set.
    void
    shuffle(Random random)
    Randomly permutes the entries of this data set using the specified source of randomness.
    int
    size()
    Returns the number of instances in this data set.
    TrainTestSets
    split(double fraction)
    Splits the set of instances into a set for training and a set for testing according to the passed fraction.
    TrainTestSets
    split(int numFolds, int testIndex)
    Splits the set of instances into a set for training and a set for testing: the instances are divided into numFolds folds, and testIndex determines which fold is used for testing.
    Stream<Instance>
    stream()
    Returns a sequential Stream with this data set as its source.
  • Method Details

    • outputAttribute

      <T> Attribute<T> outputAttribute()
      Returns the output attribute of this data set, i.e. the attribute whose values are to be predicted.
      Returns:
      the Attribute representing the output attribute of this data set
    • attributes

      <T> List<Attribute<T>> attributes()
      Returns the list of all attributes (inputs and output) of this data set.
      Returns:
      the list of Attribute objects of this data set
    • add

      boolean add(Instance instance)
      Adds a new Instance to this data set.
      Parameters:
      instance - the instance to add
      Returns:
      true if this data set changed as a result of the call
    • remove

      boolean remove(Instance instance)
      Removes an Instance from this data set.
      Parameters:
      instance - the instance to remove
      Returns:
      true if the instance was removed, false otherwise
    • size

      int size()
      Returns the number of instances in this data set.
      Returns:
      the number of instances in this data set
    • isEmpty

      boolean isEmpty()
      Returns true if this data set contains no instances.
      Returns:
      true if this data set contains no instances
    • instances

      Iterable<Instance> instances()
      Returns an Iterable with this data set as its source.
      Returns:
      an Iterable of Instances
    • stream

      Stream<Instance> stream()
      Returns a sequential Stream with this data set as its source.
      Returns:
      a Stream of Instances
    • shuffle

      void shuffle(Random random)
      Randomly permutes the entries of this data set using the specified source of randomness.
      Parameters:
      random - the source of randomness to use to shuffle the entries
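
      Passing a seeded Random makes a shuffle reproducible. The sketch below illustrates this with java.util.Collections.shuffle on a plain list. ShuffleSketch and shuffledCopy are hypothetical names, the list of ids is only a stand-in for the instances, and nothing here is taken from the DataSet implementation. Unlike shuffle(Random), which returns void, the sketch returns a shuffled copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleSketch {

    // Hypothetical stand-in: a list of integer ids plays the role of the instances.
    // Returns a shuffled copy so that the input list is left untouched.
    public static List<Integer> shuffledCopy(List<Integer> instances, Random random) {
        List<Integer> copy = new ArrayList<>(instances);
        Collections.shuffle(copy, random); // permutes the copy in place
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6);
        List<Integer> first = shuffledCopy(ids, new Random(42L));
        List<Integer> second = shuffledCopy(ids, new Random(42L));
        // Two generators created with the same seed yield the same permutation.
        System.out.println(first.equals(second));
    }
}
```

      Fixing the seed in this way is useful when a later train/test split has to be repeatable across runs.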
    • split

      TrainTestSets split(double fraction)
      Splits the set of instances into a set for training and a set for testing according to the passed fraction.
      Parameters:
      fraction - the fraction of instances assigned to the training set; the rest are added to the test set
      Returns:
      a TrainTestSets of train and test data
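
      A fraction-based split can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: FractionSplitSketch and Parts are hypothetical names (Parts is not TrainTestSets), the training size is assumed to be rounded down, and the instances are assumed to be taken in their current order without shuffling.

```java
import java.util.ArrayList;
import java.util.List;

public class FractionSplitSketch {

    // Hypothetical holder for the two parts; not the library's TrainTestSets.
    public static final class Parts<T> {
        public final List<T> train;
        public final List<T> test;

        Parts(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    public static <T> Parts<T> split(List<T> instances, double fraction) {
        // Written as a negated range check so that NaN is rejected as well.
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        // Assumption: the size of the training part is rounded down.
        int cut = (int) Math.floor(fraction * instances.size());
        List<T> train = new ArrayList<>(instances.subList(0, cut));
        List<T> test = new ArrayList<>(instances.subList(cut, instances.size()));
        return new Parts<>(train, test);
    }

    public static void main(String[] args) {
        Parts<String> parts = split(List.of("a", "b", "c", "d"), 0.75);
        System.out.println(parts.train + " | " + parts.test);
    }
}
```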
    • split

      TrainTestSets split(int numFolds, int testIndex)
      Splits the set of instances into a set for training and a set for testing: the instances are divided into numFolds folds, and testIndex determines which fold is used for testing.
      Parameters:
      numFolds - the number of folds into which this data set is divided
      testIndex - the index of the fold used for testing
      Returns:
      a TrainTestSets of train and test data
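
      A fold-based split is the building block of k-fold cross-validation. The sketch below shows one possible fold assignment and is not the library's implementation: FoldSplitSketch is a hypothetical name, testIndex is assumed to be zero-based, and instance i is assumed to belong to fold i % numFolds. The source does not state how folds are actually formed.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldSplitSketch {

    // Returns a two-element list: index 0 holds the training part, index 1 the test part.
    public static <T> List<List<T>> split(List<T> instances, int numFolds, int testIndex) {
        if (numFolds < 2 || testIndex < 0 || testIndex >= numFolds) {
            throw new IllegalArgumentException("need numFolds >= 2 and 0 <= testIndex < numFolds");
        }
        List<T> train = new ArrayList<>();
        List<T> test = new ArrayList<>();
        for (int i = 0; i < instances.size(); i++) {
            // Assumption: instance i belongs to fold (i % numFolds).
            if (i % numFolds == testIndex) {
                test.add(instances.get(i));
            } else {
                train.add(instances.get(i));
            }
        }
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        // One pass of cross-validation: each fold serves as the test part once.
        for (int fold = 0; fold < 5; fold++) {
            List<List<Integer>> parts = split(ids, 5, fold);
            System.out.println("fold " + fold + ": test=" + parts.get(1));
        }
    }
}
```

      Looping testIndex over all folds, as in main, uses every instance for testing exactly once under this assignment.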
    • filtered

      DataSet filtered(Predicate<Instance> filter)
      Returns a data set containing only the instances of this data set which fulfill the passed filter.
      Parameters:
      filter - the predicate which must be fulfilled by a given instance to be part of the result
      Returns:
      a filtered data set
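
      Filtering by predicate can be illustrated with the standard java.util.function.Predicate and the Stream API. In this sketch, FilterSketch and Row are hypothetical names: Row stands in for Instance, whose accessors are not documented here, and a plain List stands in for the data set.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {

    // Hypothetical stand-in for Instance: one feature value plus a label.
    public static final class Row {
        public final double value;
        public final String label;

        public Row(double value, String label) {
            this.value = value;
            this.label = label;
        }
    }

    // Keeps only the rows that fulfill the filter; the input list is not modified.
    public static List<Row> filtered(List<Row> rows, Predicate<Row> filter) {
        return rows.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(0.2, "yes"),
                new Row(0.9, "no"),
                new Row(0.7, "yes"));
        // Predicates can be combined with and(), or() and negate().
        Predicate<Row> isYes = row -> row.label.equals("yes");
        Predicate<Row> isLarge = row -> row.value > 0.5;
        System.out.println(filtered(rows, isYes.and(isLarge)).size());
    }
}
```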