BinaryNumericSplit< FitnessFunction, ObservationType > Class Template Reference

The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas for decision trees on high-speed data streams.

Public Types

typedef BinaryNumericSplitInfo< ObservationType > SplitInfo
 The splitting information required by the BinaryNumericSplit.

 

Public Member Functions

 BinaryNumericSplit (const size_t numClasses=0)
 Create the BinaryNumericSplit object with the given number of classes.

 
 BinaryNumericSplit (const size_t numClasses, const BinaryNumericSplit &other)
 Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.

 
void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness)
 Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.

 
size_t MajorityClass () const
 The majority class of the points seen so far.

 
double MajorityProbability () const
 The probability of the majority class given the points seen so far.

 
size_t NumChildren () const
 
template<typename Archive>
void serialize (Archive &ar, const uint32_t)
 Serialize the object.

 
void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
 Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.

 
void Train (ObservationType value, const size_t label)
 Train on the given value with the given label.

 

Detailed Description


template<typename FitnessFunction, typename ObservationType = double>

class mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >

The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas in the following paper:

@inproceedings{gama2003accurate,
title={Accurate Decision Trees for Mining High-Speed Data Streams},
author={Gama, J. and Rocha, R. and Medas, P.},
year={2003},
booktitle={Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '03)},
pages={523--528}
}

This splitting procedure builds a binary tree on the points it has seen so far, and then EvaluateFitnessFunction() returns the best possible split in O(n) time, where n is the number of samples seen so far. Every split of this type produces exactly two children: one for values less than the split point, and one for values greater than or equal to the split point. The Train() function should take O(1) time.
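The procedure described above can be sketched in a few lines of standard C++. This is a minimal illustration under stated assumptions, not the mlpack implementation: the class name SketchBinarySplit, the method BestGain(), and the GiniImpurity() helper (standing in for the FitnessFunction template parameter) are all hypothetical, and a std::multimap is assumed as the sorted container.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Gini impurity of a vector of per-class counts. Hypothetical stand-in for
// the FitnessFunction template parameter.
inline double GiniImpurity(const std::vector<std::size_t>& counts)
{
  std::size_t total = 0;
  for (const std::size_t c : counts)
    total += c;
  if (total == 0)
    return 0.0;

  double impurity = 1.0;
  for (const std::size_t c : counts)
  {
    const double p = static_cast<double>(c) / static_cast<double>(total);
    impurity -= p * p;
  }
  return impurity;
}

// Illustrative sketch only; names and layout are assumptions.
class SketchBinarySplit
{
 public:
  explicit SketchBinarySplit(const std::size_t numClasses) :
      classCounts(numClasses, 0) { }

  // Store the observation in sorted order and update the class totals.
  void Train(const double value, const std::size_t label)
  {
    assert(label < classCounts.size());
    sorted.emplace(value, label);
    ++classCounts[label];
  }

  // Sweep the sorted observations once. Each distinct value v is a candidate
  // split point: points with value < v go left, points with value >= v go
  // right. Returns the largest impurity reduction found (0 if none).
  double BestGain() const
  {
    const double n = static_cast<double>(sorted.size());
    const double parent = GiniImpurity(classCounts);
    std::vector<std::size_t> left(classCounts.size(), 0);
    std::vector<std::size_t> right(classCounts);

    double best = 0.0;
    double previous = 0.0;
    std::size_t seen = 0;
    for (const auto& entry : sorted)
    {
      // Only evaluate when the left child is non-empty and the value changed.
      if (seen > 0 && entry.first != previous)
      {
        const double weightLeft = static_cast<double>(seen) / n;
        const double gain = parent - weightLeft * GiniImpurity(left) -
            (1.0 - weightLeft) * GiniImpurity(right);
        if (gain > best)
          best = gain;
      }
      // Move this point from the right child to the left child.
      ++left[entry.second];
      --right[entry.second];
      ++seen;
      previous = entry.first;
    }
    return best;
  }

 private:
  std::multimap<double, std::size_t> sorted;
  std::vector<std::size_t> classCounts;
};
```

In this sketch the sweep visits each stored point once, so the number of candidate evaluations is linear in n; each evaluation additionally costs time proportional to the number of classes.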

Template Parameters
FitnessFunction: Fitness function to use for calculating gain.
ObservationType: Type of observation used by this dimension.

Definition at line 47 of file binary_numeric_split.hpp.

Member Typedef Documentation

◆ SplitInfo

typedef BinaryNumericSplitInfo<ObservationType> SplitInfo

The splitting information required by the BinaryNumericSplit.

Definition at line 51 of file binary_numeric_split.hpp.

Constructor & Destructor Documentation

◆ BinaryNumericSplit() [1/2]

BinaryNumericSplit ( const size_t  numClasses = 0)

Create the BinaryNumericSplit object with the given number of classes.

Parameters
numClasses: Number of classes in dataset.

◆ BinaryNumericSplit() [2/2]

BinaryNumericSplit ( const size_t  numClasses,
const BinaryNumericSplit< FitnessFunction, ObservationType > &  other 
)

Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.

In this case, there are no other parameters, but this function is required by the HoeffdingTree class.

Member Function Documentation

◆ EvaluateFitnessFunction()

void EvaluateFitnessFunction ( double &  bestFitness,
double &  secondBestFitness 
)

Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.

Note that this takes O(n) time, where n is the number of points seen so far, so calling it after every new point can be expensive.

The fitness of the best possible split will be stored in bestFitness, and the fitness of the second best possible split will be stored in secondBestFitness.

Parameters
bestFitness: Fitness function value for best possible split.
secondBestFitness: Fitness function value for second best possible split.
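As a rough illustration of how two output references like these can be filled in a single pass, the sketch below tracks the largest and second largest value in a list of candidate gains. The function name TopTwoGains, the candidateGains input, and the choice of 0.0 as the starting value for both outputs are assumptions made for this example; the actual baseline used by the class is not stated here.

```cpp
#include <vector>

// Hypothetical helper: write the largest and second largest candidate gain
// into the two output references, in one pass over the candidates.
inline void TopTwoGains(const std::vector<double>& candidateGains,
                        double& bestFitness,
                        double& secondBestFitness)
{
  // Assumed baseline: no candidate means no gain.
  bestFitness = 0.0;
  secondBestFitness = 0.0;

  for (const double gain : candidateGains)
  {
    if (gain > bestFitness)
    {
      // The old best becomes the runner-up.
      secondBestFitness = bestFitness;
      bestFitness = gain;
    }
    else if (gain > secondBestFitness)
    {
      secondBestFitness = gain;
    }
  }
}
```

A caller would declare two doubles, pass them by reference, and compare them afterwards, for example to decide whether the best split is clearly ahead of the runner-up.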

◆ MajorityClass()

size_t MajorityClass ( ) const

The majority class of the points seen so far.

Referenced by BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().

◆ MajorityProbability()

double MajorityProbability ( ) const

The probability of the majority class given the points seen so far.

Referenced by BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
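The two quantities above follow directly from per-class counts. The sketch below is a minimal illustration, assuming the counts are available as a std::vector; the MajorityResult struct and the Majority() function are hypothetical names, not part of the class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type pairing the majority class with its probability.
struct MajorityResult
{
  std::size_t majorityClass;
  double probability;
};

// Find the class with the largest count and the fraction of points in it.
// Ties resolve to the lowest class index; empty input yields {0, 0.0}.
inline MajorityResult Majority(const std::vector<std::size_t>& classCounts)
{
  if (classCounts.empty())
    return MajorityResult{0, 0.0};

  std::size_t total = 0;
  std::size_t best = 0;
  for (std::size_t i = 0; i < classCounts.size(); ++i)
  {
    total += classCounts[i];
    if (classCounts[i] > classCounts[best])
      best = i;
  }

  const double probability = (total == 0) ? 0.0 :
      static_cast<double>(classCounts[best]) / static_cast<double>(total);
  return MajorityResult{best, probability};
}
```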

◆ NumChildren()

◆ serialize()

void serialize ( Archive &  ar,
const uint32_t   
)

◆ Split()

void Split ( arma::Col< size_t > &  childMajorities,
SplitInfo &  splitInfo 
)

Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.

Parameters
childMajorities: Majority classes of the children after the split.
splitInfo: Split information.

Referenced by BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
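To make the idea of child majorities concrete, the following sketch computes them for a given split point. It is an illustration under assumptions, not the class's code: ChildMajorities() is a hypothetical free function, a std::vector replaces arma::Col< size_t > to keep the example free of dependencies, and child 0 is assumed to hold values less than the split point while child 1 holds the rest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: given sorted (value, label) pairs and a split point,
// return the majority class of each of the two children.
// Assumed convention: child 0 is value < splitPoint, child 1 is the rest.
inline std::vector<std::size_t> ChildMajorities(
    const std::multimap<double, std::size_t>& sorted,
    const double splitPoint,
    const std::size_t numClasses)
{
  // One row of per-class counts for each child.
  std::vector<std::vector<std::size_t>> counts(
      2, std::vector<std::size_t>(numClasses, 0));

  for (const auto& entry : sorted)
  {
    assert(entry.second < numClasses);
    const std::size_t child = (entry.first < splitPoint) ? 0 : 1;
    ++counts[child][entry.second];
  }

  // The majority class of a child is the index of its largest count.
  std::vector<std::size_t> majorities(2, 0);
  for (std::size_t child = 0; child < 2; ++child)
  {
    const auto maxIt =
        std::max_element(counts[child].begin(), counts[child].end());
    majorities[child] =
        static_cast<std::size_t>(maxIt - counts[child].begin());
  }
  return majorities;
}
```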

◆ Train()

void Train ( ObservationType  value,
const size_t  label 
)

Train on the given value with the given label.

Parameters
value: The value to train on.
label: The label to train on.

The documentation for this class was generated from the file binary_numeric_split.hpp.