HoeffdingCategoricalSplit< FitnessFunction > Class Template Reference

This is the standard Hoeffding-bound categorical feature split proposed by Domingos and Hulten in "Mining High-Speed Data Streams" (KDD '00).

Public Types

typedef CategoricalSplitInfo SplitInfo
 The type of split information required by the HoeffdingCategoricalSplit.

Public Member Functions

 HoeffdingCategoricalSplit (const size_t numCategories=0, const size_t numClasses=0)
 Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes.

 HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses, const HoeffdingCategoricalSplit &other)
 Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes and another HoeffdingCategoricalSplit to take parameters from.

void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const
 Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.

size_t MajorityClass () const
 Get the majority class seen so far.

double MajorityProbability () const
 Get the probability of the majority class given the points seen so far.

size_t NumChildren () const
 Return the number of children, if the node were to split.

template<typename Archive>
void serialize (Archive &ar, const uint32_t)
 Serialize the categorical split.

void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
 Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object.

template<typename eT>
void Train (eT value, const size_t label)
 Train on the given value with the given label.
 

Detailed Description


template<typename FitnessFunction>
class mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >

This is the standard Hoeffding-bound categorical feature split proposed in the paper below:

@inproceedings{domingos2000mining,
title={{Mining High-Speed Data Streams}},
author={Domingos, P. and Hulten, G.},
year={2000},
booktitle={Proceedings of the Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '00)},
pages={71--80}
}

This class tracks the sufficient statistics of the training points it has seen: for each category of this dimension, a count of how many points of each class were observed. The HoeffdingSplit class (and other related classes) can use this class to track categorical features and split decision tree nodes.
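For context, the cited paper decides when a node has seen enough points by comparing the gap between the best and second-best gains against the Hoeffding bound. The following is a minimal standalone sketch of that test, not mlpack code; the function names HoeffdingBound and ShouldSplit and their parameter names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hoeffding bound: epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n)).
//   range: R, the range of the fitness function (e.g. 1 for a gain in [0, 1]).
//   delta: allowed probability that the observed mean is off by more than
//          epsilon.
//   n:     number of training points seen so far.
// All names here are hypothetical; this is not part of mlpack's API.
inline double HoeffdingBound(const double range,
                             const double delta,
                             const std::size_t n)
{
  assert(n > 0 && delta > 0.0 && delta < 1.0);
  return std::sqrt(range * range * std::log(1.0 / delta) /
      (2.0 * static_cast<double>(n)));
}

// Split only once the best gain exceeds the second-best gain by more than
// epsilon.
inline bool ShouldSplit(const double bestFitness,
                        const double secondBestFitness,
                        const double range,
                        const double delta,
                        const std::size_t n)
{
  return (bestFitness - secondBestFitness) > HoeffdingBound(range, delta, n);
}
```

Because epsilon shrinks as n grows, a fixed gap between the two gains eventually becomes large enough to justify a split.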

Template Parameters
FitnessFunction  Fitness function to use for calculating gain.

Definition at line 44 of file hoeffding_categorical_split.hpp.

Member Typedef Documentation

◆ SplitInfo

The type of split information required by the HoeffdingCategoricalSplit.

Definition at line 48 of file hoeffding_categorical_split.hpp.

Constructor & Destructor Documentation

◆ HoeffdingCategoricalSplit() [1/2]

HoeffdingCategoricalSplit ( const size_t  numCategories = 0,
const size_t  numClasses = 0 
)

Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes.

Parameters
numCategories  Number of categories in this dimension.
numClasses  Number of classes in the dataset.

◆ HoeffdingCategoricalSplit() [2/2]

HoeffdingCategoricalSplit ( const size_t  numCategories,
const size_t  numClasses,
const HoeffdingCategoricalSplit< FitnessFunction > &  other 
)

Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes and another HoeffdingCategoricalSplit to take parameters from.

In this particular case, there are no parameters to take, but this constructor is required by the HoeffdingTree class.

Member Function Documentation

◆ EvaluateFitnessFunction()

void EvaluateFitnessFunction ( double &  bestFitness,
double &  secondBestFitness 
) const

Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.

In this splitting technique, we only split one possible way, so secondBestFitness will always be 0.

Parameters
bestFitness  The fitness function result for this split.
secondBestFitness  This is always set to 0 (this split only splits one way).
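As an illustration of the evaluation described above, here is a minimal standalone sketch that computes a Gini-impurity gain from a table of counts and always reports 0 as the second-best fitness. It is not mlpack's implementation: the CountTable layout, the choice of Gini impurity as the fitness function, and the name EvaluateFitness are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[c][k] = number of points seen with class c and category k.
// (Hypothetical layout for this sketch.)
using CountTable = std::vector<std::vector<std::size_t>>;

// Gini impurity of a vector of class counts: 1 - sum_c p_c^2.
inline double GiniImpurity(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t c : classCounts)
    total += c;
  if (total == 0)
    return 0.0;

  double impurity = 1.0;
  for (const std::size_t c : classCounts)
  {
    const double p = static_cast<double>(c) / static_cast<double>(total);
    impurity -= p * p;
  }
  return impurity;
}

// Gain of splitting into one child per category; there is no alternative
// split, so the second-best fitness is always 0.
inline void EvaluateFitness(const CountTable& counts,
                            double& bestFitness,
                            double& secondBestFitness)
{
  bestFitness = 0.0;
  secondBestFitness = 0.0;
  if (counts.empty() || counts[0].empty())
    return;

  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = counts[0].size();

  // Class totals over all categories (the unsplit node).
  std::vector<std::size_t> classTotals(numClasses, 0);
  std::size_t total = 0;
  for (std::size_t c = 0; c < numClasses; ++c)
    for (std::size_t k = 0; k < numCategories; ++k)
    {
      classTotals[c] += counts[c][k];
      total += counts[c][k];
    }
  if (total == 0)
    return;

  // Weighted impurity of the children, one child per category.
  double childImpurity = 0.0;
  for (std::size_t k = 0; k < numCategories; ++k)
  {
    std::vector<std::size_t> column(numClasses);
    std::size_t columnTotal = 0;
    for (std::size_t c = 0; c < numClasses; ++c)
    {
      column[c] = counts[c][k];
      columnTotal += column[c];
    }
    childImpurity += (static_cast<double>(columnTotal) /
        static_cast<double>(total)) * GiniImpurity(column);
  }

  bestFitness = GiniImpurity(classTotals) - childImpurity;
}
```

With two classes and two categories, a table in which each category contains only one class yields the largest gain this sketch can produce (0.5), while a table in which every category has the same class mix yields a gain of 0.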

◆ MajorityClass()

size_t MajorityClass ( ) const

Get the majority class seen so far.

Referenced by HoeffdingCategoricalSplit< FitnessFunction >::NumChildren().

◆ MajorityProbability()

double MajorityProbability ( ) const

Get the probability of the majority class given the points seen so far.

Referenced by HoeffdingCategoricalSplit< FitnessFunction >::NumChildren().
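The majority class and its probability can both be derived from per-class totals. The sketch below is a standalone illustration under assumptions, not mlpack code: the CountTable layout and the free functions are hypothetical, ties go to the lowest class index, and a probability of 0 is returned when no points have been seen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// counts[c][k] = number of points seen with class c and category k.
// (Hypothetical layout for this sketch.)
using CountTable = std::vector<std::vector<std::size_t>>;

// Total number of points seen for each class, summed over all categories.
inline std::vector<std::size_t> ClassTotals(const CountTable& counts)
{
  std::vector<std::size_t> totals;
  totals.reserve(counts.size());
  for (const auto& row : counts)
    totals.push_back(std::accumulate(row.begin(), row.end(), std::size_t(0)));
  return totals;
}

// Index of the class with the most points; ties go to the lowest index.
inline std::size_t MajorityClass(const CountTable& counts)
{
  const std::vector<std::size_t> totals = ClassTotals(counts);
  assert(!totals.empty());
  return static_cast<std::size_t>(
      std::max_element(totals.begin(), totals.end()) - totals.begin());
}

// Fraction of all points seen that belong to the majority class.
// Returns 0 when nothing has been seen (an assumption of this sketch).
inline double MajorityProbability(const CountTable& counts)
{
  const std::vector<std::size_t> totals = ClassTotals(counts);
  const std::size_t seen =
      std::accumulate(totals.begin(), totals.end(), std::size_t(0));
  if (seen == 0)
    return 0.0;
  return static_cast<double>(*std::max_element(totals.begin(), totals.end())) /
      static_cast<double>(seen);
}
```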

◆ NumChildren()

size_t NumChildren ( ) const
inline

Return the number of children, if the node were to split.

◆ serialize()

void serialize ( Archive &  ar,
const uint32_t   
)
inline

Serialize the categorical split.

Definition at line 111 of file hoeffding_categorical_split.hpp.

◆ Split()

void Split ( arma::Col< size_t > &  childMajorities,
SplitInfo &  splitInfo 
)

Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object.

Parameters
childMajorities  Majorities of child nodes to be created.
splitInfo  Information for splitting.

Referenced by HoeffdingCategoricalSplit< FitnessFunction >::NumChildren().
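To illustrate how child majorities could be gathered, the following standalone sketch assumes one child per category and picks, for each category, the class seen most often in it. It uses std::vector in place of arma::Col< size_t >; the CountTable layout and the name ChildMajorities are hypothetical, and ties go to the lowest class index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[c][k] = number of points seen with class c and category k.
// (Hypothetical layout for this sketch.)
using CountTable = std::vector<std::vector<std::size_t>>;

// For each category (assumed to become its own child), return the class
// that was seen most often among points with that category.
inline std::vector<std::size_t> ChildMajorities(const CountTable& counts)
{
  assert(!counts.empty());
  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = counts[0].size();

  std::vector<std::size_t> majorities(numCategories, 0);
  for (std::size_t k = 0; k < numCategories; ++k)
  {
    std::size_t best = 0;
    for (std::size_t c = 1; c < numClasses; ++c)
      if (counts[c][k] > counts[best][k])
        best = c; // Strictly greater, so ties keep the lowest index.
    majorities[k] = best;
  }
  return majorities;
}
```

The returned vector has one entry per category, which matches the number of children under the one-child-per-category assumption.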

◆ Train()

void Train ( eT  value,
const size_t  label 
)

Train on the given value with the given label.

Parameters
value  Value to train on.
label  Label to train on.
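Training on a categorical value amounts to updating a count. The sketch below is a standalone illustration, not mlpack's class: CategoricalCounts and its Count accessor are hypothetical names, and the value is assumed to hold a valid category index that can be cast to an unsigned integer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sufficient statistics of one categorical
// dimension: a numClasses x numCategories table of counts.
class CategoricalCounts
{
 public:
  CategoricalCounts(const std::size_t numCategories,
                    const std::size_t numClasses) :
      counts(numClasses, std::vector<std::size_t>(numCategories, 0))
  { }

  // Record one training point: the value is interpreted as a category index.
  template<typename eT>
  void Train(eT value, const std::size_t label)
  {
    const std::size_t category = static_cast<std::size_t>(value);
    assert(label < counts.size());
    assert(category < counts[label].size());
    ++counts[label][category];
  }

  // Number of points seen so far with the given label and category.
  std::size_t Count(const std::size_t label, const std::size_t category) const
  {
    return counts[label][category];
  }

 private:
  std::vector<std::vector<std::size_t>> counts;
};
```

For example, after `CategoricalCounts stats(3, 2); stats.Train(0.0, 0); stats.Train(0.0, 0); stats.Train(2.0, 1);`, the count for label 0 and category 0 is 2, and the count for label 1 and category 2 is 1.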

The documentation for this class was generated from the following file:
 hoeffding_categorical_split.hpp