The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten.
Public Types | |
| typedef NumericSplitInfo< ObservationType > | SplitInfo |
| The splitting information type required by the HoeffdingNumericSplit. | |
Public Member Functions | |
| HoeffdingNumericSplit (const size_t numClasses=0, const size_t bins=10, const size_t observationsBeforeBinning=100) | |
| Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place. | |
| HoeffdingNumericSplit (const size_t numClasses, const HoeffdingNumericSplit &other) | |
| Create the HoeffdingNumericSplit class, using the parameters from the given other split object. | |
| size_t | Bins () const |
| Return the number of bins. | |
| void | EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const |
| Evaluate the fitness function given what has been calculated so far. | |
| size_t | MajorityClass () const |
| Return the majority class. | |
| double | MajorityProbability () const |
| Return the probability of the majority class. | |
| size_t | NumChildren () const |
| Return the number of children if this node splits on this feature. | |
template < typename Archive > | |
| void | serialize (Archive &ar, const uint32_t) |
| Serialize the object. | |
| void | Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo) const |
| Return the majority class of each child to be created, if a split on this dimension was performed. | |
| void | Train (ObservationType value, const size_t label) |
| Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point). | |
The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten in the following paper:
The strategy alluded to is simple: the numeric features seen so far are discretized into bins. The difficulty is that the number of bins is not known in advance. This class only makes binary splits, and has a maximum number of bins. The binning works as follows: the split caches the minimum and maximum value of the points seen so far, and when the number of points reaches a predefined threshold (observationsBeforeBinning), the cached minimum-maximum range is divided into equal-width bins, and splitting then proceeds in the same way as with the categorical splits. This is a deliberately naive strategy, so do not expect it to be the best possible approach.
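The binning strategy described above can be sketched in standalone C++. This is a minimal illustration, not the mlpack implementation: the class name `EqualWidthBinner`, its members, and the decision to clamp out-of-range values into the first or last bin are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration class; not the mlpack implementation.
class EqualWidthBinner
{
 public:
  EqualWidthBinner(const std::size_t numClassesIn,
                   const std::size_t binsIn,
                   const std::size_t observationsBeforeBinningIn) :
      bins(binsIn),
      threshold(observationsBeforeBinningIn),
      counts(binsIn, std::vector<std::size_t>(numClassesIn, 0))
  {
    assert(numClassesIn > 0 && binsIn > 0 && observationsBeforeBinningIn > 0);
  }

  // Observe one (value, label) pair for this single feature.
  void Train(const double value, const std::size_t label)
  {
    assert(label < counts[0].size());
    if (binned)
    {
      ++counts[BinIndex(value)][label];
      return;
    }

    // Before binning, only cache the raw observation.
    cache.emplace_back(value, label);
    if (cache.size() == threshold)
      Bin();
  }

  // Map a value to its bin. Values outside the cached range are clamped to
  // the first or last bin (an assumption of this sketch).
  std::size_t BinIndex(const double value) const
  {
    if (value <= minValue) return 0;
    if (value >= maxValue) return bins - 1;
    const double width = (maxValue - minValue) / bins;
    const std::size_t index =
        static_cast<std::size_t>((value - minValue) / width);
    return std::min(index, bins - 1);
  }

  bool Binned() const { return binned; }

  std::size_t Count(const std::size_t bin, const std::size_t label) const
  {
    return counts[bin][label];
  }

 private:
  // Fix the bin range from the cached points, then count the cached points.
  void Bin()
  {
    minValue = maxValue = cache.front().first;
    for (const auto& obs : cache)
    {
      minValue = std::min(minValue, obs.first);
      maxValue = std::max(maxValue, obs.first);
    }
    binned = true;
    for (const auto& obs : cache)
      ++counts[BinIndex(obs.first)][obs.second];
    cache.clear();
  }

  std::size_t bins;
  std::size_t threshold;
  std::vector<std::vector<std::size_t>> counts;  // counts[bin][class]
  std::vector<std::pair<double, std::size_t>> cache;
  bool binned = false;
  double minValue = 0.0;
  double maxValue = 0.0;
};
```

Because the bin range is fixed from the first observationsBeforeBinning points, later values that fall outside it can only land in the outermost bins, which is one reason the strategy should not be expected to perform well on drifting data.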
| FitnessFunction | Fitness function to use for calculating gain. |
| ObservationType | Type of observations in this dimension. |
Definition at line 53 of file hoeffding_numeric_split.hpp.
| typedef NumericSplitInfo<ObservationType> SplitInfo |
The splitting information type required by the HoeffdingNumericSplit.
Definition at line 57 of file hoeffding_numeric_split.hpp.
| HoeffdingNumericSplit (const size_t numClasses = 0, const size_t bins = 10, const size_t observationsBeforeBinning = 100) |
Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place.
| numClasses | Number of classes. |
| bins | Number of bins. |
| observationsBeforeBinning | Number of points to see before binning is performed. |
| HoeffdingNumericSplit (const size_t numClasses, const HoeffdingNumericSplit< FitnessFunction, ObservationType > &other) |
Create the HoeffdingNumericSplit class, using the parameters from the given other split object.
| size_t Bins () const | inline |
Return the number of bins.
Definition at line 120 of file hoeffding_numeric_split.hpp.
References HoeffdingNumericSplit< FitnessFunction, ObservationType >::serialize().
| void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const |
Evaluate the fitness function given what has been calculated so far.
If binning has not yet been performed, bestFitness is set to 0 (i.e., no gain). Because this split can only split one possible way, secondBestFitness (the fitness function for the second best possible split) is always set to 0.
| bestFitness | Value of the fitness function for the best possible split. |
| secondBestFitness | Value of the fitness function for the second best possible split (always 0 for this split). |
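The behaviour described for the two output parameters can be sketched as follows. This is a hedged illustration only: the function `EvaluateFitnessSketch`, the helper `GiniOf`, and the choice of Gini gain as the fitness measure are assumptions for the example (the real class delegates to its FitnessFunction template parameter), and the sketch treats the bins as the children of the one split this object can make.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gini impurity of a vector of per-class counts (hypothetical helper).
double GiniOf(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t c : classCounts)
    total += c;
  if (total == 0)
    return 0.0;

  double impurity = 1.0;
  for (const std::size_t c : classCounts)
  {
    const double p = static_cast<double>(c) / total;
    impurity -= p * p;
  }
  return impurity;
}

// counts[bin][class] holds the observations seen per bin and class.
void EvaluateFitnessSketch(
    const std::vector<std::vector<std::size_t>>& counts,
    const bool binned,
    double& bestFitness,
    double& secondBestFitness)
{
  // Only one split is possible, so the second best fitness is always 0.
  bestFitness = 0.0;
  secondBestFitness = 0.0;

  // No gain can be reported until binning has been performed.
  if (!binned || counts.empty())
    return;

  // Accumulate the class counts of the unsplit node.
  std::vector<std::size_t> parent(counts[0].size(), 0);
  std::size_t total = 0;
  for (const auto& bin : counts)
  {
    for (std::size_t c = 0; c < bin.size(); ++c)
    {
      parent[c] += bin[c];
      total += bin[c];
    }
  }
  if (total == 0)
    return;

  // Weighted impurity of the children, one child per bin.
  double childImpurity = 0.0;
  for (const auto& bin : counts)
  {
    std::size_t binTotal = 0;
    for (const std::size_t c : bin)
      binTotal += c;
    childImpurity += (static_cast<double>(binTotal) / total) * GiniOf(bin);
  }

  bestFitness = GiniOf(parent) - childImpurity;
}
```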
| size_t MajorityClass () const |
Return the majority class.
Referenced by HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
| double MajorityProbability () const |
Return the probability of the majority class.
Referenced by HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
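As a rough illustration of what MajorityClass() and MajorityProbability() report, the sketch below derives both from a bin-by-class count table. The struct `MajoritySummary`, the function `SummarizeMajority`, and the `counts[bin][class]` layout are hypothetical names chosen for this example, not part of the documented API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct MajoritySummary
{
  std::size_t majorityClass;
  double majorityProbability;
};

// counts[bin][class] holds the observations seen per bin and class.
MajoritySummary SummarizeMajority(
    const std::vector<std::vector<std::size_t>>& counts)
{
  assert(!counts.empty() && !counts[0].empty());

  // Sum the counts of each class over all bins.
  std::vector<std::size_t> classTotals(counts[0].size(), 0);
  std::size_t total = 0;
  for (const auto& bin : counts)
  {
    for (std::size_t c = 0; c < bin.size(); ++c)
    {
      classTotals[c] += bin[c];
      total += bin[c];
    }
  }

  // The majority class is the one with the largest total; ties go to the
  // lowest class index.
  const auto best = std::max_element(classTotals.begin(), classTotals.end());

  MajoritySummary summary;
  summary.majorityClass = static_cast<std::size_t>(best - classTotals.begin());
  summary.majorityProbability =
      (total == 0) ? 0.0 : static_cast<double>(*best) / total;
  return summary;
}
```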
| size_t NumChildren () const | inline |
Return the number of children if this node splits on this feature.
Definition at line 106 of file hoeffding_numeric_split.hpp.
References HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityClass(), HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityProbability(), and HoeffdingNumericSplit< FitnessFunction, ObservationType >::Split().
| void serialize (Archive &ar, const uint32_t) |
Serialize the object.
Referenced by HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins().
| void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo) const |
Return the majority class of each child to be created, if a split on this dimension was performed.
This also creates the split object, which is stored in splitInfo.
Referenced by HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
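The two outputs of Split() can be pictured with the standalone sketch below. It is not the mlpack code: `SplitPointsSketch` is a hypothetical stand-in for the split object, `std::vector` replaces `arma::Col`, and the sketch assumes one child per equal-width bin, with the interior bin boundaries serving as the split points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a numeric split description: the interior bin
// boundaries, in ascending order.
struct SplitPointsSketch
{
  std::vector<double> points;

  // Index of the child that a value falls into.
  std::size_t Direction(const double value) const
  {
    return static_cast<std::size_t>(
        std::upper_bound(points.begin(), points.end(), value) -
        points.begin());
  }
};

// counts[bin][class] holds the observations seen per bin and class;
// [minValue, maxValue] is the range that was divided into bins.
void SplitSketch(const std::vector<std::vector<std::size_t>>& counts,
                 const double minValue,
                 const double maxValue,
                 std::vector<std::size_t>& childMajorities,
                 SplitPointsSketch& splitInfo)
{
  assert(!counts.empty() && minValue <= maxValue);
  const std::size_t bins = counts.size();

  // The majority class of each child is the most frequent class in its bin.
  childMajorities.assign(bins, 0);
  for (std::size_t b = 0; b < bins; ++b)
  {
    const auto best = std::max_element(counts[b].begin(), counts[b].end());
    childMajorities[b] = static_cast<std::size_t>(best - counts[b].begin());
  }

  // The split object records the bins - 1 interior boundaries.
  const double width = (maxValue - minValue) / bins;
  splitInfo.points.clear();
  for (std::size_t i = 1; i < bins; ++i)
    splitInfo.points.push_back(minValue + i * width);
}
```

With three bins over the range 0 to 3, for example, the boundaries are 1 and 2, so a value of 1.5 is routed to the second child.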
| void Train (ObservationType value, const size_t label) |
Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point).
| value | Value in the dimension that this HoeffdingNumericSplit refers to. |
| label | Label of the given point. |