This class implements a generic decision tree learner.
Public Types
typedef CategoricalSplitType<FitnessFunction> CategoricalSplit
  Allow access to the categorical split type.
typedef DimensionSelectionType DimensionSelection
  Allow access to the dimension selection type.
typedef NumericSplitType<FitnessFunction> NumericSplit
  Allow access to the numeric split type.
Public Member Functions
DecisionTreeRegressor()
  Construct a decision tree without training it.
template<typename MatType, typename ResponsesType>
DecisionTreeRegressor(MatType data, const data::DatasetInfo& datasetInfo, ResponsesType responses, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType())
  Construct the decision tree on the given data and responses, where the data can be both numeric and categorical.
template<typename MatType, typename ResponsesType>
DecisionTreeRegressor(MatType data, ResponsesType responses, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType())
  Construct the decision tree on the given data and responses, assuming that all dimensions of the data are numeric.
template<typename MatType, typename ResponsesType, typename WeightsType>
DecisionTreeRegressor(MatType data, const data::DatasetInfo& datasetInfo, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
  Construct the decision tree on the given data and responses with weights, where the data can be both numeric and categorical.
template<typename MatType, typename ResponsesType, typename WeightsType>
DecisionTreeRegressor(MatType data, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
  Construct the decision tree on the given data and responses with weights, assuming that all dimensions of the data are numeric.
template<typename MatType, typename ResponsesType, typename WeightsType>
DecisionTreeRegressor(const DecisionTreeRegressor& other, MatType data, const data::DatasetInfo& datasetInfo, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
  Take ownership of another decision tree and train on the given data and responses with weights, where the data can be both numeric and categorical.
template<typename MatType, typename ResponsesType, typename WeightsType>
DecisionTreeRegressor(const DecisionTreeRegressor& other, MatType data, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
  Take ownership of another decision tree and train on the given data and responses with weights, assuming that all dimensions of the data are numeric.
DecisionTreeRegressor(const DecisionTreeRegressor& other)
  Copy another tree.
DecisionTreeRegressor(DecisionTreeRegressor&& other)
  Take ownership of another tree.
~DecisionTreeRegressor()
  Clean up memory.
template<typename VecType>
size_t CalculateDirection(const VecType& point) const
  Given a point, and assuming that this node is not a leaf, calculate the index of the child node that the point would be sent to.
const DecisionTreeRegressor& Child(const size_t i) const
  Get the child of the given index.
DecisionTreeRegressor& Child(const size_t i)
  Modify the child of the given index (be careful!).
size_t NumChildren() const
  Get the number of children.
size_t NumLeaves() const
  Get the number of leaves in the tree.
DecisionTreeRegressor& operator=(const DecisionTreeRegressor& other)
  Copy another tree.
DecisionTreeRegressor& operator=(DecisionTreeRegressor&& other)
  Take ownership of another tree.
template<typename VecType>
double Predict(const VecType& point) const
  Make a prediction for the given point, using the entire tree.
template<typename MatType>
void Predict(const MatType& data, arma::Row<double>& predictions) const
  Make predictions for the given points, using the entire tree.
template<typename Archive>
void serialize(Archive& ar, const uint32_t)
  Serialize the tree.
size_t SplitDimension() const
  Get the split dimension (only meaningful if this is a non-leaf in a trained tree).
template<typename MatType, typename ResponsesType>
double Train(MatType data, const data::DatasetInfo& datasetInfo, ResponsesType responses, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), FitnessFunction fitnessFunction = FitnessFunction())
  Train the decision tree on the given data.
template<typename MatType, typename ResponsesType>
double Train(MatType data, ResponsesType responses, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), FitnessFunction fitnessFunction = FitnessFunction())
  Train the decision tree on the given data, assuming that all dimensions are numeric.
template<typename MatType, typename ResponsesType, typename WeightsType>
double Train(MatType data, const data::DatasetInfo& datasetInfo, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), FitnessFunction fitnessFunction = FitnessFunction(), const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
  Train the decision tree on the given weighted data.
template<typename MatType, typename ResponsesType, typename WeightsType>
double Train(MatType data, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize = 10, const double minimumGainSplit = 1e-7, const size_t maximumDepth = 0, DimensionSelectionType dimensionSelector = DimensionSelectionType(), FitnessFunction fitnessFunction = FitnessFunction(), const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
  Train the decision tree on the given weighted data, assuming that all dimensions are numeric.
This class implements a generic decision tree learner.
Its behavior can be controlled via its template arguments.
The class inherits from its auxiliary split information types, so that when such a struct is empty it takes up no extra space in each node (the empty base optimization).
Definition at line 41 of file decision_tree_regressor.hpp.
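The space saving mentioned above comes from the empty base optimization. The following is a standalone illustration, not mlpack code; `EmptyAuxInfo`, `HoldsAsMember`, `InheritsFrom`, and the two overhead helpers are hypothetical names. Storing an empty struct as a data member costs at least one byte (plus alignment padding), whereas an empty base class subobject may share its address with the first data member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an auxiliary split information struct that
// carries no data (e.g. for a split policy that needs no extra bookkeeping).
struct EmptyAuxInfo {};

// Option 1: keep the auxiliary info as a data member. Every member object
// occupies at least one byte, so padding appears before the double.
struct HoldsAsMember
{
  EmptyAuxInfo aux;
  double splitValue;
};

// Option 2: inherit from the auxiliary info. The empty base subobject can
// share its address with splitValue (empty base optimization).
struct InheritsFrom : public EmptyAuxInfo
{
  double splitValue;
};

// Extra bytes each layout needs compared with a bare double.
constexpr std::size_t MemberOverhead()
{
  return sizeof(HoldsAsMember) - sizeof(double);
}

constexpr std::size_t BaseOverhead()
{
  return sizeof(InheritsFrom) - sizeof(double);
}
```

With one empty base and a standard-layout class, `BaseOverhead()` is 0 while `MemberOverhead()` is nonzero; the saving is multiplied by the number of nodes in a tree.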
typedef CategoricalSplitType<FitnessFunction> CategoricalSplit |
Allow access to the categorical split type.
Definition at line 49 of file decision_tree_regressor.hpp.
typedef DimensionSelectionType DimensionSelection |
Allow access to the dimension selection type.
Definition at line 51 of file decision_tree_regressor.hpp.
typedef NumericSplitType<FitnessFunction> NumericSplit |
Allow access to the numeric split type.
Definition at line 47 of file decision_tree_regressor.hpp.
DecisionTreeRegressor()
Construct a decision tree without training it.
The resulting tree is a single leaf node.
DecisionTreeRegressor(MatType data,
    const data::DatasetInfo& datasetInfo,
    ResponsesType responses,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType())
Construct the decision tree on the given data and responses, where the data can be both numeric and categorical.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data or responses are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  datasetInfo: Type information for each dimension of the dataset.
  responses: Responses for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
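One way to read how minimumLeafSize, minimumGainSplit, and maximumDepth interact is as a per-node stopping rule. The sketch below is a simplified, hypothetical version of such a rule; `CanTrySplit`, `AcceptSplit`, and `ChildMaximumDepth` are illustrative names, not mlpack functions. It assumes that a binary split needs at least minimumLeafSize points on each side, that a split must improve the node's fitness by more than minimumGainSplit, and that a maximumDepth of 0 means no depth limit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-node stopping rule; mlpack's internals may differ.

// A node can only be split in two if each child can receive at least
// minimumLeafSize points and the depth budget is not exhausted.
// A depth budget of 1 means "this node must be a leaf"; 0 means unlimited.
bool CanTrySplit(std::size_t numPoints,
                 std::size_t minimumLeafSize,
                 std::size_t maximumDepth)
{
  return maximumDepth != 1 && numPoints >= 2 * minimumLeafSize;
}

// A candidate split is kept only if it improves the fitness of the node
// by more than minimumGainSplit (higher fitness is better).
bool AcceptSplit(double parentFitness,
                 double splitFitness,
                 double minimumGainSplit)
{
  return splitFitness - parentFitness > minimumGainSplit;
}

// Depth budget handed to each child: 0 stays 0 (no limit); otherwise one
// level is consumed.
std::size_t ChildMaximumDepth(std::size_t maximumDepth)
{
  return maximumDepth == 0 ? 0 : maximumDepth - 1;
}
```

Raising minimumLeafSize or minimumGainSplit makes both predicates fail more often, which yields smaller trees; this is the underfitting end of the trade-off described above.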
DecisionTreeRegressor(MatType data,
    ResponsesType responses,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType())
Construct the decision tree on the given data and responses, assuming that all dimensions of the data are numeric.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data or responses are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  responses: Responses for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
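For numeric dimensions the tree has to choose a threshold. The sketch below finds the best binary threshold in one dimension by maximizing the reduction in total squared error of the responses. It is an illustration under stated assumptions, not mlpack's numeric split policy: it handles a single dimension and unweighted points, places the threshold at the midpoint between neighbouring sorted values, and `NumericSplitResult` and `BestNumericSplit` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: whether a valid split exists, where it is, and
// how much it reduces the total squared error.
struct NumericSplitResult
{
  bool found = false;     // false if no split satisfies minimumLeafSize
  double threshold = 0.0; // points with value <= threshold go left
  double gain = 0.0;      // parent error minus summed child error
};

// Sum of squared deviations from the mean, from running sums.
static double SquaredError(double sum, double sumSq, std::size_t n)
{
  return n == 0 ? 0.0 : sumSq - sum * sum / n;
}

NumericSplitResult BestNumericSplit(const std::vector<double>& values,
                                    const std::vector<double>& responses,
                                    std::size_t minimumLeafSize)
{
  assert(values.size() == responses.size());
  const std::size_t n = values.size();

  // Sort point indices by their value in this dimension.
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });

  double totalSum = 0.0, totalSq = 0.0;
  for (double r : responses) { totalSum += r; totalSq += r * r; }
  const double parentError = SquaredError(totalSum, totalSq, n);

  NumericSplitResult best;
  double leftSum = 0.0, leftSq = 0.0;
  // Try every boundary between consecutive sorted points.
  for (std::size_t i = 0; i + 1 < n; ++i)
  {
    const double r = responses[order[i]];
    leftSum += r;
    leftSq += r * r;
    const std::size_t leftN = i + 1, rightN = n - leftN;
    if (leftN < minimumLeafSize || rightN < minimumLeafSize)
      continue;
    // Equal values cannot be separated by a threshold.
    if (values[order[i]] == values[order[i + 1]])
      continue;
    const double childError = SquaredError(leftSum, leftSq, leftN) +
        SquaredError(totalSum - leftSum, totalSq - leftSq, rightN);
    const double gain = parentError - childError;
    if (!best.found || gain > best.gain)
    {
      best.found = true;
      best.gain = gain;
      best.threshold = (values[order[i]] + values[order[i + 1]]) / 2.0;
    }
  }
  return best;
}
```

For values {1, 2, 3, 10, 11, 12} with responses {1, 1, 1, 5, 5, 5} and minimumLeafSize = 1, the best threshold is 6.5, which separates the two constant groups completely.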
DecisionTreeRegressor(MatType data,
    const data::DatasetInfo& datasetInfo,
    ResponsesType responses,
    WeightsType weights,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
Construct the decision tree on the given data and responses with weights, where the data can be both numeric and categorical.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data, responses, or weights are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  datasetInfo: Type information for each dimension of the dataset.
  responses: Responses for each training point.
  weights: Weights for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
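With per-point weights, a natural prediction for a leaf is the weighted mean of the responses that reach it, since that constant minimizes the weighted squared error. The sketch below assumes that interpretation (one non-negative weight per training point, positive total weight); `WeightedLeafPrediction` is a hypothetical helper, and mlpack's exact leaf computation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: weighted mean of the responses,
// sum(w_i * y_i) / sum(w_i), which minimizes sum(w_i * (y_i - c)^2) over c.
double WeightedLeafPrediction(const std::vector<double>& responses,
                              const std::vector<double>& weights)
{
  assert(responses.size() == weights.size());
  double weightedSum = 0.0, totalWeight = 0.0;
  for (std::size_t i = 0; i < responses.size(); ++i)
  {
    weightedSum += weights[i] * responses[i];
    totalWeight += weights[i];
  }
  assert(totalWeight > 0.0);
  return weightedSum / totalWeight;
}
```

With equal weights this reduces to the ordinary mean; a weight of 0 removes a point's influence entirely.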
DecisionTreeRegressor(MatType data,
    ResponsesType responses,
    WeightsType weights,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
Construct the decision tree on the given data and responses with weights, assuming that all dimensions of the data are numeric.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data, responses, or weights are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  responses: Responses for each training point.
  weights: Weights for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
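The trailing `std::enable_if_t<arma::is_arma_type<...>>` parameter of the weighted overloads appears to exist to steer overload resolution. Without it, a call that passes an integer as the third argument (intending minimumLeafSize) would match the weighted template better, with WeightsType deduced as `int`. The sketch below reproduces the pattern without Armadillo: `is_weight_vector` is a hypothetical trait standing in for `arma::is_arma_type`, and the two `Fit` overloads are hypothetical stand-ins for the constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical trait standing in for arma::is_arma_type: true only for
// types that can hold one weight per point.
template<typename T>
struct is_weight_vector : std::false_type {};

template<>
struct is_weight_vector<std::vector<double>> : std::true_type {};

// Unweighted overload: the second argument is the minimum leaf size.
template<typename ResponsesType>
std::string Fit(const ResponsesType& /* responses */,
                const std::size_t minimumLeafSize = 10)
{
  return "unweighted, minimumLeafSize=" + std::to_string(minimumLeafSize);
}

// Weighted overload: removed from overload resolution (SFINAE) unless
// WeightsType, with references stripped, satisfies the trait.
template<typename ResponsesType, typename WeightsType>
std::string Fit(const ResponsesType& /* responses */,
                WeightsType weights,
                const std::size_t minimumLeafSize = 10,
                const std::enable_if_t<is_weight_vector<
                    typename std::remove_reference<WeightsType>::type>::value>* = 0)
{
  return "weighted, " + std::to_string(weights.size()) +
      " weights, minimumLeafSize=" + std::to_string(minimumLeafSize);
}
```

`Fit(y, 5)` selects the unweighted overload, because substituting `int` into the weighted overload's last parameter fails; `Fit(y, std::vector<double>{...})` selects the weighted one.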
DecisionTreeRegressor(const DecisionTreeRegressor<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion>& other,
    MatType data,
    const data::DatasetInfo& datasetInfo,
    ResponsesType responses,
    WeightsType weights,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
Take ownership of another decision tree and train on the given data and responses with weights, where the data can be both numeric and categorical.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data, responses, or weights are no longer needed, pass them with std::move to avoid copies.
Parameters:
  other: Tree to take ownership of.
  data: Dataset to train on.
  datasetInfo: Type information for each dimension of the dataset.
  responses: Responses for each training point.
  weights: Weights for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
DecisionTreeRegressor(const DecisionTreeRegressor<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion>& other,
    MatType data,
    ResponsesType responses,
    WeightsType weights,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
Take ownership of another decision tree and train on the given data and responses with weights, assuming that all dimensions of the data are numeric.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data, responses, or weights are no longer needed, pass them with std::move to avoid copies.
Parameters:
  other: Tree to take ownership of.
  data: Dataset to train on.
  responses: Responses for each training point.
  weights: Weights for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
DecisionTreeRegressor(const DecisionTreeRegressor<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion>& other)
Copy another tree.
This may use a lot of memory, so be sure that it is what you want to do.
Parameters:
  other: Tree to copy.
DecisionTreeRegressor(DecisionTreeRegressor<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion>&& other)
Take ownership of another tree.
Parameters:
  other: Tree to take ownership of.
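The difference between copying a tree and taking ownership of it can be sketched with a hypothetical `Node` type that owns its children through `std::unique_ptr`. That ownership scheme is an assumption made for this illustration; it is not how the documented class is declared. A copy must allocate a duplicate of every node in the subtree, which is why copying may use a lot of memory, while a move only transfers the child pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node that owns its children.
struct Node
{
  double prediction = 0.0;
  std::vector<std::unique_ptr<Node>> children;

  Node() = default;

  // Deep copy: allocates a new node for every node in the subtree.
  Node(const Node& other) : prediction(other.prediction)
  {
    for (const auto& child : other.children)
      children.push_back(std::make_unique<Node>(*child));
  }

  // Move: steals the children; nothing is allocated or copied.
  Node(Node&& other) noexcept = default;

  // Unified assignment: `other` is already a copy or a moved-from value,
  // so its contents can simply be moved in.
  Node& operator=(Node other) noexcept
  {
    prediction = other.prediction;
    children = std::move(other.children);
    return *this;
  }
};

// Number of nodes in the subtree rooted at n.
std::size_t CountNodes(const Node& n)
{
  std::size_t count = 1;
  for (const auto& child : n.children)
    count += CountNodes(*child);
  return count;
}
```

After `Node moved(std::move(root));`, `root` has no children left, whereas `Node copy(root);` leaves `root` intact and doubles the number of allocated nodes.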
~DecisionTreeRegressor()
Clean up memory.
size_t CalculateDirection(const VecType& point) const
Given a point, and assuming that this node is not a leaf, calculate the index of the child node that the point would be sent to.
This method is primarily used by the Predict() function, but it can also be used on its own.
Parameters:
  point: Point for which to compute the direction.
Referenced by DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::SplitDimension().
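For a binary numeric split, computing the direction amounts to comparing one coordinate of the point with a threshold; for a categorical split that creates one child per category, the category value itself can serve as the child index. The sketch below assumes both conventions, with categories encoded as 0, 1, ..., numChildren - 1. `SplitInfo` and its fields are hypothetical and are not the auxiliary split information types used by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of how a non-leaf node splits its points.
struct SplitInfo
{
  std::size_t splitDimension; // which coordinate of the point to inspect
  bool categorical;           // categorical or numeric split
  double threshold;           // numeric only: go to child 0 if value <= threshold
  std::size_t numChildren;    // number of children of this node
};

// Hypothetical stand-in for CalculateDirection() on a non-leaf node.
std::size_t CalculateDirection(const SplitInfo& split,
                               const std::vector<double>& point)
{
  assert(split.splitDimension < point.size());
  const double value = point[split.splitDimension];
  if (split.categorical)
  {
    // Assumed encoding: category c goes to child c.
    const std::size_t category = static_cast<std::size_t>(value);
    assert(category < split.numChildren);
    return category;
  }
  // Binary numeric split.
  return value <= split.threshold ? 0 : 1;
}
```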
const DecisionTreeRegressor& Child(const size_t i) const [inline]
Get the child of the given index.
Definition at line 424 of file decision_tree_regressor.hpp.
DecisionTreeRegressor& Child(const size_t i) [inline]
Modify the child of the given index (be careful!).
Definition at line 429 of file decision_tree_regressor.hpp.
size_t NumChildren() const [inline]
Get the number of children.
Definition at line 418 of file decision_tree_regressor.hpp.
size_t NumLeaves() const
Get the number of leaves in the tree.
Referenced by DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::NumChildren().
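Counting leaves is a simple recursion: a node without children is a leaf, and otherwise the count is the sum over its children. A sketch on a hypothetical `TreeNode` type (not the documented class), which stores its children by value for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree node; children are stored by value for simplicity.
struct TreeNode
{
  std::vector<TreeNode> children;
};

std::size_t NumLeaves(const TreeNode& node)
{
  if (node.children.empty())
    return 1; // a node without children is a leaf
  std::size_t leaves = 0;
  for (const TreeNode& child : node.children)
    leaves += NumLeaves(child);
  return leaves;
}
```

A root with two children, the first of which has three children of its own, has 4 leaves.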
DecisionTreeRegressor& operator=(const DecisionTreeRegressor<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion>& other)
Copy another tree.
This may use a lot of memory, so be sure that it is what you want to do.
Parameters:
  other: Tree to copy.
DecisionTreeRegressor& operator=(DecisionTreeRegressor<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion>&& other)
Take ownership of another tree.
Parameters:
  other: Tree to take ownership of.
double Predict(const VecType& point) const
Make a prediction for the given point, using the entire tree.
The predicted response is returned.
Parameters:
  point: Point to predict.
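Single-point prediction can be pictured as walking from the root to a leaf, following the child chosen at each internal node, and returning the value stored in that leaf. The sketch below assumes numeric binary splits only and a hypothetical `RegNode` layout; it illustrates the traversal and is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node layout: internal nodes have two children and a numeric
// threshold; leaves store the response to predict.
struct RegNode
{
  std::size_t splitDimension = 0;
  double threshold = 0.0;
  double prediction = 0.0; // meaningful only at leaves
  std::vector<std::unique_ptr<RegNode>> children;
};

double Predict(const RegNode& root, const std::vector<double>& point)
{
  const RegNode* node = &root;
  // Descend until a leaf (a node without children) is reached.
  while (!node->children.empty())
  {
    assert(node->splitDimension < point.size());
    const std::size_t direction =
        point[node->splitDimension] <= node->threshold ? 0 : 1;
    node = node->children[direction].get();
  }
  return node->prediction;
}
```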
void Predict(const MatType& data, arma::Row<double>& predictions) const
Make predictions for the given points, using the entire tree.
The predicted response for each point is stored in the given row vector.
Parameters:
  data: Set of points to predict.
  predictions: This will be filled with predictions for each point.
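In mlpack each column of a data matrix is one point, and Armadillo stores matrices in column-major order, so batch prediction is a loop over columns that applies the single-point routine. The sketch below mimics that layout with a plain `std::vector`; `ColumnMajorMatrix`, `PredictAll`, and the single-point predictor passed in are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical column-major matrix: element (row, col) is stored at
// data[col * rows + row], and each column is one point.
struct ColumnMajorMatrix
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> data;
};

// Fills predictions with one value per column (point), in the spirit of
// the Predict(data, predictions) overload.
void PredictAll(const ColumnMajorMatrix& X,
                const std::function<double(const std::vector<double>&)>& predictOne,
                std::vector<double>& predictions)
{
  assert(X.data.size() == X.rows * X.cols);
  predictions.assign(X.cols, 0.0);
  std::vector<double> point(X.rows);
  for (std::size_t c = 0; c < X.cols; ++c)
  {
    // Column c is contiguous in column-major storage.
    for (std::size_t r = 0; r < X.rows; ++r)
      point[r] = X.data[c * X.rows + r];
    predictions[c] = predictOne(point);
  }
}
```

Iterating over columns keeps memory access contiguous, which is the usual reason for treating columns, not rows, as points in a column-major library.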
void serialize(Archive& ar, const uint32_t)
Serialize the tree.
size_t SplitDimension() const [inline]
Get the split dimension (only meaningful if this is a non-leaf in a trained tree).
Definition at line 433 of file decision_tree_regressor.hpp.
References DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::CalculateDirection(), and DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::Train().
double Train(MatType data,
    const data::DatasetInfo& datasetInfo,
    ResponsesType responses,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    FitnessFunction fitnessFunction = FitnessFunction())
Train the decision tree on the given data.
This will overwrite the existing model. The data may contain both numeric and categorical dimensions, as specified by the datasetInfo parameter. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data or responses are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  datasetInfo: Type information for each dimension.
  responses: Responses for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
  fitnessFunction: Instantiated fitness function, used to evaluate the fitness score of each candidate split of a node.
Referenced by DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::SplitDimension().
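When datasetInfo marks a dimension as categorical, one possible split sends each category to its own child. The sketch below scores such a split by the reduction in total squared error of the responses, with each child predicting its category's mean. It is a simplified, hypothetical stand-in: points are unweighted, categories are assumed to be encoded as 0, ..., numCategories - 1, and `CategoricalSplitGain` is not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduction in total squared error obtained by giving every category its
// own child whose prediction is the mean response of that category.
double CategoricalSplitGain(const std::vector<std::size_t>& categories,
                            const std::vector<double>& responses,
                            std::size_t numCategories)
{
  assert(categories.size() == responses.size());
  std::vector<double> sum(numCategories, 0.0), sumSq(numCategories, 0.0);
  std::vector<std::size_t> count(numCategories, 0);
  double totalSum = 0.0, totalSq = 0.0;
  for (std::size_t i = 0; i < responses.size(); ++i)
  {
    const std::size_t c = categories[i];
    assert(c < numCategories);
    sum[c] += responses[i];
    sumSq[c] += responses[i] * responses[i];
    ++count[c];
    totalSum += responses[i];
    totalSq += responses[i] * responses[i];
  }
  // Squared error around the mean: sumSq - sum^2 / n.
  auto sse = [](double s, double sq, std::size_t n) {
    return n == 0 ? 0.0 : sq - s * s / n;
  };
  double childError = 0.0;
  for (std::size_t c = 0; c < numCategories; ++c)
    childError += sse(sum[c], sumSq[c], count[c]);
  return sse(totalSum, totalSq, responses.size()) - childError;
}
```

A category that explains nothing about the responses yields a gain of 0, so a positive minimumGainSplit would reject that split.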
double Train(MatType data,
    ResponsesType responses,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    FitnessFunction fitnessFunction = FitnessFunction())
Train the decision tree on the given data, assuming that all dimensions are numeric.
This will overwrite the existing model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data or responses are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  responses: Responses for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
  fitnessFunction: Instantiated fitness function, used to evaluate the fitness score of each candidate split of a node.
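The fitness function is supplied through the FitnessFunction template parameter. A common choice for regression, assumed here, scores a node by the negative mean squared error of its responses and scores a split by the children's fitness weighted by their share of the points; the gain of a split is then the split fitness minus the parent fitness. `MSEFitness` and `SplitFitness` are hypothetical names, not the interface of any library fitness class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fitness of a node: negative mean squared error of its responses around
// their mean. Higher is better; a node with constant responses scores 0.
double MSEFitness(const std::vector<double>& responses)
{
  if (responses.empty())
    return 0.0;
  double mean = 0.0;
  for (double r : responses)
    mean += r;
  mean /= responses.size();
  double mse = 0.0;
  for (double r : responses)
    mse += (r - mean) * (r - mean);
  return -mse / responses.size();
}

// Fitness of a split: children's fitness weighted by their share of points.
double SplitFitness(const std::vector<std::vector<double>>& children)
{
  std::size_t total = 0;
  for (const auto& child : children)
    total += child.size();
  assert(total > 0);
  double fitness = 0.0;
  for (const auto& child : children)
    fitness += MSEFitness(child) * child.size() / total;
  return fitness;
}
```

For a parent with responses {1, 1, 3, 3}, the fitness is -1; splitting it into {1, 1} and {3, 3} gives a split fitness of 0, so the gain is 1, well above the default minimumGainSplit of 1e-7.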
double Train(MatType data,
    const data::DatasetInfo& datasetInfo,
    ResponsesType responses,
    WeightsType weights,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    FitnessFunction fitnessFunction = FitnessFunction(),
    const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
Train the decision tree on the given weighted data.
This will overwrite the existing model. The data may contain both numeric and categorical dimensions, as specified by the datasetInfo parameter. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data, responses, or weights are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  datasetInfo: Type information for each dimension.
  responses: Responses for each training point.
  weights: Weights for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
  fitnessFunction: Instantiated fitness function, used to evaluate the fitness score of each candidate split of a node.
double Train(MatType data,
    ResponsesType responses,
    WeightsType weights,
    const size_t minimumLeafSize = 10,
    const double minimumGainSplit = 1e-7,
    const size_t maximumDepth = 0,
    DimensionSelectionType dimensionSelector = DimensionSelectionType(),
    FitnessFunction fitnessFunction = FitnessFunction(),
    const std::enable_if_t<arma::is_arma_type<typename std::remove_reference<WeightsType>::type>::value>* = 0)
Train the decision tree on the given weighted data, assuming that all dimensions are numeric.
This will overwrite the existing model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
If data, responses, or weights are no longer needed, pass them with std::move to avoid copies.
Parameters:
  data: Dataset to train on.
  responses: Responses for each training point.
  weights: Weights for each training point.
  minimumLeafSize: Minimum number of points in each leaf node.
  minimumGainSplit: Minimum gain for the node to split.
  maximumDepth: Maximum depth for the tree (0 means no limit).
  dimensionSelector: Instantiated dimension selection policy.
  fitnessFunction: Instantiated fitness function, used to evaluate the fitness score of each candidate split of a node.