DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > Class Template Reference

This class implements a generic decision tree learner.


Public Types

typedef CategoricalSplitType< FitnessFunction > CategoricalSplit
 Allow access to the categorical split type.

typedef DimensionSelectionType DimensionSelection
 Allow access to the dimension selection type.

typedef NumericSplitType< FitnessFunction > NumericSplit
 Allow access to the numeric split type.

 

Public Member Functions

 DecisionTreeRegressor ()
 Construct a decision tree without training it.

 
template<typename MatType, typename ResponsesType>
 DecisionTreeRegressor (MatType data, const data::DatasetInfo &datasetInfo, ResponsesType responses, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Construct the decision tree on the given data and responses, where the data can be both numeric and categorical.

 
template<typename MatType, typename ResponsesType>
 DecisionTreeRegressor (MatType data, ResponsesType responses, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Construct the decision tree on the given data and responses, assuming that the data is all of the numeric type.

 
template<typename MatType, typename ResponsesType, typename WeightsType>
 DecisionTreeRegressor (MatType data, const data::DatasetInfo &datasetInfo, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *=0)
 Construct the decision tree on the given data and responses with weights, where the data can be both numeric and categorical.

 
template<typename MatType, typename ResponsesType, typename WeightsType>
 DecisionTreeRegressor (MatType data, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *=0)
 Construct the decision tree on the given data and responses with weights, assuming that the data is all of the numeric type.

 
template<typename MatType, typename ResponsesType, typename WeightsType>
 DecisionTreeRegressor (const DecisionTreeRegressor &other, MatType data, const data::DatasetInfo &datasetInfo, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *=0)
 Take ownership of another decision tree and train on the given data and responses with weights, where the data can be both numeric and categorical.

 
template<typename MatType, typename ResponsesType, typename WeightsType>
 DecisionTreeRegressor (const DecisionTreeRegressor &other, MatType data, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *=0)
 Take ownership of another decision tree and train on the given data and responses with weights, assuming that the data is all of the numeric type.

 
 DecisionTreeRegressor (const DecisionTreeRegressor &other)
 Copy another tree.

 DecisionTreeRegressor (DecisionTreeRegressor &&other)
 Take ownership of another tree.

 ~DecisionTreeRegressor ()
 Clean up memory.

 
template<typename VecType>
size_t CalculateDirection (const VecType &point) const
 Given a point and that this node is not a leaf, calculate the index of the child node this point would go towards.

 
const DecisionTreeRegressor & Child (const size_t i) const
 Get the child of the given index.

DecisionTreeRegressor & Child (const size_t i)
 Modify the child of the given index (be careful!).

size_t NumChildren () const
 Get the number of children.

size_t NumLeaves () const
 Get the number of leaves in the tree.

DecisionTreeRegressor & operator= (const DecisionTreeRegressor &other)
 Copy another tree.

DecisionTreeRegressor & operator= (DecisionTreeRegressor &&other)
 Take ownership of another tree.

 
template<typename VecType>
double Predict (const VecType &point) const
 Make prediction for the given point, using the entire tree.

 
template<typename MatType>
void Predict (const MatType &data, arma::Row< double > &predictions) const
 Make prediction for the given points, using the entire tree.

 
template<typename Archive>
void serialize (Archive &ar, const uint32_t)
 Serialize the tree.

 
size_t SplitDimension () const
 Get the split dimension (only meaningful if this is a non-leaf in a trained tree).

 
template<typename MatType, typename ResponsesType>
double Train (MatType data, const data::DatasetInfo &datasetInfo, ResponsesType responses, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), FitnessFunction fitnessFunction=FitnessFunction())
 Train the decision tree on the given data.

 
template<typename MatType, typename ResponsesType>
double Train (MatType data, ResponsesType responses, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), FitnessFunction fitnessFunction=FitnessFunction())
 Train the decision tree on the given data, assuming that all dimensions are numeric.

 
template<typename MatType, typename ResponsesType, typename WeightsType>
double Train (MatType data, const data::DatasetInfo &datasetInfo, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), FitnessFunction fitnessFunction=FitnessFunction(), const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *=0)
 Train the decision tree on the given weighted data.

 
template<typename MatType, typename ResponsesType, typename WeightsType>
double Train (MatType data, ResponsesType responses, WeightsType weights, const size_t minimumLeafSize=10, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType(), FitnessFunction fitnessFunction=FitnessFunction(), const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *=0)
 Train the decision tree on the given weighted data, assuming that all dimensions are numeric.

 

Detailed Description


template<typename FitnessFunction = MSEGain, template< typename > class NumericSplitType = BestBinaryNumericSplit, template< typename > class CategoricalSplitType = AllCategoricalSplit, typename DimensionSelectionType = AllDimensionSelect, bool NoRecursion = false>
class mlpack::tree::DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >

This class implements a generic decision tree learner.

Its behavior can be controlled via its template arguments.

The class inherits from the auxiliary split information in order to prevent an empty auxiliary split information struct from taking any extra size.
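
The size argument above relies on the empty base optimization. The following is a minimal sketch of that effect, not mlpack code: the types EmptyAux, NodeWithMember and NodeWithBase are hypothetical stand-ins for an auxiliary split information struct that carries no data and for a tree node that uses it.

```cpp
#include <cstddef>

// Hypothetical stand-in for auxiliary split information that happens
// to carry no data.
struct EmptyAux {};

// Storing the empty struct as a member costs at least one byte, and
// alignment padding usually rounds that up further.
struct NodeWithMember
{
  double splitValue;
  EmptyAux aux;
};

// Inheriting from the empty struct lets the compiler apply the empty
// base optimization, so the base adds no storage.
struct NodeWithBase : public EmptyAux
{
  double splitValue;
};

static_assert(sizeof(NodeWithBase) == sizeof(double),
    "the empty base should not add to the object size");
static_assert(sizeof(NodeWithMember) > sizeof(NodeWithBase),
    "an empty member still occupies storage");
```

Because a tree holds one such object per node, saving a few bytes per node can matter for large trees.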

Definition at line 41 of file decision_tree_regressor.hpp.

Member Typedef Documentation

◆ CategoricalSplit

typedef CategoricalSplitType<FitnessFunction> CategoricalSplit

Allow access to the categorical split type.

Definition at line 49 of file decision_tree_regressor.hpp.

◆ DimensionSelection

typedef DimensionSelectionType DimensionSelection

Allow access to the dimension selection type.

Definition at line 51 of file decision_tree_regressor.hpp.

◆ NumericSplit

typedef NumericSplitType<FitnessFunction> NumericSplit

Allow access to the numeric split type.

Definition at line 47 of file decision_tree_regressor.hpp.

Constructor & Destructor Documentation

◆ DecisionTreeRegressor() [1/9]

Construct a decision tree without training it.

It will be a leaf node.

◆ DecisionTreeRegressor() [2/9]

DecisionTreeRegressor ( MatType  data,
const data::DatasetInfo &  datasetInfo,
ResponsesType  responses,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Construct the decision tree on the given data and responses, where the data can be both numeric and categorical.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data or responses are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
datasetInfo        Type information for each dimension of the dataset.
responses          Responses for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
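
To make the roles of minimumLeafSize and minimumGainSplit concrete, here is a simplified sketch of how a binary numeric split could be accepted or rejected using a mean-squared-error gain. It is an illustration under stated assumptions, not mlpack's implementation: SplitResult, Mse and BestSplit are hypothetical names, and the responses are assumed to be already ordered by the feature being split on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not an mlpack type).
struct SplitResult
{
  bool found;        // True if an acceptable split exists.
  std::size_t index; // First index that goes to the right child.
  double gain;       // Reduction in mean squared error.
};

// Mean squared error of responses[begin, end) around their own mean.
double Mse(const std::vector<double>& responses,
           const std::size_t begin,
           const std::size_t end)
{
  assert(begin < end && end <= responses.size());
  double mean = 0.0;
  for (std::size_t i = begin; i < end; ++i)
    mean += responses[i];
  mean /= double(end - begin);

  double mse = 0.0;
  for (std::size_t i = begin; i < end; ++i)
    mse += (responses[i] - mean) * (responses[i] - mean);
  return mse / double(end - begin);
}

// Try every split position that leaves at least minimumLeafSize points
// on each side, and keep the best one whose gain exceeds
// minimumGainSplit.
SplitResult BestSplit(const std::vector<double>& responses,
                      const std::size_t minimumLeafSize,
                      const double minimumGainSplit)
{
  SplitResult best = { false, 0, 0.0 };
  const std::size_t n = responses.size();
  if (minimumLeafSize == 0 || n < 2 * minimumLeafSize)
    return best;

  const double parentMse = Mse(responses, 0, n);
  for (std::size_t i = minimumLeafSize; i + minimumLeafSize <= n; ++i)
  {
    // Size-weighted average of the two child errors.
    const double childMse = (double(i) * Mse(responses, 0, i) +
        double(n - i) * Mse(responses, i, n)) / double(n);
    const double gain = parentMse - childMse;
    if (gain > minimumGainSplit && gain > best.gain)
      best = { true, i, gain };
  }
  return best;
}
```

In this sketch, a larger minimumLeafSize removes candidate split positions and a larger minimumGainSplit rejects weak splits, so both push toward a smaller tree.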

◆ DecisionTreeRegressor() [3/9]

DecisionTreeRegressor ( MatType  data,
ResponsesType  responses,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Construct the decision tree on the given data and responses, assuming that the data is all of the numeric type.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data or responses are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
responses          Responses for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.

◆ DecisionTreeRegressor() [4/9]

DecisionTreeRegressor ( MatType  data,
const data::DatasetInfo &  datasetInfo,
ResponsesType  responses,
WeightsType  weights,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *  = 0 
)

Construct the decision tree on the given data and responses with weights, where the data can be both numeric and categorical.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data, responses or weights are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
datasetInfo        Type information for each dimension of the dataset.
responses          Responses for each training point.
weights            Weights for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
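
The trailing unnamed std::enable_if_t parameter keeps the weighted overloads from capturing calls meant for the unweighted ones, since an integer passed as minimumLeafSize would otherwise deduce as WeightsType. The sketch below reproduces that pattern with standard-library types only. IsVectorLike is a hypothetical stand-in for arma::is_arma_type, and TrainSketch is an illustrative function, not part of mlpack. It uses the C++11 spelling typename std::enable_if<...>::type, which is equivalent to std::enable_if_t.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for arma::is_arma_type: true only for std::vector.
template<typename T>
struct IsVectorLike : std::false_type {};
template<typename T>
struct IsVectorLike<std::vector<T>> : std::true_type {};

// Unweighted overload: the third argument is the minimum leaf size.
// Returns 0 to mark that this overload was chosen.
template<typename MatType, typename ResponsesType>
int TrainSketch(MatType /* data */,
                ResponsesType /* responses */,
                const std::size_t /* minimumLeafSize */ = 10)
{
  return 0;
}

// Weighted overload: it only takes part in overload resolution when
// WeightsType is vector-like, so TrainSketch(data, responses, 5) cannot
// deduce WeightsType = int and land here. Returns 1 when chosen.
template<typename MatType, typename ResponsesType, typename WeightsType>
int TrainSketch(MatType /* data */,
                ResponsesType /* responses */,
                WeightsType /* weights */,
                const std::size_t /* minimumLeafSize */ = 10,
                const typename std::enable_if<IsVectorLike<
                    typename std::remove_reference<WeightsType>::type
                >::value>::type* = 0)
{
  return 1;
}
```

Without the enable_if guard, the call with an integer third argument would prefer the three-template-parameter overload, because it would be an exact match rather than an int to size_t conversion.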

◆ DecisionTreeRegressor() [5/9]

DecisionTreeRegressor ( MatType  data,
ResponsesType  responses,
WeightsType  weights,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *  = 0 
)

Construct the decision tree on the given data and responses with weights, assuming that the data is all of the numeric type.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data, responses or weights are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
responses          Responses for each training point.
weights            Weights for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.

◆ DecisionTreeRegressor() [6/9]

DecisionTreeRegressor ( const DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > &  other,
MatType  data,
const data::DatasetInfo &  datasetInfo,
ResponsesType  responses,
WeightsType  weights,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *  = 0 
)

Take ownership of another decision tree and train on the given data and responses with weights, where the data can be both numeric and categorical.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data, responses or weights are no longer needed to avoid copies.

Parameters
other              Tree to take ownership of.
data               Dataset to train on.
datasetInfo        Type information for each dimension of the dataset.
responses          Responses for each training point.
weights            Weights for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.

◆ DecisionTreeRegressor() [7/9]

DecisionTreeRegressor ( const DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > &  other,
MatType  data,
ResponsesType  responses,
WeightsType  weights,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *  = 0 
)

Take ownership of another decision tree and train on the given data and responses with weights, assuming that the data is all of the numeric type.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data, responses or weights are no longer needed to avoid copies.

Parameters
other              Tree to take ownership of.
data               Dataset to train on.
responses          Responses for each training point.
weights            Weights for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.

◆ DecisionTreeRegressor() [8/9]

DecisionTreeRegressor ( const DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > &  other)

Copy another tree.

This may use a lot of memory—be sure that it's what you want to do.

Parameters
other  Tree to copy.

◆ DecisionTreeRegressor() [9/9]

DecisionTreeRegressor ( DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > &&  other)

Take ownership of another tree.

Parameters
other  Tree to take ownership of.

◆ ~DecisionTreeRegressor()

Clean up memory.

Member Function Documentation

◆ CalculateDirection()

size_t CalculateDirection ( const VecType &  point) const

Given a point and that this node is not a leaf, calculate the index of the child node this point would go towards.

This method is primarily used by the Predict() function, but it can be used in a standalone sense too.

Parameters
point  Point to predict.

Referenced by DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::SplitDimension().
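
The relationship between CalculateDirection(), Predict() and NumLeaves() can be sketched with a toy binary tree. This is a simplified illustration, not mlpack's node layout: the Node struct, its fields, the "less than or equal goes left" rule, and the MakeStump helper are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical binary regression tree node; mlpack's real node stores
// its split information differently.
struct Node
{
  std::size_t splitDimension = 0;  // Only meaningful for non-leaves.
  double splitValue = 0.0;         // Threshold of the numeric split.
  double prediction = 0.0;         // Only meaningful for leaves.
  std::vector<std::unique_ptr<Node>> children;

  std::size_t NumChildren() const { return children.size(); }

  // Index of the child that the point descends into. Must only be
  // called on a non-leaf node.
  std::size_t CalculateDirection(const std::vector<double>& point) const
  {
    assert(NumChildren() == 2);
    return (point[splitDimension] <= splitValue) ? 0 : 1;
  }

  // Follow CalculateDirection() from this node until a leaf is reached.
  double Predict(const std::vector<double>& point) const
  {
    const Node* node = this;
    while (node->NumChildren() > 0)
      node = node->children[node->CalculateDirection(point)].get();
    return node->prediction;
  }

  // Count leaves recursively; a node without children is one leaf.
  std::size_t NumLeaves() const
  {
    if (children.empty())
      return 1;
    std::size_t leaves = 0;
    for (const std::unique_ptr<Node>& child : children)
      leaves += child->NumLeaves();
    return leaves;
  }
};

// Build a one-split tree: point[dim] <= value predicts low, otherwise high.
std::unique_ptr<Node> MakeStump(const std::size_t dim, const double value,
                                const double low, const double high)
{
  std::unique_ptr<Node> root(new Node());
  root->splitDimension = dim;
  root->splitValue = value;
  root->children.push_back(std::unique_ptr<Node>(new Node()));
  root->children.push_back(std::unique_ptr<Node>(new Node()));
  root->children[0]->prediction = low;
  root->children[1]->prediction = high;
  return root;
}
```

Calling CalculateDirection() directly is useful when only the next step of the descent is needed, for example when inspecting which subtree a point falls into.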

◆ Child() [1/2]

const DecisionTreeRegressor& Child ( const size_t  i) const
inline

Get the child of the given index.

Definition at line 424 of file decision_tree_regressor.hpp.

◆ Child() [2/2]

DecisionTreeRegressor& Child ( const size_t  i)
inline

Modify the child of the given index (be careful!).

Definition at line 429 of file decision_tree_regressor.hpp.

◆ NumChildren()

size_t NumChildren ( ) const
inline

Get the number of children.

◆ NumLeaves()

size_t NumLeaves ( ) const

Get the number of leaves in the tree.

◆ operator=() [1/2]

DecisionTreeRegressor& operator= ( const DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > &  other)

Copy another tree.

This may use a lot of memory—be sure that it's what you want to do.

Parameters
other  Tree to copy.

◆ operator=() [2/2]

DecisionTreeRegressor& operator= ( DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion > &&  other)

Take ownership of another tree.

Parameters
other  Tree to take ownership of.

◆ Predict() [1/2]

double Predict ( const VecType &  point) const

Make prediction for the given point, using the entire tree.

The predicted response is returned.

Parameters
point  Point to predict.

◆ Predict() [2/2]

void Predict ( const MatType &  data,
arma::Row< double > &  predictions 
) const

Make prediction for the given points, using the entire tree.

The predicted responses for each point are stored in the given vector.

Parameters
data         Set of points to predict.
predictions  This will be filled with predictions for each point.

◆ serialize()

void serialize ( Archive &  ar,
const uint32_t   
)

Serialize the tree.

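◆ SplitDimension()

size_t SplitDimension ( ) const

Get the split dimension (only meaningful if this is a non-leaf in a trained tree).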

◆ Train() [1/4]

double Train ( MatType  data,
const data::DatasetInfo &  datasetInfo,
ResponsesType  responses,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
FitnessFunction  fitnessFunction = FitnessFunction() 
)

Train the decision tree on the given data.

This will overwrite the existing model. The data may have numeric and categorical types, specified by the datasetInfo parameter. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data or responses are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
datasetInfo        Type information for each dimension.
responses          Responses for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
fitnessFunction    Instantiated fitness function, used to evaluate the fitness score for splitting each node.
Returns
The final entropy of the decision tree.

Referenced by DecisionTreeRegressor< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType, NoRecursion >::SplitDimension().

◆ Train() [2/4]

double Train ( MatType  data,
ResponsesType  responses,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
FitnessFunction  fitnessFunction = FitnessFunction() 
)

Train the decision tree on the given data, assuming that all dimensions are numeric.

This will overwrite the existing model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data or responses are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
responses          Responses for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
fitnessFunction    Instantiated fitness function, used to evaluate the fitness score for splitting each node.
Returns
The final entropy of the decision tree.

◆ Train() [3/4]

double Train ( MatType  data,
const data::DatasetInfo &  datasetInfo,
ResponsesType  responses,
WeightsType  weights,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
FitnessFunction  fitnessFunction = FitnessFunction(),
const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *  = 0 
)

Train the decision tree on the given weighted data.

This will overwrite the existing model. The data may have numeric and categorical types, specified by the datasetInfo parameter. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data, responses or weights are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
datasetInfo        Type information for each dimension.
responses          Responses for each training point.
weights            Weights for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
fitnessFunction    Instantiated fitness function, used to evaluate the fitness score for splitting each node.
Returns
The final entropy of the decision tree.
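
As a rough illustration of what training on weighted data means for a regression tree, the sketch below computes a weighted mean (a natural leaf prediction) and the weighted mean squared error around it. This is an assumption-laden example rather than mlpack's actual computation: WeightedMean and WeightedMse are hypothetical helpers, and the weights are assumed to be non-negative with one weight per training point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted mean of the responses: points with larger weights pull the
// value toward their own response.
double WeightedMean(const std::vector<double>& responses,
                    const std::vector<double>& weights)
{
  assert(responses.size() == weights.size());
  double weightedSum = 0.0, totalWeight = 0.0;
  for (std::size_t i = 0; i < responses.size(); ++i)
  {
    weightedSum += weights[i] * responses[i];
    totalWeight += weights[i];
  }
  assert(totalWeight > 0.0);
  return weightedSum / totalWeight;
}

// Weighted mean squared error of the responses around their weighted mean.
double WeightedMse(const std::vector<double>& responses,
                   const std::vector<double>& weights)
{
  const double mean = WeightedMean(responses, weights);
  double weightedError = 0.0, totalWeight = 0.0;
  for (std::size_t i = 0; i < responses.size(); ++i)
  {
    const double diff = responses[i] - mean;
    weightedError += weights[i] * diff * diff;
    totalWeight += weights[i];
  }
  return weightedError / totalWeight;
}
```

With all weights equal, both helpers reduce to the ordinary mean and mean squared error, which matches the unweighted overloads conceptually.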

◆ Train() [4/4]

double Train ( MatType  data,
ResponsesType  responses,
WeightsType  weights,
const size_t  minimumLeafSize = 10,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType(),
FitnessFunction  fitnessFunction = FitnessFunction(),
const std::enable_if_t< arma::is_arma_type< typename std::remove_reference< WeightsType >::type >::value > *  = 0 
)

Train the decision tree on the given weighted data, assuming that all dimensions are numeric.

This will overwrite the existing model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.

Use std::move if data, responses or weights are no longer needed to avoid copies.

Parameters
data               Dataset to train on.
responses          Responses for each training point.
weights            Weights for each training point.
minimumLeafSize    Minimum number of points in each leaf node.
minimumGainSplit   Minimum gain for the node to split.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
fitnessFunction    Instantiated fitness function, used to evaluate the fitness score for splitting each node.
Returns
The final entropy of the decision tree.

The documentation for this class was generated from the following file: