The RandomForest class provides an implementation of random forests, as described in Breiman's seminal paper.
Public Types | |
typedef DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType > | DecisionTreeType |
Allow access to the underlying decision tree type. | |
Public Member Functions | |
RandomForest () | |
Construct the random forest without any training or specifying the number of trees. | |
template < typename MatType > | |
RandomForest (const MatType &dataset, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType()) | |
Create a random forest, training on the given labeled training data with the given number of trees. | |
template < typename MatType > | |
RandomForest (const MatType &dataset, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType()) | |
Create a random forest, training on the given labeled training data with the given dataset info and the given number of trees. | |
template < typename MatType > | |
RandomForest (const MatType &dataset, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType()) | |
Create a random forest, training on the given weighted labeled training data with the given number of trees. | |
template < typename MatType > | |
RandomForest (const MatType &dataset, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType()) | |
Create a random forest, training on the given weighted labeled training data with the given dataset info and the given number of trees. | |
template < typename VecType > | |
size_t | Classify (const VecType &point) const |
Predict the class of the given point. | |
template < typename VecType > | |
void | Classify (const VecType &point, size_t &prediction, arma::vec &probabilities) const |
Predict the class of the given point and return the predicted class probabilities for each class. | |
template < typename MatType > | |
void | Classify (const MatType &data, arma::Row< size_t > &predictions) const |
Predict the classes of each point in the given dataset. | |
template < typename MatType > | |
void | Classify (const MatType &data, arma::Row< size_t > &predictions, arma::mat &probabilities) const |
Predict the classes of each point in the given dataset, also returning the predicted class probabilities for each point. | |
size_t | NumTrees () const |
Get the number of trees in the forest. | |
template < typename Archive > | |
void | serialize (Archive &ar, const uint32_t) |
Serialize the random forest. | |
template < typename MatType > | |
double | Train (const MatType &data, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType()) |
Train the random forest on the given labeled training data with the given number of trees. | |
template < typename MatType > | |
double | Train (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType()) |
Train the random forest on the given labeled training data with the given dataset info and the given number of trees. | |
template < typename MatType > | |
double | Train (const MatType &data, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType()) |
Train the random forest on the given weighted labeled training data with the given number of trees. | |
template < typename MatType > | |
double | Train (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType()) |
Train the random forest on the given weighted labeled training data with the given dataset info and the given number of trees. | |
const DecisionTreeType & | Tree (const size_t i) const |
Access a tree in the forest. | |
DecisionTreeType & | Tree (const size_t i) |
Modify a tree in the forest (be careful!). | |
The RandomForest class provides an implementation of random forests, as described in Breiman's seminal paper.
Definition at line 44 of file random_forest.hpp.
typedef DecisionTree<FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType> DecisionTreeType |
Allow access to the underlying decision tree type.
Definition at line 49 of file random_forest.hpp.
RandomForest | ( | ) |
Construct the random forest without any training or specifying the number of trees.
Classify() will throw an exception until Train() is called.
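The untrained-forest behavior described above can be illustrated with a minimal standalone sketch. This is not mlpack code: the `GuardedForest` class, its members, and the choice of `std::invalid_argument` are assumptions made for illustration, and the exception type actually thrown by RandomForest is not stated here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a forest; each "tree" is reduced to the single
// class label it always predicts.
class GuardedForest
{
 public:
  // Default construction leaves the forest empty (untrained).
  GuardedForest() = default;

  // "Training" here just stores one constant-label tree per entry.
  void Train(const std::vector<std::size_t>& treeLabels)
  {
    trees = treeLabels;
  }

  std::size_t NumTrees() const { return trees.size(); }

  // Refuse to predict while no trees exist; otherwise return the label of
  // the first tree (a deliberately trivial prediction rule).
  std::size_t Classify() const
  {
    if (trees.empty())
      throw std::invalid_argument("GuardedForest::Classify(): not trained");
    return trees.front();
  }

 private:
  std::vector<std::size_t> trees;
};
```

A caller that default-constructs such an object would be expected to call Train() before any Classify() call, or to catch the exception.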
RandomForest | ( | const MatType & | dataset, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Create a random forest, training on the given labeled training data with the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.
dataset | Dataset to train on. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
dimensionSelector | Instantiated dimension selection policy. |
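As a rough illustration of how the per-tree parameters above could gate tree growth, here is a standalone sketch. The `MaySplit` helper is hypothetical, and its rules are assumptions not confirmed by this page: each child must keep at least minimumLeafSize points, the gain must exceed minimumGainSplit, and a maximumDepth of 0 is treated as "no depth limit".

```cpp
#include <cstddef>

// Hypothetical helper: decides whether a node may be split further.
// Assumed rules (not taken from the library): every child must hold at least
// minimumLeafSize points, the gain must exceed minimumGainSplit, and
// maximumDepth == 0 disables the depth limit.
bool MaySplit(const std::size_t leftCount,
              const std::size_t rightCount,
              const double gain,
              const std::size_t depth,
              const std::size_t minimumLeafSize = 1,
              const double minimumGainSplit = 1e-7,
              const std::size_t maximumDepth = 0)
{
  if (maximumDepth != 0 && depth >= maximumDepth)
    return false;  // Depth limit reached.
  if (leftCount < minimumLeafSize || rightCount < minimumLeafSize)
    return false;  // A child would be too small.
  return gain > minimumGainSplit;  // Require a meaningful improvement.
}
```

Larger values of minimumLeafSize or minimumGainSplit make this check fail sooner, which yields shallower trees.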
RandomForest | ( | const MatType & | dataset, |
const data::DatasetInfo & | datasetInfo, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Create a random forest, training on the given labeled training data with the given dataset info and the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This constructor can be used to train on categorical data.
dataset | Dataset to train on. |
datasetInfo | Dimension info for the dataset. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
dimensionSelector | Instantiated dimension selection policy. |
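To make the role of per-dimension information concrete, the following standalone sketch shows one way a tree builder could choose between a numeric and a categorical split for a given dimension. `DimensionKind`, `DimensionInfo`, and `SplitKindFor` are hypothetical names introduced for illustration; they are not the interface of data::DatasetInfo.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for per-dimension dataset information.
enum class DimensionKind { Numeric, Categorical };

struct DimensionInfo
{
  DimensionKind kind;
  std::size_t numCategories;  // Only meaningful for categorical dimensions.
};

// Name the kind of split a tree would use for one dimension.
// Throws std::out_of_range (via at()) if the dimension does not exist.
std::string SplitKindFor(const std::vector<DimensionInfo>& info,
                         const std::size_t dimension)
{
  return (info.at(dimension).kind == DimensionKind::Categorical)
      ? "categorical" : "numeric";
}
```

In this sketch, a dataset with one numeric and one categorical dimension would be described by a two-element vector, and each dimension is dispatched independently.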
RandomForest | ( | const MatType & | dataset, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const arma::rowvec & | weights, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Create a random forest, training on the given weighted labeled training data with the given number of trees.
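The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.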
dataset | Dataset to train on. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
weights | Weights (importances) of each point in the dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
dimensionSelector | Instantiated dimension selection policy. |
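The effect of per-point weights can be sketched with a standalone example that picks the class with the largest total weight rather than the largest count. `WeightedMajority` is a hypothetical helper, and this page does not confirm that the library uses weights in exactly this way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the class with the largest total weight.
// Preconditions: labels.size() == weights.size(), numClasses >= 1, and every
// labels[i] < numClasses. weights[i] is the importance of point i.
std::size_t WeightedMajority(const std::vector<std::size_t>& labels,
                             const std::vector<double>& weights,
                             const std::size_t numClasses)
{
  // Accumulate the weight of every point into its class bucket.
  std::vector<double> totals(numClasses, 0.0);
  for (std::size_t i = 0; i < labels.size(); ++i)
    totals[labels[i]] += weights[i];

  // Pick the heaviest class; ties go to the lowest class index.
  std::size_t best = 0;
  for (std::size_t c = 1; c < numClasses; ++c)
    if (totals[c] > totals[best])
      best = c;
  return best;
}
```

With uniform weights this reduces to an ordinary majority vote; a single heavily weighted point can outweigh several lightly weighted ones.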
RandomForest | ( | const MatType & | dataset, |
const data::DatasetInfo & | datasetInfo, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const arma::rowvec & | weights, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Create a random forest, training on the given weighted labeled training data with the given dataset info and the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This can be used for categorical weighted training.
dataset | Dataset to train on. |
datasetInfo | Dimension info for the dataset. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
weights | Weights (importances) of each point in the dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
dimensionSelector | Instantiated dimension selection policy. |
size_t Classify | ( | const VecType & | point | ) | const |
Predict the class of the given point.
If the random forest has not been trained, this will throw an exception.
point | Point to be classified. |
void Classify | ( | const VecType & | point, |
size_t & | prediction, |
arma::vec & | probabilities |
) | const |
Predict the class of the given point and return the predicted class probabilities for each class.
If the random forest has not been trained, this will throw an exception.
point | Point to be classified. |
prediction | size_t to store predicted class in. |
probabilities | Output vector of class probabilities. |
void Classify | ( | const MatType & | data, |
arma::Row< size_t > & | predictions |
) | const |
Predict the classes of each point in the given dataset.
If the random forest has not been trained, this will throw an exception.
data | Dataset to be classified. |
predictions | Output predictions for each point in the dataset. |
void Classify | ( | const MatType & | data, |
arma::Row< size_t > & | predictions, |
arma::mat & | probabilities |
) | const |
Predict the classes of each point in the given dataset, also returning the predicted class probabilities for each point.
If the random forest has not been trained, this will throw an exception.
data | Dataset to be classified. |
predictions | Output predictions for each point in the dataset. |
probabilities | Output matrix of class probabilities for each point. |
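One common way for a forest to produce both a prediction and class probabilities is to average the per-tree class probabilities and take the most probable class. The standalone sketch below shows that rule for a single point. `CombineTrees` is a hypothetical helper, and this page does not state how RandomForest actually combines its trees, so treat the averaging rule as an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical helper: combine per-tree class probabilities for one point.
// treeProbabilities[t][c] is tree t's probability for class c; all inner
// vectors must have the same length.
// Writes the averaged probabilities and returns the most probable class.
std::size_t CombineTrees(
    const std::vector<std::vector<double>>& treeProbabilities,
    std::vector<double>& probabilities)
{
  if (treeProbabilities.empty())
    throw std::invalid_argument("CombineTrees(): no trees to combine");

  const std::size_t numClasses = treeProbabilities.front().size();
  probabilities.assign(numClasses, 0.0);

  // Each tree contributes an equal share of the final probability.
  for (const std::vector<double>& p : treeProbabilities)
    for (std::size_t c = 0; c < numClasses; ++c)
      probabilities[c] += p[c] / treeProbabilities.size();

  // The prediction is the index of the largest averaged probability.
  return static_cast<std::size_t>(std::distance(probabilities.begin(),
      std::max_element(probabilities.begin(), probabilities.end())));
}
```

Applying the same helper to every point of a dataset yields one prediction per point and one probability vector per point, which matches the shape of the outputs described above.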
size_t NumTrees | ( | ) | const |
inline
Get the number of trees in the forest.
Definition at line 362 of file random_forest.hpp.
void serialize | ( | Archive & | ar, |
const uint32_t | |
) |
Serialize the random forest.
double Train | ( | const MatType & | data, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
const bool | warmStart = false, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Train the random forest on the given labeled training data with the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.
data | Dataset to train on. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
warmStart | When set to true, numTrees new trees are added to the existing random forest; otherwise, a new forest is trained from scratch. |
dimensionSelector | Instantiated dimension selection policy. |
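The difference between a warm start and training from scratch can be shown with a small standalone sketch. `WarmStartForest` is a hypothetical stand-in in which a "tree" is just an integer id; it only models how the tree count changes and does not reproduce the library's training.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a forest; a "tree" is just an integer id here.
struct WarmStartForest
{
  std::vector<int> trees;
  int nextId = 0;

  // With warmStart == false, existing trees are discarded first; with
  // warmStart == true, numTrees new trees are appended to the existing ones.
  void Train(const std::size_t numTrees = 20, const bool warmStart = false)
  {
    if (!warmStart)
      trees.clear();
    for (std::size_t i = 0; i < numTrees; ++i)
      trees.push_back(nextId++);
  }

  std::size_t NumTrees() const { return trees.size(); }
};
```

In this sketch, training 10 trees and then calling Train(5, true) leaves 15 trees, while calling Train(5, false) afterwards leaves only the 5 newly built ones.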
double Train | ( | const MatType & | data, |
const data::DatasetInfo & | datasetInfo, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
const bool | warmStart = false, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Train the random forest on the given labeled training data with the given dataset info and the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This overload can be used to train on categorical data.
data | Dataset to train on. |
datasetInfo | Dimension info for the dataset. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
warmStart | When set to true, numTrees new trees are added to the existing random forest; otherwise, a new forest is trained from scratch. |
dimensionSelector | Instantiated dimension selection policy. |
double Train | ( | const MatType & | data, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const arma::rowvec & | weights, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
const bool | warmStart = false, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Train the random forest on the given weighted labeled training data with the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.
data | Dataset to train on. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
weights | Weights (importances) of each point in the dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
warmStart | When set to true, numTrees new trees are added to the existing random forest; otherwise, a new forest is trained from scratch. |
dimensionSelector | Instantiated dimension selection policy. |
double Train | ( | const MatType & | data, |
const data::DatasetInfo & | datasetInfo, |
const arma::Row< size_t > & | labels, |
const size_t | numClasses, |
const arma::rowvec & | weights, |
const size_t | numTrees = 20, |
const size_t | minimumLeafSize = 1, |
const double | minimumGainSplit = 1e-7, |
const size_t | maximumDepth = 0, |
const bool | warmStart = false, |
DimensionSelectionType | dimensionSelector = DimensionSelectionType() |
) |
Train the random forest on the given weighted labeled training data with the given dataset info and the given number of trees.
The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This overload can be used for categorical weighted training.
data | Dataset to train on. |
datasetInfo | Dimension info for the dataset. |
labels | Labels for dataset. |
numClasses | Number of classes in dataset. |
weights | Weights (importances) of each point in the dataset. |
numTrees | Number of trees in the forest. |
minimumLeafSize | Minimum number of points in each tree's leaf nodes. |
minimumGainSplit | Minimum gain for splitting a decision tree node. |
maximumDepth | Maximum depth for the tree. |
warmStart | When set to true, numTrees new trees are added to the existing random forest; otherwise, a new forest is trained from scratch. |
dimensionSelector | Instantiated dimension selection policy. |
const DecisionTreeType & Tree | ( | const size_t | i | ) | const |
inline
Access a tree in the forest.
Definition at line 357 of file random_forest.hpp.
DecisionTreeType & Tree | ( | const size_t | i | ) |
inline
Modify a tree in the forest (be careful!).
Definition at line 359 of file random_forest.hpp.
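The pairing of a read-only and a mutable Tree() accessor follows a common C++ pattern, sketched below in standalone form. `Node` and `TreeHolder` are hypothetical types used only for illustration; the sketch performs no bounds checking, and whether the real accessors check the index is not stated here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tree type holding only the class it predicts.
struct Node { std::size_t majorityClass = 0; };

class TreeHolder
{
 public:
  explicit TreeHolder(const std::size_t numTrees) : trees(numTrees) { }

  // Read-only access: selected when the holder is const.
  const Node& Tree(const std::size_t i) const { return trees[i]; }

  // Mutable access: changes made through this reference alter the stored
  // tree, so later predictions that use it change as well.
  Node& Tree(const std::size_t i) { return trees[i]; }

 private:
  std::vector<Node> trees;
};
```

Overload resolution picks the const version for const objects and references, so code that only inspects trees cannot modify them by accident.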