A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints. More...
Classes | |
class | SpillDualTreeTraverser |
A generic dual-tree traverser for hybrid spill trees; see spill_dual_tree_traverser.hpp for implementation. More... | |
class | SpillSingleTreeTraverser |
A generic single-tree traverser for hybrid spill trees; see spill_single_tree_traverser.hpp for implementation. More... | |
Public Types | |
typedef HyperplaneType< MetricType >::BoundType | BoundType |
The bound type. More... | |
template < typename RuleType > | |
using | DefeatistDualTreeTraverser = SpillDualTreeTraverser< RuleType, true > |
A defeatist dual-tree traverser for hybrid spill trees. More... | |
template < typename RuleType > | |
using | DefeatistSingleTreeTraverser = SpillSingleTreeTraverser< RuleType, true > |
A defeatist single-tree traverser for hybrid spill trees. More... | |
template < typename RuleType > | |
using | DualTreeTraverser = SpillDualTreeTraverser< RuleType, false > |
A dual-tree traverser for hybrid spill trees. More... | |
typedef MatType::elem_type | ElemType |
The type of element held in MatType. More... | |
typedef MatType | Mat |
So other classes can use TreeType::Mat. More... | |
template < typename RuleType > | |
using | SingleTreeTraverser = SpillSingleTreeTraverser< RuleType, false > |
A single-tree traverser for hybrid spill trees. More... | |
Public Member Functions | |
SpillTree (const MatType &data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7) | |
Construct this as the root node of a hybrid spill tree using the given dataset. More... | |
SpillTree (MatType &&data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7) | |
Construct this as the root node of a hybrid spill tree using the given dataset. More... | |
SpillTree (SpillTree *parent, arma::Col< size_t > &points, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7) | |
Construct this node as a child of the given parent, including the given list of points. More... | |
SpillTree (const SpillTree &other) | |
Create a hybrid spill tree by copying the other tree. More... | |
SpillTree (SpillTree &&other) | |
Move constructor for a SpillTree; possess all the members of the given tree. More... | |
template < typename Archive > | |
SpillTree (Archive &ar, const typename std::enable_if_t< cereal::is_loading< Archive >()> *=0) | |
Initialize the tree from a cereal archive. More... | |
~SpillTree () | |
Deletes this node, deallocating the memory for the children and calling their destructors in turn. More... | |
const BoundType & | Bound () const |
Return the bound object for this node. More... | |
BoundType & | Bound () |
Return the bound object for this node. More... | |
void | Center (arma::vec ¢er) |
Store the center of the bounding region in the given vector. More... | |
SpillTree & | Child (const size_t child) const |
Return the specified child (0 will be left, 1 will be right). More... | |
SpillTree *& | ChildPtr (const size_t child) |
const MatType & | Dataset () const |
Get the dataset which the tree is built on. More... | |
size_t | Descendant (const size_t index) const |
Return the index (with reference to the dataset) of a particular descendant of this node. More... | |
ElemType | FurthestDescendantDistance () const |
Return the furthest possible descendant distance. More... | |
ElemType | FurthestPointDistance () const |
Return the furthest distance to a point held in this node. More... | |
template < typename VecType > | |
size_t | GetFurthestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) |
Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest). More... | |
size_t | GetFurthestChild (const SpillTree &queryNode) |
Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest). More... | |
template < typename VecType > | |
size_t | GetNearestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) |
Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest). More... | |
size_t | GetNearestChild (const SpillTree &queryNode) |
Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest). More... | |
const HyperplaneType< MetricType > & | Hyperplane () const |
Get the Hyperplane instance. More... | |
bool | IsLeaf () const |
Return whether or not this node is a leaf (true if it has no children). More... | |
SpillTree * | Left () const |
Gets the left child of this node. More... | |
SpillTree *& | Left () |
Modify the left child of this node. More... | |
ElemType | MaxDistance (const SpillTree &other) const |
Return the maximum distance to another node. More... | |
template < typename VecType > | |
ElemType | MaxDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const |
Return the maximum distance to another point. More... | |
MetricType | Metric () const |
Get the metric that the tree uses. More... | |
ElemType | MinDistance (const SpillTree &other) const |
Return the minimum distance to another node. More... | |
template < typename VecType > | |
ElemType | MinDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const |
Return the minimum distance to another point. More... | |
ElemType | MinimumBoundDistance () const |
Return the minimum distance from the center of the node to any bound edge. More... | |
size_t | NumChildren () const |
Return the number of children in this node. More... | |
size_t | NumDescendants () const |
Return the number of descendants of this node. More... | |
size_t | NumPoints () const |
Return the number of points in this node (0 if not a leaf). More... | |
SpillTree & | operator= (const SpillTree &other) |
Copy the given Spill Tree. More... | |
SpillTree & | operator= (SpillTree &&other) |
Take ownership of the given Spill Tree. More... | |
bool | Overlap () const |
Distinguish overlapping nodes from non-overlapping nodes. More... | |
SpillTree * | Parent () const |
Gets the parent of this node. More... | |
SpillTree *& | Parent () |
Modify the parent of this node. More... | |
ElemType | ParentDistance () const |
Return the distance from the center of this node to the center of the parent node. More... | |
ElemType & | ParentDistance () |
Modify the distance from the center of this node to the center of the parent node. More... | |
size_t | Point (const size_t index) const |
Return the index (with reference to the dataset) of a particular point in this node. More... | |
math::RangeType< ElemType > | RangeDistance (const SpillTree &other) const |
Return the minimum and maximum distance to another node. More... | |
template < typename VecType > | |
math::RangeType< ElemType > | RangeDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const |
Return the minimum and maximum distance to another point. More... | |
SpillTree * | Right () const |
Gets the right child of this node. More... | |
SpillTree *& | Right () |
Modify the right child of this node. More... | |
template < typename Archive > | |
void | serialize (Archive &ar, const uint32_t version) |
Serialize the tree. More... | |
const StatisticType & | Stat () const |
Return the statistic object for this node. More... | |
StatisticType & | Stat () |
Return the statistic object for this node. More... | |
Static Public Member Functions | |
static bool | HasSelfChildren () |
Returns false: this tree type does not have self children. More... | |
Protected Member Functions | |
SpillTree () | |
A default constructor. More... | |
A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints.
Two new separating planes lplane and rplane are defined, both of which are parallel to the original decision boundary and at a distance tau from it. The region between lplane and rplane is called "overlapping buffer".
For each node, we first split the points considering the overlapping buffer. If either of its children contains more than rho fraction of the total points we undo the overlapping splitting. Instead a conventional partition is used. In this way, we can ensure that each split reduces the number of points of a node by at least a constant factor.
This particular tree does not allow growth, so you cannot add or delete nodes from it. If you need to add or delete a node, the better procedure is to rebuild the tree entirely.
Three runtime parameters are required in the constructor:
For more information on spill trees, see
MetricType | The metric used for tree-building. |
StatisticType | Extra data contained in the node. See statistic.hpp for the necessary skeleton interface. |
MatType | The dataset class. |
HyperplaneType | The splitting hyperplane class. |
SplitType | The class that partitions the dataset/points at a particular node into two parts. Its definition decides the way this split is done. |
Definition at line 73 of file spill_tree.hpp.
The bound type.
Definition at line 81 of file spill_tree.hpp.
using DefeatistDualTreeTraverser = SpillDualTreeTraverser<RuleType, true> |
A defeatist dual-tree traverser for hybrid spill trees.
Definition at line 146 of file spill_tree.hpp.
using DefeatistSingleTreeTraverser = SpillSingleTreeTraverser<RuleType, true> |
A defeatist single-tree traverser for hybrid spill trees.
Definition at line 138 of file spill_tree.hpp.
using DualTreeTraverser = SpillDualTreeTraverser<RuleType, false> |
A dual-tree traverser for hybrid spill trees.
Definition at line 142 of file spill_tree.hpp.
typedef MatType::elem_type ElemType |
The type of element held in MatType.
Definition at line 79 of file spill_tree.hpp.
typedef MatType Mat |
So other classes can use TreeType::Mat.
Definition at line 77 of file spill_tree.hpp.
using SingleTreeTraverser = SpillSingleTreeTraverser<RuleType, false> |
A single-tree traverser for hybrid spill trees.
Definition at line 134 of file spill_tree.hpp.
SpillTree | ( | const MatType & | data, |
const double | tau = 0 , |
||
const size_t | maxLeafSize = 20 , |
||
const double | rho = 0.7 |
||
) |
Construct this as the root node of a hybrid spill tree using the given dataset.
The dataset will not be modified during the building procedure (unlike BinarySpaceTree).
data | Dataset to create tree from. |
tau | Overlapping size. |
maxLeafSize | Size of each leaf in the tree. |
rho | Balance threshold. |
SpillTree | ( | MatType && | data, |
const double | tau = 0 , |
||
const size_t | maxLeafSize = 20 , |
||
const double | rho = 0.7 |
||
) |
Construct this as the root node of a hybrid spill tree using the given dataset.
This will take ownership of the data matrix; if you don't want this, consider using the constructor that takes a const reference to a dataset.
data | Dataset to create tree from. |
tau | Overlapping size. |
maxLeafSize | Size of each leaf in the tree. |
rho | Balance threshold. |
SpillTree | ( | SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > * | parent, |
arma::Col< size_t > & | points, | ||
const double | tau = 0 , |
||
const size_t | maxLeafSize = 20 , |
||
const double | rho = 0.7 |
||
) |
Construct this node as a child of the given parent, including the given list of points.
This is used for recursive tree-building by the other constructors which don't specify point indices.
parent | Parent of this node. |
points | Vector of indexes of points to be included in this node. |
tau | Overlapping size. |
maxLeafSize | Size of each leaf in the tree. |
rho | Balance threshold. |
SpillTree | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | other | ) |
Create a hybrid spill tree by copying the other tree.
Be careful! This can take a long time and use a lot of memory.
other | tree to be replicated. |
Move constructor for a SpillTree; possess all the members of the given tree.
other | tree to be moved. |
SpillTree | ( | Archive & | ar, |
const typename std::enable_if_t< cereal::is_loading< Archive >()> * | = 0 |
||
) |
Initialize the tree from a cereal archive.
ar | Archive to load tree from. Must be an iarchive, not an oarchive. |
~SpillTree | ( | ) |
Deletes this node, deallocating the memory for the children and calling their destructors in turn.
This will invalidate any pointers or references to any nodes which are children of this one.
|
protected |
A default constructor.
This is meant to only be used with cereal, which is allowed with the friend declaration below. This does not return a valid tree! The method must be protected, so that the serialization shim can work with the default constructor.
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Center().
|
inline |
Return the bound object for this node.
Definition at line 244 of file spill_tree.hpp.
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::MaxDistance(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::MinDistance(), and SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::RangeDistance().
|
inline |
Return the bound object for this node.
Definition at line 246 of file spill_tree.hpp.
|
inline |
Store the center of the bounding region in the given vector.
Definition at line 438 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree().
SpillTree& Child | ( | const size_t | child | ) | const |
Return the specified child (0 will be left, 1 will be right).
If the index is greater than 1, this will return the right child.
child | Index of child to return. |
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ParentDistance().
|
inline |
Definition at line 357 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Descendant(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::NumDescendants(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::NumPoints(), and SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Point().
|
inline |
Get the dataset which the tree is built on.
Definition at line 272 of file spill_tree.hpp.
size_t Descendant | ( | const size_t | index | ) | const |
Return the index (with reference to the dataset) of a particular descendant of this node.
The index should be greater than zero but less than the number of descendants.
index | Index of the descendant. |
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().
ElemType FurthestDescendantDistance | ( | ) | const |
Return the furthest possible descendant distance.
This returns the maximum distance from the centroid to the edge of the bound and not the empirical quantity which is the actual furthest descendant distance. So the actual furthest descendant distance may be less than what this method returns (but it will never be greater than this).
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
ElemType FurthestPointDistance | ( | ) | const |
Return the furthest distance to a point held in this node.
If this is not a leaf node, then the distance is 0 because the node holds no points.
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
size_t GetFurthestChild | ( | const VecType & | point, |
typename std::enable_if_t< IsVector< VecType >::value > * | = 0 |
||
) |
Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).
If this is a leaf node, it will return NumChildren() (invalid index).
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
size_t GetFurthestChild | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | queryNode | ) |
Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).
If it can't decide it will return NumChildren() (invalid index).
size_t GetNearestChild | ( | const VecType & | point, |
typename std::enable_if_t< IsVector< VecType >::value > * | = 0 |
||
) |
Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest).
If this is a leaf node, it will return NumChildren() (invalid index).
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
size_t GetNearestChild | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | queryNode | ) |
Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest).
If it can't decide it will return NumChildren() (invalid index).
|
inlinestatic |
Returns false: this tree type does not have self children.
Definition at line 435 of file spill_tree.hpp.
|
inline |
Get the Hyperplane instance.
Definition at line 278 of file spill_tree.hpp.
bool IsLeaf | ( | ) | const |
Return whether or not this node is a leaf (true if it has no children).
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Stat().
|
inline |
Gets the left child of this node.
Definition at line 257 of file spill_tree.hpp.
|
inline |
Modify the left child of this node.
Definition at line 259 of file spill_tree.hpp.
|
inline |
Return the maximum distance to another node.
Definition at line 396 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Bound().
|
inline |
Return the maximum distance to another point.
Definition at line 418 of file spill_tree.hpp.
|
inline |
Get the metric that the tree uses.
Definition at line 281 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::FurthestDescendantDistance(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::FurthestPointDistance(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetFurthestChild(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetNearestChild(), SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::MinimumBoundDistance(), and SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::NumChildren().
|
inline |
Return the minimum distance to another node.
Definition at line 390 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Bound().
|
inline |
Return the minimum distance to another point.
Definition at line 409 of file spill_tree.hpp.
ElemType MinimumBoundDistance | ( | ) | const |
Return the minimum distance from the center of the node to any bound edge.
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
size_t NumChildren | ( | ) | const |
Return the number of children in this node.
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
size_t NumDescendants | ( | ) | const |
Return the number of descendants of this node.
For a non-leaf spill tree, this is the number of points at the descendant leaves. For a leaf, this is the number of points in the leaf.
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().
size_t NumPoints | ( | ) | const |
Return the number of points in this node (0 if not a leaf).
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().
SpillTree& operator= | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | other | ) |
Copy the given Spill Tree.
other | The tree to be copied. |
SpillTree& operator= | ( | SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > && | other | ) |
Take ownership of the given Spill Tree.
other | The tree to take ownership of. |
|
inline |
Distinguish overlapping nodes from non-overlapping nodes.
Definition at line 275 of file spill_tree.hpp.
|
inline |
Gets the parent of this node.
Definition at line 267 of file spill_tree.hpp.
|
inline |
Modify the parent of this node.
Definition at line 269 of file spill_tree.hpp.
|
inline |
Return the distance from the center of this node to the center of the parent node.
Definition at line 344 of file spill_tree.hpp.
|
inline |
Modify the distance from the center of this node to the center of the parent node.
Definition at line 347 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Child().
size_t Point | ( | const size_t | index | ) | const |
Return the index (with reference to the dataset) of a particular point in this node.
This will happily return invalid indices if the given index is greater than the number of points in this node (obtained with NumPoints()) – be careful.
index | Index of point for which a dataset index is wanted. |
Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().
|
inline |
Return the minimum and maximum distance to another node.
Definition at line 402 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Bound().
|
inline |
Return the minimum and maximum distance to another point.
Definition at line 428 of file spill_tree.hpp.
|
inline |
Gets the right child of this node.
Definition at line 262 of file spill_tree.hpp.
|
inline |
Modify the right child of this node.
Definition at line 264 of file spill_tree.hpp.
void serialize | ( | Archive & | ar, |
const uint32_t | version | ||
) |
Serialize the tree.
|
inline |
Return the statistic object for this node.
Definition at line 249 of file spill_tree.hpp.
|
inline |
Return the statistic object for this node.
Definition at line 251 of file spill_tree.hpp.
References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::IsLeaf().