SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > Class Template Reference

A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints.

Classes

class  SpillDualTreeTraverser
 A generic dual-tree traverser for hybrid spill trees; see spill_dual_tree_traverser.hpp for implementation.

class  SpillSingleTreeTraverser
 A generic single-tree traverser for hybrid spill trees; see spill_single_tree_traverser.hpp for implementation.

 

Public Types

typedef HyperplaneType< MetricType >::BoundType BoundType
 The bound type.

template<typename RuleType>
using DefeatistDualTreeTraverser = SpillDualTreeTraverser< RuleType, true >
 A defeatist dual-tree traverser for hybrid spill trees.

template<typename RuleType>
using DefeatistSingleTreeTraverser = SpillSingleTreeTraverser< RuleType, true >
 A defeatist single-tree traverser for hybrid spill trees.

template<typename RuleType>
using DualTreeTraverser = SpillDualTreeTraverser< RuleType, false >
 A dual-tree traverser for hybrid spill trees.

typedef MatType::elem_type ElemType
 The type of element held in MatType.

typedef MatType Mat
 So other classes can use TreeType::Mat.

template<typename RuleType>
using SingleTreeTraverser = SpillSingleTreeTraverser< RuleType, false >
 A single-tree traverser for hybrid spill trees.

 

Public Member Functions

 SpillTree (const MatType &data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7)
 Construct this as the root node of a hybrid spill tree using the given dataset.

 SpillTree (MatType &&data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7)
 Construct this as the root node of a hybrid spill tree using the given dataset.

 SpillTree (SpillTree *parent, arma::Col< size_t > &points, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7)
 Construct this node as a child of the given parent, including the given list of points.

 SpillTree (const SpillTree &other)
 Create a hybrid spill tree by copying the other tree.

 SpillTree (SpillTree &&other)
 Move constructor for a SpillTree; takes possession of all the members of the given tree.

template<typename Archive>
 SpillTree (Archive &ar, const typename std::enable_if_t< cereal::is_loading< Archive >()> *=0)
 Initialize the tree from a cereal archive.

 ~SpillTree ()
 Deletes this node, deallocating the memory for the children and calling their destructors in turn.

 
const BoundType & Bound () const
 Return the bound object for this node.

BoundType & Bound ()
 Return the bound object for this node.

void Center (arma::vec &center)
 Store the center of the bounding region in the given vector.

SpillTree & Child (const size_t child) const
 Return the specified child (0 will be left, 1 will be right).

SpillTree *& ChildPtr (const size_t child)

const MatType & Dataset () const
 Get the dataset which the tree is built on.

size_t Descendant (const size_t index) const
 Return the index (with reference to the dataset) of a particular descendant of this node.

ElemType FurthestDescendantDistance () const
 Return the furthest possible descendant distance.

ElemType FurthestPointDistance () const
 Return the furthest distance to a point held in this node.

 
template<typename VecType>
size_t GetFurthestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0)
 Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the furthest).

size_t GetFurthestChild (const SpillTree &queryNode)
 Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the furthest).

template<typename VecType>
size_t GetNearestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0)
 Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the nearest).

size_t GetNearestChild (const SpillTree &queryNode)
 Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the nearest).

const HyperplaneType< MetricType > & Hyperplane () const
 Get the Hyperplane instance.

bool IsLeaf () const
 Return whether or not this node is a leaf (true if it has no children).

 
SpillTree * Left () const
 Gets the left child of this node.

SpillTree *& Left ()
 Modify the left child of this node.

ElemType MaxDistance (const SpillTree &other) const
 Return the maximum distance to another node.

template<typename VecType>
ElemType MaxDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const
 Return the maximum distance to another point.

MetricType Metric () const
 Get the metric that the tree uses.

ElemType MinDistance (const SpillTree &other) const
 Return the minimum distance to another node.

template<typename VecType>
ElemType MinDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const
 Return the minimum distance to another point.

ElemType MinimumBoundDistance () const
 Return the minimum distance from the center of the node to any bound edge.

size_t NumChildren () const
 Return the number of children in this node.

size_t NumDescendants () const
 Return the number of descendants of this node.

size_t NumPoints () const
 Return the number of points in this node (0 if not a leaf).

SpillTree & operator= (const SpillTree &other)
 Copy the given Spill Tree.

SpillTree & operator= (SpillTree &&other)
 Take ownership of the given Spill Tree.

bool Overlap () const
 Distinguish overlapping nodes from non-overlapping nodes.

 
SpillTree * Parent () const
 Gets the parent of this node.

SpillTree *& Parent ()
 Modify the parent of this node.

ElemType ParentDistance () const
 Return the distance from the center of this node to the center of the parent node.

ElemType & ParentDistance ()
 Modify the distance from the center of this node to the center of the parent node.

size_t Point (const size_t index) const
 Return the index (with reference to the dataset) of a particular point in this node.

math::RangeType< ElemType > RangeDistance (const SpillTree &other) const
 Return the minimum and maximum distance to another node.

template<typename VecType>
math::RangeType< ElemType > RangeDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const
 Return the minimum and maximum distance to another point.

SpillTree * Right () const
 Gets the right child of this node.

SpillTree *& Right ()
 Modify the right child of this node.

template<typename Archive>
void serialize (Archive &ar, const uint32_t version)
 Serialize the tree.

const StatisticType & Stat () const
 Return the statistic object for this node.

StatisticType & Stat ()
 Return the statistic object for this node.

 

Static Public Member Functions

static bool HasSelfChildren ()
 Returns false: this tree type does not have self children.

 

Protected Member Functions

 SpillTree ()
 A default constructor.

 

Detailed Description


template<typename MetricType,
         typename StatisticType = EmptyStatistic,
         typename MatType = arma::mat,
         template<typename HyperplaneMetricType> class HyperplaneType = AxisOrthogonalHyperplane,
         template<typename SplitMetricType, typename SplitMatType> class SplitType = MidpointSpaceSplit>

class mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >

A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints.

Two new separating planes, lplane and rplane, are defined; both are parallel to the original decision boundary and lie at a distance tau from it. The region between lplane and rplane is called the "overlapping buffer", and points that fall inside it are assigned to both children.

For each node, the points are first split using the overlapping buffer. If either child would then contain more than a fraction rho of the node's points, the overlapping split is undone and a conventional (non-overlapping) partition is used instead. This ensures that every overlapping split that is kept reduces the number of points of a node by at least a constant factor.
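The split rule above can be sketched in a few lines of standalone C++. This is a minimal illustration under stated assumptions, not mlpack's implementation: each point is reduced to its coordinate along a single splitting axis, the decision boundary sits at a caller-supplied splitVal, and the names HybridSpillSplit and SplitResult are hypothetical. Treating the overlap flag as the analogue of what Overlap() reports is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Children produced by splitting one node (indices into the dataset).
struct SplitResult
{
  std::vector<std::size_t> left;
  std::vector<std::size_t> right;
  bool overlap;  // true if the overlapping (spill) split was kept
};

// Hypothetical hybrid spill split along one axis. values[i] is point i's
// coordinate on the splitting axis and splitVal is the decision boundary,
// so lplane sits at splitVal - tau and rplane at splitVal + tau.
SplitResult HybridSpillSplit(const std::vector<double>& values,
                             const std::vector<std::size_t>& points,
                             double splitVal, double tau, double rho)
{
  assert(rho > 0.0 && rho < 1.0);  // sketch assumes 0 < rho < 1
  SplitResult r;
  r.overlap = true;

  // Overlapping split: everything left of rplane goes left, everything right
  // of lplane goes right, so points in the buffer land in both children.
  for (std::size_t p : points)
  {
    if (values[p] <= splitVal + tau)
      r.left.push_back(p);
    if (values[p] > splitVal - tau)
      r.right.push_back(p);
  }

  // If either child holds more than rho * n points, undo the overlap and
  // fall back to a conventional partition at the decision boundary.
  const double limit = rho * static_cast<double>(points.size());
  if (r.left.size() > limit || r.right.size() > limit)
  {
    r.left.clear();
    r.right.clear();
    r.overlap = false;
    for (std::size_t p : points)
      (values[p] <= splitVal ? r.left : r.right).push_back(p);
  }
  return r;
}
```

With the ten values 0 through 9, splitVal = 4.5, tau = 1 and rho = 0.7, both children receive six points (points 4 and 5 are shared), which is within the limit of 7, so the overlapping split is kept. Raising tau to 3 gives eight points per child, so the sketch falls back to the plain 5/5 partition.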

This particular tree does not allow growth, so you cannot add or delete nodes from it. If you need to add or delete a node, the better procedure is to rebuild the tree entirely.

Three runtime parameters are required in the constructor:

  • maxLeafSize: Max leaf size to be used.
  • tau: Overlapping size.
  • rho: Balance threshold.

For more information on spill trees, see

@inproceedings{
author = {Ting Liu and Andrew W. Moore and Alexander Gray and Ke Yang},
title = {An Investigation of Practical Approximate Nearest Neighbor Algorithms},
booktitle = {Advances in Neural Information Processing Systems 17},
year = {2005},
pages = {825--832}
}

Template Parameters
MetricType  The metric used for tree-building.
StatisticType  Extra data contained in the node. See statistic.hpp for the necessary skeleton interface.
MatType  The dataset class.
HyperplaneType  The splitting hyperplane class.
SplitType  The class that partitions the dataset/points at a particular node into two parts. Its definition decides the way this split is done.

Definition at line 73 of file spill_tree.hpp.

Member Typedef Documentation

◆ BoundType

typedef HyperplaneType<MetricType>::BoundType BoundType

The bound type.

Definition at line 81 of file spill_tree.hpp.

◆ DefeatistDualTreeTraverser

using DefeatistDualTreeTraverser = SpillDualTreeTraverser<RuleType, true>

A defeatist dual-tree traverser for hybrid spill trees.

Definition at line 146 of file spill_tree.hpp.

◆ DefeatistSingleTreeTraverser

using DefeatistSingleTreeTraverser = SpillSingleTreeTraverser<RuleType, true>

A defeatist single-tree traverser for hybrid spill trees.

Definition at line 138 of file spill_tree.hpp.
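Defeatist search is why the overlapping buffer exists: the search descends to a single leaf without backtracking, and points copied into both children make it less likely that the true neighbor sits on the other side of the boundary. The sketch below is a hypothetical one-dimensional illustration with hand-built nodes (Node and DefeatistNearest are not mlpack names). For brevity it never backtracks; the hybrid scheme of Liu et al. uses defeatist descent at overlapping nodes and ordinary backtracking elsewhere, and how SpillSingleTreeTraverser combines the two is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal node: inner nodes have a split value and two children,
// leaves hold dataset indices. Leaf point lists may share indices (spill).
struct Node
{
  double splitVal = 0.0;
  std::vector<std::size_t> points;
  const Node* left = nullptr;
  const Node* right = nullptr;
  bool IsLeaf() const { return left == nullptr; }
};

// Defeatist nearest-neighbor search on 1-D data: descend to the child on the
// query's side of each decision boundary, scan that leaf, never backtrack.
std::size_t DefeatistNearest(const Node& root, const std::vector<double>& data,
                             double query)
{
  const Node* node = &root;
  while (!node->IsLeaf())
    node = (query <= node->splitVal) ? node->left : node->right;

  assert(!node->points.empty());
  std::size_t best = node->points.front();
  for (std::size_t p : node->points)
    if (std::abs(data[p] - query) < std::abs(data[best] - query))
      best = p;
  return best;
}
```

For the data {0, 2, 4.95, 6, 8, 10} split at 5 and the query 5.1, a tree with disjoint leaves ({0, 1, 2} and {3, 4, 5}) returns index 3 (value 6). With a buffer of width 1 on each side of the boundary, the leaves become {0, 1, 2, 3} and {2, 3, 4, 5}, and the same descent returns index 2 (value 4.95), the true nearest neighbor.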

◆ DualTreeTraverser

using DualTreeTraverser = SpillDualTreeTraverser<RuleType, false>

A dual-tree traverser for hybrid spill trees.

Definition at line 142 of file spill_tree.hpp.

◆ ElemType

typedef MatType::elem_type ElemType

The type of element held in MatType.

Definition at line 79 of file spill_tree.hpp.

◆ Mat

typedef MatType Mat

So other classes can use TreeType::Mat.

Definition at line 77 of file spill_tree.hpp.

◆ SingleTreeTraverser

using SingleTreeTraverser = SpillSingleTreeTraverser<RuleType, false>

A single-tree traverser for hybrid spill trees.

Definition at line 134 of file spill_tree.hpp.

Constructor & Destructor Documentation

◆ SpillTree() [1/7]

SpillTree ( const MatType &  data,
const double  tau = 0,
const size_t  maxLeafSize = 20,
const double  rho = 0.7 
)

Construct this as the root node of a hybrid spill tree using the given dataset.

The dataset will not be modified during the building procedure (unlike BinarySpaceTree).

Parameters
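data  Dataset to create tree from.
tau  Overlapping size: the distance from the decision boundary to lplane and rplane.
maxLeafSize  Maximum size of each leaf in the tree.
rho  Balance threshold: the largest fraction of a node's points that either child may hold after an overlapping split.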

◆ SpillTree() [2/7]

SpillTree ( MatType &&  data,
const double  tau = 0,
const size_t  maxLeafSize = 20,
const double  rho = 0.7 
)

Construct this as the root node of a hybrid spill tree using the given dataset.

This will take ownership of the data matrix; if you don't want this, consider using the constructor that takes a const reference to a dataset.

Parameters
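data  Dataset to create tree from.
tau  Overlapping size: the distance from the decision boundary to lplane and rplane.
maxLeafSize  Maximum size of each leaf in the tree.
rho  Balance threshold: the largest fraction of a node's points that either child may hold after an overlapping split.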

◆ SpillTree() [3/7]

SpillTree ( SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > *  parent,
arma::Col< size_t > &  points,
const double  tau = 0,
const size_t  maxLeafSize = 20,
const double  rho = 0.7 
)

Construct this node as a child of the given parent, including the given list of points.

This is used for recursive tree-building by the other constructors which don't specify point indices.

Parameters
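parent  Parent of this node.
points  Vector of indices of points to be included in this node.
tau  Overlapping size: the distance from the decision boundary to lplane and rplane.
maxLeafSize  Maximum size of each leaf in the tree.
rho  Balance threshold: the largest fraction of a node's points that either child may hold after an overlapping split.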

◆ SpillTree() [4/7]

SpillTree ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other)

Create a hybrid spill tree by copying the other tree.

Be careful! This can take a long time and use a lot of memory.

Parameters
other  Tree to be replicated.

◆ SpillTree() [5/7]

SpillTree ( SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &&  other)

Move constructor for a SpillTree; takes possession of all the members of the given tree.

Parameters
other  Tree to be moved.

◆ SpillTree() [6/7]

SpillTree ( Archive &  ar,
const typename std::enable_if_t< cereal::is_loading< Archive >()> *  = 0 
)

Initialize the tree from a cereal archive.

Parameters
ar  Archive to load tree from. Must be an iarchive, not an oarchive.

◆ ~SpillTree()

~SpillTree ( )

Deletes this node, deallocating the memory for the children and calling their destructors in turn.

This will invalidate any pointers or references to any nodes which are children of this one.

◆ SpillTree() [7/7]

SpillTree ( )
protected

A default constructor.

This is meant to be used only by cereal, which is permitted to call it through a friend declaration in the class. This does not return a valid tree! The method must be protected so that the serialization shim can work with the default constructor.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Center().

Member Function Documentation

◆ Bound() [1/2]

const BoundType& Bound ( ) const

Return the bound object for this node.

◆ Bound() [2/2]

BoundType& Bound ( )
inline

Return the bound object for this node.

Definition at line 246 of file spill_tree.hpp.

◆ Center()

void Center ( arma::vec &  center)
inline

Store the center of the bounding region in the given vector.

Definition at line 438 of file spill_tree.hpp.

References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree().

◆ Child()

SpillTree& Child ( const size_t  child) const

Return the specified child (0 will be left, 1 will be right).

If the index is greater than 1, this will return the right child.

Parameters
child  Index of child to return.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ParentDistance().

◆ ChildPtr()

SpillTree*& ChildPtr ( const size_t  child)

◆ Dataset()

const MatType& Dataset ( ) const
inline

Get the dataset which the tree is built on.

Definition at line 272 of file spill_tree.hpp.

◆ Descendant()

size_t Descendant ( const size_t  index) const

Return the index (with reference to the dataset) of a particular descendant of this node.

The index should be less than the number of descendants (see NumDescendants()).

Parameters
index  Index of the descendant.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().

◆ FurthestDescendantDistance()

ElemType FurthestDescendantDistance ( ) const

Return the furthest possible descendant distance.

This returns the maximum distance from the centroid to the edge of the bound, not the empirical furthest descendant distance. The actual furthest descendant distance may therefore be smaller than the returned value, but it is never larger.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().
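To see why this value is an upper bound, the sketch below assumes the bound is an axis-aligned hyperrectangle and the metric is Euclidean (both are assumptions; the actual BoundType depends on HyperplaneType). Under those assumptions the furthest possible descendant distance is half the box diagonal, while the empirical furthest distance from the box center to an actual point can be smaller. The same sketch computes the quantity described for MinimumBoundDistance(), which for a box is half its narrowest side. ComputeBoxDistances and BoxDistances are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

struct BoxDistances
{
  double furthestDescendantBound;  // half the box diagonal (upper bound)
  double empiricalFurthest;        // actual max distance center -> point
  double minimumBoundDistance;     // half the narrowest box side
};

// Assumes a non-empty set of equal-dimension points, an axis-aligned
// bounding box around them, and the Euclidean metric.
BoxDistances ComputeBoxDistances(const std::vector<Point>& pts)
{
  assert(!pts.empty() && !pts.front().empty());
  const std::size_t d = pts.front().size();
  Point lo = pts.front(), hi = pts.front();
  for (const Point& p : pts)
    for (std::size_t k = 0; k < d; ++k)
    {
      lo[k] = std::min(lo[k], p[k]);
      hi[k] = std::max(hi[k], p[k]);
    }

  Point center(d);
  double diagSq = 0.0;
  double minSide = hi[0] - lo[0];
  for (std::size_t k = 0; k < d; ++k)
  {
    center[k] = 0.5 * (lo[k] + hi[k]);
    const double side = hi[k] - lo[k];
    diagSq += side * side;
    minSide = std::min(minSide, side);
  }

  double empirical = 0.0;
  for (const Point& p : pts)
  {
    double distSq = 0.0;
    for (std::size_t k = 0; k < d; ++k)
      distSq += (p[k] - center[k]) * (p[k] - center[k]);
    empirical = std::max(empirical, std::sqrt(distSq));
  }
  return {0.5 * std::sqrt(diagSq), empirical, 0.5 * minSide};
}
```

For the four points (0, 1), (4, 1), (2, 0) and (2, 2), the box is [0, 4] x [0, 2] with center (2, 1): the bound is sqrt(5), about 2.236, the empirical furthest distance is 2, and the minimum bound distance is 1.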

◆ FurthestPointDistance()

ElemType FurthestPointDistance ( ) const

Return the furthest distance to a point held in this node.

If this is not a leaf node, then the distance is 0 because the node holds no points.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().

◆ GetFurthestChild() [1/2]

size_t GetFurthestChild ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
)

Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the furthest).

If this is a leaf node, it will return NumChildren() (invalid index).

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().

◆ GetFurthestChild() [2/2]

size_t GetFurthestChild ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  queryNode)

Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the furthest).

If it cannot decide, it returns NumChildren() (an invalid index).

◆ GetNearestChild() [1/2]

size_t GetNearestChild ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
)

Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the nearest).

If this is a leaf node, it will return NumChildren() (invalid index).
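For an axis-orthogonal splitting hyperplane, a cheap estimate of the nearest child is simply the child on the query point's side of the hyperplane, and the furthest child is the other one. The sketch below illustrates that idea for both GetNearestChild() and GetFurthestChild(); AxisHyperplane, NearestChild and FurthestChild are hypothetical, and sending a point that lies exactly on the hyperplane to the left child is an assumed tie-breaking rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-orthogonal hyperplane: a point is on the left side if
// its coordinate along `dim` is at most `splitVal`.
struct AxisHyperplane
{
  std::size_t dim;
  double splitVal;
  bool Left(const std::vector<double>& p) const
  {
    assert(dim < p.size());
    return p[dim] <= splitVal;
  }
};

// Estimated nearest child: the child on the point's side of the hyperplane.
// A leaf has numChildren == 0, and that (invalid) index is returned.
std::size_t NearestChild(const AxisHyperplane& h, std::size_t numChildren,
                         const std::vector<double>& p)
{
  if (numChildren == 0)
    return numChildren;
  return h.Left(p) ? 0 : 1;
}

// Estimated furthest child: the child on the opposite side.
std::size_t FurthestChild(const AxisHyperplane& h, std::size_t numChildren,
                          const std::vector<double>& p)
{
  if (numChildren == 0)
    return numChildren;
  return h.Left(p) ? 1 : 0;
}
```

The estimate is cheap because it inspects a single coordinate, but the chosen child is not guaranteed to contain the closest (or furthest) point, which is exactly the caveat stated above.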

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().

◆ GetNearestChild() [2/2]

size_t GetNearestChild ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  queryNode)

Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane; the node returned is not necessarily the nearest).

If it cannot decide, it returns NumChildren() (an invalid index).
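When the query is a whole node rather than a point, the estimate may be unable to pick a side. A minimal sketch, assuming an axis-orthogonal hyperplane and describing the query node only by its extent [lo, hi] along the splitting dimension: a child is returned only when the entire extent lies on one side, and numChildren (the invalid index) is returned when the extent straddles the boundary. NearestChildOfNode and FurthestChildOfNode are hypothetical names, and the exact straddling test used by SpillTree is an assumption here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the query node's extent along the splitting dimension
// is [lo, hi]. A child is chosen only if the whole extent lies on one side
// of splitVal; otherwise numChildren (an invalid index) is returned.
std::size_t NearestChildOfNode(double splitVal, std::size_t numChildren,
                               double lo, double hi)
{
  assert(lo <= hi);
  if (numChildren == 0)
    return numChildren;  // leaf: no valid child
  if (hi <= splitVal)
    return 0;            // query node entirely on the left side
  if (lo > splitVal)
    return 1;            // query node entirely on the right side
  return numChildren;    // straddles the hyperplane: cannot decide
}

// The furthest child is the opposite side, when a side could be chosen.
std::size_t FurthestChildOfNode(double splitVal, std::size_t numChildren,
                                double lo, double hi)
{
  const std::size_t nearest = NearestChildOfNode(splitVal, numChildren, lo, hi);
  if (nearest >= numChildren)
    return numChildren;
  return 1 - nearest;
}
```

A caller would check the result against NumChildren() before using it as a child index, since a straddling query node yields no decision.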

◆ HasSelfChildren()

static bool HasSelfChildren ( )
inline static

Returns false: this tree type does not have self children.

Definition at line 435 of file spill_tree.hpp.

◆ Hyperplane()

const HyperplaneType<MetricType>& Hyperplane ( ) const
inline

Get the Hyperplane instance.

Definition at line 278 of file spill_tree.hpp.

◆ IsLeaf()

bool IsLeaf ( ) const

Return whether or not this node is a leaf (true if it has no children).

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Stat().

◆ Left() [1/2]

SpillTree* Left ( ) const
inline

Gets the left child of this node.

Definition at line 257 of file spill_tree.hpp.

◆ Left() [2/2]

SpillTree*& Left ( )
inline

Modify the left child of this node.

Definition at line 259 of file spill_tree.hpp.

◆ MaxDistance() [1/2]

ElemType MaxDistance ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other) const
inline

Return the maximum distance to another node.

Definition at line 396 of file spill_tree.hpp.

References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Bound().

◆ MaxDistance() [2/2]

ElemType MaxDistance ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
) const
inline

Return the maximum distance to another point.

Definition at line 418 of file spill_tree.hpp.

◆ Metric()

MetricType Metric ( ) const

Get the metric that the tree uses.

◆ MinDistance() [1/2]

ElemType MinDistance ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other) const
inline

Return the minimum distance to another node.

Definition at line 390 of file spill_tree.hpp.

References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Bound().

◆ MinDistance() [2/2]

ElemType MinDistance ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
) const
inline

Return the minimum distance to another point.

Definition at line 409 of file spill_tree.hpp.

◆ MinimumBoundDistance()

ElemType MinimumBoundDistance ( ) const

Return the minimum distance from the center of the node to any bound edge.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().

◆ NumChildren()

size_t NumChildren ( ) const

Return the number of children in this node.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Metric().

◆ NumDescendants()

size_t NumDescendants ( ) const

Return the number of descendants of this node.

For a non-leaf spill tree, this is the number of points at the descendant leaves. For a leaf, this is the number of points in the leaf.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().

◆ NumPoints()

size_t NumPoints ( ) const

Return the number of points in this node (0 if not a leaf).

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().

◆ operator=() [1/2]

SpillTree& operator= ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other)

Copy the given Spill Tree.

Parameters
other  The tree to be copied.

◆ operator=() [2/2]

SpillTree& operator= ( SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &&  other)

Take ownership of the given Spill Tree.

Parameters
other  The tree to take ownership of.

◆ Overlap()

bool Overlap ( ) const
inline

Distinguish overlapping nodes from non-overlapping nodes.

Definition at line 275 of file spill_tree.hpp.

◆ Parent() [1/2]

SpillTree* Parent ( ) const
inline

Gets the parent of this node.

Definition at line 267 of file spill_tree.hpp.

◆ Parent() [2/2]

SpillTree*& Parent ( )
inline

Modify the parent of this node.

Definition at line 269 of file spill_tree.hpp.

◆ ParentDistance() [1/2]

ElemType ParentDistance ( ) const
inline

Return the distance from the center of this node to the center of the parent node.

Definition at line 344 of file spill_tree.hpp.

◆ ParentDistance() [2/2]

ElemType& ParentDistance ( )
inline

Modify the distance from the center of this node to the center of the parent node.

Definition at line 347 of file spill_tree.hpp.

References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Child().

◆ Point()

size_t Point ( const size_t  index) const

Return the index (with reference to the dataset) of a particular point in this node.

This will happily return invalid indices if the given index is greater than or equal to the number of points in this node (obtained with NumPoints()), so be careful.

Parameters
index  Index of point for which a dataset index is wanted.

Referenced by SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ChildPtr().

◆ RangeDistance() [1/2]

math::RangeType<ElemType> RangeDistance ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other) const
inline

Return the minimum and maximum distance to another node.

Definition at line 402 of file spill_tree.hpp.

References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Bound().

◆ RangeDistance() [2/2]

math::RangeType<ElemType> RangeDistance ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
) const
inline

Return the minimum and maximum distance to another point.

Definition at line 428 of file spill_tree.hpp.
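MinDistance() and MaxDistance() are the two ends of the range returned by RangeDistance(). Assuming a hyperrectangle bound and the Euclidean metric (the actual bound type depends on HyperplaneType), the point-to-bound range can be computed axis by axis: the minimum uses the gap between the point and the box, and the maximum uses the distance to the farther face. RangeDistanceToBox is a hypothetical name, and std::pair stands in for math::RangeType in this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Minimum and maximum Euclidean distance from point p to the axis-aligned
// box [lo, hi], returned as (min, max).
std::pair<double, double> RangeDistanceToBox(const std::vector<double>& lo,
                                             const std::vector<double>& hi,
                                             const std::vector<double>& p)
{
  assert(lo.size() == p.size() && hi.size() == p.size());
  double minSq = 0.0, maxSq = 0.0;
  for (std::size_t k = 0; k < p.size(); ++k)
  {
    // Gap between p and the box along axis k (0 if p[k] lies in [lo, hi]).
    const double gap = std::max({lo[k] - p[k], 0.0, p[k] - hi[k]});
    // Distance from p to the farther face along axis k.
    const double farEdge =
        std::max(std::abs(p[k] - lo[k]), std::abs(p[k] - hi[k]));
    minSq += gap * gap;
    maxSq += farEdge * farEdge;
  }
  return {std::sqrt(minSq), std::sqrt(maxSq)};
}
```

For the box [0, 2] x [0, 2], the outside point (3, 1) gives the range [1, sqrt(10)], and the interior point (1, 1) gives [0, sqrt(2)].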

◆ Right() [1/2]

SpillTree* Right ( ) const
inline

Gets the right child of this node.

Definition at line 262 of file spill_tree.hpp.

◆ Right() [2/2]

SpillTree*& Right ( )
inline

Modify the right child of this node.

Definition at line 264 of file spill_tree.hpp.

◆ serialize()

void serialize ( Archive &  ar,
const uint32_t  version 
)

Serialize the tree.

◆ Stat() [1/2]

const StatisticType& Stat ( ) const
inline

Return the statistic object for this node.

Definition at line 249 of file spill_tree.hpp.

◆ Stat() [2/2]

StatisticType& Stat ( )
inline

Return the statistic object for this node.

Definition at line 251 of file spill_tree.hpp.

References SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::IsLeaf().


The documentation for this class was generated from the following file:
  • /home/ryan/src/mlpack.org/_src/mlpack-git/src/mlpack/core/tree/spill_tree/spill_tree.hpp