RefinedStart Class Reference

A refined approach for choosing initial points for k-means clustering.

Public Member Functions

 RefinedStart (const size_t samplings=100, const double percentage=0.02)
 Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.

 
template<typename MatType>
void Cluster (const MatType &data, const size_t clusters, arma::mat &centroids) const
 Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return centroids.

 
template<typename MatType>
void Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments) const
 Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return point assignments.

 
double Percentage () const
 Get the percentage of the data used by each subsampling.

 
double & Percentage ()
 Modify the percentage of the data used by each subsampling.

 
size_t Samplings () const
 Get the number of samplings that will be performed.

 
size_t & Samplings ()
 Modify the number of samplings that will be performed.

 
template<typename Archive>
void serialize (Archive &ar, const uint32_t)
 Serialize the object.

 

Detailed Description

A refined approach for choosing initial points for k-means clustering.

This approach runs k-means several times on random subsets of the data, then clusters the centroids found in those runs to produce a refined starting point, which can be returned either as initial centroids or as initial point assignments. It implements the method described in the following paper:

@inproceedings{bradley1998refining,
title={Refining initial points for k-means clustering},
author={Bradley, Paul S and Fayyad, Usama M},
booktitle={Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998)},
volume={66},
year={1998}
}

Definition at line 39 of file refined_start.hpp.
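
The description above can be illustrated with a minimal, standard-library-only sketch. It is not the mlpack implementation: `Point`, `SquaredDistance()`, `RandomSubset()`, `Lloyd()`, and `RefinedInitialCentroids()` are hypothetical names, the percentage is assumed to be a fraction between 0 and 1, and the pooled solutions are clustered only once, which simplifies the procedure in the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(const Point& a, const Point& b)
{
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
    sum += (a[d] - b[d]) * (a[d] - b[d]);
  return sum;
}

// Draw `count` distinct points at random: shuffle a copy and keep a prefix.
std::vector<Point> RandomSubset(std::vector<Point> points,
                                const std::size_t count,
                                std::mt19937& rng)
{
  assert(count <= points.size());
  std::shuffle(points.begin(), points.end(), rng);
  points.resize(count);
  return points;
}

// Plain Lloyd iterations, seeded with k randomly chosen input points.
std::vector<Point> Lloyd(const std::vector<Point>& points, const std::size_t k,
                         std::mt19937& rng, const std::size_t iterations = 20)
{
  assert(k > 0 && points.size() >= k);
  std::vector<Point> centroids = RandomSubset(points, k, rng);
  const std::size_t dim = points[0].size();
  for (std::size_t it = 0; it < iterations; ++it)
  {
    std::vector<Point> sums(k, Point(dim, 0.0));
    std::vector<std::size_t> counts(k, 0);
    for (const Point& p : points)
    {
      // Find the closest centroid and accumulate the point into its sum.
      std::size_t best = 0;
      double bestDist = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c)
      {
        const double dist = SquaredDistance(p, centroids[c]);
        if (dist < bestDist) { bestDist = dist; best = c; }
      }
      for (std::size_t d = 0; d < dim; ++d)
        sums[best][d] += p[d];
      ++counts[best];
    }
    // Move each non-empty cluster's centroid to the mean of its points;
    // an empty cluster keeps its previous centroid.
    for (std::size_t c = 0; c < k; ++c)
      for (std::size_t d = 0; d < dim && counts[c] > 0; ++d)
        centroids[c][d] = sums[c][d] / counts[c];
  }
  return centroids;
}

// Simplified refinement: cluster each subsample, pool the resulting
// centroids, then cluster the pool to obtain the starting centroids.
std::vector<Point> RefinedInitialCentroids(const std::vector<Point>& data,
                                           const std::size_t clusters,
                                           const std::size_t samplings,
                                           const double fraction,
                                           std::mt19937& rng)
{
  assert(samplings > 0);
  // Use at least `clusters` points per subsample so Lloyd() can be seeded.
  const std::size_t numPoints = std::max(clusters,
      static_cast<std::size_t>(fraction * data.size()));
  std::vector<Point> pooled;
  for (std::size_t s = 0; s < samplings; ++s)
  {
    const std::vector<Point> solution =
        Lloyd(RandomSubset(data, numPoints, rng), clusters, rng);
    pooled.insert(pooled.end(), solution.begin(), solution.end());
  }
  return Lloyd(pooled, clusters, rng);
}
```

Because each k-means run sees only a small subset, the per-sampling cost stays low, and clustering the pooled centroids smooths out the effect of any single unlucky subsample.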

Constructor & Destructor Documentation

◆ RefinedStart()

RefinedStart (const size_t samplings = 100, const double percentage = 0.02)
inline

Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.

Definition at line 47 of file refined_start.hpp.

References RefinedStart::Cluster().
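
To make the two constructor parameters concrete, the following sketch computes how many points one sampling would draw. It assumes that `percentage` is interpreted as a fraction of the dataset (so the default 0.02 means 2%) and that the result is truncated to an integer; both are assumptions, and `PointsPerSampling()` is a hypothetical helper, not part of the class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of points drawn in one sampling, assuming
// `percentage` is a fraction in (0, 1] and the product is truncated.
std::size_t PointsPerSampling(const std::size_t datasetSize,
                              const double percentage)
{
  assert(percentage > 0.0 && percentage <= 1.0);
  return static_cast<std::size_t>(percentage * datasetSize);
}
```

Under these assumptions, a dataset of 10000 points with the default arguments gives 200 points per sampling, repeated over 100 samplings. A dataset of only 50 points would give a single point per sampling, so small datasets likely need a larger percentage.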

Member Function Documentation

◆ Cluster() [1/2]

template<typename MatType>
void Cluster (const MatType &data, const size_t clusters, arma::mat &centroids) const

Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return centroids.

Template Parameters
  MatType    Type of data (arma::mat or arma::sp_mat).
Parameters
  data       Dataset to partition.
  clusters   Number of clusters to split dataset into.
  centroids  Matrix to store centroids into.

Referenced by RefinedStart::RefinedStart().

◆ Cluster() [2/2]

template<typename MatType>
void Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments) const

Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return point assignments.

Template Parameters
  MatType      Type of data (arma::mat or arma::sp_mat).
Parameters
  data         Dataset to partition.
  clusters     Number of clusters to split dataset into.
  assignments  Vector to store cluster assignments into. Values will be between 0 and (clusters - 1).
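
One plausible way to obtain assignments in the range 0 to (clusters - 1) from a set of centroids is to label each point with the index of its nearest centroid. The sketch below illustrates that idea with standard containers only; `Point` and `AssignToNearest()` are hypothetical names, and whether this overload works exactly this way internally is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper: label each point with the index of its closest
// centroid, so every label lies in [0, centroids.size() - 1].
std::vector<std::size_t> AssignToNearest(const std::vector<Point>& data,
                                         const std::vector<Point>& centroids)
{
  assert(!centroids.empty());
  std::vector<std::size_t> assignments(data.size(), 0);
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c)
    {
      // Squared Euclidean distance from point i to centroid c.
      double dist = 0.0;
      for (std::size_t d = 0; d < data[i].size(); ++d)
        dist += (data[i][d] - centroids[c][d]) * (data[i][d] - centroids[c][d]);
      if (dist < bestDist) { bestDist = dist; assignments[i] = c; }
    }
  }
  return assignments;
}
```

With centroids at 0.5 and 10.5, the one-dimensional points 0, 1, 10, and 11 receive the labels 0, 0, 1, and 1.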

◆ Percentage() [1/2]

double Percentage ( ) const
inline

Get the percentage of the data used by each subsampling.

Definition at line 88 of file refined_start.hpp.

◆ Percentage() [2/2]

double& Percentage ( )
inline

Modify the percentage of the data used by each subsampling.

Definition at line 90 of file refined_start.hpp.

◆ Samplings() [1/2]

size_t Samplings ( ) const
inline

Get the number of samplings that will be performed.

Definition at line 83 of file refined_start.hpp.

◆ Samplings() [2/2]

size_t& Samplings ( )
inline

Modify the number of samplings that will be performed.

Definition at line 85 of file refined_start.hpp.
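
The paired accessors follow a common C++ pattern: the const overload returns a copy for reading, and the non-const overload returns a reference that can be assigned through. The sketch below reproduces only that pattern; `SamplingOptions` is a hypothetical stand-in, not the RefinedStart class itself, and the check in its constructor is an added assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in that mirrors only the accessor signatures.
class SamplingOptions
{
 public:
  SamplingOptions(const std::size_t samplings = 100,
                  const double percentage = 0.02) :
      samplings(samplings), percentage(percentage)
  {
    assert(samplings > 0);  // Sanity check added for this sketch only.
  }

  // Read-only access: callable on const objects, returns a copy.
  std::size_t Samplings() const { return samplings; }
  // Mutable access: returns a reference that can be assigned through.
  std::size_t& Samplings() { return samplings; }

  double Percentage() const { return percentage; }
  double& Percentage() { return percentage; }

 private:
  std::size_t samplings;
  double percentage;
};
```

With this pattern, `opts.Samplings() = 50;` changes the stored value on a non-const object, while a `const SamplingOptions&` can only read it.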

◆ serialize()

template<typename Archive>
void serialize (Archive &ar, const uint32_t)
inline

Serialize the object.

Definition at line 94 of file refined_start.hpp.


The documentation for this class was generated from the following file:
  • /home/ryan/src/mlpack.org/_src/mlpack-git/src/mlpack/methods/kmeans/refined_start.hpp