A refined approach for choosing initial points for k-means clustering.
Public Member Functions
RefinedStart (const size_t samplings=100, const double percentage=0.02)
Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.
template < typename MatType >
void | Cluster (const MatType &data, const size_t clusters, arma::mat &centroids) const
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return centroids.
template < typename MatType >
void | Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments) const
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return point assignments.
double | Percentage () const
Get the percentage of the data used by each subsampling.
double & | Percentage ()
Modify the percentage of the data used by each subsampling.
size_t | Samplings () const
Get the number of samplings that will be performed.
size_t & | Samplings ()
Modify the number of samplings that will be performed.
template < typename Archive >
void | serialize (Archive &ar, const uint32_t)
Serialize the object.
A refined approach for choosing initial points for k-means clustering.
This approach runs k-means several times on random subsets of the data, and then clusters those solutions to select refined initial cluster assignments. It is an implementation of the random sampling scheme outlined in Bradley and Fayyad's paper.
Definition at line 39 of file refined_start.hpp.
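The scheme described above can be sketched without any library dependency. The block below is a simplified illustration, not the code in refined_start.hpp: the names Lloyd, ClusterFromRandomStart and RefinedCentroidsSketch are hypothetical, points are fixed to two dimensions, `percentage` is assumed to be a fraction in (0, 1], and at least one sampling and at least `clusters` points are assumed. It needs C++17 for std::sample.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

using Point = std::array<double, 2>;
using Points = std::vector<Point>;

// Squared Euclidean distance between two 2-D points.
double SqDist(const Point& a, const Point& b)
{
  const double dx = a[0] - b[0], dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Plain Lloyd iterations from the given start. A cluster that ends up empty
// keeps its previous centroid (a simplification made for this sketch).
Points Lloyd(const Points& data, Points centroids, std::size_t iterations = 20)
{
  const std::size_t k = centroids.size();
  for (std::size_t it = 0; it < iterations; ++it)
  {
    Points sums(k, Point{0.0, 0.0});
    std::vector<std::size_t> counts(k, 0);
    for (const Point& p : data)
    {
      std::size_t best = 0;  // Index of the nearest centroid so far.
      for (std::size_t c = 1; c < k; ++c)
        if (SqDist(p, centroids[c]) < SqDist(p, centroids[best]))
          best = c;
      sums[best][0] += p[0];
      sums[best][1] += p[1];
      ++counts[best];
    }
    for (std::size_t c = 0; c < k; ++c)
      if (counts[c] > 0)
        centroids[c] = {sums[c][0] / counts[c], sums[c][1] / counts[c]};
  }
  return centroids;
}

// Shuffle the points and start Lloyd from the first `clusters` of them.
Points ClusterFromRandomStart(Points points, std::size_t clusters,
                              std::mt19937& rng)
{
  std::shuffle(points.begin(), points.end(), rng);
  const Points start(points.begin(),
      points.begin() + static_cast<std::ptrdiff_t>(clusters));
  return Lloyd(points, start);
}

// Hypothetical helper (not mlpack's code). Assumes samplings >= 1,
// data.size() >= clusters, and `percentage` as a fraction in (0, 1].
Points RefinedCentroidsSketch(const Points& data, std::size_t clusters,
                              std::size_t samplings, double percentage,
                              std::mt19937& rng)
{
  // Points per subsample, but never fewer than the number of clusters.
  const std::size_t subsetSize = std::max(clusters,
      static_cast<std::size_t>(percentage * static_cast<double>(data.size())));

  Points pooled;  // Centroids found on all subsamples.
  for (std::size_t s = 0; s < samplings; ++s)
  {
    Points subset;
    std::sample(data.begin(), data.end(), std::back_inserter(subset),
        static_cast<std::ptrdiff_t>(subsetSize), rng);
    const Points found = ClusterFromRandomStart(subset, clusters, rng);
    pooled.insert(pooled.end(), found.begin(), found.end());
  }
  // Cluster the pooled solutions to obtain the refined starting centroids.
  return ClusterFromRandomStart(pooled, clusters, rng);
}
```

Each subsample is small, so the repeated k-means runs are cheap, and clustering the pooled centroids smooths out the effect of any single unlucky subsample. The returned centroids would then seed a full k-means run on the whole dataset.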
RefinedStart (const size_t samplings=100, const double percentage=0.02) | inline
Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.
Definition at line 47 of file refined_start.hpp.
References RefinedStart::Cluster().
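To make the two constructor parameters concrete, the sketch below computes how many points each sampling would draw. It assumes that `percentage` is a fraction of the dataset (so the default 0.02 means 2%) and that the count is truncated to an integer; SamplingParams and PointsPerSampling are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two constructor parameters; not mlpack code.
struct SamplingParams
{
  std::size_t samplings = 100;  // Number of subsamples to cluster.
  double percentage = 0.02;     // Assumed to be a fraction of the dataset.
};

// Points drawn per sampling, truncated toward zero (an assumption).
std::size_t PointsPerSampling(const SamplingParams& p, std::size_t numPoints)
{
  return static_cast<std::size_t>(
      p.percentage * static_cast<double>(numPoints));
}
```

Under these assumptions, the defaults on a dataset of 10000 points would give 100 samplings of 200 points each. On small datasets the default fraction can yield fewer points than clusters, in which case a larger percentage is the natural adjustment.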
void Cluster (const MatType &data, const size_t clusters, arma::mat &centroids) const
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return centroids.
Template Parameters
MatType | Type of data (arma::mat or arma::sp_mat).
Parameters
data | Dataset to partition.
clusters | Number of clusters to split the dataset into.
centroids | Matrix in which to store the centroids.
Referenced by RefinedStart::RefinedStart().
void Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments) const
Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper, and return point assignments.
Template Parameters
MatType | Type of data (arma::mat or arma::sp_mat).
Parameters
data | Dataset to partition.
clusters | Number of clusters to split the dataset into.
assignments | Vector in which to store the cluster assignments. Values will be between 0 and (clusters - 1).
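One plausible way to turn a set of centroids into assignments in the range 0 to (clusters - 1) is to label each point with the index of its nearest centroid. The sketch below shows this for one-dimensional data; AssignToNearest is a hypothetical helper, and whether the library derives its assignments exactly this way is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Label each 1-D point with the index of its nearest centroid.
// Assumes `centroids` is non-empty; ties go to the lower index.
std::vector<std::size_t> AssignToNearest(const std::vector<double>& points,
                                         const std::vector<double>& centroids)
{
  std::vector<std::size_t> assignments(points.size(), 0);
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    for (std::size_t c = 1; c < centroids.size(); ++c)
    {
      const double current = std::fabs(points[i] - centroids[assignments[i]]);
      if (std::fabs(points[i] - centroids[c]) < current)
        assignments[i] = c;
    }
  }
  return assignments;
}
```

With points {0.1, 9.8, 0.3, 10.2} and centroids {0.0, 10.0}, the labels are {0, 1, 0, 1}, so every value is a valid index into the centroid list.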
double Percentage () const | inline
Get the percentage of the data used by each subsampling.
Definition at line 88 of file refined_start.hpp.
double & Percentage () | inline
Modify the percentage of the data used by each subsampling.
Definition at line 90 of file refined_start.hpp.
size_t Samplings () const | inline
Get the number of samplings that will be performed.
Definition at line 83 of file refined_start.hpp.
size_t & Samplings () | inline
Modify the number of samplings that will be performed.
Definition at line 85 of file refined_start.hpp.
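The paired accessors for Percentage and Samplings follow a common C++ pattern: a const overload returns the value, and a non-const overload returns a reference that can be assigned to. The class below is a hypothetical stand-in that mirrors only the signatures listed on this page, not the real class.

```cpp
#include <cstddef>

// Hypothetical stand-in mirroring the accessor signatures documented here.
class SamplerSettings
{
 public:
  // Read-only access, usable on const objects.
  std::size_t Samplings() const { return samplings; }
  double Percentage() const { return percentage; }

  // Modifying access: the returned reference can be assigned to.
  std::size_t& Samplings() { return samplings; }
  double& Percentage() { return percentage; }

 private:
  std::size_t samplings = 100;
  double percentage = 0.02;
};
```

With this pattern, `settings.Samplings() = 25;` changes the stored value, while the same call on a const object only reads it.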
void serialize (Archive &ar, const uint32_t) | inline
Serialize the object.
Definition at line 94 of file refined_start.hpp.
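The serialize signature takes an archive by reference plus an unused version number. The sketch below shows how such a member template can serve both saving and loading when the archive is callable. ToyOutputArchive, ToyInputArchive and SamplerSettings are hypothetical toys written for this example; they are not the archive types the library actually uses, and the set and order of serialized members is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>

// Toy output archive: writes each value followed by a space.
struct ToyOutputArchive
{
  std::ostringstream out;
  template<typename T>
  void operator()(const T& value) { out << value << ' '; }
};

// Toy input archive: reads values back in the same order.
struct ToyInputArchive
{
  std::istringstream in;
  template<typename T>
  void operator()(T& value) { in >> value; }
};

// Hypothetical stand-in holding the two documented parameters.
struct SamplerSettings
{
  std::size_t samplings = 100;
  double percentage = 0.02;

  // One function handles both directions; the version is ignored here.
  template<typename Archive>
  void serialize(Archive& ar, const std::uint32_t /* version */)
  {
    ar(samplings);
    ar(percentage);
  }
};
```

Saving calls `serialize` with the output archive, and loading calls the same function with an input archive built from the saved text, so the field order only needs to be written once.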