This class implements K-Means clustering, using a variety of possible implementations of Lloyd's algorithm.
Public Member Functions
KMeans (const size_t maxIterations=1000, const MetricType metric=MetricType(), const InitialPartitionPolicy partitioner=InitialPartitionPolicy(), const EmptyClusterPolicy emptyClusterAction=EmptyClusterPolicy()) |
Create a K-Means object and (optionally) set the parameters which K-Means will be run with.
void | Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments, const bool initialGuess=false) |
Perform k-means clustering on the data, returning a list of cluster assignments.
void | Cluster (const MatType &data, size_t clusters, arma::mat &centroids, const bool initialGuess=false) |
Perform k-means clustering on the data, returning the centroids of each cluster in the centroids matrix.
void | Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments, arma::mat &centroids, const bool initialAssignmentGuess=false, const bool initialCentroidGuess=false) |
Perform k-means clustering on the data, returning a list of cluster assignments and also the centroids of each cluster.
const EmptyClusterPolicy & | EmptyClusterAction () const |
Get the empty cluster policy.
EmptyClusterPolicy & | EmptyClusterAction () |
Modify the empty cluster policy.
size_t | MaxIterations () const |
Get the maximum number of iterations.
size_t & | MaxIterations () |
Set the maximum number of iterations.
const MetricType & | Metric () const |
Get the distance metric.
MetricType & | Metric () |
Modify the distance metric.
const InitialPartitionPolicy & | Partitioner () const |
Get the initial partitioning policy.
InitialPartitionPolicy & | Partitioner () |
Modify the initial partitioning policy.
template<typename Archive> |
void | serialize (Archive &ar, const uint32_t version) |
Serialize the k-means object.
This class implements K-Means clustering, using a variety of possible implementations of Lloyd's algorithm.
Four template parameters can (optionally) be supplied: the distance metric to use, the policy for how to find the initial partition of the data, the actions to be taken when an empty cluster is encountered, and the implementation of a single Lloyd step to use.
A simple example of how to run K-Means clustering is shown below.
MetricType | The distance metric to use for this KMeans; see metric::LMetric for an example. |
InitialPartitionPolicy | Initial partitioning policy; must implement a default constructor and either 'void Cluster(const arma::mat&, const size_t, arma::Row<size_t>&)' or 'void Cluster(const arma::mat&, const size_t, arma::mat&)'. |
EmptyClusterPolicy | Policy for what to do on an empty cluster; must implement a default constructor and 'void EmptyCluster(const arma::mat& data, const size_t emptyCluster, const arma::mat& oldCentroids, arma::mat& newCentroids, arma::Col<size_t>& counts, MetricType& metric, const size_t iteration)'. |
LloydStepType | Implementation of single Lloyd step to use. |
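The policy-style template parameters can be pictured with a small standalone sketch of a single Lloyd step. This is not mlpack code: it uses only the standard library and stores each point as a std::vector<double> instead of an Armadillo column. The names EuclideanMetric, KeepOldCentroid, and LloydStep, and the shortened EmptyCluster signature, are hypothetical stand-ins for a MetricType, an EmptyClusterPolicy, and a LloydStepType.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Hypothetical stand-in for a MetricType: stateless Euclidean distance.
struct EuclideanMetric
{
  double Evaluate(const Point& a, const Point& b) const
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
      sum += (a[d] - b[d]) * (a[d] - b[d]);
    return std::sqrt(sum);
  }
};

// Hypothetical stand-in for an EmptyClusterPolicy: an empty cluster keeps the
// centroid it had before the step. The signature is shortened for the sketch.
struct KeepOldCentroid
{
  void EmptyCluster(const std::size_t emptyCluster,
                    const std::vector<Point>& oldCentroids,
                    std::vector<Point>& newCentroids) const
  {
    newCentroids[emptyCluster] = oldCentroids[emptyCluster];
  }
};

// One Lloyd step: assign every point to its nearest centroid under the metric,
// then move each centroid to the mean of the points assigned to it.
// Returns the assignment of each point; centroids are updated in place.
template<typename MetricType, typename EmptyClusterPolicy>
std::vector<std::size_t> LloydStep(const std::vector<Point>& data,
                                   std::vector<Point>& centroids,
                                   const MetricType& metric,
                                   const EmptyClusterPolicy& emptyAction)
{
  assert(!data.empty() && !centroids.empty());
  const std::size_t dims = data[0].size();
  const std::vector<Point> oldCentroids = centroids;
  std::vector<Point> sums(centroids.size(), Point(dims, 0.0));
  std::vector<std::size_t> counts(centroids.size(), 0);
  std::vector<std::size_t> assignments(data.size(), 0);

  for (std::size_t i = 0; i < data.size(); ++i)
  {
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < oldCentroids.size(); ++c)
    {
      const double dist = metric.Evaluate(data[i], oldCentroids[c]);
      if (dist < best)
      {
        best = dist;
        assignments[i] = c;
      }
    }
    ++counts[assignments[i]];
    for (std::size_t d = 0; d < dims; ++d)
      sums[assignments[i]][d] += data[i][d];
  }

  for (std::size_t c = 0; c < centroids.size(); ++c)
  {
    if (counts[c] == 0)
    {
      // No points were assigned: defer to the empty cluster policy.
      emptyAction.EmptyCluster(c, oldCentroids, centroids);
      continue;
    }
    for (std::size_t d = 0; d < dims; ++d)
      centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
  }
  return assignments;
}
```

Swapping in a different metric or empty-cluster struct changes the behavior of the step without touching LloydStep itself, which is the point of passing these as template parameters.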
Definition at line 73 of file kmeans.hpp.
KMeans(const size_t maxIterations = 1000,
       const MetricType metric = MetricType(),
       const InitialPartitionPolicy partitioner = InitialPartitionPolicy(),
       const EmptyClusterPolicy emptyClusterAction = EmptyClusterPolicy())
Create a K-Means object and (optionally) set the parameters which K-Means will be run with.
maxIterations | Maximum number of iterations allowed before giving up. A value of 0 is valid and removes the limit, in which case the algorithm may never terminate if it does not converge. |
metric | Optional MetricType object; for when the metric has state it needs to store. |
partitioner | Optional InitialPartitionPolicy object; for when a specially initialized partitioning policy is required. |
emptyClusterAction | Optional EmptyClusterPolicy object; for when a specially initialized empty cluster policy is required. |
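The note on maxIterations = 0 can be read as a stopping test that compares the limit against the number of iterations already completed. The sketch below illustrates that reading only. The helper name ShouldContinue, the movement measure, and the tolerance parameter are assumptions, since this page does not state the actual convergence criterion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of mlpack: decide whether to run another
// iteration. `completed` is the number of iterations already finished and
// `movement` is how far the centroids moved during the last one.
bool ShouldContinue(const std::size_t completed,
                    const std::size_t maxIterations,
                    const double movement,
                    const double tolerance)
{
  assert(completed >= 1);  // Called only after at least one iteration.
  // completed is never 0 here, so maxIterations == 0 never matches and only
  // the convergence test can end the loop.
  return movement > tolerance && completed != maxIterations;
}
```

Under this reading, a run with maxIterations = 0 stops only when the centroids stop moving, which is why termination is not guaranteed.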
void Cluster(const MatType& data,
             const size_t clusters,
             arma::Row<size_t>& assignments,
             const bool initialGuess = false)
Perform k-means clustering on the data, returning a list of cluster assignments.
Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, set initialGuess to true.
MatType | Type of matrix (arma::mat or arma::sp_mat). |
data | Dataset to cluster. |
clusters | Number of clusters to compute. |
assignments | Vector to store cluster assignments in. |
initialGuess | If true, then it is assumed that assignments has a list of initial cluster assignments. |
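One way to interpret an initial guess in assignments is that each starting centroid is the mean of the points currently assigned to it. The standalone sketch below shows that computation with standard-library containers. The function name CentroidsFromAssignments is hypothetical, and whether the class derives its starting centroids in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper: turn an initial assignment guess into starting
// centroids by averaging the points assigned to each cluster.
// A cluster with no assigned points is left at the origin.
std::vector<Point> CentroidsFromAssignments(
    const std::vector<Point>& data,
    const std::vector<std::size_t>& assignments,
    const std::size_t clusters)
{
  assert(!data.empty() && data.size() == assignments.size());
  const std::size_t dims = data[0].size();
  std::vector<Point> centroids(clusters, Point(dims, 0.0));
  std::vector<std::size_t> counts(clusters, 0);

  // Accumulate per-cluster sums and point counts.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    const std::size_t c = assignments[i];
    assert(c < clusters);  // Every guess must name a valid cluster.
    ++counts[c];
    for (std::size_t d = 0; d < dims; ++d)
      centroids[c][d] += data[i][d];
  }

  // Divide each sum by its count to obtain the mean.
  for (std::size_t c = 0; c < clusters; ++c)
  {
    if (counts[c] == 0)
      continue;
    for (std::size_t d = 0; d < dims; ++d)
      centroids[c][d] /= static_cast<double>(counts[c]);
  }
  return centroids;
}
```

The assertion on each assignment reflects a practical requirement of any such guess: every entry has to be smaller than the number of clusters requested.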
void Cluster(const MatType& data,
             size_t clusters,
             arma::mat& centroids,
             const bool initialGuess = false)
Perform k-means clustering on the data, returning the centroids of each cluster in the centroids matrix.
Optionally, the initial centroids can be specified by filling the centroids matrix with the initial centroids and specifying initialGuess = true.
MatType | Type of matrix (arma::mat or arma::sp_mat). |
data | Dataset to cluster. |
clusters | Number of clusters to compute. |
centroids | Matrix in which centroids are stored. |
initialGuess | If true, then it is assumed that centroids contains the initial cluster centroids. |
void Cluster(const MatType& data,
             const size_t clusters,
             arma::Row<size_t>& assignments,
             arma::mat& centroids,
             const bool initialAssignmentGuess = false,
             const bool initialCentroidGuess = false)
Perform k-means clustering on the data, returning a list of cluster assignments and also the centroids of each cluster.
Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, set initialAssignmentGuess to true. Another way to set initial cluster guesses is to fill the centroids matrix with the centroid guesses, and then set initialCentroidGuess to true. initialAssignmentGuess supersedes initialCentroidGuess, so if both are set to true, the assignments vector is used.
MatType | Type of matrix (arma::mat or arma::sp_mat). |
data | Dataset to cluster. |
clusters | Number of clusters to compute. |
assignments | Vector to store cluster assignments in. |
centroids | Matrix in which centroids are stored. |
initialAssignmentGuess | If true, then it is assumed that assignments has a list of initial cluster assignments. |
initialCentroidGuess | If true, then it is assumed that centroids contains the initial centroids of each cluster. |
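The precedence between the two flags can be summarized in a few lines. In the sketch below only the rule that initialAssignmentGuess supersedes initialCentroidGuess comes from the description above. The enum InitSource and the function ChooseInitSource are hypothetical, and falling back to the InitialPartitionPolicy when neither flag is set is an assumption.

```cpp
// Hypothetical names, not part of mlpack.
enum class InitSource { AssignmentGuess, CentroidGuess, Partitioner };

// Pick where the starting clusters come from. The assignment guess wins
// whenever it is set, regardless of the centroid flag.
constexpr InitSource ChooseInitSource(const bool initialAssignmentGuess,
                                      const bool initialCentroidGuess)
{
  return initialAssignmentGuess ? InitSource::AssignmentGuess
       : initialCentroidGuess   ? InitSource::CentroidGuess
                                : InitSource::Partitioner;
}

// Compile-time checks of the documented precedence.
static_assert(ChooseInitSource(true, true) == InitSource::AssignmentGuess,
              "the assignment guess supersedes the centroid guess");
static_assert(ChooseInitSource(false, true) == InitSource::CentroidGuess,
              "the centroid guess is used when it is the only one set");
```

A caller that wants the centroid guess to take effect therefore has to leave initialAssignmentGuess at false.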
const EmptyClusterPolicy& EmptyClusterAction() const [inline]
Get the empty cluster policy.
Definition at line 174 of file kmeans.hpp.
EmptyClusterPolicy& EmptyClusterAction() [inline]
Modify the empty cluster policy.
Definition at line 177 of file kmeans.hpp.
size_t MaxIterations() const [inline]
Get the maximum number of iterations.
Definition at line 159 of file kmeans.hpp.
size_t& MaxIterations() [inline]
Set the maximum number of iterations.
Definition at line 161 of file kmeans.hpp.
const MetricType& Metric() const [inline]
Get the distance metric.
Definition at line 164 of file kmeans.hpp.
MetricType& Metric() [inline]
Modify the distance metric.
Definition at line 166 of file kmeans.hpp.
const InitialPartitionPolicy& Partitioner() const [inline]
Get the initial partitioning policy.
Definition at line 169 of file kmeans.hpp.
InitialPartitionPolicy& Partitioner() [inline]
Modify the initial partitioning policy.
Definition at line 171 of file kmeans.hpp.
template<typename Archive>
void serialize(Archive& ar, const uint32_t version)
Serialize the k-means object.