Title: | Assessment of Stability of Individual Objects or Clusters in Partitioning Solutions |
---|---|
Description: | Allows one to assess the stability of individual objects, clusters and whole clustering solutions based on repeated runs of the K-means and K-medoids partitioning algorithms. |
Authors: | Etienne Lord, Matthieu Willems, Francois-Joseph Lapointe, and Vladimir Makarenkov |
Maintainer: | Etienne Lord <[email protected]> |
License: | GPL-3 |
Version: | 1.0.4 |
Built: | 2025-03-07 04:49:29 UTC |
Source: | https://github.com/cran/ClusterStability |
The ClusterStability package uses a probabilistic framework and some well-known clustering criteria (e.g. Calinski-Harabasz, Silhouette, Dunn and Davies-Bouldin) to compute the stability scores (ST) of each individual object (i.e., element) in the clustering solution provided by the K-means and K-medoids partitioning algorithms.
Package: | ClusterStability |
Type: | Package |
Version: | 1.0.2 |
Date: | 2015-10-14 |
License: | GPL-2 |
Maintainer: | Etienne Lord <[email protected]>, Vladimir Makarenkov <[email protected]> |
Function ClusterStability computes the individual and global stability scores (ST) for a partitioning solution using either K-means or K-medoids (the approximate solution is provided).
Function ClusterStability_exact is similar to the ClusterStability function but uses the Stirling numbers of the second kind to compute the exact stability scores (but is limited to a small number of objects).
Function Kcombination computes the k-combination of a set of numbers for a given k.
Function Reorder returns the re-ordered partitioning of a series of clusters.
Function Stirling2nd computes the Stirling numbers of the second kind.
Etienne Lord, François-Joseph Lapointe and Vladimir Makarenkov
ClusterStability, ClusterStability_exact, Kcombination, Reorder, Stirling2nd
This function returns the Calinski-Harabasz score of a partition (also known as the Variance Ratio Criterion).
calinski_harabasz_score(X, labels)
X |
the input dataset: either a matrix or a dataframe. |
labels |
the partition vector. |
The Calinski-Harabasz score for this data.
T. Calinski and J. Harabasz. A dendrite method for cluster analysis. Communications in Statistics, 3, no. 1:1–27, 1974
calinski_harabasz_score(iris[1:10,1:4], c(3,2,2,2,3,1,2,3,2,2)) # Expected : 11.34223
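To make the quantity behind this score concrete, the ratio can be sketched in base R. The helper name `ch_sketch` is hypothetical and this is not the package's own code; it assumes the usual Variance Ratio Criterion definition, i.e. the between-cluster dispersion divided by the within-cluster dispersion, each scaled by its degrees of freedom (k - 1 and n - k). Under that assumption it gives about 11.342 for the example data.

```r
# Hypothetical helper, not part of ClusterStability: Calinski-Harabasz in base R.
ch_sketch <- function(X, labels) {
  X <- as.matrix(X)
  n <- nrow(X)
  groups <- sort(unique(labels))
  k <- length(groups)
  overall <- colMeans(X)          # centroid of the whole dataset
  between <- 0
  within <- 0
  for (g in groups) {
    members <- X[labels == g, , drop = FALSE]
    centroid <- colMeans(members)
    # Size-weighted squared distance from the cluster centroid to the overall centroid
    between <- between + nrow(members) * sum((centroid - overall)^2)
    # Squared distances from the cluster members to their own centroid
    within <- within + sum(sweep(members, 2, centroid)^2)
  }
  (between / (k - 1)) / (within / (n - k))
}

ch_value <- ch_sketch(iris[1:10, 1:4], c(3, 2, 2, 2, 3, 1, 2, 3, 2, 2))
print(ch_value)
```

Larger values indicate clusters that are compact relative to their separation.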
This function returns the individual stability scores (ST) and the global stability score (ST_global), computed with either the K-means or K-medoids algorithm, for four clustering indices: Calinski-Harabasz, Silhouette, Dunn and Davies-Bouldin.
ClusterStability(dat, k, replicate, type)
dat |
the input dataset: either a matrix or a dataframe. |
k |
the number of classes for the K-means or K-medoids algorithm (default=3). |
replicate |
the number of replicates to perform (default=1000). |
type |
the partitioning algorithm to use: either 'kmeans' or 'kmedoids' (default='kmeans'). |
Returns the individual (ST) and global (ST_global) stability scores for the four clustering indices: Calinski-Harabasz (ch), Silhouette (sil), Dunn (dunn) and Davies-Bouldin (db).
## Calculates the stability scores of individual objects of the Iris dataset
## using K-means, 100 replicates (random starts) and k=3
ClusterStability(dat=iris[1:4],k=3,replicate=100,type='kmeans');
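The idea of repeated runs with random starts can be illustrated without the package. The sketch below is not the ST formula used by ClusterStability; it only assumes that replicates are independent K-means runs, and it records how often each pair of objects ends up in the same cluster. The variable names (`together`, `co_membership`) and the choice of 20 replicates are illustrative.

```r
# Illustrative only: pairwise co-membership frequency over repeated K-means runs.
# This is NOT the package's stability score, just the replicate loop behind the idea.
set.seed(1)
dat <- as.matrix(iris[1:4])
n_replicates <- 20
n <- nrow(dat)
together <- matrix(0, n, n)

for (r in seq_len(n_replicates)) {
  fit <- kmeans(dat, centers = 3, iter.max = 50)   # one random start per replicate
  # TRUE where objects i and j share a cluster in this run
  together <- together + outer(fit$cluster, fit$cluster, "==")
}

co_membership <- together / n_replicates
print(round(co_membership[1:5, 1:5], 2))
```

Pairs with a frequency near 0 or 1 are assigned consistently across runs, while intermediate values point to objects whose cluster membership changes between random starts.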
This function returns the exact individual stability scores (ST) and the exact global stability score (ST_global), computed with either the K-means or K-medoids algorithm, for four clustering indices: Calinski-Harabasz, Silhouette, Dunn and Davies-Bouldin. Numeric overflow errors are possible for a large number of objects.
ClusterStability_exact(dat, k, replicate, type)
dat |
the input dataset: either a matrix or a dataframe. |
k |
the number of classes for the K-means or K-medoids algorithm (default=3). |
replicate |
the number of replicates to perform (default=1000). |
type |
the partitioning algorithm to use: either 'kmeans' or 'kmedoids' (default='kmeans'). |
Returns the exact individual (ST) and global (ST_global) stability scores for the four clustering indices: Calinski-Harabasz (ch), Silhouette (sil), Dunn (dunn) and Davies-Bouldin (db).
## Calculate the stability scores of individual objects of the Iris dataset
## using K-means, 100 replicates (random starts) and k=3
ClusterStability_exact(dat=iris[1:4],k=3,replicate=100,type='kmeans');
This function returns the Davies-Bouldin score of a partition.
davies_bouldin_score(X, labels)
X |
the input dataset: either a matrix or a dataframe. |
labels |
the partition vector. |
The Davies-Bouldin score for this data.
D. L. Davies and D. W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1, no. 2:224-227, 1979
davies_bouldin_score(iris[1:10,1:4], c(3,2,2,2,3,1,2,3,2,2)) # Expected : 0.5103277
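For readers who want to see what this index averages, here is a base R sketch. The helper name `db_sketch` is hypothetical and the code is not taken from the package; it assumes the common formulation in which each cluster's scatter is the mean Euclidean distance of its members to the centroid, and the score is the mean over clusters of the worst (largest) ratio of summed scatters to centroid distance. Under that assumption it gives about 0.5103 for the example data.

```r
# Hypothetical helper, not part of ClusterStability: Davies-Bouldin in base R.
db_sketch <- function(X, labels) {
  X <- as.matrix(X)
  groups <- sort(unique(labels))
  k <- length(groups)
  # One centroid per row
  centroids <- do.call(rbind, lapply(groups, function(g) {
    colMeans(X[labels == g, , drop = FALSE])
  }))
  # Mean distance of each cluster's members to its centroid
  scatter <- sapply(seq_len(k), function(i) {
    members <- X[labels == groups[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(members, 2, centroids[i, ])^2)))
  })
  centroid_dist <- as.matrix(dist(centroids))
  ratio <- outer(scatter, scatter, "+") / centroid_dist
  diag(ratio) <- -Inf             # a cluster is never compared with itself
  mean(apply(ratio, 1, max))
}

db_value <- db_sketch(iris[1:10, 1:4], c(3, 2, 2, 2, 3, 1, 2, 3, 2, 2))
print(db_value)
```

Unlike the Calinski-Harabasz and Dunn indices, lower values indicate a better partition here.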
This function returns the Dunn score (also known as the Dunn index) of a partition.
dunn_score(X, labels)
X |
the input dataset: either a matrix or a dataframe. |
labels |
the partition vector. |
The Dunn index score for this data.
J. Dunn. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4:95–104, 1974.
dunn_score(iris[1:10,1:4], c(3,2,2,2,3,1,2,3,2,2)) # Expected : 0.5956834
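A compact base R sketch shows what the index compares. The helper name `dunn_sketch` is hypothetical and the code is not the package's implementation; it assumes the classical definition with Euclidean distances: the smallest distance between two objects in different clusters, divided by the largest distance between two objects in the same cluster. Under that assumption it gives about 0.5957 for the example data.

```r
# Hypothetical helper, not part of ClusterStability: Dunn index in base R.
dunn_sketch <- function(X, labels) {
  d <- as.matrix(dist(as.matrix(X)))          # pairwise Euclidean distances
  same <- outer(labels, labels, "==")         # TRUE where both objects share a cluster
  min_between <- min(d[!same])                # closest pair across clusters
  max_within <- max(d[same])                  # widest cluster diameter
  min_between / max_within
}

dunn_value <- dunn_sketch(iris[1:10, 1:4], c(3, 2, 2, 2, 3, 1, 2, 3, 2, 2))
print(dunn_value)
```

Higher values indicate well-separated, compact clusters.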
This function, given a vector of numbers, will return all the possible combinations of a given length k.
Kcombination(data, k, selector)
data |
the vector of numbers (i.e. elements) to consider. |
k |
the length of the returned combination (between 2 and 6 in this version). |
selector |
if set, returns only the combinations containing this number. |
Returns a list of all possible combinations of length k for the given vector of numbers.
## Returns the k-combination of the list of numbers: 1,2,3 of length=2.
## i.e. (1,2), (1,3), (2,3)
Kcombination(c(1,2,3),k=2)
## Returns only the k-combination containing the number 1.
## i.e. (1,2), (1,3)
Kcombination(c(1,2,3),k=2,selector=1)
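The same enumeration can be approximated with base R's `combn`. The wrapper below (`k_combinations`, a hypothetical name) is only a sketch of the behavior described above: it does not enforce the 2 to 6 limit on k, and the exact structure of the value returned by `Kcombination` may differ.

```r
# Hypothetical wrapper around utils::combn, not the package's Kcombination.
k_combinations <- function(values, k, selector = NULL) {
  # simplify = FALSE returns a list with one vector per combination
  combos <- combn(values, k, simplify = FALSE)
  if (!is.null(selector)) {
    # Keep only the combinations that contain the selected element
    combos <- Filter(function(combo) selector %in% combo, combos)
  }
  combos
}

all_pairs <- k_combinations(c(1, 2, 3), k = 2)                # (1,2), (1,3), (2,3)
with_one  <- k_combinations(c(1, 2, 3), k = 2, selector = 1)  # (1,2), (1,3)
print(all_pairs)
print(with_one)
```

Note that `combn` treats a single positive integer `x` as `seq_len(x)`, so a one-element input vector needs special handling.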
This function returns the partition with its cluster numbers re-ordered in ascending order and renumbered to start at one. This is an auxiliary function.
Reorder(data)
data |
vector of partition numbers to reorder. |
A vector of ordered partition numbers for this data.
Reorder(c(1,3,4,4,3,1)) # Expected : 1 2 3 3 2 1
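One plausible way to obtain this relabeling in base R is shown below. The function name `relabel_partition` is hypothetical, and the sketch assumes that labels are ranked by their numeric value; the package's `Reorder` may use a different rule (for example, order of first appearance) that happens to agree on this example.

```r
# Hypothetical helper, not the package's Reorder: map labels to 1..k by ascending value.
relabel_partition <- function(labels) {
  # Position of each label among the sorted distinct labels
  match(labels, sort(unique(labels)))
}

relabeled <- relabel_partition(c(1, 3, 4, 4, 3, 1))
print(relabeled)  # 1 2 3 3 2 1
```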
This function returns the estimated Stirling number of the second kind, i.e., the number of ways of partitioning a set of n objects into k nonempty groups.
Stirling2nd(n,k)
n |
number of objects. |
k |
number of groups (i.e. classes). |
The Stirling number of the 2nd kind for n elements and k groups or NaN (if the Stirling number for those n and k is greater than 1e300).
Stirling2nd(n=3,k=2) # Expected value=3
Stirling2nd(n=300,k=20) # Expected value=NaN
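These numbers satisfy the standard recurrence S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1), with S(0, 0) = 1. The sketch below (`stirling2_sketch`, a hypothetical name) applies that recurrence in double precision. It is not the package's algorithm, and it does not reproduce the NaN convention described above: values beyond the range of a double overflow to `Inf` instead.

```r
# Hypothetical helper, not the package's Stirling2nd: recurrence in double precision.
stirling2_sketch <- function(n, k) {
  if (k < 0 || k > n) return(0)
  # row[j + 1] holds S(i, j) for the current i; start with i = 0, where S(0, 0) = 1
  row <- c(1, rep(0, k))
  for (i in seq_len(n)) {
    new_row <- numeric(k + 1)          # S(i, 0) = 0 for i >= 1
    for (j in seq_len(min(i, k))) {
      new_row[j + 1] <- j * row[j + 1] + row[j]
    }
    row <- new_row
  }
  row[k + 1]
}

print(stirling2_sketch(3, 2))     # 3: {1,2}{3}, {1,3}{2}, {2,3}{1}
print(stirling2_sketch(5, 3))     # 25
print(stirling2_sketch(300, 20))  # too large for a double, so Inf
```

The rapid growth of these numbers is the reason the exact stability computation is restricted to a small number of objects.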
The following functions are for internal computation only: calculate_global_PSG, calculate_indices, calculate_singleton, is_partition_group, p_n_k, p_tilde_n_k, calculate_individual_PSG_approximative, calculate_individual_PSG_exact, calculate_individual_PSG.