Title: | Bayesian Cluster Validity Index |
---|---|
Description: | Algorithms for computing and generating plots with and without error bars for Bayesian cluster validity index (BCVI) (O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. <doi:10.1016/j.csda.2024.108053>) based on several underlying cluster validity indexes (CVIs) including Calinski-Harabasz, Chou-Su-Lai, Davies-Bouldin, Dunn, Pakhira-Bandyopadhyay-Maulik, Point biserial correlation, the score function, Starczewski, and Wiroonsri indices for hard clustering, and Correlation Cluster Validity, the generalized C, HF, KWON, KWON2, Modified Pakhira-Bandyopadhyay-Maulik, Pakhira-Bandyopadhyay-Maulik, Tang, Wiroonsri-Preedasawakul, Wu-Li, and Xie-Beni indices for soft clustering. The package is compatible with K-means, fuzzy C means, EM clustering, and hierarchical clustering (single, average, and complete linkage). Although BCVI is compatible with any existing underlying CVI, we recommend using either WI or WP as the underlying CVI. |
Authors: | Nathakhun Wiroonsri [aut], Onthada Preedasawakul [cre, aut] |
Maintainer: | Onthada Preedasawakul <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.1 |
Built: | 2024-11-04 03:33:04 UTC |
Source: | https://github.com/cran/BayesCVI |
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the Pearson correlation cluster validity (CCVP) and/or Spearman's (rho) correlation cluster validity (CCVS) index as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_CCV.IDX(x, kmax, indexlist = "all", method = "FCM", fzm = 2, iter = 100, nstart = 20, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
indexlist |
a character string indicating which correlation cluster validity index to compute: CCVP, CCVS, or both (the default, "all"). |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
iter |
a maximum number of iterations for |
nstart |
a maximum number of initial random sets for FCM for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
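BCVI-CCV is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\mathrm{CVI}(k) - \min_j \mathrm{CVI}(j)}{\sum_{i=2}^{K} \left(\mathrm{CVI}(i) - \min_j \mathrm{CVI}(j)\right)},$$
where CVI is either the CCVP or the CCVS index.
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$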
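The posterior mean and variance described here can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the helper name `bcvi_sketch`, the min/max rescaling of the underlying index into weights that sum to one, and the made-up CVI values are all assumptions; only the Dirichlet mean and variance formulas are standard.

```r
# Sketch (assumed helper, not part of BayesCVI): posterior mean and variance
# of a Dirichlet model built from a vector of underlying CVI values.
# cvi[i] is the index value for k = i + 1 groups (k = 2, ..., kmax).
bcvi_sketch <- function(cvi, alpha, n, larger_is_better = TRUE) {
  stopifnot(length(cvi) == length(alpha), all(alpha > 0), n > 0)
  # Assumption: rescale the index to nonnegative weights that sum to one.
  gap <- if (larger_is_better) cvi - min(cvi) else max(cvi) - cvi
  r <- gap / sum(gap)
  post <- alpha + n * r   # Dirichlet posterior parameters
  total <- sum(post)      # equals sum(alpha) + n
  data.frame(
    k    = seq_along(cvi) + 1,
    BCVI = post / total,                                    # posterior mean
    VAR  = post * (total - post) / (total^2 * (total + 1))  # posterior variance
  )
}

cvi_values <- c(0.42, 0.55, 0.81, 0.60, 0.47)  # made-up values for k = 2..6
res <- bcvi_sketch(cvi_values, alpha = rep(1, 5), n = 100)
print(res)
best_k <- res$k[which.max(res$BCVI)]  # number of groups with the largest posterior mean
```

Because the posterior means sum to one, they can be read as relative support for each candidate number of groups, and `sqrt(res$VAR)` gives a natural half-width for error bars.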
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
M. Popescu, J. C. Bezdek, T. C. Havens and J. M. Keller (2013). "A Cluster Validity Framework Based on Induced Partition Dissimilarity." https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6246717&isnumber=6340245
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)

B.CCV = B_CCV.IDX(x = scale(data), kmax=10, indexlist = "CCVP", method = "FCM", fzm = 2,
                  iter = 100, nstart = 20, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI-CCVP
pplot = plot_BCVI(B.CCV$CCVP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the Calinski-Harabasz (CH) index as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_CH.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
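BCVI-CH is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\mathrm{CH}(k) - \min_j \mathrm{CH}(j)}{\sum_{i=2}^{K} \left(\mathrm{CH}(i) - \min_j \mathrm{CH}(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$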
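For readers who want to see what the underlying CH values look like before the Bayesian step, the following base R sketch computes the Calinski-Harabasz index for k = 2, ..., kmax from `stats::kmeans` output. The helper name `ch_by_k` and the synthetic data are assumptions for illustration; the package's own computation may differ in detail.

```r
# Sketch (assumed helper): Calinski-Harabasz index for k = 2..kmax,
# CH(k) = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
ch_by_k <- function(x, kmax, nstart = 20) {
  x <- as.matrix(x)
  n <- nrow(x)
  ks <- 2:kmax
  ch <- sapply(ks, function(k) {
    km <- kmeans(x, centers = k, nstart = nstart)
    (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
  })
  data.frame(k = ks, CH = ch)
}

set.seed(1)
# Three well-separated Gaussian blobs (synthetic data, not from the package).
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
ch_tab <- ch_by_k(toy, kmax = 6)
print(ch_tab)
```

Larger CH values indicate a better partition, so the row with the largest CH is the conventional choice of k.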
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
T. Calinski, J. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics, 3, 1-27 (1974).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.CH = B_CH.IDX(x = scale(data), kmax=10, method = "kmeans",
                nstart = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.CH)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the Chou-Su-Lai (CSL) index as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_CSL.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
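BCVI-CSL is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\max_j \mathrm{CSL}(j) - \mathrm{CSL}(k)}{\sum_{i=2}^{K} \left(\max_j \mathrm{CSL}(j) - \mathrm{CSL}(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$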
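The role of the Dirichlet prior is easiest to see by comparing a flat prior with an informative one on the same index values. The sketch below is illustrative only: the index values are made up, the index is treated as smaller-is-better, the max-based rescaling is an assumption, and `posterior_mean` is a hypothetical helper rather than a package function. The informative vector is the one used in the examples on this page; it has one entry per candidate k = 2, ..., 10.

```r
# Sketch (assumed helper): Dirichlet posterior mean for weights r and prior alpha.
posterior_mean <- function(r, alpha, n) (alpha + n * r) / (sum(alpha) + n)

# Made-up index values for k = 2..10; smaller is treated as better.
cvi <- c(0.90, 0.40, 0.55, 0.70, 0.42, 0.80, 0.95, 1.10, 1.20)
r <- (max(cvi) - cvi) / sum(max(cvi) - cvi)   # assumed rescaling to weights

k <- 2:10
n <- 50
flat <- rep(1, 9)                                     # no preference among k
informative <- c(5, 5, 5, 20, 20, 20, 0.5, 0.5, 0.5)  # extra prior weight on k = 5, 6, 7

k_flat <- k[which.max(posterior_mean(r, flat, n))]
k_informative <- k[which.max(posterior_mean(r, informative, n))]
print(c(flat = k_flat, informative = k_informative))
```

With these numbers the raw index barely separates k = 3 from k = 6, so the prior decides between them. As n grows relative to the prior parameters, the data term dominates and the two choices agree.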
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
C. H. Chou, M. C. Su, E. Lai, "A new cluster validity measure and its application to image compression," Pattern Anal Applic, 7, 205-220 (2004).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.CSL = B_CSL.IDX(x = scale(data), kmax=10, method = "kmeans",
                  nstart = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.CSL)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using DB and/or DBs as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_DB.IDX(x, kmax, method = "kmeans", indexlist = "all", p = 2, q = 2, nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
indexlist |
a character string indicating which cluster validity indexes to compute: DB, DBs, or both (the default, "all"). |
p |
the power of the Minkowski distance between centroids of clusters. The default is |
q |
the power of dispersion measure of a cluster. The default is |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
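BCVI-DB is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\max_j \mathrm{CVI}(j) - \mathrm{CVI}(k)}{\sum_{i=2}^{K} \left(\max_j \mathrm{CVI}(j) - \mathrm{CVI}(i)\right)},$$
where CVI denotes either the DB or the DBs index.
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$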
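To make the roles of the p and q arguments concrete, here is a base R sketch of the classical Davies-Bouldin index, in which p is the order of the Minkowski distance between centroids and q is the power used in the within-cluster dispersion. The helper name `db_index` and the synthetic data are assumptions; this follows the textbook definition rather than the package source, and it does not cover the DBs variant.

```r
# Sketch (assumed helper): Davies-Bouldin index with Minkowski order p
# between centroids and dispersion power q within clusters.
db_index <- function(x, cluster, p = 2, q = 2) {
  x <- as.matrix(x)
  ids <- sort(unique(cluster))
  # Cluster centroids, one row per cluster.
  cen <- do.call(rbind, lapply(ids, function(g) colMeans(x[cluster == g, , drop = FALSE])))
  # Dispersion S_i: q-th root of the mean q-th power of distances to the centroid.
  S <- sapply(seq_along(ids), function(i) {
    xi <- x[cluster == ids[i], , drop = FALSE]
    d <- sqrt(rowSums(sweep(xi, 2, cen[i, ])^2))
    mean(d^q)^(1 / q)
  })
  # Minkowski distances of order p between centroids.
  M <- as.matrix(dist(cen, method = "minkowski", p = p))
  R <- outer(S, S, "+") / M
  diag(R) <- -Inf   # exclude i == j from the maximum
  mean(apply(R, 1, max))
}

set.seed(2)
# Two well-separated blobs (synthetic data, not from the package).
toy <- rbind(matrix(rnorm(80, 0, 0.3), ncol = 2),
             matrix(rnorm(80, 4, 0.3), ncol = 2))
db_vals <- sapply(2:5, function(k) {
  db_index(toy, kmeans(toy, centers = k, nstart = 20)$cluster)
})
names(db_vals) <- 2:5
print(db_vals)
```

Smaller values indicate compact, well-separated clusters, so the minimum over k is the conventional choice.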
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
D. L. Davies, D. W. Bouldin, "A cluster separation measure," IEEE Trans Pattern Anal Machine Intell, 1, 224-227 (1979).
M. Kim, R. S. Ramakrishna, "New indices for cluster validity assessment," Pattern Recognition Letters, 26, 2353-2363 (2005).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DI.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha (defined here, but the call below uses alpha = "default")
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.DB = B_DB.IDX(x = scale(data), kmax=10, method = "kmeans", indexlist = "all",
                p = 2, q = 2, nstart = 100, alpha = "default", mult.alpha = 1/2)

# plot the BCVI-DB
pplot = plot_BCVI(B.DB$DB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot

# plot the BCVI-DBs
pplot = plot_BCVI(B.DB$DBs)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the Dunn index (DI) as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_DI.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
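BCVI-DI is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\mathrm{DI}(k) - \min_j \mathrm{DI}(j)}{\sum_{i=2}^{K} \left(\mathrm{DI}(i) - \min_j \mathrm{DI}(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$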
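As a concrete picture of the underlying index, the sketch below computes the classical Dunn index (smallest distance between points in different clusters divided by the largest cluster diameter) on partitions obtained from complete-linkage hierarchical clustering. The helper name `dunn_index` and the synthetic data are assumptions, and the package may use a different variant of the index.

```r
# Sketch (assumed helper): classical Dunn index for a hard partition.
dunn_index <- function(x, cluster) {
  d <- as.matrix(dist(x))
  same <- outer(cluster, cluster, "==")
  min_between <- min(d[!same])  # closest pair of points in different clusters
  max_within <- max(d[same])    # largest within-cluster distance (diameter)
  min_between / max_within
}

set.seed(3)
# Three well-separated blobs (synthetic data, not from the package).
toy <- rbind(matrix(rnorm(60, 0, 0.3), ncol = 2),
             matrix(rnorm(60, 4, 0.3), ncol = 2),
             matrix(rnorm(60, 8, 0.3), ncol = 2))
tree <- hclust(dist(toy), method = "complete")
di_vals <- sapply(2:6, function(k) dunn_index(toy, cutree(tree, k = k)))
names(di_vals) <- 2:6
print(di_vals)
```

Larger values are better. Once a natural cluster is split, the numerator collapses to a within-blob distance, which is why the index drops sharply beyond the true number of groups.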
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," J Cybern, 3(3), 32-57 (1973).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.DI = B_DI.IDX(x = scale(data), kmax=10, method = "kmeans",
                nstart = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.DI)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using any or all of GC1, GC2, GC3, and GC4 as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_GC.IDX(x, kmax, indexlist = "all", method = "FCM", fzm = 2, iter = 100, nstart = 20, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
indexlist |
a character string indicating which generalized C indexes to compute: any of GC1, GC2, GC3, and GC4, or all of them (the default, "all"). |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
iter |
a maximum number of iterations for |
nstart |
a maximum number of initial random sets for FCM for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
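BCVI-GC is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\max_j \mathrm{CVI}(j) - \mathrm{CVI}(k)}{\sum_{i=2}^{K} \left(\max_j \mathrm{CVI}(j) - \mathrm{CVI}(i)\right)},$$
where CVI is one of the GC1, GC2, GC3, or GC4 indexes.
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$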
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
J. C. Bezdek, M. Moshtaghi, T. Runkler, and C. Leckie, “The generalized c index for internal fuzzy cluster validity,” IEEE Transactions on Fuzzy Systems, vol. 24, no. 6, pp. 1500–1512, 2016. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7429723&isnumber=7797168
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.GC = B_GC.IDX(x = scale(data), kmax = 10, indexlist = "GC1", method = "FCM", fzm = 2,
                iter = 100, nstart = 20, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI-GC1
pplot = plot_BCVI(B.GC$GC1)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the HF index as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_HF.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
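The effect of the fzm argument (the degree of fuzzification) can be illustrated with the standard fuzzy C-means membership update for fixed centers. This is a sketch of the textbook formula, not the package's clustering code; the helper name `fcm_membership` and the toy points are assumptions.

```r
# Sketch (assumed helper): fuzzy C-means membership degrees for fixed centers,
# u_ij proportional to d_ij^(-2 / (fzm - 1)), normalized over clusters.
fcm_membership <- function(x, centers, fzm = 2) {
  stopifnot(fzm > 1)
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # Squared Euclidean distance from every point (rows) to every center (columns).
  d2 <- sapply(seq_len(nrow(centers)), function(i) {
    rowSums(sweep(x, 2, centers[i, ])^2)
  })
  d2 <- pmax(d2, .Machine$double.eps)  # guard against a point sitting on a center
  w <- d2^(-1 / (fzm - 1))
  w / rowSums(w)                       # each row sums to one
}

centers <- rbind(c(0, 0), c(4, 0))
pts <- rbind(c(1, 0), c(2, 0))         # one point near a center, one halfway between
u_2 <- fcm_membership(pts, centers, fzm = 2)
u_3 <- fcm_membership(pts, centers, fzm = 3)
print(u_2)
print(u_3)
```

The point at (1, 0) belongs to the first cluster with degree 0.9 when fzm = 2 and 0.75 when fzm = 3, so larger fzm values give softer memberships. The midpoint gets 0.5 for each cluster regardless of fzm.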
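BCVI-HF is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\max_j \mathrm{HF}(j) - \mathrm{HF}(k)}{\sum_{i=2}^{K} \left(\max_j \mathrm{HF}(j) - \mathrm{HF}(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$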
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
F. Haouas, Z. Ben Dhiaf, A. Hammouda and B. Solaiman, "A new efficient fuzzy cluster validity index: Application to images clustering," 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 2017, pp. 1-6. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8015651&isnumber=8015374
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.HF = B_HF.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
                nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.HF)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the modified kernel form of the Pakhira-Bandyopadhyay-Maulik index (KPBM) as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_KPBM.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
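BCVI-KPBM is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\mathrm{KPBM}(k) - \min_j \mathrm{KPBM}(j)}{\sum_{i=2}^{K} \left(\mathrm{KPBM}(i) - \min_j \mathrm{KPBM}(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$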
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
C. Alok. (2010). "An investigation of clustering algorithms and soft computing approaches for pattern recognition," Department of Computer Science, Assam University. http://hdl.handle.net/10603/93443
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.KPBM = B_KPBM.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
                    nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.KPBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the KWON index as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_KWON.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
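BCVI-KWON is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\max_j \mathrm{KWON}(j) - \mathrm{KWON}(k)}{\sum_{i=2}^{K} \left(\max_j \mathrm{KWON}(j) - \mathrm{KWON}(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$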
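For orientation, the sketch below evaluates a Kwon-style index from a membership matrix and a set of centers: squared-membership-weighted scatter plus a penalty for centers far from the data mean, divided by the smallest squared distance between centers. Several choices here are assumptions rather than facts about the package: the helper names, the exponent 2 on the memberships, the use of the data mean in the penalty, and the shortcut of taking centers from `stats::kmeans` and deriving fuzzy memberships from them instead of running FCM.

```r
# Assumed helper: squared Euclidean distances, points in rows, centers in columns.
sq_dist <- function(x, centers) {
  sapply(seq_len(nrow(centers)), function(i) rowSums(sweep(x, 2, centers[i, ])^2))
}

# Sketch (assumed helper): Kwon-style index from memberships u (n x k) and centers (k x d).
kwon_index <- function(x, centers, u) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  compact <- sum(u^2 * sq_dist(x, centers))                 # fuzzy within-cluster scatter
  penalty <- mean(rowSums(sweep(centers, 2, colMeans(x))^2)) # (1/k) * sum ||v_i - mean(x)||^2
  separation <- min(dist(centers))^2                         # closest pair of centers
  (compact + penalty) / separation
}

set.seed(4)
# Two well-separated blobs (synthetic data, not from the package).
toy <- rbind(matrix(rnorm(80, 0, 0.3), ncol = 2),
             matrix(rnorm(80, 4, 0.3), ncol = 2))
kwon_vals <- sapply(2:5, function(k) {
  cen <- kmeans(toy, centers = k, nstart = 20)$centers  # stand-in for FCM centers
  w <- 1 / sq_dist(toy, cen)                            # FCM memberships for fuzzifier 2
  kwon_index(toy, cen, w / rowSums(w))
})
names(kwon_vals) <- 2:5
print(kwon_vals)
```

Smaller values are better; the penalty term keeps the index from decreasing indefinitely as the number of clusters approaches the number of points.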
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
S. H. Kwon, “Cluster validity index for fuzzy clustering,” Electronics letters, vol. 34, no. 22, pp. 2176–2177, 1998. doi:10.1049/el:19981523
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.KWON = B_KWON.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
                    nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.KWON)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the KWON2 index as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_KWON2.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
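BCVI-KWON2 is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\max_j \mathrm{KWON2}(j) - \mathrm{KWON2}(k)}{\sum_{i=2}^{K} \left(\max_j \mathrm{KWON2}(j) - \mathrm{KWON2}(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$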
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
S. H. Kwon, J. Kim, and S. H. Son, “Improved cluster validity index for fuzzy clustering,” Electronics Letters, vol. 57, no. 21, pp. 792–794, 2021.
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.KWON2 = B_KWON2.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
                      nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.KWON2)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups, using the point-biserial correlation (PB) as the underlying cluster validity index (CVI), with Dirichlet prior parameters selected by the user. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).
B_PB.IDX(x, kmax, method = "kmeans", corr = "pearson", nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
corr |
a character string indicating which correlation coefficient is to be computed ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
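BCVI-PB is defined as follows.
Let
$$r_k(\mathbf{x}) = \frac{\mathrm{PB}(k) - \min_j \mathrm{PB}(j)}{\sum_{i=2}^{K} \left(\mathrm{PB}(i) - \min_j \mathrm{PB}(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then also a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$\mathrm{BCVI}(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$\mathrm{Var}(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$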
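The point-biserial index itself is simple to sketch: it is the correlation between the pairwise distances and a 0/1 indicator of whether the two points fall in different clusters. The helper name `pb_index` and the synthetic data below are assumptions, and the `corr` value is simply passed to `stats::cor`; the package may handle it differently.

```r
# Sketch (assumed helper): point-biserial index for a hard partition.
pb_index <- function(x, cluster, corr = "pearson") {
  d <- as.vector(dist(x))                    # pairwise distances, lower triangle by column
  diff_mat <- outer(cluster, cluster, "!=")
  different <- as.numeric(diff_mat[lower.tri(diff_mat)])  # same ordering as dist()
  cor(d, different, method = corr)
}

set.seed(5)
# Three blobs at the corners of a triangle (synthetic data, not from the package).
toy <- rbind(
  cbind(rnorm(40, 0, 0.3), rnorm(40, 0, 0.3)),
  cbind(rnorm(40, 4, 0.3), rnorm(40, 0, 0.3)),
  cbind(rnorm(40, 2, 0.3), rnorm(40, 3.5, 0.3))
)
pb_vals <- sapply(2:6, function(k) {
  pb_index(toy, kmeans(toy, centers = k, nstart = 20)$cluster)
})
names(pb_vals) <- 2:6
print(pb_vals)
```

Values lie between -1 and 1, and larger values mean that large distances coincide with pairs placed in different clusters.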
BCVI |
a data frame whose first and second columns are the number of groups and the corresponding BCVI value. |
VAR |
a data frame whose first and second columns are the number of groups and the corresponding variance. |
CVI |
a data frame whose first and second columns are the number of groups and the corresponding value of the underlying CVI. |
Nathakhun Wiroonsri and Onthada Preedasawakul
G. W. Milligan, "An examination of the effect of six types of error perturbation on fifteen clustering algorithms," Psychometrika, 45, 325-342 (1980).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.PB = B_PB.IDX(x = scale(data), kmax=10, method = "kmeans", corr = "pearson",
                nstart = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.PB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Pakhira-Bandyopadhyay-Maulik (PBM) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_PBM.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
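method |
a character string indicating which clustering method to be used (default "FCM"). |
fzm |
a number greater than 1 giving the degree of fuzzification for FCM (default 2). |
nstart |
a maximum number of initial random sets for FCM (default 20). |
iter |
a maximum number of iterations for the clustering method (default 100). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |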
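BCVI-PBM is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{PBM(k) - \min_j PBM(j)}{\sum_{i=2}^{K} \left(PBM(i) - \min_j PBM(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$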
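BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original PBM(k), respectively, for k from 2 to kmax. |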
Nathakhun Wiroonsri and Onthada Preedasawakul
M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, “Validity index for crisp and fuzzy clusters,” Pattern recognition, vol. 37, no. 3, pp. 487–501, 2004.
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.PBM = B_PBM.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
                  nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.PBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using the score function (SF) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_SF.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
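method |
a character string indicating which clustering method to be used (default "kmeans"). |
nstart |
a maximum number of initial random sets for kmeans (default 100). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |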
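BCVI-SF is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{SF(k) - \min_j SF(j)}{\sum_{i=2}^{K} \left(SF(i) - \min_j SF(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$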
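The closed-form mean and variance of a Dirichlet distribution, on which the BCVI and its variance rely, can be sanity-checked by simulation. The sketch below uses base R only; the posterior parameters are made-up numbers, and Dirichlet draws are generated by normalizing independent gamma variates, a standard construction rather than anything taken from the package.

```r
set.seed(1)

# Made-up posterior Dirichlet parameters (alpha_k + n * r_k) for k = 2, ..., 5
post <- c(3, 12, 30, 5)
total <- sum(post)

# Closed-form mean and variance of each component
mean_exact <- post / total
var_exact  <- post * (total - post) / (total^2 * (total + 1))

# Draw from the Dirichlet by normalizing independent Gamma(post_k, 1) variates;
# column j of g holds the draws with shape post[j]
n_draws <- 200000
g <- matrix(rgamma(n_draws * length(post), shape = rep(post, each = n_draws)),
            nrow = n_draws)
p <- g / rowSums(g)

mean_mc <- colMeans(p)
var_mc  <- apply(p, 2, var)

print(round(rbind(mean_exact, mean_mc, var_exact, var_mc), 5))
```

The simulated rows should agree with the exact rows up to Monte Carlo error.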
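BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original SF(k), respectively, for k from 2 to kmax. |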
Nathakhun Wiroonsri and Onthada Preedasawakul
S. Saitta, B. Raphael, I. Smith, "A bounded index for cluster validity," In Perner, P.: Machine Learning and Data Mining in Pattern Recognition, Lecture Notes in Computer Science, 4571, Springer (2007).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.SF = B_SF.IDX(x = scale(data), kmax = 10, method = "kmeans",
                nstart = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.SF)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Starczewski (STR) and/or Pakhira-Bandyopadhyay-Maulik (PBM) as the underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_STRPBM.IDX(x, kmax, method = "kmeans", indexlist = "all", nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
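method |
a character string indicating which clustering method to be used (default "kmeans"). |
indexlist |
a character string indicating which cluster validity indexes to be computed (default "all"). |
nstart |
a maximum number of initial random sets for kmeans (default 100). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |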
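BCVI-STRPBM is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{CVI(k) - \min_j CVI(j)}{\sum_{i=2}^{K} \left(CVI(i) - \min_j CVI(j)\right)},$$
where CVI is either the STR or the PBM index.
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$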
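BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original STR(k) or PBM(k), respectively, for k from 2 to kmax. |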
Nathakhun Wiroonsri and Onthada Preedasawakul
M. K. Pakhira, S. Bandyopadhyay and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recogn 37(3):487–501 (2004).
A. Starczewski, "A new validity index for crisp clusters," Pattern Anal Applic 20, 687–700 (2017).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.STRPBM = B_STRPBM.IDX(x = scale(data), kmax = 10, method = "kmeans",
                        indexlist = "all", nstart = 100, alpha = aalpha,
                        mult.alpha = 1/2)

# plot the BCVI-STR
pplot = plot_BCVI(B.STRPBM$STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot

# plot the BCVI-PBM
pplot = plot_BCVI(B.STRPBM$PBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Tang as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_TANG.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
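method |
a character string indicating which clustering method to be used (default "FCM"). |
fzm |
a number greater than 1 giving the degree of fuzzification for FCM (default 2). |
nstart |
a maximum number of initial random sets for FCM (default 20). |
iter |
a maximum number of iterations for the clustering method (default 100). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |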
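BCVI-TANG is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{\max_j TANG(j) - TANG(k)}{\sum_{i=2}^{K} \left(\max_j TANG(j) - TANG(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$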
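For an index where smaller values indicate a better partition, as is assumed here for Tang, the ratio r_k measures how far each value lies below the maximum. A minimal base R sketch with made-up index values follows; the variable names are hypothetical and this is not the package's code.

```r
# Made-up values of a smaller-is-better index for k = 2, ..., 6
toy_cvi <- c(0.90, 0.40, 0.25, 0.60, 0.85)
n <- 50                              # number of data points (made up)
alpha <- rep(1, length(toy_cvi))     # flat Dirichlet prior

# Distance below the maximum, normalized to sum to one
r <- (max(toy_cvi) - toy_cvi) / sum(max(toy_cvi) - toy_cvi)

# Posterior mean of p_k under the Dirichlet(alpha) prior
bcvi <- (alpha + n * r) / (sum(alpha) + n)

k <- seq_along(toy_cvi) + 1
best_k <- k[which.max(bcvi)]
print(data.frame(k = k, r = round(r, 3), BCVI = round(bcvi, 3)))
```

The worst index value receives r_k = 0, so its posterior mean comes from the prior alone.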
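BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original TANG(k), respectively, for k from 2 to kmax. |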
Nathakhun Wiroonsri and Onthada Preedasawakul
Y. Tang, F. Sun, and Z. Sun, “Improved validation index for fuzzy clustering,” in Proceedings of the 2005, American Control Conference, 2005., pp. 1120–1125 vol. 2, 2005. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1470111&isnumber=31519
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_DI.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.TANG = B_TANG.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
                    nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.TANG)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Wu and Li (WL) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_WL.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
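method |
a character string indicating which clustering method to be used (default "FCM"). |
fzm |
a number greater than 1 giving the degree of fuzzification for FCM (default 2). |
nstart |
a maximum number of initial random sets for FCM (default 20). |
iter |
a maximum number of iterations for the clustering method (default 100). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |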
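BCVI-WL is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{\max_j WL(j) - WL(k)}{\sum_{i=2}^{K} \left(\max_j WL(j) - WL(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$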
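BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original WL(k), respectively, for k from 2 to kmax. |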
Nathakhun Wiroonsri and Onthada Preedasawakul
C. H. Wu, C. S. Ouyang, L. W. Chen, and L. W. Lu, “A new fuzzy clustering validity index with a median factor for centroid-based clustering,” IEEE Transactions on Fuzzy Systems, vol. 23, no. 3, pp. 701–718, 2015. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6811211&isnumber=7115244
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.WL = B_WL.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
                nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.WL)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Wiroonsri and Preedasawakul (WP) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_WP.IDX(x, kmax, corr = "pearson", method = "FCM", fzm = 2, gamma = (fzm^2 * 7)/4, sampling = 1, iter = 100, nstart = 20, NCstart = TRUE, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
corr |
a character string indicating which correlation coefficient is to be computed (default "pearson"). |
method |
a character string indicating which clustering method to be used (default "FCM"). |
fzm |
a number greater than 1 giving the degree of fuzzification for FCM (default 2). |
gamma |
adjusted fuzziness parameter for the WP index (default (fzm^2 * 7)/4). |
sampling |
a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is 1. |
iter |
a maximum number of iterations for the clustering method (default 100). |
nstart |
a maximum number of initial random sets for FCM (default 20). |
NCstart |
a logical value (default TRUE). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |
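BCVI-WP is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{WP(k) - \min_j WP(j)}{\sum_{i=2}^{K} \left(WP(i) - \min_j WP(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$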
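BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original WP(k), respectively, for k from 2 to kmax. |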
Nathakhun Wiroonsri and Onthada Preedasawakul
N. Wiroonsri, O. Preedasawakul, "A correlation-based fuzzy cluster validity index with secondary options detector". doi:10.48550/arXiv.2308.14785
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)

B.WP = B_WP.IDX(x = scale(data), kmax = 10, corr = "pearson", method = "FCM",
                fzm = 2, sampling = 1, iter = 100, nstart = 20, NCstart = TRUE,
                alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Wiroonsri (WI) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_Wvalid(x, kmax, method = "kmeans", corr = "pearson", nstart = 100, sampling = 1, NCstart = TRUE, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used (default "kmeans"). |
corr |
a character string indicating which correlation coefficient is to be computed (default "pearson"). |
nstart |
a maximum number of initial random sets for kmeans (default 100). |
sampling |
a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is 1. |
NCstart |
a logical value (default TRUE). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |
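BCVI-WI is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{WI(k) - \min_j WI(j)}{\sum_{i=2}^{K} \left(WI(i) - \min_j WI(j)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$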
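BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original WI(k), respectively, for k from 2 to kmax. |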
Nathakhun Wiroonsri and Onthada Preedasawakul
N. Wiroonsri, "Clustering performance analysis using a new correlation based cluster validity index," Pattern Recognition, 145, 109910, 2024. doi:10.1016/j.patcog.2023.109910
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_STRPBM.IDX, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.WI = B_Wvalid(x = scale(data), kmax = 10, method = "kmeans", corr = "pearson",
                nstart = 100, sampling = 1, NCstart = TRUE, alpha = aalpha,
                mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.WI)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using Xie and Beni (XB) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_XB.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20, iter = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
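method |
a character string indicating which clustering method to be used (default "FCM"). |
fzm |
a number greater than 1 giving the degree of fuzzification for FCM (default 2). |
nstart |
a maximum number of initial random sets for FCM (default 20). |
iter |
a maximum number of iterations for the clustering method (default 100). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |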
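BCVI-XB is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{\max_j XB(j) - XB(k)}{\sum_{i=2}^{K} \left(\max_j XB(j) - XB(i)\right)}.$$
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$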
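BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original XB(k), respectively, for k from 2 to kmax. |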
Nathakhun Wiroonsri and Onthada Preedasawakul
X. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.XB = B_XB.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
                nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.XB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 1 Gaussian and 1 Uniform distribution, labeled 1-2.
B1_data
A data frame with 5500 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B3_data, B_WP.IDX, B_Wvalid, B_XB.IDX
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 5 different Gaussian distributions, labeled 1-5.
B2_data
A data frame with 850 data points and 3 variables
x
Numeric values generated from Gaussian distributions
y
Numeric values generated from Gaussian distributions
label
Categorical labels 1,2,3,4,5
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B1_data, B3_data, B_WP.IDX, B_Wvalid, B_XB.IDX
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 5 different Gaussian distributions, labeled 1-5.
B3_data
A data frame with 2300 data points and 3 variables
x
Numeric values generated from Gaussian distributions
y
Numeric values generated from Gaussian distributions
label
Categorical labels 1,2,3,4,5
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B4_data, B_WP.IDX, B_Wvalid, B_XB.IDX
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 6 different Gaussian distributions, labeled 1-6.
B4_data
A data frame with 740 data points and 3 variables
x
Numeric values generated from Gaussian distributions
y
Numeric values generated from Gaussian distributions
label
Categorical labels 1,2,3,4,5,6
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B3_data, B5_data, B_WP.IDX, B_Wvalid, B_XB.IDX
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 7 different Gaussian and 2 Uniform distributions, labeled 1-9.
B5_data
A data frame with 1820 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2,3,4,5,6,7,8,9
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B4_data, B6_data, B_WP.IDX, B_Wvalid, B_XB.IDX
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 3 different Gaussian and 2 Uniform distributions, labeled 1-5.
B6_data
A data frame with 1000 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2,3,4,5
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B5_data, B7_data, B_WP.IDX, B_Wvalid, B_XB.IDX
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 3 different Gaussian and 2 Uniform distributions, labeled 1-5.
B7_data
A data frame with 800 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2,3,4,5
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B6_data, B1_data, B_WP.IDX, B_Wvalid, B_XB.IDX
Compute Bayesian cluster validity index (BCVI) from two to kmax groups using an underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
BayesCVIs(CVI, n, kmax, opt.pt, alpha = "default", mult.alpha = 1/2)
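CVI |
the CVI values for k from 2 to kmax groups. |
n |
the number of data points. |
kmax |
a maximum number of clusters to be considered. |
opt.pt |
a character string indicating whether the maximum or the minimum of the CVI indicates the optimal number of clusters ("max" or "min"). |
alpha |
Dirichlet prior parameters, one for each candidate number of groups from 2 to kmax (default "default"). |
mult.alpha |
the power s of n^s, which multiplies the Dirichlet prior parameters alpha (default 1/2). |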
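BCVI is defined as follows.
Let, for $k = 2, \ldots, K$,
$$r_k(\mathbf{x}) = \frac{\max_j CVI(j) - CVI(k)}{\sum_{i=2}^{K} \left(\max_j CVI(j) - CVI(i)\right)}$$
for a CVI such that the smallest value indicates the optimal number of clusters and
$$r_k(\mathbf{x}) = \frac{CVI(k) - \min_j CVI(j)}{\sum_{i=2}^{K} \left(CVI(i) - \min_j CVI(j)\right)}$$
for a CVI such that the largest value indicates the optimal number of clusters.
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ still remains a Dirichlet distribution with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{(\alpha_k + n r_k(\mathbf{x}))(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$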
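One practical consequence of taking a posterior mean is that an informative prior can move the selected number of groups from the global peak of the underlying index to a secondary peak. The base R sketch below illustrates this with made-up index values and two hypothetical priors; it does not call the package, assumes a larger-is-better index, and ignores any mult.alpha scaling.

```r
# Made-up larger-is-better index values for k = 2, ..., 6:
# a global peak at k = 3 and a secondary peak at k = 5
toy_cvi <- c(0.20, 0.90, 0.30, 0.80, 0.25)
n <- 100
k <- seq_along(toy_cvi) + 1

# Hypothetical helper: posterior mean of p_k under a Dirichlet(alpha) prior
posterior_mean <- function(cvi, n, alpha) {
  r <- (cvi - min(cvi)) / sum(cvi - min(cvi))
  (alpha + n * r) / (sum(alpha) + n)
}

flat_prior  <- rep(1, 5)            # no preference among k
large_prior <- c(1, 1, 1, 40, 40)   # prior belief in 5 or 6 groups

k_flat  <- k[which.max(posterior_mean(toy_cvi, n, flat_prior))]
k_large <- k[which.max(posterior_mean(toy_cvi, n, large_prior))]
print(c(flat = k_flat, informative = k_large))
```

With the flat prior the choice follows the raw index; with the informative prior the secondary peak that agrees with the prior is selected instead.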
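BCVI |
the data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax. |
VAR |
the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax. |
CVI |
the data frame where the first and the second columns are the number of groups k and the original CVI(k), respectively, for k from 2 to kmax. |
opt.pt |
a character string indicating whether the maximum or the minimum of the CVI indicates the optimal number of clusters. |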
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
# install a package for computing an underlying CVI
# install.packages("UniversalCVI")
library(UniversalCVI)
library(BayesCVI)

data = R1_data[,-3]

# Compute WP index by WP.IDX using default gamma
FCM.WP = WP.IDX(scale(data), cmax = 10, cmin = 2, corr = 'pearson', method = 'FCM',
                fzm = 2, iter = 100, nstart = 20, NCstart = TRUE)

# WP.IDX values
result = FCM.WP$WP$WPI

aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.WP = BayesCVIs(CVI = result, n = nrow(data), kmax = 10, opt.pt = "max",
                 alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Plot Bayesian cluster validity index (BCVI) with and without standard deviation error bars and the underlying index.
plot_BCVI(B.result, mult.err.bar = 2)
B.result |
a result from one of the BCVI functions in this package, such as BayesCVIs or B_XB.IDX. |
mult.err.bar |
a multiplier of the standard deviations to be used for plotting error bars (default 2). |
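BCVI is defined as follows. Here $K$ denotes the maximum number of groups (kmax) and $n$ the number of data points.
Let
$$r_k(\mathbf{x}) = \frac{\max_j CVI(j) - CVI(k)}{\sum_{i=2}^{K} \left(\max_j CVI(j) - CVI(i)\right)}$$
for a cluster validity index (CVI) such that the smallest value indicates the optimal number of clusters, and
$$r_k(\mathbf{x}) = \frac{CVI(k) - \min_j CVI(j)}{\sum_{i=2}^{K} \left(CVI(i) - \min_j CVI(j)\right)}$$
for a CVI such that the largest value indicates the optimal number of clusters.
Assume that
$$f(\mathbf{x} \mid \mathbf{p}) = C(\mathbf{p}) \prod_{k=2}^{K} p_k^{\, n r_k(\mathbf{x})}$$
represents the conditional probability density function of the dataset given $\mathbf{p}$, where $C(\mathbf{p})$ is the normalizing constant. Assume further that $\mathbf{p}$ follows a Dirichlet prior distribution with parameters $\boldsymbol{\alpha} = (\alpha_2, \ldots, \alpha_K)$. The posterior distribution of $\mathbf{p}$ is then again a Dirichlet distribution, with parameters $(\alpha_2 + n r_2(\mathbf{x}), \ldots, \alpha_K + n r_K(\mathbf{x}))$.
The BCVI is then defined as
$$BCVI(k) = E[p_k \mid \mathbf{x}] = \frac{\alpha_k + n r_k(\mathbf{x})}{\alpha_0 + n},$$
where $\alpha_0 = \sum_{k=2}^{K} \alpha_k$.
The variance of $p_k$ can be computed as
$$Var(p_k \mid \mathbf{x}) = \frac{\left(\alpha_k + n r_k(\mathbf{x})\right)\left(\alpha_0 + n - \alpha_k - n r_k(\mathbf{x})\right)}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.$$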
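The definition can be traced by hand in a few lines of base R. The following is a minimal sketch, not the package's implementation: `bcvi_sketch` is a hypothetical helper, the index values are made up, and `alpha` is taken as the final Dirichlet prior parameters (any rescaling controlled by `mult.alpha` is deliberately left out). It rescales the index into weights that sum to one, adds `n` times those weights to the prior parameters, and returns the Dirichlet posterior mean and standard deviation for each number of groups.

```r
# Hypothetical helper, NOT part of BayesCVI: illustrates the definition only.
# cvi   : underlying index values for k = 2, ..., K
# n     : number of data points
# alpha : Dirichlet prior parameters, used as given (no mult.alpha rescaling)
bcvi_sketch <- function(cvi, n, alpha, opt.pt = c("max", "min")) {
  opt.pt <- match.arg(opt.pt)
  stopifnot(length(cvi) == length(alpha), all(alpha > 0))

  # Rescale the index so that better values get larger weights
  d <- if (opt.pt == "max") cvi - min(cvi) else max(cvi) - cvi
  stopifnot(sum(d) > 0)        # undefined when all index values are equal
  r <- d / sum(d)              # weights r_k, summing to 1

  post  <- alpha + n * r       # Dirichlet posterior parameters
  total <- sum(post)           # equals alpha_0 + n

  bcvi <- post / total                                   # posterior mean
  v    <- post * (total - post) / (total^2 * (total + 1)) # posterior variance

  data.frame(k = seq_along(cvi) + 1, BCVI = bcvi, SD = sqrt(v))
}

# Made-up index values for k = 2, ..., 6 (larger is better), illustration only
cvi <- c(0.40, 0.90, 0.70, 0.50, 0.30)
out <- bcvi_sketch(cvi, n = 100, alpha = rep(1, 5), opt.pt = "max")
print(out)
```

Because the posterior means sum to one, the BCVI values can be read as relative support for each candidate number of groups, and the prior parameters shift that support toward the values of k the user considers more plausible.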
plot_index |
a plot of the underlying index for the number of groups from 2 to kmax. |
plot_BCVI |
a plot of BCVI for the number of groups from 2 to kmax. |
error_bar_plot |
a plot of BCVI with error bars for the number of groups from 2 to kmax. |
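To make the role of `mult.err.bar` concrete, here is a small base-R sketch. It assumes the error bars extend `mult.err.bar` standard deviations above and below each BCVI value; that reading follows the argument description but has not been checked against the package source. `error_bar_bounds` is a hypothetical helper and the numbers are made up.

```r
# Hypothetical helper, NOT part of BayesCVI.
# Assumption: error bars are BCVI +/- mult.err.bar * SD.
error_bar_bounds <- function(bcvi, sd, mult.err.bar = 2) {
  stopifnot(length(bcvi) == length(sd), mult.err.bar >= 0)
  data.frame(k     = seq_along(bcvi) + 1,
             BCVI  = bcvi,
             lower = bcvi - mult.err.bar * sd,
             upper = bcvi + mult.err.bar * sd)
}

# Made-up BCVI values and standard deviations for k = 2, ..., 5
bcvi <- c(0.10, 0.55, 0.25, 0.10)
sds  <- c(0.02, 0.04, 0.03, 0.02)
b <- error_bar_bounds(bcvi, sds, mult.err.bar = 2)

# Base-graphics error-bar plot of the made-up values
plot(b$k, b$BCVI, type = "b", pch = 19, ylim = range(b$lower, b$upper),
     xlab = "number of groups", ylab = "BCVI")
arrows(b$k, b$lower, b$k, b$upper, angle = 90, code = 3, length = 0.05)
```

When the bars of the top two candidates overlap, the data and prior do not clearly separate them, which is the situation the error-bar plot is meant to expose.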
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B_STRPBM.IDX, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_WP.IDX, B_DB.IDX
library(BayesCVI)
library(UniversalCVI)

## Soft clustering

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.XB = B_XB.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
                nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.XB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot

## Hard clustering

# The data included in this package.
data = B2_data[,1:2]

# Compute the STR index with K-means (STRPBM.IDX comes from UniversalCVI)
K.STR = STRPBM.IDX(scale(data), kmax = 10, kmin = 2, method = "kmeans",
                   indexlist = "STR", nstart = 100)

# STR index values
result = K.STR$STR$STR

aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.STR = BayesCVIs(CVI = result, n = nrow(data), kmax = 10, opt.pt = "max",
                  alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI
pplot = plot_BCVI(B.STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot