| Title: | Useful Tools for Cognitive Diagnosis Modeling |
|---|---|
| Description: | Provides useful tools for cognitive diagnosis modeling (CDM). The package includes functions for estimating CDMs such as the restricted DINA or DINO (R-DINA or R-DINO) models (Nájera et al., 2023; <doi:10.3102/10769986231158829>), the G-DINA model for forced-choice blocks (Nájera et al., 2024; <doi:10.1111/bmsp.12393>), and the general nonparametric classification method (Chiu et al., 2018; <doi:10.1007/s11336-017-9595-4>). Additionally, methods for identifying the latent structure of CDMs are also available, such as dimensionality assessment via parallel analysis and automated fit comparison (Nájera et al., 2021; <doi:10.3389/fpsyg.2021.614470>), as well as empirical Q-matrix validation and estimation using the Hull method (Nájera et al., 2021; <doi:10.1111/bmsp.12228>) and the discrete factor loading method (Wang et al., 2018; <doi:10.1007/978-3-319-77249-3_29>). Other practical functions for CDM applications include corrected classification accuracy estimation via multiple imputation (Kreitchmann et al., 2022; <doi:10.3758/s13428-022-01967-5>), model-based recursive partitioning to detect non-invariant subpopulations (Nájera et al., in press), person-fit evaluation (Santos et al., 2020; <doi:10.1007/s00357-019-09325-5>), and model identifiability assessment (Gu and Xu, 2021; <doi:10.5705/ss.202018.0410>). Lastly, the package also provides useful functions for CDM simulation studies, such as random Q-matrix generation and forced-choice data generation. |
| Authors: | Pablo Nájera [aut, cre, cph], Miguel A. Sorrel [aut, cph], Francisco J. Abad [aut, cph], Rodrigo S. Kreitchmann [ctb], Kevin Santos [ctb], David Goretzko [ctb], Philipp Sterner [ctb] |
| Maintainer: | Pablo Nájera <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.7 |
| Built: | 2026-06-04 06:55:25 UTC |
| Source: | https://github.com/pablo-najera/cdmtools |
This function calculates the test-, pattern-, and attribute-level classification accuracy indices based on integrated posterior probabilities from multiple imputed item parameters (Kreitchmann et al., 2022).
The classification accuracy indices are the ones developed by Iaconangelo (2017) and Wang et al. (2015).
It is only applicable to dichotomous attributes. The function is built upon the CA function from the GDINA package (Ma & de la Torre, 2020).
CA.MI(fit, what = "EAP", R = 500, n.cores = 1, verbose = TRUE, seed = NULL)
fit |
An object of class |
what |
What attribute estimates are used? The default is |
R |
Number of bootstrap samples and imputations. The default is 500. |
n.cores |
Number of processors to use to speed up multiple imputation. The default is 1. |
verbose |
Show progress. The default is |
seed |
A seed for obtaining consistent results. If |
CA.MI returns an object of class CA, with a list of elements:
tau: Estimated test-level classification accuracy; see Iaconangelo (2017, Eq. 2.2) (vector).
tau_l: Estimated pattern-level classification accuracy; see Iaconangelo (2017, p. 13) (vector).
tau_k: Estimated attribute-level classification accuracy; see Wang et al. (2015, p. 461, Eq. 6) (vector).
CCM: Conditional classification matrix; see Iaconangelo (2017, p. 13) (matrix).
Rodrigo S. Kreitchmann, Universidad Nacional de Educación a Distancia
Iaconangelo, C. (2017). Uses of classification error probabilities in the three-step approach to estimating cognitive diagnosis models (Unpublished doctoral dissertation). New Brunswick, NJ: Rutgers University.
Kreitchmann, R. S., de la Torre, J., Sorrel, M. A., Nájera, P., & Abad, F. J. (2022). Improving reliability estimation in cognitive diagnosis modeling. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01967-5
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
Wang, W., Song, L., Chen, P., Meng, Y., & Ding, S. (2015). Attribute-level and pattern-level classification consistency and accuracy indices for cognitive diagnostic assessment. Journal of Educational Measurement, 52, 457-476.
library(GDINA)
dat <- sim10GDINA$simdat[1:100,]
Q <- sim10GDINA$simQ
fit <- GDINA(dat = dat, Q = Q, model = "GDINA")
ca.mi <- CA.MI(fit)
ca.mi
Empirical Q-matrix estimation based on the discrete factor loading method (Wang, Song, & Ding, 2018) as used in Nájera, Abad, and Sorrel (2021).
Apart from the conventional dichotomization criteria, the procedure based on loading differences described in Garcia-Garzon, Abad, and Garrido (2018) is also available.
Furthermore, the bagging bootstrap implementation (Xu & Shang, 2018) can be applied; it is recommended when working with small sample sizes.
The psych package (Revelle, 2019) is used for estimating the required exploratory factor analysis (EFA).
estQ(
  r,
  K,
  n.obs = NULL,
  criterion = "row",
  boot = FALSE,
  efa.args = list(cor = "tet", rotation = "oblimin", fm = "uls"),
  boot.args = list(N = 0.8, R = 100, verbose = TRUE, seed = NULL)
)
r |
A correlation matrix or raw data ( |
K |
Number of attributes to use. |
n.obs |
Number of individuals if |
criterion |
Dichotomization criterion to transform the factor loading matrix into the Q-matrix. The possible options include |
boot |
Apply the bagging bootstrap implementation? Only available if |
efa.args |
A list of arguments for the EFA estimation:
|
boot.args |
A list of arguments for the bagging bootstrap implementation (ignored if
|
estQ returns an object of class estQ.
est.Q: Estimated Q-matrix (matrix).
efa.loads: Factor loading matrix (matrix).
efa.comm: EFA communalities (vector).
efa.fit: EFA model fit indices (vector).
boot.Q: Bagging bootstrap Q-matrix before dichotomization. Only if boot = TRUE (matrix).
is.Qid: Q-matrix identifiability information (list).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Garcia-Garzon, E., Abad, F. J., & Garrido, L. E. (2018). Improving bi-factor exploratory modelling: Empirical target rotation based on loading differences. Methodology, 15, 45–55. https://doi.org/10.1027/1614-2241/a000163
Nájera, P., Abad, F. J., & Sorrel, M. A. (2021). Determining the number of attributes in cognitive diagnosis modeling. Frontiers in Psychology, 12:614470. https://doi.org/10.3389/fpsyg.2021.614470
Revelle, W. (2019). psych: Procedures for Psychological, Psychometric, and Personality Research. R package version 1.9.12. https://CRAN.R-project.org/package=psych.
Wang, W., Song, L., & Ding, S. (2018). An exploratory discrete factor loading method for Q-matrix specification in cognitive diagnosis models. In: M. Wiberg, S. Culpepper, R. Janssen, J. González, & D. Molenaar (Eds.), Quantitative Psychology. IMPS 2017. Springer Proceedings in Mathematics & Statistics (Vol. 233, pp. 351–362). Springer.
Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113, 1284–1295. https://doi.org/10.1080/01621459.2017.1340889
library(GDINA)
dat <- sim30GDINA$simdat
Q <- sim30GDINA$simQ

#------------------------------
# Using default specifications
#------------------------------
sugQ1 <- estQ(r = dat, K = 5) # Estimate Q-matrix
sugQ1$est.Q <- orderQ(sugQ1$est.Q, Q)$order.Q # Reorder Q-matrix attributes
mean(sugQ1$est.Q == Q) # Check similarity with the generating Q-matrix

#------------------------------------
# Using the bagging bootstrap method
#------------------------------------
# In boot.args argument, R >= 100 is recommended (R = 20 is here used for illustration purposes)
sugQ2 <- estQ(r = dat, K = 5, boot = TRUE, boot.args = list(R = 20, seed = 123)) # Estimate Q-matrix
sugQ2$est.Q <- orderQ(sugQ2$est.Q, Q)$order.Q # Reorder Q-matrix attributes
sugQ2$boot.Q # Proportion of replicas a q-entry was specified in the estimated Q-matrix
mean(sugQ2$est.Q == Q) # Check similarity with the generating Q-matrix
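The dichotomization step that turns a factor loading matrix into a Q-matrix can be illustrated with a minimal base-R sketch. It assumes that the "row" criterion sets a q-entry to 1 when an item's loading on an attribute exceeds that item's mean loading. This rule, the toy loading matrix, and the helper name `dichotomize_row` are illustrative assumptions and not the package's internal code.

```r
# Hypothetical helper: dichotomize a J x K loading matrix by row means.
# Assumption: q_jk = 1 when the loading of item j on factor k exceeds
# the mean loading of item j across the K factors.
dichotomize_row <- function(loads) {
  Q <- (loads > rowMeans(loads)) * 1
  dimnames(Q) <- dimnames(loads)
  Q
}

# Toy loading matrix (4 items, 3 factors), invented for illustration
loads <- matrix(c(0.70, 0.05, 0.03,
                  0.10, 0.65, 0.02,
                  0.45, 0.50, 0.01,
                  0.02, 0.08, 0.60),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("item", 1:4), paste0("F", 1:3)))

Q.hat <- dichotomize_row(loads)
print(Q.hat)
```

In this toy case the third item loads on two factors above its row mean, so it is assigned two attributes, while the other items are assigned one each.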
Estimation of the G-DINA model for forced-choice responses according to Nájera et al. (2024).
Block polarity (i.e., statement direction) and initial values for parameters can be specified to determine the design of the forced-choice blocks.
The GDINA package (Ma & de la Torre, 2020) is used to estimate the model via the expectation-maximization (EM) algorithm if no priors are used.
To estimate the forced-choice diagnostic classification model (FC-DCM; Huang, 2023) using Bayes modal estimation, see the code provided at https://osf.io/h6x9e/.
Only unidimensional statements (i.e., bidimensional blocks) are currently supported.
FCGDINA(
  dat,
  Q,
  polarity = NULL,
  polarity.initial = 1e-04,
  att.dist = "saturated",
  att.prior = NULL,
  verbose = 1,
  higher.order = list(),
  catprob.parm = NULL,
  control = list()
)
dat |
A N individuals x J items ( |
Q |
A F blocks x K attributes Q-matrix ( |
polarity |
A F blocks x 2 ( |
polarity.initial |
A |
att.dist |
How is the joint attribute distribution estimated? It can be |
att.prior |
A |
verbose |
How to print calibration information after each EM iteration? Can be 0, 1 or 2, indicating to print no information, information for current iteration, or information for all iterations. |
higher.order |
A |
catprob.parm |
A |
control |
A |
FCGDINA returns an object of class FCGDINA.
GDINA.obj: Estimation output from the GDINA function or the GDINA.MJ function (Ma & Jiang, 2021), depending on whether EM or BM estimation has been used (list).
technical: Information about initial values (list).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Huang, H.-Y. (2023). Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests. Educational and Psychological Measurement, 83(1), 146-180. https://doi.org/10.1177/00131644211069906
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
Nájera, P., Kreitchmann, R. S., Escudero, S., Abad, F. J., de la Torre, J., & Sorrel, M. A. (2025). A General Diagnostic Modeling Framework for Forced-Choice Assessments. British Journal of Mathematical and Statistical Psychology.
library(GDINA)
set.seed(123)
# Q-matrix for the unidimensional statements
Q.items <- do.call("rbind", replicate(5, diag(5), simplify = FALSE))
# Guessing and slip
GS <- cbind(runif(n = nrow(Q.items), min = 0.1, max = 0.3), runif(n = nrow(Q.items), min = 0.1, max = 0.3))
n.blocks <- 30 # Number of forced-choice blocks

#----------------------------------------------------------------------------------------
# Illustration with simulated data using only direct statements (i.e., homopolar blocks)
#----------------------------------------------------------------------------------------
# Block polarity (1 = direct statement; -1 = indirect statement)
polarity <- matrix(1, nrow = n.blocks, ncol = 2)
sim <- simFCGDINA(N = 1000, Q.items, n.blocks = n.blocks, polarity = polarity, model = "GDINA", GDINA.args = list(GS = GS), seed = 123)
Q <- sim$Q # Generated Q-matrix of forced-choice blocks
dat <- sim$dat # Generated responses
att <- sim$att # Generated attribute profiles
fit <- FCGDINA(dat = dat, Q = Q, polarity = polarity) # Fit the G-DINA model with EM estimation
ClassRate(personparm(fit$GDINA.obj), att) # Classification accuracy

#-------------------------------------------------------------------------------------------
# Illustration with simulated data using some inverse statements (i.e., heteropolar blocks)
#-------------------------------------------------------------------------------------------
polarity <- matrix(1, nrow = n.blocks, ncol = 2)
# Including 15 inverse statements
polarity[sample(x = 1:(2*n.blocks), size = 15, replace = FALSE)] <- -1
sim <- simFCGDINA(N = 1000, Q.items, n.blocks = n.blocks, polarity = polarity, model = "GDINA", GDINA.args = list(GS = GS), seed = 123)
Q <- sim$Q
dat <- sim$dat
att <- sim$att
fit <- FCGDINA(dat = dat, Q = Q, polarity = polarity)
ClassRate(personparm(fit$GDINA.obj), att)
Implementation of a model-based recursive partitioning algorithm (Zeileis et al., 2008) for the G-DINA model (Nájera et al., in press) to detect subpopulations that differ in item parameters, underlying model, or Q-matrix.
The function is based on the mob function of the partykit package (Hothorn & Zeileis, 2015) and estimates CDMs via the GDINA package (Ma & de la Torre, 2020).
GDINAtree(
  dat,
  covariates,
  Q,
  model = "GDINA",
  maxdepth = 3,
  minsize = 100,
  alpha = 0.05,
  bonferroni = TRUE,
  ...
)
dat |
A N individuals x J items ( |
covariates |
A N individuals x M covariates ( |
Q |
A J items x K attributes Q-matrix ( |
model |
CDM to be estimated. The possible options include "GDINA","DINA","DINO","ACDM","LLM", "RRUM", "MSDINA", "BUGDINO", "SISM", and "UDF". See the |
maxdepth |
An |
minsize |
An |
alpha |
Nominal significance level. |
bonferroni |
A |
... |
Additional arguments for |
GDINAtree returns an object of class GDINAtree.
tree: A modelparty object containing the main results from the G-DINA tree.
fit.nodes: A list containing the models fitted in each node (each of them being a GDINA object).
specifications: Function call specifications (list).
David Goretzko, Goethe University Frankfurt,
Philipp Sterner, LMU Munich,
Pablo Nájera, Universidad Pontificia Comillas
Hothorn, T., & Zeileis, A. (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905-3909. https://jmlr.org/papers/v16/hothorn15a.html
Nájera, P., Sorrel, M. A., Sterner, P., & Goretzko, D. (2026). Invariance Analysis in Cognitive Diagnosis Models with G-DINA Trees. Manuscript submitted for publication
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492-514. https://doi.org/10.1198/106186008X319331
library(GDINA)
dat <- as.data.frame(rbind(sim30GDINA$simdat, sim30DINA$simdat))
Q <- sim30GDINA$simQ
set.seed(1713)
dat$y <- rep(1:2, each = nrow(dat)/2)
dat$cov1 <- round(runif(nrow(dat)), 2)
dat$cov2 <- round(runif(nrow(dat)), 2)
dat$cov3 <- sample(1:2, nrow(dat), replace = TRUE)
dat$cov4 <- sample(1:4, nrow(dat), replace = TRUE)
fit <- GDINAtree(dat = dat[,1:30], covariates = dat[,31:35], Q = Q)
plot(fit)
Generates a Q-matrix. The criteria from Chen, Liu, Xu, & Ying (2015) and Xu & Shang (2018) can be used to generate identifiable Q-matrices. Only binary Q-matrices are supported so far. Useful for simulation studies.
genQ(J, K, Kj, I = 2, min.JK = 3, max.Kcor = 1, Qid = "none", seed = NULL)
J |
Number of items. |
K |
Number of attributes. |
Kj |
A vector specifying the number (or proportion, if summing up to 1) of items measuring 1, 2, 3, ..., attributes. The first element of the vector determines the number (or proportion) of items measuring 1 attribute, and so on. See |
I |
Number of identity matrices to include in the Q-matrix (up to column permutation). The default is 2. |
min.JK |
Minimum number of items measuring each attribute. It can be overwritten by |
max.Kcor |
Maximum allowed tetrachoric correlation among the columns to avoid overlapping (Nájera, Sorrel, de la Torre, & Abad, 2020). The default is 1. |
Qid |
Ensure that the generated Q-matrix is generically identifiable. It includes |
seed |
A seed for obtaining consistent results. If |
genQ returns an object of class genQ.
gen.Q: The generated Q-matrix (matrix).
JK: Number of items measuring each attribute (vector).
Kcor: Tetrachoric correlations among the columns (matrix).
is.Qid: Q-matrix identifiability information (list).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110, 850-866. https://doi.org/10.1080/01621459.2014.934827
Nájera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12228
Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113, 1284-1295. https://doi.org/10.1080/01621459.2017.1340889
Kj <- c(15, 10, 0, 5) # 15 one-att, 10 2-atts, 0 3-atts, and 5 four-atts items
Q <- genQ(J = 30, K = 4, Kj = Kj, Qid = "others", seed = 123)
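To make the meaning of the Kj argument concrete, the following base-R sketch draws a binary Q-matrix in which Kj[k] items measure exactly k attributes. It is a deliberately simplified illustration and not the genQ algorithm: the helper name `sketch_Q` is hypothetical, and none of the identifiability constraints (identity matrices, min.JK, max.Kcor, Qid) are enforced.

```r
# Simplified sketch (not the genQ algorithm): Kj[k] items measure exactly
# k attributes, with the measured attributes chosen at random.
sketch_Q <- function(K, Kj, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n.atts <- rep(seq_along(Kj), times = Kj)  # number of attributes per item
  Q <- matrix(0L, nrow = length(n.atts), ncol = K)
  for (j in seq_along(n.atts)) {
    Q[j, sample.int(K, n.atts[j])] <- 1L    # switch on n.atts[j] attributes
  }
  Q
}

Kj <- c(15, 10, 0, 5)
Q.sketch <- sketch_Q(K = 4, Kj = Kj, seed = 123)
table(rowSums(Q.sketch))  # item counts by number of measured attributes
```

The tabulated row sums reproduce the counts requested in Kj, which is the property that genQ guarantees on top of its additional constraints.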
Attribute profile estimation using the general nonparametric classification method (GNPC; Chiu, Sun, & Bian, 2018).
The GNPC can be considered a robust alternative to the parametric G-DINA model when sample sizes are small.
The AlphaNP function from the NPCD package (Zheng & Chiu, 2019; Chiu, Sun, & Bian, 2018) using weighted Hamming distances is used to initiate the procedure.
GNPC(
  dat,
  Q,
  initiate = "AND",
  min.change = 0.001,
  maxitr = 1000,
  verbose = TRUE
)
dat |
A N individuals x J items ( |
Q |
A J items x K attributes Q-matrix ( |
initiate |
Should the conjunctive ( |
min.change |
Minimum proportion of modified attribute profiles to use as a stopping criterion. Default is .001. |
maxitr |
Maximum number of iterations. Default is 1000. |
verbose |
Print information after each iteration. Default is |
GNPC returns an object of class GNPC.
alpha.est: Estimated attribute profiles (matrix).
loss.matrix: The distances between the weighted ideal responses from each latent class (rows) and examinees' observed responses (columns) (matrix).
eta.w: The weighted ideal responses for each latent class (rows) on each item (columns) (matrix).
w: The estimated weights, used to compute the weighted ideal responses (matrix).
n.ite: Number of iterations required to achieve convergence (double).
hist.change: Proportion of modified attribute profiles in each iteration (vector).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. https://doi.org/10.1007/s00357-013-9132-9
Chiu, C.-Y., Sun, Y., & Bian, Y. (2018). Cognitive diagnosis for small education programs: The general nonparametric classification method. Psychometrika, 83, 355-375. https://doi.org/10.1007/s11336-017-9595-4
Zheng, Y., & Chiu, C.-Y. (2019). NPCD: Nonparametric methods for cognitive diagnosis. R package version 1.0-11. https://cran.r-project.org/web/packages/NPCD/.
library(GDINA)
Q <- sim30GDINA$simQ # Q-matrix
K <- ncol(Q)
J <- nrow(Q)
set.seed(123)
GS <- data.frame(guessing = rep(0.1, J), slip = rep(0.1, J))
sim <- simGDINA(200, Q, GS)
simdat <- sim$dat # Simulated data
simatt <- sim$attribute # Generating attributes
fit.GNPC <- GNPC(simdat, Q) # Apply the GNPC method
ClassRate(fit.GNPC$alpha.est, simatt) # Check classification accuracy
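The weighted ideal responses returned in eta.w combine a conjunctive and a disjunctive ideal response. The base-R sketch below shows that combination and the distance-based classification for a single response pattern. The toy Q-matrix, the response pattern, and the fixed weight w = 0.5 are assumptions made for illustration; the GNPC itself estimates the weights from the data, so this is not the package's implementation.

```r
# Toy Q-matrix: 3 items, 2 attributes (invented for illustration)
Q <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)
K <- ncol(Q)

# All 2^K attribute profiles (latent classes), one per row
alpha <- as.matrix(expand.grid(rep(list(0:1), K)))

# Conjunctive ideal response: 1 only if all required attributes are mastered
eta.and <- t(apply(alpha, 1, function(a) as.numeric(Q %*% a == rowSums(Q))))
# Disjunctive ideal response: 1 if at least one required attribute is mastered
eta.or <- t(apply(alpha, 1, function(a) as.numeric(Q %*% a > 0)))

# Weighted ideal response with a fixed, assumed weight
w <- 0.5
eta.w <- w * eta.and + (1 - w) * eta.or

# Classify one observed response pattern by the smallest squared distance
y <- c(1, 0, 1)
loss <- rowSums(sweep(eta.w, 2, y)^2)
alpha.hat <- alpha[which.min(loss), ]
alpha.hat
```

For single-attribute items the two ideal responses coincide, so the weight only matters for items that measure several attributes, such as the third item here.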
Uses a post-hoc simulation approach to check whether a cognitive diagnosis model is identified (i.e., all latent classes are distinguishable; de la Torre et al., 2023).
is.CDMid(fit, N = 10000, timesJ = 20, Wald = FALSE, verbose = TRUE)
fit |
An object of class RDINA or GDINA (Ma & de la Torre, 2020). |
N |
A numeric value that indicates the number of respondents to simulate. Default is 10000. |
timesJ |
A numeric value that indicates the number of times the test length is multiplied. Default is 20. |
Wald |
A |
verbose |
A |
is.CDMid returns an object of class is.CDMid.
total: Overall classification accuracy (CCA) and number of posterior multiple modes (PMM). A CCA = 1 indicates that all latent classes are identified (vector).
class: Classification accuracy (CCA) and number of posterior multiple modes (PMM) for each latent class. A CCA = 1 indicates that the latent class is identified (data.frame).
Pablo Nájera, Universidad Pontificia Comillas
de la Torre, J., Sorrel, M. A., & Nájera, P. (2023, July). Cognitive diagnosis modeling. Workshop at the VII International Psychometric Summer School "Applied Psychometrics in Psychology and Education". Yerevan, Armenia.
library(GDINA)
dat <- sim30GDINA$simdat
Q <- sim30GDINA$simQ
fit <- GDINA(dat, Q)
id <- is.CDMid(fit)
Checks whether a Q-matrix fulfills the conditions for strict and generic identifiability according to Gu & Xu (2021).
is.Qid(Q, model)
Q |
A J items x K attributes Q-matrix ( |
model |
CDM to be considered. It includes |
is.Qid returns an object of class is.Qid.
strict: Is the Q-matrix strictly identifiable? (logical).
generic: Is the Q-matrix generically identifiable? (logical).
conditions: Identifiability criteria and whether they are fulfilled or not (vector).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Miguel A. Sorrel, Universidad Autónoma de Madrid
Gu, Y., & Xu, G. (2021). Sufficient and necessary conditions for the identifiability of the Q-matrix. Statistica Sinica, 31, 449-472. https://www.jstor.org/stable/26969691
Kj <- c(15, 10, 0, 5)
Q <- genQ(J = 30, K = 4, Kj = Kj, Qid = "others", seed = 123)$gen.Q
idQ <- is.Qid(Q, model = "DINA")
Introduces random misspecifications in a Q-matrix. Only binary Q-matrices are supported so far. Useful for simulation studies.
missQ(Q, qjk, retainJ = 0, Qid = "none", seed = NULL)
Q |
A J items x K attributes Q-matrix ( |
qjk |
Number (or proportion, if lower than 1) of q-entries to modify in the Q-matrix. |
retainJ |
Number of items to retain (i.e., not modify) in the Q-matrix. It will retain the first |
Qid |
Ensure that the generated Q-matrix is generically identifiable. It includes |
seed |
A seed for obtaining consistent results. If |
missQ returns an object of class missQ.
miss.Q: The misspecified Q-matrix (matrix).
Q: The input (true) Q-matrix (matrix).
JK: Number of items measuring each attribute (vector).
Kcor: Tetrachoric correlations among the columns (matrix).
is.Qid: Q-matrix identifiability information (list).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113, 1284-1295. https://doi.org/10.1080/01621459.2017.1340889
Kj <- c(15, 10, 0, 5) # 15 one-att, 10 2-atts, 0 3-atts, and 5 four-atts items
Q <- genQ(J = 30, K = 4, Kj = Kj, Qid = "others", seed = 123)
miss.Q <- missQ(Q = Q$gen.Q, qjk = .20, retainJ = 4, seed = 123)
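The idea of modifying a proportion of q-entries can be sketched in base R as follows. This is a simplified illustration rather than the missQ algorithm: the helper name `sketch_missQ` and its `max.tries` argument are hypothetical, the only constraint enforced is that no item ends up measuring zero attributes, and the retainJ and Qid options are ignored.

```r
# Simplified sketch (not the missQ algorithm): flip a proportion of q-entries
# at random, redrawing until every item still measures at least one attribute.
sketch_missQ <- function(Q, prop, seed = NULL, max.tries = 1000) {
  if (!is.null(seed)) set.seed(seed)
  n.flip <- round(prop * length(Q))       # number of q-entries to modify
  for (i in seq_len(max.tries)) {
    miss <- Q
    idx <- sample.int(length(Q), n.flip)  # distinct entries to flip
    miss[idx] <- 1 - miss[idx]            # 0 -> 1 and 1 -> 0
    if (all(rowSums(miss) > 0)) return(miss)
  }
  stop("No valid misspecified Q-matrix found")
}

# Toy Q-matrix: 5 items, 3 attributes (invented for illustration)
Q.true <- matrix(c(1, 0, 0,
                   0, 1, 0,
                   0, 0, 1,
                   1, 1, 0,
                   0, 1, 1), nrow = 5, byrow = TRUE)
Q.miss <- sketch_missQ(Q.true, prop = 0.20, seed = 123)
sum(Q.miss != Q.true)  # number of modified q-entries
```

Because the flipped positions are sampled without replacement, the number of modified entries always equals the requested proportion of the Q-matrix.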
A procedure for determining the number of attributes underlying CDM data using model fit comparison. For each number of attributes under exploration, a Q-matrix is estimated from the data using the discrete factor loading method (Wang, Song, & Ding, 2018), which can be further validated using the Hull method (Nájera, Sorrel, de la Torre, & Abad, 2020). Then, a CDM is fitted to the data using the resulting Q-matrix, and several fit indices are computed. After the desired range of numbers of attributes has been explored, the fit indices are compared and a suggested number of attributes is given for each fit index. The AIC should be preferred over the other fit indices. For further details, see Nájera, Abad, & Sorrel (2021). This function can also be used by directly providing different Q-matrices (instead of estimating them from the data) in order to compare their fit and select the most appropriate Q-matrix. Note that, if Q-matrices are provided, this function no longer serves as a dimensionality assessment method, but only as an automated model comparison procedure.
modelcompK(
  dat,
  exploreK = 1:7,
  Qs = NULL,
  stop = "none",
  val.Q = TRUE,
  estQ.args = list(criterion = "row", cor = "tet", rotation = "oblimin", fm = "uls"),
  valQ.args = list(index = "PVAF", iterative = "test.att", maxitr = 5, CDMconv = 0.01),
  verbose = TRUE
)
dat |
A N individuals x J items ( |
exploreK |
Number of attributes to explore. The default is from 1 to 7 attributes. |
Qs |
A list of Q-matrices to compare in terms of fit. If |
stop |
A fit index to use for stopping the procedure if a model leads to worse fit than a simpler one. This can be useful for saving time without exploring the whole exploreK when it is probable that the correct dimensionality has been already visited. It includes |
val.Q |
Validate the estimated Q-matrices using the Hull method? Note that validating the Q-matrix is expected to increase its quality, but the computation time will increase. The default is |
estQ.args |
A list of arguments for the discrete factor loading empirical Q-matrix estimation method (see the
|
valQ.args |
A list of arguments for the Hull empirical Q-matrix validation method. Only applicable if
|
verbose |
Show progress? The default is |
modelcompK returns an object of class modelcompK.
sug.K: The suggested number of attributes for each fit index (vector). Only if Qs = NULL.
sel.Q: The suggested Q-matrix for each fit index (vector).
fit: The fit indices for each fitted model (matrix).
exp.exploreK: Explored dimensionality (vector). It can be different from exploreK if stop has been used.
usedQ: Q-matrices used to fit each model (list). They will be the estimated (and validated) Q-matrices if Qs = NULL. Otherwise, they will be Qs.
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Miguel A. Sorrel, Universidad Autónoma de Madrid
Francisco J. Abad, Universidad Autónoma de Madrid
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
Nájera, P., Abad, F. J., & Sorrel, M. A. (2021). Determining the number of attributes in cognitive diagnosis modeling. Frontiers in Psychology, 12:614470. https://doi.org/10.3389/fpsyg.2021.614470
Nájera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12228
Wang, W., Song, L., & Ding, S. (2018). An exploratory discrete factor loading method for Q-matrix specification in cognitive diagnosis models. In: M. Wiberg, S. Culpepper, R. Janssen, J. González, & D. Molenaar (Eds.), Quantitative Psychology. IMPS 2017. Springer Proceedings in Mathematics & Statistics (Vol. 233, pp. 351-362). Springer.
library(GDINA)
dat <- sim30GDINA$simdat
Q <- sim30GDINA$simQ

#-------------------------------------
# Assess dimensionality from CDM data
#-------------------------------------
mcK <- modelcompK(dat = dat, exploreK = 4:7, stop = "AIC", val.Q = TRUE, verbose = TRUE)
mcK$sug.K # Check suggested number of attributes by each fit index
mcK$fit # Check fit indices for each K explored
sug.Q <- mcK$usedQ[[paste0("K", mcK$sug.K["AIC"])]] # Suggested Q-matrix by AIC
sug.Q <- orderQ(sug.Q, Q)$order.Q # Reorder Q-matrix attributes
mean(sug.Q == Q) # Check similarity with the generating Q-matrix

#--------------------------------------------------
# Automatic fit comparison of competing Q-matrices
#--------------------------------------------------
trueQ <- Q
missQ1 <- missQ(Q, .10, seed = 123)$miss.Q
missQ2 <- missQ(Q, .20, seed = 456)$miss.Q
missQ3 <- missQ(Q, .30, seed = 789)$miss.Q
Qs <- list(trueQ, missQ1, missQ2, missQ3)
mc <- modelcompK(dat = dat, Qs = Qs, verbose = TRUE)
mc$sel.Q # Best-fitting Q-matrix for each fit index
mc$fit # Check fit indices for each Q explored
Reorders Q-matrix columns according to a target matrix (e.g., another Q-matrix). Specifically, it provides a reordered Q-matrix whose columns best match the target columns, that is, the column order with the highest possible average Tucker congruence coefficient. Reordering a Q-matrix is equivalent to relabeling the attributes and does not change the model. Useful for simulation studies (e.g., comparing a validated Q-matrix with the generating Q-matrix).
orderQ(Q, target)
| Argument | Description |
|---|---|
| Q | A J items x K attributes Q-matrix ( |
| target | A J items x K attributes Q-matrix ( |
orderQ returns an object of class orderQ.
order.Q: The reordered Q-matrix (matrix).
configs: Comparison information between the different column configurations of the Q-matrix and the target Q-matrix, including the average absolute difference and the average Tucker index of factor congruence (matrix). The function will not look for all possible specifications if a perfect match is found.
specifications: Function call specifications (list).
Francisco J. Abad, Universidad Autónoma de Madrid
Pablo Nájera, Universidad Pontificia Comillas
library(GDINA)
dat <- sim30GDINA$simdat
Q <- sim30GDINA$simQ
sugQ1 <- estQ(r = dat, K = 5) # Estimate Q-matrix
sugQ1$est.Q <- orderQ(sugQ1$est.Q, Q)$order.Q # Reorder Q-matrix attributes
mean(sugQ1$est.Q == Q) # Check similarity with the generating Q-matrix
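The column-matching idea behind orderQ can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes an exhaustive search over all column permutations and picks the one with the highest average Tucker congruence with the target. The helper names (tucker, all_perms, reorder_columns) are hypothetical.

```r
# Tucker congruence coefficient between two numeric vectors
tucker <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# All permutations of 1:k, one per row (hypothetical helper)
all_perms <- function(k) {
  if (k == 1) return(matrix(1L, 1, 1))
  p <- all_perms(k - 1)
  out <- NULL
  for (i in seq_len(k)) {
    # Put i first, then shift the remaining labels so that they skip i
    out <- rbind(out, cbind(i, ifelse(p >= i, p + 1L, p)))
  }
  unname(out)
}

# Reorder the columns of Q to maximize the average congruence with target
reorder_columns <- function(Q, target) {
  P <- all_perms(ncol(Q))
  avg <- apply(P, 1, function(idx) {
    mean(vapply(seq_along(idx),
                function(k) tucker(Q[, idx[k]], target[, k]),
                numeric(1)))
  })
  Q[, P[which.max(avg), ], drop = FALSE]
}

# Toy target Q-matrix (5 items x 3 attributes) and a column-shuffled copy
target <- matrix(c(1, 0, 0,
                   0, 1, 0,
                   0, 0, 1,
                   1, 1, 0,
                   0, 1, 1), ncol = 3, byrow = TRUE)
Q <- target[, c(3, 1, 2)]
reordered <- reorder_columns(Q, target)
all(reordered == target)
```

The number of permutations grows factorially with K, so an exhaustive search like this one is only practical for a small number of attributes; columns made entirely of zeros would also need special handling because the congruence coefficient is undefined for them.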
Parallel analysis with column permutation (i.e., resampling) as used in Nájera, Abad, & Sorrel (2021).
It is recommended to use principal components, Pearson correlations, and mean criterion (Garrido, Abad, & Ponsoda, 2013; Nájera, Abad, & Sorrel, 2021).
The parallel analysis based on principal axis factor analysis is conducted using the fa.parallel function of the psych R package (Revelle, 2019).
The tetrachoric correlations are efficiently estimated using the sirt R package (Robitzsch, 2020).
The graph is made with the ggplot2 package (Wickham et al., 2020).
paK(
  dat,
  R = 100,
  fa = "pc",
  cor = "both",
  cutoff = "mean",
  fm = "uls",
  plot = TRUE,
  verbose = TRUE,
  seed = NULL
)
| Argument | Description |
|---|---|
| dat | A N individuals x J items ( |
| R | Number of resampled datasets (i.e., replications) to generate. The default is 100. |
| fa | Extraction method to use. It includes |
| cor | What type of correlations to use. It includes |
| cutoff | What criterion to use as the cutoff. It can be |
| fm | Factoring method to use. It includes |
| plot | Print the parallel analysis plot? Note that the plot might be messy if many variants are requested. The default is |
| verbose | progress. The default is |
| seed | A seed for obtaining consistent results. If |
paK returns an object of class paK.
sug.K: The suggested number of attributes for each variant (vector).
e.values: The sample and reference eigenvalues (matrix).
plot: The parallel analysis plot. Only if plot = TRUE (plot).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Miguel A. Sorrel, Universidad Autónoma de Madrid
Francisco J. Abad, Universidad Autónoma de Madrid
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2013). A new look at Horn's parallel analysis with ordinal variables. Psychological Methods, 18, 454-474. https://doi.org/10.1037/a0030005
Nájera, P., Abad, F. J., & Sorrel, M. A. (2021). Determining the number of attributes in cognitive diagnosis modeling. Frontiers in Psychology, 12:614470. https://doi.org/10.3389/fpsyg.2021.614470
Revelle, W. (2019). psych: Procedures for Psychological, Psychometric, and Personality Research. R package version 1.9.12. https://CRAN.R-project.org/package=psych.
Robitzsch, A. (2020). sirt: Supplementary Item Response Theory Models. R package version 3.9-4. https://CRAN.R-project.org/package=sirt.
Wickham, H., et al. (2020). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.3.2. https://CRAN.R-project.org/package=ggplot2.
library(GDINA)
dat <- sim30GDINA$simdat
Q <- sim30GDINA$simQ
# In paK, R = 100 is recommended (R = 30 is here used for illustration purposes)
pa.K <- paK(dat = dat, R = 30, fa = "pc", cutoff = c("mean", 95), plot = TRUE, seed = 123)
pa.K$sug.K # Check suggested number of attributes by each parallel analysis variant
pa.K$e.values # Check eigenvalues
pa.K$plot # Show parallel analysis plot
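For readers unfamiliar with the resampling step, the following base-R sketch shows one way parallel analysis with column permutation can work under the recommended settings (principal components, Pearson correlations, mean criterion). It is an illustration only, not the paK implementation; the function pa_permute and the simulated two-dimensional data are hypothetical.

```r
set.seed(1)
N <- 500  # examinees
J <- 12   # items
# Toy binary data: two independent latent traits, six items each
theta <- matrix(rnorm(N * 2), N, 2)
trait <- rep(1:2, each = J / 2)
dat <- sapply(seq_len(J), function(j) as.integer(theta[, trait[j]] + rnorm(N) > 0))

pa_permute <- function(dat, R = 50) {
  # Sample eigenvalues of the Pearson correlation matrix
  obs <- eigen(cor(dat), only.values = TRUE)$values
  # Reference eigenvalues: permute each column independently, which keeps
  # the item marginals but breaks the associations between items
  ref <- replicate(R, {
    perm <- apply(dat, 2, sample)
    eigen(cor(perm), only.values = TRUE)$values
  })
  cutoff <- rowMeans(ref)  # mean criterion
  above <- obs > cutoff
  # Retain the leading eigenvalues that exceed their reference value
  sug <- if (all(above)) length(obs) else which(!above)[1] - 1
  list(sug.K = sug, sample = obs, reference = cutoff)
}

res <- pa_permute(dat, R = 50)
res$sug.K
```

Replacing rowMeans(ref) with a row-wise percentile (e.g., the 95th) would give the percentile variant of the cutoff.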
This function calculates the standardized log-likelihood statistic (lZ; Cui & Li, 2015; Drasgow et al., 1985) and the proposals for correcting its distribution discussed in Santos et al. (2020).
personFit(fit, att.est = "MLE", sig.level = 0.05, p.adjust.method = "BH")
| Argument | Description |
|---|---|
| fit | An object of class |
| att.est | What attribute estimates are used? The default is |
| sig.level | Scalar numeric. Alpha level for decision. Default is 0.05. |
| p.adjust.method | Scalar character. Correction method for p-values. Possible values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", and "none". See p.adjust function from the stats R package for additional details. Default is BH. |
personFit returns an object of class personFit, with a list of elements:
stat: Person fit statistics (data.frame).
p: p-values (two-sided test) for the person fit statistics (data.frame).
sigp: Scalar vectors denoting the examinees for which the person fit statistic is significant (p-value) (list).
sigadjp: Scalar vectors denoting the examinees for which the person fit statistic is significant (adjusted p-value) (list).
Miguel A. Sorrel, Universidad Autónoma de Madrid
Kevin Santos, University of the Philippines
Pablo Nájera, Universidad Pontificia Comillas
Cui, Y., & Li, J. (2015). Evaluating person fit for cognitive diagnostic assessment. Applied Psychological Measurement, 39, 223–238. https://doi.org/10.1177/0146621614557272
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86. https://psycnet.apa.org/doi/10.1111/j.2044-8317.1985.tb00817.x
Santos, K. C. P., de la Torre, J., & von Davier, M. (2020). Adjusting person fit index for skewness in cognitive diagnosis modeling. Journal of Classification, 37, 399-420. https://doi.org/10.1007/s00357-019-09325-5
library(GDINA)
dat <- sim10GDINA$simdat[1:20, ]
Q <- sim10GDINA$simQ
fit <- GDINA(dat = dat, Q = Q, model = "GDINA")
res.personFit <- personFit(fit)
res.personFit
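As a rough illustration of the uncorrected statistic, the sketch below computes lZ for dichotomous items following the standardized log-likelihood formula of Drasgow et al. (1985), followed by two-sided p-values and a BH adjustment with p.adjust. The success probabilities in p are hypothetical stand-ins for the model-implied probabilities given an examinee's estimated attribute profile; the skewness corrections discussed in Santos et al. (2020) are not included.

```r
# Standardized log-likelihood (lZ) for one response vector x, given the
# model-implied success probabilities p of each item
lz_stat <- function(x, p) {
  l0 <- sum(x * log(p) + (1 - x) * log(1 - p))      # observed log-likelihood
  e  <- sum(p * log(p) + (1 - p) * log(1 - p))      # its expected value
  v  <- sum(p * (1 - p) * log(p / (1 - p))^2)       # its variance
  (l0 - e) / sqrt(v)
}

# Hypothetical success probabilities for six items
p <- c(0.90, 0.80, 0.85, 0.20, 0.10, 0.15)
x_fit    <- c(1, 1, 1, 0, 0, 0)   # responses in line with the model
x_misfit <- 1 - x_fit             # responses that contradict the model

lz <- c(fit = lz_stat(x_fit, p), misfit = lz_stat(x_misfit, p))
p_values <- 2 * pnorm(-abs(lz))               # two-sided test
adj_p <- p.adjust(p_values, method = "BH")    # multiple-comparison correction
lz
adj_p
```

Large negative values indicate response patterns that are less likely than expected under the model, which is the usual signal of person misfit.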
Estimation of the restricted deterministic input, noisy "and" gate model (R-DINA; Nájera et al., 2023). In addition to the non-compensatory (i.e., conjunctive) condensation rule of the DINA model, the compensatory (i.e., disjunctive) rule of the DINO model can also be applied (i.e., R-DINO model). The R-DINA/R-DINO model should only be considered for applications involving very small sample sizes (N < 100; Nájera et al., 2023), and model fit evaluation and comparison with competing models (e.g., DINA/DINO, G-DINA) is highly recommended.
RDINA(
  dat,
  Q,
  gate = "AND",
  att.prior = NULL,
  est = "Brent",
  tau.alpha = "MAP",
  emp.bayes = FALSE,
  boot = FALSE,
  n.boots = 500,
  n.cores = 1,
  maxitr = 1000,
  conv.crit = 1e-04,
  init.phi = 0.2,
  bound.p = 1e-06,
  verbose = TRUE,
  seed = NULL
)
| Argument | Description |
|---|---|
| dat | A N individuals x J items ( |
| Q | A J items x K attributes Q-matrix ( |
| gate | Either a conjunctive ( |
| att.prior | A 2^K attributes vector containing the prior distribution for each latent class. The sum of all elements does not have to be equal to 1, since the vector will be normalized. Default is |
| est | Use the Brent's method ( |
| tau.alpha | Attribute profile estimator (either |
| emp.bayes | Use empirical Bayes estimation for structural parameters. Default is |
| boot | Use bootstrapping to increase robustness in posterior probabilities estimation. Default is |
| n.boots | Number of bootstrapping samples. Default is 500. |
| n.cores | Number of CPU processors to speed up computation when bootstrapping is used. Default is 1. |
| maxitr | Maximum number of iterations. Default is 1000. |
| conv.crit | Convergence criterion regarding the maximum absolute change in either the phi parameter estimate or the marginal posterior probabilities of attribute mastery. Default is 0.0001. |
| init.phi | Initial value for the phi parameter. Default is 0.2. |
| bound.p | Lowest value for probability estimates (highest would be 1 - bound.p). Default is 1e-06. |
| verbose | Print information after each iteration. Default is |
| seed | Random number generation seed (e.g., to solve ties in case they occur with MLE or MAP estimation). Default is |
RDINA returns an object of class RDINA.
MLE: Estimated attribute profiles with the MLE estimator (matrix).
MAP: Estimated attribute profiles with the MAP estimator (matrix).
EAP: Estimated attribute profiles with the EAP estimator (matrix).
phi: Phi parameter estimate (numeric).
post.probs: A (list) containing the estimates of the posterior probability of each examinee in each latent class (pp), marginal posterior probabilities of attribute mastery (mp), and posterior probability of each latent class (lp).
likelihood: A (list) containing the likelihood of each examinee in each latent class (lik_il) and the model log-likelihood (logLik).
test.fit: Relative model fit indices (list).
class.accu: A (list) containing the classification accuracy estimates at the test-level (tau), latent class-level (tau_l), and attribute-level (tau_k).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
Nájera, P., Abad, F. J., Chiu, C.-Y., & Sorrel, M. A. (2023). The Restricted DINA model: A Comprehensive Cognitive Diagnostic Model for Classroom-Level Assessments. Journal of Educational and Behavioral Statistics.
library(GDINA)
Q <- sim30GDINA$simQ # Q-matrix
K <- ncol(Q)
J <- nrow(Q)
set.seed(123)
GS <- data.frame(guessing = rep(0.2, J), slip = rep(0.2, J))
sim <- simGDINA(20, Q, GS, model = "DINA")
simdat <- sim$dat # Simulated data
simatt <- sim$attribute # Generating attributes
fit.RDINA <- RDINA(simdat, Q) # Fit the R-DINA model
ClassRate(fit.RDINA$EAP, simatt) # Check classification accuracy
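To give a feel for why a single phi parameter can be estimated with a one-dimensional optimizer, here is a base-R sketch. It assumes that the R-DINA model constrains every item's guessing and slip parameters to one common value phi, with a uniform prior over latent classes; this is a simplification for illustration (see Nájera et al., 2023, for the exact specification) and not the RDINA implementation. It uses stats::optimize rather than the function's own estimation routine, and all object names are hypothetical.

```r
# Toy Q-matrix: 6 items x 2 attributes
Q <- matrix(c(1, 0,
              0, 1,
              1, 1,
              1, 0,
              0, 1,
              1, 1), ncol = 2, byrow = TRUE)
K <- ncol(Q)
classes <- as.matrix(expand.grid(rep(list(0:1), K)))  # 2^K attribute profiles

# Ideal responses under the conjunctive rule (J x L matrix):
# 1 if the latent class masters every attribute required by the item
eta <- apply(classes, 1, function(a) as.integer(Q %*% (1 - a) == 0))

# Simulate a very small sample with guessing = slip = 0.2
set.seed(123)
N <- 40
phi_true <- 0.2
true_class <- sample(nrow(classes), N, replace = TRUE)
p_true <- ifelse(eta[, true_class] == 1, 1 - phi_true, phi_true)  # J x N
dat <- t(matrix(rbinom(length(p_true), 1, p_true), nrow = nrow(Q)))  # N x J

# Log-likelihood of each examinee in each latent class (N x L)
loglik_matrix <- function(phi) {
  dat %*% log(ifelse(eta == 1, 1 - phi, phi)) +
    (1 - dat) %*% log(ifelse(eta == 1, phi, 1 - phi))
}

# Negative marginal log-likelihood, assuming a uniform prior over classes
neg_marginal <- function(phi) -sum(log(rowMeans(exp(loglik_matrix(phi)))))

# One-dimensional search for phi, restricted here to (0, 0.5)
phi_hat <- optimize(neg_marginal, interval = c(1e-6, 0.5))$minimum

# Posterior class probabilities and MAP attribute profiles
post <- exp(loglik_matrix(phi_hat))
post <- post / rowSums(post)
est_att <- classes[max.col(post, ties.method = "first"), ]
phi_hat
mean(est_att == classes[true_class, ])  # attribute-level agreement
```

Because only one item parameter is estimated, the model remains estimable with very few examinees, which is consistent with the small-sample use case described above.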
This function translates an object of class RDINA to an object of class GDINA, so that the estimated R-DINA object is compatible with most of the functions in the GDINA package (Ma & de la Torre, 2020), including model fit, item fit, and Q-matrix validation.
RDINA2GDINA(fit)
| Argument | Description |
|---|---|
| fit | An object of class |
RDINA2GDINA returns an object of class GDINA. See the GDINA package for more information.
Pablo Nájera, Universidad Pontificia Comillas
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
library(GDINA)
dat <- sim30DINA$simdat
Q <- sim30DINA$simQ
fit1 <- RDINA(dat, Q)
fit2 <- RDINA2GDINA(fit1)
modelfit(fit2) # Model fit evaluation
itemfit(fit2) # Item fit evaluation
Simulate forced-choice (FC) responses based on the G-DINA model (de la Torre, 2011) and the FC-DCM (Huang, 2023).
This function adapts the simGDINA function from the GDINA package (Ma & de la Torre, 2020) to FC responses.
simFCGDINA(
  N,
  Q.items,
  n.blocks = NULL,
  polarity = NULL,
  att = NULL,
  model = "GDINA",
  GDINA.args = list(GS = NULL, GS.items = c(1/3, 1/3), AC = 0, AT = 0),
  FCDCM.args = list(d0 = c(0.2, 0.2), sd = c(0.15, 0.15), a = c(0, 0), b = 0),
  seed = NULL
)
| Argument | Description |
|---|---|
| N | A |
| Q.items | A binary |
| n.blocks | A |
| polarity | A |
| att | A |
| model | Use the G-DINA model ( |
| GDINA.args | A |
| FCDCM.args | A |
| seed | Random number generation seed. Default is |
simFCGDINA returns an object of class simFCGDINA.
dat: Generated FC responses (matrix).
att: Generated attribute profiles (matrix).
Q: Generated Q-matrix of FC blocks (matrix).
LCprob: Generated block response probabilities for each latent class (matrix).
item.pairs: Statements used in each FC block (matrix).
q_att: Attribute measured by each statement as used by Huang (2023) (matrix).
q_sta: Relative position of each statement as used by Huang (2023) (matrix).
simGDINA: Object of class simGDINA (list).
polarity: Polarity matrix indicating the direction of each statement in each block (matrix).
GS: Generated guessing and slip parameter for each statement (matrix).
Pablo Nájera, Universidad Pontificia Comillas
Huang, H.-Y. (2023). Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests. Educational and Psychological Measurement, 83(1), 146-180. https://doi.org/10.1177/00131644211069906
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
library(GDINA)
set.seed(123)
# Q-matrix for the unidimensional statements
Q.items <- do.call("rbind", replicate(5, diag(5), simplify = FALSE))
# Guessing and slip
GS <- cbind(runif(n = nrow(Q.items), min = 0.1, max = 0.3),
            runif(n = nrow(Q.items), min = 0.1, max = 0.3))
n.blocks <- 30 # Number of forced-choice blocks
# Block polarity (1 = direct statement; -1 = indirect statement)
polarity <- matrix(1, nrow = n.blocks, ncol = 2)
sim <- simFCGDINA(N = 1000, Q.items, n.blocks = n.blocks, polarity = polarity,
                  model = "GDINA", GDINA.args = list(GS = GS), seed = 123)
Empirical Q-matrix validation using the Hull method (Nájera, Sorrel, de la Torre, & Abad, 2020a).
The procedure can be used either with the PVAF (de la Torre & Chiu, 2016) or McFadden's pseudo R-squared (McFadden, 1974).
The PVAF is recommended (Nájera, Sorrel, de la Torre, & Abad, 2020a).
Note that the pseudo R-squared might not be computationally feasible for highly dimensional Q-matrices, say more than 10 attributes.
Different iterative implementations are available, such as the test-level implementation (see Terzi & de la Torre, 2018), attribute-test-level implementation (Nájera, Sorrel, de la Torre, & Abad, 2020a), and item-level implementation (Nájera, Sorrel, de la Torre, & Abad, 2020b).
If an iterative implementation is used, the GDINA R package (Ma & de la Torre, 2020) is used for the calibration of the CDMs.
valQ(
  fit,
  index = "PVAF",
  iterative = "test.att",
  emptyatt = TRUE,
  maxitr = 100,
  CDMconv = 1e-04,
  verbose = TRUE
)
| Argument | Description |
|---|---|
| fit | A G-DINA model fit object from the |
| index | What index to use. It includes |
| iterative | (Iterative) implementation procedure. It includes |
| emptyatt | Is it possible for the suggested Q-matrix to have an empty attribute (i.e., an attribute not measured by any item)? Although rare, it is possible for iterative procedures to provide a suggested Q-matrix in which one or more attributes are empty. This might indicate that the original Q-matrix had more attributes than necessary. If |
| maxitr | Maximum number of iterations if an iterative procedure has been selected. The default is 100. |
| CDMconv | Convergence criterion for the CDM estimations between iterations (only if an iterative procedure has been selected). The default is 0.0001. |
| verbose | Print information after each iteration if an iterative procedure is used. The default is |
valQ returns an object of class valQ.
sug.Q: Suggested Q-matrix (matrix).
Q: Original Q-matrix (matrix).
sugQ.fit: Several fit indices from the model obtained with the suggested Q-matrix (vector).
index: PVAF or pseudo R-squared (depending on which one was used) for each item (matrix).
iter.Q: Q-matrices used in each iteration (list). Provided only if an iterative procedure has been used.
iter.index: PVAF or pseudo R-squared (depending on which one was used) for each item in each iteration (list). Provided only if an iterative procedure has been used.
n.iter: Number of iterations used (double). Provided only if an iterative procedure has been used.
convergence: Convergence information (double). It can be 1 (convergence), 2 (lack of convergence: maximum number of iterations achieved), 3 (lack of convergence: empty attribute obtained), and 4 (lack of convergence: loop Q-matrices). Provided only if an iterative procedure has been used.
time: Initial and finish time (vector).
time.used: Total computation time (difftime).
specifications: Function call specifications (list).
Pablo Nájera, Universidad Pontificia Comillas
Miguel A. Sorrel, Universidad Autónoma de Madrid
Francisco J. Abad, Universidad Autónoma de Madrid
de la Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81, 253-273. https://doi.org/10.1007/s11336-015-9467-8
Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14). https://doi.org/10.18637/jss.v093.i14
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Economics (pp. 105-142). Academic Press.
Nájera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020a). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12228
Nájera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020b). Improving robustness in Q-matrix validation using an iterative and dynamic procedure. Applied Psychological Measurement, 46, 431-446. https://doi.org/10.1177/0146621620909904
Terzi, R., & de la Torre, J. (2018). An iterative method for empirically-based Q-matrix validation. International Journal of Assessment Tools in Education, 5, 248-262. https://doi.org/10.21449/ijate.407193
library(GDINA)
dat <- sim30GDINA$simdat
Q <- sim30GDINA$simQ # Generating Q-matrix
miss.Q <- missQ(Q = Q, qjk = .30, retainJ = 5, seed = 123)$miss.Q # Misspecified Q-matrix
fit <- GDINA(dat, miss.Q) # GDINA object
sug.Q <- valQ(fit = fit, verbose = TRUE) # Hull method for Q-matrix validation
mean(sug.Q$sug.Q == Q) # Check similarity with the generating Q-matrix
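The balance between fit and parsimony that the Hull method looks for can be sketched with the usual Hull scree-test ratio, which compares the gain in fit per added parameter before and after each candidate solution. The snippet below is a simplified illustration for a single item, not the valQ implementation: it assumes the best PVAF for each number of attributes is already known, takes the number of item parameters as 2^k, adds an origin point, and skips the convex-hull pruning step. The PVAF values and the function hull_st are hypothetical.

```r
# Hull scree-test ratio for the interior points of a fit-by-parameters curve
hull_st <- function(np, fit) {
  n <- length(np)
  st <- rep(NA_real_, n)
  for (i in 2:(n - 1)) {
    gain_before <- (fit[i] - fit[i - 1]) / (np[i] - np[i - 1])
    gain_after  <- (fit[i + 1] - fit[i]) / (np[i + 1] - np[i])
    st[i] <- gain_before / gain_after
  }
  st
}

# Hypothetical item with K = 4: origin plus the best q-vector
# specifying 1, 2, 3, and 4 attributes
n_att <- c(0, 1, 2, 3, 4)
np    <- c(0, 2^(1:4))                 # number of item parameters
pvaf  <- c(0, 0.45, 0.92, 0.97, 1.00)  # best PVAF for each number of attributes

st <- hull_st(np, pvaf)
best <- which.max(st)   # which.max ignores the NA end points
n_att[best]             # suggested number of attributes for this item
```

In this toy example the largest ratio falls on the third point, that is, the q-vector with two attributes: adding a third attribute raises the PVAF only slightly while doubling the number of item parameters.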