+A post or two ago I proposed a clustered heatmap as an optimal graphical representation of the item-pair DIF statistics in __dexter__. While the plots are helpful and easy to read, they are still not exactly what I wanted. The code in the __pheatmap__ package first computes a distance matrix of its input and then performs the cluster analysis, while I argue that the differences (called *Delta_R* in the dexter object and *distances* here) are distances in themselves. Working with distances between distances is a bit far-fetched, so to get what I originally wanted we simply replace `dist(dist)` with `as.dist(dist)`. With that change, and some adjustment to the plot when `what="distances"`, the code becomes:
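As a minimal illustration of the difference (a toy symmetric matrix, not actual dexter output): `as.dist()` merely reinterprets a symmetric matrix as a distance object, while `dist()` computes Euclidean distances between its rows, i.e. distances of distances.

``` r
# toy symmetric matrix of absolute differences (stand-in for Delta_R)
m = abs(outer(1:4, 1:4, "-"))
as.dist(m)  # the lower triangle of m itself, unchanged
dist(m)     # Euclidean distances between the rows of m
```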
+``` r
+DIFplot = function(d,
+                   what = c('distances','statistics','pvalues','significance'),
+                   pam = c("none", "holm", "hochberg", "hommel", "bonferroni",
+                           "BH", "BY", "fdr"),
+                   cluster = TRUE, alpha = 0.05, ...) {
+  what = match.arg(what)
+  pam = match.arg(pam)
+  alpha = alpha/2                      # two-sided testing
+  dist = abs(d$Delta_R)
+  o = hclust(as.dist(dist))$order      # cluster the differences themselves
+  lbl = d$items$item_id
+  stat = abs(d$DIF_pair)
+  outl = lbl[o]
+  if (cluster) { stat = stat[o,o]; lbl = lbl[o] }
+  pval = 1 - pnorm(stat)
+  u0 = pval[lower.tri(pval)]
+  u1 = p.adjust(u0, method = pam)
+  pval[lower.tri(pval)] = u1           # adjusted p-values below the diagonal
+  if (what == 'distances') {
+    if (cluster) dist = dist[o,o]
+    rownames(dist) = colnames(dist) = lbl
+    diag(dist) = NA
+    pheatmap::pheatmap(dist, main = 'PDIF: raw differences',
+                       cluster_rows = FALSE, cluster_cols = FALSE)
+  }
+  if (what == 'statistics') {
+    rownames(stat) = colnames(stat) = lbl
+    diag(stat) = NA
+    pheatmap::pheatmap(stat, main = 'PDIF: standardized differences',
+                       cluster_rows = FALSE, cluster_cols = FALSE)
+  }
+  if (what == 'pvalues') {
+    rownames(pval) = colnames(pval) = lbl
+    diag(pval) = NA
+    ttl = 'PDIF: p-values'
+    if (pam != 'none') ttl = paste0(ttl, ' (below diagonal adjusted by ', pam, ')')
+    pheatmap::pheatmap(pval, main = ttl,
+                       cluster_rows = FALSE, cluster_cols = FALSE,
+                       color = colorRampPalette(RColorBrewer::brewer.pal(n = 7, name = "RdYlBu"))(100))
+  }
+  if (what == 'significance') {
+    ttl = paste0('PDIF: significance at alpha=', 2*alpha)
+    if (pam != 'none') ttl = paste0(ttl, ' (below diagonal adjusted by ', pam, ')')
+    # the original snippet was truncated here; a minimal reconstruction:
+    # 1 = significant at alpha, 0 = not
+    v = 0 + (pval < alpha)
+    rownames(v) = colnames(v) = lbl
+    diag(v) = NA
+    pheatmap::pheatmap(v, main = ttl,
+                       cluster_rows = FALSE, cluster_cols = FALSE)
+  }
+}
+```

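With the function defined, a call might look as follows. This is a hypothetical usage sketch: it assumes `db` is an open dexter project and that `d` is the object returned by dexter's `DIF()` function for a person property named `gender` (adapt the names to your own data).

``` r
library(dexter)
d = DIF(db, person_property = 'gender')
DIFplot(d, what = 'pvalues', pam = 'BH')
```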
@@ -399,9 +455,10 @@ When computers became ubiquitous, full-scaled computerized adaptive testing (CAT
## [Comparing parameter estimates from dexter](2018-06-11-comparing-parameter-estimates-from-dexter)
A critical user of __dexter__ might simulate data and then plot estimates of the item parameters against the true values to check whether __dexter__ works correctly. Her results might look like this:
- plot of chunk unnamed-chunk-2
-After a moment of thought, the researcher finds that she is looking at item easiness, while __dexter__ reports item difficulties. After the sign has been reversed, the results look better but still not quite as expected:
-
+
+
Regression is the conditional expectation of one variable given the value of another. It can be estimated from data, plotted, and modeled (smoothed) with suitable functions. An example is shown in the figure below.
-
-This regression is completely based on observable quantities. On the x-axis we have the possible sum scores from a test of 18 dichotomous items. The expected item score on one of the items (actually, the third), given the sum score, is shown on the y-axis.
-It is an understatement to say that item response theory (IRT) is _interested_ in such regressions: in fact, it is _made_ of them -- except that it regresses the expected item score on a latent variable rather than on the observed sum score. Assuming that each person's item scores are conditionally independent given the person's value on the latent variable, and that examinees work independently, we multiply what needs to be multiplied and end up with a likelihood function to optimize. Having found estimates for the model parameters, we would like to see whether the model fits the data. In a long tradition going back to Andersen's test of fit for the Rasch model, model fit is evaluated by comparing observed and expected item-total regression functions. The item-fit statistics in the OPLM software package (corporate software at Cito, developed by Norman Verhelst and Cees Glas) are also of this nature. But the comparison is not necessarily easy -- at least, not for every kind of model.
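The observed item-total regression can be computed directly from a score matrix. A sketch under assumed (simulated) data, with persons in rows and 0/1 item scores in columns:

``` r
# simulated 0/1 scores for 200 persons on 18 items (toy data)
set.seed(1)
x = matrix(rbinom(200 * 18, 1, 0.6), nrow = 200)
total = rowSums(x)
# observed regression of item 3 on the sum score:
# mean item-3 score at each observed total score
reg = tapply(x[, 3], total, mean)
plot(as.numeric(names(reg)), reg, type = 'b',
     xlab = 'Sum score', ylab = 'Expected score on item 3')
```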
+```
+## Error in open_project("/Rdatasets/ALLRET.db"): There is no such file
+```