When developing a test, it is essential to ensure that the test is free of items with differential item functioning (DIF). DIF occurs when examinees of equal ability, but from different subgroups, have different probabilities of answering an item correctly. From the multidimensional perspective, DIF arises because the test measures more than one dimension, and examinees from different groups have unequal distributions on the secondary dimension(s) conditional on the primary dimension the test is intended to measure. Often, more than one item measures the secondary dimension(s). The DIF of individual items may be hard to detect statistically but can accumulate to an unacceptable degree at the item cluster/bundle level. Differential bundle functioning (DBF) occurs when examinees from different groups have unequal expected scores on an item bundle. Research on DBF has the potential to reveal the mechanisms underlying DIF. The simultaneous item bias test (SIBTEST; Shealy & Stout, 1993) was developed to assess both DIF and DBF. However, an unresolved issue is the lack of an effect size for DBF, which makes it difficult to gauge the amount of DBF within and between tests. Additionally, few procedures can assess both DIF and DBF, which may be one reason practitioners examine DBF less frequently than DIF. In this study, I propose using meta-analysis techniques to synthesize item-level DIF effect sizes in order to assess DBF. The test of nonzero average DIF can be used to test for the existence of DBF, the weighted average DIF can serve as an average-based DBF effect size, and the standard deviation of DIF within an item bundle can serve as a variance-based DBF effect size. A Monte Carlo simulation study was conducted to evaluate the performance of the proposed DBF effect sizes and to compare the test of nonzero average DIF in an item bundle with the SIBTEST DBF test.
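The proposed approach can be sketched in code. The following is a minimal illustration, not the dissertation's actual analysis: it applies a standard DerSimonian-Laird random-effects meta-analysis to hypothetical per-item DIF estimates and their sampling variances (the function name and the numeric inputs are invented for the example). It returns the weighted average DIF (the average-based DBF effect size), a z statistic for the test of nonzero average DIF (the DBF test), and tau, the between-item standard deviation of DIF (the variance-based DBF effect size).

```python
import math

def random_effects_dbf(effects, variances):
    """Random-effects (DerSimonian-Laird) meta-analysis of per-item DIF estimates.

    effects:   per-item DIF effect-size estimates in a bundle
    variances: their sampling variances
    Returns (mu, se, z, tau):
      mu  - weighted average DIF (average-based DBF effect size)
      se  - standard error of mu
      z   - z statistic for the test of nonzero average DIF (DBF test)
      tau - between-item SD of DIF (variance-based DBF effect size)
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                     # DL heterogeneity estimate
    wstar = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(wstar, effects)) / sum(wstar)
    se = math.sqrt(1.0 / sum(wstar))
    return mu, se, mu / se, math.sqrt(tau2)

# Hypothetical DIF estimates for a five-item bundle (values are illustrative only)
effects = [0.02, 0.20, 0.05, 0.25, 0.10]
variances = [0.002, 0.003, 0.002, 0.004, 0.003]
mu, se, z, tau = random_effects_dbf(effects, variances)
```

Under this sketch, DBF is flagged when z exceeds the usual normal critical value (e.g., 1.96 at the .05 level), and tau summarizes how much DIF varies across the items in the bundle.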
I used three DIF procedures (the Mantel-Haenszel procedure, the logistic-regression procedure, and the SIBTEST procedure) to obtain DIF estimates and then fit a random-effects model to those estimates. Seven factors were manipulated in the simulation: sample size, the between-group difference on the primary dimension, the between-group difference on the secondary dimension, the correlation between the two dimensions, the sample-size ratio between the focal and reference groups, item-bundle length, and the presence of guessing in the item responses. When the two dimensions were moderately correlated, or when there was no impact, the proposed DBF effect size based on the SIBTEST DIF procedure was essentially unbiased, and the proposed DBF tests based on the three DIF procedures had Type-I error and power rates comparable to those of the SIBTEST DBF test. When there was impact and the two dimensions were uncorrelated or only weakly correlated, the DBF effect sizes from the meta-analysis of DIF indices from all three procedures were biased upward, and the meta-analysis-based DBF tests tended to have inflated Type-I error rates. The variance-based DBF effect size for the whole test varied together with the weighted average-based DBF effect size in the item bundle. The rejection rates of the DBF test, the weighted average-based DBF effect size, and the variance-based DBF effect size were largely determined by the DIF potential. I discuss the findings and their implications for applied research and point out directions for future research.