Many physical science and physics instructors have had no formal training in pedagogically sound test construction, which can lead to test items that do not measure what they are intended to measure. A subset of such items may also be biased against particular groups of students. This paper describes how the author became aware of items in his own examinations that were potentially biased against female students, which led him to explore fundamental issues of item validity, gender bias, and differential item functioning (DIF). A brief discussion of DIF in the context of university courses follows, along with practical suggestions for detecting possibly gender-biased items.
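To make the flavor of such DIF screening concrete, the sketch below estimates the Mantel-Haenszel common odds ratio for a single item, stratifying examinees by total score and comparing a reference group with a focal group. This is a minimal illustration under assumed conditions (0/1 item scoring, a simple list-of-lists data layout, and the hypothetical function name `mantel_haenszel_dif`); it is not the author's actual procedure.

```python
# Minimal sketch of Mantel-Haenszel DIF screening. Assumes items are scored
# 0/1 and examinees are stratified by total test score; the data layout and
# function name are illustrative assumptions, not the author's procedure.
import math

def mantel_haenszel_dif(responses, groups, item):
    """Estimate the MH common odds ratio and ETS delta for one item.

    responses: responses[i][j] = 1 if examinee i answered item j correctly.
    groups:    groups[i] = 'R' (reference group) or 'F' (focal group).
    item:      index of the item under scrutiny.
    """
    totals = [sum(r) for r in responses]  # stratify on total score
    num = den = 0.0
    for k in set(totals):
        # Build the 2x2 table (group x right/wrong) for score stratum k.
        A = B = C = D = 0  # ref-right, ref-wrong, focal-right, focal-wrong
        for resp, grp, tot in zip(responses, groups, totals):
            if tot != k:
                continue
            right = resp[item] == 1
            if grp == 'R':
                A += right
                B += not right
            else:
                C += right
                D += not right
        N = A + B + C + D
        if N > 0:
            num += A * D / N
            den += B * C / N
    alpha = num / den if den > 0 else float('nan')
    # ETS delta scale: delta near 0 means little DIF; negative values
    # suggest the item disadvantages the focal group.
    delta = -2.35 * math.log(alpha) if alpha > 0 else float('nan')
    return alpha, delta
```

On the ETS delta scale, absolute values below about 1 are conventionally treated as negligible DIF; that cut-off, like the rest of the sketch, is stated here only as a common convention, and a flagged item still requires judgment about whether the difference reflects genuine bias.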
Topics: Educational assessment
© 2009 American Association of Physics Teachers.