Many physical science and physics instructors might not be trained in pedagogically sound test construction. As a result, some test items may not measure what they are intended to measure, and a subset of those items may be biased against certain groups of students. This paper describes how the author became aware of items in his examinations that were potentially biased against female students, which led him to explore fundamental issues of item validity, gender bias, and differential item functioning (DIF). A brief discussion of DIF in the context of university courses follows, along with practical suggestions for detecting possibly gender-biased items.
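One widely used statistic for flagging possible DIF is the Mantel-Haenszel procedure: students are matched on total test score, and within each score level a 2×2 table of group membership versus item success is tallied. The sketch below is not the author's own procedure, just a minimal illustration of the standard Mantel-Haenszel common odds ratio and its conversion to the ETS "delta" scale; the function name and the flagging threshold mentioned in the docstring are conventional but supplied here for illustration.

```python
import math

def mantel_haenszel_dif(strata):
    """Estimate the Mantel-Haenszel common odds ratio for one test item.

    strata: list of (A, B, C, D) tuples, one per matched total-score level:
      A = reference-group correct,  B = reference-group incorrect,
      C = focal-group correct,      D = focal-group incorrect.

    Returns (odds_ratio, mh_delta). mh_delta = -2.35 * ln(odds_ratio) is
    the ETS delta-scale value; |mh_delta| >= 1.5 is commonly treated as
    large ("C-level") DIF. A negative delta indicates the item is
    relatively harder for the focal group after matching on ability.
    """
    num = den = 0.0
    for A, B, C, D in strata:
        n = A + B + C + D
        if n == 0:
            continue                # skip empty score levels
        num += A * D / n            # reference-correct x focal-incorrect
        den += B * C / n            # reference-incorrect x focal-correct
    odds_ratio = num / den
    mh_delta = -2.35 * math.log(odds_ratio)
    return odds_ratio, mh_delta

# Hypothetical counts for three score levels on one item: at every level
# the reference group answers correctly more often than matched focal-
# group students, so the item would be flagged for review.
strata = [(30, 10, 20, 20), (40, 5, 30, 15), (45, 2, 40, 7)]
or_, delta = mantel_haenszel_dif(strata)
print(f"MH odds ratio = {or_:.2f}, MH delta = {delta:.2f}")
```

An odds ratio near 1 (delta near 0) means the two matched groups perform comparably on the item; note that a flagged item is only a candidate for bias and still requires judgmental review of its content.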
