The lack of representational diversity and role models in physics, including in our textbooks and curricular materials, is an oft-cited contributing factor to the continuing dramatic under-representation of women and people of color in physics. In this work, we develop an automated, Python-based tool for identifying the names and demographics of scientists who are mentioned in indices and chapters of physics textbooks, enabling authors, publishers, and users of physics textbooks to rapidly analyze the demographics of these texts. We quantitatively validate the automated tool using standard machine learning metrics, attaining high accuracy, precision, recall, and F1 scores. The tool is then used to demonstrate two of the many potential applications: examining whose work is mentioned in the entire collection of textbooks used in a representative four-year undergraduate physics major curriculum, and analyzing the demographics of scientists mentioned in a selection of ten introductory physics textbooks. Both sample analyses yield a similar portrait, showing that the undergraduate physics textbooks examined in this work focus overwhelmingly on work attributed to White men of European, British, and North American descent. This work points to an urgent need for the physics education community, including textbook publishers, authors, and adopters, to work together to broaden our portrayals of physics to reflect the vast diversity of scientists, both historical and contemporary, who have worked and are working in this field.
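The four validation metrics named above can be computed from per-word predictions. The following is a minimal sketch, assuming each word in a text carries a boolean label (True if it is a scientist's name); the function name and the label format are illustrative assumptions, not part of the published tool.

```python
def classification_metrics(predicted, actual):
    """Compute accuracy, precision, recall, and F1 from boolean labels.

    predicted, actual: equal-length sequences of booleans, one per word
    (True means "this word is a scientist's name"). Hypothetical format.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # names correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)      # non-names wrongly flagged
    fn = sum(1 for p, a in pairs if not p and a)      # names that were missed
    tn = sum(1 for p, a in pairs if not p and not a)  # non-names correctly ignored

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Example with five words, two of which are flagged correctly.
metrics = classification_metrics(
    predicted=[True, True, False, False, True],
    actual=[True, False, False, True, True],
)
```

Precision penalizes words wrongly flagged as names, while recall penalizes names that were missed, so reporting both (and their harmonic mean, F1) guards against a tool that scores well on accuracy simply because most words in a textbook are not names.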

1. American Institute of Physics, see https://www.aip.org/diversity-initiatives/statistics for “Diversity: Statistics and Reports.”
2. B. Parks, “Why aren't more theories named after women? Teaching women's history in physics,” Phys. Teach. 58(6), 377–381 (2020).
3. American Institute of Physics, The Time Is Now: Systemic Changes to Increase African Americans with Bachelor's Degrees in Physics and Astronomy (American Institute of Physics, 2020).
4. S. Wood et al., “A scientist like me: Demographic analysis of biology textbooks reveals both progress and long-term lags,” Proc. R. Soc. B 287, 20200877 (2020).
5. Lauren M. Aycock, Zahra Hazari, Eric Brewe, Kathryn B. H. Clancy, Theodore Hodapp, and Renee Michelle Goertzen, “Sexual harassment reported by undergraduate female physicists,” Phys. Rev. Phys. Educ. Res. 15(1), 010121 (2019).
6. R. Skibba, “Women in physics,” Nat. Rev. Phys. 1(5), 298–300 (2019).
7. Karyn L. Lewis, Jane G. Stout, Steven J. Pollock, Noah D. Finkelstein, and Tiffany A. Ito, “Fitting in or opting out: A review of key social-psychological factors influencing a sense of belonging for women in physics,” Phys. Rev. Phys. Educ. Res. 12(2), 020110 (2016).
8. M. H. Towns, “Where are the women of color? Data on African American, Hispanic, and Native American Faculty in STEM,” J. College Sci. Teach. 39(4), 8–9 (2010).
9. R. T. Palmer, D. C. Maramba, and T. E. Dancy, “A qualitative investigation of factors promoting the retention and persistence of students of color in STEM,” J. Negro Educ. 80(4), 491–504 (2011).
10. D. M. Young, L. A. Rudman, H. M. Buettner, and M. C. McLean, “The influence of female role models on women's implicit science cognitions,” Psychol. Women Q. 37(3), 283–292 (2013).
11. K. Klopfenstein, “Beyond test scores: The impact of black teacher role models on rigorous math taking,” Contemp. Econ. Policy 23(3), 416–428 (2005).
12. D. Y. Simpson, A. E. Beatty, and C. J. Ballen, “Teaching between the lines: Representation in science textbooks,” Trends Ecol. Evol. 36(1), 4–8 (2021).
13. R. L. Blumberg, “The invisible obstacle to educational equality: Gender bias in textbooks,” Prospects 38(3), 345–361 (2008).
14. R. S. Pienta and A. M. Smith, “Women on the margins,” in The New Politics of the Textbook: Problematizing the Portrayal of Marginalized Groups in Textbooks, edited by H. Hickman and B. J. Porfilio (Sense Publishers, Rotterdam, 2012), pp. 33–47.
15. Jennie S. Brotman and Felicia M. Moore, “Girls and science: A review of four themes in the science education literature,” J. Res. Sci. Teach. 45(9), 971–1002 (2008).
16. T. M. Lawlor and T. Niiler, “Physics textbooks from 1960–2016: A history of gender and racial bias,” Phys. Teach. 58(5), 320–323 (2020).
17. E. Makarova, B. Aeschlimann, and W. Herzog, “The gender gap in STEM fields: The impact of the gender stereotype of math and science on secondary students' career aspirations,” Front. Educ. 4(5), 320–323 (2019).
18. National Science Teaching Association, see https://www.nsta.org/nstas-official-positions/multicultural-science-education for “Position Statement: Multicultural Science Education.”
19. R. Ceglie and V. Olivares, “Representation of diversity in science textbooks,” in The New Politics of the Textbook: Problematizing the Portrayal of Marginalized Groups in Textbooks, edited by H. Hickman and B. J. Porfilio (Sense Publishers, Rotterdam, 2012), pp. 49–68.
20. American Association for the Advancement of Science, see http://www.project2061.org/publications/sfaa/online/sfaatoc.htm for “Science for all Americans.”
21. Steve Mattox, Michelle Bridenstine, Bridget Burns, Emmeline Torresen, Alex Koning, S. Paul Meek, Matthew Ritchie, Neil Schafer, Lindsay Shepard, Angela Slater, Tamara Waters, and Amanda Wigent, “How gender and race of geologists are portrayed in physical geology textbooks,” J. Geosci. Educ. 56(2), 156–159 (2008).
22. P. Bush and S. Mattox, “Decadal review: How gender and race of geoscientists are portrayed in physical geology textbooks,” J. Geosci. Educ. 68(1), 2–7 (2020).
23. D. King and D. S. Domin, “The representation of people of color in undergraduate general chemistry textbooks,” J. Chem. Educ. 84(2), 342–345 (2007).
24. M. L. Becker and M. R. Nilsson, “College chemistry textbooks fail on gender representation,” J. Chem. Educ. 98(4), 1146–1151 (2021).
25. Ellen I. Damschen, Kristen M. Rosenfeld, Mary Wyer, Deena Murphy-Medley, Thomas R. Wentworth, and Nick M. Haddad, “Visibility matters: Increasing knowledge of women's contributions to ecology,” Front. Ecology Environ. 3(4), 212–219 (2005).
26. E. Kheirandish, “Optics in the Islamic world,” in Encyclopaedia of the History of Science, Technology, and Medicine in the Non-Western Cultures, edited by H. Selin (Springer, Netherlands, Dordrecht, 2016), pp. 3447–3453.
27. S. H. Nasr and M. A. Razavi, The Islamic Intellectual Tradition in Persia (Routledge, London/New York, 2013).
28. O. Leaman, Key Concepts in Eastern Philosophy, 1st ed. (Routledge, London/New York, 1999).
29. J. Needham, Science and Civilisation in China: Volume 4, Physics and Physical Technology. Part 1: Physics (Cambridge U. P., Cambridge, UK, 1971).
30. I. van Sertima, Blacks in Science: Ancient and Modern, 1st ed. (Transaction Publishers, New Brunswick, 1991).
31. R. Rashed, “A pioneer in anaclastics: Ibn Sahl on burning mirrors and lenses,” Isis 81(3), 464–491 (1990).
32. G. W. Kronk, Cometography: A Catalog of Comets, Volume 1: Ancient–1799 (Cambridge U. P., Cambridge, New York, 1999).
33. R. L. Newburn and D. K. Yeomans, “Halley's comet,” Annu. Rev. Earth Planet. Sci. 10(1), 297–326 (1982).
34. S. Calvin, Beyond Curie: Four Women in Physics and Their Remarkable Discoveries, 1903 to 1963 (IOP Concise Physics, Bristol, 2017).
35. T. C. Chiang and C. Jiang, Madame Wu Chien-Shiung: The First Lady of Physics Research, 1st ed. (World Scientific, New Jersey, 2013).
36. R. L. Sime, “Lise Meitner and the discovery of nuclear fission,” Sci. Am. 278(1), 80–85 (1998).
37. R. L. Sime, “Marietta Blau: Pioneer of photographic nuclear emulsions and particle physics,” Phys. Perspect. 15(1), 3–32 (2013).
38. D. Kozlowski, V. Lariviere, C. R. Sugimoto, and T. Monroe-White, “Intersectional inequalities in science,” Proc. Natl. Acad. Sci. U. S. A. 119(2), e2113067119 (2022).
39. P. Zurn, E. G. Teich, S. C. Simon, J. Z. Kim, and D. S. Bassett, “Supporting academic equity in physics through citation diversity,” Commun. Phys. 5(1), 240 (2022).
40. J. Lerback, R. Hanson, and P. Wooden, “Association between author diversity and acceptance rates and citations in peer-reviewed Earth science manuscripts,” Earth Space Sci. 7(5), e2019EA000946 (2020).
41. L. Liang, R. Rousseau, and Z. Zhong, “Non-English journals and papers in physics and chemistry: Bias in citations?,” Scientometrics 95(1), 333–350 (2013).
42. Erin G. Teich, Jason Z. Kim, Christopher W. Lynn, Samantha C. Simon, Andrei A. Klishin, Karol P. Szymula, Pragya Srivastava, Lee C. Bassett, Perry Zurn, Jordan D. Dworkin, and Dani S. Bassett, “Citation inequity and gendered citation practices in contemporary physics,” Nat. Phys. 18(10), 1161–1170 (2022).
43. M. B. Ross, B. M. Glennon, R. Murciano-Goroff, E. G. Berkes, B. A. Weinberg, and J. I. Lane, “Women are credited less in science than men,” Nature 608(7921), 135–145 (2022).
44. M. M. King, C. T. Bergstrom, S. J. Correll, J. Jacquet, and J. D. West, “Men set their own cites high: Gender and self-citation across fields and over time,” Socius 3, 2378023117738903 (2017).
45. G. Ghiasi, P. Mongeon, C. Sugimoto, and V. Lariviere, “Gender homophily in citations,” in Science and Technology Indicators 2018 Conference Proceedings (Leiden University, 2018), pp. 1519–1525.
46. J. R. Finkel, T. Grenager, and C. Manning, “Incorporating non-local information into information extraction systems by Gibbs sampling,” in Proceedings of the 43rd Annual Meeting of Association for Computational Linguistics (Association for Computational Linguistics, 2005), pp. 363–370.
47. Sample websites used to construct the database include the Wikipedia list of African-American inventors and scientists, found at <https://en.wikipedia.org/wiki/List_of_African-American_inventors_and_scientists>; the Science Buddies list of Black History Month scientists, found at <https://www.sciencebuddies.org/blog/black-history-month-scientists>; and the Wikipedia list of German scientists, found at <https://en.wikipedia.org/wiki/List_of_German_scientists>.
48. For consistency, all birthplaces are denoted using currently accepted international borders and country names, even if the name of a scientist's country of origin or the accepted international borders differed at the time of birth. Furthermore, we note that some international borders and country names are matters of dispute; we used the conventions of the Natural Earth data as provided by the R package rnaturalearth. Documentation of this package can be found at https://cran.r-project.org/web/packages/rnaturalearth/README.html.
49. Throughout this work, we have used binary identifications of gender, as we found no instances of scientists in the textbooks who were identified in the texts or online as non-binary. However, we wish to acknowledge that the use of men/women categories is simplistic and does not reflect the true diversity of human sex and gender identity. Likewise, we have, with one exception (“Latino”), categorized the race and ethnicity of individuals identified in the texts using the ethnicity and race categories used in the 2020 U.S. Census, including use of the term “White non-Hispanic” (full report can be found at https://www2.census.gov/about/cic/Coding%20Operations%20of%20Race%20and%20Ethnicity.pdf). We have chosen to break from governmental convention and use “Latino” to explicitly refer to scientists of Latin American descent, which is not fully captured by the convention “Hispanic.” We note that all of these categorizations are simplistic, failing to reflect the diversity of human identity. In particular, these historic categorizations of race and ethnicity often fail to acknowledge mixed-race identities and have doubtless led to oversimplification of complex human identities in our database.
50. The time required for ADAPT to analyze a text depends on the processing speed of the computer used, the speed of the internet connection, and the format and length of the text. Using a laptop with an i7-8750 CPU over a high-speed Ethernet connection, we find that the full version of ADAPT takes on the order of 5 min to analyze the index of a textbook and 2–3 h to analyze a full textbook with roughly 500 pages. The publicly available version of ADAPT that uses only database matching, with no online querying, can analyze a full textbook virtually instantaneously.
51. See the supplementary material online, which includes more detailed information about how ADAPT works, a quantitative validation of ADAPT using machine learning metrics, a list of textbooks examined with ADAPT, and additional data obtained using ADAPT on the demographics of individual introductory textbooks and on nationality demographics.
52. Additional detail on the methodology used by ADAPT is given at <https://tyxiang0530.github.io/ADAPTSite/>.
53. A publicly available version of ADAPT that uses only database matching, with no online querying or updating of the database, is available at https://adapt-user.onrender.com/. The “original text entry” strings in the output file should be used to double check the database matching results provided; see the “README.txt” file provided.
54. N. Mehrabi, T. Gowda, F. Morstatter, N. Peng, and A. Galstyan, “Man is to person as woman is to location: Measuring gender bias in named entity recognition,” in Proceedings of the 31st Association for Computing Machinery Conference on Hypertext and Social Media (HT'20) (Association for Computing Machinery, 2020), pp. 231–232.
55. Online article detailing the gender gap in Wikipedia contributors and the relatively small number of biographies of women on Wikipedia, “Gender bias on Wikipedia,” <https://en.wikipedia.org/wiki/Gender_bias_on_Wikipedia>, accessed on January 20, 2023.
56. We quantified the results of our validation analysis using four standard machine learning metrics: accuracy, precision, recall, and F1 score. Details of this analysis are given in online Appendix 2. Accuracy is the fraction of predictions (whether a given word is a scientist's name or not) that the automated tool got correct. Precision is the fraction of the words identified by the tool as scientists' names that actually are scientists' names. Recall is the fraction of the scientists' names actually present in the text that the tool correctly identified. The F1 score is the harmonic mean of precision and recall.
57. At the time of writing, artificial intelligence technology such as ChatGPT has become widely available. We note that while ChatGPT may seem like a viable alternative to ADAPT given its language capabilities, the research version of ChatGPT available as of February 2023 underperforms relative to ADAPT when asked to identify the names and demographics of scientists mentioned in physics books. For example, given brief sections of a chapter from an introductory physics textbook, ChatGPT was unable to identify even very well-known scientists mentioned in the text, such as Newton and Feynman. More broadly, general-purpose language models such as ChatGPT often lack granularity and struggle with edge cases that more specialized models such as ADAPT handle easily. Furthermore, database-matching models such as ADAPT are more stable, in that the method by which the model reaches a conclusion (string matching) is easily understandable and replicable, in contrast to the transformer neural networks used by ChatGPT. ChatGPT is available at <https://chat.openai.com/>.
58. For some of the scientists identified in the texts, we were unable to identify gender, race or ethnicity, and/or national origin. These are typically individuals listed as authors of research papers cited in a textbook, for whom we could find no information on their demographics even after a manual search of the world wide web using the provided institutional affiliation information.
59. For the purposes of this work, we define Europe as including the countries listed on the website cited here, omitting Russia, as it spans both Europe and Asia. We use the country delimitations provided by https://www.worldometers.info/geography/how-many-countries-in-europe/.
60. United Nations, Department of Economic and Social Affairs, Population Division (2022). World Population Prospects 2022, Online Edition, accessed August 2023.
61. The list of textbooks was constructed from online lists of top introductory physics textbooks together with Amazon best-selling physics textbooks.
62. M. W. Rossiter, “The Matthew Matilda effect in science,” Social Stud. Sci. 23(2), 325–341 (1993).
63. S. Shahriari, An Invitation to Combinatorics (Cambridge U. P., Cambridge, 2022).
64. With this information in hand, instructors or academic departments could, in addition to searching for more inclusive textbooks, design supplements to fill in identified holes. For example, instructors could engage students in a discussion about why the conventional narrative of physics is so homogeneous (Refs. 65–67), or they could supplement course materials with research or review papers written by authors from marginalized groups, Scientist Spotlight assignments (Ref. 68), and information from websites highlighting contributions of under-recognized physicists (Ref. 47). Likewise, academic departments could work to diversify their speaker series, to develop a journal club focused on papers that either deal directly with the lack of representation or describe technical advancements from speakers outside of the canon, or to develop courses that interrogate conventional physics narratives.
65. C. Dalton and J. Hudgings, “Integrating equity: Curricular development and student experiences in an intermediate-level college physics major course,” Phys. Teach. 58(8), 545–551 (2020).
66. M. E. Baylor, J. R. Hoehn, and N. Finkelstein, “Infusing equity, diversity, and inclusion throughout our physics curriculum: (Re)defining what it means to be a physicist,” Phys. Teach. 60(3), 172–175 (2022).
67. A. R. Daane, S. R. Decker, and V. Sawtelle, “Teaching about racial equity in introductory physics courses,” Phys. Teach. 55(6), 328–333 (2017).
68. J. N. Schinske, H. Perkins, A. Snyder, and M. Wyer, “Scientist Spotlight homework assignments shift students' stereotypes of scientists and enhance science identity in a diverse introductory science class,” CBE-Life Sci. Educ. 15(3), ar47 (2016).
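
Notes 53 and 57 describe the public version of ADAPT as relying on database matching, that is, string matching of words in a text against a database of known scientists. The sketch below illustrates that general idea only; it is not ADAPT's implementation. The three-entry database, its field names, and the function `match_scientists` are hypothetical, and birth countries follow the current-borders convention of note 48.

```python
import re

# Hypothetical miniature database keyed by lowercase surname.
# This is NOT ADAPT's real data or schema; it exists only for illustration.
SCIENTIST_DB = {
    "newton": {"name": "Isaac Newton", "birth_country": "United Kingdom"},
    "feynman": {"name": "Richard Feynman", "birth_country": "United States"},
    "meitner": {"name": "Lise Meitner", "birth_country": "Austria"},
}


def match_scientists(text, database=SCIENTIST_DB):
    """Return (original text entry, record) pairs for words found in the database.

    Matching is case-insensitive exact string matching on single words.
    Each database key is reported once, at its first occurrence.
    """
    matches = []
    seen = set()
    # Split on runs of ASCII letters, so "Newton's" yields "Newton" and "s".
    for token in re.findall(r"[A-Za-z]+", text):
        key = token.lower()
        if key in database and key not in seen:
            seen.add(key)
            matches.append((token, database[key]))
    return matches


sample = "Newton's laws and Feynman diagrams; the newton is the SI unit of force."
for original_entry, record in match_scientists(sample):
    print(original_entry, "->", record["name"], "/", record["birth_country"])
```

Keeping the original text entry alongside each match mirrors the advice in note 53 to double check results: in the sample sentence, the unit "newton" would match the same database key as the scientist, which is the kind of false positive a manual check of the original strings is meant to catch.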
