Instructors of physics, math, and engineering courses should be aware of the evolving abilities of large language models augmented with tools that enable computation.
GPT-3 (Ref. 1) and GPT-4 (Ref. 2) are large language models (LLMs) released by OpenAI that have been shown to demonstrate at least some problem-solving abilities. These models have been adapted for a conversational format in ChatGPT, which currently offers modes 3.5 and 4. ChatGPT-4 has plugins that can be enabled, including WolframAlpha and Code Interpreter (recently renamed Advanced Data Analysis), which can enable accurate computation.
We explored Code Interpreter, which enables ChatGPT to write and execute Python code and use certain Python libraries. Code Interpreter compensates for many of the apparent weaknesses of standalone ChatGPT, using Python libraries to perform symbolic manipulations and arithmetic operations, and generate plots. Notably, ChatGPT with Code Interpreter can self-correct compile-time errors, automatically modifying the code to fix the error and re-running it.
We tested ChatGPT-3.5, 4, and 4 with Code Interpreter (4/CI) on a set of 13 problems of varying difficulty from an introductory electromagnetism class in the Department of Electrical and Computer Engineering at the University of Wisconsin-Madison. Some of the problems were modified textbook problems (mostly from Ulaby's Fundamentals of Applied Electromagnetics; Ref. 3), while others are original. For example, in one problem, we asked ChatGPT to calculate and plot the current vs. time that is induced in a rotating conducting loop in the presence of a magnetic field (supplementary material Fig. S1; Ref. 4). In another problem, we asked ChatGPT to find the electric flux density from an electric potential given in cylindrical coordinates. The complete list of problems and some sample responses can be found in the supplementary material (Ref. 4).
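To give a sense of the kind of computation the rotating-loop problem calls for, here is a minimal sketch in Python. It is not the problem statement or a ChatGPT response from the study: the field strength, loop area, rotation rate, and resistance are hypothetical values chosen for illustration, and the loop is assumed to rotate at a constant angular frequency in a uniform field. The sketch applies Faraday's law, emf = -dΦ/dt with Φ = BA cos(ωt), and cross-checks the closed-form current against a numerical derivative of the flux.

```python
import math

# Hypothetical parameters for illustration only (not taken from the study).
B = 0.5                     # magnetic flux density, T
AREA = 0.02                 # loop area, m^2
OMEGA = 2 * math.pi * 50    # angular frequency of rotation, rad/s
R = 10.0                    # loop resistance, ohm


def flux(t):
    """Magnetic flux through the loop, assuming its normal is along B at t = 0."""
    return B * AREA * math.cos(OMEGA * t)


def induced_current(t):
    """Closed-form current: I = emf / R with emf = -dPhi/dt = B*A*omega*sin(omega*t)."""
    return B * AREA * OMEGA * math.sin(OMEGA * t) / R


def induced_current_numeric(t, dt=1e-7):
    """Same quantity from a central-difference derivative of the flux."""
    dphi_dt = (flux(t + dt) - flux(t - dt)) / (2 * dt)
    return -dphi_dt / R


if __name__ == "__main__":
    period = 2 * math.pi / OMEGA
    # Tabulate the current over one rotation period.
    for k in range(9):
        t = k * period / 8
        print(f"t = {t * 1e3:6.3f} ms   I = {induced_current(t) * 1e3:9.3f} mA")
```

The sign of the current depends on the chosen orientation of the loop normal; the sketch fixes one convention and keeps it consistent between the analytic and numerical versions.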
Table I summarizes the performance of the different modes of ChatGPT (Aug 3, 2023 version) on the 13 problems. Each problem was tested using each mode of ChatGPT with at least ten trials (each time, a new instance of ChatGPT was used), and the solutions were strictly graded. We observed a major jump in performance from ChatGPT-3.5 to ChatGPT-4, and another from ChatGPT-4 to ChatGPT-4/CI (supplementary material Figs. S2 and S3 highlight this jump in performance; Ref. 4), with ChatGPT-4/CI able to solve all of the problems correctly most of the time.
Table I. Performance of various versions of ChatGPT on selected problems (see supplementary material Sec. S1 for the complete problems, sample solutions, and further discussion) (Ref. 3).
Problem | 3.5 | 4 | 4/CI |
---|---|---|---|
1. Matrix subtraction | 9/17 | 9/10 | 10/10 |
2. Determinant of a matrix | 4/15 | 9/15 | 10/10 |
3. Integrating charge density in Cartesian coordinates | 2/10 | 10/10 | 10/10 |
4. Electric field from a line charge (Cartesian) | 0/10 | 0/10 | 6/10 |
5. Divergence (Cartesian) | 5/10 | 8/10 | 9/10 |
6. Divergence (cylindrical) | 5/10 | 7/10 | 7/10 |
7. Flux density from a potential (cylindrical) | 1/10 | 3/10 | 7/10 |
8. Potential of hemispherical shell | 0/10 | 4/10 | 7/10 |
9. Force between two charges | 0/10 | 1/10 | 10/10 |
10. Electric field of charged sphere | 0/10 | 8/10 | 8/10 |
11. Electric field in dielectric | 0/10 | 10/10 | 10/10 |
12. Field and capacitance of a coaxial cable | 3/10 | 7/10 | 7/10 |
13. Rotating loop in a magnetic field | 1/10 | 0/10 | 16/30 |
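
As a rough way to compare the three modes at a glance, the entries of Table I can be pooled into one overall success rate per mode. The sketch below does this with exact fractions; the variable and function names are ours, and pooling is only one possible summary, since it weights problems with more trials (for example, the 30 trials of problem 13 with 4/CI) more heavily than the others.

```python
from fractions import Fraction

# (correct, trials) per problem for modes 3.5, 4, and 4/CI, copied from Table I.
TABLE_I = [
    ((9, 17), (9, 10), (10, 10)),    # 1. Matrix subtraction
    ((4, 15), (9, 15), (10, 10)),    # 2. Determinant of a matrix
    ((2, 10), (10, 10), (10, 10)),   # 3. Integrating charge density
    ((0, 10), (0, 10), (6, 10)),     # 4. Electric field from a line charge
    ((5, 10), (8, 10), (9, 10)),     # 5. Divergence (Cartesian)
    ((5, 10), (7, 10), (7, 10)),     # 6. Divergence (cylindrical)
    ((1, 10), (3, 10), (7, 10)),     # 7. Flux density from a potential
    ((0, 10), (4, 10), (7, 10)),     # 8. Potential of hemispherical shell
    ((0, 10), (1, 10), (10, 10)),    # 9. Force between two charges
    ((0, 10), (8, 10), (8, 10)),     # 10. Electric field of charged sphere
    ((0, 10), (10, 10), (10, 10)),   # 11. Electric field in dielectric
    ((3, 10), (7, 10), (7, 10)),     # 12. Field and capacitance of a coaxial cable
    ((1, 10), (0, 10), (16, 30)),    # 13. Rotating loop in a magnetic field
]
MODES = ("3.5", "4", "4/CI")


def pooled_rate(mode_index):
    """Total correct answers divided by total trials for one mode."""
    correct = sum(row[mode_index][0] for row in TABLE_I)
    trials = sum(row[mode_index][1] for row in TABLE_I)
    return Fraction(correct, trials)


if __name__ == "__main__":
    for i, mode in enumerate(MODES):
        rate = pooled_rate(i)
        print(f"ChatGPT-{mode}: {float(rate):.1%} of trials correct")
```

Pooled this way, the table gives 30/142, 76/135, and 117/150 correct trials for modes 3.5, 4, and 4/CI, respectively.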
Based on our experience with ChatGPT, we recommend instructors and students be aware of the rapidly increasing capabilities of AI tools, test those tools for teaching and learning, and understand common pitfalls (see supplementary material Figs. S4–S6; Ref. 4). We advise testing leading AI tools at least once each semester to gauge the degree to which they can complete large portions of assignments. For example, the introduction of Code Interpreter in July 2023 significantly improved the problem-solving capacity of ChatGPT (see Table I).
AI tools like ChatGPT with Code Interpreter likely increase the importance of in-class assessments of student performance, though instructors may also consider creating at least some new assignments and projects that explicitly allow the use of AI tools.